News Posts matching #CUDA


DirectX Coming to Linux...Sort of

Microsoft is preparing to add DirectX API support to WSL (Windows Subsystem for Linux). The latest Windows Subsystem for Linux 2 will virtualize DirectX for Linux applications running on top of it. WSL is a translation layer that lets Linux apps run on top of Windows. Unlike Wine, which attempts to translate Direct3D commands to OpenGL, what Microsoft is proposing is a real DirectX interface for apps in WSL, which can essentially talk to the hardware (the host's kernel-mode GPU driver) directly.

To this effect, Microsoft introduced a Linux edition of DXGkrnl, a new kernel-mode driver for Linux that talks to the DXGkrnl driver of the Windows host. With this, Microsoft is promising to expose the full Direct3D 12, DxCore, and DirectML. It will also serve as a conduit for third-party APIs, such as OpenGL, OpenCL, Vulkan, and CUDA. Microsoft expects to release this feature-packed WSL with WDDM 2.9 (so a future version of Windows 10).

AAEON Unveils AI and Edge Computing Solutions Powered by NVIDIA

AAEON, a leading developer of embedded AI and edge-computing solutions, today announced it is unveiling several new rugged embedded platforms—augmenting an already extensive lineup of AAEON AI edge-computing solutions powered by the NVIDIA Jetson platform. The new AAEON products provide key interfaces needed for edge computing in a small form factor, making it easier to build applications for all levels of users, from makers to more advanced developers for deployments in the field.

AAEON also introduced a new version of the popular BOXER-8120AI, now featuring the Jetson TX2 4 GB module, providing an efficient and cost-effective solution for AI edge computing with 256 CUDA cores delivering processing speeds up to 1.3 TFLOPS. "Partnering with an AI and edge computing leader like NVIDIA supports our mission to deliver more diversified embedded products and solutions at higher quality standards," said Alex Hsueh, Senior Director of AAEON's System Platform Division. "These new offerings powered by the Jetson platform complement our existing lineup of rugged embedded products, providing an optimal combination of performance and price in a smaller form factor for customers to easily deploy across a full range of applications."
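The quoted 1.3 TFLOPS figure for the 256-core Jetson TX2 can be sanity-checked with simple peak-throughput arithmetic. A minimal sketch, assuming 2 FLOPs per CUDA core per clock (one fused multiply-add), a boost clock of roughly 1.3 GHz, and FP16 running at twice the FP32 rate — all assumptions, since the article gives only the headline number:

```python
def peak_tflops(cuda_cores, clock_ghz, flops_per_core_per_clock=2):
    # cores * FLOPs-per-clock * clock (GHz) gives GFLOPS; divide by 1000 for TFLOPS
    return cuda_cores * flops_per_core_per_clock * clock_ghz / 1000.0

fp32 = peak_tflops(256, 1.3)  # ~0.67 TFLOPS FP32
fp16 = fp32 * 2               # ~1.33 TFLOPS FP16, matching the quoted 1.3 TFLOPS
print(round(fp32, 2), round(fp16, 2))
```

The double-rate FP16 assumption is what makes the quoted number line up; at plain FP32 rates the same part lands at roughly half that figure.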

NVIDIA RTX Voice Modded to Work on Non-RTX GeForce GPUs

NVIDIA made headlines with the release of its free RTX Voice software, which gives your communication apps AI-powered computational noise-cancellation. The software is very effective at what it does, but requires a GeForce RTX 20-series GPU. PC enthusiast David Lake, over at the Guru3D Forums, disagrees. With fairly easy modifications to the installer payload, Lake was able to remove its system-requirements gate, install it on his machine with a TITAN V graphics card, and confirm that the software works as intended.

Our first instinct was to point out that the "Volta" based TITAN V features tensor cores and has hardware AI capabilities, until we found dozens of users across the Guru3D forums, Reddit, and Twitter claiming that the mod gets RTX Voice to work on their GTX 16-series, "Pascal," "Maxwell," and even older "Fermi" hardware. So in all likelihood, RTX Voice uses a CUDA-based GPGPU codepath rather than something fancy leveraging tensor cores. Find instructions on how to mod the RTX Voice installer in the Guru3D Forums thread here.

Three Unknown NVIDIA GPUs GeekBench Compute Score Leaked, Possibly Ampere?

(Update, March 4th: Another NVIDIA graphics card has been discovered in the Geekbench database, this one featuring a total of 124 CUs. This could amount to some 7,936 CUDA cores, should NVIDIA keep the same 64 CUDA cores per CU - though this has changed in the past, as when NVIDIA halved the number of CUDA cores per CU from Pascal to Turing. The 124-CU graphics card is clocked at 1.1 GHz and features 32 GB of HBM2e, delivering a score of 222,377 points in the Geekbench benchmark. We again stress that these could be just engineering samples with conservative clocks, and that final performance could be even higher.)
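The update's core-count estimate is simple multiplication of Geekbench's reported Compute Units by a per-CU core count. A sketch, using the 64-cores-per-CU figure the article assumes:

```python
def cuda_cores(compute_units, cores_per_cu=64):
    # Geekbench reports Compute Units; on recent NVIDIA parts one SM maps to one CU
    return compute_units * cores_per_cu

print(cuda_cores(124))  # 7936, the figure quoted in the update
```

As the update notes, the cores-per-CU ratio has changed between generations before, so the same CU count could imply a different total on a new architecture.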

NVIDIA is expected to launch its next-generation Ampere lineup of GPUs during the GPU Technology Conference (GTC) event happening from March 22nd to March 26th. Just a few weeks before the release of these new GPUs, a Geekbench 5 compute score measuring the OpenCL performance of unknown GPUs, which we assume are part of the Ampere lineup, has appeared. Thanks to the Twitter user "_rogame" (@_rogame), who obtained a Geekbench database entry, we have some information about the CUDA core configuration, memory, and performance of the upcoming cards.

NVIDIA to Reuse Pascal for Mobility-geared MX300 Series

NVIDIA will apparently still be using Pascal when they launch their next generation of low-power discrete graphics solutions for mobile systems. The MX300 series will replace the current crop of MX200 series parts (segregated into three products in the form of the MX230, MX250 10 W, and MX250 25 W). The new MX300 keeps the dual-tiered system, but ups the ante with the top-of-the-line MX350. Even though it's still Pascal on a 14 nm process, the MX350 should see an increase in CUDA cores to 640 (by using NVIDIA's Pascal GP107 chip), up from the MX250's 384. Performance, then, should be comparable to the NVIDIA GTX 1050.

The MX330, on the other hand, will keep the specifications of the MX250, which signals a tier increase from the 256 execution units in the MX230 to 384. This should translate to appreciable performance increases for the new MX300 series, despite staying on NVIDIA's Pascal architecture. The new lineup is expected to be announced in February.

Rumor: NVIDIA's Next Generation GeForce RTX 3080 and RTX 3070 "Ampere" Graphics Cards Detailed

NVIDIA's next generation of graphics cards, codenamed Ampere, is set to arrive sometime this year, presumably around GTC 2020, which takes place starting March 22nd. Before NVIDIA CEO Jensen Huang officially reveals the specifications of these new GPUs, we have the latest round of rumors coming our way. According to VideoCardz, which cites multiple sources, the die configurations of the upcoming GeForce RTX 3070 and RTX 3080 have been detailed. Using the latest 7 nm manufacturing process from Samsung, this generation of NVIDIA GPUs promises a big improvement over the previous generation.

For starters, the two dies which have appeared carry the codenames GA103 and GA104, standing for the RTX 3080 and RTX 3070 respectively. Perhaps the biggest surprise is the Streaming Multiprocessor (SM) count. The smaller GA104 die has as many as 48 SMs, resulting in 3072 CUDA cores, while the bigger, oddly named GA103 die has as many as 60 SMs, for a total of 3840 CUDA cores. These improvements in SM count should result in a notable performance increase across the board. Alongside the increase in SM count, there is also a wider memory bus. The smaller GA104 die that should end up in the RTX 3070 uses a 256-bit memory bus allowing for 8/16 GB of GDDR6 memory, while its bigger brother, the GA103, has a 320-bit wide bus that allows the card to be configured with either 10 or 20 GB of GDDR6 memory. In the images below you can check out the alleged diagrams for yourself and judge whether they look fake; as always, take this rumor with a grain of salt.
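The rumored bus widths line up with the rumored capacities because GDDR6 chips use 32-bit interfaces: the bus width fixes the chip count, and the chip density (1 GB or 2 GB per chip, the two common GDDR6 densities) fixes the total. A quick sketch of that arithmetic:

```python
def memory_configs_gb(bus_width_bits, chip_densities_gb=(1, 2)):
    # Each GDDR6 chip has a 32-bit interface, so the bus width fixes the chip count
    chips = bus_width_bits // 32
    return [chips * d for d in chip_densities_gb]

print(memory_configs_gb(256))  # [8, 16]  -> GA104 / RTX 3070
print(memory_configs_gb(320))  # [10, 20] -> GA103 / RTX 3080
```

This is why a 320-bit card cannot ship with a "round" 8 or 16 GB without disabling part of its bus.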

NVIDIA Introduces DRIVE AGX Orin Platform

NVIDIA today introduced NVIDIA DRIVE AGX Orin, a highly advanced software-defined platform for autonomous vehicles and robots. The platform is powered by a new system-on-a-chip (SoC) called Orin, which consists of 17 billion transistors and is the result of four years of R&D investment. The Orin SoC integrates NVIDIA's next-generation GPU architecture and Arm Hercules CPU cores, as well as new deep learning and computer vision accelerators that, in aggregate, deliver 200 trillion operations per second—nearly 7x the performance of NVIDIA's previous generation Xavier SoC.

Orin is designed to handle the large number of applications and deep neural networks that run simultaneously in autonomous vehicles and robots, while achieving systematic safety standards such as ISO 26262 ASIL-D. Built as a software-defined platform, DRIVE AGX Orin is developed to enable architecturally compatible platforms that scale from a Level 2 to full self-driving Level 5 vehicle, enabling OEMs to develop large-scale and complex families of software products. Since both Orin and Xavier are programmable through open CUDA and TensorRT APIs and libraries, developers can leverage their investments across multiple product generations.

NVIDIA and Tech Leaders Team to Build GPU-Accelerated Arm Servers

NVIDIA today introduced a reference design platform that enables companies to quickly build GPU-accelerated Arm-based servers, driving a new era of high performance computing for a growing range of applications in science and industry.

Announced by NVIDIA founder and CEO Jensen Huang at the SC19 supercomputing conference, the reference design platform — consisting of hardware and software building blocks — responds to growing demand in the HPC community to harness a broader range of CPU architectures. It allows supercomputing centers, hyperscale-cloud operators and enterprises to combine the advantage of NVIDIA's accelerated computing platform with the latest Arm-based server platforms.

New NVIDIA EGX Edge Supercomputing Platform Accelerates AI, IoT, 5G at the Edge

NVIDIA today announced the NVIDIA EGX Edge Supercomputing Platform - a high-performance, cloud-native platform that lets organizations harness rapidly streaming data from factory floors, manufacturing inspection lines and city streets to securely deliver next-generation AI, IoT and 5G-based services at scale, with low latency.

Early adopters of the platform - which combines NVIDIA CUDA-X software with NVIDIA-certified GPU servers and devices - include Walmart, BMW, Procter & Gamble, Samsung Electronics and NTT East, as well as the cities of San Francisco and Las Vegas.

Primate Labs Introduces GeekBench 5, Drops 32-bit Support

Primate Labs, developers of the ubiquitous benchmarking application GeekBench, have announced the release of version 5 of the software. The new version brings numerous changes, and one of the most important (since it affects compatibility) is that it will only be distributed in a 64-bit version. Some under-the-hood changes include additions to the CPU benchmark tests (including machine learning, augmented reality, and computational photography), as well as increases in the memory footprint of tests so as to better gauge the impact of your memory subsystem on your system's performance. Also introduced are different threading models for CPU benchmarking, allowing for changes in workload attribution and the corresponding impact on CPU performance.

On the compute side of things, GeekBench 5 now supports the Vulkan API, which joins CUDA, Metal, and OpenCL. GPU-accelerated compute benchmarks for computer vision tasks such as stereo matching, and augmented reality tasks such as feature matching, are also available. For iOS users, there is now a Dark Mode for the results interface. GeekBench 5 is available now, 50% off, on Primate Labs' store.

NVIDIA Brings CUDA to ARM, Enabling New Path to Exascale Supercomputing

NVIDIA today announced its support for Arm CPUs, providing the high performance computing industry a new path to build extremely energy-efficient, AI-enabled exascale supercomputers. NVIDIA is making available to the Arm ecosystem its full stack of AI and HPC software - which accelerates more than 600 HPC applications and all AI frameworks - by year's end. The stack includes all NVIDIA CUDA-X AI and HPC libraries, GPU-accelerated AI frameworks and software development tools such as PGI compilers with OpenACC support and profilers. Once stack optimization is complete, NVIDIA will accelerate all major CPU architectures, including x86, POWER and Arm.

"Supercomputers are the essential instruments of scientific discovery, and achieving exascale supercomputing will dramatically expand the frontier of human knowledge," said Jensen Huang, founder and CEO of NVIDIA. "As traditional compute scaling ends, power will limit all supercomputers. The combination of NVIDIA's CUDA-accelerated computing and Arm's energy-efficient CPU architecture will give the HPC community a boost to exascale."

NVIDIA's SUPER Tease Rumored to Translate Into an Entire Lineup Shift Upwards for Turing

NVIDIA's SUPER teaser hasn't crystallized into something physical as of now, but we know it's coming - NVIDIA themselves saw to it that our (singularly) collective minds would be buzzing about what that teaser meant, looking to steal some thunder from AMD's E3 showing. Now, that teaser seems to be coalescing into something amongst the industry: an entire lineup upgrade for Turing products, with NVIDIA pulling their chips up one rung of the performance ladder across their entire lineup.

Apparently, NVIDIA will be looking to increase performance across the board by shuffling their chips in a downward manner whilst keeping the current pricing structure. This means that NVIDIA's TU106 chip, which powered their RTX 2070 graphics card, will now be powering the RTX 2060 SUPER (with a reported core count of 2176 CUDA cores). The TU104 chip, which powers the current RTX 2080, will in the meantime be powering the SUPER version of the RTX 2070 (a reported 2560 CUDA cores are expected to be onboard), and the TU102 chip which powered their top-of-the-line RTX 2080 Ti will be brought down to the RTX 2080 SUPER (specs place this at 8 GB GDDR6 VRAM and 3072 CUDA cores). This carves the way for an even more powerful SKU in the RTX 2080 Ti SUPER, which should be launched at a later date. Salty waters say the RTX 2080 Ti SUPER will feature an unlocked chip which could be allowed to convert up to 300 W into graphics horsepower, so that's something to keep an eye (and a power meter) on, for sure. Less defined talks suggest that NVIDIA will be introducing an RTX 2070 Ti SUPER equivalent with a new chip as well.

Manli Introduces its GeForce GTX 1650 Graphics Card Lineup

Manli Technology Group Limited, a major manufacturer of graphics cards and other components, today announced the affordable new member of the 16-series family - the Manli GeForce GTX 1650. The Manli GeForce GTX 1650 is powered by the award-winning NVIDIA Turing architecture. It is equipped with 4 GB of GDDR5 memory, a 128-bit memory controller, and 896 built-in CUDA cores with the core frequency set at 1485 MHz, dynamically boosting up to 1665 MHz. Moreover, the Manli GeForce GTX 1650 consumes only 75 W, with no external power supply required.

NVIDIA Extends DirectX Raytracing (DXR) Support to Many GeForce GTX GPUs

NVIDIA today announced that it is extending DXR (DirectX Raytracing) support to several GeForce GTX graphics models beyond its GeForce RTX series. These include the GTX 1660 Ti, GTX 1660, GTX 1080 Ti, GTX 1080, GTX 1070 Ti, GTX 1070, and GTX 1060 6 GB. The GTX 1060 3 GB and lower "Pascal" models don't support DXR, nor do older generations of NVIDIA GPUs. NVIDIA has implemented real-time raytracing on GPUs without specialized components such as RT cores or tensor cores, by essentially implementing the rendering path through shaders, in this case, CUDA cores. DXR support will be added through a new GeForce graphics driver later today.

The GPU's CUDA cores now have to calculate BVH traversal, intersection, reflection, and refraction. The GTX 16-series chips have an edge over "Pascal" despite lacking RT cores, as the "Turing" CUDA cores support concurrent INT and FP execution, allowing more work to be done per clock. NVIDIA, in a detailed presentation, listed out the kinds of real-time ray-tracing effects available through the DXR API, namely reflections, shadows, advanced reflections and shadows, ambient occlusion, global illumination (unbaked), and combinations of these. The company put out detailed performance numbers for a selection of GTX 10-series and GTX 16-series GPUs, and compared them to RTX 20-series SKUs that have specialized hardware for DXR.
Update: Article updated with additional test data from NVIDIA.
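To illustrate the kind of per-ray math that lands on general-purpose CUDA cores when no RT hardware is present, here is a minimal ray-sphere intersection test — purely an illustration of the workload class, not NVIDIA's implementation (which traverses BVHs of triangles):

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    # Solve |origin + t*direction - center|^2 = radius^2 for the nearest t >= 0
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                       # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2 * a)  # nearest of the two roots
    return t if t >= 0 else None

print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
```

Dedicated RT cores perform tests like this (and the BVH traversal around them) in fixed-function hardware; without them, every such evaluation competes with shading work for the same execution units, which is why the shader-based DXR path is so much slower.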

Details on GeForce GTX 1660 Revealed Courtesy of MSI - 1408 CUDA Cores, GDDR 5 Memory

Details on NVIDIA's upcoming mainstream GTX 1660 graphics card have been revealed, which will help put its graphics-crunching prowess up to scrutiny. The new graphics card from NVIDIA slots in below the recently released GTX 1660 Ti (which provides roughly 5% better performance than NVIDIA's previous GTX 1070 graphics card) and above the yet-to-be-released GTX 1650.

The 1408 CUDA cores in the design amount to a roughly 8% reduction in computing cores compared to the GTX 1660 Ti, but most of the savings (and performance impact) likely come from the 6 GB (8 Gbps) GDDR5 memory this card is outfitted with, compared to the 1660 Ti's GDDR6 implementation. The amount of cut GPU resources from NVIDIA is so low that we imagine these chips won't be coming from harvesting defective dies as much as from actually fusing off CUDA cores present in the TU116 chip. Using GDDR5 is still cheaper than the GDDR6 alternative (for now), and this also avoids straining the GDDR6 supply (if that was ever a concern for NVIDIA).
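The bandwidth gap implied by the memory choice is easy to quantify. A sketch, assuming the 1660 Ti's GDDR6 runs at 12 Gbps per pin (the article names only the 1660's 8 Gbps GDDR5; both cards use TU116's 192-bit bus):

```python
def bandwidth_gbs(bus_width_bits, gbps_per_pin):
    # total bandwidth = pins * per-pin data rate, divided by 8 bits per byte
    return bus_width_bits * gbps_per_pin / 8

print(bandwidth_gbs(192, 8))   # 192.0 GB/s - GTX 1660, 8 Gbps GDDR5
print(bandwidth_gbs(192, 12))  # 288.0 GB/s - GTX 1660 Ti, 12 Gbps GDDR6 (assumed)
```

A one-third cut in memory bandwidth on the same bus width is a far bigger handicap than the modest reduction in CUDA cores, which supports the article's read that the memory downgrade drives most of the performance gap.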

NVIDIA Adds New Options to Its MX200 Mobile Graphics Solutions - MX250 and MX230

NVIDIA has added new SKUs to its low-power mobility graphics lineup. The MX230 and MX250 come in to replace the GeForce MX130 and MX150, but... there's really not that much of a performance improvement to justify the increase in the series' tier. Both solutions are based on Pascal, so there are no Turing performance uplifts at the execution level.

NVIDIA hasn't disclosed any CUDA core counts or other specifics on these chips; we only know that they are paired with GDDR5 memory and feature Boost functionality for increased performance in particular scenarios. The strange thing is that NVIDIA's own performance scores compare their MX130, MX150, and now MX230 and MX250 to Intel's UHD 620 IGP... and while the old MX150 was reported by NVIDIA as offering up to a 4x performance uplift compared to that Intel part, the new MX250 now claims an improvement of 3.5x the performance. Whether this is because of a new testing methodology, or some other reason, only NVIDIA knows.

NVIDIA Readies GeForce GTX 1660 Ti Based on TU116, Sans RTX

It looks like RTX technology won't make it to sub-$250 market segments as the GPUs aren't fast enough to handle real-time raytracing, and it makes little economic sense for NVIDIA to add billions of additional transistors for RT cores. The company is hence carving out a sub-class of "Turing" GPUs under the TU11x ASIC series, which will power new GeForce GTX family SKUs, such as the GeForce GTX 1660 Ti, and other GTX 1000-series SKUs. These chips offer "Turing Shaders," which are basically CUDA cores that have the IPC and clock-speeds rivaling existing "Turing" GPUs, but no RTX capabilities. To sweeten the deal, NVIDIA will equip these cards with GDDR6 memory. These GPUs could still have tensor cores which are needed to accelerate DLSS, a feature highly relevant to this market segment.

The GeForce GTX 1660 Ti will no doubt be slower than the RTX 2060, and be based on a new ASIC codenamed TU116. According to a VideoCardz report, this 12 nm chip packs 1,536 CUDA cores based on the "Turing" architecture, and the same exact memory setup as the RTX 2060, with 6 GB of GDDR6 memory across a 192-bit wide memory interface. The lack of RT cores and a lower CUDA core count could make the TU116 a significantly smaller chip than the TU106, and something NVIDIA can afford to sell at sub-$300 price-points such as $250. The GTX 1060 6 GB is holding the fort for NVIDIA in this segment, besides other GTX 10-series SKUs such as the GTX 1070 occasionally dropping below the $300 mark at retailers' mercy. AMD recently improved its sub-$300 portfolio with the introduction of Radeon RX 590, which convincingly outperforms the GTX 1060 6 GB.

NVIDIA Introduces RAPIDS Open-Source GPU-Acceleration Platform

NVIDIA today announced a GPU-acceleration platform for data science and machine learning, with broad adoption from industry leaders, that enables even the largest companies to analyze massive amounts of data and make accurate business predictions at unprecedented speed.

RAPIDS open-source software gives data scientists a giant performance boost as they address highly complex business challenges, such as predicting credit card fraud, forecasting retail inventory and understanding customer buying behavior. Reflecting the growing consensus about the GPU's importance in data analytics, an array of companies is supporting RAPIDS - from pioneers in the open-source community, such as Databricks and Anaconda, to tech leaders like Hewlett Packard Enterprise, IBM and Oracle.

VUDA is a CUDA-Like Programming Interface for GPU Compute on Vulkan (Open-Source)

GitHub developer jgbit has started an open-source project called VUDA, which takes inspiration from NVIDIA's CUDA API to bring an easily accessible GPU compute interface to the open-source world. VUDA is implemented as a wrapper on top of the highly popular next-gen graphics API Vulkan, which provides low-level access to hardware. VUDA comes as a header-only C++ library, which means it's compatible with all platforms that have a C++ compiler and support Vulkan.

While the project is still young, its potential is enormous, especially due to its open-source nature (using the MIT license). The page on GitHub comes with a (very basic) sample that could be a good start for using the library.

Intel is Adding Vulkan Support to Their OpenCV Library, First Signs of Discrete GPU?

Intel has submitted the first patches with Vulkan support to their open-source OpenCV library, which is designed to accelerate computer vision. The library is widely used for real-time applications as it comes with first-class optimizations for Intel processors and multi-core x86 in general. With Vulkan support, existing users can immediately move their neural network workloads to the GPU compute space without having to rewrite their code base.

At this point in time, the Vulkan backend supports Convolution, Concat, ReLU, LRN, PriorBox, Softmax, MaxPooling, AvePooling, and Permute. According to the source code changes, this is just "a beginning work for Vulkan in OpenCV DNN, more layer types will be supported and performance tuning is on the way."

It seems that now, with their own GPU development underway, Intel has found new love for the GPU-accelerated compute space. The choice of Vulkan is also interesting as the API is available on a wide range of platforms, which could mean that Intel is trying to turn Vulkan into a CUDA killer. Of course there's still a lot of work needed to achieve that goal, since NVIDIA has had almost a decade of head start.

NVIDIA "TU102" RT Core and Tensor Core Counts Revealed

The GeForce RTX 2080 Ti is indeed based on an ASIC codenamed "TU102." NVIDIA was referring to this 775 mm² chip when talking about the 18.5 billion-transistor count in its keynote. The company also provided a breakdown of its various "cores," and a block-diagram. The GPU is still laid out like its predecessors, but each of the 72 streaming multiprocessors (SMs) packs RT cores and Tensor cores in addition to CUDA cores.

The TU102 features six GPCs (graphics processing clusters), which each pack 12 SMs. Each SM packs 64 CUDA cores, 8 Tensor cores, and 1 RT core. Each GPC packs six geometry units. The GPU also packs 288 TMUs and 96 ROPs. The TU102 supports a 384-bit wide GDDR6 memory bus, supporting 14 Gbps memory. There are also two NVLink channels, which NVIDIA plans to later launch as its next-generation multi-GPU technology.
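The totals above follow directly from the per-cluster breakdown. A quick check of the full-die unit counts and the memory bandwidth:

```python
# Full-die TU102 totals from the per-cluster breakdown
gpcs, sms_per_gpc = 6, 12
cores_per_sm, tensors_per_sm, rts_per_sm = 64, 8, 1

sms = gpcs * sms_per_gpc
print(sms)                   # 72 SMs
print(sms * cores_per_sm)    # 4608 CUDA cores
print(sms * tensors_per_sm)  # 576 Tensor cores
print(sms * rts_per_sm)      # 72 RT cores

# Memory bandwidth: 384-bit bus at 14 Gbps per pin, 8 bits per byte
print(384 * 14 / 8)          # 672.0 GB/s
```

Note these are full-die figures; shipping SKUs built on a die typically disable a few SMs for yield, so a product's advertised core count can sit below the full-die total.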

NVIDIA GeForce RTX 2000 Series Specifications Pieced Together

Later today (20th August), NVIDIA will formally unveil its GeForce RTX 2000 series consumer graphics cards. This marks a major change in the brand name, triggered by the introduction of the new RT Cores, specialized components that accelerate real-time ray-tracing, a task too taxing on conventional CUDA cores. Ray-tracing and DNN acceleration require SIMD components to crunch 4x4x4 matrix multiplications, which is what RT cores (and tensor cores) specialize in. The chips still have CUDA cores for everything else. This generation also debuts the new GDDR6 memory standard, although unlike GeForce "Pascal," the new GeForce "Turing" won't see a doubling in memory sizes.

NVIDIA is expected to debut the generation with the new GeForce RTX 2080 later today, with market availability by the end of the month. Going by older rumors, the company could launch the lower RTX 2070 and higher RTX 2080+ by late September, and the mid-range RTX 2060 series in October. Apparently the high-end RTX 2080 Ti could come out sooner than expected, given that VideoCardz already has some of its specifications in hand. Not a lot is known about how "Turing" compares with "Volta" in performance, but given that the TITAN V comes with tensor cores that can [in theory] be re-purposed as RT cores, it could continue on as NVIDIA's halo SKU for the client segment.

NVIDIA Releases GeForce 388.71 WHQL Drivers

NVIDIA today released the latest version of their GeForce software suite. Version 388.71 is a game-ready driver, which brings the best performance profile for the phenomenon that is PlayerUnknown's Battlegrounds. For professionals, there's added support for CUDA 9.1, and Warframe SLI profiles have been updated. There are also many 3D Vision profiles that have been updated for this release, so make sure to check them out after the break, alongside other bug fixes and known issues.

As always, users can download these drivers right here on TechPowerUp. Just follow the link below.

NVIDIA Announces TITAN V "Volta" Graphics Card

NVIDIA, in a shock move, announced its new flagship graphics card, the TITAN V. This card implements the "Volta" GV100 graphics processor, the same one which drives the company's Tesla V100 HPC accelerator. The GV100 is a multi-chip module, with the GPU die and three HBM2 memory stacks sharing a package. The card features 12 GB of HBM2 memory across a 3072-bit wide memory interface. The GPU die has been built on the 12 nm FinFET+ process by TSMC. The NVIDIA TITAN V maxes out the GV100 silicon, if not its memory interface, featuring a whopping 5,120 CUDA cores and 640 Tensor cores (specialized units that accelerate neural-net building/training). The CUDA cores are spread across 80 streaming multiprocessors (64 CUDA cores per SM), grouped into 6 graphics processing clusters (GPCs). The TMU count is 320.

The GPU core is clocked at 1200 MHz, with a GPU Boost frequency of 1455 MHz, and an HBM2 memory clock of 850 MHz, translating into 652.8 GB/s of memory bandwidth (1.70 Gbps stacks). The card draws power from a combination of 6-pin and 8-pin PCIe power connectors. Display outputs include three DisplayPort and one HDMI connector. With a wallet-scorching price of USD $2,999, and availability exclusively through the NVIDIA store, the TITAN V is evidence that with Intel deciding to sell client-segment processors for $2,000, it was only a matter of time before GPU makers sought out that price band. At $3k, the GV100's margins are probably more than made up for.
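The 652.8 GB/s figure follows from the numbers in the paragraph: HBM2 is double-data-rate, so an 850 MHz memory clock yields 1.70 Gbps per pin across the 3072-bit interface:

```python
memory_clock_mhz = 850
bus_width_bits = 3072

gbps_per_pin = memory_clock_mhz * 2 / 1000        # DDR: two transfers per clock -> 1.7 Gbps
bandwidth_gbs = bus_width_bits * gbps_per_pin / 8 # divide by 8 bits per byte
print(gbps_per_pin, round(bandwidth_gbs, 1))      # 1.7 652.8
```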

NVIDIA Announces SaturnV AI Supercomputer Powered by "Volta"

NVIDIA at the Supercomputing 2017 conference announced a major upgrade of its SaturnV AI supercomputer, which when complete, the company claims, will not only be one of the world's top-10 AI supercomputers in terms of raw compute power, but also the world's most energy-efficient. The SaturnV will be a cluster supercomputer with 660 NVIDIA DGX-1 nodes. Each such node packs eight NVIDIA GV100 GPUs, which takes the machine's total GPU count to a staggering 5,280 (that's GPUs, not CUDA cores). They add up to an FP16 performance that's scraping the ExaFLOP (1,000-petaFLOP or 10^18 FLOP/s) barrier, while its FP64 (double-precision) compute performance nears 40 petaFLOP/s (40,000 TFLOP/s).
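The cluster totals can be checked with per-GPU figures. The per-GPU peaks below are assumptions based on the Tesla V100's published specs (~7.8 TFLOPS FP64, ~125 TFLOPS FP16 via tensor cores); the article gives only the aggregate numbers:

```python
nodes, gpus_per_node = 660, 8
gpus = nodes * gpus_per_node
print(gpus)               # 5280 GPUs

# Assumed per-GPU peaks (Tesla V100-class): ~7.8 TFLOPS FP64, ~125 TFLOPS FP16
print(gpus * 7.8 / 1000)  # ~41 PFLOP/s FP64 - matches "nears 40 petaFLOP/s"
print(gpus * 125 / 1000)  # 660 PFLOP/s FP16 - approaching the ExaFLOP barrier
```

The FP16 total relies on tensor-core throughput, which is why the article can describe a 660-node machine as "scraping" exascale years before any FP64 exascale system existed.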

SaturnV should beat Summit, a supercomputer being co-developed by NVIDIA and IBM, which in turn should unseat Sunway TaihuLight, currently the world's fastest supercomputer. This feat gains prominence as NVIDIA's SaturnV and the NVIDIA+IBM Summit are both machines built by the American private sector, trying to beat a supercomputing leader backed by the mighty Chinese exchequer. The other claim to fame of SaturnV is its energy efficiency. Before its upgrade, SaturnV achieved an energy efficiency of a staggering 15.1 GFLOP/s per Watt, which was already the fourth "greenest." NVIDIA expects the upgraded SaturnV to take the number-one spot.