News Posts matching #A100

Return to Keyword Browsing

GPU Memory Latency Tested on AMD's RDNA 2 and NVIDIA's Ampere Architecture

Graphics cards have been developed over the years so that they feature multi-level cache hierarchies. These levels of cache have been engineered to fill in the gap between memory and compute, a growing problem that cripples the performance of GPUs in many applications. Different GPU vendors, like AMD and NVIDIA, have different sizes of register files, L1, and L2 caches, depending on the architecture. For example, the amount of L2 cache on NVIDIA's A100 GPU is 40 MB, which is seven times larger compared to the previous generation V100. That just shows how much new applications require bigger cache sizes, which is ever-increasing to satisfy the needs.

Today, we have an interesting report coming from Chips and Cheese. The website has decided to measure GPU memory latency of the latest generation of cards - AMD's RDNA 2 and NVIDIA's Ampere. By using simple pointer chasing tests in OpenCL, we get interesting results. RDNA 2 cache is fast and massive. Compared to Ampere, cache latency is much lower, while the VRAM latency is about the same. NVIDIA uses a two-level cache system consisting out of L1 and L2, which seems to be a rather slow solution. Data coming from Ampere's SM, which holds L1 cache, to the outside L2 is taking over 100 ns of latency.

NVIDIA Announces New DGX SuperPOD, the First Cloud-Native, Multi-Tenant Supercomputer, Opening World of AI to Enterprise

NVIDIA today unveiled the world's first cloud-native, multi-tenant AI supercomputer—the next-generation NVIDIA DGX SuperPOD featuring NVIDIA BlueField -2 DPUs. Fortifying the DGX SuperPOD with BlueField-2 DPUs—data processing units that offload, accelerate and isolate users' data—provides customers with secure connections to their AI infrastructure.

The company also announced NVIDIA Base Command, which enables multiple users and IT teams to securely access, share and operate their DGX SuperPOD infrastructure. Base Command coordinates AI training and operations on DGX SuperPOD infrastructure to enable the work of teams of data scientists and developers located around the globe.

Tianshu Zhixin Big Island GPU is a 37 TeraFLOP FP32 Computing Monster

Tianshu Zhixin, a Chinese startup company dedicated to designing advanced processors for accelerating various kinds of tasks, has officially entered the production of its latest GPGPU design. Called "Big Island" GPU, it is the company's entry into the GPU market, currently dominated by AMD, NVIDIA, and soon Intel. So what is so special about Tianshu Zhixin's Big Island GPU, making it so important? Firstly, it represents China's attempt of independence from the outside processor suppliers, ensuring maximum security at all times. Secondly, it is an interesting feat to enter a market that is controlled by big players and attempt to grab a piece of that cake. To be successful, the GPU needs to represent a great design.

And great it is, at least on paper. The specifications list that Big Island is currently being manufactured on TSMC's 7 nm node using CoWoS packaging technology, enabling the die to feature over 24 billion transistors. When it comes to performance, the company claims that the GPU is capable of crunching 37 TeraFLOPs of single-precision FP32 data. At FP16/BF16 half-precision, the chip is capable of outputting 147 TeraFLOPs. When it comes to integer performance, it can achieve 317, 147, and 295 TOPS in INT32, INT16, and IN8 respectively. There is no data on double-precision floating-point numbers, so the chip is optimized for single-precision workloads. There is also 32 GB of HBM2 memory present, and it has 1.2 TB of bandwidth. If we compare the chip to the competing offers like NVIDIA A100 or AMD MI100, the new Big Island GPU outperforms both at single-precision FP32 compute tasks, for which it is designed.
Tianshu Zhixin Big Island Tianshu Zhixin Big Island Tianshu Zhixin Big Island Tianshu Zhixin Big Island
Pictures of possible solutions follow.

NVIDIA Could Reuse Ampere GA100 GPU for CMP HX Cryptomining Series

When NVIDIA introduced its Ampere family of graphics cards, the GPU lineup's first product was the A100 GPU. While not being a GPU used for gaming, the model is designed with compute-heavy workloads in mind. Even NVIDIA says that "NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale to power the world's highest-performing elastic data centers for AI, data analytics, and HPC." However, it seems like the GA100 SKU, the base of the A100 GPU, could be used for another task that requires heavy computation and could benefit very much from the sheer core count that the biggest Ampere SKU offers.

According to a known leaker @kopite7kimi (Twitter), NVIDIA could repurpose the GA100 GPU SKU and launch it as a part of the CMP HX crypto mining series of graphics cards. As a reminder, the CMP series is specifically designed for the sole purpose of mining cryptocurrency, and CMP products have no video outputs. According to Kopite, the repurposed GPU SKU could be a "mining monster", which is not too hard to believe given the huge core count the SKU has and the fact that it was made for heavy computation workloads. While we do not the exact specifications of the rumored CMP HX SKU, you can check out the A100 GPU specifications here.

NVIDIA Unveils AI Enterprise Software Suite to Help Every Industry Unlock the Power of AI

NVIDIA today announced NVIDIA AI Enterprise, a comprehensive software suite of enterprise-grade AI tools and frameworks optimized, certified and supported by NVIDIA, exclusively with VMware vSphere 7 Update 2, separately announced today.

Through a first-of-its-kind industry collaboration to develop an AI-Ready Enterprise platform, NVIDIA teamed with VMware to virtualize AI workloads on VMware vSphere with NVIDIA AI Enterprise. The offering gives enterprises the software required to develop a broad range of AI solutions, such as advanced diagnostics in healthcare, smart factories for manufacturing, and fraud detection in financial services.
NVIDIA AI Enterprise Software Suite

GIGABYTE Releases 2U Server: G262-ZR0 with NVIDIA HGX A100 4-GPU

GIGABYTE Technology, (TWSE: 2376), an industry leader in high-performance servers and workstations, today announced the G262-ZR0 for HPC, AI, and data analytics. Designed to support the highest-level of performance in GPU computing, the G262-ZR0 incorporates fast PCIe 4.0 throughput in addition to NVIDIA HGX technologies and NVIDIA NVLink to provide industry leading bandwidth performance.

GPU Shortage Hits Data Centers: NVIDIA A100 GPU Supply Insufficient

GPU supply has been one of the most interesting things this year. With a huge demand for the new GPU generations like NVIDIA's Ampere and AMD's RDNA 2 "Big Navi" graphics cards, everyone is trying to grab a card for themselves. Besides the huge demand, there is also a big problem. The supply of these GPUs is just too low to satisfy the demand, driving up the prices, and increasing the scarcity of them. Companies like NVIDIA have their priorities set: all of the major production will go for the data center expansion and data center customers. However, even that plan is proving not to be good enough.

The scarcity of GPUs has now hit data centers, with NVIDIA unable to satisfy the demand for its A100 GPUs designed for high-performance computing. "It is going to take several months to catch up some of the demand," said Ian Buck, vice president of Accelerated Computing Business Unit at Nvidia. That is an indicator of just how huge the demand for these accelerators is. With the recent company announcement of A100 GPU with 80 GB memory, partners expect to have the first cards in their systems in the first half of 2021. That means that this situation and inadequate supply will hopefully resolve sometime around that timeframe.

TOP500 Expands Exaflops Capacity Amidst Low Turnover

The 56th edition of the TOP500 saw the Japanese Fugaku supercomputer solidify its number one status in a list that reflects a flattening performance growth curve. Although two new systems managed to make it into the top 10, the full list recorded the smallest number of new entries since the project began in 1993.

The entry level to the list moved up to 1.32 petaflops on the High Performance Linpack (HPL) benchmark, a small increase from 1.23 petaflops recorded in the June 2020 rankings. In a similar vein, the aggregate performance of all 500 systems grew from 2.22 exaflops in June to just 2.43 exaflops on the latest list. Likewise, average concurrency per system barely increased at all, growing from 145,363 cores six months ago to 145,465 cores in the current list.

NVIDIA Announces the A100 80GB GPU for AI Supercomputing

NVIDIA today unveiled the NVIDIA A100 80 GB GPU—the latest innovation powering the NVIDIA HGX AI supercomputing platform—with twice the memory of its predecessor, providing researchers and engineers unprecedented speed and performance to unlock the next wave of AI and scientific breakthroughs. The new A100 with HBM2E technology doubles the A100 40 GB GPU's high-bandwidth memory to 80 GB and delivers over 2 terabytes per second of memory bandwidth. This allows data to be fed quickly to A100, the world's fastest data center GPU, enabling researchers to accelerate their applications even faster and take on even larger models and datasets.

"Achieving state-of-the-art results in HPC and AI research requires building the biggest models, but these demand more memory capacity and bandwidth than ever before," said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. "The A100 80 GB GPU provides double the memory of its predecessor, which was introduced just six months ago, and breaks the 2 TB per second barrier, enabling researchers to tackle the world's most important scientific and big data challenges."

AMD Eyes Mid-November CDNA Debut with Instinct MI100, "World's Fastest FP64 Accelerator"

AMD is eyeing a mid-November debut for its CDNA compute architecture with the Instinct MI100 compute accelerator card. CDNA is a fork of RDNA for headless GPU compute accelerators with large SIMD resources. An Aroged report pins the launch of the MI100 at November 16, 2020, according to leaked AMD documents it dug up. The Instinct MI100 will eye a slice of the same machine intelligence pie NVIDIA is seeking to dominate with its A100 Tensor Core compute accelerator.

It appears like the first MI100 cards will be built in the add-in-board form-factor with PCI-Express 4.0 x16 interfaces, although older reports do predict AMD creating a socketed variant of its Infinity Fabric interconnect for machines with larger numbers of these compute processors. In the leaked document, AMD claims that the Instinct MI100 is the "world's highest double-precision accelerator for machine learning, HPC, cloud compute, and rendering systems." This is an especially big claim given that the A100 Tensor Core features FP64 CUDA cores based on the "Ampere" architecture. Then again, given that AMD claims that the RDNA2 graphics architecture is clawing back at NVIDIA with performance at the high-end, the competitiveness of the Instinct MI100 against the A100 Tensor Core cannot be discounted.

TechPowerUp GPU-Z v2.35.0 Released

TechPowerUp today released the latest version of TechPowerUp GPU-Z, the popular graphics sub-system information and diagnostic utility. Version 2.35.0 adds support for new GPUs, and fixes a number of bugs. To begin with, GPU-Z adds support for AMD Radeon RX 6000 series GPUs based on the "Navi 21" silicon. Support is also added for Intel DG1 GPU. BIOS extraction and upload for NVIDIA's RTX 30-series "Ampere" GPUs has finally been introduced. Memory size reporting on the RTX 3090 has been fixed. The latest Windows 10 Insider Build (20231.1000) made some changes to DirectML, which caused GPU-Z to report it as unavailable, this has been fixed.

TechPowerUp GPU-Z 2.35.0 also makes various improvements to fake GPU detection for cards based on NVIDIA GT216 and GT218 ASICs. Hardware detection for AMD Radeon Pro 5600M based on "Navi 12" has been fixed. Among the other GPUs for which support was added with this release are NVIDIA A100 Tensor Core PCIe, Intel UHD Gen9.5 graphics on the i5-10200H, and Radeon HD 8210E and Barco MXRT-6700. Grab GPU-Z from the link below.

DOWNLOAD: TechPowerUp GPU-Z 2.35.0
The change-log follows.

NVIDIA Reportedly Moving Ampere to 7 nm TSMC in 2021

A report straight from DigiTimes claims that NVIDIA is looking to upgrade their Ampere consumer GPUs from Samsung's 8 nm to TSMC's 7 nm. According to the source, the volume of this transition should be "very large", but most likely wouldn't reflect the entirety of Ampere's consumer-facing product stack. The report claims that TSMC has become more "friendly" to NVIDIA. This could be because TSMC now has available manufacturing capacity in 7 nm due to some of its clients moving to the company's 5 nm node, or simply because TSMC hadn't believed NVIDIA to consider Samsung as a viable foundry alternative - which it now does - and has thus lowered pricing.

There are various reasons being leveraged at this, none with substantial grounds other than "reported from industry sources". NVIDIA looking for better yields is one of the appointed reasons, as is its history as a TSMC customer. NVIDIA shouldn't have too high a cost porting its manufacturing to TSMC in terms of design changes to the silicon level so as to cater to different characteristics of TSMC's 7 nm, because the company's GA100 GPU (Ampere for the non-consumer market) is already manufactured at TSMC. The next part of this post is mere (relatively informed) speculation, so take that with a saltier disposition than what came before.

NVIDIA Building UK's Most Powerful Supercomputer, Dedicated to AI Research in Healthcare

NVIDIA today announced that it is building the United Kingdom's most powerful supercomputer, which it will make available to U.K. healthcare researchers using AI to solve pressing medical challenges, including those presented by COVID-19.

Expected to come online by year end, the "Cambridge-1" supercomputer will be an NVIDIA DGX SuperPOD system capable of delivering more than 400 petaflops of AI performance and 8 petaflops of Linpack performance, which would rank it No. 29 on the latest TOP500 list of the world's most powerful supercomputers. It will also rank among the world's top 3 most energy-efficient supercomputers on the current Green500 list.

AMD Radeon MI100 "Arcturus" Alleged Specification Listed, the GPU Could be Coming in December

AMD has been preparing to launch its MI100 accelerator and fight NVIDIA's A100 Ampere GPU in machine learning and AI horizon, and generally compute-intensive workloads. According to some news sources over at AdoredTV, the GPU alleged specifications were listed, along with some slides about the GPU which should be presented at the launch. So to start, this is what we have on the new Radeon MI100 "Arcturus" GPU based on CDNA architecture. The alleged specifications mention that the GPU will feature 120 Compute Units (CUs), meaning that if the GPU keeps the 64-core per CU configuration, we are looking at 7680 cores powered by CDNA architecture.

The leaked slide mentions that the GPU can put out as much as 42 TeraFLOPs of FP32, single-precision compute. This makes it more than twice as fast compared to NVIDIA's A100 GPU at FP32 workloads. To achieve that, the card would need to have all of its 7680 cores running at 2.75 GHz, which would be a bit high number. On the same slide, the GPU is claimed to have 9.5 TeraFLOPs of FP64 dual-precision performance, while the FP16 power is going to be around 150 TeraFLOPs. For comparison, the A100 GPU from NVIDIA features 9.7 TeraFLOPS of FP64, 19.5 TeraFLOPS of FP32, and 312 (or 634 with sparsity enabled) TeraFLOPs of FP16 compute. AMD GPU is allegedly only more powerful for FP32 workloads, where it outperforms the NVIDIA card by 2.4 times. And if that is really the case, AMD has found its niche in the HPC sector, and it plans to dominate there. According to AdoredTV sources, the GPU could be coming in December of this year.

NVIDIA A100 Ampere GPU Benchmarked on MLPerf

When NVIDIA announced its Ampere lineup of the graphics cards, the A100 GPU was there to represent the higher performance of the lineup. The GPU is optimized for heavy computing workloads as well as machine learning and AI tasks. Today, NVIDIA has submitted the MLPerf results on the A100 GPU to the MLPerf database. What is MLPerf and why it matters you might think? Well, MLPerf is a system benchmark designed to test the capability of a system for machine learning tasks and enable comparability between systems. The A100 GPU got benchmarked in the latest 0.7 version of the benchmark.

The baseline for the results was the previous generation king, V100 Volta GPU. The new A100 GPU is average 1.5 to 2.5 times faster compared to V100. So far A100 GPU system beats all offers available. It is worth pointing out that not all competing systems have been submitted, however, so far the A100 GPU is the fastest.
The performance results follow:

NVIDIA Ampere A100 GPU Gets Benchmark and Takes the Crown of the Fastest GPU in the World

When NVIDIA introduced its Ampere A100 GPU, it was said to be the company's fastest creation yet. However, we didn't know how fast the GPU exactly is. With the whopping 6912 CUDA cores, the GPU can pack all that on a 7 nm die with 54 billion transistors. Paired with 40 GB of super-fast HBM2E memory with a bandwidth of 1555 GB/s, the GPU is set to be a good performer. And how fast it exactly is you might wonder? Well, thanks to the Jules Urbach, the CEO of OTOY, a software developer and maker of OctaneRender software, we have the first benchmark of the Ampere A100 GPU.

Scoring 446 points in OctaneBench, a benchmark for OctaneRender, the Ampere GPU takes the crown of the world's fastest GPU. The GeForce RTX 2080 Ti GPU scores 302 points, which makes the A100 GPU up to 47.7% faster than Turing. However, the fastest Turing card found in the benchmark database is the Quadro RTX 8000, which scored 328 points, showing that Turing is still holding well. The result of Ampere A100 was running with RTX turned off, which could yield additional performance if RTX was turned on and that part of the silicon started working.

NVIDIA to Build Fastest AI Supercomputer in Academia

The University of Florida and NVIDIA Tuesday unveiled a plan to build the world's fastest AI supercomputer in academia, delivering 700 petaflops of AI performance. The effort is anchored by a $50 million gift: $25 million from alumnus and NVIDIA co-founder Chris Malachowsky and $25 million in hardware, software, training and services from NVIDIA.

"We've created a replicable, powerful model of public-private cooperation for everyone's benefit," said Malachowsky, who serves as an NVIDIA Fellow, in an online event featuring leaders from both the UF and NVIDIA. UF will invest an additional $20 million to create an AI-centric supercomputing and data center.

NVIDIA GeForce RTX 3070 and RTX 3070 Ti Rumored Specifications Appear

NVIDIA is slowly preparing to launch its next-generation Ampere graphics cards for consumers after we got the A100 GPU for data-centric applications. The Ampere lineup is getting more and more leaks and speculations every day, so we can assume that the launch is near. In the most recent round of rumors, we have some new information about the GPU SKU and memory of the upcoming GeForce RTX 3070 and RTX 3070 Ti. Thanks to Twitter user kopite7kimi, who had multiple confirmed speculations in the past, we have information that GeForce RTX 3070 and RTX 3070 Ti use a GA104 GPU SKU, paired with GDDR6 memory. The cath is that the Ti version of GPU will feature a new GDDR6X memory, which has a higher speed and can reportedly go up to 21 Gbps.

The regular RTX 3070 is supposed to have 2944 CUDA cores on GA104-400 GPU die, while its bigger brother RTX 3070 Ti is designed with 3072 CUDA cores on GA104-300 die. Paired with new technologies that Ampere architecture brings, with a new GDDR6X memory, the GPUs are set to be very good performers. It is estimated that both of the cards would reach a memory bandwidth of 512 GB/s. So far that is all we have. NVIDIA is reportedly in Design Validation Test (DVT) phase with these cards and is preparing for mass production in August. Following those events is the official launch which should happen before the end of this year, with some speculations indicating that it is in September.

GIGABYTE Introduces a Broad Portfolio of G-series Servers Powered by NVIDIA A100 PCIe

GIGABYTE, an industry leader in high-performance servers and workstations, announced its G-series servers' validation plan. Following the NVIDIA A100 PCIe GPU announcement today, GIGABYTE has completed the compatibility validation of the G481-HA0 / G292-Z40 and added the NVIDIA A100 to the support list for these two servers. The remaining G-series servers will be divided into two waves to complete their respective compatibility tests soon. At the same time, GIGABYTE also launched a new G492 series server based on the AMD EPYC 7002 processor family, which provides PCIe Gen4 support for up to 10 NVIDIA A100 PCIe GPUs. The G492 is a server with the highest computing power for AI models training on the market today. GIGABYTE will offer two SKUs for the G492. The G492-Z50 will be at a more approachable price point, whereas the G492-Z51 will be geared towards higher performance.

The G492 is GIGABYTE's second-generation 4U G-series server. Based on the first generation G481 (Intel architecture) / G482 (AMD architecture) servers, the user-friendly design and scalability have been further optimized. In addition to supporting two 280 W 2nd Gen AMD EPYC 7002 processors, the 32 DDR4 memory slots support up to 8 TB of memory and maintain data transmission at 3200 MHz. The G492 has built-in PCIe Gen4 switches, which can provide more PCIe Gen4 lanes. PCIe Gen4 has twice the I/O performance of PCIe Gen3 and fully enables the computing power of the NVIDIA A100 Tensor Core GPU, or it can be applied to PCIe storage to help provide a storage upgrade path that is native to the G492.

NVIDIA Announces A100 PCIe Tensor Core Accelerator Based on Ampere Architecture

NVIDIA and partners today announced a new way for interested users to partake in the AI-training capabilities of their Ampere graphics architecture in the form of the A100 PCIe. Diving a little deeper, and as the name implies, this solution differs from the SXM form-factor in that it can be deployed through systems' existing PCIe slots. The change in interface comes with a reduction in TDP from 400 W down to 250 W in the PCIe version - and equivalent reduced performance.

NVIDIA says peak throughput is the same across the SXM and PCIe version of their A100 accelerator. The difference comes in sustained workloads, where NVIDIA quotes the A100 as delivering 10% less performance compared to its SXM brethren. The A100 PCIe comes with the same 2.4 Gbps, 40 GB HBM2 memory footprint as the SXM version, and all other chip resources are the same. We're thus looking at the same 862 mm² silicon chip and 6,192 CUDA cores across both models. The difference is that the PCIe accelerator can more easily be integrated in existing server infrastructure.

ASUS Announces SC4000A-E10 GPGPU Server with NVIDIA A100 Tensor Core GPUs

ASUSTek, the leading IT Company in server systems, server motherboards and workstations today announced the new NVIDIA A100-powered server - ESC4000A E10 to accelerate and optimize data centers for high utilization and low total cost of ownership with the PCIe Gen 4 expansions, OCP 3.0 networking, faster compute and better GPU performance. ASUS continues building a strong partnership with NVIDIA to deliver unprecedented acceleration and flexibility to power the world's highest-performing elastic data centers for AI, data analytics, and HPC applications.

ASUS ESC4000A-E10 is a 2U server powered by the AMD EPYC 7002 series processors that deliver up to 2x the performance and 4x the floating point capability in a single socket versus the previous 7001 generation. Targeted for AI, HPC and VDI applications in data center or enterprise environments which require powerful CPU cores, more GPUs support, and faster transmission speed, ESC4000A E10 focuses on delivering GPU-optimized performance with support for up to four double-deck high performance or eight single-deck GPUs including the latest NVIDIA Ampere-architecture V100, Tesla, and Quadro. This also benefits on virtualization to consolidate GPU resources in to shared pool for users to utilize resources in more efficient ways.
Return to Keyword Browsing