News Posts matching #A100

AMD Radeon MI100 "Arcturus" Alleged Specification Listed, the GPU Could be Coming in December

AMD has been preparing to launch its MI100 accelerator to take on NVIDIA's A100 Ampere GPU in machine learning, AI, and compute-intensive workloads in general. According to AdoredTV's sources, the GPU's alleged specifications have been listed, along with some slides that should be presented at launch. Here is what we have so far on the new Radeon MI100 "Arcturus" GPU, which is based on the CDNA architecture. The alleged specifications mention that the GPU will feature 120 Compute Units (CUs), meaning that if it keeps the 64-core-per-CU configuration, we are looking at 7680 cores.

The leaked slide mentions that the GPU can deliver as much as 42 TeraFLOPs of FP32 single-precision compute, which would make it more than twice as fast as NVIDIA's A100 GPU in FP32 workloads. To achieve that, the card would need all of its 7680 cores running at roughly 2.75 GHz, which would be rather high. On the same slide, the GPU is claimed to have 9.5 TeraFLOPs of FP64 double-precision performance, while FP16 performance is said to be around 150 TeraFLOPs. For comparison, NVIDIA's A100 GPU features 9.7 TeraFLOPs of FP64, 19.5 TeraFLOPs of FP32, and 312 (or 624 with sparsity enabled) TeraFLOPs of FP16 compute. The AMD GPU is allegedly faster only in FP32 workloads, where it outperforms the NVIDIA card by about 2.15 times. If that is really the case, AMD has found its niche in the HPC sector and plans to dominate there. According to AdoredTV's sources, the GPU could arrive in December of this year.
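The clock estimate and the FP32 ratio can be checked with a short calculation. This is only a rough sketch: it assumes each core retires one fused multiply-add (counted as 2 FLOPs) per clock, which the leak does not state, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the leaked FP32 figure.
# Assumption (not from the leak): one fused multiply-add, i.e. 2 FLOPs,
# per core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def required_clock_ghz(peak_tflops: float, cores: int) -> float:
    """Clock in GHz needed to reach peak_tflops with the given core count."""
    flops_per_clock = cores * FLOPS_PER_CORE_PER_CLOCK
    return peak_tflops * 1e12 / flops_per_clock / 1e9


cores = 120 * 64                       # 120 CUs x 64 cores per CU
clock = required_clock_ghz(42.0, cores)
ratio = 42.0 / 19.5                    # leaked MI100 FP32 vs. A100 FP32

print(f"{cores} cores need about {clock:.2f} GHz for 42 TFLOPs")
print(f"FP32 ratio vs. A100: {ratio:.2f}x")
```

Under that assumption the required clock comes out slightly below 2.75 GHz, and the FP32 advantage over the A100 is a little over 2x.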

NVIDIA A100 Ampere GPU Benchmarked on MLPerf

When NVIDIA announced its Ampere lineup of graphics cards, the A100 GPU represented the high-performance end of that lineup. The GPU is optimized for heavy computing workloads as well as machine learning and AI tasks. Today, NVIDIA submitted results for the A100 GPU to the MLPerf database. What is MLPerf, and why does it matter? MLPerf is a system benchmark designed to test the capability of a system for machine learning tasks and to make systems comparable. The A100 GPU was benchmarked in the latest 0.7 version of the benchmark.

The baseline for the results was the previous-generation king, the V100 Volta GPU. The new A100 GPU is on average 1.5 to 2.5 times faster than the V100. It is worth pointing out that not all competing systems have been submitted; among those that have, the A100 GPU system is the fastest.

NVIDIA Ampere A100 GPU Gets Benchmark and Takes the Crown of the Fastest GPU in the World

When NVIDIA introduced its Ampere A100 GPU, it was said to be the company's fastest creation yet. However, we didn't know exactly how fast the GPU is. It packs a whopping 6912 CUDA cores on a 7 nm die with 54 billion transistors. Paired with 40 GB of super-fast HBM2 memory with a bandwidth of 1555 GB/s, the GPU is set to be a good performer. So how fast is it exactly? Thanks to Jules Urbach, the CEO of OTOY, the software developer behind OctaneRender, we have the first benchmark of the Ampere A100 GPU.

Scoring 446 points in OctaneBench, a benchmark for OctaneRender, the Ampere GPU takes the crown of the world's fastest GPU. The GeForce RTX 2080 Ti scores 302 points, which makes the A100 GPU 47.7% faster than that Turing card. However, the fastest Turing card found in the benchmark database is the Quadro RTX 8000, which scored 328 points, so the A100's lead over the best Turing result is about 36%, showing that Turing is still holding up well. The Ampere A100 result was obtained with RTX turned off, so additional performance could be on the table once RTX is turned on and that part of the silicon is put to work.
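The percentages follow directly from the quoted OctaneBench scores. A minimal sketch (the helper name is made up for illustration):

```python
def percent_faster(score: float, baseline: float) -> float:
    """How much faster `score` is than `baseline`, in percent."""
    return (score / baseline - 1.0) * 100.0


# OctaneBench scores quoted in the article
A100 = 446
RTX_2080_TI = 302
QUADRO_RTX_8000 = 328

vs_2080_ti = percent_faster(A100, RTX_2080_TI)
vs_rtx_8000 = percent_faster(A100, QUADRO_RTX_8000)

print(f"A100 vs. RTX 2080 Ti:     {vs_2080_ti:.1f}% faster")
print(f"A100 vs. Quadro RTX 8000: {vs_rtx_8000:.1f}% faster")
```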

NVIDIA to Build Fastest AI Supercomputer in Academia

The University of Florida and NVIDIA Tuesday unveiled a plan to build the world's fastest AI supercomputer in academia, delivering 700 petaflops of AI performance. The effort is anchored by a $50 million gift: $25 million from alumnus and NVIDIA co-founder Chris Malachowsky and $25 million in hardware, software, training and services from NVIDIA.

"We've created a replicable, powerful model of public-private cooperation for everyone's benefit," said Malachowsky, who serves as an NVIDIA Fellow, in an online event featuring leaders from both the UF and NVIDIA. UF will invest an additional $20 million to create an AI-centric supercomputing and data center.

NVIDIA GeForce RTX 3070 and RTX 3070 Ti Rumored Specifications Appear

NVIDIA is slowly preparing to launch its next-generation Ampere graphics cards for consumers, after we got the A100 GPU for data-center applications. More Ampere leaks and speculation surface every day, so we can assume that the launch is near. In the most recent round of rumors, we have some new information about the GPU SKU and memory of the upcoming GeForce RTX 3070 and RTX 3070 Ti. Thanks to Twitter user kopite7kimi, who has had multiple speculations confirmed in the past, we have information that the GeForce RTX 3070 and RTX 3070 Ti use a GA104 GPU SKU, paired with GDDR6 memory. The catch is that the Ti version will feature the new GDDR6X memory, which is faster and can reportedly go up to 21 Gbps.

The regular RTX 3070 is supposed to have 2944 CUDA cores on the GA104-400 GPU die, while its bigger brother, the RTX 3070 Ti, is designed with 3072 CUDA cores on the GA104-300 die. Paired with the new technologies that the Ampere architecture brings, and with the new GDDR6X memory, the GPUs are set to be very good performers. It is estimated that both cards would reach a memory bandwidth of 512 GB/s. So far, that is all we have. NVIDIA is reportedly in the Design Validation Test (DVT) phase with these cards and is preparing for mass production in August. The official launch should follow before the end of this year, with some speculation pointing to September.
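To put the 512 GB/s estimate next to the rumored 21 Gbps figure, peak memory bandwidth can be worked out as bus width times per-pin data rate divided by 8. The sketch below assumes a 256-bit memory bus, which the rumor does not state; treat the results as illustrative only.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Assumption: a 256-bit memory bus (not part of the rumor).
BUS_WIDTH_BITS = 256

# Per-pin data rate implied by the estimated 512 GB/s on that bus
implied_rate_gbps = 512 * 8 / BUS_WIDTH_BITS

# Bandwidth the same bus would reach at the rumored 21 Gbps GDDR6X ceiling
gddr6x_peak = bandwidth_gb_s(BUS_WIDTH_BITS, 21)

print(f"512 GB/s implies {implied_rate_gbps:.0f} Gbps per pin")
print(f"21 Gbps on the same bus would give {gddr6x_peak:.0f} GB/s")
```

Under this assumption, 512 GB/s corresponds to 16 Gbps memory, well short of the 21 Gbps that GDDR6X can reportedly reach.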

GIGABYTE Introduces a Broad Portfolio of G-series Servers Powered by NVIDIA A100 PCIe

GIGABYTE, an industry leader in high-performance servers and workstations, announced the validation plan for its G-series servers. Following today's NVIDIA A100 PCIe GPU announcement, GIGABYTE has completed compatibility validation of the G481-HA0 and G292-Z40 and added the NVIDIA A100 to the support list for these two servers. The remaining G-series servers will complete their compatibility tests in two waves soon. At the same time, GIGABYTE also launched a new G492 series server based on the AMD EPYC 7002 processor family, which provides PCIe Gen4 support for up to 10 NVIDIA A100 PCIe GPUs. GIGABYTE describes the G492 as the server with the highest computing power for AI model training on the market today. GIGABYTE will offer two SKUs for the G492: the G492-Z50 will be at a more approachable price point, whereas the G492-Z51 will be geared towards higher performance.

The G492 is GIGABYTE's second-generation 4U G-series server. Based on the first generation G481 (Intel architecture) / G482 (AMD architecture) servers, the user-friendly design and scalability have been further optimized. In addition to supporting two 280 W 2nd Gen AMD EPYC 7002 processors, the 32 DDR4 memory slots support up to 8 TB of memory and maintain data transmission at 3200 MHz. The G492 has built-in PCIe Gen4 switches, which can provide more PCIe Gen4 lanes. PCIe Gen4 has twice the I/O performance of PCIe Gen3 and fully enables the computing power of the NVIDIA A100 Tensor Core GPU, or it can be applied to PCIe storage to help provide a storage upgrade path that is native to the G492.
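The "twice the I/O performance" claim follows from the per-lane transfer rates: PCIe Gen3 runs at 8 GT/s and Gen4 at 16 GT/s, both with 128b/130b encoding. A small sketch of the per-direction link bandwidth (the function name is illustrative, and protocol overhead beyond line encoding is ignored):

```python
# Per-lane transfer rates in GT/s for each PCIe generation
TRANSFER_RATE_GT_S = {"Gen3": 8.0, "Gen4": 16.0}

# Both generations use 128b/130b line encoding
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(gen: str, lanes: int = 16) -> float:
    """Approximate per-direction bandwidth in GB/s, ignoring packet overhead."""
    return TRANSFER_RATE_GT_S[gen] * ENCODING_EFFICIENCY * lanes / 8


gen3_x16 = link_bandwidth_gb_s("Gen3")
gen4_x16 = link_bandwidth_gb_s("Gen4")

print(f"PCIe Gen3 x16: {gen3_x16:.2f} GB/s")
print(f"PCIe Gen4 x16: {gen4_x16:.2f} GB/s")
print(f"Gen4 / Gen3:   {gen4_x16 / gen3_x16:.1f}x")
```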

NVIDIA Announces A100 PCIe Tensor Core Accelerator Based on Ampere Architecture

NVIDIA and its partners today announced a new way for interested users to tap the AI-training capabilities of the Ampere architecture, in the form of the A100 PCIe. As the name implies, this solution differs from the SXM form factor in that it can be deployed in systems' existing PCIe slots. The change in interface comes with a reduction in TDP from 400 W down to 250 W in the PCIe version, and with correspondingly reduced performance.

NVIDIA says peak throughput is the same across the SXM and PCIe versions of its A100 accelerator. The difference comes in sustained workloads, where NVIDIA quotes the A100 PCIe as delivering 10% less performance than its SXM sibling. The A100 PCIe comes with the same 2.4 Gbps, 40 GB HBM2 memory as the SXM version, and all other chip resources are the same. We're thus looking at the same 826 mm² silicon chip and 6,912 CUDA cores across both models. The difference is that the PCIe accelerator can be integrated more easily into existing server infrastructure.

ASUS Announces ESC4000A-E10 GPGPU Server with NVIDIA A100 Tensor Core GPUs

ASUSTek, the leading IT company in server systems, server motherboards and workstations, today announced its new NVIDIA A100-powered server, the ESC4000A-E10, built to accelerate and optimize data centers for high utilization and low total cost of ownership with PCIe Gen 4 expansion, OCP 3.0 networking, faster compute and better GPU performance. ASUS continues building a strong partnership with NVIDIA to deliver unprecedented acceleration and flexibility to power the world's highest-performing elastic data centers for AI, data analytics, and HPC applications.

The ASUS ESC4000A-E10 is a 2U server powered by AMD EPYC 7002 series processors, which deliver up to 2x the performance and 4x the floating-point capability in a single socket versus the previous 7001 generation. Targeted at AI, HPC and VDI applications in data center or enterprise environments that require powerful CPU cores, support for more GPUs, and faster transmission speeds, the ESC4000A-E10 focuses on delivering GPU-optimized performance with support for up to four double-deck high-performance GPUs or eight single-deck GPUs, including the latest NVIDIA Ampere-architecture A100 as well as Tesla and Quadro cards. This also benefits virtualization, consolidating GPU resources into a shared pool so that users can utilize them more efficiently.