
NVIDIA H100 Compared to A100 for Training GPT Large Language Models

AleksandarK

News Editor · Staff member
Joined: Aug 19, 2017 · Messages: 2,275 (0.92/day)
NVIDIA's H100 has recently become available through Cloud Service Providers (CSPs), and it was only a matter of time before someone benchmarked it against the previous-generation A100 GPU. Today, thanks to benchmarks from MosaicML, a startup led by Naveen Rao, former CEO of Nervana and former GM of Artificial Intelligence (AI) at Intel, we have a comparison of the two GPUs along with an interesting insight into cost. MosaicML trained Generative Pre-trained Transformer (GPT) models of various sizes using the bfloat16 and FP8 floating-point formats. All training ran on CoreWeave cloud GPU instances.

In terms of performance, the NVIDIA H100 delivered speedups of 2.2x to 3.3x over the A100. The more interesting finding, however, emerges when comparing the cost of running these GPUs in the cloud. CoreWeave prices H100 SXM GPUs at $4.76/hr/GPU, while the A100 80 GB SXM costs $2.21/hr/GPU. Although the H100 is roughly 2.2x more expensive per hour, its higher performance makes up the difference: models train in less time, so the total cost of a training run is lower. This makes the H100 the more attractive choice for researchers and companies training Large Language Models (LLMs), despite its higher hourly price. The tables below compare the two GPUs in training time, speedup, and cost of training.
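The cost argument boils down to one ratio: a run that takes T hours on the A100 takes T / speedup hours on the H100, so the H100 run costs (price ratio) / speedup times as much. Below is a minimal Python sketch of that arithmetic, assuming the same number of GPUs on both instances and that the hourly rate is the only cost (no storage, networking, or idle time). The helper name `relative_cost` is hypothetical; the prices and the 2.2x/3.3x speedup endpoints are the figures quoted above.

```python
# CoreWeave hourly prices quoted in the article (USD per GPU-hour).
H100_PRICE = 4.76
A100_PRICE = 2.21


def relative_cost(speedup: float,
                  new_price: float = H100_PRICE,
                  old_price: float = A100_PRICE) -> float:
    """Cost of a training run on the new GPU relative to the old one.

    Hypothetical helper. A run taking T hours on the old GPU takes
    T / speedup hours on the new one, so
        cost_new / cost_old = (new_price * T / speedup) / (old_price * T)
                            = (new_price / old_price) / speedup.
    Assumes equal GPU counts and that hourly price is the only cost.
    """
    return (new_price / old_price) / speedup


if __name__ == "__main__":
    # Endpoints of the speedup range reported in the article.
    for s in (2.2, 3.3):
        print(f"speedup {s}x -> H100 run costs {relative_cost(s):.2f}x the A100 run")
    # Prints 0.98x for 2.2x and 0.65x for 3.3x.
```

Under these assumptions the H100 comes out cheaper whenever its speedup exceeds the price ratio of about 2.15x (4.76 / 2.21), which is why even the low end of the reported range roughly breaks even.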



 
Joined: Aug 22, 2007 · Messages: 3,466 (0.57/day) · Location: CA, US
At those prices, isn't it cheaper for researchers to buy the actual systems?
Not really. Electricity, HVAC, maintenance, etc. would cost more than the systems themselves.
Running your own little data center is expensive.


I also forgot to mention real estate.
 