Tuesday, April 9th 2024

Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

Apr 9th, 2024 13:59

During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% better training performance, 50% better inference performance, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator is offered as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or as an OAM module with a 900 W TDP. The PCIe card has the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite its 300 W lower TDP. The lower TDP will likely result in lower sustained performance, but the matching peak figure confirms that the same silicon is used, just tuned to a lower frequency. The PCIe version works as a group of four per system, while the OAM HL-325L modules can be run in an eight-accelerator configuration per server. Built on TSMC's N5 (5 nm) node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous-generation Gaudi 2.
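As a rough illustration of the form-factor comparison above, the peak FP8 throughput per watt of TDP can be worked out from the quoted figures. This is only a sketch: the function and variable names are made up for the example, and peak throughput divided by TDP is a paper figure that says nothing about sustained performance.

```python
# Figures quoted in the article; all names here are illustrative only.
PEAK_FP8_TFLOPS = 1835  # same quoted peak for both form factors

TDP_WATTS = {"PCIe card": 600, "OAM module": 900}


def peak_tflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Peak throughput divided by TDP (a paper figure, not a measurement)."""
    return peak_tflops / tdp_watts


for name, tdp in TDP_WATTS.items():
    ratio = peak_tflops_per_watt(PEAK_FP8_TFLOPS, tdp)
    print(f"{name}: {ratio:.2f} peak FP8 TFLOPS per watt")
```

On paper the 600 W card comes out ahead per watt, simply because the same peak number is divided by a smaller TDP; real workloads on the lower-power card are expected to sustain less of that peak.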

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and twenty-four 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out on the 10 tiles that make up the Gaudi 3 accelerator. There is 96 MB of SRAM split between two compute tiles, which acts as a low-level cache between the Tensor Cores and HBM memory. Intel also announced support for the new standardized, performance-boosting MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8,192 accelerators, made up of 1,024 nodes with eight accelerators each. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 Whitepaper.
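The cluster figures above follow from simple multiplication, which the sketch below spells out. The per-accelerator numbers are the ones quoted in the article; the function name and the aggregate totals are illustrative, and the aggregate peak is a theoretical ceiling rather than anything Intel has claimed as delivered performance.

```python
# Per-accelerator figures quoted in the article; names are illustrative only.
ACCELERATORS_PER_NODE = 8
MAX_NODES = 1024
HBM_PER_ACCELERATOR_GB = 128
PEAK_FP8_TFLOPS = 1835


def cluster_totals(nodes: int, per_node: int) -> dict:
    """Aggregate paper figures for a cluster of identical nodes."""
    accelerators = nodes * per_node
    return {
        "accelerators": accelerators,
        "hbm_gb": accelerators * HBM_PER_ACCELERATOR_GB,
        # 1 exaFLOPS = 1,000,000 teraFLOPS
        "peak_fp8_exaflops": accelerators * PEAK_FP8_TFLOPS / 1_000_000,
    }


totals = cluster_totals(MAX_NODES, ACCELERATORS_PER_NODE)
print(totals["accelerators"])                  # 8192
print(totals["hbm_gb"])                        # 1048576
print(round(totals["peak_fp8_exaflops"], 2))   # 15.03
```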

Sources: Intel, Tom's Hardware

14 Comments on Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

Space Lynx

Astronaut

Shares a production line on TSMC? lol bad move Intel. Nvidia already bought all the production time from TSMC. Vaporware.

thesmokingman

Pat is begging Wall Street to believe... lmao.

ScaLibBDP

Simply to note: Intel evaluates the performance of its latest hardware against the already outdated line of NVIDIA H100 accelerators.

Minus Infinity

Hardly a single person in the AI field believes Intel can be trusted to support the hardware long-term, and then there is also the question of their software, SYCL, which no one uses. ROCm, on the other hand, is well liked. Nearly all analysts say only AMD is a competitor to Nvidia.

Scrizz

Minus Infinity said: Hardly a single person in the AI field believes Intel can be trusted to support the hardware long-term, and then there is also the question of their software, SYCL, which no one uses. ROCm, on the other hand, is well liked. Nearly all analysts say only AMD is a competitor to Nvidia.

Analysts lmao

Tomorrow

Space Lynx said: Shares a production line on TSMC? lol bad move Intel. Nvidia already bought all the production time from TSMC. Vaporware.

Intel will use 5nm. Nvidia Blackwell will use N4P.

stimpy88

If only there was an open-source alternative to CUDA. nGreedia would be back to "for the gamers" in a heartbeat!

Space Lynx

Astronaut

Tomorrow said: Intel will use 5nm. Nvidia Blackwell will use N4P.

won't AI still be using 5nm node though? and that node is sold out until late 2025 last I read.

Tomorrow

Space Lynx said: won't AI still be using 5nm node though? and that node is sold out until late 2025 last I read.

Both nodes are confirmed, and though they are both technically "5nm class", I doubt that Intel will have capacity issues at TSMC.
I mean, it also depends on demand. I doubt that the demand for Intel's products will rise that sharply.

Actually, Intel using 5nm is a sign that they have not managed to secure more advanced nodes from TSMC. Nvidia already confirmed that they will use N4P, and I suspect AMD will do the same or similar. A top-tier AI product going into volume production in Q3 2024 is generally expected to be made on 4nm or even 3nm already, not 5nm.

Soul_

Space Lynx said: won't AI still be using 5nm node though? and that node is sold out until late 2025 last I read.

And you think Intel just started their supply chain activities? Sold out to who? I think the answer is in your question itself.

Fouquin

People don't understand how important those efficiency numbers are over H100. Right now it's not always a matter of how fast the core is, it's a matter of how many can be running at once. Even if Intel's chip was slower, if they offered more performance per watt than NVIDIA or AMD they would be a better buy. In the US at least we are literally hitting the limit of how many of these massive AI clusters we can have operating. A single state's power grid can only sustain maybe 100,000 H100 systems before capacity is exceeded. This creates an incredibly massive network bottleneck as all these clusters have to be sharing the load across great distances to train new models. If Intel comes in and says, "Hey, we can put 180,000 units into your maximum power budget with higher performance," that's a big win for them.

The other factor is availability, which we have seen referenced a few times now with regard to NVIDIA. NVIDIA has been delaying orders, withholding systems, or outright limiting purchase quantities with many of their clients that can't wait for the long lead times on getting H100 systems in hand and running. Intel and AMD are in prime position to take those clients from NVIDIA, and if Intel shows they can outright beat the H100s that these clients have already likely been trying to buy, and can offer shorter delivery windows, they become the de facto choice for those clients.

Minus Infinity

Scrizz said: Analysts lmao

As in people working in the field. You butt hurt Intel isn't a player?

Scrizz

Minus Infinity said: As in people working in the field. You butt hurt Intel isn't a player?

I like how you turn this into a personal attack. I know what an analyst is... something you don't based on your reply of "As in people working in the field."
I never mentioned Intel and don't have a horse in the race. I merely laughed at using analysts as a source of truth. Analysts often have a vested interest in steering people's opinion (and money) in certain directions.

Ahhzz

This is not the thread, the section, or the forums for personal attacks. Keep it aimed at the topic and not each other. Only warning.
