
Huawei CloudMatrix 384 System Outperforms NVIDIA GB200 NVL72

AleksandarK

News Editor
Huawei announced its CloudMatrix 384 super node, which the company touts as its domestic alternative to NVIDIA's GB200 NVL72 system, offering more overall system performance but worse per-chip performance and higher power consumption. While NVIDIA's GB200 NVL72 pairs 36 Grace CPUs with 72 "Blackwell" GB200 GPUs, the Huawei CloudMatrix 384 employs 384 Huawei Ascend 910C accelerators. It takes roughly five times as many Ascend 910C accelerators to deliver nearly twice the GB200 NVL72's system performance, which is unimpressive on a per-accelerator basis but excellent at the per-system level of deployment. SemiAnalysis argues that Huawei is a generation behind in chip performance but ahead of NVIDIA in scale-up system design and deployment.

When you look at individual chips, NVIDIA's GB200 clearly outshines Huawei's Ascend 910C, delivering over three times the BF16 performance (2,500 TeraFLOPS vs. 780 TeraFLOPS), more memory per chip (192 GB vs. 128 GB), and higher memory bandwidth (8 TB/s vs. 3.2 TB/s). In other words, NVIDIA has the raw power and efficiency advantage at the chip level. But move to the system level, and Huawei's CloudMatrix 384 takes the lead. It delivers 1.7× the overall PetaFLOPS, packs in 3.6× more total HBM capacity, and supports over five times the number of accelerators, along with the associated scale-up bandwidth, of NVIDIA's NVL72 cluster. However, that scalability does come with a trade‑off, as Huawei's setup draws nearly four times more total power. A single GB200 NVL72 draws 145 kW of power, while a single Huawei CloudMatrix 384 draws ~560 kW. So, NVIDIA is your go-to if you need peak efficiency in a single GPU. If you're building a massive AI supercluster where total throughput and interconnect speed matter most, Huawei's solution actually makes a lot of sense. Thanks to its all-to-all topology, Huawei has delivered an AI training and inference system worth purchasing. When SMIC, which fabricates Huawei's chips, gets to a more advanced manufacturing node, the efficiency of these systems will also increase.
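
As a rough sanity check on the numbers above, here is a quick back-of-envelope comparison in Python. The per-chip figures, chip counts, and power draws are simply the ones quoted in this article and the SemiAnalysis report; treat the result as an illustration of the per-chip vs. per-system trade-off, not as vendor-verified benchmarks.

```python
# Back-of-envelope comparison using the figures quoted above.
# Per-chip BF16 throughput, HBM capacity, chip count, and system power
# are the numbers cited in the article/SemiAnalysis report.

nvl72 = {"chips": 72,  "tflops_per_chip": 2500, "hbm_gb_per_chip": 192, "power_kw": 145}
cm384 = {"chips": 384, "tflops_per_chip": 780,  "hbm_gb_per_chip": 128, "power_kw": 560}

def system_totals(s):
    """Aggregate per-chip figures into per-system totals."""
    pflops = s["chips"] * s["tflops_per_chip"] / 1000   # dense BF16 PetaFLOPS
    hbm_tb = s["chips"] * s["hbm_gb_per_chip"] / 1024   # total HBM in TB
    return pflops, hbm_tb, pflops / s["power_kw"]        # last value: PFLOPS per kW

nv_pf, nv_hbm, nv_eff = system_totals(nvl72)
hw_pf, hw_hbm, hw_eff = system_totals(cm384)

print(f"GB200 NVL72:     {nv_pf:6.1f} PFLOPS, {nv_hbm:5.1f} TB HBM, {nv_eff:.2f} PFLOPS/kW")
print(f"CloudMatrix 384: {hw_pf:6.1f} PFLOPS, {hw_hbm:5.1f} TB HBM, {hw_eff:.2f} PFLOPS/kW")
print(f"Chip-count ratio:         {cm384['chips'] / nvl72['chips']:.1f}x")
print(f"System performance ratio: {hw_pf / nv_pf:.2f}x")
print(f"Total HBM ratio:          {hw_hbm / nv_hbm:.2f}x")
print(f"Perf-per-watt ratio:      {hw_eff / nv_eff:.2f}x")
```

Run as-is, this reproduces the headline claims: roughly 1.7× the system-level BF16 throughput and about 3.6× the total HBM from about 5.3× the accelerators, at a bit under half the performance per watt.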



View at TechPowerUp Main Site | Source
 
Now I'm wondering how those things would do in terms of folding and other workloads that actually mean something.
 
For the layperson, this makes little sense (and all sense). If system A uses more parts and more energy but produces more output than a smaller, more efficient system B, you'd ask: can't system B just be doubled up? I'm guessing the system itself is a holistic unit, and it can't be made to interconnect (without penalty) with another 'sister' unit?
 
It takes roughly five times as many Ascend 910C accelerators to deliver nearly twice the GB200 NVL72's system performance, which is unimpressive on a per-accelerator basis but excellent at the per-system level of deployment.
That says nothing without power consumption and price per chip.
However, that scalability does come with a trade‑off, as Huawei's setup draws nearly four times more total power. A single GB200 NVL72 draws 145 kW of power, while a single Huawei CloudMatrix 384 draws ~560 kW.
And here is where Nvidia is ahead. But is this efficiency advantage because of architecture or because of node advantage?
So, NVIDIA is your go-to if you need peak efficiency in a single GPU.
So, only Nvidia and Huawei exist in the AI market. No AMD, no Intel, no Broadcom, no Google, nobody else.
 
So it doesn't really outperform it, but the CCP needs something to brag about, so they spin it: if you just use more Huawei chips, it will eventually perform better. Amazing. Now add more Nvidia chips and see what happens.
 
For the layperson, this makes little sense (and all sense). If system A uses more parts and more energy but produces more output than a smaller, more efficient system B, you'd ask: can't system B just be doubled up? I'm guessing the system itself is a holistic unit, and it can't be made to interconnect (without penalty) with another 'sister' unit?
We are past scaling individual nodes; we are at the point where systems are the unit of computing. So delivering a better system == better solution. If you have lots of bandwidth and enough compute, a higher power consumption is nothing to worry about. China can absorb electricity requirements far better than the US can. Read the SemiAnalysis source, it's very interesting.
No AMD, no Intel, no Broadcom, no Google, nobody else.
Google's TPU is in a league of its own. AMD doesn't have an equivalent to the NVL72 system yet, IIRC.
 
So it doesn't really outperform it, but the CCP needs something to brag about, so they spin it: if you just use more Huawei chips, it will eventually perform better. Amazing. Now add more Nvidia chips and see what happens.

Clearly you missed the point (like me). See reply #6, especially:

we are at the point where systems are the unit of computing. So delivering a better system == better solution
 
Clearly you missed the point (like me). See reply #6, especially:
Yes, I know what AleksandarK wrote. Thanks to China not really caring about the current eco craze, they can just burn more coal (which they actively do) to get the electricity needed for the power-hungry Huawei system and scale it up as needed. I still wouldn't call that a Huawei win over Nvidia, or call it a better system.
 