
DeepSeek R2 Leak Reveals 512 PetaFLOPS Push on Domestic AI Accelerator Infrastructure

AleksandarK

News Editor
DeepSeek, the company that took the AI world by storm with its R1 model, is preparing a new and reportedly much improved DeepSeek R2 release, according to well-known AI insider @iruletheworldmo on X. Powered by Huawei's Ascend 910B chip clusters, possibly deployed as Huawei Atlas 900 systems, and DeepSeek's in-house distributed training framework, R2 reportedly pushes these accelerators to an impressive 82% utilization, translating to 512 PetaFLOPS of FP16 performance, roughly half an exaFLOP of computing power. According to Huawei lab data, that is roughly 91% of what NVIDIA's older A100 clusters deliver, yet DeepSeek claims it cuts per-unit training costs by a remarkable 97.3%. Behind DeepSeek R2 is a carefully cultivated ecosystem of partners. Tuowei Information, a leading OEM in the Ascend family, manages over half of DeepSeek's supercomputing hardware orders, while Sugon provides liquid-cooled server racks capable of handling up to 40 kW per unit. To keep power consumption in check, Innolight's silicon-photonics transceivers cut power draw by a further 35% compared to traditional transceivers.
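For a rough sense of scale, the Python sketch below back-calculates what those two headline figures imply. Only the 512 PetaFLOPS and 82% utilization come from the leak; the per-chip FP16 peak is an assumed, illustrative value (the article gives no per-chip number), so the implied accelerator count is a guess, not a reported spec.

```
# Back-of-the-envelope check of the leaked numbers (sketch, not official math).
# The per-chip FP16 peak below is an assumption; the article does not state it.

DELIVERED_PFLOPS = 512          # FP16 throughput claimed for the R2 training cluster
UTILIZATION = 0.82              # reported accelerator utilization
ASSUMED_CHIP_PEAK_TFLOPS = 320  # assumed FP16 peak per Ascend 910B-class chip (hypothetical)

# Theoretical peak the cluster would need to deliver 512 PFLOPS at 82% utilization
peak_pflops = DELIVERED_PFLOPS / UTILIZATION

# Implied number of accelerators under the assumed per-chip peak
implied_chips = peak_pflops * 1000 / ASSUMED_CHIP_PEAK_TFLOPS

print(f"Required cluster peak: {peak_pflops:,.0f} PFLOPS")          # ~624 PFLOPS
print(f"Implied accelerator count: {implied_chips:,.0f}")           # ~2,000 chips at the assumed peak
```

Change the assumed per-chip figure and the implied chip count scales accordingly; the point is simply that 512 PFLOPS at 82% utilization demands a cluster peak of roughly 624 PFLOPS.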

Geographically, operations are split across major hubs: Runjian Shares runs the South China supercomputing center under contracts exceeding ¥5 billion annually, and Zhongbei Communications maintains a 1,500-PetaFLOPS reserve in the Northwest for peak demand. On the software side, DeepSeek R2 already supports private deployment and fine-tuning, powering smart-city initiatives in 15 provinces through the Yun Sai Zhilian platform. North China's node, overseen by Hongbo Shares' Yingbo Digital, adds another 3,000 PetaFLOPS to the mix. If computing power runs short, Huawei is ready to deploy its CloudMatrix 384 system, positioned as a domestic alternative to NVIDIA's GB200 NVL72. It uses 384 Ascend 910C accelerators to achieve 1.7× the overall PetaFLOPS and 3.6× the total HBM capacity of the NVL72 cluster, yet it lags significantly in per-chip performance and consumes nearly four times more power. Nonetheless, the R2 launch is expected to go smoothly, and we are waiting for the official release and benchmarks to see how the model performs.
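The CloudMatrix comparison can be expressed purely in terms of the ratios quoted above. The Python sketch below normalizes everything to NVL72 = 1.0; no absolute GB200 NVL72 specs are assumed, and the 4× power figure is a round approximation of the article's "nearly four times more power".

```
# Relative comparison derived only from the ratios quoted above (a sketch;
# everything is normalized to the NVL72 cluster = 1.0).

CLOUDMATRIX_CHIPS = 384   # Ascend 910C accelerators per CloudMatrix 384
NVL72_CHIPS = 72          # GPUs per NVIDIA GB200 NVL72 rack

FLOPS_RATIO = 1.7         # CloudMatrix total compute vs. NVL72 (from the article)
HBM_RATIO = 3.6           # CloudMatrix total HBM capacity vs. NVL72
POWER_RATIO = 4.0         # "nearly four times more power", rounded to 4x

# Per-chip compute relative to one NVL72 GPU
per_chip = (FLOPS_RATIO / CLOUDMATRIX_CHIPS) / (1.0 / NVL72_CHIPS)

# Cluster-level performance per watt relative to NVL72
perf_per_watt = FLOPS_RATIO / POWER_RATIO

print(f"Per-chip compute vs. one NVL72 GPU: {per_chip:.2f}x")    # ~0.32x
print(f"Cluster perf/W vs. NVL72:           {perf_per_watt:.2f}x")  # ~0.43x
```

In other words, the article's own numbers imply each Ascend 910C delivers roughly a third of the compute of an NVL72 GPU, and the rack as a whole lands at well under half the performance per watt, which is exactly the "brute force over efficiency" trade-off described above.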



View at TechPowerUp Main Site | Source
 
R2 pushes these accelerators to an impressive 82% utilization
That's really impressive.
To put it into perspective, most other GPU clusters used for ML training typically achieve utilization in the range of 20-40%.
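A quick Python sketch of why that gap matters, using the article's 512 PFLOPS at 82% to infer the cluster's peak and then applying the more typical 20-40% range mentioned above (illustrative only):

```
# Effective throughput of the same hardware at different utilization levels.

delivered_pflops = 512                       # FP16 throughput reported for R2's cluster
utilization = 0.82                           # reported utilization
peak_pflops = delivered_pflops / utilization # ~624 PFLOPS theoretical peak

# The 20-40% utilization more typical of large training clusters
for mfu in (0.20, 0.30, 0.40):
    print(f"At {mfu:.0%} utilization: {peak_pflops * mfu:,.0f} PFLOPS delivered")
```

On identical hardware, dropping from 82% to 30% utilization would cut delivered throughput from 512 to roughly 187 PFLOPS, so the claimed utilization figure is doing a lot of the heavy lifting in the headline number.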
 