
DeepSeek R2 Leak Reveals 512 PetaFLOPS Push on Domestic AI Accelerator Infrastructure

AleksandarK

News Editor
DeepSeek, the company that took the AI world by storm with its R1 model, is preparing a new and reportedly much improved DeepSeek R2 release, according to well-known AI insider @iruletheworldmo on X. Powered by Huawei's Ascend 910B chip clusters, possibly deployed as Huawei Atlas 900 systems, and DeepSeek's in-house distributed training framework, R2 reportedly pushes these accelerators to an impressive 82% utilization, translating to 512 PetaFLOPS of FP16 performance, roughly half an exaFLOP of computing power. According to Huawei lab data, that is roughly 91% of what NVIDIA's older A100 clusters deliver, yet DeepSeek claims it cuts per-unit training costs by a remarkable 97.3%. Behind DeepSeek R2 is a carefully cultivated ecosystem of partners. Tuowei Information, a leading OEM in the Ascend family, manages over half of DeepSeek's supercomputing hardware orders, while Sugon provides liquid-cooled server racks capable of handling up to 40 kW per unit. To keep power consumption in check, Innolight's silicon-photonics transceivers cut power draw by a further 35% compared to traditional transceivers.
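For a rough sense of scale, the Python sketch below back-calculates what those two headline figures imply. Only the 512 PetaFLOPS and 82% utilization come from the leak; the per-chip FP16 peak is an assumed, illustrative value (the article gives no per-chip number), so the implied accelerator count is a guess, not a reported spec.

```
# Back-of-the-envelope check of the leaked numbers (sketch, not official math).
# The per-chip FP16 peak below is an assumption; the article does not state it.

DELIVERED_PFLOPS = 512          # FP16 throughput claimed for the R2 training cluster
UTILIZATION = 0.82              # reported accelerator utilization
ASSUMED_CHIP_PEAK_TFLOPS = 320  # assumed FP16 peak per Ascend 910B-class chip (hypothetical)

# Theoretical peak the cluster would need to deliver 512 PFLOPS at 82% utilization
peak_pflops = DELIVERED_PFLOPS / UTILIZATION

# Implied number of accelerators under the assumed per-chip peak
implied_chips = peak_pflops * 1000 / ASSUMED_CHIP_PEAK_TFLOPS

print(f"Required cluster peak: {peak_pflops:,.0f} PFLOPS")          # ~624 PFLOPS
print(f"Implied accelerator count: {implied_chips:,.0f}")           # ~2,000 chips at the assumed peak
```

Change the assumed per-chip figure and the implied chip count scales accordingly; the point is simply that 512 PFLOPS at 82% utilization demands a cluster peak of roughly 624 PFLOPS.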

Geographically, operations are split across major hubs: Runjian Shares runs the South China supercomputing center under contracts exceeding ¥5 billion annually, and Zhongbei Communications maintains a 1,500-PetaFLOPS reserve in the Northwest for peak demand. On the software side, DeepSeek R2 already supports private deployment and fine-tuning, powering smart-city initiatives in 15 provinces through the Yun Sai Zhilian platform. North China's node, overseen by Hongbo Shares' Yingbo Digital, adds another 3,000 PetaFLOPS to the mix. If computing power runs short, Huawei is ready to deploy its CloudMatrix 384 system, positioned as a domestic alternative to NVIDIA's GB200 NVL72. It uses 384 Ascend 910C accelerators to achieve 1.7× the overall PetaFLOPS and 3.6× the total HBM capacity of the NVL72 cluster, yet it lags significantly in per-chip performance and consumes nearly four times more power. Nonetheless, the R2 launch is expected to go smoothly, and we are waiting for the official release and benchmarks to see how the model performs.
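The CloudMatrix comparison can be expressed purely in terms of the ratios quoted above. The Python sketch below normalizes everything to NVL72 = 1.0; no absolute GB200 NVL72 specs are assumed, and the 4× power figure is a round approximation of the article's "nearly four times more power".

```
# Relative comparison derived only from the ratios quoted above (a sketch;
# everything is normalized to the NVL72 cluster = 1.0).

CLOUDMATRIX_CHIPS = 384   # Ascend 910C accelerators per CloudMatrix 384
NVL72_CHIPS = 72          # GPUs per NVIDIA GB200 NVL72 rack

FLOPS_RATIO = 1.7         # CloudMatrix total compute vs. NVL72 (from the article)
HBM_RATIO = 3.6           # CloudMatrix total HBM capacity vs. NVL72
POWER_RATIO = 4.0         # "nearly four times more power", rounded to 4x

# Per-chip compute relative to one NVL72 GPU
per_chip = (FLOPS_RATIO / CLOUDMATRIX_CHIPS) / (1.0 / NVL72_CHIPS)

# Cluster-level performance per watt relative to NVL72
perf_per_watt = FLOPS_RATIO / POWER_RATIO

print(f"Per-chip compute vs. one NVL72 GPU: {per_chip:.2f}x")    # ~0.32x
print(f"Cluster perf/W vs. NVL72:           {perf_per_watt:.2f}x")  # ~0.43x
```

In other words, the article's own numbers imply each Ascend 910C delivers roughly a third of the compute of an NVL72 GPU, and the rack as a whole lands at well under half the performance per watt, which is exactly the "brute force over efficiency" trade-off described above.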



View at TechPowerUp Main Site | Source
 
R2 pushes these accelerators to an impressive 82% utilization
That's really impressive.
To put it into perspective, most other GPU clusters used for ML training typically achieve utilization in the range of 20-40%.
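A quick Python sketch of why that gap matters, using the article's 512 PFLOPS at 82% to infer the cluster's peak and then applying the more typical 20-40% range mentioned above (illustrative only):

```
# Effective throughput of the same hardware at different utilization levels.

delivered_pflops = 512                       # FP16 throughput reported for R2's cluster
utilization = 0.82                           # reported utilization
peak_pflops = delivered_pflops / utilization # ~624 PFLOPS theoretical peak

# The 20-40% utilization more typical of large training clusters
for mfu in (0.20, 0.30, 0.40):
    print(f"At {mfu:.0%} utilization: {peak_pflops * mfu:,.0f} PFLOPS delivered")
```

On identical hardware, dropping from 82% to 30% utilization would cut delivered throughput from 512 to roughly 187 PFLOPS, so the claimed utilization figure is doing a lot of the heavy lifting in the headline number.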
 