
Cerebras Systems' Wafer Scale Engine is a Trillion Transistor Processor in a 12" Wafer

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.33/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kintson A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
This news isn't strictly today's, but it's relevant and interesting enough that I think it warrants a news piece on our page. My reasoning is this: in an era where Multi-Chip Modules (MCM) and a chiplet approach to processor fabrication have become a de facto standard for improving performance and yields, a trillion-transistor processor that eschews those modular design philosophies is interesting enough to give pause.

The Wafer Scale Engine has been developed by Cerebras Systems to meet the ongoing increase in demand for AI-training engines. Because latency has a very real impact on training times and on a system's capability in these workloads, Cerebras wanted to design a processor that avoided the need for an external communication lane between its cores - the system is limited, basically, only by transistors' switching times. Its 400,000 cores communicate seamlessly via interconnects, etched on 46,225 square millimeters of silicon (by comparison, NVIDIA's largest GPU is 56.7 times smaller at "just" 815 square millimeters).





However, in a world where silicon wafer manufacturing still produces defects that can render whole chips inoperative, how did Cerebras manage to build such a large processor and keep defects from stopping it from delivering on the reported specs and performance? The answer is an old one, mainly: redundancy, paired with some additional magical engineering powders achieved in conjunction with the chip's manufacturer, TSMC. The chip is built on TSMC's 16 nm node - a more mature process with proven yields, cheaper than a cutting-edge 7 nm process, and with lower areal density. A denser node would make it even more difficult to properly cool those 400,000 cores, as you may imagine.

Cross-reticle connectivity, yield, power delivery, and packaging improvements have all been researched and deployed by Cerebras in solving the scaling problems associated with such large chips. Moreover, the chip is built with redundant features that should ensure that even if some defects arise in various parts of the silicon, the areas that have been designed as "overprovisioning" can cut in and pick up the slack, routing and processing data without skipping a beat. Cerebras says any given component of the chip (cores, SRAM, etc.) features 1% to 1.5% of additional overprovisioning capability, which turns any manufacturing defect into a negligible speedbump instead of a silicon-waster.
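To make the redundancy idea concrete, here is a minimal sketch of spare-core remapping. It is an illustration only and not Cerebras' actual mechanism: the function name `build_core_map`, the flat list of cores, and the toy sizes are all hypothetical. It also reads the overprovisioning figure as a range of 1% to 1.5% and applies it to the 400,000 cores quoted above.

```python
def build_core_map(physical_ok, logical_count):
    """Map logical core indices onto working physical cores, skipping defects.

    physical_ok: list of booleans, True where the physical core works.
    logical_count: number of cores the software expects to see.
    (Hypothetical helper for illustration, not a real Cerebras API.)
    """
    working = [i for i, ok in enumerate(physical_ok) if ok]
    if len(working) < logical_count:
        raise ValueError("not enough working cores to cover the logical array")
    # Logical core n is served by the n-th working physical core.
    return working[:logical_count]


# Toy example: 10 logical cores backed by 12 physical ones (2 spares).
physical_ok = [True] * 12
physical_ok[3] = False  # pretend a manufacturing defect killed core 3
core_map = build_core_map(physical_ok, logical_count=10)
print(core_map)

# Spare-core budget implied by 1% to 1.5% overprovisioning of 400,000 cores.
for fraction in (0.010, 0.015):
    spares = round(400_000 * fraction)
    print(f"{fraction:.1%} overprovisioning -> {spares} spare cores")
```

In the toy run, the defective core is simply skipped and one of the spares absorbs the gap, which is the "pick up the slack" behavior described above. On the real chip the remapping would also have to reroute the on-die interconnect, which this sketch does not model.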



The inter-core communication solution is one of the most advanced ever seen, with a fine-grained, all-hardware, on-chip mesh-connected communication network dubbed Swarm that delivers an aggregate bandwidth of 100 petabits per second. This is paired with 18 GB of local, distributed, superfast SRAM as the one and only level of the memory hierarchy, delivering memory bandwidth in the realm of 9 petabytes per second.
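Those aggregate numbers are easier to picture per core. The sketch below simply divides each headline figure by the 400,000 cores, assuming an even split, decimal units, and an SRAM figure read as 18 gigabytes; the variable names are illustrative and none of the per-core results are quoted by Cerebras in this piece.

```python
CORES = 400_000
SRAM_BYTES = 18e9           # assumption: 18 gigabytes, decimal units
MEM_BW_BYTES_S = 9e15       # 9 petabytes per second of memory bandwidth
FABRIC_BW_BITS_S = 100e15   # 100 petabits per second of fabric bandwidth

# Even split across all cores (an assumption; the real distribution may differ).
sram_per_core_kb = SRAM_BYTES / CORES / 1e3
mem_bw_per_core_gb_s = MEM_BW_BYTES_S / CORES / 1e9
fabric_bw_per_core_gbit_s = FABRIC_BW_BITS_S / CORES / 1e9

print(f"SRAM per core:             {sram_per_core_kb:.1f} KB")
print(f"Memory bandwidth per core: {mem_bw_per_core_gb_s:.1f} GB/s")
print(f"Fabric bandwidth per core: {fabric_bw_per_core_gbit_s:.1f} Gbit/s")
```

Under these assumptions each core gets roughly 45 KB of SRAM, about 22.5 GB/s of memory bandwidth, and about 250 Gbit/s of fabric bandwidth, which shows how small and how closely coupled each core's slice of memory is.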

The 400,000 cores are custom-designed for AI workload acceleration. Named SLAC for Sparse Linear Algebra Cores, these are flexible, programmable, and optimized for the sparse linear algebra that underpins all neural network computation (think of these as FPGA-like, programmable arrays of cores). SLAC's programmability ensures cores can run all neural network algorithms in the constantly changing machine learning field. This is a chip that can adapt to different workloads and to different AI-related problem solving and training, which is a requirement for a deployment as expensive as the Wafer Scale Engine will surely be.



The entire chip and its accompanying deployment apparatus had to be developed in-house. As founder and CEO Andrew Feldman puts it, no existing packaging, printed circuit boards, connectors, cold plates, tools or software could be adapted towards the manufacturing and deployment of the Wafer Scale Engine. This means that Cerebras Systems and its team of 173 engineers had to develop not only the chip, but almost everything else that is needed to make sure it actually works. The Wafer Scale Engine consumes 15 kilowatts of power to operate - a prodigious amount of power for an individual chip, although roughly comparable to a modern AI cluster. This is a cluster, in essence, but deployed in a single chip with none of the latency and inter-chip communication hassles that plague clusters.

In an era where companies are looking towards chiplet design and inter-chip communication solutions as ways to tackle the increasing challenges of manufacturing density and decreasing yields, Cerebras' effort proves that there is still a way of developing monolithic chips that place performance above all other considerations.

 
Joined
Jan 8, 2017
Messages
8,944 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Impressive but still, putting these things in the same category with other monolithic GPUs and CPUs is a stretch.
 
Joined
Dec 15, 2006
Messages
1,703 (0.27/day)
Location
Oshkosh, WI
System Name ChoreBoy
Processor 8700k Delided
Motherboard Gigabyte Z390 Master
Cooling 420mm Custom Loop
Memory CMK16GX4M2B3000C15 2x8GB @ 3000Mhz
Video Card(s) EVGA 1080 SC
Storage 1TB SX8200, 250GB 850 EVO, 250GB Barracuda
Display(s) Pixio PX329 and Dell E228WFP
Case Fractal R6
Audio Device(s) On-Board
Power Supply 1000w Corsair
Software Win 10 Pro
Benchmark Scores A million on everything....
Can it play Crysis?
 

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,232 (0.91/day)
Truly impressive.

I do wonder how system integration will work, however. The chip is quite large and integrating something like that on a PCB would be difficult. Also, thermal expansion of the chip is quite possible due to the huge amount of heat. Can't wait to see how they will solve those problems.
 
Joined
Jul 21, 2018
Messages
773 (0.37/day)
Location
Germany
System Name FATTYDOVE-R-SPEC
Processor Intel i9 10980XE
Motherboard EVGA X299 Dark
Cooling Water (1x 240mm, 1x 280mm, 1x 420mm + 2x Mo-Ra 360 external radiator)
Memory 64GB DDR4
Video Card(s) RTX 2080 Super / RTX 3090
Storage Crucial MX500
Display(s) 24", 1440p, freesync, 144hz
Case Open Benchtable (OBT)
Audio Device(s) beyerdynamic MMX 300
Power Supply EVGA Supernova T2 1600W
Mouse OG steelseries Sensei
Keyboard steelseries 6Gv2
Software Windows 10
Truly impressive.

I do wonder how system integration will work, however. The chip is quite large and integrating something like that on a PCB would be difficult. Also, thermal expansion of the chip is quite possible due to the huge amount of heat. Can't wait to see how they will solve those problems.
From what I have read, they are already in use, and they had to do power delivery with vertical copper planes because a flat PCB cannot carry the current within thermal specs. The cooling comes from several high-pressure water streams, which are also vertical.
 
Joined
Mar 13, 2012
Messages
277 (0.06/day)
This is truly an advancement, managing to do something everyone has been trying to crack since the dawn of wafer manufacturing.

And it is not a simple solution either, since they not only had to solve the problem at hand but also design new advanced tools and software to actually pull it off.

They have also already manufactured wafers and are ready to introduce their manufacturing process to the world.

Often when you hear about new stuff like this, it is only a working theory on the drawing board with 10-15 years of work before a final product.

15 kilowatts is a little hot, BUT imagine this tech on 5 nm in the future at 3 kilowatts.

Bet they are already working on 3D stacking these monsters.
 
Joined
Sep 10, 2015
Messages
498 (0.16/day)
System Name My Addiction
Processor AMD Ryzen 7950X3D
Motherboard ASRock B650E PG-ITX WiFi
Cooling Alphacool Core Ocean T38 AIO 240mm
Memory G.Skill 32GB 6000MHz
Video Card(s) Sapphire Pulse 7900XTX
Storage Some SSDs
Display(s) 42" Samsung TV + 22" Dell monitor vertically
Case Lian Li A4-H2O
Audio Device(s) Denon + Bose
Power Supply Corsair SF750
Mouse Logitech
Keyboard Glorious
VR HMD None
Software Win 10
Benchmark Scores None taken
Funny thing is, the cooling of this chip will be the easier part. Since this is a totally custom solution, they can just integrate whatever cooling solution they want into the package, be it water or gas. I would do it with a gas solution with a compressor, and an option to use the excess heat to actually heat the building.
 
Joined
Sep 17, 2014
Messages
20,949 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
Very impressive
 
Joined
Nov 1, 2018
Messages
583 (0.29/day)
And so, Skynet was born.

A bit bigger in size than what we've seen on the Big Screen, but give it time and it will fit in a T-800's head.
 
Joined
Sep 6, 2013
Messages
2,978 (0.77/day)
Location
Athens, Greece
System Name 3 desktop systems: Gaming / Internet / HTPC
Processor Ryzen 5 5500 / Ryzen 5 4600G / FX 6300 (12 years latter got to see how bad Bulldozer is)
Motherboard MSI X470 Gaming Plus Max (1) / MSI X470 Gaming Plus Max (2) / Gigabyte GA-990XA-UD3
Cooling Νoctua U12S / Segotep T4 / Snowman M-T6
Memory 16GB G.Skill RIPJAWS 3600 / 16GB G.Skill Aegis 3200 / 16GB Kingston 2400MHz (DDR3)
Video Card(s) ASRock RX 6600 + GT 710 (PhysX)/ Vega 7 integrated / Radeon RX 580
Storage NVMes, NVMes everywhere / NVMes, more NVMes / Various storage, SATA SSD mostly
Display(s) Philips 43PUS8857/12 UHD TV (120Hz, HDR, FreeSync Premium) ---- 19'' HP monitor + BlitzWolf BW-V5
Case Sharkoon Rebel 12 / Sharkoon Rebel 9 / Xigmatek Midguard
Audio Device(s) onboard
Power Supply Chieftec 850W / Silver Power 400W / Sharkoon 650W
Mouse CoolerMaster Devastator III Plus / Coolermaster Devastator / Logitech
Keyboard CoolerMaster Devastator III Plus / Coolermaster Devastator / Logitech
Software Windows 10 / Windows 10 / Windows 7
There are so many companies creating chips for AI, that I wonder if Nvidia really has a future in this with GPUs, because GPUs are not specifically made for AI. I don't mean a 2-3 years future, but 5-10 years.
 
Joined
Sep 10, 2015
Messages
498 (0.16/day)
And so, Skynet was born.

A bit bigger in size than what we've seen on the Big Screen, but give it time and it will fit in a T-800's head.

Skynet is not going to fit in anything, because it's not hardware. You can't actually see Skynet; all the movies merely feature the instruments it can control.

In the story of the third movie, the problem starts when Skynet "gets out" onto the internet, gaining a huge amount of compute power by "infecting" all connected devices and becoming self-aware.
 
Joined
Jan 8, 2017
Messages
8,944 (3.36/day)
Nvidia really has a future in this with GPUs, because GPUs are not specifically made for AI. I don't mean a 2-3 years future, but 5-10 years.

Nvidia is already prototyping their own dedicated AI chips. That ought to answer your question.

 
Joined
Nov 13, 2007
Messages
10,234 (1.70/day)
Location
Austin Texas
Processor 13700KF Undervolted @ 5.6/ 5.5, 4.8Ghz Ring 200W PL1
Motherboard MSI 690-I PRO
Cooling Thermalright Peerless Assassin 120 w/ Arctic P12 Fans
Memory 48 GB DDR5 7600 MHZ CL36
Video Card(s) RTX 4090 FE
Storage 2x 2TB WDC SN850, 1TB Samsung 960 prr
Display(s) Alienware 32" 4k 240hz OLED
Case SLIGER S620
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse Xlite V2
Keyboard RoyalAxe
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
how does one feed data to such a monster...

interested to see how they will provide the bandwidth this needs in order to process data at capacity.
 
Joined
Mar 23, 2016
Messages
4,839 (1.64/day)
Processor Ryzen 9 5900X
Motherboard MSI B450 Tomahawk ATX
Cooling Cooler Master Hyper 212 Black Edition
Memory VENGEANCE LPX 2 x 16GB DDR4-3600 C18 OCed 3800
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500 GB, 870 QVO 1 TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S (PH-EC416PS)
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse SteelSeries Rival 310
Keyboard Logitech G G413 Silver
Software Windows 10 Professional 64-bit v22H2
how does one feed data to such a monster...

interested to see how they will provide the bandwidth this needs in order to process data at capacity.
The enormous bandwidth to feed the cores stays on die.
This is paired with 18 GB of local, distributed, superfast SRAM as the one and only level of the memory hierarchy, delivering memory bandwidth in the realm of 9 petabytes per second.
 
Joined
Nov 13, 2007
Messages
10,234 (1.70/day)
The enormous bandwidth to feed the cores stays on die.

But how do you feed the die? Once it's in the die it's fine... but at 9 petabytes per second and only 18 GB, something has got to connect to it. Would be interesting to see what that is.
 
Joined
Mar 23, 2012
Messages
777 (0.18/day)
Location
Norway
System Name Games/internet/usage
Processor I7 5820k 4.2 Ghz
Motherboard ASUS X99-A2
Cooling custom water loop for cpu and gpu
Memory 16GiB Crucial Ballistix Sport 2666 MHz
Video Card(s) Radeon Rx 6800 XT
Storage Samsung XP941 500 GB + 1 TB SSD
Display(s) Dell 3008WFP
Case Caselabs Magnum M8
Audio Device(s) Shiit Modi 2 Uber -> Matrix m-stage -> HD650
Power Supply beQuiet dark power pro 1200W
Mouse Logitech MX518
Keyboard Corsair K95 RGB
Software Win 10 Pro
But how do you feed the die? Once it's in the die it's fine... but at 9 petabytes per second and only 18 GB, something has got to connect to it. Would be interesting to see what that is.
Remember that the 9 petabytes per second is internal to the die.

At the moment AI research may be done on a GPU with 8 GiB to 24 GiB of RAM. The complete dataset might not fit in the GPU's RAM, so it is processed in batches.
In the same way, data sets might be loaded into the internal 18 GiB of memory on the new beast.

For comparison, a Radeon VII has 3,840 shading units and 1 TB/s of memory bandwidth to its 16 GiB of on-board RAM. This new chip has basically moved all that onto one die, with 9,000x the access speed and 100x the number of cores.
A modern-day GPU doing AI is fed over the PCIe bus. A gen 4 x16 link is capable of roughly 32 GB/s in each direction. Since this is a basic data dump (from system memory, if you wish to sustain that speed for all 16 GB going to the GPU), it requires little to no computation and approximately 500 ms of write time.

In the same way, filling the 18 GiB of on-board memory could be accomplished in less than 5 seconds from a PCIe gen 4 x4 NVMe drive. If your computation takes 20 minutes, that is not the big problem.
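The fill-time argument is just size divided by bandwidth, and it can be sketched in a few lines. The figure of about 2 GB/s of usable bandwidth per PCIe 4.0 lane in one direction is an approximation that does not come from the thread, and `transfer_seconds` is a hypothetical helper name. Real drives and hosts will land somewhat below these link-level numbers.

```python
GIB = 2**30   # bytes in a gibibyte
GB = 10**9    # bytes in a (decimal) gigabyte


def transfer_seconds(size_bytes, bandwidth_bytes_per_s):
    """Time to move size_bytes over a link, ignoring latency and protocol overhead."""
    return size_bytes / bandwidth_bytes_per_s


# Assumption: roughly 2 GB/s usable per PCIe 4.0 lane, one direction.
PCIE4_LANE_GB_S = 2.0

x16_bandwidth = 16 * PCIE4_LANE_GB_S * GB   # link to a GPU
x4_bandwidth = 4 * PCIE4_LANE_GB_S * GB     # link to an NVMe drive

gpu_fill = transfer_seconds(16 * GIB, x16_bandwidth)
wse_fill = transfer_seconds(18 * GIB, x4_bandwidth)

print(f"16 GiB over gen 4 x16: {gpu_fill:.2f} s")
print(f"18 GiB over gen 4 x4:  {wse_fill:.2f} s")
```

Either way the load takes seconds at most, so against a computation that runs for many minutes the external link is not the limiting factor.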
 
Joined
Feb 18, 2012
Messages
2,715 (0.61/day)
System Name MSI GP76
Processor intel i7 11800h
Cooling 2 laptop fans
Memory 32gb of 3000mhz DDR4
Video Card(s) Nvidia 3070
Storage x2 PNY 8tb cs2130 m.2 SSD--16tb of space
Display(s) 17.3" IPS 1920x1080 240Hz
Power Supply 280w laptop power supply
Mouse Logitech m705
Keyboard laptop keyboard
Software lots of movies and Windows 10 with win 7 shell
Benchmark Scores Good enough for me
Can it run a Prius?
 
Joined
Nov 4, 2005
Messages
11,691 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs and over 10TB spinning
Display(s) 56" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
This is the new future of computing, all on a single die. I'm sure a lot of those transistors are fast, math-accelerated paths. A few of these and we will have AI that is closer to human than to supercomputing.
 