
Meta's Grand Teton Brings NVIDIA Hopper to Its Data Centers

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,264 (0.92/day)
Meta today announced its next-generation AI platform, Grand Teton, designed in collaboration with NVIDIA. Compared to the company's previous-generation Zion EX platform, the Grand Teton system packs in more memory, network bandwidth and compute capacity, said Alexis Bjorlin, vice president of Meta Infrastructure Hardware, at the 2022 OCP Global Summit, an Open Compute Project conference.

AI models are used extensively across Facebook for services such as news feed, content recommendations and hate-speech identification, among many other applications. "We're excited to showcase this newest family member here at the summit," Bjorlin said in prepared remarks for the conference, adding her thanks to NVIDIA for its deep collaboration on Grand Teton's design and continued support of OCP.



Designed for Data Center Scale
Named after the 13,000-foot mountain that crowns one of Wyoming's two national parks, Grand Teton uses NVIDIA H100 Tensor Core GPUs to train and run AI models that are rapidly growing in their size and capabilities, requiring greater compute.

The NVIDIA Hopper architecture, on which the H100 is based, includes a Transformer Engine to accelerate work on these neural networks, which are often called foundation models because they can address an expanding set of applications from natural language processing to healthcare, robotics and more.

The NVIDIA H100 is designed for performance as well as energy efficiency. H100-accelerated servers, when connected with NVIDIA networking across thousands of servers in hyperscale data centers, can be 300x more energy efficient than CPU-only servers.

"NVIDIA Hopper GPUs are built for solving the world's tough challenges, delivering accelerated computing with greater energy efficiency and improved performance, while adding scale and lowering costs," said Ian Buck, vice president of hyperscale and high performance computing at NVIDIA. "With Meta sharing the H100-powered Grand Teton platform, system builders around the world will soon have access to an open design for hyperscale data center compute infrastructure to supercharge AI across industries."

Mountain of a Machine
Grand Teton sports 2x the network bandwidth and 4x the bandwidth between host processors and GPU accelerators compared to Meta's prior Zion system, Meta said.

The added network bandwidth enables Meta to create larger clusters of systems for training AI models, Bjorlin said. It also packs more memory than Zion to store and run larger AI models.

Simplified Deployment, Increased Reliability
Packing all these capabilities into one integrated server "dramatically simplifies deployment of systems, allowing us to install and provision our fleet much more rapidly, and increase reliability," said Bjorlin.

Joined
Oct 19, 2022
Messages
19 (0.03/day)
Location
Sweden
Processor Ryzen 5 5600
Motherboard MSI B350M Mortar
Memory 2x8 Gb DDR4 HyperX Black 2133 @ 3200 CL16 + 2x8Gb Corsair DDR4 Vengeance @ 3200 CL16
Video Card(s) Asus Dual Radeon RX 6700 XT
Storage 1x Crucial P2 512Gb, 1x WD "old" Blue 1Tb 7.2k, 1x Seagate ST2000 2Tb 7.2k
Display(s) AOC 24G2 144Hz
Case Fractal Pop Air
Mouse Kone Aimo modded with JPN switches
Keyboard Logitech G 413 or Steelseries 6Gv2 depending on the mood
I just registered for this comment (but I've read TPU for ages)

Do they even check their projects naming?!? :eek:
 
Low quality post by Tartaros
Joined
Oct 10, 2009
Messages
786 (0.15/day)
Location
Madrid, Spain
System Name Rectangulote
Processor Core I9-9900KF
Motherboard Asus TUF Z390M
Cooling Alphacool Eisbaer Aurora 280 + Eisblock RTX 3090 RE + 2 x 240 ST30
Memory 32 GB DDR4 3600mhz CL16 Crucial Ballistix
Video Card(s) KFA2 RTX 3090 SG
Storage WD Blue 3D 2TB + 2 x WD Black SN750 1TB
Display(s) 2 x Asus ROG Swift PG278QR / Samsung Q60R
Case Corsair 5000D Airflow
Audio Device(s) Evga Nu Audio + Sennheiser HD599SE + Trust GTX 258
Power Supply Corsair RMX850
Mouse Razer Naga Wireless Pro / Logitech MX Master
Keyboard Keychron K4 / Dierya DK61 Pro
Software Windows 11 Pro
I just registered for this comment (but I've read TPU for ages)

Do they even check their projects naming?!? :eek:
Hey guys, let's name our big AI project "Grand Big Ass Tit".
 
Joined
Oct 17, 2021
Messages
807 (0.85/day)
Location
People's Republic of Banania
Processor Threadripper 3955WX
Motherboard M12SWA-TF
Cooling Arctic Freezer 4U SP3
Memory G.Skill Trident Z DDR4-3733 (2x8GB)
Video Card(s) 5700XT + 3x RX 590
Storage A lot
Display(s) ViewSonic G225fB
Case Corsair 760T
Audio Device(s) Sound Blaster Z SE
Power Supply be quiet! DPP12 1500W
Keyboard IBM F122
Software 10 LTSC
#3 was hidden but that's unironically what it means in Spanish.
 
Joined
Feb 6, 2020
Messages
195 (0.12/day)
Location
O-Town, USA
System Name Regular PC | Server HP Z440
Processor 9700k | E5-2698v3
Motherboard Gigabyte Z390 Gaming X-CF | Stock mobo
Cooling Scythe Mugen 5 rev. B | Stock HS
Memory 32 GB (8x4) | 112 GB (8x2 + 16x6)
Video Card(s) RTX 2070 Super | K4000
Storage 970 EVO+ 1TB | 860 1TB x2
Display(s) XV340CK x2, 1080p x2
Power Supply Corsair RM750x | Corsair RM750e
Software Windows | Proxmox 7
I just registered for this comment (but I've read TPU for ages)

Do they even check their projects naming?!? :eek:
It appears their projects are named after national parks in the West: there's Zion National Park and Grand Teton National Park. And yes, Grand Teton in its native language is kind of hilarious for the younger folks.
 
Joined
Oct 19, 2022
Messages
19 (0.03/day)
Location
Sweden
Processor Ryzen 5 5600
Motherboard MSI B350M Mortar
Memory 2x8 Gb DDR4 HyperX Black 2133 @ 3200 CL16 + 2x8Gb Corsair DDR4 Vengeance @ 3200 CL16
Video Card(s) Asus Dual Radeon RX 6700 XT
Storage 1x Crucial P2 512Gb, 1x WD "old" Blue 1Tb 7.2k, 1x Seagate ST2000 2Tb 7.2k
Display(s) AOC 24G2 144Hz
Case Fractal Pop Air
Mouse Kone Aimo modded with JPN switches
Keyboard Logitech G 413 or Steelseries 6Gv2 depending on the mood
It appears their projects are named after national parks in the West: there's Zion National Park and Grand Teton National Park. And yes, Grand Teton in its native language is kind of hilarious for the younger folks.

I didn't notice that it was mentioned in the article, shame on me... :shadedshu:

So I went reading about it, very interesting.

That said, back to the news itself: I hope they won't run into the same difficulties as AMD's exascale. Increased complexity often leads to unexpected instability and poor reliability. Any idea how many H100s will be used?
 
Joined
Oct 27, 2009
Messages
1,134 (0.21/day)
Location
Republic of Texas
System Name [H]arbringer
Processor 4x 61XX ES @3.5Ghz (48cores)
Motherboard SM GL
Cooling 3x xspc rx360, rx240, 4x DT G34 snipers, D5 pump.
Memory 16x gskill DDR3 1600 cas6 2gb
Video Card(s) blah bigadv folder no gfx needed
Storage 32GB Sammy SSD
Display(s) headless
Case Xigmatek Elysium (whats left of it)
Audio Device(s) yawn
Power Supply Antec 1200w HCP
Software Ubuntu 10.10
Benchmark Scores http://valid.canardpc.com/show_oc.php?id=1780855 http://www.hwbot.org/submission/2158678 http://ww
I didn't notice that it was mentioned in the article, shame on me... :shadedshu:

So I went reading about it, very interesting.

That said, back to the news itself: I hope they won't run into the same difficulties as AMD's exascale. Increased complexity often leads to unexpected instability and poor reliability. Any idea how many H100s will be used?
They aren't deploying them in a supercomputer fashion but rather spreading them across their datacenters as pools of GPU compute.
A lot of AMD's problem is the software ecosystem, as well as using Slingshot rather than Mellanox. Mellanox is super great at tuning scale-out networks...
 