
AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs

Joined
Aug 25, 2021
Messages
1,074 (1.06/day)
That would make ARM's Neoverse V1 the father of Zen 4c o_O
Agreed. It's the spiritual father of Bergamo. A few years ago AMD found out what Amazon was planning to develop with Graviton CPUs on Neoverse platforms, including the highest-performing one, Zeus. They realised that the only way to stay competitive in the hyperscaler segment while staying on x86 was to develop a new efficiency core that rebalances performance, power consumption and die size at a lower cost.

Also, they wanted to create a versatile, efficient core, but did not want a cut-down, Atom-style core like the one Intel conceived for Alder Lake and Sierra Forest. Intel was so stubborn in this pursuit that they had to disable AVX-512 instructions on client products. That was the price to pay for the big-little choice and the core inflation approach... Guess why Sapphire Rapids does not have E-cores? AVX-512 will have to come back in one form or another in the client segment, but more importantly in the cloud CPU with 144 E-cores. Will it work? We shall see next year.

The "riotous child", the Dionysus core, is going to rock the boat. Dionysus is young, energetic and adventurous. The first iteration is an 8 c-core CCX in a 16 c-core CCD, but next year we will see the even more adventurous Zen 5 Turin dense: a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa, but with 192 c-cores. These decisions save a lot on the design, manufacturing and packaging side by reusing existing solutions with a few tweaks.

Bergamo will set a new pace in cloud servers this year, but Turin dense will compete with Sierra Forest SKUs and next-gen Graviton, AmpereOne and Grace SKUs. It does look like AMD has got the best of both worlds here. They keep an x86 core with the full instruction set, while deploying bespoke performance-per-watt efficiency against Atom and ARM cores.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,148 (2.91/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
You know, datacenter use cases are what's driving all of this. If you take cloud computing for example, you have a lot of different things going on. The goal is to maximize throughput, and increasing the number of cores at the cost of some features that consume a lot of die space makes sense. If I consider the product that I oversee the engineering efforts for and how this relates to it, it's basically being able to squeeze more service VMs (think AWS ECS Fargate) into a smaller area for tasks that most definitely don't need things like AVX-512. Most of the stuff the application does (being a SaaS product) is integer-heavy, so something like this would make a whole lot of sense if they start cutting things like vector extensions. The nice thing is that you could run it on either and just let the JVM intrinsics do their magic, but at the end of the day the business cares about two things: customer retention and cost of doing business.
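To make the "run it on either" point concrete, here is a rough sketch of runtime dispatch on CPU features. It is not how the JVM does it internally; it only checks the `flags` line that Linux exposes in /proc/cpuinfo, where `avx512f` and `avx2` are real flag names. The helper names and the returned path labels are made up for illustration.

```python
def has_cpu_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` is listed on any 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match whole tokens only, so "avx512f" does not match "avx512fp16".
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def pick_code_path(cpuinfo_text: str) -> str:
    """Pick a (hypothetical) code path label from the widest vector ISA available."""
    if has_cpu_flag(cpuinfo_text, "avx512f"):
        return "avx512"
    if has_cpu_flag(cpuinfo_text, "avx2"):
        return "avx2"
    return "scalar"


# Illustrative input, not captured from a real machine.
sample = "model name\t: example cpu\nflags\t\t: fpu sse2 avx avx2\n"
print(pick_code_path(sample))
```

On a Linux box you would pass in the contents of /proc/cpuinfo instead of the sample string. The same binary then works on cores with or without the wide vector units, which is the property that matters for mixed fleets.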

With that said, I see a single CCD option being a really nice entry/budget option for servers. There is a whole lot to like here if you're working on software that'll be running on a server, but I honestly don't see AMD doing the hybrid thing. I could be wrong, but this looks like another move to cater to the server market, not to your ordinary consumer.
 
Joined
Jan 3, 2021
Messages
2,823 (2.26/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
next year we will see even more adventurous Zen5 Turin dense, a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa is, but with 192 c-cores.
Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).
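A back-of-the-envelope check of that concern, as a sketch only. It assumes DDR5-4800 on every configuration (an assumption for Turin, and the 16-channel row is purely hypothetical) and uses theoretical peak numbers, which real workloads never reach.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, channel_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = channel_bits / 8
    return channels * mt_per_s * bytes_per_transfer / 1000


def per_core_gbs(channels: int, mt_per_s: int, cores: int) -> float:
    """Peak bandwidth divided evenly across all cores."""
    return peak_bandwidth_gbs(channels, mt_per_s) / cores


# (label, cores, channels); DDR5-4800 throughout is an assumption, not a spec.
CONFIGS = [
    ("Genoa, 12 CCDs x 8 cores", 12 * 8, 12),
    ("Bergamo, 8 CCDs x 16 cores", 8 * 16, 12),
    ("Turin dense as described, 12 CCDs x 16 cores", 12 * 16, 12),
    ("Same 192 cores with a hypothetical 16 channels", 12 * 16, 16),
]

for label, cores, channels in CONFIGS:
    print(f"{label}: {per_core_gbs(channels, 4800, cores):.2f} GB/s per core")
```

Under these assumptions the per-core share halves going from 96 to 192 cores on the same 12 channels, and a 16-channel interface only recovers part of the gap. Faster DIMMs or multiplexed ranks would shift the numbers.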
 
Joined
May 3, 2018
Messages
2,406 (1.08/day)
This is for cloud workloads so it sacrifices clock speed for throughput. If it ever comes to the desktop, then its function would be analogous to Intel's E cores as Zen 4c should be slower than Zen 4 for single threaded or low threaded loads.
But 4c supports SMT, unlike Gracemont, and from what I've read 4c would at worst be 30% slower than 4. An 8850X with 8 Zen 4 and 4 or 6 Zen 4c cores would be nice.
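Rough numbers for such a hybrid part, as a sketch. It takes the worst-case "30% slower" figure at face value and assumes perfectly parallel work with no SMT, memory or power limits, so treat the result as an upper-bound estimate. The function name is made up.

```python
def zen4_equivalents(zen4_cores: int, zen4c_cores: int, zen4c_relative: float = 0.7) -> float:
    """Aggregate throughput in units of one Zen 4 core.

    zen4c_relative = 0.7 encodes the worst-case "30% slower" claim from the post.
    """
    return zen4_cores + zen4c_cores * zen4c_relative


for n_dense in (4, 6):
    total = zen4_equivalents(8, n_dense)
    print(f"8 Zen 4 + {n_dense} Zen 4c ~ {total:.1f} Zen 4 cores of throughput")
```

So even in the worst case the dense cores would add the equivalent of roughly 3 to 4 full cores of multithreaded throughput under these assumptions.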
 
Joined
Aug 25, 2021
Messages
1,074 (1.06/day)
Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).
AMD has the least to worry about on memory: they are currently the only vendor with 12 channels for cloud workloads. Compare that with what other vendors could offer next year:
Intel - Sierra Forest could offer a 768-bit bus (12 channels x 64-bit), but for 144 E-cores
Apple - M2 Ultra offers a 1024-bit bus (8 channels x 128-bit), still below 1 TB/s
ARM - the Indian chip C-DAC AUM could offer up to a 512-bit bus (16 channels x 32-bit)
RISC-V - the Tenstorrent CPU will max out at roughly a 256-bit bus (8 channels x 32-bit)
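The bus widths in that list are just channels times bits per channel, and peak bandwidth follows from the transfer rate. A quick sketch of the arithmetic; the 6400 MT/s figure used for the M2 Ultra check is my assumption, and the function names are made up.

```python
def bus_width_bits(channels: int, bits_per_channel: int) -> int:
    """Total memory bus width in bits."""
    return channels * bits_per_channel


def peak_gbs(bus_bits: int, mt_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for a bus of `bus_bits` at `mt_per_s` MT/s."""
    return bus_bits / 8 * mt_per_s / 1000


# (channels, bits per channel) exactly as listed in the post.
LISTED = {
    "Sierra Forest": (12, 64),
    "M2 Ultra": (8, 128),
    "C-DAC AUM": (16, 32),
    "Tenstorrent": (8, 32),
}

for name, (channels, bits) in LISTED.items():
    print(f"{name}: {bus_width_bits(channels, bits)}-bit bus")

# Assumed 6400 MT/s memory on the 1024-bit bus: does it stay below 1 TB/s?
m2_ultra = peak_gbs(bus_width_bits(8, 128), 6400)
print(f"M2 Ultra peak at 6400 MT/s: {m2_ultra:.1f} GB/s")
```

A wide bus alone does not settle the comparison; what matters for the cloud parts is bandwidth per core, and the core counts differ a lot between these chips.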

Turin dense could offer 3D V-Cache too on -X SKUs, just like Genoa-X does, which brings throughput above 1.1 TB/s.
Plus, it will support CXL memory expanders, so customers could widen memory throughput as they please on 64 PCIe 5.0 lanes.
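For scale, here is what 64 PCIe 5.0 lanes can carry at most, as a sketch. PCIe 5.0 runs at 32 GT/s per lane with 128b/130b encoding; the calculation ignores packet and CXL protocol overhead, so real expander bandwidth will be lower.

```python
PCIE5_GT_PER_S = 32.0          # raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie5_gbs_per_direction(lanes: int) -> float:
    """Peak payload-free link bandwidth in GB/s, one direction, before protocol overhead."""
    return lanes * PCIE5_GT_PER_S * ENCODING_EFFICIENCY / 8


print(f"x16: {pcie5_gbs_per_direction(16):.1f} GB/s per direction")
print(f"x64: {pcie5_gbs_per_direction(64):.1f} GB/s per direction")
```

That is a bit over half of what twelve DDR5-4800 channels deliver in theory (460.8 GB/s), so CXL expanders add meaningful bandwidth as well as capacity, at higher latency.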
 
Joined
Apr 7, 2023
Messages
60 (0.14/day)
I think you forgot to link to the source. While it's behind a paywall, the first part covering the physical design is free to read. It's an impressive feat of physical design.
TLDR:
  1. reducing the number of timing critical regions to just 4 from well over 10 in Zen 4 as seen in the diagram below: this sacrifices clock speed for density
  2. a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs
  3. lower clock speed target allows denser circuits
[Attachment 299768: diagram]


The source goes over this in considerable detail.
Great job by AMD. Now it's time to do it with the Ryzen laptops. I don't understand why OEMs, with the best price/performance processor available in Zen 4 Phoenix, aren't releasing more laptops with it.
 
Joined
Nov 26, 2021
Messages
1,372 (1.49/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
But 4c supports SMT, unlike Gracemont and from what I've read 4c would at worst be 30% slower than 4. 8850x with 8 Zen4 and 4/6 Zen 4c would be nice
You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.
 
Joined
Jul 7, 2019
Messages
861 (0.48/day)
Idly, I wonder if AMD will also bring SMT4 to the enterprise sector (turning one 16c CCD into a 64t monstrosity), or maybe capitalize more on their Xilinx division for some serious FPGA circuitry that can shift roles on the fly as and when needed.
 
Joined
May 3, 2018
Messages
2,406 (1.08/day)
You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.
I was reading that the Zen 4c cores will not prove problematic for the OS like E-cores, and will be treated the same as full-fat cores. Same architecture, just lower clocks and smaller caches.
 
Joined
Nov 26, 2021
Messages
1,372 (1.49/day)
I was reading the Zen 4c cores will not prove problematic for the OS like e-cores, and will be treated the same as full fat cores. Same architecture just lower clocks/caches.
Scheduling a thread onto Zen 4c instead of Zen 4 won't impact performance as much as scheduling a thread onto an E core instead of a P core in Intel's SoCs. However, Zen 4c will be slower than Zen 4 due to the lower clock speed; therefore, the OS will still prioritize Zen 4.
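A toy illustration of that prioritization, not how any real OS scheduler is implemented. The core list and the clock speeds are invented for the example; the only idea shown is "fill the fastest cores first".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    cpu_id: int
    kind: str       # "Zen 4" or "Zen 4c"
    max_ghz: float  # hypothetical boost clock, for illustration only


def place_threads(cores: list[Core], n_threads: int) -> list[int]:
    """Return the CPU ids chosen for n_threads, fastest cores first (ties by id)."""
    ranked = sorted(cores, key=lambda c: (-c.max_ghz, c.cpu_id))
    return [c.cpu_id for c in ranked[:n_threads]]


# Made-up hybrid layout: the dense cores happen to have the lower ids.
cores = [
    Core(0, "Zen 4c", 3.5),
    Core(1, "Zen 4c", 3.5),
    Core(2, "Zen 4", 5.0),
    Core(3, "Zen 4", 5.0),
]

print(place_threads(cores, 3))
```

With three runnable threads the two Zen 4 cores fill up first and only the third thread lands on a Zen 4c core. Since both core types run the same instructions, a "wrong" placement costs clock speed only, which is the difference from the P core / E core case.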
 