AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs

Tek-Check · Jun 7, 2023

AnotherReader said:
That would make ARM's Neoverse V1 the father of Zen 4c

Agreed. It's a spiritual father of Bergamo. AMD found out a few years ago what Amazon was planning to develop with Graviton CPUs on Neoverse platforms, including the highest-performing Zeus. They realised that the only way to stay competitive in the hyperscaler segment, while staying on x86, was to develop a new efficiency core with rebalanced performance, power consumption and size, and a lower cost.

Also, they wanted to create a versatile efficient core, but did not want a castrated, Atom-style core like the one Intel conceived for Alder Lake and Sierra Forest. Intel was so stubborn in this pursuit that they had to shut down AVX-512 instructions on client products. It was the price to pay for the big-little choice and core inflation approach... Guess why Sapphire Rapids does not have e-cores? AVX-512 will have to come back in one form or another in the client segment, but more importantly in the cloud CPU with 144 e-cores. Will it work? We shall see next year.

The "riotous child", the Dionysus core, is going to rock the boat. Dionysus is young, energetic and adventurous. The first iteration is an 8 c-core CCX in a 16 c-core CCD, but next year we will see the even more adventurous Zen 5 Turin dense: a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa, but with 192 c-cores. These decisions save a lot on the design, manufacturing and packaging side by reusing existing solutions with a few tweaks.
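For anyone checking the core-count arithmetic above, here is a rough sketch. Genoa (12 CCDs of 8 cores) and Bergamo (8 CCDs of 16 cores) are shipping layouts; the Turin dense row is only the projection from this post, not a confirmed spec, and the function name is made up for illustration.

```python
# Core counts derived from CCD layout.
# Genoa and Bergamo rows are shipping parts; "Turin dense" is the post's
# projection (12 CCDs x 16 cores), not a confirmed specification.

def total_cores(ccds: int, cores_per_ccd: int) -> int:
    """Total cores in a package built from identical CCDs."""
    return ccds * cores_per_ccd

layouts = {
    "Genoa (Zen 4)": (12, 8),
    "Bergamo (Zen 4c)": (8, 16),
    "Turin dense (projected)": (12, 16),
}

for name, (ccds, cores_per_ccd) in layouts.items():
    cores = total_cores(ccds, cores_per_ccd)
    threads = cores * 2  # two threads per core with SMT enabled
    print(f"{name}: {ccds} CCDs x {cores_per_ccd} cores = {cores} cores, {threads} threads")
```

The point is that reusing the 12-CCD Genoa package with 16-core CCDs is what gets you to 192 cores without a new package design.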

Bergamo will set a new pace in cloud servers this year, but Turin dense will compete with Sierra Forest SKUs and next-gen Graviton, AmpereOne and Grace SKUs. It does look like AMD has got the best of both worlds here. They keep an x86 core with the full instruction set, while deploying bespoke performance/watt efficiency against Atom and ARM cores.

Aquinus · Jun 8, 2023

You know, datacenter use cases are what's driving all of this. If you take cloud computing for example, you have a lot of different things going on. The goal is to maximize throughput, and increasing the number of cores at the cost of some features that consume a lot of die space makes sense. If I consider the product that I oversee the engineering efforts for and how this relates to it, it's basically being able to squeeze more service VMs (think AWS ECS Fargate) into a smaller area for tasks that most definitely don't need things like AVX-512. Most of the stuff the application does (being a SaaS product) is integer-heavy, so something like this would make a whole lot of sense if they start cutting things like vector extensions. The nice thing is that you could run it on either and just let the JVM intrinsics do their magic, but at the end of the day the business cares about two things: customer retention and cost of doing business.

With that said, I see a single-CCD option being a really nice entry/budget option for servers. There is a whole lot to like here if you're working on software that'll be running on a server, but I honestly don't see AMD doing the hybrid thing. I could be wrong, but this looks like another move to cater to the server market, not to your ordinary consumer.

Wirko · Jun 8, 2023

Tek-Check said:
next year we will see the even more adventurous Zen 5 Turin dense: a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa, but with 192 c-cores.

Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).
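To put rough numbers on that concern, here is a back-of-the-envelope sketch of peak theoretical bandwidth per core. It assumes DDR5-4800 with a 64-bit (8-byte) data path per channel, which is Genoa's rated configuration; whatever memory speed Turin ends up supporting is unknown here, and sustained bandwidth is always below the theoretical peak. The function names are made up for illustration.

```python
# Peak theoretical DRAM bandwidth per core.
# Assumption: DDR5-4800, 8 bytes per transfer per channel (Genoa's rated setup).

def peak_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s: channels x transfers per second x bytes per transfer."""
    return channels * mts * 1e6 * bytes_per_transfer / 1e9

def per_core_gbs(channels: int, mts: int, cores: int) -> float:
    """Peak bandwidth divided evenly across all cores."""
    return peak_bandwidth_gbs(channels, mts) / cores

for channels in (12, 16):
    for cores in (96, 128, 192):
        share = per_core_gbs(channels, 4800, cores)
        print(f"{channels} channels, {cores} cores: {share:.2f} GB/s per core")
```

Under these assumptions, 12 channels give 460.8 GB/s in total, which is 4.8 GB/s per core at 96 cores but only 2.4 GB/s per core at 192. Going to 16 channels at the same speed would bring 192 cores back up to 3.2 GB/s each.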

Minus Infinity · Jun 8, 2023

AnotherReader said:
This is for cloud workloads so it sacrifices clock speed for throughput. If it ever comes to the desktop, then its function would be analogous to Intel's E cores as Zen 4c should be slower than Zen 4 for single threaded or low threaded loads.

But Zen 4c supports SMT, unlike Gracemont, and from what I've read Zen 4c would at worst be 30% slower than Zen 4. An 8850X with 8 Zen 4 cores and 4 or 6 Zen 4c cores would be nice.

Tek-Check · Jun 8, 2023

Wirko said:
Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).

AMD is the least worried about memory, considering they are currently the only vendor with 12 channels for cloud workloads, and considering what other vendors are expected to offer next year:
Intel - Sierra Forest could offer a 768-bit bus (12 channels x 64-bit), but for 144 e-cores
Apple - M2 Ultra offers a 1024-bit bus (8 channels x 128-bit), still below 1TB/s
ARM - Indian chip C-DAC AUM could offer up to a 512-bit bus (16 channels x 32-bit)
RISC-V - Tenstorrent CPU will max out at roughly a 256-bit bus (8 channels x 32-bit)
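The bus widths in that list are just channels times channel width, and peak throughput follows from the width and the data rate. A quick sketch of the arithmetic: the channel figures are the ones quoted in the list, the LPDDR5-6400 data rate for M2 Ultra is my assumption, and data rates for the other chips are left out because they are not given here.

```python
# Bus width and peak bandwidth from channel configuration.
# Channel counts and widths are taken from the list above.
# Assumption: M2 Ultra runs LPDDR5 at 6400 MT/s.

def bus_width_bits(channels: int, bits_per_channel: int) -> int:
    """Aggregate memory bus width in bits."""
    return channels * bits_per_channel

def peak_gbs(width_bits: int, mts: int) -> float:
    """Peak bandwidth in GB/s for a bus of width_bits at mts megatransfers/s."""
    return width_bits / 8 * mts * 1e6 / 1e9

configs = {
    "Sierra Forest": (12, 64),
    "M2 Ultra": (8, 128),
    "C-DAC AUM": (16, 32),
    "Tenstorrent": (8, 32),
}

for name, (channels, bits) in configs.items():
    print(f"{name}: {channels} x {bits}-bit = {bus_width_bits(channels, bits)}-bit bus")

m2_width = bus_width_bits(*configs["M2 Ultra"])
print(f"M2 Ultra peak at 6400 MT/s: {peak_gbs(m2_width, 6400):.1f} GB/s")
```

With the assumed 6400 MT/s, the 1024-bit M2 Ultra bus works out to 819.2 GB/s, which is consistent with "still below 1TB/s".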

Turin dense could offer V-Cache too on -X SKUs, just like Genoa-X does, which brings throughput above 1.1TB/s.
Plus, it will support CXL memory expanders, so customers could widen memory throughput as they please over 64 PCIe 5.0 lanes.

david salsero · Jun 8, 2023

AnotherReader said:
I think you forgot to link to the source. While it's behind a paywall, the first part covering the physical design is free to read. It's an impressive feat of physical design.
TLDR:

reducing the number of timing critical regions to just 4 from well over 10 in Zen 4 as seen in the diagram below: this sacrifices clock speed for density

a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs

lower clock speed target allows denser circuits

View attachment 299768

The source goes over this in considerable detail.

Great job by AMD. Now it's time to do the same with Ryzen laptops. I don't understand the OEMs: Zen 4 Phoenix is the best processor for the price, yet it isn't showing up in more laptops.

AnotherReader · Jun 8, 2023

Minus Infinity said:
But Zen 4c supports SMT, unlike Gracemont, and from what I've read Zen 4c would at worst be 30% slower than Zen 4. An 8850X with 8 Zen 4 cores and 4 or 6 Zen 4c cores would be nice.

You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.

TechLurker · Jun 9, 2023

Idly, I wonder if AMD will also bring SMT4 to the enterprise sector (turning one 16c CCD into a 64t monstrosity), or maybe capitalize more on their Xilinx division for some serious FPGA circuitry that can shift roles on the fly as-needed, when-needed.

Minus Infinity · Jun 9, 2023

AnotherReader said:
You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.

I was reading that the Zen 4c cores will not prove problematic for the OS like e-cores, and will be treated the same as full-fat cores. Same architecture, just lower clocks and smaller caches.

AnotherReader · Jun 9, 2023

Minus Infinity said:
I was reading that the Zen 4c cores will not prove problematic for the OS like e-cores, and will be treated the same as full-fat cores. Same architecture, just lower clocks and smaller caches.

Scheduling a thread onto Zen 4c instead of Zen 4 won't impact performance as much as scheduling a thread onto an E core instead of a P core in Intel's SoCs. However, Zen 4c will be slower than Zen 4 due to the lower clock speed; therefore, the OS will still prioritize Zen 4.
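As a toy illustration of what "prioritize Zen 4" means, here is a minimal sketch that places a thread on the idle core with the highest maximum clock. This is not how Windows or Linux actually schedule; the `Core` class, the `pick_core` helper and the clock values are all hypothetical.

```python
# Toy core-preference policy: prefer the idle core with the highest max clock.
# All names and clock values below are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Core:
    core_id: int
    kind: str          # "Zen 4" or "Zen 4c"
    max_mhz: int       # hypothetical maximum clock
    busy: bool = False

def pick_core(cores: List[Core]) -> Optional[Core]:
    """Return the idle core with the highest max clock, or None if all are busy."""
    idle = [c for c in cores if not c.busy]
    return max(idle, key=lambda c: c.max_mhz, default=None)

cores = [
    Core(0, "Zen 4", 5000),
    Core(1, "Zen 4", 5000),
    Core(2, "Zen 4c", 3500),
    Core(3, "Zen 4c", 3500),
]

# Place three threads one after another and show where each one lands.
for thread in range(3):
    chosen = pick_core(cores)
    chosen.busy = True
    print(f"thread {thread} -> core {chosen.core_id} ({chosen.kind})")
```

The first two threads land on the Zen 4 cores and only the third spills over to Zen 4c, which is the E-core-like role described above, even though both core types run the same instructions.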
