
AMD EPYC "Bergamo" Uses 16-core Zen 4c CCDs, Barely 10% Larger than Regular Zen 4 CCDs

That would make ARM's Neoverse V1 the father of Zen 4c o_O
Agreed. It's the spiritual father of Bergamo. A few years ago AMD found out what Amazon was planning to develop with its Graviton CPUs on Neoverse platforms, including the highest-performing Zeus core. They realised that the only way to stay competitive in the hyperscaler segment while staying on x86 was to develop a new efficiency core with rebalanced performance, power consumption and size, at lower cost.

Also, they wanted to create a versatile, efficient core, but did not want a castrated, Atom-style core like the one Intel conceived for Alder Lake and Sierra Forest. Intel was so stubborn in this pursuit that it had to shut down AVX-512 instructions on client products. That was the price to pay for the big-little choice and the core inflation approach... Guess why Sapphire Rapids does not have e-cores? AVX-512 will have to come back in one form or another in the client segment, but more importantly in the cloud CPU with 144 e-cores. Will it work? We shall see next year.

The "riotous child", the Dionysus core, is going to rock the boat. Dionysus is young, energetic and adventurous. The first iteration is an 8 c-core CCX in a 16 c-core CCD, but next year we will see the even more adventurous Zen 5 Turin Dense, a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa, but with 192 c-cores. These decisions save a lot on the design, manufacturing and packaging side by reusing existing solutions with a few tweaks.

Bergamo will set a new pace in cloud servers this year, but Turin Dense will compete with Sierra Forest SKUs and next-gen Graviton, AmpereOne and Grace SKUs. It does look like AMD has got the best of both worlds here. They keep an x86 core with the full instruction set, while deploying bespoke performance-per-watt efficiency against Atom and ARM cores.
 
You know, datacenter use cases are what's driving all of this. If you take cloud computing for example, you have a lot of different things going on. The goal is to maximize throughput, and increasing the number of cores at the cost of some features that consume a lot of die space makes sense. If I consider the product that I oversee the engineering efforts for and how this relates to it, it's basically being able to squeeze more service VMs (think AWS ECS Fargate) into a smaller area for tasks that most definitely don't need things like AVX-512. Most of the stuff the application does (being a SaaS product) is integer-heavy, so something like this would make a whole lot of sense if they start cutting things like vector extensions. The nice thing is that you could run it on either and just let the JVM intrinsics do their magic, but at the end of the day the business cares about two things: customer retention and cost of doing business.

With that said, I see a single-CCD option being a really nice entry/budget option for servers. There is a whole lot to like here if you're working on software that'll be running on a server, but I honestly don't see AMD doing the hybrid thing. I could be wrong, but this looks like another move to cater to the server market, not to your ordinary consumer.
 
next year we will see the even more adventurous Zen 5 Turin Dense, a unified 16 c-core CCX/CCD, packaged in a CPU with 12 CCDs, just like Genoa, but with 192 c-cores.
Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).
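A rough back-of-the-envelope sketch of that concern. It assumes DDR5-4800 on 64-bit channels for every configuration (the speed is held constant purely for comparison, it is not a confirmed Turin spec) and 8 cores per CCD for a 96-core part versus 16 per CCD for the 192-core one. The function names are made up for illustration, and the numbers are theoretical peaks, not measured throughput:

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, channel_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = channel_bits / 8
    return channels * bytes_per_transfer * mt_per_s / 1000


def per_core_gbs(channels: int, mt_per_s: int, cores: int) -> float:
    """Peak bandwidth divided evenly across all cores."""
    return peak_bandwidth_gbs(channels, mt_per_s) / cores


DDR5_MTS = 4800  # assumption: same memory speed for every configuration

# name -> (memory channels, total cores = CCDs x cores per CCD)
configs = {
    "96 cores, 12 channels": (12, 12 * 8),
    "192 cores, 12 channels": (12, 12 * 16),
    "192 cores, 16 channels": (16, 12 * 16),
}

for name, (channels, cores) in configs.items():
    share = per_core_gbs(channels, DDR5_MTS, cores)
    print(f"{name}: {share:.2f} GB/s per core")
```

Under these assumptions, doubling the cores on the same 12 channels halves the per-core share, and going to 16 channels only recovers part of it. Faster DIMMs or multiplexed ranks would shift the numbers.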
 
This is for cloud workloads, so it sacrifices clock speed for throughput. If it ever comes to the desktop, then its function would be analogous to Intel's E cores, as Zen 4c should be slower than Zen 4 for single-threaded or lightly threaded loads.
But 4c supports SMT, unlike Gracemont and from what I've read 4c would at worst be 30% slower than 4. 8850x with 8 Zen4 and 4/6 Zen 4c would be nice
 
Memory bandwidth would hold the 192 cores back too much, unless AMD expands the 12-channel interface to 16 channels and/or introduces multiplexed ranks (MCR in Intel's speak).
AMD has the least to worry about on memory, considering they are currently the only vendor with 12 channels for cloud workloads, and considering what other vendors are expected to offer next year:
Intel - Sierra Forest could offer 768-bit bus (12 channels x64-bit), but on 144 e-cores
Apple - M2 Ultra offers 1024-bit bus (8 channels x128-bit), still below 1TB/s
ARM - Indian chip C-DAC AUM could offer up to 512-bit bus (16 channels x32-bit)
RISC-V - Tenstorrent CPU will max out at roughly 256-bit bus (8 channels x32-bit)
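The bus widths in that list are just channels times channel width, and the "still below 1TB/s" remark can be sanity-checked the same way. A minimal sketch, where the channel counts come from the list above and LPDDR5-6400 for the M2 Ultra is my assumption, not something stated in this thread:

```python
def bus_width_bits(channels: int, channel_bits: int) -> int:
    """Total memory bus width in bits."""
    return channels * channel_bits


def peak_gbs(bus_bits: int, mt_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given bus width and transfer rate."""
    return bus_bits / 8 * mt_per_s / 1000


# name -> (channels, bits per channel), as listed in the post
buses = {
    "Sierra Forest": (12, 64),
    "M2 Ultra": (8, 128),
    "C-DAC AUM": (16, 32),
    "Tenstorrent": (8, 32),
}

widths = {name: bus_width_bits(*cfg) for name, cfg in buses.items()}
for name, bits in widths.items():
    print(f"{name}: {bits}-bit")

# Assumption: LPDDR5 at 6400 MT/s on the M2 Ultra
m2_ultra_gbs = peak_gbs(widths["M2 Ultra"], 6400)
print(f"M2 Ultra peak: {m2_ultra_gbs:.1f} GB/s")
```

With that assumed transfer rate the 1024-bit bus lands at roughly 0.8 TB/s, which matches the "below 1TB/s" claim.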

Turin Dense could also offer V-Cache on -X SKUs, just like Genoa-X does, which brings throughput above 1.1TB/s
Plus, it will support CXL memory expanders, so customers could widen memory throughput as they please over 64 PCIe 5.0 lanes
 
I think you forgot to link to the source. While it's behind a paywall, the first part covering the physical design is free to read. It's an impressive feat of physical design.
TLDR:
  1. reducing the number of timing critical regions to just 4 from well over 10 in Zen 4 as seen in the diagram below: this sacrifices clock speed for density
  2. a new SRAM bitcell developed by TSMC for memories outside L2. As a 6T design, it saves area compared to the usual 8T designs
  3. lower clock speed target allows denser circuits
[Attachment 299768: diagram of the timing-critical regions, Zen 4 vs. Zen 4c]


The source goes over this in considerable detail.
Great job by AMD. Now it's time to do it with the Ryzen laptops. I don't understand why the OEMs, having the best price/performance processor available in the Zen 4 Phoenix, don't put it in more laptops.
 
But 4c supports SMT, unlike Gracemont and from what I've read 4c would at worst be 30% slower than 4. 8850x with 8 Zen4 and 4/6 Zen 4c would be nice
You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.
 
Idly, I wonder if AMD will also bring SMT4 to the enterprise sector (turning one 16c CCD into a 64t monstrosity), or maybe capitalize more on their Xilinx division for some serious FPGA circuitry that can shift roles on the fly, as and when needed.
 
You're right about the low performance difference, but I meant that it would fulfill the same role as an E core for an OS. The regular Zen 4 cores would be preferred.
I was reading that the Zen 4c cores will not prove problematic for the OS like e-cores, and will be treated the same as full-fat cores. Same architecture, just lower clocks/caches.
 
I was reading that the Zen 4c cores will not prove problematic for the OS like e-cores, and will be treated the same as full-fat cores. Same architecture, just lower clocks/caches.
Scheduling a thread onto Zen 4c instead of Zen 4 won't impact performance as much as scheduling a thread onto an E core instead of a P core in Intel's SoCs. However, Zen 4c will be slower than Zen 4 due to the lower clock speed; therefore, the OS will still prioritize Zen 4.
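To illustrate what "prioritize" would mean here, a toy model only: it is not how Windows or Linux actually schedule, and the clock figures and the `Core` / `pick_core` names are hypothetical. Both core types run the same instructions, so the only thing the ranking looks at is the maximum clock:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    core_id: int
    kind: str           # "Zen 4" or "Zen 4c"
    max_clock_mhz: int  # hypothetical figure, for illustration only


def pick_core(idle: list[Core]) -> Core:
    """Return the idle core with the highest max clock; ties go to the lowest id."""
    if not idle:
        raise ValueError("no idle cores")
    return max(idle, key=lambda c: (c.max_clock_mhz, -c.core_id))


# Hypothetical hybrid part: two full-size cores, two dense cores
cores = [
    Core(0, "Zen 4", 5000),
    Core(1, "Zen 4", 5000),
    Core(2, "Zen 4c", 3500),
    Core(3, "Zen 4c", 3500),
]

idle = list(cores)
placements = []
for _ in range(3):  # place three runnable threads, one per core
    chosen = pick_core(idle)
    idle.remove(chosen)
    placements.append((chosen.core_id, chosen.kind))

print(placements)
```

The first two threads land on the Zen 4 cores and the third spills over to a Zen 4c core, which is the "same role as an E core for the OS" behaviour described above, without any instruction-set mismatch to work around.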
 