Tuesday, August 30th 2022

AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

As we await technical documents from AMD detailing its new "Zen 4" microarchitecture, particularly the all-important CPU core Front-End and Branch Prediction units that have contributed two-thirds of the 13% IPC gain over the previous-generation "Zen 3" core, the tech enthusiast community is already decoding images from the Ryzen 7000 series launch presentation. "Skyjuice" presented the first annotation of the "Zen 4" core, revealing its large branch-prediction unit, enlarged micro-op cache, TLB, load/store unit, and dual-pumped 256-bit FPU that enables AVX-512 support. A quarter of the core's die-area is also taken up by the 1 MB dedicated L2 cache.

Chiakokhua (aka Retired Engineer) posted a table detailing the various caches and their latencies, comparing them with those of the "Zen 3" core. As AMD's Mark Papermaster revealed at the Ryzen 7000 launch event, the company has enlarged the micro-op cache of the core from 4 K entries to 6.75 K entries. The L1I and L1D caches remain 32 KB each, while the L2 cache has doubled in size. The enlargement of the L2 cache has slightly increased its latency, from 12 cycles to 14. Latency of the shared L3 cache is also up, from 46 cycles to 50 cycles. The reorder buffer (ROB) in the dispatch stage has been enlarged from 256 entries to 320 entries. The L1 branch target buffer (BTB) has grown from 1 K entries to 1.5 K entries.
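
To put the changes listed above side by side, here is a minimal Python sketch that computes the relative change of each structure from "Zen 3" to "Zen 4". The values are the ones quoted in this article; the 512 KB "Zen 3" L2 size is inferred from the statement that the 1 MB L2 has doubled, and the dictionary and function names are illustrative only.

```python
# (Zen 3, Zen 4) values as quoted in the article.
# The 512 KB Zen 3 L2 size is inferred from "the L2 cache has doubled" to 1 MB.
changes = {
    "micro-op cache (K entries)": (4.0, 6.75),
    "L2 cache size (KB)": (512, 1024),
    "L2 latency (cycles)": (12, 14),
    "L3 latency (cycles)": (46, 50),
    "reorder buffer (entries)": (256, 320),
    "L1 BTB (K)": (1.0, 1.5),
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for name, (zen3, zen4) in changes.items():
    print(f"{name}: {zen3} -> {zen4} ({pct_change(zen3, zen4):+.1f}%)")
```

Read this way, the latency penalties (roughly 17% for L2 and 9% for L3, in cycles) are small next to the capacity gains they accompany.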
The "Zen 4" CCD is slightly smaller than the "Zen 3" CCD despite its higher transistor count, thanks to the switch to 5 nm (TSMC N5 process). The CCD measures 70 mm², compared to the 83 mm² "Zen 3" CCD. The transistor count of the "Zen 4" CCD is 6.57 billion, a whopping 58 percent increase over the 4.15 billion of the "Zen 3" CCD.
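
As a quick sanity check on the CCD figures above, the following Python sketch recomputes the transistor-count increase and derives an approximate average transistor density for each die. The areas and transistor counts are those quoted in this article; the density values are derived here rather than published by AMD, and all names are illustrative.

```python
# Figures quoted in the article: transistors in billions, area in mm².
ccd = {
    "Zen 3": {"transistors_bn": 4.15, "area_mm2": 83.0},
    "Zen 4": {"transistors_bn": 6.57, "area_mm2": 70.0},
}


def density_mtr_per_mm2(transistors_bn, area_mm2):
    """Average density in millions of transistors per mm² (derived value)."""
    return transistors_bn * 1000.0 / area_mm2


# Relative growth in transistor count from Zen 3 to Zen 4, in percent.
increase_pct = (
    ccd["Zen 4"]["transistors_bn"] / ccd["Zen 3"]["transistors_bn"] - 1.0
) * 100.0

densities = {name: density_mtr_per_mm2(**die) for name, die in ccd.items()}

print(f"Transistor-count increase: {increase_pct:.0f}%")
for name, value in densities.items():
    print(f"{name} CCD: about {value:.0f} MTr/mm²")
```

With these inputs the count increase rounds to 58 percent, and the derived density goes from about 50 MTr/mm² to about 94 MTr/mm², which is how the die shrinks while the transistor count grows.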

The cIOD (client I/O die) sees a big chunk of innovation. It's built on the 6 nm (TSMC N6) node, which is a big leap from the GlobalFoundries 12 nm node that the cIOD of Ryzen 5000 series processors was built on. It also incorporates certain power-management features from the Ryzen 6000 "Rembrandt" processors. This cIOD packs an iGPU based on the RDNA2 graphics architecture, besides the DDR5 memory controllers and a PCI-Express Gen 5 root complex. The new 6 nm cIOD measures 124.7 mm², compared to the slightly larger 124.9 mm² cIOD of the Ryzen 5000 series.

The "Raphael" multi-chip module has one CCD for the 6-core and 8-core SKUs, and two CCDs for the 12-core and 16-core SKUs. "Raphael" is built in the Socket AM5 package. AMD is rumored to be readying a thin BGA package of "Raphael" for high-performance notebook platforms, codenamed "Dragon Range." These processors will come in 45 W, 55 W, and 65 W TDP points, powering high-end gaming notebooks.
Sources: Chiakokhua (Twitter), Skyjuice (Twitter), Skyjuice (Angstronomics)

41 Comments on AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

#1
Gungar
58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC.
#2
ModEl4
Didn't Mark Papermaster also say that the Zen4c core is around half the area of the Zen4 core? (4nm vs 5nm comparison) Also that the architecture is optimised for lower frequency and higher efficiency vs Zen4.
I'm really curious what gaming performance difference Zen4c will have vs Zen3+.
Although AMD slides are showing Strix Point with Zen5 cores, my first thought was Zen5c due to the mobile segment.
#3
Wirko
btarunr: "the company has enlarged the micro-op cache of the core from 4 KB to 6.75 KB"
It's probably the number of entries, not bytes. Same goes for BTB.
It would be interesting to know the sizes of various internal data structures in bytes/bits, though.
#4
Daven
ModEl4: "Didn't Mark Papermaster also say that the Zen4c core is around half the area of the Zen4 core? (4nm vs 5nm comparison) Also that the architecture is optimised for lower frequency and higher efficiency vs Zen4.
I'm really curious what gaming performance difference Zen4c will have vs Zen3+.
Although AMD slides are showing Strix Point with Zen5 cores, my first thought was Zen5c due to the mobile segment."
Zen 4c is Epyc only. Not meant for gaming at all. It's for cloud instances. Not everything made by humans is for gaming.
#5
Oberon
Gungar: "58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC."
A significant chunk of that went into implementing support for AVX-512. In workloads that actually make use of those transistors, the performance increase will average many times that 13% number.
#6
HD64G
Oberon: "A significant chunk of that went into implementing support for AVX-512. In workloads that actually make use of those transistors, the performance increase will average many times that 13% number."
Indeed, they showed 2.5x performance vs Zen3 in such a case.
#7
LuxZg
Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC on average across 99% of workloads, and justifying it with AVX-512 that will be used in <1% of workloads, doesn't seem sensible. I'd rather we got 58% more cores at the same price points and no AVX. Imagine - 10 cores for 299$, 12 cores at 399$, 20 cores at 549$, and 24 cores at 699$. Would open space for a 199$ 6-core and a 259$ 8-core. Nah, AVX-512 isn't worth 58% of the die.
#8
Oberon
LuxZg: "Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC on average across 99% of workloads, and justifying it with AVX-512 that will be used in <1% of workloads, doesn't seem sensible. I'd rather we got 58% more cores at the same price points and no AVX. Imagine - 10 cores for 299$, 12 cores at 399$, 20 cores at 549$, and 24 cores at 699$. Would open space for a 199$ 6-core and a 259$ 8-core. Nah, AVX-512 isn't worth 58% of the die."
At least you can use those transistors when necessary, unlike with Alder Lake...

You also have to remember that part of AMD's product strategy is to reuse the same CCD across the majority of their desktop and server SKUs, so there will be some tradeoffs for value in one segment or another (and server/HPC gets priority since it brings in more money.)
#9
ncrs
LuxZg: "Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC on average across 99% of workloads, and justifying it with AVX-512 that will be used in <1% of workloads, doesn't seem sensible. I'd rather we got 58% more cores at the same price points and no AVX. Imagine - 10 cores for 299$, 12 cores at 399$, 20 cores at 549$, and 24 cores at 699$. Would open space for a 199$ 6-core and a 259$ 8-core. Nah, AVX-512 isn't worth 58% of the die."
Don't forget that the primary focus of AMD is not the desktop, but server markets. Sharing one chiplet design between many markets was always a strength of Zen.
For servers/workstations AVX-512 is a welcome addition. It will be interesting to see if laptop Zen4 will keep it as well.
That silicon most likely is also usable for non-AVX-512 tasks due to register renaming/reuse and similar modern CPU optimizations.
#10
Niarod
So integrated graphics is confirmed? I didn't hear anyone from AMD mention it during the presentation..
#11
The_Enigma
LuxZg: "Well, too bad that AVX-512 won't get much use. IMHO spending 58% area for 13% IPC on average across 99% of workloads, and justifying it with AVX-512 that will be used in <1% of workloads, doesn't seem sensible. I'd rather we got 58% more cores at the same price points and no AVX. Imagine - 10 cores for 299$, 12 cores at 399$, 20 cores at 549$, and 24 cores at 699$. Would open space for a 199$ 6-core and a 259$ 8-core. Nah, AVX-512 isn't worth 58% of the die."
AVX-512 is more than just increasing register width for a new instruction though. It brings tons of improvements to the entire AVX lineup of instructions and can accelerate AVX and AVX2 even more, with no clock speed penalty to those older instructions using the newer features.

Additionally, it greatly speeds up emulation. Not just PS2 emulation, but also ARM emulation. Someone posted on another site that the new features in AVX-512 allow them to cut the instructions needed to do certain parts of the ARM emulation down anywhere from 5-10x.

So it does have good uses even today, uses you probably use some of without realizing it, but it also sets this gen as a baseline for support going into the future. As time goes on more software will make use of them and that has to have hardware support at some point getting to the masses to drive the software adoption.
#12
LuxZg
Yeah, you're both right, I disregarded the server market... Still seems sad for us consumers :-( Now I wonder if Zen4c is actually that part without AVX-512.
#13
ncrs
LuxZg: "Yeah, you're both right, I disregarded the server market... Still seems sad for us consumers :-( Now I wonder if Zen4c is actually that part without AVX-512."
Rumor has it that it's just limited cache sizes. I don't think it would make sense to cut AVX-512 from a part specifically tailored to cloud vendors.
#14
Punkenjoy
Gungar: "58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC."
AVX indeed increases the number of transistors, but there are many other needs. There is also the law of diminishing returns that every CPU vendor has to fight. You have to throw more and more transistors at a problem to increase performance. But anyway, there was never a 1-to-1 match.

Also, AMD probably added a bunch of stages to increase the clock frequency.

It also maybe looks worse on paper because this is just the CCD. On a monolithic CPU, the uncore parts grow way slower, so in the end that hides the real transistor growth of the cores and caches.

Like people said, at least AMD can use AVX-512. It's quite stupid that Intel ships it but it has to be disabled because of the damn E-cores. If at least the E-cores could run AVX-512 code, but slower, that would make more sense. I wonder how those CPUs would perform if they were able to use the area used by AVX-512 for something else.
#15
Oberon
I would bet my life that Zen 4c is extension-compatible with Zen 4 (to avoid ADL-like issues.)
#16
TheLostSwede
News Editor
Niarod: "So integrated graphics is confirmed? I didn't hear anyone from AMD mention it during the presentation.."
Yes it is, it's on the AMD spec pages.
#17
AnotherReader
Gungar: "58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC."
Performance isn't solely about IPC. They have increased clocks by 13% too. That is a 27% performance increase. Pollack's rule states that performance for a single core scales with the square root of its relative area. The square root of 1.58 is 1.26, which is very close to the performance increase of the 7950X over the 5950X. To summarize, the extra transistors probably went to:
  1. AVX-512
  2. increased clock speeds
  3. larger front-end
  4. larger L2
I wouldn't rule out future IPC increases; Zen 4 is an incremental update from Zen 3, reusing the same microarchitecture. If they run out of steam for Zen 5, then we can say that they are running out of tricks.
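
The arithmetic in this comment can be reproduced with a short Python sketch. The 13% IPC gain, 13% clock gain, and 58% transistor increase are the figures used in the thread; treating the transistor ratio as the "area" term of Pollack's rule follows the comment's own reasoning and is an approximation, not an AMD figure. Variable names are illustrative.

```python
import math

ipc_gain = 1.13          # 13% IPC uplift quoted in the article
clock_gain = 1.13        # 13% clock uplift cited in the comment
transistor_ratio = 1.58  # 58% more transistors per CCD

# Single-thread performance scales with IPC multiplied by clock frequency.
combined_gain = ipc_gain * clock_gain

# Pollack's rule: performance grows roughly with the square root of core
# complexity. Assumption: the transistor ratio stands in for relative area.
pollack_gain = math.sqrt(transistor_ratio)

print(f"IPC x clock gain:    {combined_gain:.3f}x")
print(f"Pollack's rule gain: {pollack_gain:.3f}x")
```

The two estimates land within a couple of percentage points of each other (about 1.28x versus about 1.26x), which is the point the comment makes.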
#18
DemonicRyzen666
Slow AVX-512, dual load 256 AVX; again, AMD didn't learn their lesson from SSE4.1 and AVX on Bulldozer. There are a ton of problems with that implementation.
#19
Eternalightwithin
Would it have been better to implement a full size AVX512 and combine 2 AVX256 instructions? Or separate modules for 512 and 256. That would increase die size more though
#20
defaultluser
Eternalightwithin: "Would it have been better to implement a full size AVX512 and combine 2 AVX256 instructions? Or separate modules for 512 and 256. That would increase die size more though"
This will seed software devs to continue supporting Intel's now dead vector arch; it will have to wait for a future die rev before AMD doubles execution width again (remember Zen 2?)
#21
ModEl4
Daven: "Zen 4c is Epyc only."
So we watched the same event?
Daven: "Not meant for gaming at all. Its for cloud instances. Not everything made by humans is for gaming."
wtf is this?
Did I say it was designed for gaming or that everything made by humans is for gaming?
The Goldmont core (Gemini Lake) or Tremont core (Jasper Lake), for example, was also not designed for gaming, and many people who were interested in low-cost platforms were curious about its lower core performance vs regular Skylake.
#22
thegnome
Hopefully for Zen 5 L3 will get a boost again without having to rely on V-Cache (they are still separate for Zen 5), and having 3 chiplets. 1 chiplet models for budget like the current 6 and 8 core, but also a second Zen 5c chiplet for 4/8/16 e-cores (like on Intel) at the midrange, with the full dual CCD models also having the Zen 5c chiplet. Would make AMD much more competitive in terms of core count again without having to raise the regular CCD's core count.
#23
defaultluser
thegnome: "Hopefully for Zen 5 L3 will get a boost again without having to rely on V-Cache (they are still separate for Zen 5), and having 3 chiplets. 1 chiplet models for budget like the current 6 and 8 core, but also a second Zen 5c chiplet for 4/8/16 e-cores (like on Intel) at the midrange, with the full dual CCD models also having the Zen 5c chiplet. Would make AMD much more competitive in terms of core count again without having to raise the regular CCD's core count."
they need to bump core count if they want to compete with whatever succeeds raptor lake - maybe they can handle 12 cores per chiplet?
#24
thegnome
defaultluser: "they need to bump core count if they want to compete with whatever succeeds raptor lake - maybe they can handle 12 cores per chiplet?"
True, but I wouldn't mind seeing it either way. I suppose they could always just chonk a third regular ccd on there.
#25
Dirt Chip
Gungar: "58% transistors increase for 13% IPC increase. They are having a hard time increasing IPC."
IGP and AVX take their toll.
It will get better in Zen4+ and Zen5 for sure.