Tuesday, August 30th 2022

AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

As we await technical documents from AMD detailing its new "Zen 4" microarchitecture, particularly the all-important CPU core Front-End and Branch Prediction units that have contributed two-thirds of the 13% IPC gain over the previous-generation "Zen 3" core, the tech enthusiast community is already decoding images from the Ryzen 7000 series launch presentation. "Skyjuice" presented the first annotation of the "Zen 4" core, revealing its large branch-prediction unit, enlarged micro-op cache, TLB, load/store unit, and dual-pumped 256-bit FPU that enables AVX-512 support. A quarter of the core's die-area is also taken up by the 1 MB dedicated L2 cache.

Chiakokhua (aka Retired Engineer) posted a table detailing the various caches and their latencies, comparing them with those of the "Zen 3" core. As AMD's Mark Papermaster revealed at the Ryzen 7000 launch event, the company has enlarged the micro-op cache of the core from 4 K entries to 6.75 K entries. The L1I and L1D caches remain 32 KB each, while the L2 cache has doubled in size. The enlargement of the L2 cache has slightly increased its latency, from 12 cycles to 14. Latency of the shared L3 cache is also up, from 46 cycles to 50 cycles. The reorder buffer (ROB) in the dispatch stage has been enlarged from 256 entries to 320 entries. The L1 branch target buffer (BTB) has increased in size from 1 K entries to 1.5 K entries.
The "Zen 4" CCD is slightly smaller than the "Zen 3" CCD despite its higher transistor count, thanks to the switch to 5 nm (TSMC N5 process). The CCD measures 70 mm², compared to the 83 mm² "Zen 3" CCD. The transistor count of the "Zen 4" CCD is 6.57 billion, a whopping 58 percent increase over the 4.15 billion of the "Zen 3" CCD.
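
As a rough cross-check of the figures above, the percentage changes and the implied transistor density can be worked out in a few lines. This is only a sketch: the dictionary and function names are made up for illustration, and the density figure is simply the quoted transistor count divided by the quoted die area, not a number AMD has published.

```python
# CCD figures as quoted in the article; names are illustrative only.
ZEN3_CCD = {"transistors_bn": 4.15, "area_mm2": 83.0}
ZEN4_CCD = {"transistors_bn": 6.57, "area_mm2": 70.0}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def density_mtr_per_mm2(ccd):
    """Average density in millions of transistors per mm^2 (count / area)."""
    return ccd["transistors_bn"] * 1000.0 / ccd["area_mm2"]


transistor_gain = percent_change(ZEN3_CCD["transistors_bn"], ZEN4_CCD["transistors_bn"])
area_change = percent_change(ZEN3_CCD["area_mm2"], ZEN4_CCD["area_mm2"])
zen3_density = density_mtr_per_mm2(ZEN3_CCD)
zen4_density = density_mtr_per_mm2(ZEN4_CCD)

print(f"Transistor count: {transistor_gain:+.1f}%")
print(f"Die area:         {area_change:+.1f}%")
print(f"Density:          {zen3_density:.1f} -> {zen4_density:.1f} MTr/mm^2")
```

With the quoted numbers, the transistor count rises by about 58 percent while the area shrinks by roughly 16 percent, so the average density comes out at a little under twice that of the "Zen 3" CCD.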

The cIOD (client I/O die) sees a big chunk of innovation. It's built on the 6 nm (TSMC N6) node, which is a big leap from the GlobalFoundries 12 nm node that the cIOD of Ryzen 5000 series processors was made on. It also incorporates certain power-management features from the Ryzen 6000 "Rembrandt" processors. This cIOD packs an iGPU based on the RDNA2 graphics architecture, besides the DDR5 memory controllers and a PCI-Express Gen 5 root complex. The new 6 nm cIOD measures 124.7 mm², compared to the slightly larger 124.9 mm² cIOD of the Ryzen 5000 series.

The "Raphael" multi-chip module has one CCD for the 6-core and 8-core SKUs, and two CCDs for the 12-core and 16-core SKUs. "Raphael" is built in the Socket AM5 package. AMD is rumored to be readying a thin BGA package of "Raphael" for high-performance notebook platforms, which it has codenamed "Dragon Range." These processors will come in 45 W, 55 W, and 65 W TDP points, powering high-end gaming notebooks.
Sources: Chiakokhua (Twitter), Skyjuice (Twitter), Skyjuice (Angstronomics)

41 Comments on AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

#26
Slesreth
Niarod: So integrated graphics is confirmed? I didn't hear anyone from AMD mention it during the presentation.
TheLostSwede: Yes it is, it's on the AMD spec pages.
The same specs are listed on all four processors.

Graphics Capabilities
Graphics Model: AMD Radeon™ Graphics
Graphics Core Count: 2
Graphics Frequency: 2200 MHz
GPU Base: 400 MHz
Posted on Reply
#27
defaultluser
Slesreth: The same specs are listed on all four processors.

Graphics Capabilities
Graphics Model: AMD Radeon™ Graphics
Graphics Core Count: 2
Graphics Frequency: 2200 MHz
GPU Base: 400 MHz
Does anyone have a comparison versus the 5700G yet? Will that ALSO HAVE TO WAIT UNTIL THE END OF THE MONTH?
Posted on Reply
#28
Count von Schwalbe
I read somewhere that APUs (monolithic?) will still be a separate part of the product stack. Also, the IGP of the main series is essentially for display output, not 3D performance.

Based on the info above, it should have around 560 GFLOPS of compute.

The Vega iGPU in the 5700G (8 CUs) has 2.048 TFLOPS. Expect around 1/4 the performance.

Obviously, the move to DDR5 could offset the reduction slightly, but it is best to wait for the APUs.
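
For anyone wanting to check the estimate, here is a rough sketch of the usual peak-FP32 arithmetic. It assumes 64 shader lanes per compute unit and counts a fused multiply-add as 2 FLOPs per lane per clock; the 8 CUs at 2000 MHz used for the 5700G are an assumption that is not in the spec list quoted above, and the function name is made up for illustration. Peak FLOPS is a theoretical ceiling, so real 3D performance will also depend on architecture and memory bandwidth.

```python
def peak_fp32_gflops(compute_units, clock_mhz, lanes_per_cu=64, flops_per_lane=2):
    """Theoretical peak FP32 throughput in GFLOPS.

    lanes_per_cu and flops_per_lane are assumptions: 64 shader lanes per
    compute unit, and one fused multiply-add (2 FLOPs) per lane per clock.
    """
    return compute_units * lanes_per_cu * flops_per_lane * clock_mhz / 1000.0


# Ryzen 7000 iGPU: 2 graphics cores at 2200 MHz (from the quoted spec list).
raphael_igpu = peak_fp32_gflops(compute_units=2, clock_mhz=2200)

# 5700G iGPU: 8 CUs at 2000 MHz (assumed here, not taken from the thread).
apu_5700g = peak_fp32_gflops(compute_units=8, clock_mhz=2000)

ratio = raphael_igpu / apu_5700g

print(f"Ryzen 7000 iGPU: {raphael_igpu:.1f} GFLOPS")
print(f"5700G iGPU:      {apu_5700g:.1f} GFLOPS")
print(f"Ratio:           {ratio:.3f}")
```

Under those assumptions the two figures land at roughly 563 GFLOPS and 2048 GFLOPS, which matches the "around 560 GFLOPS" and "around 1/4" estimates.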
Posted on Reply
#29
Minus Infinity
Daven: Zen 4c is Epyc only. Not meant for gaming at all. It's for cloud instances. Not everything made by humans is for gaming.
Possibly, but Zen 5c is reportedly coming to the new hybrid Zen 5 CPUs as the little cores. I see no reason why AMD couldn't offer Zen 4c on desktop. It's basically a stripped-down Zen 4 core with less cache etc., and insiders say its performance is around 10-30% lower.
Posted on Reply
#30
Sabotaged_Enigma
The official website says it's a 71 mm² Zen 4 chiplet and a 122 mm² I/O die.
Posted on Reply
#31
HD64G
If Zen5 has both Zen5+Zen4c cores in the same CPU, it could be a great combo for workstations (both desktops and notebooks) but not of any use to the average PC user. So, let's focus on Zen4 atm.
Posted on Reply
#32
LuxZg
Dirt Chip: IGP and AVX take their toll.
It will get better in Zen 4+ and Zen 5 for sure.
The IGP is not part of the compute die; it's in the I/O die.
Posted on Reply
#33
Wirko
HD64G: So, let's focus on Zen4 atm.
That will be harder than it seems. In this thread I can see that many TPU members have a sharp focus on the distant future.
Posted on Reply
#34
TheoneandonlyMrK
defaultluser: Does anyone have a comparison versus the 5700G yet? Will that ALSO HAVE TO WAIT UNTIL THE END OF THE MONTH?
This 2-unit IGP is for office use and work offload; the 5700G would trounce it. That's like comparing a 6400X GPU to a 5600G.
Posted on Reply
#35
marios15
People arguing about a 58% transistor increase on an 8 (EIGHT CORE) die, then complaining about that translating into 13% IPC, which is measured on 1 (ONE) core.
EDIT: logic fail
If you take 58% and divide by 8, that gives 7.25%.
So a 7% transistor increase per core for a 13% IPC increase per core?
That's a 186% gain per transistor increase.


Didn't they say ~50% multi-core perf increase over last gen at same power? Sounds pretty reasonable if you include the clockspeed improvements, double AVX execution, increased cache.
Posted on Reply
#36
LuxZg
marios15: People arguing about a 58% transistor increase on an 8 (EIGHT CORE) die, then complaining about that translating into 13% IPC, which is measured on 1 (ONE) core.
If you take 58% and divide by 8, that gives 7.25%.
So a 7% transistor increase per core for a 13% IPC increase per core?
That's a 186% gain per transistor increase.

Didn't they say ~50% multi-core perf increase over last gen at same power? Sounds pretty reasonable if you include the clockspeed improvements, double AVX execution, increased cache.
Your math and logic skills are lacking. If 8 core vs 8 core uses 58% more transistors, then 1 vs 1 core still uses 58% more transistors.

But you are somewhat correct that we shouldn't look at IPC vs transistors, because transistors also enable higher clocks, and possibly better multithreaded perf, so we should look at that ~50% overall performance boost.

That's still just half the truth because it's not on the same production node, so part of the frequency uplift is also due to 5 nm.

So 58% transistors probably equate to ~30% performance.

But it certainly doesn't work the way you suggest: 58/8 = 7.25 is correct arithmetic, but dividing a percentage increase by the core count like that is incorrect.
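
To make the point concrete, here is a quick sketch using the CCD totals from the article. It assumes, purely for illustration, that the transistors are split evenly across the 8 cores (in reality the L3 and other shared logic take a share), and the variable names are made up.

```python
CORES = 8

# Whole-CCD transistor counts quoted in the article.
zen3_total = 4.15e9
zen4_total = 6.57e9


def relative_increase(old, new):
    """Fractional increase from old to new (0.58 means +58%)."""
    return (new - old) / old


# Increase for the whole 8-core die.
whole_die = relative_increase(zen3_total, zen4_total)

# Increase per core, assuming an even split across cores (illustrative only).
per_core = relative_increase(zen3_total / CORES, zen4_total / CORES)

print(f"Whole die: +{whole_die:.1%}")
print(f"Per core:  +{per_core:.1%}")
```

Both lines print the same percentage, because dividing both totals by 8 cancels out in the ratio. The absolute number of extra transistors per core is one eighth of the total, but the relative increase is unchanged.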
Posted on Reply
#37
marios15
LuxZg: Your math and logic skills are lacking. If 8 core vs 8 core uses 58% more transistors, then 1 vs 1 core still uses 58% more transistors.

But you are somewhat correct that we shouldn't look at IPC vs transistors, because transistors also enable higher clocks, and possibly better multithreaded perf, so we should look at that ~50% overall performance boost.

That's still just half the truth because it's not on the same production node, so part of the frequency uplift is also due to 5 nm.

So 58% transistors probably equate to ~30% performance.

But it certainly doesn't work the way you suggest: 58/8 = 7.25 is correct arithmetic, but dividing a percentage increase by the core count like that is incorrect.
Yeah I missed a logic step there.

The process node helps, but iirc the design needs to be changed to actually take advantage of the new node's V/F curve; otherwise you only gain density improvements.
Posted on Reply
#38
DemonicRyzen666
DemonicRyzen666: Slow AVX-512, dual-load 256-bit AVX; again AMD didn't learn their lesson with SSE4.1 and AVX on Bulldozer. There are a ton of problems with that implementation.
I see I was right.
It is slow in Cinebench R23 because of this design choice. Although I remember that a long time ago Intel's AVX had the ability to push AVX instructions up from the old version to the highest, newest version in the CPU to improve speed, although it was also in that same PDF that I read it would take SSE4.1 to AVX. Mysteriously, after I asked a famous Intel engineer a question about that, the PDF disappeared from view.
Posted on Reply
#39
Oberon
I don't know if you know this (actually, I'm quite confident that you don't), but Cinebench doesn't make use of AVX-512, so its implementation in Zen 4 has no direct impact on the architecture's performance in that benchmark.
Posted on Reply
#40
marios15
DemonicRyzen666: I see I was right.
It is slow in Cinebench R23 because of this design choice. Although I remember that a long time ago Intel's AVX had the ability to push AVX instructions up from the old version to the highest, newest version in the CPU to improve speed, although it was also in that same PDF that I read it would take SSE4.1 to AVX. Mysteriously, after I asked a famous Intel engineer a question about that, the PDF disappeared from view.
It's not completely slower...
www.mersenneforum.org/showthread.php?p=614191
Posted on Reply