
AMD Doubles L3 Cache Per CCX with Zen 2 "Rome"

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
46,277 (7.69/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
A SiSoft SANDRA results database entry for a 2P AMD "Rome" EPYC machine sheds light on the lower cache hierarchy. Each 64-core EPYC "Rome" processor is made up of eight 7 nm 8-core "Zen 2" CPU chiplets, which converge at a 14 nm I/O controller die, which handles memory and PCIe connectivity of the processor. The result mentions cache hierarchy, with 512 KB dedicated L2 cache per core, and "16 x 16 MB L3." Like CPU-Z, SANDRA has the ability to see L3 cache by arrangement. For the Ryzen 7 2700X, it reads the L3 cache as "2 x 8 MB L3," corresponding to the per-CCX L3 cache amount of 8 MB.

For each 64-core "Rome" processor, there are a total of 8 chiplets. With SANDRA detecting "16 x 16 MB L3" for 64-core "Rome," it becomes highly likely that each of the 8-core chiplets features two 16 MB L3 cache slices, and that its 8 cores are split into two quad-core CCX units with 16 MB L3 cache, each. This doubling in L3 cache per CCX could help the processors cushion data transfers between the chiplet and the I/O die better. This becomes particularly important since the I/O die controls memory with its monolithic 8-channel DDR4 memory controller.
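The arithmetic behind that reading can be checked with a quick sketch. The figures are the ones quoted from the SANDRA entry and the chiplet count above; the variable names are only illustrative, and the even spread of L3 slices across chiplets is an assumption, not a confirmed spec.

```python
# Figures reported for one 64-core "Rome" package
cores_per_package = 64
chiplets_per_package = 8
l3_slices = 16      # SANDRA: "16 x 16 MB L3"
l3_slice_mb = 16

# Derived values (assumes slices are spread evenly across chiplets)
cores_per_chiplet = cores_per_package // chiplets_per_package
slices_per_chiplet = l3_slices // chiplets_per_package
cores_per_slice = cores_per_package // l3_slices
total_l3_mb = l3_slices * l3_slice_mb

print(cores_per_chiplet, slices_per_chiplet, cores_per_slice, total_l3_mb)
# prints: 8 2 4 256
```

Four cores per 16 MB slice matches a quad-core CCX, and two slices per chiplet matches two CCXs per 8-core chiplet.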



 
Joined
Jun 19, 2010
Messages
397 (0.08/day)
Location
Germany (Euregio)
Processor Ryzen 5600X
Video Card(s) RTX 3050
Software Win11
In fact, it shows something different.
A single die contains two quad-core CCXs, like Zen 1 and Zen+ before it.
Those show up as 2x 8 MB L3 cache in SiSoft Sandra today (1700, 1700X, 1800X, 2700, 2700X).

This leads to Zen 2 having 2x 16 MB L3 cache per die/chiplet.
SiSoft shows the 2P hint before the CPU name, but the stats that follow are for a single CPU,
resulting in 2P, each with 64 cores and 16x 16 MB (256 MB L3) (2x 16 MB per 8-core chiplet).
 

btarunr

In fact, it shows something different.
A single die contains two quad-core CCXs, like Zen 1 and Zen+ before it.
Those show up as 2x 8 MB L3 cache in SiSoft Sandra today (1700, 1700X, 1800X, 2700, 2700X).

This leads to Zen 2 having 2x 16 MB L3 cache per die/chiplet.
SiSoft shows the 2P hint before the CPU name, but the stats that follow are for a single CPU,
resulting in 2P, each with 64 cores and 16x 16 MB (256 MB L3) (2x 16 MB per 8-core chiplet).

You are absolutely correct. I've revised the article.
 
Joined
Sep 2, 2015
Messages
90 (0.03/day)
Location
Nova Scotia
System Name Old Old Old.
Processor AMD X2 5200+ 2.6Ghz @5665+ 2.83GHz 1.4v
Motherboard ASUS M2NPV-VM
Memory Corsair XMS2 PC6400 Dual channel 1GBx2 CL5-5-5-15-20 @ 944MHz DDR2
Video Card(s) ATI Radeon 2600XT 256MB core@857MHz ram@1179MHz GDDR4
Storage alot of 'em
Display(s) ASUS 23" VC239H 1920x1080 IPS 5ms
Audio Device(s) Diamond 5.1
Power Supply Enermax Liberty 400w dual rail
Mouse Logitech MX518
Keyboard Logitech G11
In fact, it shows something different.
A single die contains two quad-core CCXs, like Zen 1 and Zen+ before it.
Those show up as 2x 8 MB L3 cache in SiSoft Sandra today (1700, 1700X, 1800X, 2700, 2700X).

This leads to Zen 2 having 2x 16 MB L3 cache per die/chiplet.
SiSoft shows the 2P hint before the CPU name, but the stats that follow are for a single CPU,
resulting in 2P, each with 64 cores and 16x 16 MB (256 MB L3) (2x 16 MB per 8-core chiplet).
That's interesting, I see what you mean, with only 16 clusters for 64 cores. You are right.
 
Joined
Jul 3, 2018
Messages
847 (0.40/day)
Location
Haswell, USA
System Name Bruh
Processor 10700K 5.3Ghz 1.35v| i7 7920HQ 3.6Ghz -180Mv |
Motherboard Z490 TUF Wifi | Apple QMS180 |
Cooling EVGA 360MM | Laptop HS |
Memory DDR4 32GB 3600Mhz CL16 | LPDDR3 16GB 2133Mhz CL20 |
Video Card(s) Asus ROG Strix 3080 (2100Mhz/18Ghz)|Radeon Pro 560 (1150Mhz/1655Mhz)|
Storage Many SSDs, ~24TB HDD/8TB SSD
Display(s) S2719DGF, HP Z27i, Z24n| 1800P 15.4" + ZR30W + iPad Pro 10.5 2017
Case NR600 | MBP 2017 15" Silver | MSI GE62VR | Elite 120 Advanced
Audio Device(s) Lol imagine caring about audio
Power Supply 850GQ | Apple 87W USB-C |
Mouse Whatever I have on hand + trackpads (Lanchead TE)
Keyboard HyperX Origins Alloy idk
Software W10 20H2|W10 1903 LTSC/MacOS 11
Benchmark Scores No.
Imagine replacing half the CPU cores with GPU cores, you could fit a RX 570 in there.
 

mumar1

New Member
Joined
Jun 20, 2018
Messages
3 (0.00/day)
Taking into account the architectural changes in Zen 2 that are already confirmed by AMD, I would not bet 1 € on SiSoft SANDRA being able to detect the correct cache configuration.
 
Joined
Sep 17, 2014
Messages
20,780 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
Joined
Apr 8, 2008
Messages
328 (0.06/day)
Imagine replacing half the CPU cores with GPU cores, you could fit a RX 570 in there.

You can't put in such a powerful GPU without worrying about memory bandwidth. In theory you could use HBM there as well, but you would need to make space for it, for example by using a smaller I/O die. But again, packaging would be a big issue here, especially with the silicon interposer and the different Z-heights.
 
Joined
Dec 28, 2012
Messages
3,475 (0.85/day)
System Name Skunkworks
Processor 5800x3d
Motherboard x570 unify
Cooling Noctua NH-U12A
Memory 32GB 3600 mhz
Video Card(s) asrock 6800xt challenger D
Storage Sabarent rocket 4.0 2TB, MX 500 2TB
Display(s) Asus 1440p144 27"
Case Old arse cooler master 932
Power Supply Corsair 1200w platinum
Mouse *squeak*
Keyboard Some old office thing
Software openSUSE tumbleweed/Mint 21.2
Imagine replacing half the CPU cores with GPU cores, you could fit a RX 570 in there.
You already have that, it's called the 2400G, and it's already bandwidth limited.
 
Joined
Jun 12, 2017
Messages
136 (0.05/day)
So Rome comes with 256 MB of L3 cache? Not the previously rumored 128 MB of L3?

This is getting increasingly interesting in terms of cache hierarchy. 256 MB of L3 means definitely no L4 as LLC on the I/O chip, because the I/O chip is definitely not large enough to cram in 512 MB of L4. So how will they arrange and manage all this L3 cache?
 
Joined
Aug 23, 2013
Messages
453 (0.12/day)
There is some speculation regarding the doubled L3 cache. It's possible that the I/O die holds a duplicate of the L3 and SiSoft Sandra doesn't read it correctly. This may be to keep latency low: a core that needs something from the L3 on a different chiplet would make only one hop to the I/O die, instead of the two hops it would need without it.
 
Joined
Feb 12, 2015
Messages
1,104 (0.33/day)
There is some speculation regarding the doubled L3 cache. It's possible that the I/O die holds a duplicate of the L3 and SiSoft Sandra doesn't read it correctly. This may be to keep latency low: a core that needs something from the L3 on a different chiplet would make only one hop to the I/O die, instead of the two hops it would need without it.

Yes, that is a speculated explanation made by a few Techtubers and rumor guys a week or two ago.

In fact I also remember prevalent rumors that AMD has completely done away with their current NUMA design, and yet this new architecture is supposed to gain IPC while arguably spreading out the resources more than before. This puzzled me until now.

Doubling the L3 cache might allow them to uniformly design all of their product lines (AM4, TR, EPYC) in a manner that effectively works in the same way (as opposed to now, where AM4 uses only one die, but TR and EPYC use multiple dies). The exciting prospect of this is that there would no longer be ANY need to "localize" memory for certain games that only use 4 cores, and Threadripper would have the same gaming IPC as AM4 chips. It would just work.
 
Joined
Jun 19, 2010
Messages
397 (0.08/day)
Location
Germany (Euregio)
Processor Ryzen 5600X
Video Card(s) RTX 3050
Software Win11
Of course SiSoft could be mistaken by statically dividing the cores into groups of four, yes.
But I think the OS should get correct reports about the segmentation etc., so it's unlikely but indeed possible.

So there is a small chance that every chiplet is one big 8-core CCX with 32 MB of L3 cache.
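For anyone who wants to see what the OS itself reports, here is a minimal sketch for Linux. It assumes the usual sysfs layout (`/sys/devices/system/cpu/cpuN/cache/indexM/` with `level` and `shared_cpu_list` files); the function names are made up for this example, and it shows what the kernel exposes, not how SANDRA detects caches.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_shared_l3(shared_lists):
    """Map each distinct L3 sharing set to the CPUs that report it."""
    groups = defaultdict(list)
    for cpu, text in shared_lists.items():
        groups[tuple(parse_cpu_list(text))].append(cpu)
    return dict(groups)


def read_l3_shared_lists(root="/sys/devices/system/cpu"):
    """Collect shared_cpu_list for every level-3 cache entry (Linux only)."""
    result = {}
    for level_file in Path(root).glob("cpu[0-9]*/cache/index*/level"):
        if level_file.read_text().strip() == "3":
            cpu = int(level_file.parts[-4][3:])  # 'cpu12' -> 12
            result[cpu] = (level_file.parent / "shared_cpu_list").read_text()
    return result


# Hypothetical 8-core chiplet without SMT, split into two 4-core CCXs
example = {cpu: "0-3" if cpu < 4 else "4-7" for cpu in range(8)}
print(len(group_by_shared_l3(example)))  # prints: 2
```

On a real machine, `group_by_shared_l3(read_l3_shared_lists())` gives one entry per L3 domain. A chiplet that was one big 8-core CCX would show up as a single group of eight cores (plus SMT siblings) instead of two groups of four.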

@btarunr "you are very welcome"
 

Nkd

Joined
Sep 15, 2007
Messages
364 (0.06/day)
You are absolutely correct. I've revised the article.

I saw this on Reddit last week. The other theory is that AdoredTV might have been on to something. It could be an 8-core CCX, with the I/O die holding a copy of the L3 cache to improve latency. He mentioned that in his video, and that would make sense too. He said it made too much sense as a way to reduce latency between the cores, given how massive the I/O die is. It's either what Adored was saying in his video, or 4-core CCXs and a massive I/O chip. But I am honestly leaning towards an 8-core CCX with a copy of the L3 cache on the I/O die. The die seems massive and there has to be something going on there.

There is some speculation regarding the doubled L3 cache. It's possible that the I/O die holds a duplicate of the L3 and SiSoft Sandra doesn't read it correctly. This may be to keep latency low: a core that needs something from the L3 on a different chiplet would make only one hop to the I/O die, instead of the two hops it would need without it.
Yep! I am leaning toward this; the Adored guy mentioned the same thing. He said it made so much sense, given the I/O die is so massive, that they could improve latency by copying the L3 cache. The guy knows his shit about chips; he said he was speculating, but it made too much sense.

Of course SiSoft could be mistaken by statically dividing the cores into groups of four, yes.
But I think the OS should get correct reports about the segmentation etc., so it's unlikely but indeed possible.

So there is a small chance that every chiplet is one big 8-core CCX with 32 MB of L3 cache.

@btarunr "you are very welcome"

It doesn't have to be. The massive I/O die could basically have a copy of each L3 cache to reduce latency, and Sandra might just be reading it wrong.
 
Joined
Feb 13, 2012
Messages
522 (0.12/day)
It doesn't have to be. The massive I/O die could basically have a copy of each L3 cache to reduce latency, and Sandra might just be reading it wrong.


I was just thinking this. Who knows if any of the L3 cache is even on the chiplets at all with that massive IO die...
 

btarunr

I saw this on Reddit last week. The other theory is that AdoredTV might have been on to something. It could be an 8-core CCX, with the I/O die holding a copy of the L3 cache to improve latency. He mentioned that in his video, and that would make sense too. He said it made too much sense as a way to reduce latency between the cores, given how massive the I/O die is. It's either what Adored was saying in his video, or 4-core CCXs and a massive I/O chip. But I am honestly leaning towards an 8-core CCX with a copy of the L3 cache on the I/O die. The die seems massive and there has to be something going on there.

Not sure you need caches on both ends to reduce latencies (it should in turn increase latencies). The caches will almost never be coherent. If that concept worked, they'd have placed caches on discrete northbridges a long time ago.
 
Joined
Jun 19, 2010
Messages
397 (0.08/day)
Location
Germany (Euregio)
Processor Ryzen 5600X
Video Card(s) RTX 3050
Software Win11
Without the I/O and DRAM synchronization, maybe the CCX-to-CCX traffic is now more like pure CPU instruction and data synchronization, with way more bandwidth.
How the R/W buffering is maintained on the I/O die will be interesting to see; I don't think there will be any big compromise.
Maybe this whole layout is even better suited by default to something like Infinity Fabric; maybe Zen 1 was only there to see if it's even possible.

AnandTech measured the power consumption ratio of cores vs. fabric, Intel mesh vs. Zen.
In conclusion, the next battle in servers is not about the efficiency of the better core; it's the fabric that counts.
The charts showed the mesh is better at low load, but gets beaten by EPYC at higher load.

Now remember all the possibilities of the Zen 2 layout? Chiplet power gating, anyone? I/O die segment power gating? That's only possible if you have no MC or I/O etc. on the chiplets.
----------------------------------------------------------------
I/O-die will be something like this,
without GPU, Multimedia and Display

This thing will be big, smart and very fast

 
Joined
Jul 19, 2016
Messages
476 (0.17/day)
I saw this on Reddit last week. The other theory is that AdoredTV might have been on to something. It could be an 8-core CCX, with the I/O die holding a copy of the L3 cache to improve latency. He mentioned that in his video, and that would make sense too. He said it made too much sense as a way to reduce latency between the cores, given how massive the I/O die is. It's either what Adored was saying in his video, or 4-core CCXs and a massive I/O chip. But I am honestly leaning towards an 8-core CCX with a copy of the L3 cache on the I/O die. The die seems massive and there has to be something going on there.

Yep! I am leaning toward this; the Adored guy mentioned the same thing. He said it made so much sense, given the I/O die is so massive, that they could improve latency by copying the L3 cache. The guy knows his shit about chips; he said he was speculating, but it made too much sense.

It doesn't have to be. The massive I/O die could basically have a copy of each L3 cache to reduce latency, and Sandra might just be reading it wrong.

I wouldn't listen to that guy much regarding Zen 2 after watching his latest video on his 'predictions' for Ryzen 3000-series. He said he believes the Ryzen 3000-series flagship will have a base clock of 4.4Ghz. Base clock that high would be absolutely mental and is not happening. So I question whether he understands what he's talking about half the time. That prediction was absurd but no-one in the comments seemed to question it.
 
Joined
Sep 15, 2007
Messages
3,944 (0.65/day)
Location
Police/Nanny State of America
Processor OCed 5800X3D
Motherboard Asucks C6H
Cooling Air
Memory 32GB
Video Card(s) OCed 6800XT
Storage NVMees
Display(s) 32" Dull curved 1440
Case Freebie glass idk
Audio Device(s) Sennheiser
Power Supply Don't even remember
I wouldn't listen to that guy much regarding Zen 2 after watching his latest video on his 'predictions' for Ryzen 3000-series. He said he believes the Ryzen 3000-series flagship will have a base clock of 4.4Ghz. Base clock that high would be absolutely mental and is not happening. So I question whether he understands what he's talking about half the time. That prediction was absurd but no-one in the comments seemed to question it.

Makes perfect sense if you mean average clock. Intel claims 3.6 base clock and NOT A SINGLE ONE even on TDP limited thin machines run at it. Mature 7nm with EUV sounds like a solid bet for high clocks on lower core count parts, so it could very well be a base clock for 8-12 cores (boost similar to current intel) depending on the wall with a refresh. There's plenty of power to be wasted at 95w/125w tdp when power reduction is so good. For first spin silicon...idk. 4.0 base?
 
Joined
Jul 19, 2016
Messages
476 (0.17/day)
Makes perfect sense if you mean average clock. Intel claims 3.6 base clock and NOT A SINGLE ONE even on TDP limited thin machines run at it. Mature 7nm with EUV sounds like a solid bet for high clocks on lower core count parts, so it could very well be a base clock for 8-12 cores (boost similar to current intel) depending on the wall with a refresh. There's plenty of power to be wasted at 95w/125w tdp when power reduction is so good. For first spin silicon...idk. 4.0 base?

4.4Ghz base clock he said and it makes no sense at all. It's a ludicrous prediction. Even 4Ghz would be a sky-high base clock for the 3700X/3800X.

Remember the 2700X has a base clock of 3.7Ghz. So he thinks 7nm will allow AMD to just slap 700Mhz on top of that!
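To put numbers on that gap, here is a quick sketch using only the two clocks mentioned in this thread (the variable names are just for illustration):

```python
base_2700x_mhz = 3700      # Ryzen 7 2700X base clock
predicted_base_mhz = 4400  # the base clock prediction under discussion

uplift_mhz = predicted_base_mhz - base_2700x_mhz
uplift_pct = 100 * uplift_mhz / base_2700x_mhz
print(f"+{uplift_mhz} MHz ({uplift_pct:.1f}%)")
# prints: +700 MHz (18.9%)
```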
 
Joined
Feb 13, 2012
Messages
522 (0.12/day)
I wouldn't listen to that guy much regarding Zen 2 after watching his latest video on his 'predictions' for Ryzen 3000-series. He said he believes the Ryzen 3000-series flagship will have a base clock of 4.4Ghz. Base clock that high would be absolutely mental and is not happening. So I question whether he understands what he's talking about half the time. That prediction was absurd but no-one in the comments seemed to question it.
Technically you can't discredit someone's predictions without facts (that is, until the Ryzen 3000 series is announced/released).
A 4.4 GHz base is about 20% higher than the Ryzen 7 2700X base clock (3.7 GHz), which is not impossible at a high level, especially when we already know that Zen+ can clock up to 4.3-4.4 GHz.
The challenge is to scale max clocks. Also, 14nm GloFo/Samsung had decent density but didn't scale well at higher voltage. So here you are going from a 14nm process that has its efficiency sweet spot at lower voltages to a 7nm high-performance process.
 
Joined
Jul 19, 2016
Messages
476 (0.17/day)
Technically you can't discredit someone's predictions without facts (that is, until the Ryzen 3000 series is announced/released).
A 4.4 GHz base is about 20% higher than the Ryzen 7 2700X base clock (3.7 GHz), which is not impossible at a high level, especially when we already know that Zen+ can clock up to 4.3-4.4 GHz.
The challenge is to scale max clocks. Also, 14nm GloFo/Samsung had decent density but didn't scale well at higher voltage. So here you are going from a 14nm process that has its efficiency sweet spot at lower voltages to a 7nm high-performance process.

Oh come on. What's the point of posting all that - just to disagree for the sake of it? You CAN and SHOULD discredit such a prediction unless you are stupid. There's no chance any CPU from AMD released next year will have a base clock of 4.4Ghz. Trying to justify it shows you have a complete lack of understanding of CPUs.
 
Joined
Dec 12, 2012
Messages
711 (0.17/day)
Location
Poland
System Name THU
Processor Intel Core i5-13600KF
Motherboard ASUS PRIME Z790-P D4
Cooling SilentiumPC Fortis 3 v2 + Arctic Cooling MX-2
Memory Crucial Ballistix 2x16 GB DDR4-3600 CL16 (dual rank)
Video Card(s) MSI GeForce RTX 4070 Ventus 3X OC 12 GB GDDR6X (2610/21000 @ 0.91 V)
Storage Lexar NM790 2 TB + Corsair MP510 960 GB + PNY XLR8 CS3030 500 GB + Toshiba E300 3 TB
Display(s) LG OLED C8 55" + ASUS VP229Q
Case Fractal Design Define R6
Audio Device(s) Yamaha RX-V381 + Monitor Audio Bronze 6 + Bronze FX | FiiO E10K-TC + Sony MDR-7506
Power Supply Corsair RM650
Mouse Logitech M705 Marathon
Keyboard Corsair K55 RGB PRO
Software Windows 10 Home
Benchmark Scores Benchmarks in 2024?
Sad to see the 4-core CCX design again. Gaming performance will still be affected, and it will be even worse if desktop chips get a separate I/O die as well (which is almost certain if they want to put 16 cores on AM4).
 
Joined
Feb 13, 2012
Messages
522 (0.12/day)
Oh come on. What's the point of posting all that - just to disagree for the sake of it? You CAN and SHOULD discredit such a prediction unless you are stupid. There's no chance any CPU from AMD released next year will have a base clock of 4.4Ghz. Trying to justify it shows you have a complete lack of understanding of CPUs.
Bulldozer had a base clock of 4 GHz on 32nm, so what makes you think 4.4 GHz on 7nm is impossible? Now again, I'm not saying I agree or disagree, just that it's not as impossible as you make it sound. What is more likely to happen is a lower base clock, but with an all-core turbo sustaining a constant 4.4 GHz+ frequency.
 