
AMD Ryzen 7 4700GE Memory Benchmarked: Extremely Low Latency Explains Tiny L3 Caches

btarunr

AMD's 7 nm "Renoir" APU silicon, which features eight "Zen 2" CPU cores, has only a quarter of the L3 cache of the 8-core "Zen 2" CCD used in "Matisse," "Rome," and "Castle Peak" processors, with each of its two quad-core compute complexes (CCXs) featuring just 4 MB of it (compared to 16 MB per CCX on the 8-core "Zen 2" CCD). Chinese-language tech publication TecLab published a quick review of an alleged Ryzen 7 4700GE socket AM4 processor based on the "Renoir" silicon, and discovered that the chip offers significantly lower memory latencies than "Matisse," posting just 47.6 ns latency when paired with DDR4-4233 dual-channel memory.

In comparison, a Ryzen 9 3900X with these kinds of memory clocks typically posts 60-70 ns latencies, owing to the MCM design of "Matisse," where the CPU cores and memory controllers sit on separate dies, which is one of the key reasons AMD is believed to have doubled the L3 cache amount per CCX compared to previous-generation "Zeppelin" dies. TecLab tested the alleged 4700GE engineering sample on a ROG Crosshair VIII Impact X570 motherboard that has 1 DIMM per channel (the best possible memory topology).



 
I don't know man, 4333 CL14-13-13-28 doesn't really show us much, except that the Infinity Fabric speed can go higher.

Current Ryzen 3000 series desktop CPUs would probably get super close to that if it didn't desync the FCLK from the other clocks.

It would be more interesting to see what it does at 3200 CL14 for example, or at least 3600 CL14.
The number of people who have kits that go to 4333 CL14-13-13-28 is pretty low.
 
About 5 ns lower latency at JEDEC 3200 CL22 vs Matisse in my testing.
 
More like the tiny L3 cache explains the low latency. Generally, the smaller the cache the less time it takes to read/write to a particular cache line and therefore the overall average memory access time goes down.
 
Actually, going above 3800MHz on a Ryzen 3000 CPU would end up somewhere around 80ns+
 
More like the tiny L3 cache explains the low latency. Generally, the smaller the cache the less time it takes to read/write to a particular cache line and therefore the overall average memory access time goes down.
The lower latency can't be all about that. That would mean that AMD actually made a huge mistake with regular Zen 2 and effectively reduced gaming performance with the "game cache".
 
Nah, cache size will always be more beneficial than slightly lower memory access time.
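
The trade-off argued in the posts above can be put into numbers with the textbook average memory access time formula, AMAT = hit time + miss rate × miss penalty. The sketch below is only an illustration: every latency and miss rate in it is a made-up placeholder, not a measurement of Renoir or Matisse, and the function name is hypothetical.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for one cache level sitting in front of DRAM."""
    return hit_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers, chosen only to show the shape of the trade-off.
# Small cache: misses more often, but each miss is cheaper (faster DRAM path).
small_cache_fast_dram = amat_ns(hit_ns=10.0, miss_rate=0.30, miss_penalty_ns=48.0)
# Large cache: misses less often, but each miss is more expensive.
large_cache_slow_dram = amat_ns(hit_ns=10.0, miss_rate=0.15, miss_penalty_ns=65.0)

print(f"small L3, fast DRAM: {small_cache_fast_dram:.2f} ns")
print(f"large L3, slow DRAM: {large_cache_slow_dram:.2f} ns")
```

With these placeholder values the larger cache comes out ahead, but the result flips if the miss rates are closer together, so which side wins depends on the workload's actual miss rates.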
 
Actually, going above 3800MHz on a Ryzen 3000 CPU would end up somewhere around 80ns+
Maybe there is a 2100 MHz FCLK with 1:1 MCLK:UCLK, and that is how this latency number is possible.
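
For reference, the clock figure in the post above follows from simple arithmetic: DDR memory transfers data twice per clock, so a coupled (1:1) fabric clock is half the DDR data rate. A minimal sketch, where the function name is made up for illustration and the coupled FCLK:MCLK ratio is assumed rather than confirmed by the review:

```python
def coupled_fclk_mhz(ddr_rate_mts: float) -> float:
    """FCLK needed to stay 1:1 with the memory clock (hypothetical helper).

    DDR transfers twice per clock, so the memory clock is half the data rate.
    """
    return ddr_rate_mts / 2


for rate in (3800, 4233, 4333):
    print(f"DDR4-{rate}: FCLK {coupled_fclk_mhz(rate):.1f} MHz for 1:1")
```

DDR4-4233 works out to about 2117 MHz, in line with the roughly 2100 MHz guessed above.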
 
They're comparing an APU to a normal CPU, and it's the low power version too (GE).
 
The cache latencies aren't dramatically lower, but they are as expected for the cache size. The memory latencies, I think, are just down to the memory controller being so close to the CPU, and 7 nm as well.
Here are my 3600 at 4.2 GHz results, with the best stable memory settings that Matisse can do.
latencies.png
 
Now think of Zen 3 having an L3 cache of Zen 2 size, with latencies matching or better than those of Renoir and clock speeds close to 5 GHz.
 
Looking at the latency charts in the TPU Forums (https://www.techpowerup.com/forums/...-go-memory-latency-competition-aida64.263929/), this is a very noticeable improvement, but it does not seem to quite catch Intel's memory latency yet.

The closest and most comparable results to the 47.6 on the screenshot seem to be:
4200CL18 on 9600KF at 44.5
4266CL15 on 9900K at 33.6
(Keep in mind that compared to 4233CL14, 4266CL15 should be about 6% slower and 4200CL18 almost 30% slower in raw latency)
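
The percentages in the note above can be checked with the usual first-word CAS latency formula, CL × 2000 / data rate (in MT/s), which gives nanoseconds. This is a rough sketch that covers CAS latency only; the full latency a benchmark reports also includes the memory controller, fabric, and other timings. The function name is made up for illustration.

```python
def cas_latency_ns(data_rate_mts: float, cl: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so one clock lasts 2000 / data_rate ns.
    """
    return cl * 2000.0 / data_rate_mts


baseline = cas_latency_ns(4233, 14)
for rate, cl in [(4233, 14), (4266, 15), (4200, 18)]:
    ns = cas_latency_ns(rate, cl)
    slower_pct = (ns / baseline - 1) * 100
    print(f"DDR4-{rate} CL{cl}: {ns:.2f} ns ({slower_pct:+.1f}% vs DDR4-4233 CL14)")
```

This gives roughly 6.61 ns, 7.03 ns, and 8.57 ns, which matches the "about 6%" and "almost 30%" figures quoted in the post.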
 
+500MHz FCLK on top of those Anandtech results makes a difference...
The number of people who have kits that go to 4333 CL14-13-13-28 is pretty low.
Most B-die kits will do around 4000-4400 with CAS14, but that would be at benching voltages (1.7-1.8 V; IIRC 1.8 V is the max DRAM voltage on Asus non-Crosshair/Maximus etc. boards) with maxmem. Just about any decent bin of B-die does 3666-3800 at CAS14; 14 ticks at 3800 is equivalent to 16 ticks at 4333 in terms of latency.
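
The "14 ticks at 3800 equals 16 ticks at 4333" equivalence can be sanity-checked by scaling the cycle count with the data rate, since the time per cycle shrinks as the data rate rises. A small sketch, with a hypothetical helper name:

```python
def equivalent_cl(cl: int, from_rate_mts: float, to_rate_mts: float) -> float:
    """Cycle count at to_rate that takes the same time as cl cycles at from_rate."""
    return cl * to_rate_mts / from_rate_mts


eq = equivalent_cl(14, 3800, 4333)
print(f"CL14 at DDR4-3800 is about CL{eq:.2f} at DDR4-4333")
```

The result is just under 16 cycles, so CL14 at DDR4-3800 and CL16 at DDR4-4333 take nearly the same time in nanoseconds.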

The frequency depends a bit more on the motherboard, but many newer 8 Gbit ICs don't struggle to run into the mid 4000s on recent motherboards. Stuff like Rev E, DJR, and D-die for example... I expect these, at normal voltages, to land around 10 ns quicker than what is currently being done on Matisse.
Nah, cache size will always be more beneficial than slightly lower memory access time.
Depends on the access patterns of the program. Ryzen's L3 also gets used differently than Intel's Skylake/xCove L3, because Ryzen uses exclusive victim caching while Intel has been using inclusive (to L2).
 
More like the tiny L3 cache explains the low latency. Generally, the smaller the cache the less time it takes to read/write to a particular cache line and therefore the overall average memory access time goes down.
Completely wrong.
Matisse and Renoir have the same L3$ associativity, which means the L3$ tag check has the same latency.
 
I said generally: the larger the cache and the more lines there are, the more tags need to be checked.
 
The number of tags that need to be checked depends only on associativity. Renoir and Matisse both have a 16-way L3$.
Also, both chips have the same 10 ns L3$ access latency, which means the DRAM access penalty is the same too.
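
The associativity point can be illustrated with basic set-associative cache geometry: a larger cache of the same associativity has more sets, but a lookup still compares only as many tags as there are ways in the one set the address maps to. The sketch below assumes 64-byte cache lines and treats each CCX's L3 as one simple set-associative array, which ignores how the real hardware may slice it; the helper name is hypothetical.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int = 64) -> dict:
    """Number of sets, and tag comparisons per lookup, for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    # A lookup indexes exactly one set, then compares the tag of every way in it.
    return {"sets": sets, "tags_compared_per_lookup": ways}


MIB = 1024 * 1024
renoir_ccx = cache_geometry(4 * MIB, ways=16)    # 4 MB L3 per CCX
matisse_ccx = cache_geometry(16 * MIB, ways=16)  # 16 MB L3 per CCX

print("Renoir CCX: ", renoir_ccx)
print("Matisse CCX:", matisse_ccx)
```

Under these assumptions the 16 MB cache has four times as many sets (16384 vs 4096), yet both compare 16 tags per lookup.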
 
The cache latencies aren't dramatically lower, but they are as expected for the cache size. The memory latencies, I think, are just down to the memory controller being so close to the CPU, and 7 nm as well.
Here are my 3600 at 4.2 GHz results, with the best stable memory settings that Matisse can do.
View attachment 160720

The physical distance has no major impact on memory latency.
It's the interconnect, and purely the interconnect, that matters. (Yes, there is a physical difference in delay, but who's counting 0.2 ns or so?)
However, having the CPU and memory controller on the same die may allow said interconnect to run at a higher frequency, since it isn't going across a substrate to another chip, which is why it clocks higher.

Just a tiny correction and some information, as many think physical distance matters for latency. No, it does not. It does have massive implications for power consumption, which is the drawback of chiplets :).
 
Yes there is a physical difference in delay but who's counting 0.2ns or so

AMD is definitely counting those, as is anyone else who's making a chip. When you're accessing a cache millions of times a second, you're going to start to feel those 0.2 of a nanosecond.
 
Nah, cache size will always be more beneficial than slightly lower memory access time.
Exactly, being out of memory is far worse of the two. I'd wager we'll step into a 32 GB minimum requirement for system memory in games before the next console generation is over, and possibly cross into 64 GB requirements in certain scenarios (high resolutions and high AA/AF); that's bound to happen. Hopefully we'll have some 64 GB GPU cards by then, at least at the workstation level I'd anticipate, and the low-end cards will probably have 16 GB by that point in time.
 