Tuesday, June 30th 2020

AMD Ryzen 7 4700GE Memory Benchmarked: Extremely Low Latency Explains Tiny L3 Caches

AMD's 7 nm "Renoir" APU silicon, which features eight "Zen 2" CPU cores, has only a quarter of the L3 cache of the 8-core "Zen 2" CCD used in "Matisse," "Rome," and "Castle Peak" processors, with each of its two quad-core compute complexes (CCXs) featuring just 4 MB of it (compared to 16 MB per CCX on the 8-core "Zen 2" CCD). Chinese-language tech publication TecLab published a quick review of an alleged Ryzen 7 4700GE socket AM4 processor based on the "Renoir" silicon, and discovered that the chip offers significantly lower memory latencies than "Matisse," posting just 47.6 ns latency when paired with DDR4-4233 dual-channel memory.

In comparison, a Ryzen 9 3900X with these kinds of memory clocks typically posts 60-70 ns latencies. This is owing to the MCM design of "Matisse," where the CPU cores and memory controllers sit on separate dies, and it is one of the key reasons AMD is believed to have doubled the L3 cache amount per CCX compared to previous-generation "Zeppelin" dies. TecLab tested the alleged 4700GE engineering sample on a ROG Crosshair VIII Impact X570 motherboard that has 1 DIMM per channel (the best possible memory topology).
Sources: TecLab (Bilibili), Komachi Ensaka (Twitter)

19 Comments on AMD Ryzen 7 4700GE Memory Benchmarked: Extremely Low Latency Explains Tiny L3 Caches

#2
Axaion
I don't know man, 4333 CL14-13-13-28 doesn't really show us much, except that the Infinity Fabric speed can go higher.

Current Ryzen 3000 series desktop CPUs would probably get super close to that if it didn't desync the FCLK from the other clocks.

It would be more interesting to see what it does at 3200 CL14, for example, or 3600 CL14 at least.
The number of people who have kits that go to 4333 CL14-13-13-28 is pretty low.
Posted on Reply
#3
Imsochobo
Axaion
I don't know man, 4333 CL14-13-13-28 doesn't really show us much, except that the Infinity Fabric speed can go higher.

Current Ryzen 3000 series desktop CPUs would probably get super close to that if it didn't desync the FCLK from the other clocks.

It would be more interesting to see what it does at 3200 CL14, for example, or 3600 CL14 at least.
The number of people who have kits that go to 4333 CL14-13-13-28 is pretty low.
About 5 ns lower latency at JEDEC DDR4-3200 CL22 vs. Matisse in my testing.
Posted on Reply
#4
Vya Domus
More like the tiny L3 cache explains the low latency. Generally, the smaller the cache the less time it takes to read/write to a particular cache line and therefore the overall average memory access time goes down.
Posted on Reply
#5
TheLostSwede
Actually, going above 3800MHz on a Ryzen 3000 CPU would end up somewhere around 80ns+
Posted on Reply
#6
dyonoctis
Vya Domus
More like the tiny L3 cache explains the low latency. Generally, the smaller the cache the less time it takes to read/write to a particular cache line and therefore the overall average memory access time goes down.
The lower latency can't be all about that. That would mean that AMD actually made a huge mistake with regular Zen 2 and effectively reduced the gaming performance with the "game cache."
Posted on Reply
#7
Vya Domus
dyonoctis
The lower latency can't be all about that. That would mean that AMD actually made a huge mistake with regular Zen 2 and effectively reduced the gaming performance with the "game cache."
Nah, cache size will always be more beneficial than slightly lower memory access time.
Posted on Reply
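For readers following the cache-size-versus-latency argument above, the usual textbook way to frame it is average memory access time (AMAT): hit time plus miss rate times miss penalty. The sketch below is a minimal Python illustration. The function name `amat_ns` and every number in it are hypothetical, chosen only to show how the trade-off works; none of them are measurements of Renoir or Matisse.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_ns + miss_rate * miss_penalty_ns


# Hypothetical inputs for illustration only (not measured values):
# a small cache in front of fast DRAM vs. a large cache in front of slower DRAM.
small_cache = amat_ns(hit_ns=10.0, miss_rate=0.30, miss_penalty_ns=48.0)
large_cache = amat_ns(hit_ns=10.0, miss_rate=0.15, miss_penalty_ns=65.0)

print(f"small cache, fast DRAM: {small_cache:.2f} ns")
print(f"large cache, slow DRAM: {large_cache:.2f} ns")
```

With these made-up inputs the larger cache comes out ahead despite the slower DRAM path, but a workload whose miss rate barely changes with cache size would flip the result.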
#8
HABO
TheLostSwede
Actually, going above 3800MHz on a Ryzen 3000 CPU would end up somewhere around 80ns+
Maybe it's running a 2100 MHz FCLK with 1:1 MCLK:UCLK, in which case this latency number is possible.
Posted on Reply
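The clock relationship mentioned above can be checked with simple arithmetic: DDR memory transfers data twice per clock, so the memory clock (MCLK) is half the transfer rate, and a 1:1 mode needs FCLK and UCLK to match it. A minimal Python sketch follows, in which the function names are hypothetical and fully synchronous operation (FCLK = UCLK = MCLK) is assumed.

```python
def mclk_mhz(ddr_rate_mts: float) -> float:
    """Memory clock in MHz for a DDR transfer rate in MT/s (two transfers per clock)."""
    return ddr_rate_mts / 2.0


def fclk_needed_for_1to1(ddr_rate_mts: float) -> float:
    """FCLK in MHz needed to stay synchronous, assuming FCLK = UCLK = MCLK."""
    return mclk_mhz(ddr_rate_mts)


for rate in (3800, 4233):
    print(f"DDR4-{rate}: MCLK {mclk_mhz(rate):.1f} MHz, "
          f"FCLK for 1:1 {fclk_needed_for_1to1(rate):.1f} MHz")
```

Under that assumption, DDR4-4233 works out to roughly 2117 MHz, close to the 2100 MHz figure guessed above, while DDR4-3800 corresponds to 1900 MHz.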
#9
Caring1
They're comparing an APU to a normal CPU, and it's the low power version too (GE).
Posted on Reply
#10
Bruno Vieira
The cache latencies aren't dramatically lower, but they are as expected for the cache size. The memory latencies, I think, are just down to the memory controller being so close to the CPU, and on 7 nm as well.
Here are my 3600 at 4.2 GHz results, with the best stable memory settings that Matisse can do.
Posted on Reply
#11
HD64G
Now think of Zen 3 having an L3 cache of Zen 2 size, with latencies matching or better than those of Renoir and clock speeds close to 5 GHz.
Posted on Reply
#12
londiste
Looking at the latency charts in the TPU Forums (www.techpowerup.com/forums/threads/how-low-can-you-go-memory-latency-competition-aida64.263929/), this is a very noticeable improvement, but it does not seem to quite catch Intel's memory latency yet.

The closest and most comparable results to the 47.6 ns on the screenshot seem to be:
4200CL18 on 9600KF at 44.5 ns
4266CL15 on 9900K at 33.6 ns
(Keep in mind that compared to 4233CL14, 4266CL15 should be about 6% slower and 4200CL18 almost 30% slower in raw latency.)
Posted on Reply
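The "raw latency" percentages quoted above follow from the standard CAS latency conversion: latency in nanoseconds is CL cycles divided by the memory clock, and the memory clock is half the DDR transfer rate. The Python sketch below reproduces that arithmetic. The function names are hypothetical, and it covers only first-word CAS latency, not the full round-trip figure a benchmark reports.

```python
def cas_latency_ns(ddr_rate_mts: float, cl: int) -> float:
    """CAS latency in ns: CL cycles at a clock of (rate / 2) MHz."""
    return cl * 2000.0 / ddr_rate_mts


def percent_slower(baseline_ns: float, other_ns: float) -> float:
    """How much slower `other_ns` is than `baseline_ns`, in percent."""
    return (other_ns / baseline_ns - 1.0) * 100.0


baseline = cas_latency_ns(4233, 14)  # the kit used in the article's screenshot

for rate, cl in ((4266, 15), (4200, 18)):
    ns = cas_latency_ns(rate, cl)
    print(f"{rate}CL{cl}: {ns:.2f} ns, "
          f"{percent_slower(baseline, ns):.1f}% slower than 4233CL14")
```

The same helper can be used to compare any two speed and CL combinations on equal footing.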
#13
GorbazTheDragon
Fouquin
I feel like I'm living in some kind of split timeline where we didn't already know Renoir's specs and hadn't seen reviews of this silicon in action already.
+500MHz FCLK on top of those Anandtech results makes a difference...
Axaion
The number of people who have kits that go to 4333 CL14-13-13-28 is pretty low.
Most B-die kits will do around 4000-4400 with CAS14, but that would be at benching voltages (1.7-1.8 V; IIRC 1.8 V is the max DRAM voltage on Asus non-Crosshair/Maximus etc. boards) with maxmem. Just about any decent bin of B-die does 3666-3800 at CAS14, and 14 ticks at 3800 is equivalent to 16 ticks at 4333 in terms of latency.

The frequency depends a bit more on the motherboard, but many newer 8Gbit ICs don't struggle to run into the mid 4000s on recent motherboards. Stuff like Rev E, DJR, and D-die, for example... I expect these, at normal voltages, to land around 10 ns quicker than what is currently being done on Matisse.
Vya Domus
Nah, cache size will always be more beneficial than slightly lower memory access time.
Depends on the access patterns of the program. Ryzen's L3 also gets used differently than Intel's Skylake/xCove L3, because Ryzen uses exclusive victim caching while Intel has been using inclusive (to L2).
Posted on Reply
#14
AlB80
Vya Domus
More like the tiny L3 cache explains the low latency. Generally, the smaller the cache the less time it takes to read/write to a particular cache line and therefore the overall average memory access time goes down.
Completely wrong.
Matisse and Renoir have the same L3$ associativity, which means the L3$ tag check has the same latency.
Posted on Reply
#15
Vya Domus
AlB80
Completely wrong.
Matisse and Renoir have the same L3$ associativity, which means the L3$ tag check has the same latency.
I said generally; the larger the cache and the more lines there are, the more tags need to be checked.
Posted on Reply
#16
AlB80
Vya Domus
I said generally; the larger the cache and the more lines there are, the more tags need to be checked.
The number of tags that need to be checked depends only on the associativity. Renoir and Matisse both have a 16-way L3$.
Also, both chips have the same 10 ns L3$ access latency, which means the DRAM access penalty is the same too.
Posted on Reply
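The associativity point above can be made concrete with a little arithmetic. In a set-associative cache, the address selects one set and only the tags in that set are compared, so the number of comparisons equals the number of ways regardless of total size. The Python sketch below assumes 64-byte cache lines and treats each CCX's L3 as one simple 16-way array (4 MB on Renoir, 16 MB on Matisse, per the article). The function name `cache_geometry` is hypothetical, and real designs are sliced and more complicated.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int = 64) -> dict:
    """Sets, index bits, and tag comparisons per lookup for a set-associative cache.

    Assumes size, ways, and line size are powers of two (line_bytes=64 is an assumption).
    """
    lines = size_bytes // line_bytes
    sets = lines // ways
    index_bits = sets.bit_length() - 1  # log2(sets) for a power of two
    return {
        "sets": sets,
        "index_bits": index_bits,
        "tags_compared_per_lookup": ways,
    }


MIB = 1024 * 1024
renoir_ccx_l3 = cache_geometry(4 * MIB, ways=16)
matisse_ccx_l3 = cache_geometry(16 * MIB, ways=16)

print("Renoir  CCX L3:", renoir_ccx_l3)
print("Matisse CCX L3:", matisse_ccx_l3)
```

Under these assumptions the larger cache has four times as many sets and two more index bits, but each lookup still compares 16 tags. The sketch counts tag comparisons only and says nothing about other contributors to access time.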
#17
Imsochobo
Bruno Vieira
The cache latencies aren't dramatically lower, but they are as expected for the cache size. The memory latencies, I think, are just down to the memory controller being so close to the CPU, and on 7 nm as well.
Here are my 3600 at 4.2 GHz results, with the best stable memory settings that Matisse can do.

The physical distance has no major impact on memory latencies.
It's the interconnect, and purely the interconnect, that matters (Yes there is a physical difference in delay but who's counting 0.2ns or so).
However, having the CPU and memory controller on the same die may allow said interconnect to run at a higher frequency, as it's not going across a substrate to another chip, which is why it clocks higher.

Just a tiny correction and some information, as many think physical distance matters for latency. No, it does not, but it does have massive implications for power consumption, which is the drawback of chiplets :).
Posted on Reply
#18
Vya Domus
Imsochobo
Yes there is a physical difference in delay but who's counting 0.2ns or so
AMD is definitely counting those, as is anyone else who's making a chip. When you're accessing a cache millions of times a second, you're going to start to feel those 0.2 of a nanosecond.
Posted on Reply
#19
InVasMani
Vya Domus
Nah, cache size will always be more beneficial than slightly lower memory access time.
Exactly, being out of memory is by far the worse of the two. I'd wager we'll step into a 32GB minimum requirement for system memory in games before the next console generation is over, and possibly cross into 64GB requirements in certain scenarios, such as high resolutions and high AA/AF; that's bound to happen. Hopefully we'll have some 64GB GPU cards by then, at least at the workstation level, I'd anticipate, and the low-end cards will probably have 16GB by that point in time.
Posted on Reply