
GPU Memory Latency Tested on AMD's RDNA 2 and NVIDIA's Ampere Architecture

AleksandarK

Staff member
Joined
Aug 19, 2017
Messages
1,046 (0.77/day)
Over the years, graphics cards have evolved to feature multi-level cache hierarchies. These cache levels are designed to bridge the gap between memory and compute, a growing problem that cripples GPU performance in many applications. GPU vendors such as AMD and NVIDIA use different sizes of register files, L1, and L2 caches, depending on the architecture. For example, NVIDIA's A100 GPU has 40 MB of L2 cache, roughly seven times as much as the previous-generation V100. That shows how much cache new applications demand, and cache sizes keep growing to meet those needs.

Today, we have an interesting report from Chips and Cheese. The website measured the GPU memory latency of the latest generation of cards: AMD's RDNA 2 and NVIDIA's Ampere. Simple pointer-chasing tests in OpenCL produced some interesting results. RDNA 2's cache is fast and massive. Compared to Ampere, its cache latency is much lower, while VRAM latency is about the same. NVIDIA uses a two-level cache system consisting of L1 and L2, which seems to be a rather slow solution. Data traveling from Ampere's SM, which holds the L1 cache, to the L2 outside it incurs over 100 ns of latency.
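
For readers unfamiliar with the technique: a pointer-chasing test makes every load depend on the result of the previous one, so accesses cannot be overlapped or prefetched and the time per hop approximates latency. The sketch below is not the Chips and Cheese OpenCL code. It is a minimal Python illustration of the access pattern only, and the names `build_chain` and `chase` are made up for this example. Interpreter overhead dominates the timings, so the numbers it prints say nothing about GPU caches.

```python
import random
import time


def build_chain(n, seed=0):
    """Return a list where chain[i] is the next index to visit.

    Sattolo's algorithm produces a permutation that is one single cycle,
    so the chase touches every slot before it repeats.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what guarantees a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, hops):
    """Follow the chain for `hops` dependent loads.

    Returns (last index reached, average nanoseconds per hop).
    """
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(hops):
        idx = chain[idx]  # the next address is only known after this load
    elapsed = time.perf_counter_ns() - start
    return idx, elapsed / hops


if __name__ == "__main__":
    # Growing the working set is what exposes the steps between cache levels
    # in a real test; here it only demonstrates the sweep.
    for n in (1 << 10, 1 << 14, 1 << 18):
        _, ns = chase(build_chain(n), hops=100_000)
        print(f"{n:>8} elements: {ns:.1f} ns per hop")
```

A real latency test would run the same loop in a single GPU work-item over a buffer in device memory and sweep the buffer size, which is roughly what the report describes.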



AMD, on the other hand, has a three-level cache system: L0, L1, and L2 cache levels complement the RDNA 2 design. The latency between L0 and L2, even with L1 between them, is just 66 ns. Infinity Cache, which is essentially an L3 cache, adds only about 20 ns of latency on top of that, so it is still faster than NVIDIA's cache solution. NVIDIA's massive GA102 die seems to be a big problem for the L2 cache, since data has to travel across it and that takes many cycles. You can read more about the test here.

 
Joined
Feb 3, 2017
Messages
2,963 (1.91/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) Geforce RTX 3070 FE
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
The slow uptick between cache levels on RDNA2 is interesting. While Ampere's cache levels are quite clearly distinguished, the RDNA2 graph is much smoother, including Infinity Cache past 32MB.
 
Joined
Sep 6, 2013
Messages
1,788 (0.64/day)
Location
Athens, Greece
System Name 3 systems: Gaming / Internet / HTPC
Processor Ryzen 7 2700X / Ryzen 7 2600X / AM3 Athlon 645 unlocked to 6 core
Motherboard MSI X470 Gaming Plus Max / ASRock A320M-HDV R4.0 / Gigabyte GA-990XA-UD3
Cooling Νoctua U12S / AMD Wraith / CoolerMaster TX2
Memory 16GB G.Skill RIPJAWS 3600 / 16GB G.Skill Aegis 3200 / 16GB Kingston 2400MHz (DDR3)
Video Card(s) XFX RX 580 8GB + GT 620 (PhysX)/ GT 710 / GT 620
Storage Intel NVMe 500GB + SATA SSDs + SATA HDDs / Samsung 256GB NVMe + 2.5'' HDDs / Samsung SSD 120GB
Display(s) Samsung LE32D550 32'' TV(2 systems connected) / 19'' monitor + projector
Case Sharkoon Rebel 12 / Sharkoon Rebel 9 / Xigmatek Midguard
Audio Device(s) onboard
Power Supply Chieftec 850W / Sharkoon 650W / Seasonic 400W
Mouse CoolerMaster / Rapoo / Logitech
Keyboard CoolerMaster / Microsoft / Logitech
Software Windows
This probably shows AMD's better experience with caches, considering that their main business is CPUs. On the other hand, it shows how much faster Nvidia's architecture is, given that it performs better even with higher cache latencies.
 
Joined
Nov 6, 2016
Messages
688 (0.42/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
This probably shows AMD's better experience with caches, considering that their main business is CPUs. On the other hand, it shows how much faster Nvidia's architecture is, given that it performs better even with higher cache latencies.
Does it perform better across the board in every game? What GPUs are you comparing, out of curiosity?
 
Joined
Jan 6, 2013
Messages
275 (0.09/day)
AMD should thank TSMC a lot for letting them fit that much cache in so little space.
Using cache is in general the lazy man's way of solving things.
 
Joined
Nov 11, 2016
Messages
1,381 (0.84/day)
System Name The de-ploughminator
Processor I7 9900K @ 5.1Ghz
Motherboard Gigabyte Z370 Gaming 5
Cooling Custom Watercooling
Memory 4x8GB G.Skill Trident Neo 3600mhz 15-15-15-30
Video Card(s) RTX 3090 + Bitspower WB
Storage Plextor 512GB nvme SSD
Display(s) LG OLED CX48"
Case Lian Li 011D Dynamic
Audio Device(s) Creative AE-5
Power Supply Corsair RM1000
Mouse Razor Viper Ultimate
Keyboard Corsair K75
Software Win10
So Ampere is a compute/bandwidth monster and RDNA2 is a latency monster. In the end, whichever solution grabs the most market share will be the winner.
 
Joined
Jan 24, 2011
Messages
137 (0.04/day)
This probably shows AMD's better experience with caches, considering that their main business is CPUs. On the other hand, it shows how much faster Nvidia's architecture is, given that it performs better even with higher cache latencies.
How does it actually show that the Ampere architecture is much faster? Care to elaborate on how big an impact latency has on GPU performance?
BTW Nvidia has higher bandwidth than AMD, and in the high end (GA102) it's significantly higher, but you ignore this.

So Ampere is a compute/bandwidth monster and RDNA2 is a latency monster. In the end, whichever solution grabs the most market share will be the winner.
I think Nvidia adding FP32 functionality to its INT units is a pretty good idea. Although I don't know how many transistors or how much power it cost, gaming performance increased by ~25%, and then there is the advantage in compute workloads. I wouldn't mind if AMD did the same thing.

AMD should thank TSMC a lot for letting them fit that much cache in so little space.
Using cache is in general the lazy man's way of solving things.
For CPU yes, but for GPU what better alternative do we have? Super expensive HBM2, or expensive GDDR6X with a wider memory controller?
So you can't really say Infinity Cache was a bad move. I just wonder if a smaller one (1/2 or 1/3 smaller) wouldn't be a good enough option, because honestly IC uses up a lot of space.
 
Joined
Feb 3, 2017
Messages
2,963 (1.91/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) Geforce RTX 3070 FE
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
I just wonder if a smaller one (1/2 or 1/3 smaller) wouldn't be a good enough option, because honestly IC uses up a lot of space.
128MB is not that much when it comes to caching for 16GB of VRAM.
Assuming the die shots in AMD's presentation are somewhat accurate, Infinity Cache is 15% of the Navi21 die.
 
Joined
Jul 13, 2016
Messages
1,085 (0.62/day)
Processor Ryzen 5800X
Motherboard ASRock X570 Taichi
Cooling Le Grand Macho
Memory 32GB DDR4 3600 CL16
Video Card(s) EVGA 1080 Ti
Storage Too much
Display(s) Acer 144Hz 1440p IPS 27"
Case Thermaltake Core X9
Audio Device(s) JDS labs The Element II, Dan Clark Audio Aeon II
Power Supply EVGA 850w P2
Mouse G305
Keyboard iGK64 w/ 30n optical switches
This probably shows AMD's better experience with caches, considering that their main business is CPUs. On the other hand, it shows how much faster Nvidia's architecture is, given that it performs better even with higher cache latencies.

If only GPU architecture were as simple as a single factor determining performance.

How does it actually show that the Ampere architecture is much faster? Care to elaborate on how big an impact latency has on GPU performance?
BTW Nvidia has higher bandwidth than AMD, and in the high end (GA102) it's significantly higher, but you ignore this.


I think Nvidia adding FP32 functionality to its INT units is a pretty good idea. Although I don't know how many transistors or how much power it cost, gaming performance increased by ~25%, and then there is the advantage in compute workloads. I wouldn't mind if AMD did the same thing.


For CPU yes, but for GPU what better alternative do we have? Super expensive HBM2, or expensive GDDR6X with a wider memory controller?
So you can't really say Infinity Cache was a bad move. I just wonder if a smaller one (1/2 or 1/3 smaller) wouldn't be a good enough option, because honestly IC uses up a lot of space.

It doesn't. The guy is just making an assumption and an incorrect one at that.
 
Joined
Jan 24, 2011
Messages
137 (0.04/day)
128MB is not that much when it comes to caching for 16GB of VRAM.
Assuming the die shots in AMD's presentation are somewhat accurate, Infinity Cache is 15% of the Navi21 die.
I think somewhere it was mentioned that it was ~20%. 20% of 520mm2 is 104mm2, and that's not a small number if we take into account that the space could have been used for more CUs, for example. BTW one RDNA1 WGP (2xCU) is only 4.1mm2, so I think an RDNA2 WGP could be 5mm2 at most, so by halving Infinity Cache and saving 52mm2 you could put 25% more CUs into N21. It would be great if we could somehow disable a part of IC and see what kind of effect it has on performance.
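
The area arithmetic above can be checked with a few lines. This is only a sketch of the post's estimate: the 520mm2 die, the ~20% cache share, and the 5mm2 per WGP are the figures from the post, while the 80 CU baseline for a full Navi 21 is a figure added here, and the function name `extra_cus` is made up for illustration.

```python
DIE_AREA_MM2 = 520.0     # Navi 21 die size used in the post
CACHE_SHARE_PCT = 20     # Infinity Cache share of the die, the post's ~20% figure
WGP_AREA_MM2 = 5.0       # the post's upper-bound guess for one RDNA2 WGP
CUS_PER_WGP = 2          # one WGP is two CUs, as stated in the post
BASELINE_CUS = 80        # assumption added here: CU count of a full Navi 21


def extra_cus(cache_fraction_removed):
    """Return (freed area in mm2, CUs that fit into it) for a smaller cache."""
    cache_area = DIE_AREA_MM2 * CACHE_SHARE_PCT / 100
    freed = cache_area * cache_fraction_removed
    whole_wgps = int(freed // WGP_AREA_MM2)  # only complete WGPs count
    return freed, whole_wgps * CUS_PER_WGP


if __name__ == "__main__":
    freed, cus = extra_cus(0.5)  # halve the cache, 128MB down to 64MB
    gain = 100 * cus / BASELINE_CUS
    print(f"freed {freed:.0f} mm2, room for {cus} CUs, {gain:.0f}% more")
```

Under those assumptions the 25% figure holds, but the estimate ignores everything except raw area, such as the extra power and memory traffic the added CUs would bring.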
 
Joined
Jan 3, 2021
Messages
173 (1.40/day)
Location
Exexfirstladyland
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
The slow uptick between cache levels on RDNA2 is interesting. While Ampere's cache levels are quite clearly distinguished, the RDNA2 graph is much smoother, including Infinity Cache past 32MB.
Yes, that's interesting. The gradual increase above 4MB could indicate that the L3 cache is sectioned (with one part belonging to each memory controller?), and access time increases significantly when a CU needs to access data in a "distant" section. The gradual increase up to 4MB could mean that L2 is split into sections too, again with varying access time.
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
22,068 (3.56/day)
Processor Core i7-8700K
Memory 32 GB
Video Card(s) RTX 3080
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
If we take into account that space could have been used for more CUs for example
AMD made it clear in press briefings that given their power and thermal goals, the L3 cache was the better option
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
12,086 (3.57/day)
Location
Concord, NH
System Name Apollo
Processor Intel Core i9 9880H
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Full Size Wireless Apple Magic Keyboard
Software MacOS 10.15.7
128MB is not that much when it comes to caching for 16GB of VRAM.
Assuming the die shots in AMD's presentation are somewhat accurate, Infinity Cache is 15% of the Navi21 die.
It's plenty. By that logic, the 64GB of memory in my laptop is gimped by the 16MB of cache on my CPU. It's not about the amount, it's about the hit ratio. Also the cache uses less power, so sure, you could replace it with CUs, but that's also more compute with more memory latency and more heat. That doesn't sound like a winning combo compared to what AMD has now.
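
To put the hit-ratio point in numbers, the usual back-of-the-envelope model is a weighted average of hit and miss latency. The sketch below uses that textbook formula. The function name is made up, and the latencies are placeholders for illustration, not measurements from the article.

```python
def average_latency_ns(hit_ratio, hit_ns, miss_ns):
    """Weighted average access time for a single cache level.

    hit_ratio is the fraction of accesses served by the cache,
    hit_ns the latency of a hit, miss_ns the latency of going to memory.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ns + (1.0 - hit_ratio) * miss_ns


if __name__ == "__main__":
    CACHE_NS = 90.0   # placeholder latency for a large last-level cache
    VRAM_NS = 250.0   # placeholder latency for a VRAM access
    for ratio in (0.0, 0.5, 0.8, 0.95):
        avg = average_latency_ns(ratio, CACHE_NS, VRAM_NS)
        print(f"hit ratio {ratio:.2f}: {avg:.1f} ns on average")
```

The capacity of the cache only matters through the hit ratio it achieves for a given workload, which is why the ratio of cache size to VRAM size tells you very little on its own.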
 
Joined
Aug 23, 2013
Messages
246 (0.09/day)
AMD should thank TSMC a lot for letting them fit that much cache in so little space.
Using cache is in general the lazy man's way of solving things.
This is just a stepping stone for when they go chiplet. They need a fast cache so they don't need to access the VRAM often when it is across an IO die.
 
Joined
Sep 28, 2012
Messages
738 (0.23/day)
System Name Potato PC
Processor AMD Ryzen 5 3600
Motherboard ASRock B550M Steel Legend
Cooling ID Cooling SE 224XT Basic
Memory 32GB Team Dark Alpha DDR4 3600Mhz
Video Card(s) MSI RX 5700XT Mech OC
Storage Kingston A2000 1TB + 8 TB Toshiba X300
Display(s) Mi Gaming Curved 3440x1440 144Hz
Case Cougar MG120-G
Audio Device(s) Plantronic RIG 400
Power Supply Seasonic X650 Gold
Mouse Logitech G903
Keyboard Logitech G613
Benchmark Scores Who need bench when everything already fast?
This explains why the RX 6800 series is a serious competitor at 1080p and up to 1440p, even though Ampere has much higher GDDR6X memory bandwidth. Oh, and some YouTubers have also said that playing on the RX 6800 is smoother, so there's another perk you can't measure.

 
Joined
Oct 4, 2017
Messages
509 (0.39/day)
Location
France
System Name White Rose ( https://imgur.com/gallery/l7Lg4Wj )
Processor RYZEN 7 3700X
Motherboard ROG STRIX B450-i
Cooling NOCTUA NH-L12S
Memory Patriot Viper Steel DDR4 4000Mhz 16Go PVS416G400C9K
Video Card(s) Gaming X 2080 Super ( TUF 3080 preordered )
Storage XPG SX8200 Pro 512 go NVMe + SAMSUNG 850 EVO 500GB
Display(s) Dell S2721DGF
Case Nouvolo Steck
Power Supply CORSAIR SF600
Mouse Logitech G203 Prodigy
Keyboard Ajazz ak33
Software Windows 10 20H2
This explains why the RX 6800 series is a serious competitor at 1080p and up to 1440p, even though Ampere has much higher GDDR6X memory bandwidth. Oh, and some YouTubers have also said that playing on the RX 6800 is smoother, so there's another perk you can't measure.

"Smoothness" of a game can be measured with frametimes; there is nothing magic about it that can't be measured!
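
As a rough illustration of what measuring it looks like: given a log of per-frame render times, the average tells you the frame rate and a high percentile tells you how bad the slow frames are. The sketch below uses made-up frame times and a made-up function name, and it takes the 99th-percentile frame time as the smoothness metric, which is one common convention among several.

```python
import statistics


def frametime_summary(frametimes_ms):
    """Summarize a capture of per-frame times given in milliseconds."""
    ordered = sorted(frametimes_ms)
    n = len(ordered)
    avg_ms = statistics.fmean(ordered)
    # Nearest-rank 99th percentile: the smallest frame time that at least
    # 99% of frames do not exceed. Integer math avoids float rounding.
    rank = max(1, (99 * n + 99) // 100)
    p99_ms = ordered[rank - 1]
    return {
        "avg_ms": avg_ms,
        "avg_fps": 1000.0 / avg_ms,
        "p99_ms": p99_ms,
        "p99_fps": 1000.0 / p99_ms,
    }


if __name__ == "__main__":
    # Two made-up captures with the same average frame time.
    steady = [10.0] * 100
    stuttery = [9.0] * 98 + [59.0] * 2
    for name, capture in (("steady", steady), ("stuttery", stuttery)):
        s = frametime_summary(capture)
        print(f"{name}: {s['avg_fps']:.0f} fps average, "
              f"99th percentile frame time {s['p99_ms']:.0f} ms")
```

Both captures average the same frame rate, but the second one has occasional long frames that show up in the percentile. That difference is what people usually mean when they call one card "smoother".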
 
Joined
Apr 10, 2010
Messages
1,765 (0.44/day)
Location
London
System Name Jaspe
Processor Ryzen 1500X
Motherboard Asus ROG Strix X370-F Gaming
Cooling Stock
Memory 16Gb Corsair 3000mhz
Video Card(s) EVGA GTS 450
Storage Crucial M500
Display(s) Philips 1080 24'
Case NZXT
Audio Device(s) Onboard
Power Supply Enermax 425W
Software Windows 10 Pro
It's plenty. By that logic, the 64GB of memory in my laptop is gimped by the 16MB of cache on my CPU. It's not about the amount, it's about the hit ratio. Also the cache uses less power, so sure, you could replace it with CUs, but that's also more compute with more memory latency and more heat. That doesn't sound like a winning combo compared to what AMD has now.
I think he's talking about how much space it takes on the chip.
 
Joined
Sep 28, 2012
Messages
738 (0.23/day)
System Name Potato PC
Processor AMD Ryzen 5 3600
Motherboard ASRock B550M Steel Legend
Cooling ID Cooling SE 224XT Basic
Memory 32GB Team Dark Alpha DDR4 3600Mhz
Video Card(s) MSI RX 5700XT Mech OC
Storage Kingston A2000 1TB + 8 TB Toshiba X300
Display(s) Mi Gaming Curved 3440x1440 144Hz
Case Cougar MG120-G
Audio Device(s) Plantronic RIG 400
Power Supply Seasonic X650 Gold
Mouse Logitech G903
Keyboard Logitech G613
Benchmark Scores Who need bench when everything already fast?
"Smoothness" of a game can be measured with frametimes; there is nothing magic about it that can't be measured!

Have you watched the video? It's called the placebo effect. Have you invented a tool to measure it?
 
Joined
Dec 28, 2012
Messages
1,592 (0.52/day)
Have you watched the video? It's called the placebo effect. Have you invented a tool to measure it?
Yeah, it's called frametime measurement.

This probably shows AMD's better experience with caches, considering that their main business is CPUs. On the other hand, it shows how much faster Nvidia's architecture is, given that it performs better even with higher cache latencies.
Faster? I mean, outside of raytracing, the 3080 loses to the 6900 XT and 6800 XT at 1440p, but wins at 4K. Nvidia also requires significantly more power to do so. I know, Samsung 8nm vs TSMC 7nm, but we've seen what happens when Nvidia's arch is way ahead of AMD's, as in the Maxwell vs GCN era. Even if you look at SM count instead of core count, the 3090 and 6900 XT are not that different.
 
Joined
Sep 28, 2012
Messages
738 (0.23/day)
System Name Potato PC
Processor AMD Ryzen 5 3600
Motherboard ASRock B550M Steel Legend
Cooling ID Cooling SE 224XT Basic
Memory 32GB Team Dark Alpha DDR4 3600Mhz
Video Card(s) MSI RX 5700XT Mech OC
Storage Kingston A2000 1TB + 8 TB Toshiba X300
Display(s) Mi Gaming Curved 3440x1440 144Hz
Case Cougar MG120-G
Audio Device(s) Plantronic RIG 400
Power Supply Seasonic X650 Gold
Mouse Logitech G903
Keyboard Logitech G613
Benchmark Scores Who need bench when everything already fast?
Joined
Dec 28, 2012
Messages
1,592 (0.52/day)
Again, have you watched the video? There's also a frame counter in the top right corner. Here's a link to save you time: Linus
Again, you miss the point. "Smoother" is a descriptor that can be measured. If it's a benefit, then surely you can link some benchmarks showing AMD has better frametimes, yeah?
 
Joined
Sep 28, 2012
Messages
738 (0.23/day)
System Name Potato PC
Processor AMD Ryzen 5 3600
Motherboard ASRock B550M Steel Legend
Cooling ID Cooling SE 224XT Basic
Memory 32GB Team Dark Alpha DDR4 3600Mhz
Video Card(s) MSI RX 5700XT Mech OC
Storage Kingston A2000 1TB + 8 TB Toshiba X300
Display(s) Mi Gaming Curved 3440x1440 144Hz
Case Cougar MG120-G
Audio Device(s) Plantronic RIG 400
Power Supply Seasonic X650 Gold
Mouse Logitech G903
Keyboard Logitech G613
Benchmark Scores Who need bench when everything already fast?
Again, you miss the point. "Smoother" is a descriptor that can be measured. If it's a benefit, then surely you can link some benchmarks showing AMD has better frametimes, yeah?

Smooth is an adjective, not a noun, and has no metrics associated with it. I don't need to prove anything because I have already presented a topic for debate.
 
Joined
Jan 8, 2017
Messages
6,619 (4.19/day)
System Name Good enough
Processor AMD Ryzen R7 1700X - 4.0 Ghz / 1.350V
Motherboard ASRock B450M Pro4
Cooling Deepcool Gammaxx L240 V2
Memory 16GB - Corsair Vengeance LPX - 3333 Mhz CL16
Video Card(s) OEM Dell GTX 1080 with Kraken G12 + Water 3.0 Performer C
Storage 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) 4K Samsung TV
Case Deepcool Matrexx 70
Power Supply GPS-750C
128MB is not that much when it comes to caching for 16GB of VRAM.

For GPUs it is a ludicrous amount of cache. Just a few years ago you were looking at less than 1 KB of cache per thread in a GPU, all levels combined. Now that amount has gone up by at least an order of magnitude.
 
Joined
Jan 24, 2011
Messages
137 (0.04/day)
AMD made it clear in press briefings that given their power and thermal goals, the L3 cache was the better option
Wasn't that statement about the actual use of IC?
I never said to get rid of the whole IC, which was clearly stated in my post. What I wanted was to halve it (64MB instead of 128MB) and use the saved space for more CUs. BTW I would love to see a performance penalty graph for a smaller IC, to know if that much cache is really needed or if it can be smaller.
 
Joined
Dec 28, 2012
Messages
1,592 (0.52/day)
Smooth is an adjective, not a noun, and has no metrics associated with it. I don't need to prove anything because I have already presented a topic for debate.
You presented an opinion, an opinion that is objectively incorrect. You presented the argument; if you can't prove your argument, then all you are doing is shitting up the thread. "Smoothness" IS a noun, per Oxford's learner's dictionary, and can be measured via frametime measurement.

Oxford: https://www.oxfordlearnersdictionaries.com/us/definition/english/smoothness#:~:text=smoothness-,noun,any rough areas or holes

I can present a new topic for debate too: "Does 1d10t live up to his username?".
 