Tuesday, October 6th 2020

AMD Big Navi GPU Features Infinity Cache?

As we near the launch of AMD's highly hyped, next-generation RDNA 2 GPU codenamed "Big Navi", more details are emerging. We have already seen rumors suggesting that the card will supposedly be called AMD Radeon RX 6900 and will be AMD's top offering. Using a 256-bit bus with 16 GB of GDDR6 memory, the GPU will not use any type of HBM memory, which has historically been rather pricey. Instead, it looks like AMD will compensate for the smaller bus with a new technology it has developed. Thanks to new findings on the Justia Trademarks website by @momomo_us, we have information about the alleged "Infinity Cache" technology the new GPU uses.

VideoCardz reports that the internal name for this technology is not Infinity Cache; however, it seems AMD could have changed it recently. What exactly does it do, you might wonder? Well, that is a bit of a mystery for now. It could be a new cache technology that allows L1 cache sharing across the GPU cores, or some interconnect between the caches found across the whole GPU. This information should be taken with a grain of salt, as we have yet to see what this technology does and how it works when AMD announces its new GPU on October 28th.
Source: VideoCardz

141 Comments on AMD Big Navi GPU Features Infinity Cache?

#26
Vayra86
Vya Domus
I've noticed you are quite dead set on saying some pretty inflammatory and, to be honest, quite stupid things as of late. What's the matter?

A 2080ti has 134% the performance of a 5700XT. The new flagship is said to have twice the shaders, likely higher clock speeds and improved IPC. Only a pretty avid fanboy of a certain color would think that such a GPU could only muster some 30% higher performance with all that. GPUs scale very well, you can expect it to be between 170-190% the performance of a 5700XT.



Caches aren't new; caches as big as the ones rumored are a new thing. I should also point out that bandwidth and the memory hierarchy are completely hidden from the GPU cores. In other words, whether it's reading at 100 GB/s from DRAM or at 1 TB/s from a cache, the core doesn't care; as far as the GPU core is concerned, it's just operating on some memory at an address.

Rendering is also an iterative process where you need to go over the same data many times a second, if you can keep for example megabytes of vertex data in some fast memory close to the cores that's a massive win.

GPUs hide memory bottlenecks very well by scheduling hundreds of threads. Another thing you might have missed is that, over time, the ratio of DRAM GB/s per GPU core has been getting lower and lower. Yet somehow performance keeps increasing; how does that work if "bandwidth is bandwidth"?

Clearly, there are ways of increasing the efficiency of these GPUs so that they need less DRAM bandwidth to achieve the same performance, and this is another one of those ways. By your logic, we would have needed GPUs with tens of TB/s by now, because otherwise performance couldn't have gone up.
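The argument above can be put into numbers with a trivial sketch (all figures are made up for illustration, not AMD specs): DRAM only has to serve the cache misses, so a cache's hit rate directly scales down the external bandwidth the GPU actually needs.

```python
# Illustrative sketch: DRAM only serves the misses; the cache absorbs the hits.

def dram_bandwidth_needed(total_traffic_gbs, hit_rate):
    """External bandwidth DRAM must supply once the cache absorbs the hits."""
    return total_traffic_gbs * (1.0 - hit_rate)

# Suppose the shader cores generate 1 TB/s of memory traffic in total.
traffic = 1000  # GB/s
for hit in (0.0, 0.5, 0.8):
    needed = dram_bandwidth_needed(traffic, hit)
    print(f"hit rate {hit:.0%}: DRAM must supply {needed:.0f} GB/s")
```

With an 80% hit rate, a 256-bit GDDR6 setup would only need to cover a fifth of the raw traffic, which is the whole premise of the "cache replaces bandwidth" claim being debated here.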



They won't have much stock; most wafers are going to consoles.



While performance/watt must have increased massively, perhaps even over Ampere, the highest end card will still be north of 250W.
Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.

RT, where is it.

As for inflammatory... stupid... time will tell, won't it ;) Many times, today's flame in many beholders' eyes is tomorrow's reality. Overhyping AMD's next best thing is not new, and it has never EVER paid off.
Posted on Reply
#27
M2B
Valantar
That comparison is nonetheless deeply flawed. You're comparing a GCN-based console (with a crap Jaguar CPU) to a PC with an RDNA-based GPU (unknown CPU, assuming it's not Jaguar-based though) and then that again (?) to a yet to be released console with an RDNA 2 GPU and Zen2 CPU. As there are no XSX titles out yet, the only performance data we have for the latter is while running in backwards compatibility mode, which bypasses most of the architectural improvements even in RDNA 1 and delivers IPC on par with GCN. The increased CPU performance also helps many CPU-limited XOX games perform better on the XSX. In other words, you're not even comparing apples to oranges, you're comparing an apple to an orange to a genetically modified pear that tastes like an apple but only exists in a secret laboratory.

Not to mention the issues with cross-platform benchmarking due to most console titles being very locked down in terms of settings etc. Digital Foundry does an excellent job of this, but their recent XSX back compat video went to great lengths to document how and why their comparisons were problematic.
Most of what you said makes sense but it's not THAT unrealistic to compare these things.
I'm sure you've watched DF's 5700XT vs X1X video, right?

We are both aware that the X1X has a very similar GPU to the RX 580. As you can see in their comparison, in a like-for-like, GPU-limited scenario the 5700XT system performs 80 to 100% better than the console; in line with how a 5700XT performs compared to a desktop RX 580.

Now I'm not saying we can compare them exactly and extrapolate exact numbers; but we can get a decent idea.

What you said about the Series X being at GCN-level IPC when running back-compat games is honestly laughable (no offense);
you can't run a game natively on an entirely different architecture and not benefit from its low-level IPC improvements. Those are improvements that will benefit performance regardless of extra architectural enhancements.

By saying the back-compat games don't benefit from RDNA2's extra architectural benefits they didn't mean those games don't benefit from low-level architectural improvements, just that extra features of the RDNA2 (such as Variable Rate Shading) aren't utilized.
If the Series X were actually at GCN-level IPC, there is no way the XSX could straight-up double X1X performance, as a 12 TF GCN GPU like the Vega 64 barely performs 60% better than an RX 580.
Posted on Reply
#28
sergionography
Frick
It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.
It only burned people who for some reason think AMD need to have the fastest single GPU card on the market to compete. Reality is, most people will buy GPUs that cost less than 500. If I was AMD right now I'd take advantage of nvidia's desperate attempts to have that artificial fastest card in the market branding. I'd clock rdna2 in a way to maximize power efficiency and trash Nvidia for being a power hog. Ampere is worse than Fermi when it comes to being a power hog.
Posted on Reply
#29
bug
Vya Domus
... The new flagship is said to have twice the shaders, likely higher clock speeds and improved IPC...
Got a source for that?
All I have is that Navi 2 is twice as big as the 5700XT. Considering they're built on the same manufacturing process, I have a hard time imagining where everything you listed would fit, with RTRT added on top.
Posted on Reply
#30
Valantar
M2B
Most of what you said makes sense but it's not THAT unrealistic to compare these things.
I'm sure you've watched DF's 5700XT vs X1X video, right?

We are both aware that the X1X has a very similar GPU to the RX 580. As you can see in their comparison, in a like-for-like, GPU-limited scenario the 5700XT system performs 80 to 100% better than the console; in line with how a 5700XT performs compared to a desktop RX 580.

Now I'm not saying we can compare them exactly and extrapolate exact numbers; but we can get a decent idea.

What you said about the Series X being at GCN-level IPC when running back-compat games is honestly laughable (no offense);
you can't run a game natively on an entirely different architecture and not benefit from its low-level IPC improvements. Those are improvements that will benefit performance regardless of extra architectural enhancements.

By saying the back-compat games don't benefit from RDNA2's extra architectural benefits, they didn't mean those games don't benefit from low-level architectural improvements, just that extra features of RDNA2 (such as Variable Rate Shading) aren't utilized.
If the Series X were actually at GCN-level IPC, there is no way the XSX could straight-up double X1X performance, as a 12 TF GCN GPU like the Vega 64 barely performs 60% better than an RX 580.
A big part of the reason the XSX dramatically outperforms the XOX is the CPU performance improvement. You seem to be ignoring that completely.

As for the back-compat mode working as if it was GCN: AMD literally presented this when they presented RDNA1. It is by no means a console exclusive feature, it is simply down to how the GPU handles instructions. It's likely not entirely 1:1 as some low-level changes might carry over, but what AMD presented was essentially a mode where the GPU operates as if it was a GCN GPU. There's no reason to expect RDNA2 in consoles to behave differently. DF's review underscores this:
Digital Foundry
There may be some consternation that Series X back-compat isn't a cure-all for all performance issues on all games, but again, this is the GPU running in compatibility mode, where it emulates the behaviour of the last-generation Xbox: you aren't seeing the architectural improvements to performance from RDNA 2, which Microsoft says is 25 per cent to the better, teraflop to teraflop.
That is about as explicit as it gets: compatibility mode essentially nullifies the IPC (or "performance per TFlop") improvements of RDNA compared to GCN. The 25% improvement MS is talking about is the IPC improvement of RDNA vs GCN.
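A back-of-envelope check of the "performance per teraflop" reasoning above. The inputs are Microsoft's published compute figures (roughly 6.0 TFLOPS for the One X GPU and 12.15 TFLOPS for the Series X) and the 25% per-teraflop claim quoted from Digital Foundry; treat the exact numbers as approximations.

```python
# Rough sanity check: what should compat mode vs native code look like?
xox_tf, xsx_tf = 6.0, 12.15

raw_ratio = xsx_tf / xox_tf        # compat mode: GCN-like IPC, raw teraflops only
native_ratio = raw_ratio * 1.25    # native code: add the claimed 25% per teraflop

print(f"back-compat (GCN-level IPC): ~{raw_ratio:.2f}x the One X")
print(f"native RDNA 2 code:          ~{native_ratio:.2f}x the One X")
```

In other words, a straight-up doubling in back-compat mode is consistent with the raw teraflop ratio alone, with no RDNA IPC gains needed.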
Posted on Reply
#31
BoboOOZ
Vayra86
Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.
We have no idea of that, really. I'm still half expecting to find out that there is HBM or that the bus width is in fact 384-bit.

In any case, one thing I am pretty sure AMD will not do is pair a 526 mm² RDNA2 die with a memory-bandwidth-starved configuration similar to that of the 5700XT; that would definitely be stupid, even by average TPU forumite standards.
bug
Got a source for that?
All I have is that Navi2 is twice as big as 5700XT. Considering they built using the same manufacturing process, I have a hard time imagining where everything you listed would fit. With RTRT added on top.
Rumors are that there is no dedicated hardware for RT. Also, there are solid indications that the node is 7N+.
Before you dismiss Coreteks' speculation: yes, I agree his speculation is more miss than hit, but this video is a leak, not speculation.
Posted on Reply
#32
londiste
Vayra86
Cache replaces bandwidth yes.
Honest question: does it? Cache obviously helps with most compute uses, but how bandwidth-limited are, for example, textures in gaming? IIRC textures are excluded from caches on GPUs (for obvious reasons).
Posted on Reply
#33
M2B
Valantar
That 25% improvement MS is talking about is the IPC improvement of RDNA vs GCN.
Isn't that 25% number the exact same IPC improvement AMD stated for RDNA1 over GCN? If so, doesn't that validate my point that RDNA2 isn't much of an improvement over RDNA in terms of IPC?

Anyways. The new cards will be out soon enough and we'll have a better idea of how much of an improvement RDNA2 brings in terms of IPC. It will be most obvious when comparing the rumored 40CU Navi22 to the 5700XT at the same clocks.
Posted on Reply
#34
Dazzm8
RedGamingTech were the first to bring this up btw, not VideoCardz.
Posted on Reply
#35
bug
BoboOOZ
Rumors are that there is no dedicated hardware for the RT. Also, there are solid indications that the node is 7N+.
Assuming by 7N+ you mean 7FF+, the math still doesn't work out. 7FF+ brings less than 20% more density. Not enough to double the CU count and add IPC improvements, even if RTRT takes zero space. Unless AMD has found a way to improve IPC using fewer transistors.
Posted on Reply
#36
laszlo
I didn't want to jump in sooner as I had to digest a lot of info... my 2c: a large cache can drastically improve the communication between GPU and RAM even if bandwidth is 100% used; it all depends on how it is used and what is processed in the end. If the GPU can digest it all without a bottleneck, all is OK and we may see higher performance with a new type of interconnect.
Posted on Reply
#37
M2B
This topic is probably beyond the understanding of us enthusiasts, but I think extra cache can help reduce memory bandwidth requirements. It'll probably be application-dependent and not as effective at higher resolutions, where sheer throughput might matter more, but we've already seen higher-clocked GPUs needing less bandwidth than an equally powerful GPU with lower clocks and more cores,
as higher clocks directly increase the bandwidth of the caches.
Posted on Reply
#38
BoboOOZ
bug
Assuming by 7N+ you mean 7FF+, the math still doesn't work out. 7FF+ brings less than 20% more density. Not enough to double the CU count and add IPC improvements, even if RTRT takes zero space. Unless AMD has found a way to improve IPC using fewer transistors.
Well, that's a bit of napkin math, but basically, some components on the GPU are the same size no matter the SKU. For instance, the memory controller would take the same space on a 5700XT or on Navi 21 (still 256-bit).

But in any case, trying to discuss IPC based on approximate die sizes is not something I'll argue about, since it is a complex issue, but I would bet it is perfectly possible to increase IPC without adding transistors. I'm not arguing that is what will happen here.

IF there is a huge cache, it should increase IPC a lot, because there should be far fewer cache misses, i.e., less time in which processing units are just requesting/waiting for/storing data from VRAM to the cache. Remember that VRAM latency is pretty bad. On the other hand, a huge cache would also take a huge chunk of the die. But trying to speculate about these things at this point seems to me a bit of a futile exercise; there are too many unknowns.
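The "far fewer cache misses" point can be sketched with the classic rule of thumb that miss rate roughly halves when cache capacity quadruples. The base rate and sizes below are purely illustrative, nothing measured on RDNA 2.

```python
# Rule-of-thumb sketch: miss rate scales roughly with capacity**-0.5.

def estimated_miss_rate(cache_mb, base_rate=0.30, base_mb=4):
    """Power-law ("square root") rule with made-up baseline numbers."""
    return base_rate * (base_mb / cache_mb) ** 0.5

for size_mb in (4, 16, 64, 128):
    print(f"{size_mb:4d} MB cache -> ~{estimated_miss_rate(size_mb):.1%} miss rate")
```

Under this rule, a cache tens of times larger cuts VRAM round-trips severalfold, which is exactly the trade against die area being discussed.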
Posted on Reply
#39
bug
BoboOOZ
Well, that's a bit of napkin math, but basically, some components on the GPU are the same size no matter the SKU. For instance, the memory controller would take the same space on a 5700XT or on Navi 21 (still 256-bit).

But in any case, trying to discuss IPC based on approximate die sizes is not something I'll argue about, since it is a complex issue, but I would bet it is perfectly possible to increase IPC without adding transistors. I'm not arguing that is what will happen here.

IF there is a huge cache, it should increase IPC a lot, because there should be far fewer cache misses, i.e., less time in which processing units are just requesting/waiting for/storing data from VRAM to the cache. Remember that VRAM latency is pretty bad. On the other hand, a huge cache would also take a huge chunk of the die. But trying to speculate about these things at this point seems to me a bit of a futile exercise; there are too many unknowns.
Yeah, I wasn't stating any of that as fact. Just that the initial claims seem optimistic given the little we know so far.
Posted on Reply
#40
Aquinus
Resident Wat-man
Vayra86
Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.
Cache alone does not replace bandwidth, as you still have to read from system memory. More cache does mean the hit rate goes up because more data is likely to be available, but larger caches also usually mean higher latency, so it's a balancing act. This is why the memory hierarchy is a thing and why cache levels are a thing; otherwise they'd just make an absolutely huge L1 cache for everything, but it doesn't work that way. So just saying "cache replaces bandwidth" is inaccurate. It augments memory bandwidth, but a system with a very fast or very large cache can still easily be crippled by slow memory. Just saying.
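The balancing act can be put in numbers with the textbook average-memory-access-time (AMAT) model. All latencies below are made up for illustration; the point is only that a hierarchy of levels beats one giant, slow L1.

```python
# AMAT sketch: why cache levels exist instead of one enormous L1.

def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem):
    # L1 misses fall through to L2; L2 misses fall through to memory.
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem)

# One enormous L1: very high hit rate, but every access pays its slow hit time.
giant_l1 = 12 + 0.05 * 300
# Small fast L1 backed by a bigger, slower L2.
hierarchy = amat_two_level(l1_hit=4, l1_miss_rate=0.30,
                           l2_hit=12, l2_miss_rate=0.15, mem=300)
print(f"giant L1: {giant_l1:.1f} cycles, hierarchy: {hierarchy:.1f} cycles")
```

Even with a worse top-level hit rate, the hierarchy comes out ahead because the common case stays fast, which is the "balancing act" described above.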
Posted on Reply
#41
R0H1T
It's actually exactly that; you don't usually see major changes in cache structure or indeed cache sizes unless you've exhausted other avenues of increasing IPC. A fast cache hobbled by slow memory or a bad cache structure will decrease IPC; that's what happened with the *Dozers, IIRC. They had a poor memory controller and really slow L1/L2 write speeds, again IIRC. That wasn't their only drawback vs the Phenoms, but it was one of the major ones.
Posted on Reply
#42
bug
Not to mention all caches (big or small) can be thwarted by memory access patterns ;)
Posted on Reply
#43
hardcore_gamer
Vayra86
Good comedy, this

Fans desperately searching for some argument to say 256 bit GDDR6 will do anything more than hopefully get even with a 2080ti.

History repeats.

Bandwidth is bandwidth and cache is not new. Also... elephant in the room.... Nvidia needed expanded L2 Cache since Turing to cater for their new shader setup with RT/tensor in them...yeah, I really wonder what magic Navi is going to have with a similar change in cache sizes... surely they won't copy over what Nvidia has done before them like they always have right?! Surely this isn't history repeating, right? Right?!
If only those hundreds of engineers at AMD had your qualifications and your level of intellect. Obviously, they don't know what they're doing. They even managed to convince engineers at Sony and Microsoft to adopt this architecture. These companies should fire their engineering teams and hire people from the TPU forums.
Posted on Reply
#44
Punkenjoy
What I like about this news is less how large the cache is and more how the cache works.

The thing with cache is that more is not always better. A larger cache can increase latency, and sometimes doubling the cache does not mean a significant gain in hit rate. That would just end in wasted silicon.

So the fact that they are implementing a new way to handle the L1 cache is, to me, much more promising than if they had just doubled the L2 or something like that.

Note that big gains in performance will come from a better cache and memory subsystem. We are starting to hit a wall there, and getting data from fast memory costs more and more power. If your data travels less, you save a lot of energy. Doing the actual computations doesn't require that much power; it's really moving the data around that increases power consumption. So if you want an efficient architecture, you need your data to travel as short a distance as possible.

But is it enough to fight the 3080? Rumors say yes, but we will see. Many times in the past, there were architectures that had less bandwidth while still performing better, because they had a better memory subsystem. This might happen again.

If that doesn't happen, the good news is that making a 256-bit card with a 250 W TDP costs much less than making a 350 W TDP card with a larger bus. Even if AMD can't compete on pure performance, they will be able to be very competitive on pricing.

And in the end, that is what matters. I don't care if people buying a 3090 spend too much; the card is there for exactly that. But I will be very happy if the next-gen AMD cards increase performance per dollar in the $250-500 range.
Posted on Reply
#45
bug
hardcore_gamer
Only if those 100s of engineers at AMD had your qualifications and your level of intellect. Obviously, they don't know what they're doing. They even managed to convince engineers at Sony and Microsoft to adopt this architecture. These companies should fire their engineering teams and hire people from TPU forums.
Well, as an engineer myself, I can tell you my job is 100% about balancing compromises. When I pick a solution, it's not the fastest and (usually) not the cheapest. And it's almost never what I would like to pick. It's what meets the requirements and can be implemented within a given budget and time frame.

Historically, any video card with a memory bus wider than 256 bits has been expensive (not talking HBM here); that is what made 256 bits the standard for so many generations. 320 bits requires too complicated a PCB, and even more so 384 or 512 bits.
Posted on Reply
#46
Jism
john_
I don't think cache can replace bandwidth, especially when games ask for more and more VRAM. I might be looking at it the wrong way, and the next example could be wrong, but hybrid HDDs NEVER performed like real SSDs.

I am keeping my expectations really low after reading about that 256bit data bus.
Well, going hybrid has a few key advantages. Data that's accessed frequently will be delivered much faster, while data that's accessed infrequently and has to be fetched from memory obviously carries a small performance penalty. Second, using a cache like that you can save on memory bus width and thus lower the power required to run a 312/512-bit-wide bus. And considering that both consoles, the PS5 and Xbox, carry Navi hardware, devs might finally learn how to properly extract the performance AMD hardware really has.

Even if it's GDDR6 with a small bus, big gains could come from low-latency GDDR6. If I recall correctly, applying the Ubermix 3.1 timings to a Polaris card (which is basically the 1666 MHz strap/timings applied to 2000 MHz memory) yielded better results than simply overclocking the memory.

It's all speculation; what matters is the card reaching 3080 territory or above, and then AMD has a winner. Simple as that.
Posted on Reply
#47
mechtech
"Highly Hyped" ?? I must be living under a rock, I haven't seen much news on it. I recall seeing more stuff on Ampere over the past several months compared to RDNA 2.
Posted on Reply
#48
gruffi
Frick
It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.
Why do people like to poke around in the past? That should never be a valid argument; things can always change for the good or the bad. Or did you expect the Ampere launch to be such a mess? Just concentrate on the facts and do the math. Big Navi will have twice the CUs of Navi 10 (80 vs 40), higher IPC per CU (10-15%?), and higher game clocks (>2 GHz vs 1.75 GHz). Even without perfect scaling, it shouldn't be hard to see that Big Navi could be 80-100% faster than Navi 10. What about power consumption? Navi 10 has a TDP of 225 W; Big Navi is rumored to have up to a 300 W TDP. That's 33% more. Combined with AMD's claimed 50% power-efficiency improvement for RDNA 2, that means it can be twice as fast overall. To sum it up, Big Navi has everything it needs to be twice as fast as Navi 10, or at least close to that, 1.8-1.9x. And some people still think it will only be at 2080 Ti level, which is ~40-50% faster than Navi 10.
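The arithmetic above can be checked on a napkin. Every input here is a rumor quoted in the post (CU count, clocks, TDP, the 50% perf/W claim), not a confirmed spec:

```python
# Napkin math for the rumored Big Navi scaling, all inputs unconfirmed.

cu_ratio    = 80 / 40        # twice the compute units
ipc_gain    = 1.10           # assumed ~10% IPC gain per CU
clock_ratio = 2.0 / 1.75     # >2 GHz vs ~1.75 GHz game clock

ideal_scaling = cu_ratio * ipc_gain * clock_ratio
print(f"perfect scaling: ~{ideal_scaling:.2f}x Navi 10")

# Cross-check via power: 33% more board power times a claimed 50%
# perf/W improvement lands right at 2x.
power_estimate = (300 / 225) * 1.5
print(f"power-based estimate: ~{power_estimate:.2f}x Navi 10")
```

Both routes land around or above 2x with perfect scaling, consistent with the 1.8-1.9x the post argues for once imperfect scaling is factored in.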
Posted on Reply
#49
BoboOOZ
Punkenjoy
But it is enough to fight the 3080? rumors say yes but we will see. But many time in the past, there were architecture that had less bandwidth while still performing better because they had a better memory subsystem. This might happen again.
There's raw performance and there's processing performance, and they're not the same thing. I don't know if anybody remembers the Kyro GPU; it was a while ago. It basically went toe to toe with Nvidia and ATI with less than half the bandwidth by using HSR and tile-based rendering.
gruffi
Why do people like to poke around in the past?
History is good science; the problem with most TPU users is that they only go back two generations, which is not much history if you ask me.
Posted on Reply
#50
efikkan
Guys, please, if you mean performance per clock, then say performance per clock. Don't use big words like "IPC" if you don't know what the technical term actually means. IPC is only relevant when comparing CPUs running the same ISA and workload for a single thread, while GPUs issue varying instructions across varying numbers of threads based on the GPU configuration, even within the same architecture.
Posted on Reply