
AMD Big Navi GPU Features Infinity Cache?

Joined
May 2, 2017
Messages
7,762 (3.08/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
I agree, that part of my comment was a bit confusing, but I didn't mean the X1X has RDNA.
Just that the real-world performance increase didn't suggest higher IPC than RDNA1 to me, based on how RDNA performs in comparison to the console.
That comparison is nonetheless deeply flawed. You're comparing a GCN-based console (with a crap Jaguar CPU) to a PC with an RDNA-based GPU (unknown CPU, assuming it's not Jaguar-based though) and then that again (?) to a yet to be released console with an RDNA 2 GPU and Zen2 CPU. As there are no XSX titles out yet, the only performance data we have for the latter is while running in backwards compatibility mode, which bypasses most of the architectural improvements even in RDNA 1 and delivers IPC on par with GCN. The increased CPU performance also helps many CPU-limited XOX games perform better on the XSX. In other words, you're not even comparing apples to oranges, you're comparing an apple to an orange to a genetically modified pear that tastes like an apple but only exists in a secret laboratory.

Not to mention the issues with cross-platform benchmarking due to most console titles being very locked down in terms of settings etc. Digital Foundry does an excellent job of this, but their recent XSX back compat video went to great lengths to document how and why their comparisons were problematic.
 
Joined
Sep 17, 2014
Messages
20,776 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
I've noticed you've been quite dead set on saying some pretty inflammatory and, to be honest, quite stupid things as of late. What's the matter?

A 2080 Ti has 134% the performance of a 5700XT. The new flagship is said to have twice the shaders, likely higher clock speeds, and improved IPC. Only a pretty avid fanboy of a certain color would think that such a GPU could only muster some 30% higher performance with all that. GPUs scale very well; you can expect it to land between 170% and 190% of the performance of a 5700XT.
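As a rough sketch of that napkin math (the 134% figure and the twice-the-shaders rumor are from the post; the clock, IPC and scaling factors are illustrative assumptions, nothing more):

```python
# Illustrative napkin math only; the clock/IPC/scaling factors are assumptions, not benchmarks.
rtx2080ti_vs_5700xt = 1.34   # "134% the performance of a 5700XT"

shader_ratio = 2.0           # rumored: twice the shaders
clock_ratio = 1.05           # "likely higher clock speeds" (assumed +5%)
ipc_ratio = 1.10             # "improved IPC" (assumed +10%)
scaling_efficiency = 0.85    # shader counts rarely scale perfectly (assumed)

big_navi_vs_5700xt = shader_ratio * scaling_efficiency * clock_ratio * ipc_ratio
print(f"~{big_navi_vs_5700xt:.2f}x a 5700XT vs {rtx2080ti_vs_5700xt}x for a 2080 Ti")
# -> ~1.96x, i.e. in the same ballpark as the 170-190% claim and well past a 2080 Ti
```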



Caches aren't new; caches as big as the ones rumored are. I should also point out that bandwidth and the memory hierarchy are completely hidden from the GPU cores. In other words, whether it's reading at 100 GB/s from DRAM or at 1 TB/s from a cache, the core doesn't care; as far as it is concerned, it's just operating on some memory at an address.

Rendering is also an iterative process where you need to go over the same data many times a second. If you can keep, for example, megabytes of vertex data in some fast memory close to the cores, that's a massive win.

GPUs hide memory bottlenecks very well by scheduling hundreds of threads. Another thing you might have missed is that, over time, the ratio of DRAM GB/s per GPU core has been getting lower and lower. And somehow performance keeps increasing. How does that work if "bandwidth is bandwidth"?

Clearly, there are ways of increasing the efficiency of these GPUs so that they need less DRAM bandwidth to achieve the same performance; this is another one of those ways. By your logic, we would have needed GPUs with tens of TB/s by now, because otherwise performance couldn't have gone up.
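To put that in numbers, a minimal sketch of how a cache cuts the DRAM bandwidth a GPU actually needs; the request rate and hit rates are made-up illustrative values:

```python
# Demand-side sketch: every cache hit is traffic that never touches DRAM.
# All figures are illustrative assumptions, not measurements of any real GPU.
requested_bandwidth = 1000  # GB/s of data the shader cores want to touch each second

for hit_rate in (0.0, 0.5, 0.8, 0.95):
    dram_traffic = requested_bandwidth * (1 - hit_rate)  # only misses go out to DRAM
    print(f"hit rate {hit_rate:.0%}: ~{dram_traffic:.0f} GB/s must come from DRAM")
```

The higher the hit rate a big on-die cache can sustain, the less external bandwidth per core the GPU needs, which is exactly the trend described above.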



They won't have much stock; most wafers are going to consoles.



While performance/watt must have increased massively, perhaps even over Ampere, the highest end card will still be north of 250W.

Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.

RT, where is it?

As for inflammatory... stupid... time will tell, won't it ;) Many times, today's flame in many beholders' eyes is tomorrow's reality. Overhyping AMD's next best thing is not new, and it has never EVER paid off.
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.11/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
That comparison is nonetheless deeply flawed. You're comparing a GCN-based console (with a crap Jaguar CPU) to a PC with an RDNA-based GPU (unknown CPU, assuming it's not Jaguar-based though) and then that again (?) to a yet to be released console with an RDNA 2 GPU and Zen2 CPU. As there are no XSX titles out yet, the only performance data we have for the latter is while running in backwards compatibility mode, which bypasses most of the architectural improvements even in RDNA 1 and delivers IPC on par with GCN. The increased CPU performance also helps many CPU-limited XOX games perform better on the XSX. In other words, you're not even comparing apples to oranges, you're comparing an apple to an orange to a genetically modified pear that tastes like an apple but only exists in a secret laboratory.

Not to mention the issues with cross-platform benchmarking due to most console titles being very locked down in terms of settings etc. Digital Foundry does an excellent job of this, but their recent XSX back compat video went to great lengths to document how and why their comparisons were problematic.

Most of what you said makes sense, but it's not THAT unrealistic to compare these things.
I'm sure you've watched DF's 5700XT vs X1X video, right?

We are both aware that the X1X has a GPU very similar to the RX 580. As you can see in their like-for-like, GPU-limited comparison, the 5700XT system performs 80 to 100% better than the console, in line with how a 5700XT performs compared to a desktop RX 580.

Now, I'm not saying we can compare them exactly and extrapolate exact numbers, but we can get a decent idea.

What you said about the Series X being at GCN-level IPC when running back-compat games is honestly laughable (no offense). You can't run a game natively on an entirely different architecture and not benefit from its low-level IPC improvements; those will help performance regardless of whether the extra architectural features are used.

By saying the back-compat games don't benefit from RDNA2's extra architectural benefits, they didn't mean those games don't benefit from low-level architectural improvements, just that the extra features of RDNA2 (such as Variable Rate Shading) aren't utilized.
If the Series X were actually at GCN-level IPC, there would be no way the XSX could straight-up double the X1X's performance, given that a 12TF GCN GPU like the Vega 64 barely performs 60% better than an RX 580.
 
Joined
Feb 13, 2012
Messages
522 (0.12/day)
It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.

It only burned people who for some reason think AMD needs to have the fastest single-GPU card on the market to compete. The reality is, most people will buy GPUs that cost less than $500. If I were AMD right now, I'd take advantage of Nvidia's desperate attempts to keep that artificial "fastest card on the market" branding. I'd clock RDNA2 in a way that maximizes power efficiency and trash Nvidia for being a power hog. Ampere is worse than Fermi when it comes to being a power hog.
 

bug

Joined
May 22, 2015
Messages
13,161 (4.07/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
... The new flagship is said to have twice the shaders, likely higher clock speeds and improved IPC...

Got a source for that?
All I have is that Navi2 is twice as big as the 5700XT. Considering it's built on the same manufacturing process, I have a hard time imagining where everything you listed would fit, with RTRT added on top.
 
Joined
May 2, 2017
Messages
7,762 (3.08/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Most of what you said makes sense, but it's not THAT unrealistic to compare these things.
I'm sure you've watched DF's 5700XT vs X1X video, right?

We are both aware that the X1X has a GPU very similar to the RX 580. As you can see in their like-for-like, GPU-limited comparison, the 5700XT system performs 80 to 100% better than the console, in line with how a 5700XT performs compared to a desktop RX 580.

Now, I'm not saying we can compare them exactly and extrapolate exact numbers, but we can get a decent idea.

What you said about the Series X being at GCN-level IPC when running back-compat games is honestly laughable (no offense). You can't run a game natively on an entirely different architecture and not benefit from its low-level IPC improvements; those will help performance regardless of whether the extra architectural features are used.

By saying the back-compat games don't benefit from RDNA2's extra architectural benefits, they didn't mean those games don't benefit from low-level architectural improvements, just that the extra features of RDNA2 (such as Variable Rate Shading) aren't utilized.
If the Series X were actually at GCN-level IPC, there would be no way the XSX could straight-up double the X1X's performance, given that a 12TF GCN GPU like the Vega 64 barely performs 60% better than an RX 580.
A big part of the reason the XSX dramatically outperforms the XOX is the CPU performance improvement. You seem to be ignoring that completely.

As for the back-compat mode working as if it were GCN: AMD literally presented this when they presented RDNA1. It is by no means a console-exclusive feature; it is simply down to how the GPU handles instructions. It's likely not entirely 1:1, as some low-level changes might carry over, but what AMD presented was essentially a mode where the GPU operates as if it were a GCN GPU. There's no reason to expect RDNA2 in consoles to behave differently. DF's review underscores this:
Digital Foundry said:
There may be some consternation that Series X back-compat isn't a cure-all to all performance issues on all games, but again, this is the GPU running in compatibility mode, where it emulates the behaviour of the last generation Xbox - you aren't seeing the architectural improvements to performance from RDNA 2, which Microsoft says is 25 per cent to the better, teraflop to teraflop.
That is about as explicit as it gets: compatibility mode essentially nullifies the IPC (or "performance per TFlop") improvements of RDNA compared to GCN. That 25% improvement MS is talking about is the IPC improvement of RDNA vs GCN.
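To make the per-teraflop point concrete, a minimal sketch: the 25% factor is the figure quoted above, while the TFLOP numbers are placeholder values for an XOX-class and XSX-class GPU, not official specs:

```python
# Sketch of "25 per cent better, teraflop to teraflop"; TFLOP values are illustrative placeholders.
gcn_tflops = 6.0        # an XOX-class GCN GPU (placeholder)
rdna2_tflops = 12.0     # an XSX-class RDNA 2 GPU (placeholder)
per_tflop_gain = 1.25   # the quoted 25% improvement, teraflop to teraflop

back_compat_ratio = rdna2_tflops / gcn_tflops       # roughly what compatibility mode exposes
native_ratio = back_compat_ratio * per_tflop_gain   # what a native RDNA 2 title could see
print(back_compat_ratio, native_ratio)              # 2.0 vs 2.5
```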
 
Joined
May 15, 2020
Messages
697 (0.49/day)
Location
France
System Name Home
Processor Ryzen 3600X
Motherboard MSI Tomahawk 450 MAX
Cooling Noctua NH-U14S
Memory 16GB Crucial Ballistix 3600 MHz DDR4 CAS 16
Video Card(s) MSI RX 5700XT EVOKE OC
Storage Samsung 970 PRO 512 GB
Display(s) ASUS VA326HR + MSI Optix G24C4
Case MSI - MAG Forge 100M
Power Supply Aerocool Lux RGB M 650W
Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.
We have no idea of that, really. I'm still half expecting to find out that there is HBM, or that the bus width is in fact 384-bit.

In any case, one thing I am pretty sure AMD will not do is pair a 526 mm² RDNA2 die with a bandwidth-starved memory configuration similar to that of the 5700XT; that would definitely be stupid, even by the average TPU forumite's standards.

Got a source for that?
All I have is that Navi2 is twice as big as 5700XT. Considering they built using the same manufacturing process, I have a hard time imagining where everything you listed would fit. With RTRT added on top.
Rumors are that there is no dedicated hardware for RT. Also, there are solid indications that the node is 7N+.
Before you dismiss Coreteks' speculation: yes, I agree his speculations are more miss than hit, but this video is a leak, not speculation.
 
Joined
Feb 3, 2017
Messages
3,475 (1.33/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) EVGA Geforce RTX 3080 XC3
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
VR HMD HTC Vive
Cache replaces bandwidth yes.
Honest question - does it? Cache obviously helps with most compute uses, but how bandwidth-limited are, for example, textures in gaming? IIRC textures are excluded from caches on GPUs (for obvious reasons).
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.11/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
That 25% improvement MS is talking about is the IPC improvement of RDNA vs GCN.

Isn't that 25% number the exact same IPC improvement AMD stated for RDNA1 over GCN? If so, doesn't that validate my point about RDNA2 not being that much of an IPC improvement over RDNA?

Anyways. The new cards will be out soon enough and we'll have a better idea of how much of an improvement RDNA2 brings in terms of IPC. It will be most obvious when comparing the rumored 40CU Navi22 to the 5700XT at the same clocks.
 

bug

Joined
May 22, 2015
Messages
13,161 (4.07/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Rumors are that there is no dedicated hardware for RT. Also, there are solid indications that the node is 7N+.
Assuming by 7N+ you mean 7FF+, the math still doesn't work out. 7FF+ brings less than 20% more density. Not enough to double the CU count and add IPC improvements, even if RTRT takes zero space. Unless AMD has found a way to improve IPC using fewer transistors.
 
Joined
Jan 11, 2005
Messages
1,491 (0.21/day)
Location
66 feet from the ground
System Name 2nd AMD puppy
Processor FX-8350 vishera
Motherboard Gigabyte GA-970A-UD3
Cooling Cooler Master Hyper TX2
Memory 16 Gb DDR3:8GB Kingston HyperX Beast + 8Gb G.Skill Sniper(by courtesy of tabascosauz &TPU)
Video Card(s) Sapphire RX 580 Nitro+;1450/2000 Mhz
Storage SSD :840 pro 128 Gb;Iridium pro 240Gb ; HDD 2xWD-1Tb
Display(s) Benq XL2730Z 144 Hz freesync
Case NZXT 820 PHANTOM
Audio Device(s) Audigy SE with Logitech Z-5500
Power Supply Riotoro Enigma G2 850W
Mouse Razer copperhead / Gamdias zeus (by courtesy of sneekypeet & TPU)
Keyboard MS Sidewinder x4
Software win10 64bit ltsc
Benchmark Scores irrelevant for me
I didn't want to jump in sooner as I had to digest a lot of info... my 2c... a large cache can drastically improve the communication between GPU and RAM even if bandwidth is 100% used; it all depends on how it is used and what is processed in the end. If the GPU can digest it all without a bottleneck, all is well, and we may see higher performance with a new type of interconnect.
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.11/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
This topic is probably beyond the understanding of us enthusiasts, but I think extra cache can help reduce memory bandwidth requirements. It'll probably be application-dependent and not as effective at higher resolutions, where sheer throughput might matter more, but we've already seen higher-clocked GPUs needing less bandwidth than an equally powerful GPU with lower clocks and more cores, since higher clocks directly increase the bandwidth of the caches.
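A minimal sketch of that last point; the per-clock cache width and the CU count are assumed, illustrative values, not any specific GPU's figures:

```python
# On-chip cache bandwidth scales linearly with core clock; all numbers below are assumptions.
bytes_per_clock = 64 * 40      # e.g. 64 B/clk of cache throughput per CU across 40 CUs (assumed)

for clock_ghz in (1.6, 1.9, 2.2):
    cache_bw_gbs = bytes_per_clock * clock_ghz   # bytes/clock * GHz = GB/s
    print(f"{clock_ghz} GHz -> ~{cache_bw_gbs:.0f} GB/s of aggregate cache bandwidth")
```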
 
Joined
May 15, 2020
Messages
697 (0.49/day)
Location
France
System Name Home
Processor Ryzen 3600X
Motherboard MSI Tomahawk 450 MAX
Cooling Noctua NH-U14S
Memory 16GB Crucial Ballistix 3600 MHz DDR4 CAS 16
Video Card(s) MSI RX 5700XT EVOKE OC
Storage Samsung 970 PRO 512 GB
Display(s) ASUS VA326HR + MSI Optix G24C4
Case MSI - MAG Forge 100M
Power Supply Aerocool Lux RGB M 650W
Assuming by 7N+ you mean 7FF+, the math still doesn't work out. 7FF+ brings less than 20% more density. Not enough to double the CU count and add IPC improvements, even if RTRT takes zero space. Unless AMD has found a way to improve IPC using fewer transistors.
Well, that's a bit of napkin math, but basically, some components on the GPU are the same size no matter what SKU. For instance, the memory controller would take the same space on a 5700XT or on Navi 21 (still 256 bit).

But in any case, trying to discuss IPC based on approximate die sizes is not something I want to argue about, since it is a complex issue; I would bet it is perfectly possible to increase IPC without adding transistors, though I'm not arguing that is what will happen here.

IF there is a huge cache, that should increase IPC a lot, because there should be far fewer cache misses, i.e., less time in which processing units are just requesting/waiting/storing data between VRAM and the cache. Remember that VRAM latency is pretty bad. On the other hand, a huge cache would also take a huge chunk of the die. But trying to speculate about these things at this point seems to me a bit of a futile exercise; there are too many unknowns.
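A minimal sketch of the latency side of that argument, using the standard average-memory-access-time formula; the hit rates and cycle counts are illustrative assumptions, not RDNA2 figures:

```python
# Average memory access time (AMAT); all latencies and hit rates below are assumed, illustrative values.
def amat(hit_rate, cache_latency, dram_latency):
    """Average latency the compute units see, in cycles."""
    return hit_rate * cache_latency + (1 - hit_rate) * dram_latency

small_cache = amat(hit_rate=0.60, cache_latency=30, dram_latency=400)
large_cache = amat(hit_rate=0.90, cache_latency=60, dram_latency=400)  # bigger cache: slower to access, far fewer misses
print(small_cache, large_cache)  # 178.0 vs 94.0 cycles
```

Even though the bigger cache is assumed to be slower to access, the drop in misses to high-latency VRAM roughly halves the average latency in this toy example, which is the mechanism behind the "fewer cache misses, higher IPC" point above.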
 

bug

Joined
May 22, 2015
Messages
13,161 (4.07/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Well, that's a bit of napkin math, but basically, some components on the GPU are the same size no matter what SKU. For instance, the memory controller would take the same space on a 5700XT or on Navi 21 (still 256 bit).

But in any case, trying to discuss IPC based on approximate die sizes is not something I want to argue about, since it is a complex issue; I would bet it is perfectly possible to increase IPC without adding transistors, though I'm not arguing that is what will happen here.

IF there is a huge cache, that should increase IPC a lot, because there should be far fewer cache misses, i.e., less time in which processing units are just requesting/waiting/storing data between VRAM and the cache. Remember that VRAM latency is pretty bad. On the other hand, a huge cache would also take a huge chunk of the die. But trying to speculate about these things at this point seems to me a bit of a futile exercise; there are too many unknowns.
Yeah, I wasn't stating any of that as fact. Just that the initial claims seem optimistic given the little we know so far.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,147 (2.96/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
Cache replaces bandwidth yes. Now, please do touch on the elephant in the room, because your selective quoting doesn't help you see things straight.
Cache alone does not replace bandwidth, as you still have to read from system memory. More cache does mean the number of hits goes up, because more data is likely to be available, but larger caches also usually mean that latency goes up as well, so it's a balancing act. This is why the memory hierarchy and cache levels are a thing; otherwise they'd just make an absolutely huge L1 cache for everything, but it doesn't work that way. So just saying "cache replaces bandwidth" is inaccurate. It augments memory bandwidth, but a system with a very fast or very large cache can still easily be crippled by slow memory. Just saying.
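As a minimal sketch of that "augments, but doesn't replace" point: when each byte is served either from cache or from DRAM, effective bandwidth is a harmonic mix of the two, and a streaming workload that mostly misses is still limited by DRAM. The cache bandwidth figure below is an assumption; 448 GB/s is typical of a 256-bit, 14 Gbps GDDR6 setup:

```python
# Supply-side sketch: effective bandwidth when hits come from cache and misses from DRAM.
def effective_bw(hit_rate, cache_bw, dram_bw):
    return 1 / (hit_rate / cache_bw + (1 - hit_rate) / dram_bw)

print(effective_bw(0.95, cache_bw=2000, dram_bw=448))  # lots of reuse: ~1700 GB/s, the cache carries it
print(effective_bw(0.10, cache_bw=2000, dram_bw=448))  # streaming, mostly misses: ~486 GB/s, DRAM is still the limit
```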
 
Joined
Apr 12, 2013
Messages
6,728 (1.68/day)
It's actually exactly that; you don't usually see major changes in cache structure, or indeed cache sizes, unless you've exhausted other avenues of increasing IPC. A fast cache hobbled by slow memory or a bad cache structure will decrease IPC; that's what happened with the *Dozers, IIRC. They had a poor memory controller and really slow L1/L2 write speeds, again IIRC. That wasn't the only drawback vs the Phenoms, but it was one of the major ones.
 

bug

Joined
May 22, 2015
Messages
13,161 (4.07/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Not to mention all caches (big or small) can be thwarted by memory access patterns ;)
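A toy illustration of that point: the same amount of data, touched in a cache-unfriendly pattern, can miss almost every access. This is a simple direct-mapped cache model, not a representation of any real GPU cache:

```python
# Toy direct-mapped cache model; purely illustrative, not a GPU cache.
def hit_rate(addresses, num_lines=256, line_size=64):
    cache = [None] * num_lines              # one stored line tag per set
    hits = 0
    for addr in addresses:
        line = addr // line_size
        idx = line % num_lines
        if cache[idx] == line:
            hits += 1
        else:
            cache[idx] = line               # miss: fetch and replace
    return hits / len(addresses)

sequential = [i * 4 for i in range(100_000)]                    # streaming through an array
strided = [(i * 16384) % (1 << 24) for i in range(100_000)]     # large power-of-two stride
print(hit_rate(sequential), hit_rate(strided))                  # ~0.94 vs 0.0
```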
 
Joined
Jan 25, 2011
Messages
531 (0.11/day)
Location
Inside a mini ITX
System Name ITX Desktop
Processor Core i7 9700K
Motherboard Gigabyte Aorus Pro WiFi Z390
Cooling Arctic esports 34 duo.
Memory Corsair Vengeance LPX 16GB 3000MHz
Video Card(s) Gigabyte GeForce RTX 2070 Gaming OC White PRO
Storage Samsung 970 EVO Plus | Intel SSD 660p
Case NZXT H200
Power Supply Corsair CX Series 750 Watt
Good comedy, this

Fans desperately searching for some argument to say 256 bit GDDR6 will do anything more than hopefully get even with a 2080ti.

History repeats.

Bandwidth is bandwidth and cache is not new. Also... elephant in the room.... Nvidia needed expanded L2 Cache since Turing to cater for their new shader setup with RT/tensor in them...yeah, I really wonder what magic Navi is going to have with a similar change in cache sizes... surely they won't copy over what Nvidia has done before them like they always have right?! Surely this isn't history repeating, right? Right?!

If only those hundreds of engineers at AMD had your qualifications and your level of intellect. Obviously, they don't know what they're doing. They even managed to convince the engineers at Sony and Microsoft to adopt this architecture. These companies should fire their engineering teams and hire people from the TPU forums.
 
Joined
Oct 12, 2005
Messages
681 (0.10/day)
What I like about this news is more how the cache works than how large it is.

The thing with cache is that more is not always better. A larger cache can increase latency, and sometimes doubling the cache does not mean a significant gain in hit rate; that would just end up as wasted silicon.

So the fact that they are implementing a new way to handle the L1 cache is, to me, much more promising than if they had just doubled the L2 or something like that.

Note that the big gains in performance will come from a better cache and memory subsystem. We are starting to hit a wall there, and getting data from fast memory just costs more and more power. If your data travels less, you save a lot of energy. Doing the actual computations doesn't require that much power; it's really moving the data around that increases power consumption. So if you want an efficient architecture, you need your data to travel as short a distance as possible.

But is it enough to fight the 3080? Rumors say yes, but we will see. Many times in the past there were architectures that had less bandwidth while still performing better, because they had a better memory subsystem. This might happen again.

If that doesn't happen, the good news is that making a 256-bit, 250W card costs much less than making a 350W card with a larger bus. Even if AMD can't compete on pure performance, they will be able to be very competitive on pricing.

And in the end, that is what matters. I don't care if people buying a 3090 spend too much; the card is there exactly for that. But I will be very happy if the next-gen AMD cards improve performance per dollar in the $250-500 range.
 

bug

Joined
May 22, 2015
Messages
13,161 (4.07/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
If only those hundreds of engineers at AMD had your qualifications and your level of intellect. Obviously, they don't know what they're doing. They even managed to convince the engineers at Sony and Microsoft to adopt this architecture. These companies should fire their engineering teams and hire people from the TPU forums.
Well, as an engineer myself, I can tell you my job is 100% about balancing compromises. When I pick a solution, it's not the fastest and (usually) not the cheapest. And it's almost never what I would like to pick. It's what meets the requirements and can be implemented within a given budget and time frame.

Historically, any video card with a memory bus wider than 256 bits has been expensive (not talking HBM here); that is what made 256 bits the standard for so many generations. 320 bits requires too complicated a PCB, and 384 or 512 bits even more so.
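For context on what those bus widths buy you, a minimal sketch of peak GDDR6 bandwidth versus bus width; 14 Gbps per pin is a common GDDR6 data rate, used here purely as an illustrative assumption:

```python
# Peak memory bandwidth scales linearly with bus width; 14 Gbps/pin is an assumed, typical GDDR6 rate.
data_rate_gbps_per_pin = 14

for bus_width_bits in (256, 320, 384, 512):
    peak_gbs = bus_width_bits * data_rate_gbps_per_pin / 8   # Gbit/s across the bus -> GB/s
    print(f"{bus_width_bits}-bit: ~{peak_gbs:.0f} GB/s peak")
```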
 
Joined
Dec 30, 2010
Messages
2,082 (0.43/day)
I don't think cache can replace bandwidth, especially when games ask for more and more VRAM. I might be looking at it the wrong way and the next example could be wrong, but hybrid HDDs NEVER performed like real SSDs.

I am keeping my expectations really low after reading about that 256-bit data bus.

Well, going hybrid has a few key advantages. Data that's accessed frequently gets delivered much faster, while data that's accessed infrequently and has to be fetched from memory obviously carries a small performance penalty. Second, using a cache like that you can actually save on memory bus width and avoid the power cost of running a 320- or 512-bit-wide bus. And considering both the PS5 and the Xbox carry Navi hardware, it's possible that devs will finally learn how to properly extract the performance AMD hardware really has.

Even with a small bus, big gains could be had by going with low-latency GDDR6. If I recall correctly, applying the Ubermix 3.1 timings to a Polaris card (which is basically the 1666 MHz strap/timings applied to 2000 MHz memory) yielded better results than simply overclocking the memory.

It's all speculation; what matters is the card being in 3080 territory or above, and then AMD has a winner. Simple as that.
 
Joined
Dec 26, 2006
Messages
3,470 (0.55/day)
Location
Northern Ontario Canada
Processor Ryzen 5700x
Motherboard Gigabyte X570S Aero G R1.1 BiosF5g
Cooling Noctua NH-C12P SE14 w/ NF-A15 HS-PWM Fan 1500rpm
Memory Micron DDR4-3200 2x32GB D.S. D.R. (CT2K32G4DFD832A)
Video Card(s) AMD RX 6800 - Asus Tuf
Storage Kingston KC3000 1TB & 2TB & 4TB Corsair LPX
Display(s) LG 27UL550-W (27" 4k)
Case Be Quiet Pure Base 600 (no window)
Audio Device(s) Realtek ALC1220-VB
Power Supply SuperFlower Leadex V Gold Pro 850W ATX Ver2.52
Mouse Mionix Naos Pro
Keyboard Corsair Strafe with browns
Software W10 22H2 Pro x64
"Highly Hyped" ?? I must be living under a rock, I haven't seen much news on it. I recall seeing more stuff on Ampere over the past several months compared to RDNA 2.
 
Joined
Oct 6, 2020
Messages
33 (0.03/day)
It's less about being stupid and more about managing expectations. High tier AMD cards have burned people in the past because they expected too much. The only sensible thing to do is to wait for reviews.
Why do people like to poke around in the past? That should never, ever be a valid argument. Things can always change for the better or the worse. Or did you expect the Ampere launch to be such a mess? Just concentrate on the facts and do the math. Big Navi will have twice the CUs of Navi 10 (80 vs 40), higher IPC per CU (10-15%?) and higher gaming clock speeds (1.75 vs >2 GHz). Even without perfect scaling it shouldn't be hard to see that Big Navi could be 80-100% faster than Navi 10. What about power consumption? Navi 10 has a TDP of 225W; Big Navi is rumored to have up to a 300W TDP. That's 33.33% more. Combined with AMD's claimed 50% performance-per-watt improvement for RDNA 2, that means it can be roughly twice as fast (1.33 × 1.5 ≈ 2). To sum it up, Big Navi has everything it needs to be twice as fast as Navi 10, or at least close to that, 1.8-1.9x. And some people still think it will only be at 2080 Ti level, which is ~40-50% faster than Navi 10.
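Spelling out that napkin math (every input below is a rumor or claim from the post above, plus one added imperfect-scaling assumption):

```python
# The post's napkin math; all inputs are rumors/claims, the scaling factor is an added assumption.
cu_ratio = 80 / 40            # Big Navi vs Navi 10 CUs
clock_ratio = 2.0 / 1.75      # ">2 GHz" gaming clock vs ~1.75 GHz
ipc_ratio = 1.10              # "10-15%?" higher IPC per CU (low end)
scaling = 0.80                # imperfect scaling with CU count (assumed)
print(f"from CUs/clocks/IPC: ~{cu_ratio * clock_ratio * ipc_ratio * scaling:.2f}x Navi 10")   # ~2.01x

power_ratio = 300 / 225       # rumored 300W TDP vs Navi 10's 225W
perf_per_watt_gain = 1.5      # AMD's claimed +50% performance per watt for RDNA 2
print(f"from power x perf/W: ~{power_ratio * perf_per_watt_gain:.2f}x Navi 10")               # ~2.00x
```

Both routes land near the same ~2x figure the post arrives at.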
 
Joined
May 15, 2020
Messages
697 (0.49/day)
Location
France
System Name Home
Processor Ryzen 3600X
Motherboard MSI Tomahawk 450 MAX
Cooling Noctua NH-U14S
Memory 16GB Crucial Ballistix 3600 MHz DDR4 CAS 16
Video Card(s) MSI RX 5700XT EVOKE OC
Storage Samsung 970 PRO 512 GB
Display(s) ASUS VA326HR + MSI Optix G24C4
Case MSI - MAG Forge 100M
Power Supply Aerocool Lux RGB M 650W
But is it enough to fight the 3080? Rumors say yes, but we will see. Many times in the past there were architectures that had less bandwidth while still performing better, because they had a better memory subsystem. This might happen again.
There's raw performance and there's processing performance, and they're not the same thing. I don't know if anybody remembers the Kyro GPU; it was a while ago, but it basically went toe to toe with Nvidia and ATI on less than half the bandwidth by using HSR and tile-based rendering.

Why do people like to poke around in the past?

History is good science; the problem with most TPU users is that they only go back two generations, which is not much history if you ask me.
 