Tuesday, October 6th 2020

AMD Big Navi GPU Features Infinity Cache?

As we near the launch of AMD's highly hyped, next-generation RDNA 2 GPU codenamed "Big Navi", more details are emerging and crawling their way to us. We have already seen rumors suggesting that this card is supposedly going to be called the AMD Radeon RX 6900 and that it is going to be AMD's top offering. Using a 256-bit bus with 16 GB of GDDR6 memory, the GPU will not use any type of HBM memory, which has historically been rather pricey. Instead, it looks like AMD will compensate for the smaller bus with a new technology it has developed. Thanks to new findings on the Justia Trademarks website by @momomo_us, we have information about the alleged "Infinity Cache" technology the new GPU uses.

VideoCardz reports that the internal name for this technology is not Infinity Cache; however, it seems that AMD could have changed it recently. What does it do exactly, you might wonder? Well, it is a bit of a mystery for now. It could be a new cache technology that allows L1 GPU cache sharing across the cores, or some connection between the caches found across the whole GPU. This information should be taken with a grain of salt, as we are yet to see what this technology does and how it works when AMD announces its new GPU on October 28th.
Source: VideoCardz

141 Comments on AMD Big Navi GPU Features Infinity Cache?

#126
Vayra86
gruffi
How about answering my question first? I'm still missing that one.
That was the answer to your question ;)
gruffi
No. I said what I said. I never categorized anything as good or bad. That was just you. But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then. I don't know. The first major update was Polaris. And that was ~4.5 years after the first GCN generation. Which simply was too late. At that time Nvidia had already made significant progress with Maxwell and Pascal. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release. But over the years there was no real improvement. Only higher clock speed and higher power consumption. OTOH AMD made significant progress with every new full Zen generation.
The first update was Tonga (the R9 285, was it?) and it failed miserably; then they tried Fury X. Then came Polaris.

None of it was a serious move towards anything with a future; it was clearly grasping at straws, as Hawaii XT already ran into the limits of what GCN could push ahead. They had a memory efficiency issue. Nvidia eclipsed that entirely with the release of Maxwell's delta compression tech, which AMD at the time didn't have. Polaris didn't either, so it's questionable what use that 'update' really was. All Polaris really was, was a shrink from 28 > 14 nm and an attempt to get some semblance of a cost-effective GPU in the midrange. Other development was stalled and redirected to more compute (Vega) and pro markets because 'that's where the money is', while similarly the midrange 'is where the money is'. Then came mining... and it drove 90% of Polaris sales, I reckon. People still bought 1060s and 970s regardless, not least because those were actually available.

Current trend in GPUs... Jon Peddie Research reports steady year-over-year (relative) growth in high-end GPUs, and the average price is steadily rising. It's a strange question to ask me what undisclosed facts RDNA2 will bring to change the current state of things, but it's a bit of a stretch to 'assume' they will suddenly leap ahead as some predict. The supposed specs we DO have show about 500 GB/s of bandwidth, and that is a pretty hard limit; apparently they do have some sort of cache system that does something for that as well, seeing the results. If the GPU we saw in AMD's benches was the 500 GB/s one, the cache is good for another 20%. Nice. But it still won't eclipse a 3080. This means they will need a wider bus for anything bigger, and this will in turn take a toll on TDPs and efficiency.
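For reference, that ~500 GB/s figure follows directly from bus width and per-pin data rate. A minimal sketch; the 16 Gbps rate for the rumored card is an assumption based on the specs discussed above, and the 3080's 320-bit/19 Gbps GDDR6X is included for comparison:

```python
def bus_bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps

# Rumored Big Navi: 256-bit GDDR6 at 16 Gbps -> roughly the ~500 GB/s discussed above
print(bus_bandwidth_gbs(256, 16))  # 512.0
# RTX 3080: 320-bit GDDR6X at 19 Gbps
print(bus_bandwidth_gbs(320, 19))  # 760.0
```

A wider bus raises this peak linearly, which is exactly why it also raises board power and cost.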

The first numbers are in, and we've already seen about a 10% deficit to the 3080 with whatever that was supposed to be. There is probably some tier above it, but I reckon the gap will be minor, like the 3090 above a 3080. As for right decisions... yes, retargeting the high end is a good decision, it's the ONLY decision really, and I hope they can make it happen, but the track record for RDNA so far isn't spotless, if not just plagued with very similar problems to what GCN had, up until now.

@gruffi sry, big ninja edit, I think you deserved it for pressing the question after all :)
Posted on Reply
#127
DeathtoGnomes
Valantar
This is a forum for computer enthusiasts.
stop spreading false rumors!! :p :rolleyes:
Posted on Reply
#128
InVasMani
Now here I thought this was a forum to learn about basket weaving 101. Oops!
Posted on Reply
#129
Valantar
gruffi
No. I never said that's the "only" factor. But it's very common to express the capability of such chips in FLOPS. AMD does it, Nvidia does it, every supercomputer does. You claimed I was off. And that's simply wrong. We should all know that actual performance depends on other factors as well, like workload or efficiency.
And camera manufacturers still market megapixels as if it's a meaningful indication of image quality. Should we accept misleading marketing terms just because they are common? Obviously not. The problems with using teraflops as an indicator of consumer GPU performance have been discussed at length both in forums like these and the media. As for supercomputers: that's one of the relatively few cases where teraflops actually matter, as supercomputers run complex compute workloads. Though arguably FP64 is likely more important to them than FP32. But for anyone outside of a datacenter? There are far more important metrics than the base teraflops of FP32 that a GPU can deliver.
gruffi
No. I said what I said. I never categorized anything as good or bad. That was just you.
Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the only reasonable interpretation of that word in this context.
gruffi
But if you want to know my opinion, yes, GCN was a good general architecture for computing and gaming when it was released. You can see that it aged better than Kepler. But AMD didn't continue its development. Probably most resources went into the Zen development back then. I don't know. The first major update was Polaris. And that was ~4.5 years after the first GCN generation. Which simply was too late. At that time Nvidia already had been made significant progress with Maxwell and Polaris. That's why I think splitting up the architecture into RDNA and CDNA was the right decision. It's a little bit like Skylake. Skylake was a really good architecture on release. But over the years there was no real improvement. Only higher clock speed and higher power consupmtion. OTOH AMD mode significant progress with every new full Zen generation.
I agree that GCN was good when it launched. In 2012. It was entirely surpassed by Maxwell in 2014. As for "the first major update [being] Polaris", that is just plain wrong. Polaris was the fourth revision of GCN. It's obvious that the development of GCN was hurt by AMD's financial situation and lack of R&D money, but the fact that their only solution to this was to move to an entirely new architecture once they got their act together tells us that it was ultimately a relatively poor architecture overall. It could be said to be a good architecture for compute, hence its use as the basis for CDNA, but for more general workloads it simply scales poorly.
DeathtoGnomes
stop spreading false rumors!! :p :rolleyes:
Sorry, my bad. I should have said "This is a forum for RGB enthusiasts."
Posted on Reply
#130
BoboOOZ
Valantar
Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the only reasonable interpretation of that word in this context.
Come on, you're exaggerating and you can do better than this. In the context of this discussion, strong is closer to "raw performance" than to "good".
Better to spend this energy on more meaningful discussions.

Also, bandwidth and TFLOPS are the best objective measures of the potential performance of graphics cards, and they're fine if they're understood as what they are.

Just an aside, the only time I see TFLOPS as truly misleading is with Ampere, because those dual-purpose CUs will never attain their maximum theoretical throughput, because they have to do integer computations, too (which amount to about 30% of computations in gaming, according to Nvidia themselves).
Posted on Reply
#131
Valantar
BoboOOZ
Come on, you're exaggerating and you can do better than this. In the context of this discussion, strong is closer to "raw performance" than to "good".
Better to spend this energy on more meaningful discussions.

Also, bandwidth and TFLOPS are the best objective measures of the potential performance of graphics cards, and they're fine if they're understood as what they are.

Just an aside, the only time I see TFLOPS as truly misleading is with Ampere, because those dual-purpose CUs will never attain their maximum theoretical throughput, because they have to do integer computations, too (which amount to about 30% of computations in gaming, according to Nvidia themselves).
Sorry, I might be pedantic, but I can't agree with this. Firstly, the meaning of "strong" is obviously dependent on context, and in this context (consumer gaming GPUs) the major relevant form of "strength" is gaming performance. Attributing FP32 compute performance as a more relevant reading of "strong" in a consumer GPU lineup needs some actual arguments to back it up. I have so far not seen a single one.

Your second statement is the worst type of misleading: something that is technically true, but is presented in a way that vastly understates the importance of context, rendering its truthfulness moot. "They're fine if they're understood as what they are" is entirely the point here: FP32 is in no way whatsoever a meaningful measure of consumer GPU performance across architectures. Is it a reasonable point of comparison within the same architecture? Kind of! For non-consumer uses, where pure FP32 compute is actually relevant? Sure (though it is still highly dependent on the workload). But for the vast majority of end users, let alone the people on these forums, FP32 as a measure of the performance of a GPU is very, very misleading.

Just as an example, here's a selection of GPUs and their game performance/Tflop in TPU's test suite at 1440p from the 3090 Strix OC review:

Ampere:
3090 (Strix OC) 100% 39TF = 2.56 perf/TF
3080 90% 29.8TF = 3 perf/TF

Turing:
2080 Ti 72% 13.45TF = 5.35 perf/TF
2070S 55% 9TF = 6.1 perf/TF
2060 41% 6.5TF = 6.3 perf/TF

RDNA:
RX 5700 XT 51% 9.8TF = 5.2 perf/TF
RX 5600 XT 40% 7.2TF = 5.6 perf/TF
RX 5500 XT 27% 5.2TF = 5.2 perf/TF

GCN
Radeon VII 53% 13.4 TF = 4 perf/TF
Vega 64 41% 12.7TF = 3.2 perf/TF
RX 590 29% 7.1TF = 4.1 perf/TF

Pascal:
1080 Ti 53% 11.3TF = 4.7 perf/TF
1070 34% 6.5TF = 5.2 perf/TF

This is of course at just one resolution, and the numbers would change at other resolutions. The point still shines through: even within the same architecture, using the same memory technology, gaming performance per teraflop of FP32 compute can vary by 25% or more. Across architectures we see more than 100% variance, which demonstrates that for the average user, FP32 is an utterly meaningless metric. Going by these numbers, a 20 TF GPU might beat the 3090 (if it matched the 2060 in performance/TF) or it might lag dramatically (like the VII or Ampere).
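The ratios above are just relative performance divided by spec TFLOPS; a quick sketch using a few of the cards from the list (1440p values only) shows the spread:

```python
# Relative 1440p performance (%) and spec FP32 TFLOPS, from the list above
cards = {
    "RTX 3090": (100, 39.0),
    "RTX 2060": (41, 6.5),
    "RX 5700 XT": (51, 9.8),
    "Radeon VII": (53, 13.4),
}

# Gaming performance delivered per teraflop of spec FP32 compute
perf_per_tf = {name: round(perf / tf, 2) for name, (perf, tf) in cards.items()}

# The 2060 extracts roughly 2.5x the gaming performance per teraflop of the 3090
spread = max(perf_per_tf.values()) / min(perf_per_tf.values())
print(perf_per_tf, round(spread, 1))
```

If TFLOPS were a usable cross-architecture metric, that spread would be close to 1.0x.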

Unless you are a server admin or researcher or whatever else running workloads that are mostly FP32, using FP32 as a meaningful measure of performance is very misleading. Its use is very similar to how camera manufacturers have used (and partially still do) megapixels as a stand-in tech spec to represent image quality. There is some relation between the two, but it is wildly complex and inherently non-linear, making the one meaningless as a metric for the other.
Posted on Reply
#132
gruffi
Valantar
Ah, okay, so "strong" is a value-neutral term with a commonly accepted or institutionally defined meaning? If so, please provide a source for that. Until then, I'll keep reading your use of "strong" as "good", as that is the only reasonable interpretation of that word in this context.
I think that's the whole point of your misunderstanding. You interpreted. And you interpreted in a wrong way. So let me be clear once and for all. As I said, with "strong" I was referring to raw performance. And raw performance is usually measured in FLOPS. I didn't draw any conclusions about whether that makes an architecture good or bad. That is usually defined by metrics like performance/watt and performance/mm².
Valantar
I agree that GCN was good when it launched. In 2012. It was entirely surpassed by Maxwell in 2014. As for "the first major update [being] Polaris", that is just plain wrong. Polaris was the fourth revision of GCN.
You said it yourself, just revisions. Hawaii, Tonga, Fiji. They all got mostly only ISA updates and more execution units. One exception was HBM for Fiji. But even that didn't change the architecture at all. Polaris was the first generation after Tahiti that had some real architecture improvements to increase IPC and efficiency.
Valantar
It's obvious that the development of GCN was hurt by AMD's financial situation and lack of R&D money, but the fact that their only solution to this was to move to an entirely new architecture once they got their act together tells us that it was ultimately a relatively poor architecture overall.
I wouldn't say that. The question is what your goal is. Obviously AMD's primary goal was a strong computing architecture to counter Fermi's successors. Maybe AMD didn't expect Nvidia to go the exact opposite way. Kepler and Maxwell were gaming architectures. They were quite poor at computing, especially Kepler. Back then, with enough resources, I think AMD could have done with GCN what they are doing now with RDNA. RDNA is not an entirely new architecture from scratch like Zen; it's still based on GCN. So, it seems GCN was a good architecture after all. At least better than what some people try to claim. The lack of progress and the general-purpose nature just made GCN look worse for gamers over time. Two separate developments for computing and gaming were the logical consequence. Nvidia might face the same problem. Ampere is somehow their GCN moment. Many shaders, apparently good computing performance, but way worse shader efficiency than Turing for gaming.
Vayra86
That was the answer to your question ;)
Okay. Then we can agree that it could be possible to be competitive or at least very close in performance even with less memory bandwidth? ;)
Posted on Reply
#133
Vayra86
gruffi
Okay. Then we can agree that it could be possible to be competitive or at least very close in performance even with less memory bandwidth? ;)
Could as in highly unlikely, yes.
Posted on Reply
#134
londiste
Valantar
Ampere:
3090 (Strix OC) 100% 39TF = 2.56 perf/TF
3080 90% 29.8TF = 3 perf/TF

Turing:
2080 Ti 72% 13.45TF = 5.35 perf/TF
2070S 55% 9TF = 6.1 perf/TF
2060 41% 6.5TF = 6.3 perf/TF

RDNA:
RX 5700 XT 51% 9.8TF = 5.2 perf/TF
RX 5600 XT 40% 7.2TF = 5.6 perf/TF
RX 5500 XT 27% 5.2TF = 5.2 perf/TF

GCN
Radeon VII 53% 13.4 TF = 4 perf/TF
Vega 64 41% 12.7TF = 3.2 perf/TF
RX 590 29% 7.1TF = 4.1 perf/TF

Pascal:
1080 Ti 53% 11.3TF = 4.7 perf/TF
1070 34% 6.5TF = 5.2 perf/TF
At 1440p Ampere probably takes more of a hit than it should.
But more importantly, especially for Nvidia cards, spec TFLOPS is misleading. Just check the average clock speeds in the respective reviews.
At the same time, RDNA has a Boost clock that is not quite what the card actually achieves.

Ampere:
3090 (Strix OC, 100%): 1860 > 1921MHz - 39 > 40.3TF (2.48 %/TF)
3080 (90%): 1710 > 1931MHz - 29.8 > 33.6TF (2.68 %/TF)

Turing:
2080Ti (72%): 1545 > 1824MHz - 13.45 > 15.9TF (4.53 %/TF)
2070S (55%): 1770 > 1879MHz - 9 > 9.2TF (5.98 %/TF)
2060 (41%): 1680 > 1865MHz - 6.5 > 7.1TF (5.77 %/TF)

RDNA:
RX 5700 XT (51%): 1755 (1905) > 1887MHz - 9.0 (9.8) > 9.66TF (5.28 %/TF)
RX 5600 XT (40%): 1750 > 1730MHz - 8.1 > 8.0TF (5.00 %/TF) - this one is a mess with specs and clocks, but the ASUS TUF seems closest to the newer reference spec, and it is not quite the right comparison really
RX 5500 XT (27%): 1845 > 1822MHz - 5.2 > 5.1TF (5.29 %/TF) - all reviews are of AIB cards but the two closest to reference specs got 1822MHz

GCN:
Radeon VII (53%): 1750 > 1775MHz - 13.4 > 13.6TF (3.90 %/TF)
Vega 64 (41%): 1546MHz - 12.7TF (3.23 %/TF) - let's assume it ran at 1546MHz in the review; I doubt it, because my card struggled heavily to reach spec clocks
RX 590 (29%): 1545MHz - 7.1TF (4.08 %/TF)

Pascal:
1080Ti (53%): 1582 > 1777MHz - 11.3 > 12.7TF (4.17 %/TF)
1070 (34%): 1683 > 1797MHz - 6.5 > 6.9TF (4.93 %/TF)
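The adjusted TFLOPS numbers above come from the standard peak-FP32 formula: 2 FLOPs (one fused multiply-add) per shader per clock. A sketch, using the 2080 Ti's and 3080's spec shader counts with the average clocks observed in reviews:

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in TFLOPS: 2 FLOPs (one FMA) per shader per cycle."""
    return 2 * shaders * clock_mhz * 1e6 / 1e12

# RTX 2080 Ti: 4352 shaders at the ~1824 MHz average clock observed in reviews
print(round(fp32_tflops(4352, 1824), 1))  # 15.9
# RTX 3080: 8704 FP32 units (the doubled datapath) at ~1931 MHz
print(round(fp32_tflops(8704, 1931), 1))  # 33.6
```

This is why spec-sheet boost clocks mislead: plugging in the actual sustained clock can shift the result by 10% or more.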

I actually think 4K might be a better comparison for faster cards, perhaps down to the Radeon VII. So instead of the unreadable mess above, here is a table with GPUs, their actual TFLOPS numbers, and relative performance (from the same referenced 3090 Strix review), as well as performance per TFLOPS, both at 1440p and 2160p.
* means the average clock speed is probably overstated, so fewer TFLOPS in reality and better %/TF.
GPU        TFLOP 1440p %/TF  2160p %/TF
3090 40.3 100% 2.48 100% 2.48
3080 33.6 90% 2.68 84% 2.5

2080Ti 15.9 72% 4.53 64% 4.02
2070S 9.2 55% 5.98 46% 5.00
2060 7.1 41% 5.77 34% 4.79

RX5700XT 9.66 51% 5.28 42% 4.35
RX5600XT* 9.0 40% 5.00 33% 4.12
RX5500XT 5.1 27% 5.29 19% 3.72

Radeon VII 13.6 53% 3.90 46% 3.38
Vega64* 12.7 41% 3.23 34% 2.68
RX590 7.1 29% 4.08 24% 3.38

1080Ti 12.7 53% 4.17 45% 3.54
1070 6.9 34% 4.93 28% 4.06

- Pascal, Turing and Navi/RDNA are fairly even on perf/TF.
- Polaris is a little worse than Pascal but not too bad.
- Vega struggles a little.
- 1080Ti low result is somewhat surprising.
- 2080Ti and Amperes are inefficient at 1440p and do better at 2160p.

As for what Ampere does, there is something we are missing about the double FP32 claim. Scheduling limitations are the obvious one, but a ~35% actual performance boost from doubled units sounds like something is very heavily restricting performance. And that is the optimistic case - in the table/review it was 25% at 1440p and 31% at 2160p from the 2080 Ti to the 3080, which are largely identical except for the double FP32 units. Since productivity stuff does get twice the performance, is it really the complexity and variability of gaming workloads causing scheduling to cough blood?
Posted on Reply
#135
Valantar
gruffi
I think that's the whole point of your misunderstanding. You interpreted. And you interpreted in a wrong way. So let me be clear once and for all. As I said, with "strong" I was referring to raw performance. And raw performance is usually measured in FLOPS. I didn't draw any conclusions about whether that makes an architecture good or bad. That is usually defined by metrics like performance/watt and performance/mm².
Skipping the hilarity of (unintentionally, I presume) suggesting that reading without interpretation is possible, I have explained the reasons for my interpretation at length, and why, to me, it is a much more reasonable reading of what a "strong" GPU architecture means in the consumer space. You clearly worded your statement vaguely and ended up saying something different from what you meant. (For the record: even calling FP32 "raw performance" is a stretch - it's the main performance metric of modern GPUs, but still one among at least a couple dozen relevant ones, all of which affect various workloads in different ways. Hence my arguing that it alone is a poor indication of anything except performance in pure FP32 workloads. It's kind of like discussing which minivan is best solely based on engine horsepower, while ignoring the number and quality of seats, doors, build quality, reliability, ride comfort, etc.) You're welcome to disagree with this, but so far your arguments for your side of this discussion have been unconvincing at best.
gruffi
You said it yourself, just revisions. Hawaii, Tonga, Fiji. They all got mostly only ISA updates and more execution units. One exception was HBM for Fiji. But even that didn't change the architecture at all. Polaris was the first generation after Tahiti that had some real architecture improvements to increase IPC and efficiency.
Uhm ... updating the ISA is a change to the architecture. Beyond that, AMD kept talking about various low-level architectural changes to GCN for each revision - beyond what is published; after all, published information doesn't really go beyond block-diagram level - but these never really materialized as performance or efficiency improvements. You're right that the move to HBM didn't change the architecture, as the memory controllers generally aren't seen as part of the GPU architecture. Of course, the main bottleneck for GCN was its 64 CU limit, which forced AMD to release the V64 at idiotic clocks to even remotely compete in absolute performance, but made the architecture look terrible for efficiency at the same time. A low-clocked Vega 64 is actually quite efficient, after all, which shows that if AMD could have made a medium-clocked 80 CU Vega card, they could have been in a much better competitive position (though at some cost due to the large die). That limitation alone is likely both the main reason for AMD's GPU woes and for their choice to replace GCN entirely - they had no other choice. But even with limited resources, they had more than half a decade to improve GCN architecturally, and managed pretty much nothing. Luckily, with RDNA they've both removed the 64 CU limit and improved perf/TF dramatically, with promises of more to come.
gruffi
I wouldn't say that. The question is what your goal is. Obviously AMD's primary goal was a strong computing architecture to counter Fermi's successors. Maybe AMD didn't expect Nvidia to go the exact opposite way. Kepler and Maxwell were gaming architectures. They were quite poor at computing, especially Kepler. Back then, with enough resources, I think AMD could have done with GCN what they are doing now with RDNA. RDNA is not an entirely new architecture from scratch like Zen; it's still based on GCN. So, it seems GCN was a good architecture after all. At least better than what some people try to claim. The lack of progress and the general-purpose nature just made GCN look worse for gamers over time. Two separate developments for computing and gaming were the logical consequence. Nvidia might face the same problem. Ampere is somehow their GCN moment. Many shaders, apparently good computing performance, but way worse shader efficiency than Turing for gaming.
That's possible, but unlikely. The enterprise compute market is of course massively lucrative, but AMD didn't design GCN as a datacenter compute-first core. It was a graphics core design meant to replace VLIW, but it also happened to be very good at pure FP32. Call it a lucky side effect. At the time it was designed datacenter GPU compute barely existed at all (datacenters and supercomputers at that time were mostly CPU-based), and the market when it emerged was nearly 100% CUDA, leaving AMD on the outside looking in. AMD tried to get into this with OpenCL and similar compute-oriented initiatives, but those came long after GCN hit the market. RDNA is clearly a gaming-oriented architecture, with CDNA being split off (and reportedly being much closer to GCN in design) for compute work, but that doesn't mean that GCN wasn't initially designed for gaming.
londiste
At 1440p Ampere probably takes more of a hit than it should.
But more importantly, especially for Nvidia cards, spec TFLOPS is misleading. Just check the average clock speeds in the respective reviews.
At the same time, RDNA has a Boost clock that is not quite what the card actually achieves.

Ampere:
3090 (Strix OC, 100%): 1860 > 1921MHz - 39 > 40.3TF (2.48 %/TF)
3080 (90%): 1710 > 1931MHz - 29.8 > 33.6TF (2.68 %/TF)

Turing:
2080Ti (72%): 1545 > 1824MHz - 13.45 > 15.9TF (4.53 %/TF)
2070S (55%): 1770 > 1879MHz - 9 > 9.2TF (5.98 %/TF)
2060 (41%): 1680 > 1865MHz - 6.5 > 7.1TF (5.77 %/TF)

RDNA:
RX 5700 XT (51%): 1755 (1905) > 1887MHz - 9.0 (9.8) > 9.66TF (5.28 %/TF)
RX 5600 XT (40%): 1750 > 1730MHz - 8.1 > 8.0TF (5.00 %/TF) - this one is a mess with specs and clocks, but the ASUS TUF seems closest to the newer reference spec, and it is not quite the right comparison really
RX 5500 XT (27%): 1845 > 1822MHz - 5.2 > 5.1TF (5.29 %/TF) - all reviews are of AIB cards but the two closest to reference specs got 1822MHz

GCN:
Radeon VII (53%): 1750 > 1775MHz - 13.4 > 13.6TF (3.90 %/TF)
Vega 64 (41%): 1546MHz - 12.7TF (3.23 %/TF) - let's assume it ran at 1546MHz in the review; I doubt it, because my card struggled heavily to reach spec clocks
RX 590 (29%): 1545MHz - 7.1TF (4.08 %/TF)

Pascal:
1080Ti (53%): 1582 > 1777MHz - 11.3 > 12.7TF (4.17 %/TF)
1070 (34%): 1683 > 1797MHz - 6.5 > 6.9TF (4.93 %/TF)

I actually think 4K might be a better comparison for faster cards, perhaps down to the Radeon VII. So instead of the unreadable mess above, here is a table with GPUs, their actual TFLOPS numbers, and relative performance (from the same referenced 3090 Strix review), as well as performance per TFLOPS, both at 1440p and 2160p.
* means the average clock speed is probably overstated, so fewer TFLOPS in reality and better %/TF.
GPU        TFLOP 1440p %/TF  2160p %/TF
3090 40.3 100% 2.48 100% 2.48
3080 33.6 90% 2.68 84% 2.5

2080Ti 15.9 72% 4.53 64% 4.02
2070S 9.2 55% 5.98 46% 5.00
2060 7.1 41% 5.77 34% 4.79

RX5700XT 9.66 51% 5.28 42% 4.35
RX5600XT* 9.0 40% 5.00 33% 4.12
RX5500XT 5.1 27% 5.29 19% 3.72

Radeon VII 13.6 53% 3.90 46% 3.38
Vega64* 12.7 41% 3.23 34% 2.68
RX590 7.1 29% 4.08 24% 3.38

1080Ti 12.7 53% 4.17 45% 3.54
1070 6.9 34% 4.93 28% 4.06

- Pascal, Turing and Navi/RDNA are fairly even on perf/TF.
- Polaris is a little worse than Pascal but not too bad.
- Vega struggles a little.
- 1080Ti low result is somewhat surprising.
- 2080Ti and Amperes are inefficient at 1440p and do better at 2160p.

As for what Ampere does, there is something we are missing about the double FP32 claim. Scheduling limitations are the obvious one, but a ~35% actual performance boost from doubled units sounds like something is very heavily restricting performance. And that is the optimistic case - in the table/review it was 25% at 1440p and 31% at 2160p from the 2080 Ti to the 3080, which are largely identical except for the double FP32 units. Since productivity stuff does get twice the performance, is it really the complexity and variability of gaming workloads causing scheduling to cough blood?
I entirely agree that Ampere makes calculations like this even more of a mess than what they already were, but my point still stands after all - there are still massive variations even within the same architectures, let alone between different ones. I'm also well aware that boost clocks severely mess this up and that choosing one resolution limits its usefulness - I just didn't want the ten minutes I spent on that to become 45, looking up every boost speed and calculating my own FP32 numbers.
Posted on Reply
#136
londiste
Valantar
I entirely agree that Ampere makes calculations like this even more of a mess than what they already were, but my point still stands after all - there are still massive variations even within the same architectures, let alone between different ones. I'm also well aware that boost clocks severely mess this up and that choosing one resolution limits its usefulness - I just didn't want the ten minutes I spent on that to become 45, looking up every boost speed and calculating my own FP32 numbers.
Variations are probably down to relative amount of other aspects of the card - memory bandwidth, TMUs, ROPs. Trying not to go down that rabbit hole right now. It didn't take me quite 45 minutes to put that one together but it wasn't too far off :D
Posted on Reply
#137
gruffi
Valantar
You clearly worded your statement vaguely
I was very clear that I was just talking about raw performance as a simple fact. And not whether something is considered good or bad based on that. Maybe next time, if you are unsure about the meaning of others' words, ask first to clear things up. ;) Your aggressive and bossy answers, putting words in my mouth that I never said, are a very impolite and immature way of having a conversation.
Valantar
updating the ISA is a change to the architecture.
But it doesn't make the architecture faster or more efficient considering general performance. That's the important point. Or do you think adding AVX512 to Comet Lake would make it better for your daily tasks? Not at all.
Valantar
AMD didn't design GCN as a datacenter compute-first core. It was a graphics core design
In fact, it was not a pure graphics core design. Look up the press material that AMD published back then. You can read statements like "efficient and scalable architecture optimized for graphics and parallel compute" or "cutting-edge gaming and compute performance". GCN clearly was designed as a hybrid, an architecture meant to be equally good at gaming and compute. But I think the focus was more on improving compute performance. Because that was the philosophy of the AMD staff at that time. They wanted to be more competitive in professional markets. Bulldozer was designed with the same focus in mind. Lisa Su changed that. Nowadays AMD focuses more on client markets again.
Valantar
but it also happened to be very good at pure FP32. Call it a lucky side effect.
That naivety is almost funny. Nothing happens as a side effect during years of development. It was on purpose.
Valantar
At the time it was designed datacenter GPU compute barely existed at all (datacenters and supercomputers at that time were mostly CPU-based), and the market when it emerged was nearly 100% CUDA, leaving AMD on the outside looking in. AMD tried to get into this with OpenCL and similar compute-oriented initiatives, but those came long after GCN hit the market.
AMD had GPGPU software solutions before OpenCL and even before CUDA. The first was the CTM (Close To The Metal) interface. Later it was replaced by the Stream SDK. All those developments happened in the late 2000s, likely while GCN was in its design phase. It's obvious that AMD wanted a performant compute architecture to be prepared for future GPGPU environments. That doesn't mean GCN was a compute-only architecture. Again, I didn't say that. But compute performance seemed to be at least as important as graphics performance.
Posted on Reply
#138
InVasMani
Perhaps this is part of why AMD wants to buy Xilinx for FPGAs. Even if they lose 5-10% on either workload, be it compute or graphics performance, if they have the possibility of gaining better parity between the two with an FPGA approach, rather than fixed hardware with a one-size-fits-all approach that can't serve both efficiently at the same time, it's still the better overall approach. In fact, over time I would have to say that the gap between them with fixed hardware must be widening, if anything, complicating things.
Posted on Reply
#139
londiste
InVasMani
Perhaps this is part of why AMD wants to buy Xilinx for FPGAs. Even if they lose 5-10% on either workload, be it compute or graphics performance, if they have the possibility of gaining better parity between the two with an FPGA approach, rather than fixed hardware with a one-size-fits-all approach that can't serve both efficiently at the same time, it's still the better overall approach. In fact, over time I would have to say that the gap between them with fixed hardware must be widening, if anything, complicating things.
FPGA is incredibly inefficient compared to fixed hardware.
Posted on Reply
#140
Bronan
I actually do not care at all about all the hyped news.
For me the most important thing is that the card seems to be power efficient, and that's more important than the power-hungry, super-heater-for-your-home Nvidia solution. Imagine living in Spain or Italy with temps above 40°C and then not being able to play a game because your silly machine gets overheated by your oh-so-precious Nvidia card :)
bug
Ok, who the hell calls Navi2 "Big Navi"?
Big Navi was a pipe dream of AMD loyalists left wanting for a first gen Navi high-end card.
This quote " Something Big is coming is not a lie" because its going to be a big card, they have not said anything about performance the only thing they talk about is a more efficient product. That most people translate that to faster than nvidia is their own vision.
But if it does beat the 3070 then i will consider buying it even though its not such a big step upwards from my current 5700XT which runs darn well.

I really wish that they introduce the AMD Quantum mini pc which was showed at the E3 2015 with current hardware or something similar.
Because i want my systems to be smaller without having to limit the performance too much, i am pretty sure the current hardware could be more than capable to create such a mini pc by now with enough performance.
Posted on Reply