
GPU IPC Showdown: NVIDIA Blackwell vs Ada Lovelace; AMD RDNA 4 vs RDNA 3

Sure, yep, I get the sarcasm, but the attitudes of some of the Nvidia users have gotten so extreme lately that I don't want them taking anything too seriously. We regular PC enthusiasts know the capabilities of the different hardware over time, but even though most of us are getting info from the same sources, the Nvidia enthusiasm is, how shall we say, distorting reality just a bit. :)
Screw what other people do or feel, man. You/we aren't gonna change any of it. Humans be human ;)

Convincing is a simple matter of proving the point. Every time some shit occurs, like a melting 12V pin, planned obsolescence, price manipulation, etc. Or just a better product - and that's where we hit the core of the issue, don't we...
 
Quoted for emphasis. I do this every generation to see if anything has moved the price/performance/power metrics and this matches what I've figured using the data here from the TPU charts.

The 40% 7600XT --> 9060 XT IPC improvement with the same core count and similar memory shows that something was terribly broken in RDNA3 and AMD did say that RDNA4 was a bugfix.

I wanna know what that damn bug was!

Edit: My handwavy guess is that the doubled-throughput FP32 units added in RDNA 3 didn't work properly, which is why the 7600 was only marginally faster than the 6650 XT with the old cores. And that got fixed.
At least some of that 40% increase was the 15% increase in clock speed. I looked at the latest TPU Sapphire 9060XT review and the TPU Sapphire 7600XT review. The clock speed is 15% higher on the 9060XT.
 
At least some of that 40% increase was the 15% increase in clock speed. I looked at the latest TPU Sapphire 9060XT review and the TPU Sapphire 7600XT review. The clock speed is 15% higher on the 9060XT.

That leaves about 21% for the IPC increase (1.40 / 1.15 ≈ 1.22), which is more modest but still appreciable. I'm still quite interested in that bugfix!
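A minimal sketch of that arithmetic, using the rough figures quoted in this thread rather than exact TPU numbers: the clock contribution divides out of the total uplift, leaving the per-clock (IPC) gain.

```python
# Hedged sketch: separating the clock-speed gain from the per-clock (IPC) gain.
# The 40% and 15% figures are the rough numbers quoted above, not re-measured data.
total_gain = 1.40   # 9060 XT vs 7600 XT overall uplift (~40%)
clock_gain = 1.15   # ~15% higher clocks per the Sapphire reviews

# Performance scales roughly as clocks x work-per-clock, so divide out the clock factor.
ipc_gain = total_gain / clock_gain
print(f"Implied per-clock (IPC) uplift: {(ipc_gain - 1) * 100:.1f}%")  # ~21.7%
```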
 
I expect UDNA to have similar shifts over RDNA4 per CU:

20% general raster
30% RT
100% path tracing

Some of this has already been rumored.


The above combined with a doubling of CUs from 64 to 128 would result in a dominating GPU product.
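To put rough numbers on that compounding, here is a hedged sketch; real cards rarely scale linearly with CU count because of bandwidth, power, and front-end limits, so treat these as optimistic upper bounds, not predictions.

```python
# Hypothetical per-CU gains from the list above, combined with the rumored CU doubling.
# These are upper bounds under perfect scaling, not predictions.
per_cu_gain = {"raster": 1.20, "ray tracing": 1.30, "path tracing": 2.00}
cu_scaling = 128 / 64  # rumored doubling of compute units

for workload, gain in per_cu_gain.items():
    print(f"{workload}: up to {gain * cu_scaling:.1f}x over a 64 CU RDNA 4 part")
```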
Kepler said he was just making stuff up. It’s hilarious to see Twitter posts being taken as some kind of evidence.
 
That alone proves that Nvidia doesn't care about gaming any more, only AI. OTOH, AMD made a great jump in performance and efficiency, closing in on Nvidia more than anyone anticipated, since Nvidia stood still. Arrogance almost always bites you back.
 
Not only is there no performance progress from Ada to Blackwell, there is no energy efficiency progress either. Blackwell is just a bigger 40xx with messed-up drivers.
And no 32-bit PhysX …
 
It'd be interesting to see a similar test done for compute workloads, instead of games.
 
To make sure you are not making stuff up, can you provide the Twitter post showing that he said that?

Why would I make stuff up?
[screenshot attachment]
 
That alone proves that Nvidia doesn't care about gaming any more, only AI. OTOH, AMD made a great jump in performance and efficiency, closing in on Nvidia more than anyone anticipated, since Nvidia stood still. Arrogance almost always bites you back.
NVidia doesn't really have to care in this instance. All AMD did was reach near-parity performance-wise, and it still trails behind in terms of ecosystem and software support. This basically won't affect the market in a meaningful way, and yes, noting that NV focuses their efforts on AI and datacenter is… obvious to anyone sane? Same as AMD developing Zen 5 for the needs of enterprise. That's just business. RDNA 4 is probably the last "gaming oriented" GPU architecture ever; even AMD saw the folly in trying that.

I am actually somewhat curious now whether the measurements in this article are another "Zen 5%" sort of deal and Blackwell is actually significantly faster for the tasks it was designed for. It's obviously almost impossible to verify, since that would require getting one's hands on enterprise-level accelerators, but still, it would be interesting.
 
There is a surprising lack of decent data for this. Ironically, there are a ton of AI-generated articles using Nvidia marketing material, but not a single one with something of value.


Actually...

Not a large leap in a (limited, admittedly) suite of AI/ML/Pro workloads over RTX40 either.

Will this improve with drivers or support? Remains to be seen.
 
Not a large leap in a (limited, admittedly) suite of AI/ML/Pro workloads over RTX40 either.
Wai, wha? In AI and ML workloads the 5060 Ti is, by those very tests, 20-40% faster than the 4060 Ti with very nearly the same core count. That's a very significant leap. Blender and others are understandably similar; those aren't AI workloads. So it just points again to the fact that Blackwell was VERY AI/ML optimized.
 
+20% means being on par with the 5080 two years later
That's the 64 CU version. But who says AMD won't introduce 96 CU or even 128 CU versions with UDNA?
Yep, I created a topic about RDNA3 vs. RDNA4 IPC here on the TPU forum.
My calculations for Ada vs. Blackwell were accurate, after all.
All of Blackwell's performance improvements come from increasing the compute unit count (die scaling), which translates into increased power draw.
So much for those who believed that the RTX 5080, with two-thirds of the RTX 4090's compute units, would beat the RTX 4090.

[attachment 404965]

RX 9070 XT compared to RX 7900 XT shows a massive (+44%) improvement per compute unit: about 20% of it comes from higher clocks, and the rest from architectural changes. We can rule out memory bandwidth being in favor of the RX 9070 XT here, as it has significantly lower memory throughput than the RX 7900 XT (644 vs. 800 GB/s). An RX 9070 XT with RX 7900 XT memory bandwidth would be even faster.
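A quick, hedged sketch of how that per-CU figure breaks down and what it implies at the card level. The CU counts (64 vs. 84) are the official specs; the +44% and +20% are the figures quoted above, not re-measured data.

```python
# Hedged sketch: decompose the quoted +44% per-CU gain and back out the
# implied card-level ratio. Figures are from this thread, not new measurements.
per_cu_gain = 1.44            # 9070 XT vs 7900 XT, per compute unit (quoted above)
clock_gain = 1.20             # ~20% higher clocks
cu_9070xt, cu_7900xt = 64, 84

arch_gain = per_cu_gain / clock_gain              # share not explained by clocks
overall = per_cu_gain * (cu_9070xt / cu_7900xt)   # implied whole-card ratio

print(f"Architectural (per-clock) share: ~{(arch_gain - 1) * 100:.0f}%")        # ~20%
print(f"Implied overall 9070 XT vs 7900 XT lead: ~{(overall - 1) * 100:.0f}%")  # ~10%
```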
With OC my 9070 XT has 730 GB/s (2835 real, 2850 on the slider). Not quite 7900 XT level, but better than the default 644 GB/s.
Also, I said from the start that the 5080 would never reach 4090 performance. Some people here genuinely believed it would.
I wanna know what that damn bug was!

Edit: My handwavy guess is that the doubled-throughput FP32 units added in RDNA 3 didn't work properly, which is why the 7600 was only marginally faster than the 6650 XT with the old cores. And that got fixed.
I think it's pretty clear what the bug was - RDNA3 was the first (and thus far the only) chiplet-based gaming dGPU. Naturally such innovations have growing pains. It never quite reached its true potential; my guess is that was due to chiplet communication issues. Only the 7600 series in that lineup was fully monolithic.
Kepler said he was just making stuff up. It’s hilarious to see Twitter posts being taken as some kind of evidence.
"making stuff up" and "guessing" are not the same. The first one implies malice or lack of knowledge on the subject. The second could be considered an educated guess.
And no 32bit physx …
Also no more hot spot sensor.
 
Wai, wha? In AI and ML workloads the 5060 Ti is, by those very tests, 20-40% faster than the 4060 Ti with very nearly the same core count. That's a very significant leap. Blender and others are understandably similar; those aren't AI workloads. So it just points again to the fact that Blackwell was VERY AI/ML optimized.


At first glance, yes, BUT consider that the 5060 Ti is already faster in gaming than the 4060 Ti, and it draws more power too. You don't even have to normalize to IPC (like this article does) or to power consumption; you already see a far smaller boost, around 10-20%, just by normalizing to the average gaming performance gain over the 4060 Ti (rough math sketched below). But again, maybe there are other performance figures that paint a different picture.

Point is, given how much hype nVidia put into AI while clearly not caring as much about gaming, this is underwhelming.
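A minimal sketch of that normalization, assuming a ballpark gaming uplift; the ~15% value is a placeholder assumption, not a measured figure, so the exact outcome shifts with whichever gaming number you plug in.

```python
# Hedged sketch: how much of the 5060 Ti's AI/ML uplift remains after dividing
# out its general gaming uplift. The gaming figure is a placeholder assumption.
ai_gains = (1.20, 1.40)   # span of AI/ML uplifts vs the 4060 Ti cited in this thread
gaming_gain = 1.15        # assumed average gaming uplift (placeholder, not measured)

for gain in ai_gains:
    beyond_gaming = gain / gaming_gain - 1
    print(f"{gain:.2f}x raw AI uplift -> ~{beyond_gaming * 100:.0f}% beyond the gaming gain")
```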
 
It's pretty good to see AMD stepping up when most of us thought they were going to quit. Really curious to see RDNA5 and what they do to leverage the Xbox partnership.
 
But maybe the 5080 Ti Super will :)
Doubtful. The 5080 Super, as currently speculated, will equal the memory capacity and speed of the 4090 (24GB, ~1TB/s), but will still be a far cry from the 4090's core config.
In order to truly equal the 4090, the 5080 Super/Ti would have to be based on the GB202 used in the RTX Pro 5000 at the very minimum (with 24GB, naturally), and I suspect even that would fall short without a significant clock speed bump. https://www.techpowerup.com/gpu-specs/rtx-pro-5000-blackwell.c4276
 
Go ahead and get hyped for the next two years. I’ve watched AMD fans get aboard the hype train for decades, only to be disappointed when it arrives at the station.
Are Intel and Nvidia fans any different?

Arrow Lake was supposed to be amazing. Much better power efficiency and equaling or surpassing Zen 5. What they got was often worse than 14th gen in many areas.

Or Nvidia. Two years to release what we now know is just a Lovelace refresh. Lovelace itself was not well received despite lofty expectations from moving back to TSMC on a much better node.
 
Actually...

Not a large leap in a (limited, admittedly) suite of AI/ML/Pro workloads over RTX40 either.

Will this improve with drivers or support? Remains to be seen.

Yep, and I'm willing to bet most of that is the new memory.

Wai, wha? In AI and ML workloads the 5060 Ti is, by those very tests, 20-40% faster than the 4060 Ti with very nearly the same core count. That's a very significant leap. Blender and others are understandably similar; those aren't AI workloads. So it just points again to the fact that Blackwell was VERY AI/ML optimized.

Tom's states 15-20% on average. Most of that is going to be the higher TDP and faster memory. At the end of the day, the IPC improvement to the AI cores is very small, if any.
 
Tom's states 15-20% on average.
They do not state that. There are three AI/ML tests, and that figure applies only to Procyon. It's around 40% for MLPerf and around 30% for SPEC.
And these are the consumer cards, which weren't what I was wondering about initially. I would be much more interested to see how Blackwell accelerators perform compared to Ada, though that would be almost impossible to test apples to apples.
 
Sounds like copium to me

An educated guess is typically labeled as such. A nondescript guess is as good as making stuff up.
Kepler_L2 has a pretty good track record. I would not dismiss anything he says just because it's a guess.
Besides, his "guess" is nothing outrageous. It merely implies that AMD will repeat with UDNA the same IPC uplift they already achieved with RDNA4.
 
They do not state that. There are three AI/ML tests, and that figure applies only to Procyon. It's around 40% for MLPerf and around 30% for SPEC.


Yes, they do:

[screenshot attachment]



As stated in the article, MLPerf performance is heavily tied to VRAM size.

SPECworkstation was not 30%; it was 25%:

[screenshot attachment]



My earlier comment stands: most of that uplift is likely the memory, not IPC.

And these are the consumer cards, which weren't what I was wondering about initially. I would be much more interested to see how Blackwell accelerators perform compared to Ada, though that would be almost impossible to test apples to apples.

Whether it's consumer or enterprise is irrelevant when they use the same architecture. IPC will still be the same; the only difference is that enterprise cards will be larger, with better memory.
 