
GPU IPC Showdown: NVIDIA Blackwell vs Ada Lovelace; AMD RDNA 4 vs RDNA 3

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
3,254 (1.13/day)
Instructions per clock (IPC) is a metric usually used to define and compare CPU architecture performance. However, enthusiast colleagues at ComputerBase had the idea of testing IPC improvement in GPUs, comparing it across current and past generations. NVIDIA's Blackwell-based GeForce RTX 50 series faces off against the Ada Lovelace-based RTX 40 generation, while AMD's RDNA 4-powered Radeon RX 9000 lineup challenges the RDNA 3-based RX 7000 series. For NVIDIA, the test used the RTX 5070 Ti and 4070 Ti SUPER, aligning ALU counts and clock speeds and treating memory bandwidth differences as negligible. For AMD, the test matched the RX 9060 XT to the RX 7600 XT, both featuring identical ALU counts and GDDR6 memory. By closely matching shader counts and normalizing for clock variations, ComputerBase isolates IPC improvements from other hardware enhancements. In rasterized rendering tests across 19 popular titles, NVIDIA's Blackwell architecture delivered an average IPC advantage of just 1% over the older Ada Lovelace.
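As a rough illustration of the normalization described above, here is a minimal sketch of a per-ALU, per-clock comparison. The function name and all input numbers are hypothetical placeholders, not ComputerBase's measurements or tooling, and it assumes performance scales linearly with clock and ALU count, which real GPUs only approximate.

```python
def normalized_ipc_ratio(fps_new, fps_old, clock_new_mhz, clock_old_mhz,
                         alus_new, alus_old):
    """Throughput per ALU per clock of the new GPU, relative to the old one.

    Assumes FPS scales linearly with clock and ALU count (a simplification).
    """
    per_unit_new = fps_new / (clock_new_mhz * alus_new)
    per_unit_old = fps_old / (clock_old_mhz * alus_old)
    return per_unit_new / per_unit_old


# Hypothetical inputs: both cards pinned to the same clock and ALU count,
# so any FPS difference is attributed to the architecture.
ratio = normalized_ipc_ratio(
    fps_new=101.0, fps_old=100.0,
    clock_new_mhz=2500, clock_old_mhz=2500,
    alus_new=8000, alus_old=8000,
)
print(f"IPC change: {(ratio - 1) * 100:+.1f}%")
```

When clocks and ALU counts are matched in hardware, as in the test, the ratio reduces to the plain FPS ratio; the division only matters when the two cards cannot be aligned exactly.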

This difference could easily be attributed to normal benchmark variance. Ray tracing and path tracing benchmarks showed no significant IPC uplift, leaving the latest generation essentially on par with its predecessor when normalized for clock and unit count. AMD's RDNA 4, by contrast, exhibited a substantial IPC leap. Rasterized performance improved by around 20% compared to RDNA 3, while ray-traced workloads enjoyed a roughly 31% gain. Path tracing results were even more extreme, with RDNA 4 delivering nearly twice the FPS, close to a 100% increase over its predecessor. These findings suggest that NVIDIA's performance improvements primarily stem from higher clock speeds, increased execution unit counts, and enhanced features. AMD's RDNA 4 represents a significant architectural advance, marking its most notable IPC gain since the original RDNA launch.



 
Not only is there no performance progress from Ada to Blackwell, there is no energy efficiency progress either. Blackwell is just a bigger 40xx with messed-up drivers.

At this point it’s quite clear that nVidia was pursuing AI gains and nothing else for this generation.


I’d just like to know if they succeeded in their pursuit.
 
If accurate, it seems Nvidia has handed AMD a "free" generation to catch up in performance.

Hopefully, with a die shrink and some additional AMD gains over RDNA 4, we see some actual gains from the next gen.

I've been on a 7900 XTX for two years now. I want these companies to compete and give me a better product.

I understand die shrinks have slowed and expectations need to be adjusted to meet these changes, but things seem to particularly suck in the last two generations from AMD and Nvidia.

At this point it’s quite clear that nVidia was pursuing AI gains and nothing else for this generation.


I’d just like to know if they succeeded in their pursuit.
I hope that their pursuit of AI pays off for gaming features.

Include all the compute hardware you want for AI, just ACTUALLY add game features that are worthwhile, as they seem to be missing currently, in a quality I'd be willing to use.
 
What are the odds/chances that Wizzard can try to do an IPC comparison for the last 2 or 3 versions of Nvidia's and AMD's GPUs? It'd be awesome to see how far Nvidia has come from the RTX 2080 to the RTX 5080, using clock-synced settings. Same for AMD hardware.
 
What are the odds/chances that Wizzard can try to do an IPC comparison for the last 2 or 3 versions of Nvidia's and AMD's GPUs? It'd be awesome to see how far Nvidia has come from the RTX 2080 to the RTX 5080, using clock-synced settings. Same for AMD hardware.
That's easy for Nvidia since Maxwell (750 Ti). Add 50% IPC gen to gen whenever there is a die shrink. Add 0% IPC gen to gen when there is no die shrink.
 
That's easy for Nvidia since Maxwell (750 Ti). Add 50% IPC gen to gen whenever there is a die shrink. Add 0% IPC gen to gen when there is no die shrink.
A die shrink does not mean that there is no IPC improvement. Die shrinks would normally mean a bigger IPC improvement, since you aren't relying on just the die shrink for additional headroom. I think getting some hard numbers on the difference for die shrinks and generational changes, or lack thereof, for each new product generation would make for a sweet article.

Lord knows it's the type of article I subscribe to Patreon for.
 
At this point it’s quite clear that nVidia was pursuing AI gains and nothing else for this generation.


I’d just like to know if they succeeded in their pursuit.

There is a surprising lack of decent data for this. Ironically a ton of AI generated articles using Nvidia marketing material but not a single one with something of value.

Blackwell's AI improvements include 4-bit precision support, faster memory, improved error checking (AMD has actually led in quality at the same settings), and hardware support for structured sparsity.

Nvidia claims 2x to 4x in charts but that's with their B200. Given that the B200 is simply two dies, cut that figure in half. Alright, so 2x is not bad, right? Well, that figure too is going to be roughly cut in half given it compares FP4 vs FP8 (incredibly misleading). In typical Nvidia fashion, they make comparisons completely worthless by relying on a feature only supported on new hardware.
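The discounting argued above can be written out as a small sketch. This follows the poster's rough reasoning only: the two halving factors (two dies, and FP4 versus FP8) are the post's approximations, not measured values, and the function name is made up for illustration.

```python
def like_for_like_speedup(headline, die_count_ratio, precision_ratio):
    """Discount a headline speedup by extra silicon and by reduced precision.

    Both factors are the post's back-of-envelope assumptions:
    twice the dies ~ 2x, and FP4 instead of FP8 ~ 2x.
    """
    return headline / (die_count_ratio * precision_ratio)


headline = 4.0                      # best-case chart figure quoted in the post
after_dies = headline / 2.0         # B200 is two dies
after_precision = like_for_like_speedup(headline, 2.0, 2.0)

print(f"headline:             {headline:.1f}x")
print(f"per die:              {after_dies:.1f}x")
print(f"per die, same format: {after_precision:.1f}x")
```

Under those assumptions the best-case 4x figure shrinks to roughly parity per die at equal precision, which is the point the post is making.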

Yes you can have up to 4x in the absolute best case scenario if you spend a massive amount of money but in a like for like comparison where you are comparing AI IPC improvements of new gen vs last gen, I'm not sure there are any improvements. You have to specifically utilize the new features to get the performance boost or purchase physically more silicon (which isn't a sign of progress). The memory speed increase alone will provide some boost but it cannot do all the lifting. Plus consumer cards have far too little VRAM to really take advantage of that speed. 16GB on the 5080 doesn't even run the now aging FLUX Dev model without making quality compromises, let alone what will come out in the next couple of years. Even the 5090 should have had 48GB.
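For a sense of scale on the VRAM point, here is a back-of-envelope sketch of weight storage alone. It assumes FLUX.1 Dev has about 12 billion parameters (the figure on its model card) and ignores the text encoders, VAE, and activations, so real usage would be higher than these numbers.

```python
def weights_gib(params_billion, bytes_per_param):
    """Memory needed just to hold model weights, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30


VRAM_GIB = 16          # RTX 5080
PARAMS_BILLION = 12    # assumed FLUX.1 Dev transformer size

for label, nbytes in (("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
    size = weights_gib(PARAMS_BILLION, nbytes)
    print(f"{label}: {size:.1f} GiB, fits in {VRAM_GIB} GiB: {size <= VRAM_GIB}")
```

At 16-bit the weights alone exceed 16 GiB, so running on such a card implies quantizing or offloading, which is the kind of compromise the post refers to.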
 
There is a surprising lack of decent data for this. Ironically a ton of AI generated articles using Nvidia marketing material but not a single one with something of value.

Blackwell's AI improvements include 4-bit precision support, faster memory, improved error checking (AMD has actually led in quality at the same settings), and hardware support for structured sparsity.

Nvidia claims 2x to 4x in charts but that's with their B200. Given that the B200 is simply two dies, cut that figure in half. Alright, so 2x is not bad, right? Well, that figure too is going to be roughly cut in half given it compares FP4 vs FP8 (incredibly misleading). In typical Nvidia fashion, they make comparisons completely worthless by relying on a feature only supported on new hardware.

Yes you can have up to 4x in the absolute best case scenario if you spend a massive amount of money but in a like for like comparison where you are comparing AI IPC improvements of new gen vs last gen, I'm not sure there are any improvements. You have to specifically utilize the new features to get the performance boost or purchase physically more silicon (which isn't a sign of progress). The memory speed increase alone will provide some boost but it cannot do all the lifting. Plus consumer cards have far too little VRAM to really take advantage of that speed. 16GB on the 5080 doesn't even run the now aging FLUX Dev model without making quality compromises, let alone what will come out in the next couple of years. Even the 5090 should have had 48GB.

You make some great points! I'd just love to see a 2080, 3080, 4080 and 5080 running at the same clocks and memory bandwidth, to see what the years have brought to our world. Yes, it isn't quite that easy, and I want the same for AMD. I really think the 3k series was the "aces" series for progress and since then we've been boned by everyone.

If Wizzard needs it, I'll drop a donation to get this article published. Rich and informed content is hot!
 
Yep, I created a topic about RDNA3 vs. RDNA4 in terms of IPC here on the TPU forum.
My calculations for Ada vs. Blackwell were accurate, after all.
All Blackwell performance improvements are based on an increased number of compute units (die scaling), which translates into increased power draw.
So much for those who believed that the RTX 5080, with 2/3 of the RTX 4090's compute units, would beat the RTX 4090.
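To put a number on that last point, here is a minimal sketch of how much extra clock a smaller GPU would need to match a larger one. It assumes performance scales linearly with compute units and clock, which flatters the larger chip since big GPUs rarely scale perfectly; the function name is hypothetical.

```python
def required_clock_ratio(unit_ratio, ipc_ratio=1.0):
    """Clock multiplier the smaller GPU needs to match the larger one.

    unit_ratio: smaller GPU's compute units divided by the larger GPU's.
    ipc_ratio:  per-unit, per-clock throughput of the smaller GPU's
                architecture relative to the larger GPU's.
    Assumes performance = units * clock * IPC (idealized linear scaling).
    """
    return 1.0 / (unit_ratio * ipc_ratio)


# 2/3 of the compute units, with the roughly 1% IPC gain from the article.
needed = required_clock_ratio(2 / 3, ipc_ratio=1.01)
print(f"Clock needed: {needed:.2f}x the larger GPU's clock")
```

With essentially flat IPC, a card with two thirds of the units would need close to 1.5x the clock under this idealized model, far beyond the clock difference between the two generations.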



The RX 9070 XT compared to the RX 7900 XT shows a massive (+44%) performance improvement per compute unit, thanks to 20% higher clocks, with the rest down to architectural changes (IPC). We can rule out memory bandwidth favoring the RX 9070 XT here, as it has significantly lower memory throughput than the RX 7900 XT (644 vs. 800 GB/s). An RX 9070 XT with the RX 7900 XT's memory bandwidth would be even faster.
 
A die shrink does not mean that there is no IPC improvement. Die shrinks would normally mean a bigger IPC improvement, since you aren't relying on just the die shrink for additional headroom. I think getting some hard numbers on the difference for die shrinks and generational changes, or lack thereof, for each new product generation would make for a sweet article.

Lord knows it's the type of article I subscribe to Patreon for.
I think you read my comment backwards.
 
AMD hasn't really dominated anything discrete GPU wise since acquiring ATI. They have held their own sometimes but that's about it.
Yeah I know, /s required :D
 
A die shrink does not mean that there is no IPC improvement. Die shrinks would normally mean a bigger IPC improvement, since you aren't relying on just the die shrink for additional headroom. I think getting some hard numbers on the difference for die shrinks and generational changes, or lack thereof, for each new product generation would make for a sweet article.

Lord knows it's the type of article I subscribe to Patreon for.
There was no die shrink between Ada and Blackwell. Both are made on TSMC's 4N process, which is TSMC's 5nm process tailored specifically for Nvidia.
AMD's RX 9000 series uses TSMC's N4P process, which offers slightly better density than what Nvidia ended up with. Even with the RX 9000 series we can't talk about a die shrink (it's an improved TSMC 5nm process).
 
Yeah I know, /s required :D
Sure, yep I get the sarcasm but the attitudes of some of the Nvidia users have gotten so extreme lately, I don't want them taking anything too seriously. We regular PC enthusiasts know the capabilities of the different hardware over time but even though most of us are getting info from the same sources, the Nvidia enthusiasm is, how shall we say, distorting reality just a bit. :)
 
RX 9070 XT compared to RX 7900 XT shows massive (+44%) IPC improvement per compute unit thanks to 20% higher clocks and other part is on architectural changes
IPC stands for instructions per cycle, so higher clocks do not contribute to the IPC metric, only to overall performance. So if the clocks are 20% higher and we're seeing 144% of overall performance, we have a roughly 20% IPC gain generation-to-generation (1.44 / 1.20 = 1.20).
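The correction above is just a division, sketched here with the thread's own figures. It assumes per-compute-unit performance is the product of clock and IPC; the function name is made up for illustration.

```python
def ipc_ratio(perf_ratio, clock_ratio):
    """Split a per-unit performance ratio into its IPC part.

    Assumes performance per compute unit = clock * IPC, so the gains
    multiply rather than add.
    """
    return perf_ratio / clock_ratio


perf = 1.44    # +44% performance per compute unit (figure from the thread)
clock = 1.20   # +20% clock (figure from the thread)

gain = ipc_ratio(perf, clock) - 1
print(f"IPC gain: {gain:.0%}")
```

Because the factors multiply, subtracting the percentages (44 - 20 = 24) overstates the IPC share; dividing the ratios gives the roughly 20% quoted above.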
 
Next a Power Connector Failure showdown.
 
Yep, I created a topic about RDNA3 vs. RDNA4 in terms of IPC here on the TPU forum.
My calculations for Ada vs. Blackwell were accurate, after all.
All Blackwell performance improvements are based on an increased number of compute units (die scaling), which translates into increased power draw.
So much for those who believed that the RTX 5080, with 2/3 of the RTX 4090's compute units, would beat the RTX 4090.


RX 9070 XT compared to RX 7900 XT shows massive (+44%) IPC improvement per compute unit thanks to 20% higher clocks and other part is on architectural changes. We can rule out memory bandwidth being in favor of RX 9070 XT here, as it has significantly lower memory throughput than RX 7900 XT (644 vs. 800 GB/s). RX 9070 XT with RX 7900 XT memory bandwidth would be even faster.

Quoted for emphasis. I do this every generation to see if anything has moved the price/performance/power metrics and this matches what I've figured using the data here from the TPU charts.

The 40% RX 7600 XT to RX 9060 XT IPC improvement, with the same core count and similar memory, shows that something was terribly broken in RDNA3, and AMD did say that RDNA4 was a bugfix.

I wanna know what that damn bug was!

Edit: My handwavy guess is the doubled throughput FP cores which were added in RDNA 3 didn't work properly which is why the 7600 was only marginally faster than the 6650 XT with the old cores. And that got fixed.
 