
GeForce RTX 5000 gaming "Blackwell" power efficiency improvement

10tothemin9volts

New Member
Joined
Jun 26, 2023
Messages
14 (0.04/day)
How good is NV's upcoming gaming "Blackwell" power efficiency improvement going to be?
I compared the last four x070 gens and, taking the average of their gen-over-gen gains, I get a ~33% power efficiency improvement:
Code:
               Power efficiency improvement over prev gen
5070           33%[2]
4070           47%
3070           37%
2070 Super FE  -3%
1070           51%[1]
---
[1]:
1070: 102FPS/150W
 980: 74FPS/165W
      efficiency improvement: (((102/150)/(74/165))-1)*100 = ~51%
[2]: (51%-3%+37%+47%)/4 = 33%
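
For anyone who wants to redo the numbers, here is a minimal Python sketch of the calculation in footnotes [1] and [2]. The function and variable names are made up for illustration. The geometric mean at the end is an addition that is not in the post above; it is one possible alternative way to average gains that compound from gen to gen.

```python
import math


def efficiency_gain_pct(new_fps, new_watts, old_fps, old_watts):
    """Percent change in FPS-per-watt going from the old card to the new one."""
    return ((new_fps / new_watts) / (old_fps / old_watts) - 1) * 100


# Footnote [1]: GTX 1070 (102 FPS at 150 W) vs. GTX 980 (74 FPS at 165 W).
gain_1070 = efficiency_gain_pct(102, 150, 74, 165)

# Footnote [2]: arithmetic mean of the four gen-over-gen figures in the table.
past_gains_pct = [51, -3, 37, 47]
arithmetic_mean = sum(past_gains_pct) / len(past_gains_pct)

# Assumption (not in the post): a geometric mean of the growth factors.
factors = [1 + g / 100 for g in past_gains_pct]
geometric_mean = (math.prod(factors) ** (1 / len(factors)) - 1) * 100

print(f"1070 over 980:   {gain_1070:.1f}%")
print(f"arithmetic mean: {arithmetic_mean:.1f}%")
print(f"geometric mean:  {geometric_mean:.1f}%")
```

The arithmetic mean reproduces the 33% in the table; the geometric mean comes out slightly lower because the -3% generation drags on the compounded result.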

en.wikipedia.org/wiki/Blackwell_(microarchitecture) says:
Process node

Blackwell is fabricated on the custom 4NP node from TSMC. 4NP is an enhancement of the 4N node used for the Hopper and Ada Lovelace architectures with an increase in transistor density. With the enhanced 4NP node, the GB100 die contains 104 billion transistors, a 30% increase over the 80 billion transistors in the previous generation Hopper GH100 die.[11] As Blackwell cannot reap the benefits that come with a major process node advancement, it must achieve power efficiency and performance gains through underlying architectural changes.[12]

Not sure if the 30% increase in transistor count means smaller transistors, which would consume less power. Usually architectural changes alone don't amount to much, but AMD RDNA2 has shown it's possible:
en.wikipedia.org/wiki/RDNA_2:
Power efficiency

AMD claims that RDNA 2 achieves up to a 54% increase in performance-per-watt over the first RDNA microarchitecture.[16] 21% of that 54% improvement is attributed to performance-per-clock enhancements, in part due to the addition of Infinity Cache.[17]

But RDNA2 got there by increasing the cache, and the GeForce RTX 4000 series already did that?

So, in total, considering the nearly identical lithography, a ~33% power efficiency improvement for the same class of GPU would be good.
 
Joined
Sep 17, 2014
Messages
21,534 (6.00/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
Euh...

Yes.

50% would be better. Your point? We can also throw some stuff at the wall and see what sticks. I mean, you just provided the numbers yourself that there is no rhyme or reason, and definitely no consistency to the gen-to-gen power efficiency improvements. Ampere could have been a lot better but then we would not have seen the major gain going to Ada. Blackwell might be clocked very high to allow smaller chips and it won't do a thing on the efficiency scale.

It's anyone's guess, and a big part of this is not the hardware or the node, but rather the market.
 

10tothemin9volts

New Member
Joined
Jun 26, 2023
Messages
14 (0.04/day)
50% would be very nice. The point is that you have to be realistic, considering the supposedly almost identical TSMC process.

Code:
Card   TDP    Foundry   TDP change vs. prev gen
 970   145W   TSMC      -
1070   150W   TSMC      +3.4%
2070   175W   TSMC      +16.7%
3070   220W   SAMSUNG   +25.7%
4070   200W   TSMC      -9.09% (basically a jump of 2 full nodes, that's why they could even reduce the TDP slightly)
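
The percentages in the table can be recomputed with a few lines of Python. This is only a sketch using the TDP values listed above; the variable names are invented for illustration.

```python
# TDP values (watts) as listed in the table above, oldest first.
tdp_watts = [("970", 145), ("1070", 150), ("2070", 175), ("3070", 220), ("4070", 200)]

# Gen-over-gen TDP change in percent, keyed by the newer card.
tdp_change_pct = {}
for (_, prev_w), (name, cur_w) in zip(tdp_watts, tdp_watts[1:]):
    tdp_change_pct[name] = (cur_w / prev_w - 1) * 100

for name, change in tdp_change_pct.items():
    print(f"{name}: {change:+.2f}% TDP")
```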

They can't increase the TDP by much anymore (maybe that's why they went with DLSS/upscaling, RT and, most recently, FG in the first place). The point is, of course (especially if, as you say, there may only be a clock improvement), that to get, say, a +30% performance improvement [not a power efficiency improvement] from power alone, a 5070 would need a 260W TDP, and that's not even the worst of it: a 5090 would need a 585W TDP. Never mind the mobile parts.

The "30% increase" in transistor density would still allow for lithography-based power efficiency improvements.

For the +30% perf improvement I guess NV could do: a +10% TDP increase (the 3070 already had a 220W TDP) + hopefully a 10% power efficiency improvement from the TSMC lithography process + a 10% architectural improvement (maybe partly thanks to the TSMC process as well). That would put a 5090 at 450W * 1.1 = 495W TDP, which could be fun if they do it and if it's still true that cables/power connectors are melting at the 4090's 450W TDP. The resulting 20% power efficiency improvement wouldn't be a new low, because the RTX 2000 series only had a single-digit improvement, if any (because a lot of die space got used by the new Tensor and RT cores).
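
The arithmetic in this post can be sketched in Python as follows. It assumes performance scales as TDP times performance-per-watt, which is a simplification, and `required_tdp` is a hypothetical helper name. The 200W and 450W baselines are the 4070 and 4090 TDPs mentioned in the thread.

```python
def required_tdp(current_tdp_w, perf_gain, efficiency_gain=0.0):
    """TDP needed for (1 + perf_gain)x performance, given (1 + efficiency_gain)x perf-per-watt.

    Simplifying assumption: performance = TDP * performance-per-watt.
    """
    return current_tdp_w * (1 + perf_gain) / (1 + efficiency_gain)


# +30% performance with no efficiency improvement at all.
tdp_5070_no_eff = required_tdp(200, 0.30)  # from the 4070's 200 W
tdp_5090_no_eff = required_tdp(450, 0.30)  # from the 4090's 450 W

# The 10% + 10% + 10% split: the three factors multiply rather than add.
tdp_factor, litho_factor, arch_factor = 1.10, 1.10, 1.10
efficiency_factor = litho_factor * arch_factor   # combined perf-per-watt gain
perf_factor = tdp_factor * efficiency_factor     # combined performance gain
tdp_5090_split = 450 * tdp_factor

print(f"5070 at +30% perf, same efficiency: {tdp_5070_no_eff:.0f} W")
print(f"5090 at +30% perf, same efficiency: {tdp_5090_no_eff:.0f} W")
print(f"efficiency gain from 10% + 10%:     {(efficiency_factor - 1) * 100:.0f}%")
print(f"perf gain from 10% + 10% + 10%:     {(perf_factor - 1) * 100:.1f}%")
print(f"5090 TDP with +10%:                 {tdp_5090_split:.0f} W")
```

Because the factors compound, the three 10% steps give roughly +33% performance and +21% efficiency, slightly more than the rounded +30% and 20% used in the post.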

I guess they (NV or AMD for that matter, as both are cooking with TSMC) could also push more upscaling, RT, FG or something new to get around a power efficiency improvement of only 20% (the 20% is just an example of a rather low value, and I could be totally wrong).

I guess my other point is: I hear constant rumors of +50% performance improvements, but just how? (Yes, +20% efficiency and +30% TDP would do it, but as already said...) I'm just saying I'm curbing my expectations for power efficiency and performance improvements in the upcoming GeForce RTX 5000 series, and I hope I'm going to be pleasantly surprised.
 
Joined
Sep 17, 2014
Messages
21,534 (6.00/day)
I learned a few generations back that it's pointless to make predictions on this. A lot of it is also game specific. The ballpark you can estimate towards is pretty large if you weigh all the ifs and buts. And a lot of it isn't up to logic either, or hardware, but to external factors. And also unforeseen internal factors. I mean, Ada sells in part because of the heavy push on DLSS 3. It's very likely Nvidia will market future cards with more proprietary / software improvement elements. How much of those you can actually benefit from remains to be seen, especially if they're game specific.

Still I applaud your effort for the approach :) We'll see where it lands...
 
Joined
Nov 27, 2023
Messages
1,563 (6.80/day)
System Name The Workhorse
Processor AMD Ryzen R9 5900X
Motherboard Gigabyte Aorus B550 Pro
Cooling CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory GSkill Trident Z 3200CL14
Video Card(s) NVidia GTX 1070 MSI QuickSilver
Storage Adata SX8200Pro
Display(s) LG 32GK850G
Case Fractal Design Torrent
Audio Device(s) FiiO E-10K DAC/Amp, Samson Meteorite USB Microphone
Power Supply Corsair RMx850 (2018)
Mouse Razer Viper (Original)
Keyboard Cooler Master QuickFire Rapid TKL keyboard (Cherry MX Black)
Software Windows 11 Pro (23H2)
I suggest we go even further beyond and start speculating about improvements that Rubin will bring. Blackwell is so 2023.
 
Joined
Oct 4, 2023
Messages
79 (0.28/day)
I would imagine GDDR7 brings some amount of efficiency to the table (potentially).
As an example, I can lower voltage/clocks on my 4090, overclock the VRAM, and lose a negligible amount of performance for decent power savings.
 
Joined
Dec 31, 2020
Messages
826 (0.64/day)
Processor E5-2690 v4
Motherboard VEINEDA X99
Video Card(s) 2080 Ti
Storage NE-512 KingSpec
Display(s) G27Q
Case DAOTECH X9
Power Supply SF450
4N to 4NP: +6% perf @ iso-power, as well as higher density. Compared to the 4070/4080, the 5070/5080 only slightly increase the number of shaders: 5888 vs. 6144-6400 and 9728 vs. 10240-10752. Probably 20% perf thanks to GDDR7 alone, and in RT. Wait for the next gen to see any difference.
 

10tothemin9volts

New Member
Joined
Jun 26, 2023
Messages
14 (0.04/day)
Thanks, we'll see indeed. It's a solid approach, because it's all about the lithography process: even DLSS, which uses custom cores to improve graphics at the same power consumption, can only improve so much gen-over-gen if the lithography allows for it. Sure, some architectural improvement on the same lithography process can be done for a few gens (double digits gen-over-gen at first, then single digits).

I would imagine GDDR7 brings some amount of efficiency to the table (potentially).
As an example, I can lower voltage/clocks on my 4090, overclock the VRAM, and lose a negligible amount of performance for decent power savings.
The VRAM consumes up to ~5 W per chip and I don't think GDDR7 is going to improve that by much, if at all. GDDR7's improvements are in bandwidth and density, both at roughly the same power consumption.

4N to 4NP: +6% perf @ iso-power, as well as higher density. Compared to the 4070/4080, the 5070/5080 only slightly increase the number of shaders: 5888 vs. 6144-6400 and 9728 vs. 10240-10752. Probably 20% perf thanks to GDDR7 alone, and in RT. Wait for the next gen to see any difference.
Only +6% @ iso-power? That's a disappointingly low number. Maybe NV can ask TSMC to get it to ~10%, since they already have a custom process in their current GeForce RTX 4000 gen, or get the secret arch improvements out of their secret drawer. (Of course I don't know in what way the process is actually custom; maybe it's not about power efficiency. Such a process may be more expensive, with maybe more EUV layers and so on, but it's NV, people are going to buy their GPUs.)
I don't see how you get 20% perf thanks to GDDR7 alone (unless you mean together with other improvements like process and arch). It just allows for a bandwidth improvement of ~30-50% [at the same power], which then, in turn, may allow for a perf improvement [at the same power].

Indeed, we will see.
 
Joined
Oct 4, 2023
Messages
79 (0.28/day)
Did a (very) little test, stock vs. VRAM +1000 MHz. The VRAM overclock allowed the 4090 to run with lower core clocks (and thus lower voltage too) while maintaining stock performance, and there was around an 8% power efficiency gain from doing that.
So while VRAM alone might not be a huge part of total power consumption, in combination with higher bandwidth I'd still expect a decent (total) efficiency gain thanks to GDDR7.

[Screenshots: "Stock" (stock.png) and "VRAM +1000 MHz" (vram+1000.png)]
 
Joined
Aug 12, 2019
Messages
1,858 (1.03/day)
Location
LV-426
System Name Custom
Processor i9 9900k
Motherboard Gigabyte Z390 arous master
Cooling corsair h150i
Memory 4x8 3200mhz corsair
Video Card(s) Galax RTX 3090 EX Gamer White OC
Storage 500gb Samsung 970 Evo PLus
Display(s) MSi MAG341CQ
Case Lian Li Pc-011 Dynamic
Audio Device(s) Arctis Pro Wireless
Power Supply 850w Seasonic Focus Platinum
Mouse Logitech G403
Keyboard Logitech G110
Efficiency is good, but I want to see better performance across the board (x60, x70, x80), because consumers are paying top dollar now and the 40 series GPUs are pricey for the perf they offer...
That said, I felt the 4070 Super and 4070 Ti Super are great GPUs and are okayish for their current price, but could be slightly cheaper...
 
Joined
Dec 31, 2020
Messages
826 (0.64/day)
2-slot and 2.9 GHz could point to an N3 node being used; that was the original rumor.
 
Joined
Apr 2, 2011
Messages
2,707 (0.56/day)
So...what is the endgame to this speculation? I'm not seeing the direct link from A to B...only the indirect link of A to N.

Let me elaborate. Total power consumption is a function not of efficiency alone, but of net output efficiency. I.e., I can run one engine at 95% of its rated output and get a desired value, or I can run two engines at 47.5% of their rated output...and if the efficiency of the engine at 47.5% is greater than at 95% output, my twin-engine solution is actually more efficient despite the same inputs and outputs due to process variation. This sounds silly...but you then compare engines that run at different efficiencies, and are pushed to different outputs, and the result is a silly non-answer to a problem. Which is more efficient is not a straight comparison when you have so many different components shifting behind the scenes.
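
A toy Python sketch of the engine example, under loudly hypothetical assumptions: the linear efficiency curve below is invented purely for illustration and is not measured data for any engine or GPU. It only shows how the same total output can cost different input power depending on where each unit sits on its efficiency curve.

```python
RATED_OUTPUT = 100.0  # hypothetical rated output per engine, arbitrary units


def efficiency(load_fraction):
    """Hypothetical curve: efficiency drops linearly from 0.40 at zero load to 0.30 at full load."""
    return 0.40 - 0.10 * load_fraction


def input_power(total_output, n_engines):
    """Input power needed when total_output is split evenly across n_engines."""
    per_engine_output = total_output / n_engines
    load_fraction = per_engine_output / RATED_OUTPUT
    return n_engines * per_engine_output / efficiency(load_fraction)


one_engine = input_power(95.0, 1)    # one engine at 95% of rated output
two_engines = input_power(95.0, 2)   # two engines at 47.5% each

print(f"one engine at 95%:    {one_engine:.1f} units of input")
print(f"two engines at 47.5%: {two_engines:.1f} units of input")
```

With this made-up curve the twin-engine setup needs less input for the same output; with a different curve the result could flip, which is the point of the comparison being non-trivial.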


Let me go one better. The 1070 had no real ray tracing. The 4070 does. If you take a game that only uses raster, then is the "efficiency" of the 4070 better because it's using less energy...but has idle components for ray trace that are literally doing nothing? I'd hazard no...but I'd also suggest that instead of efficiency by power consumption we look at something like calculations completed per energy unit used. At which point you are then comparing apples to apples...but your average usage case is probably closer to a fruit salad than a tin of applesauce. As such...this all feels like trying to generate hype for nothing...almost like you're already trying to internally justify buying Blackwell before ever seeing anything. That's fine...just not useful for anybody. If you want to buy Blackwell without comparing it to other stuff then do so...I'll be sitting here waiting for the pricing on old inventory of 4xxx series to make them viable, or to see whatever AMD decides to compete with.
 

ARF

Joined
Jan 28, 2020
Messages
4,309 (2.65/day)
Location
Ex-usa | slava the trolls
Let me go one better. The 1070 had no real ray tracing. The 4070 does. If you take a game that only uses raster, then is the "efficiency" of the 4070 better because it's using less energy...but has idle components for ray trace that are literally doing nothing?

All chips at all times have transistors which are not loaded: during gaming, the media engine; during video playback, the shaders; during normal graphics load, the ray-tracing units. Some games can't fully utilise the shaders (they don't provide sufficiently parallel tasks), others don't utilise the full memory throughput and capacity, etc.
 