
AMD RDNA3 Offers Over 50% Perf/Watt Uplift Akin to RDNA2 vs. RDNA; RDNA4 Announced

ARF

Joined
Jan 28, 2020
Messages
3,931 (2.55/day)
Location
Ex-usa
MCM isn't a magical secret sauce that will miraculously make RDNA3 way more power efficient; it's simply a way to make larger GPUs without running into the issue of yields or other limiting factors (such as reticle size). So think of the benefits of MCM as just the benefits of having larger silicon areas and nothing more. But having a larger total silicon area can bring a lot of obvious benefits, and the fact that AMD cites MCM as a factor in their supposed 50% power efficiency boost tells us how they plan to utilize that extra silicon... for now, at least.

Cutting the large dice into ever smaller slices is a production-cost reduction, not an IPC- or performance-related exercise.
But it may have a slight background advantage of improving the qualities of the overall solution - my guess is no more than 5% overall.
I mean better thermals and improved management of the integrated parts.

The audio processors should be cut altogether; I don't understand why GPUs must include an audio device that costs transistors on the die.
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Obviously I cannot know this for certain, but it's gotta be all about the yields. Navi 22 is only in the 6700 XT because they clearly have near-perfect yields on that chip. Meanwhile, Navi 21 6900 XTs outnumber the 6800s by a factor of [I don't know how much exactly, but it must be a lot lol] for the exact same reason. Had they known how few failed 21s they'd be getting, they would have designed a Navi that slots in between the current 21s and 22s; but since mid-range is where AMD usually gets its sales, and with all the supply shortages that were rearing their heads at the time, they just doubled down on the smaller chips and called it a day.
That's exactly why I think the segmentation doesn't make sense. It's clear they veered pessimistic in their initial yield estimates, but even accounting for that, dropping all the way to 40 CUs for Navi 22 from the 60 CU low-end bin of Navi 21 doesn't add up to me. If I were to guess, it might be that they were planning for Navi 22 to be a massive volume push for mobile, which would have made a larger die size expensive. But then that didn't materialize either - there aren't many 6700M/6800M/6850M XT laptops out there. And then there's Navi 23, which is just weirdly close - just 8 CUs less? - leaving no room for a cut-down Navi 22 anywhere. I mean, TSMC 7nm yields were pretty well known - AMD had been using the process for a couple of years at that point. Were they expecting to have tons of faulty dice for every SKU? Was the plan all along for Navi 22 to be some quickly forgotten in-between thing that nobody really cares about? Seems like such a waste to me. Even just adding 8 CUs to Navi 22 would have made it a lot more flexible, allowing for a more competitive 6700 XT and a cheaper, lower-power 6700 if they had wanted it. Such a weird decision.

Cutting the large dice into ever smaller slices is a production-cost reduction, not an IPC- or performance-related exercise.
But it may have a slight background advantage of improving the qualities of the overall solution - my guess is no more than 5% overall.
I mean better thermals and improved management of the integrated parts.
It's only really a production cost reduction if it allows for small enough chips to noticeably increase wafer area utilization (i.e. reducing the area used on incomplete dice etc.) or if it improves yields (and at least TSMC 7nm yields are near perfect at this point). It won't meaningfully affect thermals or anything else, and of course there's always additional interconnect power with an MCM solution. The main advantage is optimization and splitting a large, complex die into smaller, simpler chiplets.
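To put rough numbers on the yield point, here's a quick back-of-the-envelope sketch using the classic Poisson defect-yield model. The defect density and die sizes are made up for illustration, not actual TSMC figures:

```python
import math

D0 = 0.09  # assumed defect density, defects per cm^2 (illustrative only)

def die_yield(area_mm2: float) -> float:
    """Fraction of defect-free dice under a simple Poisson model: Y = exp(-A*D0)."""
    return math.exp(-(area_mm2 / 100.0) * D0)  # convert mm^2 to cm^2

monolithic = 520.0        # hypothetical big die, mm^2
chiplet = monolithic / 2  # same logic split into two chiplets

print(f"{monolithic:.0f} mm^2 die yield: {die_yield(monolithic):.1%}")  # ~62.6%
print(f"{chiplet:.0f} mm^2 die yield:    {die_yield(chiplet):.1%}")     # ~79.1%
# A defect now only scraps half the silicon, and partially defective
# packages can still be binned, which is where any cost saving comes from.
```

With mature-node defect densities much lower than this assumed figure, the gap shrinks a lot, which is exactly the point above.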
The audio processors should be cut altogether; I don't understand why GPUs must include an audio device that costs transistors on the die.
Are you actually arguing for cutting basic functionality that GPUs have had for more than a decade? That's a bad idea. HDMI and DP both carry audio signals, and tons of people use audio outputs (either speakers or headphone connectors) on their monitors/TVs. And the transistor cost of audio processing is tiny compared to essentially everything else. This really isn't worth caring about, and cutting it would piss a lot of people off with minimal gains to show from it.
 

ARF

Joined
Jan 28, 2020
Messages
3,931 (2.55/day)
Location
Ex-usa
Are you actually arguing for cutting basic functionality that GPUs have had for more than a decade? That's a bad idea. HDMI and DP both carry audio signals, and tons of people use audio outputs (either speakers or headphone connectors) on their monitors/TVs. And the transistor cost of audio processing is tiny compared to essentially everything else. This really isn't worth caring about, and cutting it would piss a lot of people off with minimal gains to show from it.

It causes driver-related issues, and yes, I have never used the graphics-integrated audio. I have a normal audio processor integrated on the motherboard.
 
Joined
Dec 30, 2021
Messages
358 (0.43/day)
Cutting the large dice into ever smaller slices is a production-cost reduction, not an IPC- or performance-related exercise.
But it may have a slight background advantage of improving the qualities of the overall solution - my guess is no more than 5% overall.
I mean better thermals and improved management of the integrated parts.

The audio processors should be cut altogether; I don't understand why GPUs must include an audio device that costs transistors on the die.
Well, the point I'm making is that MCM allows them to do 800+ mm2 dies in a way that's more practical for them, and it could even allow them to do GPU sizes that were previously impossible. We probably won't see it for RDNA3, but don't be surprised if you see 1000+ mm2 GPUs eventually. Again, it's just a means for them to put more silicon on the card, which has its pros when it comes to power efficiency.
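A tiny sketch of the reticle argument, for what it's worth; the ~850 mm² single-exposure limit is the commonly cited ballpark, and all the die sizes below are hypothetical:

```python
RETICLE_LIMIT_MM2 = 850  # commonly cited single-exposure reticle limit

monolithic_design = 1100              # mm^2 -- not manufacturable as one die
chiplet_dice = [350, 350] + [38] * 6  # hypothetical compute + cache dies

print(monolithic_design <= RETICLE_LIMIT_MM2)             # False
print(all(d <= RETICLE_LIMIT_MM2 for d in chiplet_dice))  # True
print(f"total packaged silicon: {sum(chiplet_dice)} mm^2")  # 928 mm^2
```

The package only needs each individual die to fit under the reticle, so the total silicon can exceed what any monolithic design could ever reach.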

And please don't cut the audio. I actually use it! Almost everyone who HDMI-outs to a TV uses GPU audio.
 

ARF

Joined
Jan 28, 2020
Messages
3,931 (2.55/day)
Location
Ex-usa
Well, the point I'm making is that MCM allows them to do 800+ mm2 dies in a way that's more practical for them, and it could even allow them to do GPU sizes that were previously impossible. We probably won't see it for RDNA3, but don't be surprised if you see 1000+ mm2 GPUs eventually. Again, it's just a means for them to put more silicon on the card, which has its pros when it comes to power efficiency.

Yeah, AMD does not want to reach or even think about the absolute production limit - the reticle size. Nvidia, on the other hand, does it almost every generation.

And please don't cut the audio. I actually use it! Almost everyone who HDMI-outs to a TV uses GPU audio.

I have never succeeded in making it work - there is always silence.
 
Joined
Mar 21, 2016
Messages
2,197 (0.74/day)
Sorry, but what are these numbers you're working with? Are you inventing them out of thin air? And what's the relation between the different numbers? You also seem to be mixing power and performance? Remember, performance (clocks) and power do not scale linearly, and any interconnect will consume power. You're making this out to be far simpler than it is. Other than that all you're really saying here seems to be the age-old truism of wide and slow chips generally being more efficient. And, of course, you're completely ignoring the cost of using two dice to deliver the performance of one.

What? A 1/6th/16.67% area reduction from a node change will be 16.67% no matter how large your die, no matter how many of them you combine. A percentage/fractional reduction in area doesn't add up as you add parts together - that number is relative, not absolute.

It's absolutely possible that an MCM approach can allow for power savings, but only if it allows for larger total die sizes and lower clocks. Otherwise it's no different from a monolithic die, except for the added interconnect power. And, of course, larger dice are themselves a fundamental problem when per-transistor costs are no longer dropping noticeably, which is leading to rapidly rising chip prices.

Again, this isn't accurate. A GPU die has its heat very evenly spread across the entire die (unlike CPUs which are very concentrated), as most of the die is compute cores. Spreading this across two dice won't affect thermals much, as both dice will still be connected to the same cooler - it's not like you're running them independently of each other. Assuming the same power draw and area for a monolithic and MCM solution, the thermal difference between the two will be minimal. And, crucially, you want the distance between dice on package to be as small as possible to keep latencies low.

Fans generally run directly off 12V and don't rely on VRMs on the GPU, just a fan controller IC sending out PWM signals (unless the fans are for some reason controlled through voltage, which is rather unlikely).


Idk, I think the truth is somewhere in the middle. Both chips have distinct qualities and deficiencies. The 6800 is fantastically efficient; the 6700 XT gets a lot of performance out of a relatively small die. Now, the 6700 XT is indeed rather poor in terms of efficiency for an RDNA2 chip, but it still beats out the majority of Ampere GPUs, so ... meh. (The 6500XT is another matter entirely.)

I still can't wrap my head around AMD's RDNA2 segmentation though. The 16-32-40-80CU lineup just doesn't make sense IMO, and kind of forced them to tune the 6700XT the way they did. 20-32-48-80 or something like that would have made a lot more sense. It's also weird just how few SKUs Navi 22 has been used in overall.

The wattage was just an example figure to work around for illustrative purposes, to show how things tie together. It's irrelevant; AMD can scale things however they see fit, fairly linearly or non-linearly for any given aspect of the design, up to a point, barring legitimate limitations like silicon space.

I was thinking more along the lines of a 100 W chip split into two 50 W chips, with a 50% efficiency uplift on each. If you take a 50% efficiency uplift on each chip, you end up at 2x performance at the same wattage. There's approximately a 1/6 die-size reduction as well from 6 nm to 5 nm, so if that were linear - and it's not exactly that simple - you'd end up with each 50 W part consuming more like 41.66 W instead, or 83.34 W in total. That leaves about 16.66 W to account for in performance per watt.

You also have double the Infinity Cache, and we've seen measurable gains in efficiency through that. Plus AMD indicated architectural optimization improvements over RDNA2, whatever that implies. For gaming it's hard to say - probably some stuff to do with variable rate shading/upscaling/compression and the overall configuration balance between TMUs/ROPs and so on. You've also got memory performance, and efficiency uplift isn't standing still there either. Simply put, there are plenty of possible ways AMD can extract further performance and efficiency to reach that target figure.

Heat concentration will be worse in a square die versus that same die chopped in two and elongated into more of a rectangle, with the two parts in series spreading heat better without contending with hot spots. The whole GPU/CPU heat-concentration thing is entirely irrelevant to that. The point is that hot spots are easier to tackle by cutting the die in half and elongating the heat dispersion. It's like a bed of coals in a wood stove or campfire: if you spread them out, they burn out and cool down more quickly. In this instance you're spreading heat to the IHS and the cooler's mounting surface in turn.
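To be fair about what the split does and doesn't buy you, here's a trivial sketch of the average power density side of this (all numbers assumed): splitting a die at constant total power and area leaves the average W/mm² unchanged, so any thermal win has to come from hotspot spacing, not the split itself.

```python
total_power_w = 300.0   # assumed total GPU silicon power
total_area_mm2 = 520.0  # assumed total compute silicon area

mono = total_power_w / total_area_mm2
per_chiplet = (total_power_w / 2) / (total_area_mm2 / 2)

print(f"monolithic average:  {mono:.2f} W/mm^2")         # 0.58
print(f"per-chiplet average: {per_chiplet:.2f} W/mm^2")  # 0.58 -- identical
# The split itself doesn't lower average power density; any thermal win
# comes from spacing hotspots further apart under the same cooler.
```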

For power savings, a p-state could control one of the two dies to turn it off or place it into a deeper sleep mode below a given work threshold. Additionally, it could do something like Ethernet flow control and auto-negotiate round-robin deep sleep between the two GPU dies when load drops below thresholds. Basically, they can modulate between the two dies for power savings and better heat dispersion, negating throttling under heavier workloads and leading to more even sustained boost performance.
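Purely as a toy illustration of that round-robin idea - this reflects no real AMD power-management interface, just the hypothetical scheme as described:

```python
from itertools import cycle

SLEEP_THRESHOLD = 0.4        # assumed fraction of peak load
resting_die = cycle([0, 1])  # alternate which die sleeps next

def power_states(load: float) -> list:
    """Per-die state for a hypothetical two-die package at a given normalized load."""
    if load < SLEEP_THRESHOLD:
        asleep = next(resting_die)
        return ["deep-sleep" if i == asleep else "active" for i in (0, 1)]
    return ["active", "active"]

for load in (0.2, 0.3, 0.9):
    print(f"load {load:.0%}: {power_states(load)}")
# load 20%: ['deep-sleep', 'active']
# load 30%: ['active', 'deep-sleep']   <- round-robin spreads the heat
# load 90%: ['active', 'active']
```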

You naturally reach a saturation point at full load - like you would with water cooling and ambient temperature rise, or with any cooling really, in regard to cooling TDP effectiveness - but you get there more slowly and have more evenly distributed performance along the way.

In terms of fans, idk, but GPU Power Level goes up about 1% and Total Power Level goes up about 3% if I change my GPU's fan speed from 45% to 100%, according to NVIDIA Inspector's readings. I could've sworn it can even play a role in boost clocks in some instances. The power has to come from either the PCIe bus or the PCIe power connector in either case, and some of it will go towards warming the PCB due to the higher current.

I could've sworn there were instances with Maxwell where, if you lowered fan RPMs, you could sometimes end up with better boost clock performance than cranking the fans to 100%, contrary to what you'd expect. That was in regard to fairly strict power limits, though, in an undervolt-and-overclock scenario to extract better performance per watt. The overall gist of it was that fan power eats into the power budget available for boost within the imposed power limits.

In terms of the worse RDNA2 SKUs from a balance standpoint - where they are positioned performance- and cost-wise - the 6500 XT and 6650 XT are two of the worst. That unofficial 6700 non-XT was done for a reason: the 6650 XT is a pitiful uplift over the 6600 XT, so the 6700 better fills the gap up to the 6700 XT, not that it's a heck of a lot better. From a performance standpoint it does a better job of filling the gap; from a price one, I didn't check. The 6400 isn't so bad - the PCIe x4 matter is less problematic given its further cut-down design; it doesn't suffocate it quite the same way.
 
Joined
Dec 30, 2021
Messages
358 (0.43/day)
I was thinking more along the lines of a 100 W chip split into two 50 W chips, with a 50% efficiency uplift on each. If you take a 50% efficiency uplift on each chip, you end up at 2x performance at the same wattage.
You're adding percentages where you shouldn't be again. 50% efficiency gain is 50% efficiency gain no matter how the silicon is divided. It doesn't become a 2x perf/watt gain just because there are two chips now.
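A quick numeric check of this point, with arbitrary baseline numbers:

```python
uplift = 1.5                  # AMD's claimed >50% perf/W gain
perf_per_watt = 1.0 * uplift  # arbitrary baseline of 1.0 perf/W

one_chip = perf_per_watt * 100.0        # one 100 W chip
two_chips = 2 * (perf_per_watt * 50.0)  # two 50 W chips

print(one_chip, two_chips)  # 150.0 150.0 -- same 1.5x either way, not 2x
```

Perf/W is a ratio, so it's indifferent to how the watts are divided among dies.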
 

ARF

Joined
Jan 28, 2020
Messages
3,931 (2.55/day)
Location
Ex-usa
Have we seen the actual dice configuration for the Navi 31?
I guess it will be like this:

[attachment: guessed Navi 31 die-configuration diagram]
 
Joined
Apr 1, 2017
Messages
420 (0.16/day)
System Name The Cum Blaster
Processor R9 5900x
Motherboard Gigabyte X470 Aorus Gaming 7 Wifi
Cooling Alphacool Eisbaer LT360
Memory 4x8GB Crucial Ballistix @ 3800C16
Video Card(s) 7900 XTX Nitro+
Storage Lots
Display(s) 4k60hz, 4k144hz
Case Obsidian 750D Airflow Edition
Power Supply EVGA SuperNOVA G3 750W
b-but there's no way a 6800 XT would use less power than the 2080 ti AND be faster!
 
Joined
Mar 21, 2016
Messages
2,197 (0.74/day)
You're adding percentages where you shouldn't be again. 50% efficiency gain is 50% efficiency gain no matter how the silicon is divided. It doesn't become a 2x perf/watt gain just because there are two chips now.
Yeah, I'm confusing myself on the matter - brain-fart moment, I guess - trying to wrap my head around it in a way that makes sense. So it's a bit like GDDR6X PAM4 divided between two dies that can each double data transfer rates, each accessing half of the VRAM, with the I/O micro-managing it, in essence!?

Have we seen the actual dice configuration for the Navi 31?
I guess it will be like this:

[attachment: guessed Navi 31 die-configuration diagram, quoted from above]
I'd like to interject: based on that diagram above, the V6 Navi 31 should've had a V8!!?
 
Joined
Aug 21, 2015
Messages
1,664 (0.53/day)
Location
North Dakota
System Name Office
Processor Ryzen 5600G
Motherboard ASUS B450M-A II
Cooling be quiet! Shadow Rock LP
Memory 16GB Patriot Viper Steel DDR4-3200
Video Card(s) Gigabyte RX 5600 XT
Storage PNY CS1030 250GB, Crucial MX500 2TB
Display(s) Dell S2719DGF
Case Fractal Define 7 Compact
Power Supply EVGA 550 G3
Mouse Logitech M705 Marathon
Keyboard Logitech G410
Software Windows 10 Pro 22H2
I have never succeeded in making it work - there is always silence.

By contrast, GPU audio output works for me 9 times out of 10, all the way back to Fermi. Maybe it's a useless feature for you, but I like to be able to plug my PC into a TV and have sound come out with just an HDMI cable. Heck, one of my recent builds was having trouble with the onboard audio (drivers or hardware I never figured out), so I plugged my speakers into the monitor output instead. GPU audio saved my life!*

*not really
 
Joined
Feb 1, 2019
Messages
2,575 (1.35/day)
Location
UK, Leicester
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 3080 RTX FE 10G
Storage 1TB 980 PRO (OS, games), 2TB SN850X (games), 2TB DC P4600 (work), 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Asus Xonar D2X
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
Navi 31 will break the 1000 GTexels/s (1 TTexel/s) Texture Fillrate barrier.

[attachment: projected Navi 31 texture fillrate chart]
Damn, that looks so sweet man - I'd rather have 16 gigs of VRAM any day of the week instead of RT cores I will never use. Shame AMD don't have the equivalent of the FE cards in the UK at MSRP, else I would have switched over to the red team.
 
Joined
Mar 4, 2022
Messages
31 (0.04/day)
With the ever-increasing difficulty of sourcing new, power-efficient nodes, this is very hard to believe. I'd love for it to be true, but I'm going to keep my expectations in check until independent reviews come out.
It was true for RDNA1 over GCN and for RDNA2 over RDNA1... all on the same 7nm node...
They did it twice over the last 3 years; I don't get what is hard to believe.
 
Joined
Mar 21, 2016
Messages
2,197 (0.74/day)
Not only that, but people had similar takes on Zen 1 to Zen 2 and Zen 2 to Zen 3 - you saw plenty of similar rhetoric. Some people are just waiting to see AMD falter so they can jump all over them, or simply have low optimism.
 
Joined
Apr 21, 2005
Messages
170 (0.02/day)
Not only that, but people had similar takes on Zen 1 to Zen 2 and Zen 2 to Zen 3 - you saw plenty of similar rhetoric. Some people are just waiting to see AMD falter so they can jump all over them, or simply have low optimism.

AMD have been pretty reliable with their performance estimates and comparisons, to be fair to them. They did it with the 5800X3D: they could have easily pointed at the gains in ACC, MSFS, iRacing, or Stellaris and made some pretty ridiculous claims, but they didn't. They kept to the sort of gaming suite reviewers use and said the gain over a 5900X was around 15%, which has now been verified by several independent reviewers.

AMD don't seem to excessively puff up their numbers, so I see no reason why this >50% perf/W claim should not be believed.
 
Joined
Mar 21, 2016
Messages
2,197 (0.74/day)
What would happen if AMD did something akin to big.LITTLE on the GPU side, combined with the I/O die, mixing N5 and N5P? I can't help speculating a bit. I don't believe they've done so, but could the I/O die reserve the extra hardware performance for, say, hardware-accelerated FSR? Also, given that N5P can offer +5% performance or -10% power, could one chip be tuned one way and the other the opposite!!? Sort of an interesting hypothetical thought. Similarly, what about differentiating something like FP half precision and double precision between two chips?

It seems like there are a lot of tangible ways they might extract a bit more performance or efficiency, depending on how the I/O die can sync and leverage it. It's basically a dedicated sequencer for how the GPU operates, with low-latency chips closely connected to it.
 

Mussels

Freshwater Moderator
Staff member
Joined
Oct 6, 2004
Messages
58,413 (8.19/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage 2TB WD SN850 NVME + 1TB Sasmsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
Two 100 W chips: reduce performance per chip by 50%, add them together - 50% + 50% = 100% - and you're still at 100 W with double the performance. Of course, it begs the question: how can they get there? Well, a node shrink. I'm not sure how much that accounts for on the die, but there could be a bit higher performance per watt on the DRAM side as well. Still, 50% seems like a lot, but then you've always got Infinity Cache, and it makes a huge difference in performance per watt, especially since it benefits from compression as well - and perhaps the I/O die controlling the two MCM chips handles a bit of decompression, intelligently, at the same time.

The node reduction itself from 6 nm down to 5 nm is what, 1/6? Across two dies that works out to 1/3. They also shuffle logic to the I/O die - I'm not sure how much that occupies offhand, but say it bumps you up to 40% more silicon space; with that space, that's pretty good. The other good aspect is that heat is spread out more between two dies, which is better than one die the size of both condensed in one spot. It's much better for the heat load to be spread apart and radiate more into the cooler. That even reduces stress on the VRMs that have to power the fans for the GPU. Something interesting: if AIBs were to ever put a fan header on the side that could be plugged into a system header instead, that would shift stress to the motherboard VRMs and off the GPU's VRMs, given the fans can consume a few watts.

It seems pretty reasonable and plausible. Let's not forget there could be a bit more room to increase the die size to make room for more of that cache over the previous dies. In fact, even not taking that into account, if the cache is on the die and you pair two, you double the cache. This isn't SLI/CF either; plus it's got a dedicated I/O die as well. Just moving logic to the I/O die will free up silicon space on the compute die. It might not be 50% in all instances, but "up to" in the right scenario I can see. Lastly, FSR is another metric in all of this and gives an uplift in efficiency per watt. You can certainly argue it's important to consider the performance-per-watt context a company - be it AMD/Intel/Nvidia or others - is talking about.

I'm going to go out on a limb on this one and say it could be 50% performance per watt or greater across the entire RDNA3 product segment under the right circumstances. You also have to consider, along with all the other parts mentioned, that voltage is squared, and smaller dies running at lower wattage require lower voltage, increasing efficiency per watt as a whole. So I'm pretty certain this can be very realistic. I'm not going to say I'm 100% sure about 50% performance per watt across the entire SKU lineup, but you can argue AMD hints at it without explicitly going into detail. AMD neither indicates nor discredits that it's for a particular RDNA3 SKU, but rather lists RDNA3, which could be either - though it could be subtly pointing out that it's across the product lineup, or at least the initial launch lineup.
You aren't great at math, friend

I have never succeeded in making it work - there is always silence.
With the exception of old cards like the 8800 GT that had an S/PDIF port, all modern GPUs just have simple built-in audio. There is nothing to do except install your drivers and change the default audio device in Windows.
 
Joined
Dec 30, 2010
Messages
2,087 (0.43/day)
They're already doing this with CDNA-based cards; I'd say there's an even chance they'll do so with consumer cards, especially if Nvidia releases 500~600 W monstrosity chips! No way AMD matches them with just 400~500 W, even if they lead in perf/W at the high end.


It is possible. Nvidia's approach might be fast/low latency, but at the expense of quite some power. AMD's approach is basically four tiny "GPUs" as hardware crossfire linked together, paired with Infinity Cache and HBM. The advantage for AMD is higher yields and scalability, but more importantly, it's more efficient than throwing in one big core. The downside would be latency; however, Infinity Cache can tackle that. It works amazingly well for RDNA, cutting the cost of, for example, larger GDDR bus widths.
 
Joined
Jun 6, 2022
Messages
621 (0.91/day)
System Name Common 1/ Common 2/ Gaming
Processor i5-10500/ i5-13500/ i7-14700KF
Motherboard Z490 UD/ B660M DS3H/ Z690 Gaming X
Cooling TDP: 135W/ 200W/ AIO
Memory 16GB/ 16GB/ 32GB
Video Card(s) GTX 1650/ UHD 770/ RTX 3070 Ti
Storage ~12TB inside + 6TB external.
Display(s) 1080p@75Hz/ 1080p@75Hz/ 1080p@165Hz+4K@60Hz
Case Budget/ Mini/ AQIRYS Aquilla White
Audio Device(s) Razer/ Xonar U7 MKII/ Creative Audigy Rx
Power Supply Cougar 450W Bronze/ Corsair 450W Bronze/ Seasonic 650W Gold
Mouse Razer/ A4Tech/ Razer
Keyboard Razer/ Microsoft/ Razer
Software W10/ W11/ W11
Benchmark Scores For my home target: all ok
My thoughts exactly. It's a funny turnaround to see AMD taking the lead on perf/W when they were so far behind Nvidia for so many years, but they knocked it out of the park with RDNA2, so I'm inclined to be optimistic towards this.

Today
AMD: TSMC 7nm
nVidia: Samsung 8nm

Next
AMD and nVidia: TSMC 5nm

I expect nVidia to keep the advantage in RT and consolidate its advantage with DLSS.
Remember that an nVidia card also supports FSR. The two technologies practically keep the old Turing alive.
 

Tremdog

New Member
Joined
Jun 11, 2022
Messages
1 (0.00/day)
I'm starting to believe that perf/watt is a dead end in the graphics and CPU industries. It no longer satisfies me when companies tout it, and obviously the growing power consumption has a lot to do with that. I'm looking forward to the new tech, but if the power consumption is through the roof, I will literally skip buying and investing in graphics cards - and CPUs, for that matter.


Where do you get a 2x performance increase over RDNA2? AMD said a 50% increase.
AMD said a 50% increase per watt. So to get twice the performance, they'd increase the number of stream processors as well as clock speeds.

Interesting - if the rumored 2x performance increase over RDNA2 is true, the top GPU would need to use around 500 W.
The RTX 3090 Ti uses 450 watts, and the 4000 series is rumoured to chug significantly more than the 3000 series.
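A rough sanity check on the wattage math here; the baseline power is an assumption (roughly a 6900 XT/6950 XT class card):

```python
baseline_power_w = 335.0  # assumed RDNA2 flagship board power
perf_target = 2.0         # rumored 2x performance over RDNA2
perf_per_watt_gain = 1.5  # AMD's >50% claim

# power = performance / (perf per watt), both relative to baseline
required_w = baseline_power_w * perf_target / perf_per_watt_gain
print(f"~{required_w:.0f} W")  # ~447 W -- roughly the 450-500 W ballpark
```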
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
By contrast, GPU audio output works for me 9 times out of 10, all the way back to Fermi. Maybe it's a useless feature for you, but I like to be able to plug my PC into a TV and have sound come out with just an HDMI cable. Heck, one of my recent builds was having trouble with the onboard audio (drivers or hardware I never figured out), so I plugged my speakers into the monitor output instead. GPU audio saved my life!*

*not really
Yeah, I've literally never heard of GPU audio not working. Like... how would it not work?

I have never succeeded in making it work - there is always silence.
Does your monitor have speakers? Have they been turned all the way down? There is no reason why this shouldn't work.

Damn, that looks so sweet man - I'd rather have 16 gigs of VRAM any day of the week instead of RT cores I will never use. Shame AMD don't have the equivalent of the FE cards in the UK at MSRP, else I would have switched over to the red team.
Doesn't the AMD.com 'shop' site work in the UK?
What would happen if AMD did something akin to big.LITTLE on the GPU side, combined with the I/O die, mixing N5 and N5P? I can't help speculating a bit. I don't believe they've done so, but could the I/O die reserve the extra hardware performance for, say, hardware-accelerated FSR? Also, given that N5P can offer +5% performance or -10% power, could one chip be tuned one way and the other the opposite!!? Sort of an interesting hypothetical thought. Similarly, what about differentiating something like FP half precision and double precision between two chips?

It seems like there are a lot of tangible ways they might extract a bit more performance or efficiency, depending on how the I/O die can sync and leverage it. It's basically a dedicated sequencer for how the GPU operates, with low-latency chips closely connected to it.
What would "akin to big.little" even mean in a GPU, where all the cores are incredibly tiny? Remember, that +5%perf/-10%power is mainly down to clock tuning, and can be achieved on any gpu through the boost algorithm.

As for most of the rest of what you're saying here, it would be incredibly expensive (taping out two different dice, possibly using different libraries) for very little benefit. Double precision FP doesn't matter for consumers.
Today
AMD: TSMC 7nm
nVidia: Samsung 8nm

Next
AMD and nVidia: TSMC 5nm

I expect nVidia to keep the advantage in RT and consolidate its advantage with DLSS.
Remember that an nVidia card also supports FSR. The two technologies practically keep the old Turing alive.
Sure, they have a node advantage, but what you're doing here is essentially hand-waving away the massive efficiency improvements AMD has delivered over the past few generations. Remember, back when both were on 14/16nm, Nvidia was miles ahead, with a significant architectural efficiency lead. Now, AMD has a minor node advantage but also crucially has caught up in terms of architectural efficiency - TSMC 7nm is better than Samsung 8nm, but not by that much. And they're then promising another >50% perf/W increase on top of that, while Nvidia is uncharacteristically quiet.
 
Joined
Apr 30, 2011
Messages
2,651 (0.56/day)
Location
Greece
Processor AMD Ryzen 5 5600@80W
Motherboard MSI B550 Tomahawk
Cooling ZALMAN CNPS9X OPTIMA
Memory 2*8GB PATRIOT PVS416G400C9K@3733MT_C16
Video Card(s) Sapphire Radeon RX 6750 XT Pulse 12GB
Storage Sandisk SSD 128GB, Kingston A2000 NVMe 1TB, Samsung F1 1TB, WD Black 10TB
Display(s) AOC 27G2U/BK IPS 144Hz
Case SHARKOON M25-W 7.1 BLACK
Audio Device(s) Realtek 7.1 onboard
Power Supply Seasonic Core GC 500W
Mouse Sharkoon SHARK Force Black
Keyboard Trust GXT280
Software Win 7 Ultimate 64bit/Win 10 pro 64bit/Manjaro Linux
RDNA3's MCM design for sure means that the top GPUs (Navi 31 & 32) will have two GPU core chiplets and some more dies for I/O and anything else needed. "Chiplet" means a die with compute cores, as the Zen arch has shown since Zen 2 - the I/O chip is called a die, not a chiplet. And chiplets worked great for CPUs, for both cost/yields and binning, so with the +50% perf/W, Navi 31 at 5 nm could double theoretical performance over Navi 21 and consume 400-450 W. Meanwhile, Nvidia at 4 nm will consume at least 550-600 W to double theoretical performance over the 3090, and cost more.
 
Joined
Jan 14, 2019
Messages
9,819 (5.11/day)
Location
Midlands, UK
System Name Nebulon-B Mk. 4
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance EXPO DDR5-6000
Video Card(s) Sapphire Pulse Radeon RX 7800 XT
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2, 4 + 8 TB Seagate Barracuda 3.5"
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Windows 10 Pro
Benchmark Scores Cinebench R23 single-core: 1,800, multi-core: 18,000. Superposition 1080p Extreme: 9,900.
It causes driver-related issues, and yes, I have never used the graphics-integrated audio. I have a normal audio processor integrated on the motherboard.
What kind of driver related issues does it cause? I've never had any problem using the GPU's audio. In fact, I find it quite useful, not just for TVs - there's one less cable in the jungle at the back of the PC.
 
Joined
Oct 27, 2020
Messages
788 (0.62/day)
The 50% perf/W claim isn't something new; both Rick Bergman (interview) and David Wang (presentation) mentioned it in the past:

https://www.notebookcheck.net/AMD-i...att-over-the-new-RX-6000-series.503412.0.html

Although we still don't know many details (chiplets, GPU & memory frequencies, etc.) - so I will update the below sometime in the future when more info becomes known - my current hypothesis is the below:

[attachment: RDNA3 lineup hypothesis table]

I meant Navi32-8192 instead of Navi31-8192
 

ARF

Joined
Jan 28, 2020
Messages
3,931 (2.55/day)
Location
Ex-usa
What kind of driver related issues does it cause? I've never had any problem using the GPU's audio. In fact, I find it quite useful, not just for TVs - there's one less cable in the jungle at the back of the PC.

A mess - usually when I uninstall the Radeon Software, it deletes the Dolby Audio software panel from my tray as well, and I need to reinstall the entire Realtek High Definition Audio Driver package again.

Does your monitor have speakers? Have they been turned all the way down? There is no reason why this shouldn't work.

No, it doesn't.
It's when I try to connect the notebook to the UHD 4K TV using the HDMI cable - there is never sound via that cable.

[attachment: RDNA3 lineup hypothesis table, quoted from above]
I meant Navi32-8192 instead of Navi31-8192

Yes, for a cut-down Navi 31 that would be terrible performance, making it completely pointless to launch such a version.

Fixed it for you:

[attachment: edited version of the RDNA3 lineup hypothesis table]
 