Tuesday, October 20th 2020

AMD Radeon RX 6000 Series "Big Navi" GPU Features 320 W TGP, 16 Gbps GDDR6 Memory

AMD is preparing to launch its Radeon RX 6000 series of graphics cards codenamed "Big Navi", and it seems like we are getting more and more leaks about the upcoming cards. Set for an October 28th launch, the Big Navi GPU is based on the Navi 21 silicon, which comes in two variants. Thanks to his sources, Igor Wallossek of Igor's Lab has published a handful of details about the upcoming graphics card release. More specifically, there are more details about the Total Graphics Power (TGP) of the cards and how it is distributed across the board (pun intended). To clarify, TDP (Thermal Design Power) is a measurement that applies only to the GPU chip, or die, and how much thermal headroom it has; it doesn't cover the power of the whole card, as there are more heat-producing components on the board.

The breakdown for the Navi 21 XT graphics card goes as follows: 235 Watts for the GPU alone, 20 Watts for Samsung's 16 Gbps GDDR6 memory, 35 Watts for voltage regulation (MOSFETs, inductors, capacitors), 15 Watts for fans and other components, and 15 Watts lost in the PCB. This puts the combined TGP at 320 Watts, showing just how much power is used by the non-GPU elements. For custom OC AIB cards, the TGP is boosted to 355 Watts, with the GPU alone drawing 270 Watts. When it comes to the Navi 21 XL GPU variant, cards based on it use 290 Watts of TGP, as the GPU sees a reduction to 203 Watts and the GDDR6 memory uses 17 Watts. The non-GPU components found on the board use the same amount of power.
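For a quick sanity check of those figures, here is the arithmetic spelled out as a small sketch using the leaked numbers above (the variable names are our own, for illustration only):

/* Sanity check of the leaked Navi 21 XT power budget (all values in Watts). */
#include <stdio.h>

int main(void)
{
    int gpu = 235, gddr6 = 20, vrm = 35, fans = 15, pcb = 15;
    int tgp_reference = gpu + gddr6 + vrm + fans + pcb;   /* 320 W reference TGP */
    int tgp_aib_oc    = 270 + gddr6 + vrm + fans + pcb;   /* 355 W with the 270 W OC GPU budget */
    printf("Reference TGP: %d W, AIB OC TGP: %d W\n", tgp_reference, tgp_aib_oc);
    return 0;
}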
When it comes to the selection of memory, AMD uses Samsung's 16 Gbps GDDR6 modules (K4ZAF325BM-HC16). The bundle AMD ships to its AIBs contains 16 GB of this memory paired with the GPU core; however, AIBs are free to use different memory if they want to, as long as it is a 16 Gbps module. You can check the tables below for a breakdown of the TGP of each card.
Sources: Igor's Lab, via VideoCardz

153 Comments on AMD Radeon RX 6000 Series "Big Navi" GPU Features 320 W TGP, 16 Gbps GDDR6 Memory

#101
TheoneandonlyMrK
mtcn77There is also the semi-permanent vector operations(vector packed scalars, afaik) which are all the buzz.

Frontend and backend are different. The gpu has to decode first, then shaders run them. For the initial period, shaders don't do much. The graphics command processor & workload managers(4 as per each rasterizer) download instructions that shaders will use up.
Wouldn't there be a flow through the shaders while the decoders work on the next batch and the batch before is returned to memory, except on startup?
I thought GPUs were made to stream data in and out, not do one job at a time.
The command processor and scheduling keep the flow going.
Posted on Reply
#102
dragontamer5788
mtcn77There is also the semi-permanent vector operations(vector packed scalars, afaik) which are all the buzz.
Those are just vector ops from the perspective of the assembly language.
mtcn77Frontend and backend are different. The gpu has to decode first, then shaders run them. For the initial period, shaders don't do much. The graphics command processor & workload managers(4 as per each rasterizer) download instructions that shaders will use up.
What I'm talking about is in the compute units themselves. See page 12: developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf



The sALU processes scalar instructions (loops, branching, booleans), and the sGPRs hold primarily booleans, but also function pointers, the call stack, and things of that nature.

The vALUs process vector instructions, which include those "packed" instructions. If we wanted to get more specific, there are also LDS, load/store, and DPP instructions going to different units. But by and large, the two instruction classes that make up the majority of AMD GPU code are vector and scalar.

You're right in that the fixed-function pipeline (not shown in the above diagram), in particular rasterization ("ROPs"), constitutes a significant portion of the modern GPU. But you can see that the command processor is very far away from the vALUs / sALUs inside of the compute units.
theoneandonlymrkWouldn't there be a flow through the shaders while the decoders work on the next batch and the batch before is returned to memory, except on startup?
I thought GPUs were made to stream data in and out, not do one job at a time.
The command processor and scheduling keep the flow going.
AMD's command processors are poorly documented. I can't find anything that describes their operation very well. (Well... I could read the ROCm source code, but I'm not THAT curious...)

But from my understanding: the command processor simply launches wavefronts. That is: it sets up the initial sGPRs for a workgroup (x, y, and z coordinate of the block), as well as VGPR0, VGPR1, and VGPR2 (for the x, y, and z coordinate of the thread). Additional parameters go into sGPRs (shared between all threads). Then, it issues a jump (or function call) command pointing the compute unit to a location in memory. AMD command processors have a significant amount of hardware scheduling logic for events and ordering of wavefronts: priorities and the like.

But the shader has already been converted into machine code by the OpenCL or Vulkan or DirectX driver, and loaded somewhere. The command processor only has to set up the parameters and issue a jump command to get a compute unit to that code (once all synchronization functions, such as OpenCL Events, have proven that this particular wavefront is ready to run).
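To put that hand-off in concrete terms, here is a minimal OpenCL C kernel (our own illustration, not from AMD documentation or the post above; the saxpy kernel is an assumed example): the work-group and work-item coordinates the kernel reads back are exactly the values described above as being pre-loaded into sGPRs and VGPR0-2 by the command processor before the jump.

// Minimal OpenCL C kernel, for illustration only. On GCN-style hardware,
// get_group_id() comes from sGPRs initialized per workgroup by the command
// processor, while get_local_id() maps to VGPR0/1/2, set per thread before
// the wavefront starts executing this already-compiled machine code.
__kernel void saxpy(const float a,
                    __global const float *x,
                    __global float *y)
{
    size_t gid = get_global_id(0);   // derived from group id (sGPR) and local id (VGPR)
    y[gid] = a * x[gid] + y[gid];    // the actual shader work: one read-modify-write per thread
}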
Posted on Reply
#103
Cheeseball
Not a Potato
Chrispy_I'll be trying out the RDNA2 cards for the exact same reason as you. 250W max in my HTPC but the reason I'm back to Nvidia in the HTPC at the moment is the AMD HDMI audio driver cutting out with Navi cards. Didn't happen when I swapped to an RX480 or a 2060S, but when I tried a vanilla 5700 the exact same bug reappeared. A microsoft update was the trigger but AMD haven't put out a fix yet and after 3 months I got bored of watching the thread of people complaining on AMD's forum get longer without acknowledgement and moved on.
The new 20.10.1 driver seems to address the HDMI audio issue with AV receivers. I have not tested this on the RX 5700 XT and Onkyo yet.
Posted on Reply
#104
Zach_01
RedelZaVednoPerformance per watt did go up on Ampere, but that's to be expected given that Nvidia moved from TSMCs 12nm to Samsung’s 8nm 8LPP, a 10nm extension node. What is not impressive is only 10% performance per watt increase over Turing while being build on 25% denser node. RDNA2 arch being on 7 nm+ looks to be even worse efficiency wise given that density of 7nm+ is much higher, but let's wait for the actual benchmarks.
You can't actually compare nodes nor estimate performance/W gains by node shrink alone. Look at Zen 3: on the exact same node as Zen 2, it has 20% better perf/W just from architectural improvements. Don't confuse this with higher IPC at the same clock speed. If you increase IPC alone without perf/W improvements, the power consumption goes up. It's physics. It's not only clock speed that draws power.

RDNA2 is on a better 7 nm node than RDNA1 (the 7NP DUV node, not 7 nm+ EUV) that (by rumors) offers 10-15% higher density, and combined with the improvements in the RDNA2 architecture it is "said" to have +50% better perf/W.

If true, where exactly that places the 6900 against Ampere is yet to be seen.
EarthDogIt seems like Ampere so far is and the 5700XT... clocked out of their efficiency curves... and it seems the same with RDNA2 with the rumors so far...I don't think any of the AMD fanatics saw similar power envelopes coming (they are awfully quiet here... go figure) and here we are.
I was expecting it... the 300~320 W TBP. It couldn't be anything else in order to offer similar 3080 performance. Fewer watts didn't add up, and why shouldn't AMD use all the watts up to Ampere's level? Again, my thoughts.

—————————————

Personally I don't care about a GPU drawing 350 or 400 W. I used to have an R9 390X OC model with a 2.5-slot cooler and it was just fine. That was rated at 375 W TBP.
The 5700 XT now offers more than 2x the performance with 240 W peaks and 220 W average power draw.

Every flagship GPU is set to work (when maxed) out of the efficiency curve. Unless there is no competition.

AMD drivers, apart from the power and performance sliders, also offer the "Chill" function. You can set a min/max FPS target. In most games, if I use this feature to cap FPS at a min/max of 40/60, the average draw of the card is less than 100 W (roughly sketched below).
My monitor is 60 Hz, and that's the target while there is movement. If you stop moving in game, the FPS drops to 40. I can set it to 60/60 if I like.
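As a rough sketch of what an FPS cap does mechanically (an application-level analogy only, not how the driver-side Chill feature is actually implemented), the renderer simply sleeps away the rest of each frame interval, and the GPU idles instead of drawing more frames:

/* Crude 60 FPS frame limiter (POSIX clock), as an analogy for an FPS cap. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    const long frame_ns = 1000000000L / 60;              /* ~16.67 ms per frame */
    struct timespec start, end;

    for (int frame = 0; frame < 5; frame++) {
        clock_gettime(CLOCK_MONOTONIC, &start);

        /* render_frame() would go here; the GPU only does real work in this slot */

        clock_gettime(CLOCK_MONOTONIC, &end);
        long elapsed = (end.tv_sec - start.tv_sec) * 1000000000L
                     + (end.tv_nsec - start.tv_nsec);
        if (elapsed < frame_ns) {
            struct timespec pause = { 0, frame_ns - elapsed };
            nanosleep(&pause, NULL);                      /* GPU sits idle for the remainder */
        }
        printf("frame %d paced to ~16.7 ms\n", frame);
    }
    return 0;
}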
My monitor is a 13.5-year-old 1920x1200 16:10 panel, and I was planning to switch to an ultrawide 6 months ago, but the human malware changed that, along with other aspects of my (our) life (lives).

There is no point for me to complain about the amount of power GPUs are drawing. Buy a lower-tier model. And perf/W is a continuously improving matter. We just can't use flagship models as examples, sometimes.
Posted on Reply
#105
dragontamer5788
CheeseballThe new 20.10.1 driver seems to address the HDMI audio issue with AV receivers. I have not tested this on the RX 5700 XT and Onkyo yet.
My real issue with the RX 5700 XT series is the lack of ROCm support.

Unofficial ROCm support for NAVI is beginning to happen in ROCm 3.7 (released in August 2020). But there's been over a year where compute fans were unable to use ROCm at all on NAVI. To be fair, AMD never promised ROCm support on all of their cards, but it really takes the wind out of people's sails when they're unable to "play" with their cards. Even older cards like the RX 550 never really got ROCm support (only the RX 580 got official support).

For now, my recommendation for AMD GPU-compute fans is to read the documentation carefully before buying. Wait for a Radeon Machine Intelligence card, like the MI25 (aka Vega 64), to come out before buying that model. AMD ROCm is clearly aimed at their MI platform and not really their consumer cards. The MI6 (aka RX 580) and MI8 (aka R9 Fury) have good support, but not necessarily other cards.

---------

ROCm suddenly getting support for NAVI in 3.7 suggests that this new NAVI 2x series might have an MI card in the works, and therefore might be compatible with ROCm.
Posted on Reply
#106
AusWolf
theoneandonlymrkIt's a concern for me, UK power isn't cheap, having said that ,as you say power use depends on load, and few cards use flat out power much of the day, even folding at home or mining doesn't Max a cards power use in reality.
Still, Some game's are going to cook people while gaming, warm winter perhaps, hopefully that looto tickets not as shit as all my last one's.
I just made my calculations a few posts above yours. If you pay 20p per kWh, then 2 hours of gaming (or folding, or whatever) every day on a computer that eats 300 W more than the one you currently own will increase your bills by £3.65 a month! If you fold 24/7, fair enough, but other than that, I wouldn't worry too much.
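For reference, the arithmetic behind that figure, written out as a small sketch with the same assumptions (300 W extra, 2 hours per day, 20p per kWh; the variable names are our own):

/* Quick estimate of the extra monthly electricity cost from a 300 W heavier PC. */
#include <stdio.h>

int main(void)
{
    double extra_kw       = 0.300;   /* 300 W more than the current machine */
    double hours_per_day  = 2.0;     /* gaming / folding time per day       */
    double days_per_month = 30.4;    /* average month length                */
    double price_per_kwh  = 0.20;    /* 20p per kWh                         */

    double cost = extra_kw * hours_per_day * days_per_month * price_per_kwh;
    printf("Extra cost: ~%.2f GBP per month\n", cost);   /* ~3.65 */
    return 0;
}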
Posted on Reply
#107
Vayra86
AusWolfI just made my calculations a few posts above yours. If you pay 20p per kWh, then 2 hours of gaming (or folding, or whatever) every day with a computer that eats 300 W more than the one you currently own will increase your bills by £3.65 a month! If you fold 24/7, fair enough, but other than that, I wouldn't worry too much.
£3.50 is a few pints in the pub you can't go to. Definitely worth considering, I say.
Posted on Reply
#108
TheoneandonlyMrK
dragontamer5788Those are just vector ops from the perspective of the assembly language.



What I'm talking about is in the compute units themselves. See page 12: developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf



The sALU processes scalar instructions (loops, branching, booleans), and the sGPRs hold primarily booleans, but also function pointers, the call stack, and things of that nature.

The vALUs process vector instructions, which include those "packed" instructions. If we wanted to get more specific, there are also LDS, load/store, and DPP instructions going to different units. But by and large, the two instruction classes that make up the majority of AMD GPU code are vector and scalar.

You're right in that the fixed-function pipeline (not shown in the above diagram), in particular rasterization ("ROPs"), constitutes a significant portion of the modern GPU. But you can see that the command processor is very far away from the vALUs / sALUs inside of the compute units.



AMD's command processors are poorly documented. I can't find anything that describes their operation very well. (Well... I could read the ROCm source code, but I'm not THAT curious...)

But from my understanding: the command processor simply launches wavefronts. That is: it sets up the initial sGPRs for a workgroup (x, y, and z coordinate of the block), as well as VGPR0, VGPR1, and VGPR2 (for the x, y, and z coordinate of the thread). Additional parameters go into sGPRs (shared between all threads). Then, it issues a jump (or function call) command pointing the compute unit to a location in memory. AMD command processors have a significant amount of hardware scheduling logic for events and ordering of wavefronts: priorities and the like.

But the shader has already been converted into machine code by the OpenCL or Vulkan or DirectX driver, and loaded somewhere. The command processor only has to set up the parameters and issue a jump command to get a compute unit to that code (once all synchronization functions, such as OpenCL Events, have proven that this particular wavefront is ready to run).
Sooo, work Does flow through then? Lol Ty.
Posted on Reply
#109
thesmokingman
TurmaniaDoes anybody not care about electricity bills anymore, or most not having responsibilty to pay the bills? Who would buy these cards?
That's ironic...
TurmaniaIt is expected and rightly deserved by AMD, I expect this tike around both their CPU & GPU are both fully matured and we wont see the bios and software issues that happened last year at least not in that scale. However, when it comes to power consumption i do not believe the 65w. It will consume just as much as i5 10600k.
The same ppl who buy 10600K cpus who think they only use 125w?? The same ppl who think they're saving the world by consuming less power but are actually running at PL2 all the time thus consuming way more power?
Posted on Reply
#110
TheoneandonlyMrK
AusWolfI just made my calculations a few posts above yours. If you pay 20p per kWh, then 2 hours of gaming (or folding, or whatever) every day on a computer that eats 300 W more than the one you currently own will increase your bills by £3.65 a month! If you fold 24/7, fair enough, but other than that, I wouldn't worry too much.
Technically I don't directly pay the bill ;) she does ;) damn electric company :D.
Posted on Reply
#111
Tomgang
320 watts is a lot, whether it's Nvidia or AMD. But there are those who might not want their card to consume 320+ watts all the time, and I am one of them.

There are ways to keep consumption down. You can limit FPS in some games, you can activate V-Sync so you stay at 60 FPS and keep GPU load down, or you can download, for example, MSI Afterburner. There is a little slider called power target. With that you can limit the maximum power the card is allowed to use. From what I know, the RTX 3080 can be limited all the way down to only 100 watts. Undervolting can also save you some watts; again, it seems the RTX 3080 can be good for up to a 100-watt saving just by limiting the maximum voltage to the GPU, without too much performance loss. I have used the power target slider for years to dial in a fitting power consumption.

I am not expecting to get an RDNA2-based card. But I do hope AMD can still provide a good amount of resistance to the RTX 3080, because we all know competition is good for consumer pricing.
Posted on Reply
#112
mtcn77
dragontamer5788But you can see that the command-processor is very far away from the vALUs / sALUs inside of the compute units.
Well, far or near, on the time scale they are placed consecutively; one precedes the other, which puts the pressure on the GCN frontend.
dragontamer5788vALUs process vector instructions, which include those "packed" instructions.
The semi-persistent stuff is scalar-timed vector ops, which save on critical timing. It consumes vector memory in a scalar fashion, which saves on decode latency since it follows the developer's instruction and allows for lane intrinsics and full memory utilization.
dragontamer5788But from my understanding: the command processor simply launches wavefronts.
Yes. There are 2560 wavefronts in a CU and there are 64 CUs per command processor. It takes 64 cycles for each CU to get one workgroup issued, and thereafter 64 cycles for every wave per CU. It takes a lot of time until the shaders are fully operational.
dragontamer5788The command processor only has to issue a jump command to get a compute unit to that code.
Posted on Reply
#113
EarthDog
dragontamer5788You pay for a bigger and wider GPU. Effectively: you're paying for the silicon (as well as the size of a successful die. The larger the die, the harder it is to produce and naturally, the more expensive it is).

Whether you run it at maximum power, or minimum power, is up to you. Laptop chips, such as the Laptop RTX 2070 Super, are effectively underclocked versions of the desktop chip. The same thing, just running at lower power (and greater energy efficiency) for portability reasons. Similarly, a mini-PC user may have a harder time cooling down their computer, or maybe a silent-build wants to reduce the fan noise.

A wider GPU (ex: 3090) will still provide more power-efficiency than a narrower GPU (ex: 3070), even if you downclock a 3090 to 3080 or 3070 levels. More performance at the same levels of power, that's the main benefit of "more silicon".

--------

Power consumption is something like the voltage-cubed (!!!). If you reduce voltage by 10%, you get something like 30% less power draw. Dropping 10% of your voltage causes a 10% loss of frequency, but you drop in power-usage by a far greater number.
We can think of situations where it could be worthwhile. I'm not talking about shoehorning these things into tiny boxes, etc. We can all think of exceptions. ;)
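As a back-of-the-envelope illustration of the voltage-cubed rule quoted above (a sketch assuming power scales with frequency times voltage squared, and that frequency tracks voltage roughly one-to-one):

/* Rough check of the "power scales like voltage cubed" rule of thumb. */
#include <stdio.h>

int main(void)
{
    double v_scale = 0.90;                            /* a 10% undervolt                 */
    double f_scale = v_scale;                         /* assume frequency tracks voltage */
    double p_scale = f_scale * v_scale * v_scale;     /* P ~ f * V^2, so roughly V^3     */

    printf("Relative power at 90%% voltage: %.2f (about %.0f%% less)\n",
           p_scale, (1.0 - p_scale) * 100.0);         /* prints ~0.73, about 27% less */
    return 0;
}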
Posted on Reply
#114
AusWolf
theoneandonlymrkTechnically I don't directly pay the bill ;) she does ;) damn electric company :D.
And in my case, she shares the costs of living, so by my calculations, I would only pay £1.82 more per month. :laugh: I really don't understand why some of you guys are so scared of power-hungry PC components (unless you run your PCs at full load 24/7).
Posted on Reply
#115
Turmania
NaterI don't think I've ever once thought of the electricity bill when it comes to computers, except when it comes to convincing the wife that upgrading will actually SAVE us money. "Honey, it literally pays for itself!"

We have an 18k BTU mini-split that runs in our 1300 sq. ft. garage virtually 24/7. The 3-5 PC's in the home that run at any given moment are NOTHING compared to that.
When I buy a new system, I have to pay the wifey tax, which in the end costs me the same as a new system, and in many cases more! But at least everyone is happy.
Posted on Reply
#116
dragontamer5788
mtcn77Yes. There are 2560 wavefronts in a CU and there are 64 CUs per command processor. It takes 64 cycles for each CU to get one workgroup issued, and thereafter 64 cycles for every wave per CU. It takes a lot of time until the shaders are fully operational.
By my tests, it takes 750 clock cycles to read a single wavefront's worth of data from VRAM (64x 32-bit reads). So on the timescale of computations, 64 cycles isn't very much. It's certainly non-negligible, but I expect that the typical shader will at least read one value from memory, then write one value to memory (or take 1500 clocks), plus all of the math operations it has to do. If you're doing heavy math, that will only increase the number of cycles per shader.

If you are shader-launch constrained, it isn't a big deal to have a for(int i=0; i<16; i++){} statement wrapping your shader code. Just loop your shader 16 times before returning.
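A minimal OpenCL C sketch of that idea (our own illustration; the kernel name and constant are hypothetical, not from the post): each work-item processes 16 elements instead of one, so the same amount of work needs 16x fewer wavefront launches.

// Illustrative only: amortize wavefront-launch overhead by looping inside the kernel.
#define ITEMS_PER_THREAD 16

__kernel void scale_many(__global float *data, const float factor, const uint n)
{
    size_t base = get_global_id(0) * ITEMS_PER_THREAD;
    for (uint i = 0; i < ITEMS_PER_THREAD; i++) {   // the for(i < 16) wrapper from the post
        size_t idx = base + i;
        if (idx < n)                                // guard the tail of the buffer
            data[idx] *= factor;
    }
}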
Yeah, I remember seeing the slide but I couldn't remember where to find it. Thanks for the reminder. You'd think something like that would be in the ISA documentation. Really, AMD needs to put out a new optimization guide that contains information like this (they haven't written one since the 7950 series).
Posted on Reply
#117
Makaveli
SLKLooks like this gen of GPUs are all power-hungry. Efficiency is out of the window!
Pretty much to be expected. Everyone is trying to make 4K playable with the new hardware, and that was not going to happen on a low power budget.
TurmaniaDoes anybody not care about electricity bills anymore, or most not having responsibilty to pay the bills? Who would buy these cards?
You assume they cannot afford it and that everyone pays the same price for power.
Posted on Reply
#118
Turmania
thesmokingmanThat's ironic...



The same ppl who buy 10600K cpus who think they only use 125w?? The same ppl who think they're saving the world by consuming less power but are actually running at PL2 all the time thus consuming way more power?
Flattering to see you search my posts in other topics, but I still do not understand what you tried to say here. Perhaps I just had a very boring day at work and am not focused enough...
Posted on Reply
#119
mtcn77
dragontamer5788Yeah, I remember seeing the slide but I couldn't remember where to find it.
It is the 'engine optimization hot lap' by Timothy Lottes.
slideplayer.com/slide/17173687/
dragontamer5788By my tests, it takes 750 clock cycles to read a single wavefront's worth of data from VRAM (64x32 bit reads). So on the timescales of computations, 64 cycles isn't very much. Its certainly non-negligible, but I expect that the typical shader will at least read one value of memory, then write one value of memory (or take 1500 clocks), plus all of the math operations it has to do. If you're doing heavy math, that will only increase the number of cycles per shader.



There is also "AMD GPU Hardware Basics".
Basically a hodge-podge of why we cannot keep GPUs on duty. Pretty funny stuff, an engineered list of excuses for why their hardware doesn't work.
Posted on Reply
#120
thesmokingman
TurmaniaFlattering to see you search my posts in other topics, but I still do not understand what you tried to say here. Perhaps I just had a very boring day at work and am not focused enough...
Nah, I was just reading the other thread and was flabbergasted at how misinformed you are, and it was ironic to see it here.
MakaveliPretty much to be expected. Everyone is trying to make 4K playable with the new hardware, and that was not going to happen on a low power budget.
Good point, especially considering the pixel count is fourfold at 4K vs. 1080p. These new GPUs' power draws have not risen at the same rate as the pixel count.
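The quick arithmetic behind that fourfold figure:

/* Pixel count: 4K UHD vs. 1080p. */
#include <stdio.h>

int main(void)
{
    long uhd = 3840L * 2160L;   /* 8,294,400 pixels */
    long fhd = 1920L * 1080L;   /* 2,073,600 pixels */
    printf("4K has %.1fx the pixels of 1080p\n", (double)uhd / (double)fhd);   /* 4.0x */
    return 0;
}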
Posted on Reply
#121
Turmania
MakaveliPretty much to be expected. Everyone is trying to make 4K playable with the new hardware, and that was not going to happen on a low power budget.
I cannot speak about the new Radeons yet, but with Ampere, I would have settled for a 25% improvement in performance whilst keeping the same power envelope.
Of course, there will be many people happy with the current situation as well. So I can see and understand both sides of the argument.
Posted on Reply
#122
Icon Charlie
Well, it looks like I'll be keeping my 5700 for a while. My entire load when running with this card is 247 watts max from the wall outlet. Do you think I'm going to buy a video card that is going to add another 200 watts without a 150% increase in performance???

Absolutely not. I bitched about Nvidia and their wattage vs. performance, and when this card comes out I will bitch about that one too.
This has never been, nor will it ever be, an Nvidia vs. AMD thing. This is, and always will be, about the best bang for the buck.
Posted on Reply
#123
B-Real
Here it's 255 W.
lemoncarbonateYou get more framerate with 3080 despite the insane power draw, some said it's the most-power-per-frame-efficient GPU out there.

But, I agree with you... I wish they could have made something that less hungrier. Imagine how amazing it would be if we could get <200W card that can beat 2080 Ti.
Of course it's the most power-per-frame-efficient GPU, but if you compare it against the 980-to-1080 jump, the 1080 gained nearly the same performance over its predecessor as the 3080 did (a bit more for the 1080), yet the efficiency gain there was more than 3x bigger (18% for the 3080 vs. 59% for the 1080).
Posted on Reply
#124
thesmokingman
B-RealHere it's 255 W.
Interesting. Igor is calculating what they expect it to be. The tweet source just lists TGP, not TBP, which is what Igor's revised list shows. Both are, generally speaking, in line with each other.

Again, the power draw numbers are not real, so relax until we have actual numbers. But don't be surprised if they are in the same range as Nvidia's, because it will take MORE POWER to run realistic framerates at 4K, where the jump in pixel count is really steep!
Posted on Reply
#125
Makaveli
RedelZaVednoIt's not about the bill, it's about the heat. 400W GPU, 150W CPU, 50-150W for the rest of the system and you get yourself 0.6-0.7 KWh room heater. That's a no go in a 16m2 or smaller room in late spring and summer months if you live in moderate or warm climate.
I don't see that as a big problem at all. You are taking fully loaded numbers and applying them very generally. When you are gaming, that 3080 isn't using the full 320 watts. You can see this by running something like FurMark, which is considered a power virus by the GPU makers, and checking how much wattage the card uses during that vs. playing a game.

The same applies to a 105 W AM4 CPU and the rest of the system.

Unless you have everything running fully loaded non-stop, you won't hit those maximum power numbers you are trying to use to make your argument. Current hardware is very good at quickly dropping into lower power states when needed, and pretty much everything out today is very good at idle power draw.
Posted on Reply