Tuesday, May 21st 2019

Sapphire Reps Leak Juicy Details on AMD Radeon Navi

A Sapphire product manager and PR director, speaking to the Chinese press, spilled the beans on AMD's upcoming Radeon Navi graphics card lineup. It looks like with Navi, AMD is targeting the meat of the serious gamer market at two specific price points: USD $399 with a "Pro" (cut-down) product, and $499 with an "XT" (fully-fledged) product. AMD has two NVIDIA products in its crosshairs, the GeForce RTX 2070 and the RTX 2060. In the interview, the Sapphire rep mentioned "stronger than 2070" when talking about performance numbers, which we assume is for the Navi XT variant - definitely promising. The $399 Navi "Pro" is probably being designed with a performance target somewhere between the RTX 2060 and RTX 2070, so you typically pay $50 more than you would for an RTX 2060, for noticeably higher performance.

Sapphire also confirmed that AMD's Navi does not have specialized ray-tracing hardware on the silicon, but that such technology will debut with "next year's new architecture". They also suggested that AMD is unlikely to scale up Navi for the enthusiast segment, and that the Vega-based Radeon VII will continue to be the company's flagship product. On the topic of Radeon VII custom designs, Sapphire commented that "there is no plans for that". On the other hand, Sapphire is actively working on custom designs for the Navi architecture, and mentioned that "work on a 'Toxic' version of Navi is complete, and it is watercooled". Many people have speculated that AMD will unveil Navi at its Computex keynote address on May 27. Sapphire confirmed that date, and added that the launch will be on the 7th of July, 2019.
Source: Zhihu (Blog)

119 Comments on Sapphire Reps Leak Juicy Details on AMD Radeon Navi

#51
Valantar
bug, post: 4051307, member: 157434"
More like, that's what you need to believe.
Because surely you have noticed there's a dearth of water cooled RX 560 or GT 1030 cards.
Not to sound overly snide, but you do know there are wattages in between 50 and 275, right? As mentioned above, there have been water cooled GTX 1070 cards (150W), and there are plenty of water cooled RTX 2070 cards (175W). In other words, partner cards with AIOs are in no way necessarily proof of high power consumption, just that the card is in a high enough price bracket where "premium cooling" allows AIB partners to demand premium pricing.

And as @Vayra86 pointed out above: low-end cards don't sell if they're too expensive. Sticking a $70 AIO on a $200 RX 580 doesn't make sense, but it does so on a $500 RTX 2070, even though they're roughly the same wattage, as the cost of the cooler would then represent a much smaller percentage of the total price, and that market segment is generally more open to "premium cooling".

Vya Domus, post: 4051255, member: 169281"
So there is a God-written rule that they need to name it in a specific way? GCN 5 is drastically different from GCN 1 in pretty much every way; they are worlds apart both in feature set and in microarchitectural differences that change the clocks, power, etc. It's a label that they may choose to keep using or not; it doesn't mean anything in particular if they do.
No, but it does make sense to not change the fundamentals of a chip architecture and keep the same name - that would be very confusing for everyone involved, particularly the people writing drivers for the hardware. And as pointed out above, GCN has not been fundamentally changed since its inception, it has been iterated upon, tweaked and upgraded, expanded, had features added - but the base architecture is still roughly the same, and works within the same framework - unlike, say, Nvidia's transition from Kepler to Maxwell, where driver compatibility fell off a cliff due to major architectural differences.
Posted on Reply
#52
Metroid
Navi is not Vega. Polaris is a good example of how efficient AMD can be when they want to; Vega in my book is terrible in terms of power consumption while Polaris is great, and Navi should follow Polaris.
Posted on Reply
#54
bug
Valantar, post: 4051315, member: 171585"
Not to sound overly snide, but you do know there are wattages in between 50 and 275, right? As mentioned above, there have been water cooled GTX 1070 cards (150W), and there are plenty of water cooled RTX 2070 cards (175W). In other words, partner cards with AIOs are in no way necessarily proof of high power consumption, just that the card is in a high enough price bracket where "premium cooling" allows AIB partners to demand premium pricing.
I'm guessing we could spin this a million different ways.
What I know right now is:
1. Cards with water cooling sport above average power draw.
2. Till now GCN didn't do TBR, so it had power draw well above Nvidia's.
3. The first glimpse we have at Navi is apparently water cooled.

People keep hoping for a GPU's Zen, I keep seeing Bulldozer iterations...
Posted on Reply
#55
Alpha_Lyrae
Valantar, post: 4051253, member: 171585"
Uhm, no. It's a core architecture, which AMD has iterated on since they abandoned the previous TeraScale architecture. There are many variants, but they share a core framework and a lot of functionality. No GCN variant is fundamentally different from any other - just improved upon, added features to, etc. That's why AMD's early GCN cards have had such excellent longevity.
It's more of an ISA than "core architecture". GCN is a quad-SIMD design utilizing 1 instruction for the set, usually tasked in 64-thread groups. AMD's "next-gen" architecture still looks similar to GCN and is even executed similarly to current ISA, but has moved to VLIW2 (Super SIMD) and has drastically reworked CU clusters and caches. It probably won't be called GCN though, simply because AMD wants to retire that nomenclature. Vega was the largest change to GCN to date. Previously, ROPs used their own local cache, but the new tiling rasterizers need the ROPs connected to L2 to keep track of occluded primitives within pixels to cull them and reuse data for immediate mode tiling (hybrid raster). 2xFP16 is also useful in certain scenarios.

Vega and Turing both have new small geometry shaders that replace certain tessellation stages. In Vega, they're called primitive shaders, and in Turing, simply, mesh shaders. AMD is waiting for standardization in major APIs, while Nvidia seems fine with using a proprietary API extension to call them. Both types will further speed small geometry creation to enhance game realism, while AMD can also use them to speed geometry culling using their shader arrays to help their geometry front-ends.

Nvidia's basic GPC design (mini-GPUs within an interconnect fabric) dates back to Fermi, although Kepler fixed many of Fermi's shortcomings, Maxwell was the one to really propel it forward in perf/watt and not just from moving to immediate mode tiling rasterizers. Nvidia has also iterated on their GPC architecture, but in a much more aggressive manner (it helps to have a large R&D budget). Turing is still a VLIW2 GPC design*, using up to 6 GPCs in TU102. 7nm can extend that up to 8 GPCs when Nvidia moves to Ampere, but with RT taking priority now, Nvidia may just dedicate more die space to accelerating BVH traversal and intersection, trying to reduce ray tracing's very random hits to VRAM, and of course, making hybrid rendering, as a whole, more efficient and performant.

But, both AMD's GCN (2011) and Nvidia's GPC (2010) designs have been around for quite some time.

* Turing has to execute 2 SMs concurrently due to INT32 taking up 64 of 128 cores within an SM. So, using 2 SMs, 128 FP32/CUDA cores are tasked (warp is still 32 threads), similarly to Pascal and prior and thereby retains compatibility.
Posted on Reply
#57
Zubasa
THANATOS, post: 4051326, member: 184835"
[quote=Metroid, post: 4051317, member: 178915"]
Navi is not Vega. Polaris is a good example of how efficient AMD can be when they want to; Vega in my book is terrible in terms of power consumption while Polaris is great, and Navi should follow Polaris.
Really?
https://www.techpowerup.com/reviews/EVGA/GTX_1650_SC_Ultra_Black/28.html
Vega is more efficient than Polaris.
[/quote]
Polaris is a prime example of AMD trying to clock a GPU way past its efficiency curve.
The original RX 400 series was okay on performance / watt, but after the 1060 was released, AMD tried to get that little bit of extra performance for a rather large TDP increase with their RX 580 refresh.
Posted on Reply
#58
cucker tarlson
btarunr, post: 4051181, member: 43587"
You are assuming that Navi as an architecture is slower than Turing on the basis of Vega being slower than Turing?
Lol of course it's slower if the full chip (56cu is it?) is targeting tu106
Posted on Reply
#59
THANATOS
Zubasa, post: 4051332, member: 30988"
Polaris is a prime example of AMD trying to clock a GPU way past its efficiency curve.
The original RX 400 series was decent on performance / watt, but after the 1060 was released, AMD tried to get that little bit of extra performance for a rather large TDP increase with their RX 580 refresh.
And Vega is not clocked past its efficiency curve?
Posted on Reply
#60
HwGeek
Vega is efficient, it's just not as fast as NV's parts, so they had to compensate for it with clock speed and thus ended up off the efficiency curve - same issue as with Intel's parts that push clocks towards 5GHz as "95W" TDP parts with an actual power draw of 150W+.
Posted on Reply
#61
m4dn355
As much as I love AMD, this nävi looks like POS.
1. Power-hungry
2. Sound-hungry
3. Perf.-hungry?!
Posted on Reply
#62
Zubasa
THANATOS, post: 4051336, member: 184835"
And Vega is not clocked past its efficiency curve?
Vega is as well, but Vega was designed to reach a higher clock speed than Polaris in the first place.
Therefore it (at least for Vega 56) wasn't as far off the efficiency curve as Polaris ended up.
But you do see the same crazy power draw happening with the AIO version of Vega 64, where performance / watt dropped off a cliff.
Posted on Reply
#63
bug
Alpha_Lyrae, post: 4051325, member: 187828"
It's more of an ISA than "core architecture". GCN is a quad-SIMD design utilizing 1 instruction for the set, usually tasked in 64-thread groups. AMD's "next-gen" architecture still looks similar to GCN and is even executed similarly to current ISA, but has moved to VLIW2 (Super SIMD) and has drastically reworked CU clusters and caches. It probably won't be called GCN though, simply because AMD wants to retire that nomenclature. Vega was the largest change to GCN to date. Previously, ROPs used their own local cache, but the new tiling rasterizers need the ROPs connected to L2 to keep track of occluded primitives within pixels to cull them and reuse data for immediate mode tiling (hybrid raster). 2xFP16 is also useful in certain scenarios.

Vega and Turing both have new small geometry shaders that replace certain tessellation stages. In Vega, they're called primitive shaders, and in Turing, simply, mesh shaders. AMD is waiting for standardization in major APIs, while Nvidia seems fine with using a proprietary API extension to call them. Both types will further speed small geometry creation to enhance game realism, while AMD can also use them to speed geometry culling using their shader arrays to help their geometry front-ends.

Nvidia's basic GPC design (mini-GPUs within an interconnect fabric) dates back to Fermi, although Kepler fixed many of Fermi's shortcomings, Maxwell was the one to really propel it forward in perf/watt and not just from moving to immediate mode tiling rasterizers. Nvidia has also iterated on their GPC architecture, but in a much more aggressive manner (it helps to have a large R&D budget). Turing is still a VLIW2 GPC design*, using up to 6 GPCs in TU102. 7nm can extend that up to 8 GPCs when Nvidia moves to Ampere, but with RT taking priority now, Nvidia may just dedicate more die space to accelerating BVH traversal and intersection, trying to reduce ray tracing's very random hits to VRAM, and of course, making hybrid rendering, as a whole, more efficient and performant.

But, both AMD's GCN (2011) and Nvidia's GPC (2010) designs have been around for quite some time.

* Turing has to execute 2 SMs concurrently due to INT32 taking up 64 of 128 cores within an SM. So, using 2 SMs, 128 FP32/CUDA cores are tasked (warp is still 32 threads), similarly to Pascal and prior and thereby retains compatibility.
Hey, welcome to TPU.
Just so you know, informed, to the point posts are not the norm here. But this being your first, I won't report it ;)
Posted on Reply
#64
Manu_PT
Enterprise24, post: 4051320, member: 137706"
AdoredTV shat on the face again.
Yep, let's see if he deletes his videos again this time. I guess not, as they got big attention now. RTX 2070 performance for 250€, right.....
Posted on Reply
#65
THANATOS
Zubasa, post: 4051340, member: 30988"
Vega is as well, but Vega was designed to reach a higher clock speed than Polaris in the first place.
Therefore it wasn't as far off the efficiency curve as Polaris ended up.
And you know Polaris's or Vega's actual efficiency curve? You can't really say it sat at the clocks the RX 470 or RX 480 had, because if you underclocked those chips, they would most likely have a better performance/power ratio than at their default clocks. By that logic I could also claim they are past their efficiency curve at their default clocks.

BTW comparing Polaris to Vega is unfair to begin with. Vega has more efficient HBM2 memory, but is also a more powerful GPU. Vega 64 (4096 SP, 256 TMU, 64 ROPs) has 10215-12665 GFLOPS vs. the RX 570 (2048 SP, 128 TMU, 32 ROPs), which has 4784-5095 GFLOPS. Vega 64 is 114-149% more powerful on paper, but in reality it's only 97.5% faster than the RX 570 at 4K resolution.
If we really wanted to compare which one is more efficient, we would need to have a 32-36CU version of Vega without HBM2.
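
As a rough check on the paper figures above: the quoted GFLOPS ranges follow from shader count x 2 FLOPs per clock (one fused multiply-add) x clock speed. The Python sketch below is only illustrative. The clock values (1247/1546 MHz for Vega 64, 1168/1244 MHz for the RX 570) are assumptions, picked because they reproduce the quoted ranges, and the function names are made up for this example.

```python
# Theoretical FP32 throughput = shaders * 2 FLOPs per clock (one FMA) * clock.
# ASSUMPTION: the MHz values below are not stated in the post; they are chosen
# because they reproduce the quoted 10215-12665 and 4784-5095 GFLOPS ranges.

def gflops(shaders: int, clock_mhz: float) -> float:
    """Peak single-precision GFLOPS for a given shader count and clock."""
    return shaders * 2 * clock_mhz / 1000.0

def percent_more(a: float, b: float) -> float:
    """How many percent larger a is than b."""
    return (a / b - 1.0) * 100.0

VEGA64_SP, RX570_SP = 4096, 2048

vega_lo, vega_hi = gflops(VEGA64_SP, 1247), gflops(VEGA64_SP, 1546)
rx570_lo, rx570_hi = gflops(RX570_SP, 1168), gflops(RX570_SP, 1244)

paper_lo = percent_more(vega_lo, rx570_lo)  # base clock vs. base clock
paper_hi = percent_more(vega_hi, rx570_hi)  # boost clock vs. boost clock
measured = 97.5                             # 4K figure quoted in the post

print(f"Vega 64: {vega_lo:.0f}-{vega_hi:.0f} GFLOPS")
print(f"RX 570:  {rx570_lo:.0f}-{rx570_hi:.0f} GFLOPS")
print(f"Paper advantage: {paper_lo:.0f}-{paper_hi:.0f}%, measured: {measured}%")
```

With those assumed clocks, the gap between the paper advantage and the measured 97.5% is the scaling loss the post is pointing at.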
Posted on Reply
#66
cucker tarlson
The $399 Navi "Pro" is probably being designed with a performance target somewhere between the RTX 2060 and RTX 2070, so you typically pay $50 more than you would for an RTX 2060, for noticeably higher performance.
stronger than 2070
I hope no one here has short-term memory loss and believes what reps say.

Updated May 19:
https://www.pcgameshardware.de/Grafikkarten-Grafikkarte-97980/Specials/Rangliste-GPU-Grafikchip-Benchmark-1174201/2/

This has GDDR6, fewer CUs, lower clocks and worse performance per CU than the R7, which beats the 2070 by 6%.
Stronger than the 2070? Yeah, right.

Melvis, post: 4051252, member: 50520"
Not as much now it seems

Come on, let's not pretend that 90% of such channels cater to anything more than one or the other fanbase exclusively. "This video is nothing new from what you've already seen a hundred times" doesn't earn clicks. Look at the PCGH test above, or the one that computerbase.de recently updated too.
Worthless videos. But you go ahead and believe what they tell you, and don't forget to like and subscribe :)

They're gonna have to throw in one hell of a game bundle for people to defend this.
Posted on Reply
#67
Zubasa
THANATOS, post: 4051384, member: 184835"
BTW comparing Polaris to Vega is unfair to begin with. Vega has more efficient HBM2 memory, but is also a more powerful GPU. Vega 64 (4096 SP, 256 TMU, 64 ROPs) has 10215-12665 GFLOPS vs. the RX 570 (2048 SP, 128 TMU, 32 ROPs), which has 4784-5095 GFLOPS. Vega 64 is 114-149% more powerful on paper, but in reality it's only 97.5% faster than the RX 570 at 4K resolution.
If we really wanted to compare which one is more efficient, we would need to have a 32-36CU version of Vega without HBM2.
One thing you left out of that comparison, that is the Geometry performance of Vega vs Polaris.
The limit of four Shader Engines / Geometry Engines has so far been the Achilles heel of GCN.
Posted on Reply
#68
THANATOS
Zubasa, post: 4051391, member: 30988"
One thing you left out of that comparison, that is the Geometry performance of Vega vs Polaris.
The limit of four Shader Engines / Geometry Engines has so far been the Achilles heel of GCN.
I just wanted to point out that big Vega loses a lot of performance, which in turn causes it to have a worse performance/W ratio than if it were smaller.
Posted on Reply
#69
Valantar
As I've been saying for a while now, AMD got stuck with a rather serious problem when they maxed out the CU config of GCN with Fiji - it was competitive at the time, but left zero room to grow by adding CUs, so further improvements required pushing clocks past their sweet spot (in the meantime, Nvidia has increased their CUDA core count by a whopping 55% at the high end). Which gave us Vega. Not a bad arch update or bad GPUs overall, but they delivered a rather poor efficiency improvement considering the move from 28nm to 14nm - again, because GCN stopped AMD from adding more CUs, forcing them to squeeze as high clocks as possible from the chips. Not to mention that this made it look like they were chasing Nvidia's clock speeds for no good reason, while both failing at matching them and losing efficiency. A bit of a pile-up of bad consequences of an inherent architectural trait, sadly. I would imagine an 80-CU Vega at ~1200MHz would perform amazingly, and do a decent job at perf/W too. If AMD matched Nvidia's core count increase since 2015 (980 Ti/Fury X) we'd now have a 100CU/6400SP Vega card - which it's not hard to imagine would compete quite well with Nvidia's top end cards even at low clocks and on 14nm. The die would be large, just like the Fury X, so a compromise around 80-90 CUs and clocks in the 1300-1450MHz range might be better, but all in all, AMD is being bottlenecked by being incapable of widening their chip designs, and this is what has truly been holding them back since 2015. Fingers crossed that the NG arch takes this into account by allowing ~unlimited core count scaling.
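
For anyone wanting to check the "55%" and "100CU/6400SP" figures, here is a rough Python sketch of the arithmetic. The CUDA core counts (2816 for the GTX 980 Ti, 4352 for the RTX 2080 Ti) are assumptions taken from public spec sheets rather than from the post, and 64 SPs per CU is inferred from the 64 CU / 4096 SP configuration mentioned in this thread.

```python
# ASSUMPTION: core counts from public spec sheets, not stated in the post.
GTX_980_TI_CORES = 2816   # Nvidia's 2015 high end
RTX_2080_TI_CORES = 4352  # Nvidia's 2019 high end

FIJI_CUS = 64             # Fury X / Vega 64 CU count
SP_PER_CU = 64            # 4096 SPs / 64 CUs

# Relative growth in Nvidia's shader core count since 2015.
growth = RTX_2080_TI_CORES / GTX_980_TI_CORES
growth_pct = (growth - 1.0) * 100.0

# What the same growth would mean for a hypothetical wider GCN chip.
scaled_cus = FIJI_CUS * growth
scaled_sps = scaled_cus * SP_PER_CU

print(f"Nvidia core count growth: {growth_pct:.0f}%")
print(f"Equivalent GCN config: ~{scaled_cus:.0f} CUs / ~{scaled_sps:.0f} SPs")
```

Under those assumptions the result lands just under 99 CUs, which the post rounds up to a 100CU/6400SP part.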
Posted on Reply
#70
Vya Domus
It wasn't until very recently that AMD could make a 64 CU GPU in under 500 mm^2. People weren't exactly thrilled with Vega as it was; making it even more expensive would have served no purpose. AMD's performance problems can't and shouldn't be solved by adding more CUs. Besides, I don't think they could have made such a GPU feasible with GloFo's 14nm node, and TSMC's 7nm probably doesn't allow for huge dies at the moment either.
Posted on Reply
#71
efikkan
Valantar, post: 4051450, member: 171585"
As I've been saying for a while now, AMD got stuck with a rather serious problem when they maxed out the CU config of GCN with Fiji
I've seen this claim over and over again, but has it been explicitly stated from AMD that GCN can't do more than 64 CUs/4096 SPs?
To my knowledge there is no architectural reason why it wouldn't be possible, but there is a very good reason why they don't do it: adding e.g. 50% more SPs would increase the energy consumption by ~50% but only increase the performance by ~20-30% at best, because a GPU with more clusters would need more powerful scheduling, and to maintain higher efficiency than the predecessor it would require more than 50% better scheduling. The problem for GCN has always been management of resources, and this is the reason why GCN has fallen behind Nvidia. GCN has plenty of computational power, just not the means to harness it.
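
To put rough numbers on that trade-off, the Python sketch below works out what +50% power for +20-30% performance does to perf/W. It only restates the estimate in the post; the function name is made up for illustration.

```python
def perf_per_watt_change(perf_gain: float, power_gain: float) -> float:
    """Relative change in performance per watt.

    Both arguments are fractional gains, e.g. 0.5 means +50%.
    """
    return (1.0 + perf_gain) / (1.0 + power_gain) - 1.0

POWER_GAIN = 0.50  # ~50% more SPs -> ~50% more energy (estimate from the post)

worst = perf_per_watt_change(0.20, POWER_GAIN)  # +20% performance
best = perf_per_watt_change(0.30, POWER_GAIN)   # +30% performance

print(f"perf/W change: {worst * 100:.0f}% to {best * 100:.0f}%")
```

So under that estimate, the wider chip would end up roughly 13-20% less efficient than the one it replaces.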
Posted on Reply
#72
Valantar
efikkan, post: 4051478, member: 150226"
I've seen this claim over and over again, but has it been explicitly stated from AMD that GCN can't do more than 64 CUs/4096 SPs?
To my knowledge there is no architectural reason why it wouldn't be possible, but there is a very good reason why they don't do it: adding e.g. 50% more SPs would increase the energy consumption by ~50% but only increase the performance by ~20-30% at best, because a GPU with more clusters would need more powerful scheduling, and to maintain higher efficiency than the predecessor it would require more than 50% better scheduling. The problem for GCN has always been management of resources, and this is the reason why GCN has fallen behind Nvidia. GCN has plenty of computational power, just not the means to harness it.
They haven't confirmed it, no (why would they? That'd be pretty much the same as saying "we can't compete in the high end until our next arch, no matter what!" - and that's bad business strategy), but three subsequent generations with ~the same specs save for clocks, cache and other minor tweaks (in terms of real-world performance) does tell us something. What you're saying is not an argument for not scaling out the die - after all, pushing clocks puts just as much demand on scheduling as making a wider design. It might of course be that AMD would gain more by "rebalancing" their architecture by increasing the number of other components than SPs alone, but that's beside the point. Also, Nvidia has demonstrated pretty well that your scaling numbers are on the pessimistic side. With a similar node shrink (28nm to 12nm) and two small-to-medium architecture updates they've increased the CUDA core count by 55%, increased clocks by ~60% (at least, depending on whether you look at real-world boost or not), and kept power draw at the same level. Of course there are highly complex technical reasons for why this works for Nvidia and not AMD, but claiming that AMD has deliberately chosen not to increase their CU count while their main competitor has increased theirs by 55% - and at the same time run off with the high-end GPU segment - sounds a bit like wishful thinking.

With all this being said, my Fury X is getting long enough in the tooth that I might still get one of these if they match or beat the 2070. But I'd really like for Arcturus(?) to arrive sooner rather than later.
Posted on Reply
#73
efikkan
Valantar, post: 4051507, member: 171585"
…but claiming that AMD has deliberately chosen not to increase their CU count while their main competitor has increased theirs by 55% - and at the same time run off with the high-end GPU segment - sounds a bit like wishful thinking.
The fact remains that AMD have plenty of computational performance, while Nvidia manages to squeeze more gaming performance out of less theoretical performance, because AMD chose to focus on "brute force" performance rather than efficiency.
Posted on Reply
#74
ZoneDymo
you would think people would welcome some competition of any kind...but no, instant negativity.
Posted on Reply
#75
unikin
If leak turns out to be true, this is how I would describe AMD - PC gamers relationship:

Posted on Reply