
Editorial On AMD's Raja Koduri RX Vega Tweetstorm

Between fairly strong DX12 and 4K performance and very strong 3D modeling performance in Blender, Vega represents good value to me, if I could manage to pick one up at MSRP in the semi-near future. However, once NVIDIA launches a new lineup, forget it; in all likelihood it's too late.

The 4K performance isn't good enough to even mention.
 
The reality is AMD achieves wonders with its R&D budget, but realistically they need to go the same route... if they can afford it.
But does it make sense to? Graphics are just math, and Vega has math in spades. The problem was, and remains, that the front end and back end of GCN's render pipeline aren't as efficient as Maxwell's. Vega did make pretty significant gains there compared to Fiji, so AMD is undeniably moving in the right direction, but NVIDIA is able to move much faster.

AMD has needed more shaders to get ahead of Maxwell. Vega only has about 512 more shaders than the GTX 1080 Ti, but 1,536 more than the GTX 1080. Vega was always meant to take on the GTX 1080, and it does, but AMD was blindsided by the GTX 1080 Ti. This huge difference in shader count versus delivered performance is where AMD gets its significant lead in hashing, and also why the GTX 1080 has a huge margin in its favor in terms of performance per watt.
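A quick sanity check on those counts, using the published shader numbers for each card:

[CODE]
# Published shader counts: stream processors (AMD) / CUDA cores (NVIDIA)
vega64    = 4096  # RX Vega 64
gtx1080ti = 3584  # GTX 1080 Ti
gtx1080   = 2560  # GTX 1080

print(vega64 - gtx1080ti)  # 512  more shaders than the 1080 Ti
print(vega64 - gtx1080)    # 1536 more shaders than the 1080
[/CODE]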

There are two paths forward for AMD:
1) Debut a Fiji-like chip: as big as the interposer can fit. It will be expensive, it will be power hungry, but it will be king until Volta comes.
2) Create a new, lean microarchitecture that attempts to squeeze every frame it can out of every watt.

Since Ryzen has been pushed out the door, my hope is that AMD fully commits to #2.
 
Radeon Pro SSG just needs to trickle down to consumers quickly, with less storage or fewer M.2 connectors. Other than that, it's reducing power and increasing clock speeds as usual, though they don't particularly need any more cores at present; they need to beef up some other aspects more heavily.
 
Well, you may argue that Ryzen 7 is not up to the 7700K's level of gaming performance, but remember that while the 7700K is sweating every drop in BF1, Ryzen is doing it at 50% core load.

Ryzen has come a long way though; it comes pretty damn close to the 7700K for a lot less money.
 
In all honesty, if Vega 64 AiO were priced roughly the same as a beefy aftermarket air-cooled GTX 1080, I'd have absolutely no objections. Power consumption actually isn't as much of an issue as everyone parroted (using Turbo numbers, which are just pointless). But pricing it like a beefy aftermarket air-cooled GTX 1080 Ti is something not even I could get past, regardless of everything else. And I really wanted to own an RX Vega 64. So all the talk about price/performance is quite frankly pointless at this point. You don't need a fancy chart with 30 graphics cards to see this; all you need to know is the price and performance of the competing card from the opposite camp, and you can see something just doesn't add up.
 
When I spend $500+ on a GPU, I expect it to be at least tuned well enough to use directly. If I can overclock it, then great, icing on the cake.

RX Vega, on the other hand, is already maxed out in terms of thermal profile, power consumption, and MHz. How the hell is the end user supposed to adjust it to make it an appealing and somewhat OK gaming card? It is one thing to like a brand, and another to hand out free passes when they simply failed to deliver a good enough gaming GPU for the current market.

On top of all that, RTG locked Vega's BIOS, meaning there will be no way to implement any actual BIOS-based modifications. RX Vega is a failure no matter which way you spin the story.

Also, regarding the "compute" market for Vega: well, good luck with RTG's next-to-non-existent technical support. In a market that already has widespread adoption of CUDA, it will be pretty fun to see how much RTG can carve out with their "open standard, free stuff" strategy.

What you are describing is the reference card, which is the only Vega on the market. If you didn't already know, what you described also applies to many of NVIDIA's reference cards as well. If you had complained about the power draw, that would have been valid, but clearly Vega has more breathing room: I've already seen multiple videos of large performance increases when it's put under water, for example on Gamers Nexus.

"On top of all that, RTG locked Vega's BIOS. Meaning there will be no way to implement any actual BIOS based modifications. RX Vega is a failure no matter what way you spin the story."

Lol, no

https://www.techpowerup.com/236632/...lash-no-unlocked-shaders-improved-performance
https://forum.ethereum.org/discussion/15024/hows-it-hashin-vega-people

It hasn't even been very long and people can easily flash their Vega BIOS.

There were plenty of other things you could have shit on Vega for, but you literally chose non-issues.
 
Like Fiji, all those compute units are fantastic for compute workloads, but the frontend and backend of the render pipeline aren't capable of saturating them in most cases. Think of Vega as a faster Fiji and... well... pretty much everything derived from that fits (especially the heavy bias towards high resolutions).
 
Like Fiji, all those compute units are fantastic for compute workloads, but the frontend and backend of the render pipeline aren't capable of saturating them in most cases. Think of Vega as a faster Fiji and... well... pretty much everything derived from that fits (especially the heavy bias towards high resolutions).
One more reason for the saturation issues: the GCN compute unit is way too asymmetrical, doesn't have enough granularity, and has too many special-purpose modules ... let me illustrate:
(attached: GCN compute unit block diagram)

  • special units for integers and special units for float vectors, as opposed to each CUDA core having both ALUs inside
  • too many special-purpose decode hardware blocks, as opposed to one unit that can decode everything and shares its internal logic
  • too many special-purpose cache units, each tied to its own special-purpose block, as opposed to the more flexible approach of a bigger unified shared cache pool and bigger multipurpose, unified local caches
Basically it's a low-latency, throughput-favoring design that is wasteful and inflexible. Depending on the type of code running at any particular moment, a bunch of the units are doing nothing while staying fully powered on, waiting to maybe do something useful in the next clock cycle. To gracefully saturate GCN (both peak efficiency and 100% usage) you need the right ratio of int/float instructions and the right amount of memory operations sprinkled through the code :laugh: ... which is incidentally easier to do using async compute
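To make that concrete, here's a toy Python model of the argument (my own sketch; the one-scalar-plus-one-vector issue rule and the stall penalty are simplifying assumptions for illustration, not real GCN behavior):

[CODE]
# Toy model: assume a CU can issue at most one scalar and one vector
# instruction per cycle, and each memory op stalls the vector pipe for a
# few cycles. All numbers are illustrative assumptions, not measurements.
def cu_utilization(scalar_ops, vector_ops, mem_ops, stall_cycles=4):
    cycles = max(scalar_ops, vector_ops + mem_ops * stall_cycles)
    return (scalar_ops / cycles, vector_ops / cycles)

# Balanced int/float mix with no memory stalls: everything stays busy.
print(cu_utilization(scalar_ops=100, vector_ops=100, mem_ops=0))   # (1.0, 1.0)
# Float-heavy, memory-bound mix: both units spend much of the time idle.
print(cu_utilization(scalar_ops=10, vector_ops=100, mem_ops=20))   # (~0.06, ~0.56)
[/CODE]

Async compute helps precisely because a second queue can fill those idle slots.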
 
Yep! Count those pennies! :)
Again, I'm stunned reading such worthless comments from a person who calls themselves a reviewer. It's not about the electricity bill, but about the heat that is produced and needs to be dumped out of the case, and the noise levels the CPU and case cooling maintain while doing so.
 
How it should've been done!!!

(attached: performance-per-watt chart, 2560×1440, balanced preset)
 
AMD sure has a lot of weird folks pulling things in the wrong directions, not to mention how much confusion they created when Vega was just mere days or weeks from release.
 
One more reason for the saturation issues: the GCN compute unit is way too asymmetrical, doesn't have enough granularity, and has too many special-purpose modules ... let me illustrate:
(attached: GCN compute unit block diagram)
  • special units for integers and special units for float vectors, as opposed to each CUDA core having both ALUs inside
  • too many special-purpose decode hardware blocks, as opposed to one unit that can decode everything and shares its internal logic
  • too many special-purpose cache units, each tied to its own special-purpose block, as opposed to the more flexible approach of a bigger unified shared cache pool and bigger multipurpose, unified local caches
Basically it's a low-latency, throughput-favoring design that is wasteful and inflexible. Depending on the type of code running at any particular moment, a bunch of the units are doing nothing while staying fully powered on, waiting to maybe do something useful in the next clock cycle. To gracefully saturate GCN (both peak efficiency and 100% usage) you need the right ratio of int/float instructions and the right amount of memory operations sprinkled through the code :laugh: ... which is incidentally easier to do using async compute
Got a similar diagram for a Pascal-generation CUDA core? Everything I'm finding is overly simplified.


AMD effectively has a 4096-core co-processor, which is why it is fantastic at compute workloads (async and otherwise). The problem is, rendering operates on wavefronts that are very synchronous. I think you're fundamentally right: these two things are at odds with each other. AMD needs to make a new architecture that is mostly synchronous, with only some cores capable of async work.
 
Yes, FordGT90Concept, rendering is a pipeline of synchronized tasks, essentially making rendering a specialized synchronous compute workload.
The design mistake with Vega is making the cores even more complex than Fiji, which requires higher voltage to operate, and giving them a longer pipeline, which makes them less efficient in dynamic workloads such as gaming. This is also why Vega has lower "IPC" than Fiji.
 
Ryzen has come a long way though; it comes pretty damn close to the 7700K for a lot less money.

The 7700K is Intel's Vega if you think about it. It manages an extremely limited performance win over anything clocked 0.5-1 GHz lower. You gain what, 5% in game FPS for a 20% clock difference?
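Putting rough numbers on that:

[CODE]
# ~5% more fps for ~20% more clock means the games barely scale with clock
fps_gain, clock_gain = 0.05, 0.20
print(f"clock scaling efficiency: {fps_gain / clock_gain:.0%}")  # 25%
[/CODE]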

This is precisely why I'm looking at an i7 5775c instead. It almost matches the 7700K, with exceptions actually *favoring* the 5775c by a serious margin, and it just needs 4.2 GHz to do so (versus a 5 GHz OC on the other) and runs 15-20°C cooler. It's curious, though: when it's a CPU, nobody pays it any mind; when it's an AMD GPU, it's a horrible product ;) The entire Kaby Lake gen consists of Skylake CPUs with a clock bump that puts them outside their ideal perf/watt.

Food for thought...
 
Yup, I actually think both 5775c and 6700K are a better buy than 7700K. It's better to get a Z170 board with a 6700K and invest the difference in faster DDR4 than to buy a Z270 and 7700K and use something like 3000 MHz DDR4.
 
Yup, I actually think both 5775c and 6700K are a better buy than 7700K. It's better to get a Z170 board with a 6700K and invest the difference in faster DDR4 than to buy a Z270 and 7700K and use something like 3000 MHz DDR4.

Really the only thing holding me back is the dead platform. I'm already struggling to find a nice board. And then there's Coffee Lake bringing six cores to the mainstream, which also seems a very good move. Dilemmas...
 
The one thing I don't get is all this talk about undervolting yet overclocking.
Is that just a fluke for some people? Because if that's genuinely possible, why doesn't it ship like that in the first place?
That has always been AMD's Achilles' heel; they've done that with just about every card in the last 5 years or so. The best example is XFX's 480 GTR, the best 480 on the block imho. Gamers Nexus did an analysis of this and basically came to the conclusion that it's a maximization (a guarantee) that all cards will work at this voltage, not that all cards need that voltage to operate. I think it's a lazy approach on AMD's part; it's like what we're seeing now, everything needs optimization because they didn't do it on the front end, so now they have to play catch-up. A lot of Vega owners are undervolting their cards while overclocking at the same time. What's crazy is that Vega is more bandwidth starved than core-speed starved: increasing the HBM speed nets you a better return than just pushing for higher core clocks. The shameful part imo is that pushing the HBM speeds higher results in good gains while keeping power consumption relatively the same.
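Rough numbers on the HBM point (the stock clock is Vega 64's published spec; the ~1100 MHz figure is just a commonly reported overclock, used here for illustration):

[CODE]
# HBM2 bandwidth scales linearly with memory clock:
# bandwidth = clock x 2 (DDR) x bus width / 8 bits per byte
def hbm_bw_gbs(clock_mhz, bus_bits=2048):  # Vega's 2048-bit HBM2 bus
    return clock_mhz * 1e6 * 2 * (bus_bits / 8) / 1e9

stock = hbm_bw_gbs(945)   # ~484 GB/s at Vega 64's stock 945 MHz
oc    = hbm_bw_gbs(1100)  # ~563 GB/s at an assumed ~1100 MHz overclock
print(f"{stock:.0f} -> {oc:.0f} GB/s ({oc / stock - 1:+.0%})")
[/CODE]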

Why did AMD not see this? If they did, why weren't memory clocks increased?

This is probably where Vega would shine in perf/watt. Can you imagine Vega at 1400-1600 MHz core with HBM speeds in the same range?
I think that would've been a more compelling release.
What do you all think?
 
@Vayra86
I'd favor covfefe lake in your situation. You need a new platform anyway. Good, new Z97X boards cost a lot now since they're scarce, and they severely limit M.2 NVMe drives with their PCIe 2.0 M.2 slots and DMI 2.0. Z170/Z270 have better NVMe SSD support, since you can run two 32 Gb/s M.2 SSDs on a decent Z270 board, but the 5775c outperforms the 6700K/7700K in terms of performance/thermals and power draw. A 6c/12t i7 will have very good efficiency due to what I described in one of my previous posts. Even if the 8700K brings only a marginal performance improvement over the 7700K, it will be able to maintain better efficiency due to lower load on its cores in gaming.
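For reference, the link math behind those numbers ("32 Gb/s" is just the marketing label for a PCIe 3.0 x4 link after encoding overhead):

[CODE]
# Usable throughput of a PCIe link after line-coding overhead
def pcie_gb_s(lanes, gt_per_s, coding_efficiency):
    return lanes * gt_per_s * coding_efficiency / 8  # GB/s

z97_m2  = pcie_gb_s(4, 5.0, 8 / 10)     # PCIe 2.0 x4: 2.0 GB/s (~16 Gb/s), same class as DMI 2.0
z270_m2 = pcie_gb_s(4, 8.0, 128 / 130)  # PCIe 3.0 x4: ~3.94 GB/s (~32 Gb/s)
print(z97_m2, round(z270_m2, 2))
[/CODE]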
 
That has always been AMD's Achilles' heel; they've done that with just about every card in the last 5 years or so. The best example is XFX's 480 GTR, the best 480 on the block imho. Gamers Nexus did an analysis of this and basically came to the conclusion that it's a maximization (a guarantee) that all cards will work at this voltage, not that all cards need that voltage to operate. I think it's a lazy approach on AMD's part; it's like what we're seeing now, everything needs optimization because they didn't do it on the front end, so now they have to play catch-up. A lot of Vega owners are undervolting their cards while overclocking at the same time. What's crazy is that Vega is more bandwidth starved than core-speed starved: increasing the HBM speed nets you a better return than just pushing for higher core clocks. The shameful part imo is that pushing the HBM speeds higher results in good gains while keeping power consumption relatively the same.

Why did AMD not see this? If they did, why weren't memory clocks increased?

This is probably where Vega would shine in perf/watt. Can you imagine Vega at 1400-1600 MHz core with HBM speeds in the same range?
I think that would've been a more compelling release.
What do you all think?

Yes, it's what surprised me earlier too, as I was saying. You can read Raja's tweets and see his surprise at others finding a much better perf/watt delta for HIS OWN products. It's ridiculous; it speaks volumes about the level of dedication they have over at RTG. This is not an R&D problem, it's a company culture and people problem. It also echoes everything we've seen for years now with GCN: bad decision making, bad marketing, overselling and misrepresenting products, and bad optimization all over the place. It also echoes AMD's eternal "more hardware to brute-force software hurdles" problem.
 
Yup, I actually think both 5775c and 6700K are a better buy than 7700K.

I think I've said this before, but Poland seems to be the only place where Broadwell a) exists and b) wasn't/isn't €100+ more than an i7 K. In the vast majority of the market they were non-existent, and getting an LGA1150 platform these days is just dumb, unless you get a good deal on an old system and somehow get hold of a 5775c for less than the €200-ish the i7 K's usually go for these days.
 
Got a similar diagram for a Pascal-generation CUDA core? Everything I'm finding is overly simplified.
Yeah, I suppose all you'll find are simplified diagrams with all the units stacked and no interconnect arrows:
(attached: simplified NVIDIA Pascal SM diagram)
No actual diagram for Pascal, but here is a similar setup in a Maxwell GPU, with all the arrows:
(attached: Maxwell SM block diagram)
and the CUDA core itself hasn't changed much since Fermi, AFAIK:
(attached: CUDA core diagram)

As you can see, NVIDIA doesn't have fetch/decode/dispatch machinery around every 1 scalar + 4 SIMD units ... they have it around 32 versatile SIMD/scalar cores.

There is another benefit to having a small GCN compute unit besides async, and that's better yields when salvaging dies for lesser SKUs. When the silicon inside an NVIDIA SM is bad, the whole SM goes away (unless it's a GTX 970, as we all know :laugh:)
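The granularity difference is easy to put in numbers (public die configurations; the "lanes per front end" framing is just my shorthand):

[CODE]
# GCN: each CU (64 ALU lanes) carries its own fetch/decode front end and is
# the unit of salvage. On Pascal GP104, the 128-core SM is that unit.
gcn_cu_lanes, pascal_sm_lanes = 64, 128

# Salvage examples: Vega 56 disables 8 of 64 CUs; GTX 1070 disables 5 of 20 SMs.
print(64 * gcn_cu_lanes, "->", 56 * gcn_cu_lanes)        # 4096 -> 3584 shaders
print(20 * pascal_sm_lanes, "->", 15 * pascal_sm_lanes)  # 2560 -> 1920 cores
[/CODE]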
 
Yes, it's what surprised me earlier too, as I was saying. You can read Raja's tweets and see his surprise at others finding a much better perf/watt delta for HIS OWN products. It's ridiculous; it speaks volumes about the level of dedication they have over at RTG. This is not an R&D problem, it's a company culture and people problem. It also echoes everything we've seen for years now with GCN: bad decision making, bad marketing, overselling and misrepresenting products, and bad optimization all over the place. It also echoes AMD's eternal "more hardware to brute-force software hurdles" problem.
I'll give them 'til Navi to see, as it seems Navi is RTG's Ryzen. Vega is an awesome card. I was able to get 2, both at the $500 price tag. It is smooth on my non-FreeSync panel. I'm currently undergoing a system overhaul on both my PCs and can't bench or test anything, which sucks. But in due time I think Vega will be at least 10% faster, give or take 3 to 6 months. Sadly, AMD keeps repeating this cycle; I thought they learned from the Polaris release and even more so from Ryzen. We all knew something was up when Vega was supposed to be released right after TR but reviewers got the cards literally 3 days before release date. Then being told to focus on the 56 and not the 64 being released first. Shameful on AMD's part. I just hope they get their act together sooner rather than later. I (we all) need more/better competition. I'll rock out with Vega until Navi shows itself, though. Another sad part is that, because of AMD's marketing or lack thereof, even when they have the better option we as consumers don't support them. To some degree I feel us armchair commanders are partly to blame for this.

The trend with Ryzen is fundamentally breaking that mold, but even then I think people are more tired of Intel's games than drawn by the actual appeal of Ryzen. Odd way to look at things, and maybe even small-minded of me, but I can't not think about the Athlon era of CPUs. AMD clearly had the better product, yet people willingly bought Intel. Same goes for the 5870/5970 era of GPUs. They were the best at just about every level, yet consumers still bought NVIDIA.
 
The design mistake with Vega is making the cores even more complex than Fiji, which requires higher voltage to operate, and giving them a longer pipeline, which makes them less efficient in dynamic workloads such as gaming. This is also why Vega has lower "IPC" than Fiji.
Vega has about 10% higher IPC than Fiji, with some 50% higher clock speed to boot.

Yeah, I suppose all you'll find are simplified diagrams with all the units stacked and no interconnect arrows:
(attached: simplified NVIDIA Pascal SM diagram)
No actual diagram for Pascal, but here is a similar setup in a Maxwell GPU, with all the arrows:
(attached: Maxwell SM block diagram)
and the CUDA core itself hasn't changed much since Fermi, AFAIK:
(attached: CUDA core diagram)
As you can see, NVIDIA doesn't have fetch/decode/dispatch machinery around every 1 scalar + 4 SIMD units ... they have it around 32 versatile SIMD/scalar cores.
But NVIDIA doesn't really give us any details about what's inside the cores except the obvious. In one clock, Vega's 64 CUs can theoretically issue 64 scalar and 4,096 SIMD operations. That's not a weakness.
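For what it's worth, those units add up (public specs; FMA counted as two FLOPs per lane per clock):

[CODE]
# Vega 64: 64 CUs x 4 SIMD16 units = 4096 ALU lanes
lanes, boost_ghz = 64 * 64, 1.546  # reference boost clock
print(f"{lanes * 2 * boost_ghz / 1000:.2f} TFLOPS")  # ~12.66, matching AMD's quoted figure
[/CODE]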
 