Thursday, April 23rd 2020

AMD's Next-Generation Radeon Instinct "Arcturus" Test Board Features 120 CUs

AMD is preparing to launch its next generation of Radeon Instinct GPUs, based on the new CDNA architecture designed for enterprise deployments. Thanks to the popular hardware leaker _rogame (@_rogame), we have some information about the configuration of the upcoming Radeon Instinct MI100 "Arcturus" server GPU. Previously, we obtained the BIOS of the Arcturus GPU, which showed a configuration of 128 Compute Units (CUs), resulting in 8,192 CDNA cores. That configuration specified a 1334 MHz GPU clock, a 1091 MHz SoC frequency, and a 1000 MHz memory speed. However, another GPU test board has now been spotted with a slightly different specification.

The reported configuration is an Arcturus GPU with 120 CUs, resulting in a CDNA core count of 7,680. The board runs at 878 MHz for the core clock, 750 MHz for the SoC clock, and a surprising 1200 MHz for the memory clock. While the core and SoC clocks, along with the CU count, are lower than in the previous report, the memory clock is up by 200 MHz. It is important to note that this is just a test board/variation of the MI100, and final frequencies are likely to differ.
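
The core counts quoted in both reports follow from the CU counts if each CU carries 64 cores, a ratio implied by the reported figures (8,192 / 128 = 7,680 / 120 = 64). A minimal Python sketch of that arithmetic follows; `CORES_PER_CU` and the helper name are illustrative assumptions, not taken from AMD documentation.

```python
# Cores per CU implied by the reported figures (8192 / 128 = 64).
# This is an assumption derived from the leak, not an official AMD spec.
CORES_PER_CU = 64


def core_count(compute_units: int, cores_per_cu: int = CORES_PER_CU) -> int:
    """Return the total core count for a given number of Compute Units."""
    return compute_units * cores_per_cu


bios_config = core_count(128)  # configuration from the earlier BIOS dump
test_board = core_count(120)   # configuration of the newly spotted test board

print(f"128 CUs -> {bios_config} cores")
print(f"120 CUs -> {test_board} cores")
```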
Source: @_rogame (Twitter)

23 Comments on AMD's Next-Generation Radeon Instinct "Arcturus" Test Board Features 120 CUs

#1
Imsochobo
lynx29
I just hope the drivers get more polished... I'm still leaning toward Nvidia for the GPU, until YouTubers like GamersNexus say otherwise and that drivers have improved tenfold and are fully stable / equivalent to Nvidia.
this wont be in your hand, no problems :P
this is a datacenter only card.
Posted on Reply
#2
ARF
Imsochobo
this wont be in your hand, no problems :p
this is a datacenter only card.
Yes, Navi 2X will be the gaming-centric lineup, while Arcturus is for High Performance Computing (HPC) only.
lynx29
I just hope the drivers get more polished... I'm still leaning toward Nvidia for the GPU, until YouTubers like GamersNexus say otherwise and that drivers have improved tenfold and are fully stable / equivalent to Nvidia.
The drivers are fine.
Posted on Reply
#3
Cheeseball
Not a Potato
ARF
The drivers are fine.
Adrenalin 2020 is not fine yet. RTG still has a lot to improve on their end.
Posted on Reply
#4
AnarchoPrimitiv
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
Posted on Reply
#5
ARF
Cheeseball
Adrenalin 2020 is not fine yet. RTG still has a lot to improve on their end.
What do you mean? What problems do you have and have you reported them via the support centre?
Posted on Reply
#6
windwhirl
ARF
What do you mean? What problems do you have and have you reported them via the support centre?
Probably the somewhat large list of known issues:
Posted on Reply
#7
Shatun_Bear
AnarchoPrimitiv
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
This has so many CUs as they are more important for these types of card, and is where CDNA will differ from RDNA I suspect.

Higher clockspeeds with fewer CUs is the route they will take (up to 64) as they will also add accelerator engines inside RDNA2 GPUs at the expense of CUs like Nvidia did with Tensor Cores. Fixed function accelerators are more important to performance rather than just more CUs after a certain point.
Posted on Reply
#8
midnightoil
windwhirl
Probably the somewhat large list of known issues:

Ever visited the NVIDIA forums? There are vast numbers of issues.

The 2020 drivers were initially terrible. They're very stable now, though the Radeon Software control panel still isn't 100% - why they feel the need to remove or rejig half its content with each Adrenalin release, then re-add it, I will never understand.
Imsochobo
this wont be in your hand, no problems :p
this is a datacenter only card.
You could buy one. But it probably won't be much good to you. It's not like Radeon VII or Titan Volta .. there's no raster engine. These are dedicated HPC / ML cards. Not rendering / graphics acceleration.

It will be interesting to see what form future Fire or Pro cards take from AMD.
AnarchoPrimitiv
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
RDNA1 could have had more. This was confirmed almost a year ago at Computex. Changes in RDNA (& now CDNA) meant that configurations above 64 CUs would no longer suffer severe bottlenecks.
Posted on Reply
#9
xkm1948
For GPU compute RTG has a tough battle to fight as Nvidia has already entrenched itself deep in the ML/DL/AI market. RTG need better software support, way better than the shit they have been using for the past several years. OpenCL for RTG GPU is zombie at best. Vulkan compute has yet to see any real momentum.

A good ecosystem is HW+SW. Get your shit together RTG on the software.
Posted on Reply
#10
midnightoil
xkm1948
For GPU compute RTG has a tough battle to fight as Nvidia has already entrenched itself deep in the ML/DL/AI market. RTG need better software support, way better than the shit they have been using for the past several years. OpenCL for RTG GPU is zombie at best. Vulkan compute has yet to see any real momentum.

A good ecosystem is HW+SW. Get your shit together RTG on the software.
If it were as bad as you make out, I don't think they'd be winning the huge contracts that they are doing ...
Posted on Reply
#11
ARF
windwhirl
Probably the somewhat large list of known issues:

Almost all contain the word "may" which translated to you means that it either will happen or most likely won't happen.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report about.
Posted on Reply
#12
WeeRab
windwhirl
Probably the somewhat large list of known issues:

i've been running an RX5700 for a couple of months now.
Zero issues. Best bang-for-buck at the moment.

My previous card was a GTX960 which was a bit flaky at first - but went on to give faultless service for 9 years.
I went on the AMD forums to see any potential problems before I bought the RX5700, and most of the problems were down to Windows 10 silently updating the drivers
Or plain stupidity and user error.
Posted on Reply
#13
windwhirl
ARF
Almost all contain the word "may" which translated to you means that it either will happen or most likely won't happen.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report about.
WeeRab
i've been running an RX5700 for a couple of months now.
Zero issues. Best bang-for-buck at the moment.

My previous card was a GTX960 which was a bit flaky at first - but went on to give faultless service for 9 years.
I went on the AMD forums to see any potential problems before I bought the RX5700, and most of the problems were down to Windows 10 silently updating the drivers
Or plain stupidity and user error.
I was referencing mostly Cheeseball's post. Honestly, in spite of running a preview build of Windows 10 and using the Radeon beta drivers, I haven't run into problems, granted, it's an RX 580, so I guess it's mostly polished by now, but so far it's been rather solid (or I simply don't fit into the scenarios where problems happen)
Posted on Reply
#14
Valantar
Cheeseball
Adrenalin 2020 is not fine yet. RTG still has a lot to improve on their end.
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.
AnarchoPrimitiv
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
The 64CU hard architectural limit of GCN disappeared with the launch of RDNA, they simply haven't made a large enough GPU to demonstrate that yet. The top end RDNA 2 GPU will undoubtedly have more than 64 CUs.
Shatun_Bear
This has so many CUs as they are more important for these types of card, and is where CDNA will differ from RDNA I suspect.

Higher clockspeeds with fewer CUs is the route they will take (up to 64) as they will also add accelerator engines inside RDNA2 GPUs at the expense of CUs like Nvidia did with Tensor Cores. Fixed function accelerators are more important to performance rather than just more CUs after a certain point.
That is pure nonsense. The 64 CU limit was the main reason why Vega lagged so far behind Nvidia in both absolute performance and efficiency - while Nvidia was consistently increasing core counts per generation, AMD couldn't and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost. If AMD could have made an 80 CU Vega card, it would have been much more competitive than the Vega 64 and 56 as they could have run it at much more efficient clocks while still performing better. While RDNA of course has increased per-CU performance significantly, there's no way 64 CUs with a clock bump will allow them to compete in the high end in the future. Going wider and keeping clocks reasonable is by far the superior way to create the most powerful GPU you can within a reasonable thermal envelope.

It sounds like you've bought into Mark Cerny's stupid "faster clocks outperforms more CUs!" marketing nonsense - a statement disproved by any OC GPU benchmark (performance doesn't even increase linearly with clocks, while Cerny's statement would need performance to increase by more than clock speeds to be true). Look at how an 80W 2080 Max-Q performs compared to a 2060 (not max-Q) mobile - the 2080 is way faster at much lower clocks.
Posted on Reply
#15
claylomax
WeeRab
My previous card was a GTX960 which was a bit flaky at first - but went on to give faultless service for 9 years.
Wasn't the GTX 960 released in 2015?
Posted on Reply
#16
M2B
Valantar
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.

The 64CU hard architectural limit of GCN disappeared with the launch of RDNA, they simply haven't made a large enough GPU to demonstrate that yet. The top end RDNA 2 GPU will undoubtedly have more than 64 CUs.

That is pure nonsense. The 64 CU limit was the main reason why Vega lagged so far behind Nvidia in both absolute performance and efficiency - while Nvidia was consistently increasing core counts per generation, AMD couldn't and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost. If AMD could have made an 80 CU Vega card, it would have been much more competitive than the Vega 64 and 56 as they could have run it at much more efficient clocks while still performing better. While RDNA of course has increased per-CU performance significantly, there's no way 64 CUs with a clock bump will allow them to compete in the high end in the future. Going wider and keeping clocks reasonable is by far the superior way to create the most powerful GPU you can within a reasonable thermal envelope.

It sounds like you've bought into Mark Cerny's stupid "faster clocks outperforms more CUs!" marketing nonsense - a statement disproved by any OC GPU benchmark (performance doesn't even increase linearly with clocks, while Cerny's statement would need performance to increase by more than clock speeds to be true). Look at how an 80W 2080 Max-Q performs compared to a 2060 (not max-Q) mobile - the 2080 is way faster at much lower clocks.
That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14nm) GPU at reasonable power levels you would probably have to clock it in the 1.1 to 1.15GHz range, which is just stupid and unbalanced for a gaming GPU, and the worst part is that instead of a ~480 mm² die you have a much more expensive to make ~600 mm² die. (and at the end of the day all you got was maybe 5-10% better gaming performance)
CU count was not the reason why Vega lagged so much behind Nvidia, it was purely down to an efficiency deficit.
Just look at how good Radeon VII with fewer shaders performs in comparison to the Vega 64, that extra memory bandwidth helps of course but the extra 300MHz higher clock is the primary reason.
Posted on Reply
#17
Cheeseball
Not a Potato
Valantar
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.
This is in regards to the newer Navi cards. I can attest that the drivers work for my RX 5700 XT now but as you can see in various forums there are still various reports of TDR/blackscreen issues that AMD is doing their best to address. The older Polaris and Vega cards should be doing fine now but I have also seen some recent reports that say otherwise.

I've experienced the same issues when I had a HD 7870 "XT" (1536 shaders) Tahiti LE back in 2013. They eventually got better over time (addressing bugs) and had further improvements in some games ("Fine Wine").
Posted on Reply
#18
windwhirl
claylomax
Wasn't the GTX 960 released in 2015?
Time-traveling cards? No wonder Nvidia has been the top brand :laugh:
Posted on Reply
#19
Cheeseball
Not a Potato
ARF
What do you mean? What problems do you have and have you reported them via the support centre?
I had problems with my RX 5700 XT last year, and with help from @INSTG8R and other Vanguard members I was able to provide information to AMD's development team directly, which may have contributed to the TDR/blackscreen fixes in 19.10.x onwards.

I have no problems with the earlier 20.x releases, except for Enhanced Sync which AMD keeps messing up for some reason. However I've foregone the mainline release for the more stable Radeon Pro drivers (20.Q1.2) since I don't have to deal with any of the extra bloat that Adrenalin installs (which I've been reporting to keep separate during install).

Like I said, Adrenalin 2020 is not fine yet. Since you're running on GCN5 hardware, you shouldn't be experiencing any major issues compared to some of the Navi owners.
ARF
Almost all contain the word "may" which translated to you means that it either will happen or most likely won't happen.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report about.
Are you talking about me? Please review my System Specs if you're in doubt. I even have a "unique" configuration.
Posted on Reply
#20
ARF
M2B
That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14nm) GPU at reasonable power levels you would probably have to clock it in the 1.1 to 1.15GHz range, which is just stupid and unbalanced for a gaming GPU, and the worst part is that instead of a ~480 mm² die you have a much more expensive to make ~600 mm² die. (and at the end of the day all you got was maybe 5-10% better gaming performance)
CU count was not the reason why Vega lagged so much behind Nvidia, it was purely down to an efficiency deficit.
Just look at how good Radeon VII with fewer shaders performs in comparison to the Vega 64, that extra memory bandwidth helps of course but the extra 300MHz higher clock is the primary reason.
Vega 64 is bad because its shaders are not fed properly. You have 40-50% of its shaders not receiving any work to do and thus sit idle.
This is partly because of its design, which has many compromises - it was designed for higher throughput, which is good for pure number crunching in high performance computing loads, but games unfortunately don't care too much about it.
Posted on Reply
#21
Valantar
M2B
That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14nm) GPU at reasonable power levels you would probably have to clock it in the 1.1 to 1.15GHz range, which is just stupid and unbalanced for a gaming GPU, and the worst part is that instead of a ~480 mm² die you have a much more expensive to make ~600 mm² die. (and at the end of the day all you got was maybe 5-10% better gaming performance)
Just look at how good Radeon VII with fewer shaders performs in comparison to the Vega 64, that extra memory bandwidth helps of course but the extra 300MHz higher clock is the primary reason.
Please explain how lower clocks make a GPU "unbalanced" for gaming. Because physics and real performance data significantly disagree with you. If that was indeed the case, how does an RTX 2080 Max-Q (with very low clocks) at 80W perform ~50% faster than an RTX 2060 (mobile, non Max-Q, with higher clocks) at the same power draw? Again, you seem to be presenting thinking related to that baseless and false Mark Cerny argument. Wide and slow is nearly always more performant than narrow and fast in the GPU space.
M2B
CU count was not the reason why Vega lagged so much behind Nvidia, it was purely down to an efficiency deficit.
Please explain how the "efficiency deficit" is somehow unrelated to clock being pushed far past the ideal operating range of this architecture on the node in question - because if you're disagreeing with me (which it seems you are), that must somehow be the case. After all, I did say
Valantar
AMD couldn't [increase the number of CUs] and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost.
This is directly due to the hard CU limit in GCN, and due to voltage and power draw increasing nonlinearly alongside frequency, even small clock speed drops can lead to dramatic improvements in efficiency. A cursory search at people experimenting with downclocking and undervolting their Vega cards will show you that these cards can be much, much more efficient even with minor clock speed drops. Of course one could argue that undervolting results are unrealistic when speaking of a theoretical mass produced card, but the counterargument then is that the existing cards are effectively factory overvolted, as the chips are pushed so high up their clock/voltage curve that a lot of extra voltage is needed to ensure sufficient yields at those clocks, as a significant portion of GPUs would otherwise fail to reach the necessary speeds. Dropping clocks even by 200MHz would allow for quite dramatic voltage drops. 200MHz would be a drop of about 13%, but would likely lead to a power drop of ~25-30%. Which would allow for ~25-30% more CUs within the same power budget (if that was architecturally possible), which would then increase performance far beyond what is lost from the clock speed drop. That is how you make the best performing GPU within a given power budget: by making it as wide as possible (while ensuring it has the memory to keep it fed) while keeping clock speeds around peak efficiency levels.
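
The trade-off described above can be put into rough numbers. The sketch below assumes that dynamic power scales with CU count × clock × voltage² and that throughput scales linearly with CU count × clock; both are simplifications. The 1,540 MHz baseline and the function names are hypothetical, chosen only so that a 200 MHz drop works out to about 13%.

```python
def relative_power(cu_ratio: float, clock_ratio: float, voltage_ratio: float) -> float:
    """Dynamic power relative to baseline, assuming P ~ CUs * f * V^2."""
    return cu_ratio * clock_ratio * voltage_ratio ** 2


def relative_throughput(cu_ratio: float, clock_ratio: float) -> float:
    """Ideal throughput relative to baseline, assuming linear scaling with CUs and clock."""
    return cu_ratio * clock_ratio


base_clock_mhz = 1540.0  # hypothetical baseline, so that 200 MHz is ~13%
clock_ratio = (base_clock_mhz - 200.0) / base_clock_mhz

# Midpoint of the ~25-30% power drop claimed for the lower clock.
power_drop = 0.275

# Voltage ratio that the claimed power drop would imply under the model above.
voltage_ratio = ((1.0 - power_drop) / clock_ratio) ** 0.5

# Spend the saved power on ~25-30% more CUs (midpoint again).
cu_ratio = 1.0 + 0.275

power = relative_power(cu_ratio, clock_ratio, voltage_ratio)
throughput = relative_throughput(cu_ratio, clock_ratio)

print(f"clock ratio:      {clock_ratio:.3f}")
print(f"implied V ratio:  {voltage_ratio:.3f}")
print(f"power vs. base:   {power:.3f}")
print(f"throughput ratio: {throughput:.3f}")
```

Under these assumptions the wider, slower configuration comes out roughly 11% ahead in theoretical throughput while staying under the original power budget. Real scaling with CU count is less than linear, so this is an upper bound rather than a prediction.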

Now this is of course not to say that Vega doesn't also have an architectural efficiency disadvantage to Pascal and Turing - it definitely does - but pushing it way past its efficiency sweet spot just compounded this issue, making it far worse than it might have been.

And of course this would also lead to increased die sizes - but then they would have some actual performance gains when moving to a new node, rather than the outright flop that was the Radeon VII. Now that GPU was never meant for gaming at all, but it nonetheless performs terribly for what it is - a full node shrink and then some over its predecessor. Why? Again, because the architecture didn't allow them to build a wider die, meaning that the only way of increasing performance was pushing clocks as high as they could. Now, the VII has 60 CUs and not 64, that is true, but that is solely down to it being a short-term niche card made for salvaging faulty chips, with all fully enabled dice going to the datacenter/HPC market where this GPU actually had some qualities.

If AMD could have moved past 64 CUs with Vega, that lineup would have been much more competitive in terms of absolute performance. It wouldn't have been cheap, but it would have been better than what we got - and we could have gotten a proper full lineup rather than the two GPUs AMD ended up making. Luckily it looks like this is happening with RDNA 2 now that the 64 CU limit is gone.

So, tl;dr: the 64 CU architectural limit of GCN has been a major problem for AMD GPUs ever since they maxed it out back in 2015 - it left them with no way forward outside of sacrificing efficiency at every turn.
Posted on Reply
#22
M2B
Valantar
Please explain how lower clocks make a GPU "unbalanced" for gaming. Because physics and real performance data significantly disagree with you. If that was indeed the case, how does an RTX 2080 Max-Q (with very low clocks) at 80W perform ~50% faster than an RTX 2060 (mobile, non Max-Q, with higher clocks) at the same power draw? Again, you seem to be presenting thinking related to that baseless and false Mark Cerny argument. Wide and slow is nearly always more performant than narrow and fast in the GPU space.

Please explain how the "efficiency deficit" is somehow unrelated to clock being pushed far past the ideal operating range of this architecture on the node in question - because if you're disagreeing with me (which it seems you are), that must somehow be the case. After all, I did say

This is directly due to the hard CU limit in GCN, and due to voltage and power draw increasing nonlinearly alongside frequency, even small clock speed drops can lead to dramatic improvements in efficiency. A cursory search at people experimenting with downclocking and undervolting their Vega cards will show you that these cards can be much, much more efficient even with minor clock speed drops. Of course one could argue that undervolting results are unrealistic when speaking of a theoretical mass produced card, but the counterargument then is that the existing cards are effectively factory overvolted, as the chips are pushed so high up their clock/voltage curve that a lot of extra voltage is needed to ensure sufficient yields at those clocks, as a significant portion of GPUs would otherwise fail to reach the necessary speeds. Dropping clocks even by 200MHz would allow for quite dramatic voltage drops. 200MHz would be a drop of about 13%, but would likely lead to a power drop of ~25-30%. Which would allow for ~25-30% more CUs within the same power budget (if that was architecturally possible), which would then increase performance far beyond what is lost from the clock speed drop. That is how you make the best performing GPU within a given power budget: by making it as wide as possible (while ensuring it has the memory to keep it fed) while keeping clock speeds around peak efficiency levels.

Now this is of course not to say that Vega doesn't also have an architectural efficiency disadvantage to Pascal and Turing - it definitely does - but pushing it way past its efficiency sweet spot just compounded this issue, making it far worse than it might have been.

And of course this would also lead to increased die sizes - but then they would have some actual performance gains when moving to a new node, rather than the outright flop that was the Radeon VII. Now that GPU was never meant for gaming at all, but it nonetheless performs terribly for what it is - a full node shrink and then some over its predecessor. Why? Again, because the architecture didn't allow them to build a wider die, meaning that the only way of increasing performance was pushing clocks as high as they could. Now, the VII has 60 CUs and not 64, that is true, but that is solely down to it being a short-term niche card made for salvaging faulty chips, with all fully enabled dice going to the datacenter/HPC market where this GPU actually had some qualities.

If AMD could have moved past 64 CUs with Vega, that lineup would have been much more competitive in terms of absolute performance. It wouldn't have been cheap, but it would have been better than what we got - and we could have gotten a proper full lineup rather than the two GPUs AMD ended up making. Luckily it looks like this is happening with RDNA 2 now that the 64 CU limit is gone.

So, tl;dr: the 64 CU architectural limit of GCN has been a major problem for AMD GPUs ever since they maxed it out back in 2015 - it left them with no way forward outside of sacrificing efficiency at every turn.
First of all, AMD themselves stated that there was no such thing as a 64 CU architectural limit on GCN. Secondly, if there was indeed such a limitation, AMD could have solved it in so many years.
My comment on performance scaling with higher clocks doesn't have anything to do with what Cerny says.

You clearly don't properly understand how GPUs work.
And no, The 80W 2080 Max-Q is nowhere near 50% faster than a Non-MaxQ mobile 2060, where are you getting that from? even if true, a big part of that efficiency difference could be due to binning and using higher quality chips for the higher-end GPU.

Just take a look at how an RX 5700 performs in comparison to the 5700 XT at similar clocks: the 5700 XT ends up being 6% faster while having 11% more shaders.
Another good example is how the 2080 Ti with 41% more shaders performs compared to the 2080 Super: only 20% faster at 4K.
Performance scaling with higher clocks has always been and will be more linear than performance scaling with more shaders in gaming-like workloads. If not, the GTX 1070 with a massive 14CU deficit couldn't match or beat the 980 Ti with a similar architecture.
Just do the math:
1070 => 1920 * 1800 MHz * 2 = 6.9 TFLOPS
980 Ti => 2816 * 1250 MHz * 2 = 7 TFLOPS
Yet the 1070 performs around 12% better (when comparing reference vs reference, obviously the 980Ti has more OC headroom)
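
The peak-throughput figures above come from shaders × clock × 2, counting a fused multiply-add as two floating-point operations per shader per clock. A small Python sketch of that calculation, together with the shader-scaling ratios from the 5700 XT and 2080 Ti examples, is given here. It uses only the numbers quoted in the post; the function names are illustrative.

```python
def peak_tflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak FP32 throughput in TFLOPS (one FMA = 2 FLOPs per shader per clock)."""
    return shaders * clock_mhz * 1e6 * flops_per_clock / 1e12


def scaling_efficiency(perf_gain_pct: float, shader_gain_pct: float) -> float:
    """Fraction of the extra shaders that shows up as extra performance."""
    return perf_gain_pct / shader_gain_pct


gtx_1070 = peak_tflops(1920, 1800)    # shader count and clock as quoted in the post
gtx_980_ti = peak_tflops(2816, 1250)

print(f"GTX 1070:   {gtx_1070:.2f} TFLOPS")
print(f"GTX 980 Ti: {gtx_980_ti:.2f} TFLOPS")

# Performance gain vs. shader gain, using the percentages quoted in the post.
print(f"5700 XT vs 5700:       {scaling_efficiency(6, 11):.2f}")
print(f"2080 Ti vs 2080 Super: {scaling_efficiency(20, 41):.2f}")
```

A ratio of 1.0 would mean performance scaled perfectly with shader count; both quoted examples land at roughly half of that.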


Posted on Reply
#23
Valantar
M2B
First of all, AMD themselves stated that there was no such thing as a 64 CU architectural limit on GCN. Secondly, if there was indeed such a limitation, AMD could have solved it in so many years.
My comment on performance scaling with higher clocks doesn't have anything to do with what Cerny says.

You clearly don't properly understand how GPUs work.
And no, The 80W 2080 Max-Q is nowhere near 50% faster than a Non-MaxQ mobile 2060, where are you getting that from? even if true, a big part of that efficiency difference could be due to binning and using higher quality chips for the higher-end GPU.

Just take a look at how an RX 5700 performs in comparison to the 5700 XT at similar clocks: the 5700 XT ends up being 6% faster while having 11% more shaders.
Another good example is how the 2080 Ti with 41% more shaders performs compared to the 2080 Super: only 20% faster at 4K.
Performance scaling with higher clocks has always been and will be more linear than performance scaling with more shaders in gaming-like workloads. If not, the GTX 1070 with a massive 14CU deficit couldn't match or beat the 980 Ti with a similar architecture.
Just do the math:
1070 => 1920 * 1800 MHz * 2 = 6.9 TFLOPS
980 Ti => 2816 * 1250 MHz * 2 = 7 TFLOPS
Yet the 1070 performs around 12% better (when comparing reference vs reference, obviously the 980Ti has more OC headroom)



You're right about that 50% number - I mixed up the numbers for the 2060 Max-Q and the normal mobile 2060 - too many open tabs at once I guess. The 2060 MQ also seems to run abnormally slow compared to other MQ models. The normal 2060 is not that far behind an 80W 2080 MQ - about 10%. Still, if higher clocks improved performance more than more CUDA cores, then with both at 80W the 2060 ought to outperform the 2080 Max-Q, which it doesn't - it's noticeably behind. After all, in that comparison you have two GPUs with the same architecture, with the smaller GPU having more memory bandwidth per shader and higher clocks, so if clock speeds improved performance more than a wider GPU layout, the smaller GPU would be faster. Binning of course has some effect on this, but not to the tune of explaining away a performance difference of that size.

And while I never said that scaling with more shaders is even close to linear, increased shader counts is responsible for the majority of GPU performance uplift over the past decade - far more than clock speeds, which have increased by less than 3x while shader counts have increased by ~8x and total GPU performance by >5x. Any GPU OC exercise will show that performance scaling with clock speed increases is far below linear - often to the tune of half or less than half in terms of perf % increase vs. clock % increase even when also OC'ing memory. The best balance for increasing performance generation over generation is obviously a combination of both, but in the case of Vega AMD couldn't do that, and instead only pushed clocks higher. The lack of shader count increases forced them to push clocks far past the efficiency sweet spot of that arch+node combo, tanking efficiency in an effort to maximize absolute performance - they had nowhere else to go and a competitor with a significant lead in absolute performance, after all.

The same happened again with the VII; clocks were pushed far beyond the efficiency sweet spot as they couldn't increase the CU count (at this point we were looking at a 331 mm² die, so they could easily have added 10-20 CUs if they had the ability and stayed within a reasonable die size). Now, the VII has 60 CUs active and not 64, but that is down to nothing more than this GPU being a PR move with likely zero margins utilizing salvaged dice, with all fully enabled Vega 20 dice going to compute accelerators where there was actual money to be made. On the other hand, comparing it with the Vega 56, you have 9.3% more shaders, ~20% higher clock speeds (base - the boost speed difference is larger) and >2x the memory bandwidth, yet it only delivers ~33% more performance. For a severely memory limited arch like Vega, that is a rather poor showing. And again, at the same wattage they could undoubtedly have increased performance more with some more shaders running at a lower speed.

As for AMD saying there was no hard architectural limit of 64 CUs: source, please? Not increasing shader counts at all across three generations and three production nodes while the competition increases theirs by 55% is proof enough that AMD couldn't increase theirs without moving away from GCN.
Posted on Reply