
AMD's Next-Generation Radeon Instinct "Arcturus" Test Board Features 120 CUs

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,224 (0.91/day)
AMD is preparing to launch its next generation of Radeon Instinct GPUs based on the new CDNA architecture designed for enterprise deployments. Thanks to the popular hardware leaker _rogame (@_rogame), we have some information about the configuration of the upcoming Radeon Instinct MI100 "Arcturus" server GPU. Previously, we obtained a BIOS of the Arcturus GPU that showed a configuration of 128 Compute Units (CUs), which works out to 8,192 CDNA cores. That configuration listed a 1334 MHz GPU clock, a 1091 MHz SoC frequency, and a 1000 MHz memory speed. However, another GPU test board has been spotted with a slightly different specification.

The reported configuration is an Arcturus GPU with 120 CUs, which corresponds to 7,680 CDNA cores. This board runs an 878 MHz core clock, a 750 MHz SoC clock, and a surprising 1200 MHz memory clock. While the core and SoC clocks, along with the CU count, are lower than in the previous report, the memory clock is up by 200 MHz. It is important to note that this is just a test board/variation of the MI100, and the final frequencies will likely differ.
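For readers who want to sanity-check the math, below is a minimal sketch that turns a CU count and core clock into a shader count and theoretical peak FP32 throughput. It assumes CDNA keeps GCN's 64 stream processors per CU and two FP32 operations per clock (FMA); those constants are assumptions for illustration, not confirmed MI100 specifications.

```python
# Back-of-the-envelope sketch (not official AMD math): derive stream-processor count
# and theoretical peak FP32 throughput from a CU count and core clock, assuming
# 64 stream processors per CU and 2 FLOPs per clock per processor (one FMA).

def cdna_fp32_tflops(compute_units: int, core_clock_mhz: float,
                     sp_per_cu: int = 64, flops_per_clock: int = 2) -> float:
    """Return theoretical peak FP32 TFLOPS for the given configuration."""
    stream_processors = compute_units * sp_per_cu
    return stream_processors * core_clock_mhz * 1e6 * flops_per_clock / 1e12

if __name__ == "__main__":
    # The two leaked configurations discussed above.
    for cus, clock_mhz in [(128, 1334), (120, 878)]:
        cores = cus * 64
        tflops = cdna_fp32_tflops(cus, clock_mhz)
        print(f"{cus} CUs -> {cores} cores, ~{tflops:.1f} TFLOPS FP32 at {clock_mhz} MHz")
```

On those assumptions, the 128 CU/1334 MHz BIOS configuration would be good for roughly 21.9 TFLOPS of FP32 and the 120 CU/878 MHz test board for about 13.5 TFLOPS, though final shipping clocks will of course change the picture.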


View at TechPowerUp Main Site
 
Joined
Feb 19, 2009
Messages
1,151 (0.21/day)
Location
I live in Norway
Processor R9 5800x3d | R7 3900X | 4800H | 2x Xeon gold 6142
Motherboard Asrock X570M | AB350M Pro 4 | Asus Tuf A15
Cooling Air | Air | duh laptop
Memory 64gb G.skill SniperX @3600 CL16 | 128gb | 32GB | 192gb
Video Card(s) RTX 4080 |Quadro P5000 | RTX2060M
Storage Many drives
Display(s) M32Q,AOC 27" 144hz something.
Case Jonsbo D41
Power Supply Corsair RM850x
Mouse g502 Lightspeed
Keyboard G913 tkl
Software win11, proxmox
Benchmark Scores 33000FS, 16300 TS. Lappy, 7000 TS.
I just hope the drivers get more polished... I'm still leaning toward Nvidia for the GPU, until YouTubers like GamersNexus say otherwise and that drivers have improved tenfold and are fully stable / equivalent to Nvidia.

this won't be in your hands, no problem :p
this is a datacenter-only card.
 

ARF

Joined
Jan 28, 2020
Messages
3,934 (2.55/day)
Location
Ex-usa
this won't be in your hands, no problem :p
this is a datacenter-only card.

Yes, Navi 2X will be the gaming-centric lineup, while Arcturus is for High Performance Computing (HPC) only.

I just hope the drivers get more polished... I'm still leaning toward Nvidia for the GPU, until YouTubers like GamersNexus say otherwise and that drivers have improved tenfold and are fully stable / equivalent to Nvidia.

The drivers are fine.
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,842 (0.33/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASUS ROG Strix X670E-I Gaming WiFi
Cooling ID-COOLING SE-207-XT Slim Snow
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA)
Storage 2TB Samsung 990 Pro NVMe
Display(s) AOpen Fire Legend 24" (25XV2Q), Dough Spectrum One 27" (Glossy), LG C4 42" (OLED42C4PUA)
Case ASUS Prime AP201 33L White
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight (White), G303 Shroud Edition
Keyboard Wooting 60HE / NuPhy Air75 v2
VR HMD Occulus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.3447
Joined
Nov 6, 2016
Messages
1,574 (0.58/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
 

ARF

Joined
Jan 28, 2020
Messages
3,934 (2.55/day)
Location
Ex-usa
Adrenalin 2020 is not fine yet. RTG still has a lot to improve on their end.

What do you mean? What problems do you have and have you reported them via the support centre?
 
Joined
Dec 16, 2017
Messages
2,730 (1.18/day)
Location
Buenos Aires, Argentina
System Name System V
Processor AMD Ryzen 5 3600
Motherboard Asus Prime X570-P
Cooling Cooler Master Hyper 212 // a bunch of 120 mm Xigmatek 1500 RPM fans (2 ins, 3 outs)
Memory 2x8GB Ballistix Sport LT 3200 MHz (BLS8G4D32AESCK.M8FE) (CL16-18-18-36)
Video Card(s) Gigabyte AORUS Radeon RX 580 8 GB
Storage SHFS37A240G / DT01ACA200 / WD20EZRX / MKNSSDTR256GB-3DL / LG BH16NS40 / ST10000VN0008
Display(s) LG 22MP55 IPS Display
Case NZXT Source 210
Audio Device(s) Logitech G430 Headset
Power Supply Corsair CX650M
Mouse Microsoft Trackball Optical 1.0
Keyboard HP Vectra VE keyboard (Part # D4950-63004)
Software Whatever build of Windows 11 is being served in Dev channel at the time.
Benchmark Scores Corona 1.3: 3120620 r/s Cinebench R20: 3355 FireStrike: 12490 TimeSpy: 4624
What do you mean? What problems do you have and have you reported them via the support centre?

Probably the somewhat large list of known issues:
[Screenshot: Radeon Software known issues list]
 
Joined
Jul 19, 2016
Messages
476 (0.17/day)
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast

This has so many CUs because they are more important for this type of card, and that is where CDNA will differ from RDNA, I suspect.

Higher clock speeds with fewer CUs (up to 64) is the route they will take, as they will also add accelerator engines inside RDNA2 GPUs at the expense of CUs, like Nvidia did with Tensor Cores. Past a certain point, fixed-function accelerators matter more for performance than simply adding more CUs.
 
Joined
Mar 6, 2011
Messages
155 (0.03/day)
Probably the somewhat large list of known issues:
View attachment 152494

Ever visited the NVIDIA forums? There are vast numbers of issues.

The 2020 drivers were initially terrible. They're very stable now, though the Radeon Software control panel still isn't 100% - why they feel the need to remove or rejig half its content with each Adrenalin release, then re-add it, I will never understand.

this won't be in your hands, no problem :p
this is a datacenter-only card.

You could buy one. But it probably won't be much good to you. It's not like the Radeon VII or Titan V... there's no raster engine. These are dedicated HPC / ML cards, not rendering / graphics accelerators.

It will be interesting to see what form future Fire or Pro cards take from AMD.

This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast


RDNA1 could have had more. This was confirmed almost a year ago at Computex. Changes in RDNA (& now CDNA) meant that configurations above 64 CUs would no longer suffer severe bottlenecks.
 
Joined
Mar 18, 2008
Messages
5,717 (0.97/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
For GPU compute, RTG has a tough battle to fight, as Nvidia has already entrenched itself deep in the ML/DL/AI market. RTG needs better software support, way better than the shit they have been using for the past several years. OpenCL on RTG GPUs is a zombie at best, and Vulkan compute has yet to see any real momentum.

A good ecosystem is HW + SW. Get your shit together on the software, RTG.
 
Joined
Mar 6, 2011
Messages
155 (0.03/day)
For GPU compute, RTG has a tough battle to fight, as Nvidia has already entrenched itself deep in the ML/DL/AI market. RTG needs better software support, way better than the shit they have been using for the past several years. OpenCL on RTG GPUs is a zombie at best, and Vulkan compute has yet to see any real momentum.

A good ecosystem is HW + SW. Get your shit together on the software, RTG.

If it were as bad as you make out, I don't think they'd be winning the huge contracts that they are...
 

ARF

Joined
Jan 28, 2020
Messages
3,934 (2.55/day)
Location
Ex-usa
Probably the somewhat large list of known issues:
View attachment 152494


Almost all of them contain the word "may", which, translated for you, means that an issue either will happen or, more likely, won't.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report on.
 
Joined
Mar 9, 2020
Messages
80 (0.05/day)
Probably the somewhat large list of known issues:
View attachment 152494

I've been running an RX 5700 for a couple of months now.
Zero issues. Best bang-for-buck at the moment.

My previous card was a GTX960 which was a bit flaky at first - but went on to give faultless service for 9 years.
I went on the AMD forums to check for potential problems before I bought the RX 5700, and most of the problems were down to Windows 10 silently updating the drivers, or plain stupidity and user error.
 
Joined
Dec 16, 2017
Messages
2,730 (1.18/day)
Location
Buenos Aires, Argentina
System Name System V
Processor AMD Ryzen 5 3600
Motherboard Asus Prime X570-P
Cooling Cooler Master Hyper 212 // a bunch of 120 mm Xigmatek 1500 RPM fans (2 ins, 3 outs)
Memory 2x8GB Ballistix Sport LT 3200 MHz (BLS8G4D32AESCK.M8FE) (CL16-18-18-36)
Video Card(s) Gigabyte AORUS Radeon RX 580 8 GB
Storage SHFS37A240G / DT01ACA200 / WD20EZRX / MKNSSDTR256GB-3DL / LG BH16NS40 / ST10000VN0008
Display(s) LG 22MP55 IPS Display
Case NZXT Source 210
Audio Device(s) Logitech G430 Headset
Power Supply Corsair CX650M
Mouse Microsoft Trackball Optical 1.0
Keyboard HP Vectra VE keyboard (Part # D4950-63004)
Software Whatever build of Windows 11 is being served in Dev channel at the time.
Benchmark Scores Corona 1.3: 3120620 r/s Cinebench R20: 3355 FireStrike: 12490 TimeSpy: 4624
Almost all of them contain the word "may", which, translated for you, means that an issue either will happen or, more likely, won't.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report on.
I've been running an RX 5700 for a couple of months now.
Zero issues. Best bang-for-buck at the moment.

My previous card was a GTX960 which was a bit flaky at first - but went on to give faultless service for 9 years.
I went on the AMD forums to check for potential problems before I bought the RX 5700, and most of the problems were down to Windows 10 silently updating the drivers, or plain stupidity and user error.

I was mostly referencing Cheeseball's post. Honestly, despite running a preview build of Windows 10 and using the Radeon beta drivers, I haven't run into problems. Granted, it's an RX 580, so I guess support is mostly polished by now, but so far it's been rather solid (or I simply don't hit the scenarios where problems happen).
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Adrenalin 2020 is not fine yet. RTG still has a lot to improve on their end.
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
The 64CU hard architectural limit of GCN disappeared with the launch of RDNA, they simply haven't made a large enough GPU to demonstrate that yet. The top end RDNA 2 GPU will undoubtedly have more than 64 CUs.
This has so many CUs because they are more important for this type of card, and that is where CDNA will differ from RDNA, I suspect.

Higher clock speeds with fewer CUs (up to 64) is the route they will take, as they will also add accelerator engines inside RDNA2 GPUs at the expense of CUs, like Nvidia did with Tensor Cores. Past a certain point, fixed-function accelerators matter more for performance than simply adding more CUs.
That is pure nonsense. The 64 CU limit was the main reason why Vega lagged so far behind Nvidia in both absolute performance and efficiency - while Nvidia was consistently increasing core counts per generation, AMD couldn't and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost. If AMD could have made an 80 CU Vega card, it would have been much more competitive than the Vega 64 and 56 as they could have run it at much more efficient clocks while still performing better. While RDNA of course has increased per-CU performance significantly, there's no way 64 CUs with a clock bump will allow them to compete in the high end in the future. Going wider and keeping clocks reasonable is by far the superior way to create the most powerful GPU you can within a reasonable thermal envelope.

It sounds like you've bought into Mark Cerny's stupid "faster clocks outperforms more CUs!" marketing nonsense - a statement disproved by any OC GPU benchmark (performance doesn't even increase linearly with clocks, while Cerny's statement would need performance to increase by more than clock speeds to be true). Look at how an 80W 2080 Max-Q performs compared to a 2060 (not max-Q) mobile - the 2080 is way faster at much lower clocks.
 
Joined
Apr 10, 2010
Messages
1,831 (0.36/day)
Location
London
System Name Jaspe
Processor Ryzen 1500X
Motherboard Asus ROG Strix X370-F Gaming
Cooling Stock
Memory 16Gb Corsair 3000mhz
Video Card(s) EVGA GTS 450
Storage Crucial M500
Display(s) Philips 1080 24'
Case NZXT
Audio Device(s) Onboard
Power Supply Enermax 425W
Software Windows 10 Pro
My previous card was a GTX960 which was a bit flaky at first - but went on to give faultless service for 9 years.

Wasn't the GTX 960 released in 2015?
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.11/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.

The 64CU hard architectural limit of GCN disappeared with the launch of RDNA, they simply haven't made a large enough GPU to demonstrate that yet. The top end RDNA 2 GPU will undoubtedly have more than 64 CUs.

That is pure nonsense. The 64 CU limit was the main reason why Vega lagged so far behind Nvidia in both absolute performance and efficiency - while Nvidia was consistently increasing core counts per generation, AMD couldn't and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost. If AMD could have made an 80 CU Vega card, it would have been much more competitive than the Vega 64 and 56 as they could have run it at much more efficient clocks while still performing better. While RDNA of course has increased per-CU performance significantly, there's no way 64 CUs with a clock bump will allow them to compete in the high end in the future. Going wider and keeping clocks reasonable is by far the superior way to create the most powerful GPU you can within a reasonable thermal envelope.

It sounds like you've bought into Mark Cerny's stupid "faster clocks outperforms more CUs!" marketing nonsense - a statement disproved by any OC GPU benchmark (performance doesn't even increase linearly with clocks, while Cerny's statement would need performance to increase by more than clock speeds to be true). Look at how an 80W 2080 Max-Q performs compared to a 2060 (not max-Q) mobile - the 2080 is way faster at much lower clocks.

That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14 nm) GPU at reasonable power levels you would probably have to clock it somewhere in the 1.1 to 1.15 GHz range, which is just stupid and unbalanced for a gaming GPU, and the worst part is that instead of a ~480 mm² die you would have a much more expensive ~600 mm² die (and at the end of the day all you would get is maybe 5-10% better gaming performance).
CU count was not the reason why Vega lagged so far behind Nvidia; it was purely down to its efficiency deficit.
Just look at how well the Radeon VII performs with fewer shaders in comparison to the Vega 64; the extra memory bandwidth helps, of course, but the roughly 300 MHz higher clock is the primary reason.
 
Last edited:

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,842 (0.33/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASUS ROG Strix X670E-I Gaming WiFi
Cooling ID-COOLING SE-207-XT Slim Snow
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA)
Storage 2TB Samsung 990 Pro NVMe
Display(s) AOpen Fire Legend 24" (25XV2Q), Dough Spectrum One 27" (Glossy), LG C4 42" (OLED42C4PUA)
Case ASUS Prime AP201 33L White
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight (White), G303 Shroud Edition
Keyboard Wooting 60HE / NuPhy Air75 v2
VR HMD Occulus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.3447
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.

This is in regard to the newer Navi cards. I can attest that the drivers work for my RX 5700 XT now, but as you can see across various forums there are still reports of TDR/black-screen issues that AMD is doing its best to address. The older Polaris and Vega cards should be doing fine now, but I have also seen some recent reports that say otherwise.

I experienced the same issues when I had an HD 7870 "XT" (1536 shaders) Tahiti LE back in 2013. The drivers eventually got better over time (addressing bugs) and brought further improvements in some games ("Fine Wine").
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,842 (0.33/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASUS ROG Strix X670E-I Gaming WiFi
Cooling ID-COOLING SE-207-XT Slim Snow
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA)
Storage 2TB Samsung 990 Pro NVMe
Display(s) AOpen Fire Legend 24" (25XV2Q), Dough Spectrum One 27" (Glossy), LG C4 42" (OLED42C4PUA)
Case ASUS Prime AP201 33L White
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight (White), G303 Shroud Edition
Keyboard Wooting 60HE / NuPhy Air75 v2
VR HMD Occulus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.3447
What do you mean? What problems do you have and have you reported them via the support centre?

I had problems with my RX 5700 XT last year, and with the help of @INSTG8R and other Vanguard members I was able to provide information to AMD's development team directly, which may have contributed to the TDR/black-screen fixes from 19.10.x onwards.

I have no problems with the earlier 20.x releases, except for Enhanced Sync, which AMD keeps messing up for some reason. However, I've forgone the mainline release for the more stable Radeon Pro drivers (20.Q1.2), since I don't have to deal with the extra bloat that Adrenalin installs (something I've been asking them to keep separate during install).

Like I said, Adrenalin 2020 is not fine yet. Since you're running on GCN5 hardware, you shouldn't be experiencing any major issues compared to some of the Navi owners.

Almost all of them contain the word "may", which, translated for you, means that an issue either will happen or, more likely, won't.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report on.

Are you talking about me? Please review my System Specs if you're in doubt. I even have a "unique" configuration.
 

ARF

Joined
Jan 28, 2020
Messages
3,934 (2.55/day)
Location
Ex-usa
That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14 nm) GPU at reasonable power levels you would probably have to clock it somewhere in the 1.1 to 1.15 GHz range, which is just stupid and unbalanced for a gaming GPU, and the worst part is that instead of a ~480 mm² die you would have a much more expensive ~600 mm² die (and at the end of the day all you would get is maybe 5-10% better gaming performance).
CU count was not the reason why Vega lagged so far behind Nvidia; it was purely down to its efficiency deficit.
Just look at how well the Radeon VII performs with fewer shaders in comparison to the Vega 64; the extra memory bandwidth helps, of course, but the roughly 300 MHz higher clock is the primary reason.


Vega 64 is bad because its shaders are not fed properly. You have 40-50% of its shaders receiving no work and thus sitting idle.
This is partly because of its design, which carries many compromises: it was designed for high throughput, which is good for pure number crunching in high-performance computing loads, but games unfortunately don't care much about that.
 
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14 nm) GPU at reasonable power levels you would probably have to clock it somewhere in the 1.1 to 1.15 GHz range, which is just stupid and unbalanced for a gaming GPU, and the worst part is that instead of a ~480 mm² die you would have a much more expensive ~600 mm² die (and at the end of the day all you would get is maybe 5-10% better gaming performance).
Just look at how well the Radeon VII performs with fewer shaders in comparison to the Vega 64; the extra memory bandwidth helps, of course, but the roughly 300 MHz higher clock is the primary reason.
Please explain how lower clocks make a GPU "unbalanced" for gaming. Because physics and real performance data significantly disagrees with you. If that was indeed the case, how does an RTX 2080 Max-Q (with very low clocks) at 80W perform ~50% faster than an RTX 2060 (mobile, non Max-Q, with higher clocks) at the same power draw? Again, you seem to be presenting thinking related to that baseless and false Mark Cerny argument. Wide and slow is nearly always more performant than narrow and fast in the GPU space.
CU count was not the reason why Vega lagged so far behind Nvidia; it was purely down to its efficiency deficit.
Please explain how the "efficiency deficit" is somehow unrelated to clock being pushed far past the ideal operating range of this architecture on the node in question - because if you're disagreeing with me (which it seems you are), that must somehow be the case. After all, I did say
AMD couldn't [increase the number of CUs] and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost.
This is directly due to the hard CU limit in GCN, and due to voltage and power draw increasing nonlinearly alongside frequency, even small clock speed drops can lead to dramatic improvements in efficiency. A cursory search at people experimenting with downclocking and undervolting their Vega cards will show you that these cards can be much, much more efficient even with minor clock speed drops. Of course one could argue that undervolting results are unrealistic when speaking of a theoretical mass produced card, but the counterargument then is that the existing cards are effectively factory overvolted, as the chips are pushed so high up their clock/voltage curve that a lot of extra voltage is needed to ensure sufficient yields at those clocks, as a significant portion of GPUs would otherwise fail to reach the necessary speeds. Dropping clocks even by 200MHz would allow for quite dramatic voltage drops. 200MHz would be a drop of about 13%, but would likely lead to a power drop of ~25-30%. Which would allow for ~25-30% more CUs within the same power budget (if that was architecturally possible), which would then increase performance far beyond what is lost from the clock speed drop. That is how you make the best performing GPU within a given power budget: by making it as wide as possible (while ensuring it has the memory to keep it fed) while keeping clock speeds around peak efficiency levels.
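To make the frequency/voltage trade-off concrete, here is a rough, illustrative sketch using the common dynamic-power approximation P ∝ C·V²·f, where switched capacitance C is treated as proportional to CU count. The CU counts, clocks, and voltages below are made-up example values chosen to mirror the argument, not measured Vega data.

```python
# Illustrative only: a crude dynamic-power model (P ~ C * V^2 * f) showing why a
# wider, lower-clocked GPU can deliver more throughput in a similar power budget.
# All numbers below are assumed example values, not measured Vega figures.

def relative_power(cus: int, clock_mhz: float, vcore: float) -> float:
    """Dynamic power in arbitrary units; switched capacitance scales with CU count."""
    return cus * (vcore ** 2) * clock_mhz

def relative_throughput(cus: int, clock_mhz: float) -> float:
    """Idealized shader throughput (ignores memory bandwidth and front-end limits)."""
    return cus * clock_mhz

# Hypothetical narrow-and-fast vs. wide-and-slow configurations.
configs = {
    "narrow/fast": {"cus": 64, "clock_mhz": 1536, "vcore": 1.20},  # pushed past the sweet spot
    "wide/slow":   {"cus": 80, "clock_mhz": 1340, "vcore": 1.05},  # ~13% lower clock, lower voltage
}

for name, cfg in configs.items():
    print(f"{name}: power ~{relative_power(**cfg):,.0f} au, "
          f"throughput ~{relative_throughput(cfg['cus'], cfg['clock_mhz']):,.0f} au")
```

With these example numbers, the wider configuration models as both lower power and higher throughput than the narrow, high-clocked one, which is the gist of the wide-and-slow argument; real silicon obviously adds static power, scaling losses, and die-cost considerations on top.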

Now this is of course not to say that Vega doesn't also have an architectural efficiency disadvantage to Pascal and Turing - it definitely does - but pushing it way past its efficiency sweet spot just compounded this issue, making it far worse than it might have been.

And of course this would also lead to increased die sizes - but then they would have some actual performance gains when moving to a new node, rather than the outright flop that was the Radeon VII. Now that GPU was never meant for gaming at all, but it nonetheless performs terribly for what it is - a full node shrink and then some over its predecessor. Why? Again, because the architecture didn't allow them to build a wider die, meaning that the only way of increasing performance was pushing clocks as high as they could. Now, the VII has 60 CUs and not 64, that is true, but that is solely down to it being a short-term niche card made for salvaging faulty chips, with all fully enabled dice going to the datacenter/HPC market where this GPU actually had some qualities.

If AMD could have moved past 64 CUs with Vega, that lineup would have been much more competitive in terms of absolute performance. It wouldn't have been cheap, but it would have been better than what we got - and we could have gotten a proper full lineup rather than the two GPUs AMD ended up making. Luckily it looks like this is happening with RDNA 2 now that the 64 CU limit is gone.

So, tl;dr: the 64 CU architectural limit of GCN has been a major problem for AMD GPUs ever since they maxed it out back in 2015 - it left them with no way forward outside of sacrificing efficiency at every turn.
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.11/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
Please explain how lower clocks make a GPU "unbalanced" for gaming. Because physics and real performance data significantly disagrees with you. If that was indeed the case, how does an RTX 2080 Max-Q (with very low clocks) at 80W perform ~50% faster than an RTX 2060 (mobile, non Max-Q, with higher clocks) at the same power draw? Again, you seem to be presenting thinking related to that baseless and false Mark Cerny argument. Wide and slow is nearly always more performant than narrow and fast in the GPU space.

Please explain how the "efficiency deficit" is somehow unrelated to clock being pushed far past the ideal operating range of this architecture on the node in question - because if you're disagreeing with me (which it seems you are), that must somehow be the case. After all, I did say

This is directly due to the hard CU limit in GCN, and due to voltage and power draw increasing nonlinearly alongside frequency, even small clock speed drops can lead to dramatic improvements in efficiency. A cursory search at people experimenting with downclocking and undervolting their Vega cards will show you that these cards can be much, much more efficient even with minor clock speed drops. Of course one could argue that undervolting results are unrealistic when speaking of a theoretical mass produced card, but the counterargument then is that the existing cards are effectively factory overvolted, as the chips are pushed so high up their clock/voltage curve that a lot of extra voltage is needed to ensure sufficient yields at those clocks, as a significant portion of GPUs would otherwise fail to reach the necessary speeds. Dropping clocks even by 200MHz would allow for quite dramatic voltage drops. 200MHz would be a drop of about 13%, but would likely lead to a power drop of ~25-30%. Which would allow for ~25-30% more CUs within the same power budget (if that was architecturally possible), which would then increase performance far beyond what is lost from the clock speed drop. That is how you make the best performing GPU within a given power budget: by making it as wide as possible (while ensuring it has the memory to keep it fed) while keeping clock speeds around peak efficiency levels.

Now this is of course not to say that Vega doesn't also have an architectural efficiency disadvantage to Pascal and Turing - it definitely does - but pushing it way past its efficiency sweet spot just compounded this issue, making it far worse than it might have been.

And of course this would also lead to increased die sizes - but then they would have some actual performance gains when moving to a new node, rather than the outright flop that was the Radeon VII. Now that GPU was never meant for gaming at all, but it nonetheless performs terribly for what it is - a full node shrink and then some over its predecessor. Why? Again, because the architecture didn't allow them to build a wider die, meaning that the only way of increasing performance was pushing clocks as high as they could. Now, the VII has 60 CUs and not 64, that is true, but that is solely down to it being a short-term niche card made for salvaging faulty chips, with all fully enabled dice going to the datacenter/HPC market where this GPU actually had some qualities.

If AMD could have moved past 64 CUs with Vega, that lineup would have been much more competitive in terms of absolute performance. It wouldn't have been cheap, but it would have been better than what we got - and we could have gotten a proper full lineup rather than the two GPUs AMD ended up making. Luckily it looks like this is happening with RDNA 2 now that the 64 CU limit is gone.

So, tl;dr: the 64 CU architectural limit of GCN has been a major problem for AMD GPUs ever since they maxed it out back in 2015 - it left them with no way forward outside of sacrificing efficiency at every turn.

First of all, AMD themselves stated that there was no such thing as a 64 CU architectural limit on GCN; secondly, if there was indeed such a limitation, AMD could have solved it in all those years.
My comment on performance scaling with higher clocks doesn't have anything to do with what Cerny says.

You clearly don't properly understand how GPUs work.
And no, the 80W 2080 Max-Q is nowhere near 50% faster than a non-Max-Q mobile 2060; where are you getting that from? Even if it were true, a big part of that efficiency difference could be due to binning and using higher-quality chips for the higher-end GPU.

Just take a look at how the RX 5700 performs in comparison to the 5700 XT at similar clocks: the 5700 XT ends up being 6% faster while having 11% more shaders.
Another good example is how the 2080 Ti, with 41% more shaders, performs compared to the 2080 Super: only 20% faster at 4K.
Performance scaling with higher clocks has always been, and will remain, more linear than performance scaling with more shaders in gaming-like workloads. If not, the GTX 1070, with a massive 14 CU deficit, couldn't match or beat the 980 Ti on a similar architecture.
Just do the math:
1070 => 1920 × 1800 MHz × 2 = 6.9 TFLOPS
980 Ti => 2816 × 1250 MHz × 2 = 7 TFLOPS
Yet the 1070 performs around 12% better (comparing reference vs. reference; obviously the 980 Ti has more OC headroom).
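As a quick check of the arithmetic above, here is a small sketch that reproduces the two TFLOPS figures and derives gaming performance per theoretical TFLOPS, taking the ~12% advantage for the 1070 cited in this post as given:

```python
# Sketch of the comparison above: theoretical FP32 throughput (shaders x clock x 2 FLOPs)
# for the GTX 1070 and GTX 980 Ti, plus gaming performance per TFLOPS using the ~12%
# advantage for the 1070 quoted in the post (reference vs. reference cards).

def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    return shaders * clock_mhz * 1e6 * 2 / 1e12

tflops = {
    "GTX 1070":   fp32_tflops(1920, 1800),   # ~6.9 TFLOPS
    "GTX 980 Ti": fp32_tflops(2816, 1250),   # ~7.0 TFLOPS
}
rel_perf = {"GTX 1070": 1.12, "GTX 980 Ti": 1.00}  # ~12% gap, per the post

for card in tflops:
    print(f"{card}: {tflops[card]:.1f} TFLOPS, perf/TFLOPS = {rel_perf[card] / tflops[card]:.3f}")
```

On these inputs the higher-clocked 1070 extracts noticeably more gaming performance per theoretical TFLOPS than the wider 980 Ti, which is the scaling point being argued here.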


[Chart: relative GPU performance at 2560×1440]
 
Last edited:
Joined
May 2, 2017
Messages
7,762 (3.05/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
First of all, AMD themselves stated that there was no such thing as a 64 CU architectural limit on GCN; secondly, if there was indeed such a limitation, AMD could have solved it in all those years.
My comment on performance scaling with higher clocks doesn't have anything to do with what Cerny says.

You clearly don't properly understand how GPUs work.
And no, the 80W 2080 Max-Q is nowhere near 50% faster than a non-Max-Q mobile 2060; where are you getting that from? Even if it were true, a big part of that efficiency difference could be due to binning and using higher-quality chips for the higher-end GPU.

Just take a look at how the RX 5700 performs in comparison to the 5700 XT at similar clocks: the 5700 XT ends up being 6% faster while having 11% more shaders.
Another good example is how the 2080 Ti, with 41% more shaders, performs compared to the 2080 Super: only 20% faster at 4K.
Performance scaling with higher clocks has always been, and will remain, more linear than performance scaling with more shaders in gaming-like workloads. If not, the GTX 1070, with a massive 14 CU deficit, couldn't match or beat the 980 Ti on a similar architecture.
Just do the math:
1070 => 1920 × 1800 MHz × 2 = 6.9 TFLOPS
980 Ti => 2816 × 1250 MHz × 2 = 7 TFLOPS
Yet the 1070 performs around 12% better (comparing reference vs. reference; obviously the 980 Ti has more OC headroom).


View attachment 152596
You're right about that 50% number - I mixed up the numbers for the 2060 Max-Q and the normal mobile 2060 - too many open tabs at once, I guess. The 2060 MQ also seems to run abnormally slowly compared to other MQ models. The normal 2060 is not that far behind an 80W 2080 MQ - about 10%. Still, if higher clocks improved performance more than more CUDA cores did, then at 80W for both, the 2060 ought to outperform the 2080 Max-Q, which it doesn't - it's noticeably behind. After all, in that comparison you have two GPUs with the same architecture, with the smaller GPU having more memory bandwidth per shader and higher clocks, so if clock speeds improved performance more than a wider GPU layout does, the smaller GPU would be faster. Binning of course has some effect on this, but not to the tune of explaining away a performance difference of that size.

And while I never said that scaling with more shaders is even close to linear, increases in shader count are responsible for the majority of GPU performance uplift over the past decade - far more than clock speeds, which have increased by less than 3x while shader counts have increased by ~8x and total GPU performance by >5x. Any GPU OC exercise will show that performance scaling with clock speed increases is far below linear - often to the tune of half or less than half in terms of perf % increase vs. clock % increase, even when also OC'ing memory. The best balance for increasing performance generation over generation is obviously a combination of both, but in the case of Vega AMD couldn't do that, and instead only pushed clocks higher. The lack of shader count increases forced them to push clocks far past the efficiency sweet spot of that arch+node combo, tanking efficiency in an effort to maximize absolute performance - they had nowhere else to go and a competitor with a significant lead in absolute performance, after all.

The same happened again with the VII; clocks were pushed far beyond the efficiency sweet spot as they couldn't increase the CU count (at this point we were looking at a 331 mm2 die, so they could easily have added 10-20 CUs if they had the ability and stayed within a reasonable die size). Now, the VII has 60 CUs active and not 64, but that is down to nothing more than this GPU being a PR move with likely zero margins utilizing salvaged dice, with all fully enabled Vega 20 dice going to compute accelerators where there was actual money to be made. On the other hand, comparing it with the Vega 56, you have 9.3% more shaders, ~20% higher clock speeds (base - the boost speed difference is larger) and >2x the memory bandwidth, yet it only delivers ~33% more performance. For a severely memory limited arch like Vega, that is a rather poor showing. And again, at the same wattage they could undoubtedly have increased performance more with some more shaders running at a lower speed.

As for AMD saying there was no hard architectural limit of 64 CUs: source, please? Not increasing shader counts at all across three generations and three production nodes while the competition increased theirs by 55% is proof enough that AMD couldn't increase theirs without moving away from GCN.
 