
Possibly a better Vega: My take on what AMD should do

Joined
Jul 15, 2006
Messages
312 (0.07/day)
Likes
231
Location
Malaysia
Processor Intel i5 4670K 4.2GHz
Motherboard AsRock Z87 Extreme4
Cooling CoolerMaster Seidon 240M + GentleTyphoon D1225C
Memory 2x8GB Kingston DDR3 1600
Video Card(s) EVGA GeForce GTX 980 Ti 6GB SC+
Storage 120GB Samsung Evo 840 SSD+ 2TB Seagate Barracuda + 2TB Seagate Surveillance
Display(s) 24" LG 24MP59G
Case NZXT Phantom 240
Audio Device(s) Creative X-Fi Titanium HD modded
Power Supply Corsair CX750M
Mouse Logitech G400
Keyboard CM Storm QucikFire Pro
Software Windows 10 Pro x64
#1
I have been thinking about this for a long time, so I just wanted to put it out here and see what other people think. I am no GPU designer by a long shot; I am merely observing the trend across GCN iterations and their limitations from various reviews.


If you look at the move from the RX 560 to the RX 570, performance roughly doubles in practically every situation. Why is that? Left is Polaris 11 (RX 560) and right is Polaris 10 (RX 570/470, with 4 CUs disabled).

both.jpg


As you can see, the number of Shader Engines is doubled, and with it the number of geometry processors; everything else is doubled too, such as the L2 cache and the memory controllers. This, I think, is what impacts performance the most.

So now let's compare it to older GCN, primarily AMD's last flagship that gave Nvidia some trouble, the R9 290X (codename Hawaii), to strengthen my theory. I know I shouldn't compare two different generations of cards, but bear with me on this:

both2.jpg


Both are very similar at a glance, but there are three major differences between the two cards: Hawaii has a massive 512-bit memory controller, more CUs per Shader Engine, and double the ROP count per Shader Engine. Yet in the reviews, for example TPU's own RX 470 review, the performance difference between the RX 470/480 and the 390/390X (overclocked 290/290X) is basically only a few frames. Setting aside the generation gap and the small tweaks AMD made for GCN4, I have three hypotheses: the optimal number of ROPs per Shader Engine is around 2 blocks; the optimal amount of shaders per Shader Engine is around 1024 SPs; and a 256-bit bus is enough to feed 4 Shader Engines despite the doubled ROP count. I owned an R9 290X before and did some testing: lowering the memory clock to 1 GHz (4 GHz effective), which equates to 256 GB/s of bandwidth, doesn't really change gaming performance much (at 1080p), but it does lower temperatures.
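As a quick sanity check on those bandwidth figures, here is a small Python sketch. The 5 Gbps stock and 4 Gbps underclocked per-pin rates are the usual GDDR5 effective rates; the function itself is just peak-bandwidth arithmetic:

```python
# GDDR5 transfers 4 bits per pin per memory-clock cycle, so a 1.25 GHz clock
# means 5 Gbps effective per pin, and 1 GHz means 4 Gbps effective.

def bandwidth_gbs(bus_width_bits: int, effective_gbps: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits * effective_gbps / 8

# R9 290X stock: 512-bit bus at 5 Gbps effective (1.25 GHz memory clock)
print(bandwidth_gbs(512, 5.0))   # 320.0 GB/s
# R9 290X underclocked to 4 Gbps effective (1 GHz), as in the test above
print(bandwidth_gbs(512, 4.0))   # 256.0 GB/s
# RX 480: 256-bit bus at 8 Gbps effective; the same 256 GB/s from half the bus
print(bandwidth_gbs(256, 8.0))   # 256.0 GB/s
```

So the downclocked 290X and the RX 480 end up with identical peak bandwidth, which is why the comparison is interesting.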

So here is the Vega I think AMD should have made, and why. I have read that with the NCU they removed GCN's dreaded 4-Shader-Engine limitation, potentially removing the geometry bottleneck that has plagued AMD cards. So here it is, what my version of a 'small Vega' block diagram is supposed to look like (it was made in Paint, so don't mock it :p)

3072sp.jpg


It has 3072 SPs with 8 CUs per Shader Engine, 6 geometry processors (2 more than current Vega has), 48 ROPs, a 384-bit memory bus (or 2048-bit HBM2), and 3 MB of L2 cache. I think this is a much more balanced design than cramming as many CUs as possible into each Shader Engine along with more ROPs; this way the units are properly fed and better utilized.

From there, AMD could make a Big Vega by expanding to 8 Shader Engines, which yields 4096 SPs with 8 geometry processors, 64 ROPs, a 512-bit memory bus (or 4096-bit HBM2), and 4 MB of L2 cache.
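The two proposals above can be sketched as one per-Shader-Engine building block scaled by engine count. All of the per-engine numbers (8 CUs of 64 SPs, 8 ROPs, a 64-bit memory channel, 512 KB of L2) are taken from the diagrams in this post, not from any real AMD spec:

```python
# Scale one hypothetical Shader Engine "building block" by engine count.
def vega_config(shader_engines: int, cus_per_engine: int = 8) -> dict:
    return {
        "shader_engines": shader_engines,
        "geometry_processors": shader_engines,        # one per engine
        "stream_processors": shader_engines * cus_per_engine * 64,
        "rops": shader_engines * 8,
        "bus_width_bits": shader_engines * 64,
        "l2_cache_kb": shader_engines * 512,
    }

small_vega = vega_config(6)   # 3072 SP, 48 ROPs, 384-bit bus, 3 MB L2
big_vega = vega_config(8)     # 4096 SP, 64 ROPs, 512-bit bus, 4 MB L2
print(small_vega)
print(big_vega)
```

The point of writing it this way is that every resource scales together with the geometry front end, instead of just the CU count growing.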

I don't know what is stopping AMD from making this design. The jump from the 1024 SP Polaris 11 to the 2304 SP Polaris 10 is close to a doubling of transistor count (3 billion vs. 5.6 billion); by simple math, the transistor count of my design should be around 9 billion for small Vega, and around the same figure as what AMD actually spent on big Vega with 4096 SPs.
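That simple math can be written out as a two-point linear model fitted to Polaris 11 and Polaris 10. It ignores the wider uncore of the proposed chips (more L2, a bigger bus), so treat it as a rough lower bound:

```python
# Two data points give a linear model: total = base + per_sp * sp_count.
# Transistor counts are the figures quoted in the post above.
P11_SP, P11_XTORS = 1024, 3.0e9
P10_SP, P10_XTORS = 2304, 5.6e9

per_sp = (P10_XTORS - P11_XTORS) / (P10_SP - P11_SP)   # ~2.0M transistors/SP
base = P11_XTORS - per_sp * P11_SP                     # shared-logic estimate

small_vega = base + per_sp * 3072
big_vega = base + per_sp * 4096
print(f"{small_vega/1e9:.1f}B, {big_vega/1e9:.1f}B")   # 7.2B, 9.2B
```

The model lands at roughly 7.2 billion for the 3072 SP part and 9.2 billion for the 4096 SP one, in the same ballpark as the figure above.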

P.S.: Is it just me, or does the Polaris 10 block diagram show a 512-bit bus, same as Hawaii? Something is wrong somewhere.
 
Joined
Jan 8, 2017
Messages
2,341 (4.70/day)
Likes
1,529
System Name Good enough
Processor AMD FX-6300 - 4.5 Ghz
Motherboard ASRock 970M Pro3
Cooling Scythe Katana 4 - 3x 120mm case fans
Memory 16GB - 4x4GB A-DATA 1866 Mhz (OC)
Video Card(s) ASUS GTX 1060 Turbo 6GB ~ 2139 Mhz / 9.4 Gbps
Storage 1x Samsung 850 EVO 250GB , 1x 1 Tb Seagate something or other
Display(s) 1080p TV
Case Zalman R1
Power Supply 500W
#2
GCN has been, and still is, designed for high instruction/thread-level parallelism rather than data-level parallelism, which makes it a better fit for compute than for traditional graphics processing. Nvidia does more of the opposite, hence the gap between Vega and GP102 despite both of them having theoretically the same raw throughput.

Configuring the arrangement of the ALUs in a different way will do pretty much nothing for gaming performance at the end of the day, because compilers are intelligent enough to optimize equally well for whatever arrangement there is. What they can't do as well is deal with a thread-oriented architecture like GCN when doing graphics.

What is most interesting is that at this point pretty much everyone is jumping on such a design, even ARM.

The only thing AMD really needs is a good node that is a better fit for these huge, high-frequency chips. 14nm LPP is absolute trash for something like Vega; TSMC's 16nm is a much better fit for the task.
 
Joined
Aug 8, 2015
Messages
55 (0.05/day)
Likes
23
Location
Finland
System Name Gaming rig
Processor Core i7 6700K @4.5Ghz turbo / 1.4v delidded
Motherboard Asus Maximus VIII Hero
Cooling Noctua NH-D15S /w dual fans
Memory Corsair Vengeance LPX DDR4 3000 2x8GB
Video Card(s) MSI Gaming X RX 480 8GB
Storage Samsung 850 EVO 500GB, WD Black 4TB, WD Green 1,5TB
Display(s) Dell P2414H
Case Fractal Design Define R5
Audio Device(s) Asus Xonar Essence STX
Power Supply Corsair RM850i 850W
Software Windows 10 Pro
#3
Let's hope the 12nm refresh for Vega (and Zen) will give good results.
 
Joined
Dec 6, 2005
Messages
10,331 (2.27/day)
Likes
4,213
Location
Manchester, NH
System Name Working on it ;)
Processor I7-4790K (Stock speeds right now)
Motherboard MSI Z97 U3 Plus
Cooling Be Quiet Pure Rock Air
Memory 16GB 4x4 G.Skill CAS9 2133 Sniper
Video Card(s) GIGABYTE Vega 64 (Non Reference)
Storage Samsung EVO 500GB / 8 Different WDs / QNAP TS-253 8GB NAS with 2x2Tb WD Black
Display(s) 34" LG 34CB88-P 21:9 Curved UltraWide QHD (3440*1440)
Case Rosewill Challenger
Audio Device(s) Onboard + HD HDMI
Power Supply Corsair HX750 (love it)
Mouse Logitech G5
Keyboard Corsair Strafe RGB & G610 Orion Red
Software Win 10 upgraded from Win 7 Pro
#4
Let's hope the 12nm refresh for Vega (and Zen) will give good results.
I read somewhere Vega "2" would be 7nm in 2018. More speculation no doubt.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
21,400 (5.38/day)
Likes
6,175
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
#5
This is just some info on what GCN truly is, from its start until now.

https://en.m.wikipedia.org/wiki/Graphics_Core_Next
 

erocker

Senior Moderator
Staff member
Joined
Jul 19, 2006
Messages
42,772 (9.89/day)
Likes
18,664
Processor i7 8700K
Motherboard Asus Maximus Hero X WiFi
Cooling Water
Memory 16GB G.Skill 3200Mhz CL14
Video Card(s) GTX 1080
Storage SSD's
Display(s) Nixeus EDG27
Case Thermaltake Core X5
Audio Device(s) Soundblaster Zx
Power Supply Corsair H1000i
Mouse Zowie EC1-B
#6
At this point, hopefully it's an entirely new design that ditches HBM. They tried it, and it has obviously failed.
 

#7
o_O
At this point, hopefully it's an entirely new design that ditches HBM. They tried it, and it has obviously failed.
GCN being a compute-heavy arch, they would need a hybrid that does great at both without the drawbacks of current GCN.
 
Joined
Mar 18, 2008
Messages
3,138 (0.84/day)
Likes
2,175
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) Sapphire R9 Fury X
Storage Samsung 960 Pro 1TB, Crucial MX200 500GB
Display(s) Acer K272HUL, HTC Vive
Case Fractal Design R5
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
Software Windows 10 Professional/Linux Mint
#9
HBM is not the problem, GCN is. GCN is too old. It had its glorious days, but it's time to move on. RTG needs another ground-up design ASAP. They can't just keep patching GCN and hoping it will work.
 
#10
I read somewhere Vega "2" would be 7nm in 2018. More speculation no doubt.
I doubt that, but 12nm versions should come out in early 2018 (https://wccftech.com/amd-announces-2nd-gen-ryzen-vega-launching-12nm-2018/). Maybe we will see a new lineup where the lower-end cards are replaced by smaller Vega dies with GDDR5(X)/GDDR6 memory (speculation on my part)?

HBM is not the problem, GCN is. GCN is too old. It had its glorious days, but it's time to move on. RTG needs another ground-up design ASAP. They can't just keep patching GCN and hoping it will work.
Apparently tile-based rendering is either not enabled or not working in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.
 
#11
I doubt that, but 12nm versions should come out in early 2018 (https://wccftech.com/amd-announces-2nd-gen-ryzen-vega-launching-12nm-2018/). Maybe we will see a new lineup where the lower-end cards are replaced by smaller Vega dies with GDDR5(X)/GDDR6 memory (speculation on my part)?



Apparently tile-based rendering is either not enabled or not working in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.
Funny, wccftech in another article from today: https://wccftech.com/amd-navi-gpu-spotted-in-linux-drivers/amp/

...and yea, it's all speculation

1513625209312.png
 

#12
HBM is not the problem, GCN is. GCN is too old. It had its glorious days, but it's time to move on. RTG needs another ground-up design ASAP. They can't just keep patching GCN and hoping it will work.
Bring back VLIW as a new version
 
#13
Apparently tile-based rendering is either not enabled or not working in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.
If it is not working by now, then it probably never will. The longer this drags on, the more it seems that current Vega really is a beta product rushed out by RTG under Raja's command.

Or it may be working just fine and simply not be as efficient as Nvidia's tile rendering.

Either way, only a Vega refresh OR a Polaris+ will give us the answer.

A Polaris+ with 30 CUs and updated tile rendering, as well as beefed-up geometry, may surprise us in performance. Not to mention that it will always be easier and cheaper to design a small-die GPU than a monster-sized one.

So my prediction is that RTG, now under Lisa Su, will probably get a 30 CU Polaris out first, to compete in the lower-mid tier in 2018.
 
#14
They might revamp GCN, but I can assure you they won't change the core design. Not now, when the whole industry is slowly switching to similar architectures.


Apparently tile-based rendering is either not enabled or not working in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.
Tile-based rendering does not improve performance; it can only reduce bandwidth usage and, as a result, power consumption. And it's not even a new thing like Nvidia said it is; mobile GPUs have used this technique for years.
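A toy model of that trade-off, assuming an average overdraw of 3 and counting color writes only (all numbers here are made up purely for illustration):

```python
# With overdraw, an immediate-mode renderer writes every shaded fragment to
# DRAM, while a tile-based renderer resolves fragments in an on-chip tile
# buffer and writes each pixel out once. Shading work is the same either way.

PIXELS = 1920 * 1080
OVERDRAW = 3          # average fragments shaded per pixel
BYTES_PER_PIXEL = 4   # RGBA8 color; depth traffic ignored for simplicity

immediate_bytes = PIXELS * OVERDRAW * BYTES_PER_PIXEL  # every fragment to DRAM
tiled_bytes = PIXELS * 1 * BYTES_PER_PIXEL             # one final write per pixel

print(immediate_bytes / tiled_bytes)  # 3.0x less color write traffic
```

Bandwidth drops by the overdraw factor, but no extra frames come out the back: that's exactly the "saves bandwidth and power, not raw speed" argument.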
 

#15
HBM is not the problem
I wonder why AMD has such a difficult time running their HBM at its rated voltage, if it's not a problem? I guess that comes down to the entire power design of the cards. It's not good; not compared to its only competition, anyway. GCN/Vega needs to die, and AMD needs to come up with something competitive. Or, better yet, AMD should sell to someone more capable and perhaps stick to CPUs at this point.
 
Joined
Mar 10, 2010
Messages
5,319 (1.78/day)
Likes
1,796
Location
Manchester uk
System Name Quad GT evo V
Processor FX8350 @ 4.5ghz1.525c NB2.44ghz Ht2.64ghz
Motherboard Gigabyte 990X Gaming
Cooling 360EK extreme 360Tt rad all push/pull, cpu,NB/Vrm blocks all EK
Memory Corsair vengeance 16Gb @1600 cas8
Video Card(s) Rx vega 64 waterblockedEK + Asus Dual OC gtx1060 6Gb
Storage samsung 840(250), WD 1Tb+2Tb +3Tbgrn 1tb hybrid
Display(s) Samsung uea28"850R 4k freesync, LG 49" 4K 60hz
Case Custom(modded) thermaltake Kandalf
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup
Power Supply corsair 1200Hxi
Mouse CM optane
Keyboard CM optane
Software Win 10 Pro
Benchmark Scores 15.69K best overall sandra so far 6600 3dmark
#16
You're forgetting fabrication: a bigger chip means less profit. I'm sure as shit AMD are on the right path; 7nm Navi will likely pave the way, or be the first MCM GPU. Think one to four Vega-class dies with 2-16 GB of memory each on one package; AMD are way ahead of the competition with regard to the tech required, so Navi onwards should be interesting.

Opinion, not fact, ty.

In reply to others: Vega 20 should be AMD's Pitcairn-style 1280-shader budget chip, not Vega 64 times two.

And it is normal for new features to take time to implement; R9 290 owners just got hybrid sync, for example, and tessellation was called something totally different when AMD first hid it in silicon. I still eagerly await driver improvements for Vega, but who knows, tbf.
 
Last edited:
#17
They might revamp GCN, but I can assure you they won't change the core design. Not now, when the whole industry is slowly switching to similar architectures.

Tile-based rendering does not improve performance; it can only reduce bandwidth usage and, as a result, power consumption. And it's not even a new thing like Nvidia said it is; mobile GPUs have used this technique for years.
Yes, it is not new, and it may not directly improve performance, but if the lower power consumption allows you to increase clocks, then you get an actual performance increase (even if clock-for-clock you wouldn't otherwise).
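That argument can be put in rough numbers: dynamic power scales approximately with f times V squared, so any board power freed up by tiling can be spent on clocks at a fixed TDP. Every figure below is illustrative, not measured:

```python
# Dynamic power ~ k * f * V^2; at fixed voltage and fixed power budget,
# freeing up X% of the budget allows roughly X% higher clocks.

def dynamic_power(freq_mhz: float, volts: float, k: float = 1.0) -> float:
    return k * freq_mhz * volts ** 2

budget = dynamic_power(1500, 1.05)   # core power budget at a made-up stock clock
saved = 0.10 * budget                # pretend tiling frees 10% of that budget

# Spend the savings on frequency at the same voltage:
new_freq = (budget + saved) / dynamic_power(1, 1.05)
print(round(new_freq))   # 1650, i.e. a ~10% clock bump
```

In reality voltage usually has to rise with clocks too, so the real gain would be smaller; this is only the best case for the argument.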
 
#23
You missed the point that more Shader Engines means more geometry processors. Yes, as you say, the current design can fully utilize all the CUs, but that is only good for compute, which is why these cards are so good at mining. It doesn't solve the design's geometry performance, and that hurts gaming performance. It's an unbalanced design.
 

#24
You missed the point that more Shader Engines means more geometry processors. Yes, as you say, the current design can fully utilize all the CUs, but that is only good for compute, which is why these cards are so good at mining. It doesn't solve the design's geometry performance, and that hurts gaming performance. It's an unbalanced design.
Yup, it's more focused on GPGPU/compute tasks than on 3D graphics.
 
#25
You missed the point that more Shader Engines means more geometry processors. Yes, as you say, the current design can fully utilize all the CUs, but that is only good for compute, which is why these cards are so good at mining. It doesn't solve the design's geometry performance, and that hurts gaming performance. It's an unbalanced design.
ALL architectures are a compromise; you haven't solved the yield issue of a chip that is roughly 30% bigger. And you couldn't; no one can at this point. That's the singular reason all three big PC chip companies are talking about modular MCM chips from here on in: building the separate bits separately increases die-area efficiency and lowers cost where possible.

And as is obvious, big chips cost big money on small nodes.