
Possibly a better Vega: My take on what AMD should do

Joined
Jul 15, 2006
Messages
972 (0.15/day)
Location
Malaysia
Processor AMD Ryzen 7 5700G
Motherboard Gigabyte B450M-S2H
Cooling Scythe Kotetsu Mark II
Memory 2 x 16GB SK Hynix OEM DDR4-3200 @ 3666 18-20-18-36
Video Card(s) Colorful RTX 2060 SUPER 8GB
Storage 250GB WD BLACK SN750 M.2 + 4TB WD Red Plus + 4TB WD Purple
Display(s) AOpen 27HC5R 27" 1080p 165Hz
Case COUGAR MX440 Mesh RGB
Audio Device(s) Creative X-Fi Titanium HD + Kurtzweil KS-40A bookshelf
Power Supply Corsair CX750M
Mouse Razer Deathadder Essential
Keyboard Cougar Attack2 Cherry MX Black
Software Windows 10 Pro 22H1 x64
I have been thinking about this for a long time, so I just want to vent it here and see what other people think. I am no GPU designer by a long shot; I am merely observing the trend across GCN iterations and their limitations from various reviews.


If you look at the move from the RX 560 to the RX 570, performance roughly doubles in just about every situation. Why is that? On the left is Polaris 11 (RX 560) and on the right is Polaris 10 (RX 570/470, with 4 CUs disabled).

[Attachment: both.jpg, Polaris 11 (left) vs. Polaris 10 (right) block diagrams]


As you can see, the number of Shader Engines is doubled, so the geometry processors are doubled, and everything else is doubled too, such as the L2 cache and the memory controllers. This, I think, is what impacts performance the most.

So now let's compare it to older GCN, primarily their last flagship that gave Nvidia some trouble, the R9 290X (codenamed Hawaii), to strengthen my theory. I know I shouldn't compare two different generations of cards, but bear with me on this:

[Attachment: both2.jpg, Polaris 10 vs. Hawaii block diagrams]


Both are very similar at a glance, but there are three major differences between the two cards: Hawaii has a massive 512-bit memory controller, more CUs per Shader Engine, and double the ROP count per Shader Engine. Yet going by the reviews, for example our local TPU RX 470 review, the performance difference between the RX 470/480 and the 390/390X (overclocked 290/290X) is basically only a few frames. Setting aside the generational difference and the small tweaks AMD made for GCN4, I have three hypotheses: the optimal number of ROPs per Shader Engine is around 2, the optimal CU count per Shader Engine is equivalent to around 1024 SPs, and a 256-bit bus is enough to feed 4 Shader Engines despite the doubled ROP count. I had an R9 290X before and did some testing: lowering the memory clock to 4 GHz effective (1 GHz actual), which equates to 256 GB/s of bandwidth, doesn't change gaming performance much (at 1080p), but it does lower temperatures.
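For reference, that bandwidth figure falls straight out of bus width times effective data rate. A quick sketch (the 5 Gbps stock GDDR5 rate for the 290X and the 8 Gbps rate for the RX 480 are quoted from memory):

```python
# Peak memory bandwidth = bus width (in bytes) x effective data rate per pin.
def peak_bandwidth_gb_s(bus_width_bits: int, effective_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return bus_width_bits / 8 * effective_gbps

print(peak_bandwidth_gb_s(512, 5.0))  # 320.0 GB/s -> R9 290X at stock 5 Gbps GDDR5
print(peak_bandwidth_gb_s(512, 4.0))  # 256.0 GB/s -> the 4 Gbps downclock tested above
print(peak_bandwidth_gb_s(256, 8.0))  # 256.0 GB/s -> a 256-bit bus at 8 Gbps (RX 480 class)
```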

So here comes my version of the Vega AMD should have made, and here's why. I have read that, with the NCU, they removed the dreaded 4-Shader-Engine limitation that GCN had, thus potentially removing the geometry bottleneck that has plagued AMD cards. So here is what my 'small Vega' block diagram is supposed to look like (it was made in Paint, so don't mock it :p)

[Attachment: 3072sp.jpg, proposed 'small Vega' block diagram]


It has 3072 SPs with 8 CUs per Shader Engine, 6 geometry processors (2 more than current Vega has), 48 ROPs, a 384-bit memory bus (or 2048-bit HBM2) and 3MB of L2 cache. I think this is a much more balanced design than spamming as many CUs as possible into each Shader Engine and cramming in more ROPs; the units are properly fed and better utilized.

From there AMD could make a Big Vega by expanding it to 8 Shader Engines, which would give 4096 SPs with 8 geometry processors, 64 ROPs, a 512-bit memory bus (or 4096-bit HBM2) and 4MB of L2 cache.
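A quick back-of-the-envelope check on both of those layouts (a sketch only; it assumes the usual 64 SPs per GCN CU and the 8 ROPs per Shader Engine used in the proposal above):

```python
# SP and ROP totals for a GCN-style layout: SPs = SEs x CUs/SE x 64, ROPs = SEs x ROPs/SE.
def gcn_config(shader_engines: int, cus_per_se: int, rops_per_se: int = 8):
    sps = shader_engines * cus_per_se * 64   # 64 shader processors per CU
    rops = shader_engines * rops_per_se
    return sps, rops

print(gcn_config(6, 8))  # (3072, 48) -> proposed 'small Vega'
print(gcn_config(8, 8))  # (4096, 64) -> proposed 'big Vega'
```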

I don't know what's stopping AMD from making this design. The jump from the 1024 SP Polaris 11 to the 2304 SP Polaris 10 comes close to doubling the transistor count (3 billion vs 5.6 billion), so by simple math the transistor count of my design should be around 9 billion for small Vega, and around the same figure as what AMD ended up with for big Vega with 4096 SPs.
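That figure is roughly what you get if you crudely assume transistors scale with the number of Shader Engines, since each engine brings its CUs, ROPs, L2 slice and memory channel along with it (a rough sketch on my part, not AMD data):

```python
# Very crude scaling: assume transistor count grows roughly linearly with the
# number of Shader Engines. Illustrative only; not based on any AMD die analysis.
POLARIS_10_TRANSISTORS = 5.6e9          # 4 Shader Engines (figure used above)
per_shader_engine = POLARIS_10_TRANSISTORS / 4

for name, shader_engines in [("small Vega (6 SE)", 6), ("big Vega (8 SE)", 8)]:
    print(f"{name}: ~{per_shader_engine * shader_engines / 1e9:.1f} billion transistors")
# small Vega (6 SE): ~8.4 billion
# big Vega (8 SE): ~11.2 billion
```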

P.S.: Is it just me, or does the Polaris 10 block diagram show a 512-bit bus, same as Hawaii? Something is wrong somewhere.
 
Joined
Jan 8, 2017
Messages
8,860 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
GCN has been, and still is, designed for high instruction/thread-level parallelism rather than data-level parallelism, which makes it a better fit for compute than for traditional graphics processing. Nvidia does more of the opposite, hence the gap between Vega and GP102 despite the two having roughly the same theoretical raw throughput.

Arranging the ALUs differently will do pretty much nothing for gaming performance at the end of the day, because the compilers are smart enough to optimize equally well for whatever arrangement there is. What they can't handle as well is a thread-oriented architecture like GCN doing graphics.

What's most interesting is that at this point pretty much everyone is jumping on such a design, even ARM.

The only thing AMD really needs is a good node that is a better fit for these huge, high-frequency chips. 14nm LPP is absolute trash for something like Vega; TSMC's 16nm is a much better fit for the task.
 
Joined
Aug 8, 2015
Messages
112 (0.04/day)
Location
Finland
System Name Gaming rig
Processor AMD Ryzen 7 5900X
Motherboard Asus X570-Plus TUF /w "passive" chipset mod
Cooling Noctua NH-D15S
Memory Crucial Ballistix Sport LT 2x16GB 3200C16 @3600C16
Video Card(s) MSI RTX 3060 TI Gaming X Trio
Storage Samsung 970 Pro 1TB, Crucial MX500 2TB, Samsung 860 QVO 4TB
Display(s) Samsung C32HG7x
Case Fractal Design Define R5
Audio Device(s) Asus Xonar Essence STX
Power Supply Corsair RM850i 850W
Mouse Logitech G502 Hero
Keyboard Logitech G710+
Software Windows 10 Pro
The only thing AMD really needs is a good node that is a better fit for these huge, high-frequency chips. 14nm LPP is absolute trash for something like Vega; TSMC's 16nm is a much better fit for the task.

Let's hope the 12nm refresh for Vega (and Zen) will give good results.
 
Joined
Dec 6, 2005
Messages
10,881 (1.63/day)
Location
Manchester, NH
System Name Senile
Processor I7-4790K@4.8 GHz 24/7
Motherboard MSI Z97-G45 Gaming
Cooling Be Quiet Pure Rock Air
Memory 16GB 4x4 G.Skill CAS9 2133 Sniper
Video Card(s) GIGABYTE Vega 64
Storage Samsung EVO 500GB / 8 Different WDs / QNAP TS-253 8GB NAS with 2x10Tb WD Blue
Display(s) 34" LG 34CB88-P 21:9 Curved UltraWide QHD (3440*1440) *FREE_SYNC*
Case Rosewill
Audio Device(s) Onboard + HD HDMI
Power Supply Corsair HX750
Mouse Logitech G5
Keyboard Corsair Strafe RGB & G610 Orion Red
Software Win 10
Let's hope the 12nm refresh for Vega (and Zen) will give good results.

I read somewhere Vega "2" would be 7nm in 2018. More speculation no doubt.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
40,435 (6.61/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
This is just some info on what GCN truly is from the start to now.

https://en.m.wikipedia.org/wiki/Graphics_Core_Next
 
Joined
Jul 19, 2006
Messages
43,585 (6.74/day)
Processor AMD Ryzen 7 7800X3D
Motherboard ASUS TUF x670e
Cooling EK AIO 360. Phantek T30 fans.
Memory 32GB G.Skill 6000Mhz
Video Card(s) Asus RTX 4090
Storage WD m.2
Display(s) LG C2 Evo OLED 42"
Case Lian Li PC 011 Dynamic Evo
Audio Device(s) Topping E70 DAC, SMSL SP200 Headphone Amp.
Power Supply FSP Hydro Ti PRO 1000W
Mouse Razer Basilisk V3 Pro
Keyboard Tester84
Software Windows 11
At this point, hopefully it's an entirely new design that ditches HBM. They tried it, and it has obviously failed.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
40,435 (6.61/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
o_O
At this point, hopefully it's an entirely new design that ditches HBM. They tried it, and it has obviously failed.

GCN being a compute-heavy arch, they would need a hybrid that does great at both without the drawbacks of current GCN.
 
Joined
Dec 6, 2005
Messages
10,881 (1.63/day)
Location
Manchester, NH
System Name Senile
Processor I7-4790K@4.8 GHz 24/7
Motherboard MSI Z97-G45 Gaming
Cooling Be Quiet Pure Rock Air
Memory 16GB 4x4 G.Skill CAS9 2133 Sniper
Video Card(s) GIGABYTE Vega 64
Storage Samsung EVO 500GB / 8 Different WDs / QNAP TS-253 8GB NAS with 2x10Tb WD Blue
Display(s) 34" LG 34CB88-P 21:9 Curved UltraWide QHD (3440*1440) *FREE_SYNC*
Case Rosewill
Audio Device(s) Onboard + HD HDMI
Power Supply Corsair HX750
Mouse Logitech G5
Keyboard Corsair Strafe RGB & G610 Orion Red
Software Win 10
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
HBM is not the problem, GCN is. GCN is too old. It had its glorious days, but it's time to move on. RTG needs another from-the-ground-up design ASAP. They can't just keep patching GCN and hoping it will work.
 
Joined
Aug 8, 2015
Messages
112 (0.04/day)
Location
Finland
System Name Gaming rig
Processor AMD Ryzen 7 5900X
Motherboard Asus X570-Plus TUF /w "passive" chipset mod
Cooling Noctua NH-D15S
Memory Crucial Ballistix Sport LT 2x16GB 3200C16 @3600C16
Video Card(s) MSI RTX 3060 TI Gaming X Trio
Storage Samsung 970 Pro 1TB, Crucial MX500 2TB, Samsung 860 QVO 4TB
Display(s) Samsung C32HG7x
Case Fractal Design Define R5
Audio Device(s) Asus Xonar Essence STX
Power Supply Corsair RM850i 850W
Mouse Logitech G502 Hero
Keyboard Logitech G710+
Software Windows 10 Pro
I read somewhere Vega "2" would be 7nm in 2018. More speculation no doubt.

I doubt that, but 12nm versions should come out in early 2018 (https://wccftech.com/amd-announces-2nd-gen-ryzen-vega-launching-12nm-2018/). Maybe we will see a new line-up with the lower-end cards replaced by smaller Vega dies with GDDR5(X)/GDDR6 memory (speculation on my part)?

HBM is not the problem, GCN is. GCN is too old. It had its glorious days, but it's time to move on. RTG needs another from-the-ground-up design ASAP. They can't just keep patching GCN and hoping it will work.

Apparently tile-based rendering is not enabled, or not working, in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.
 
Joined
Dec 6, 2005
Messages
10,881 (1.63/day)
Location
Manchester, NH
System Name Senile
Processor I7-4790K@4.8 GHz 24/7
Motherboard MSI Z97-G45 Gaming
Cooling Be Quiet Pure Rock Air
Memory 16GB 4x4 G.Skill CAS9 2133 Sniper
Video Card(s) GIGABYTE Vega 64
Storage Samsung EVO 500GB / 8 Different WDs / QNAP TS-253 8GB NAS with 2x10Tb WD Blue
Display(s) 34" LG 34CB88-P 21:9 Curved UltraWide QHD (3440*1440) *FREE_SYNC*
Case Rosewill
Audio Device(s) Onboard + HD HDMI
Power Supply Corsair HX750
Mouse Logitech G5
Keyboard Corsair Strafe RGB & G610 Orion Red
Software Win 10
I doubt that, but 12nm versions should come out in early 2018 (https://wccftech.com/amd-announces-2nd-gen-ryzen-vega-launching-12nm-2018/). Maybe we will see a new line-up with the lower-end cards replaced by smaller Vega dies with GDDR5(X)/GDDR6 memory (speculation on my part)?

Apparently tile-based rendering is not enabled, or not working, in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.

Funny, wccftech in another article from today: https://wccftech.com/amd-navi-gpu-spotted-in-linux-drivers/amp/

...and yea, it's all speculation

 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
40,435 (6.61/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
HBM is not the problem, GCN is. GCN is too old. It had its glorious days, but it's time to move on. RTG needs another from-the-ground-up design ASAP. They can't just keep patching GCN and hoping it will work.

Bring back VLIW as a new version
 
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
Apparently tile-based rendering is not enabled, or not working, in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.

If it is not working by now, then it probably never will. The more this drags on, the more it looks like current Vega is indeed a beta product rushed out by RTG under Raja's command.

Or it is probably functioning just fine and is simply not as efficient as Nvidia's tile rendering.

Either way, only a Vega refresh or a Polaris+ will give us the answer.

A Polaris+ with 30 CUs, updated tile rendering and beefed-up geometry may surprise us in performance. Not to mention it will always be easier and cheaper to design a small-die GPU than a monster-sized one.

So my prediction is that RTG, now under Lisa Su, will probably get a 30 CU Polaris out first to compete in the lower-mid tier in 2018.
 
Joined
Jan 8, 2017
Messages
8,860 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
They might revamp GCN, but I can assure you they won't change the core design. Not now, when the whole industry is slowly switching to similar architectures.

Apparently tile-based rendering is not enabled, or not working, in the current Vega implementation. How much it would improve performance is anyone's guess, as is whether they can get it working for the 12nm shrink.

Tile-based rendering does not improve performance by itself; it can only reduce bandwidth usage and, as a result, power consumption. And it's not even a new thing like Nvidia made it out to be; mobile GPUs have used this technique for years.
 
Joined
Jul 19, 2006
Messages
43,585 (6.74/day)
Processor AMD Ryzen 7 7800X3D
Motherboard ASUS TUF x670e
Cooling EK AIO 360. Phantek T30 fans.
Memory 32GB G.Skill 6000Mhz
Video Card(s) Asus RTX 4090
Storage WD m.2
Display(s) LG C2 Evo OLED 42"
Case Lian Li PC 011 Dynamic Evo
Audio Device(s) Topping E70 DAC, SMSL SP200 Headphone Amp.
Power Supply FSP Hydro Ti PRO 1000W
Mouse Razer Basilisk V3 Pro
Keyboard Tester84
Software Windows 11
HBM is not the problem

I wonder why AMD has such a difficult time running their HBM at its rated voltage if it's not a problem? I guess that comes down to the entire power design of the cards. It's not good, not compared to its only competition anyway. GCN/Vega needs to die and AMD needs to come up with something competitive. Or, better yet, AMD should sell to someone more capable and perhaps stick to CPUs at this point.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
I don't know what's stopping AMD from making this design; by simple math the transistor count should be around 9 billion for small Vega, and around the same figure as what AMD ended up with for big Vega with 4096 SPs.
You're forgetting fabrication: a bigger chip = less profit. I'm sure as shit AMD are on the right path; 7nm Navi will likely pave the way, or be the first MCM GPU. Think one to four Vegas with 2-16 GB of memory each on one ASIC, and AMD are way ahead of the competition with regards to the tech required, so Navi onwards should be interesting.

Opinion not fact, ty.

In reply to others: Vega 20 should be AMD's Pitcairn-style 1280-shader budget chip, not a Vega 64 x2.

And it is normal for new features to take time to implement; R9 290 owners just got hybrid sync, for example, and tessellation was called something totally different when AMD first hid it in silicon. I still eagerly await driver improvements for Vega, but who knows, tbf.
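To put the die-size point in rough numbers, a classic Poisson yield model shows how quickly defect-free yield drops as the die grows (the die areas and defect density below are illustrative assumptions, not foundry figures):

```python
import math

# Poisson yield model: fraction of defect-free dies = exp(-defect_density x die_area).
# The 0.15 defects/cm^2 density and the approximate die areas are assumptions only.
def poisson_yield(die_area_mm2: float, defects_per_cm2: float = 0.15) -> float:
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)  # /100 converts mm^2 to cm^2

for name, area_mm2 in [("Polaris 10, ~232 mm^2", 232), ("Vega 10, ~486 mm^2", 486)]:
    print(f"{name}: ~{poisson_yield(area_mm2):.0%} defect-free dies")
# Polaris 10, ~232 mm^2: ~71% defect-free dies
# Vega 10, ~486 mm^2: ~48% defect-free dies
```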
 
Last edited:
Joined
Aug 8, 2015
Messages
112 (0.04/day)
Location
Finland
System Name Gaming rig
Processor AMD Ryzen 7 5900X
Motherboard Asus X570-Plus TUF /w "passive" chipset mod
Cooling Noctua NH-D15S
Memory Crucial Ballistix Sport LT 2x16GB 3200C16 @3600C16
Video Card(s) MSI RTX 3060 TI Gaming X Trio
Storage Samsung 970 Pro 1TB, Crucial MX500 2TB, Samsung 860 QVO 4TB
Display(s) Samsung C32HG7x
Case Fractal Design Define R5
Audio Device(s) Asus Xonar Essence STX
Power Supply Corsair RM850i 850W
Mouse Logitech G502 Hero
Keyboard Logitech G710+
Software Windows 10 Pro
Tile-based rendering does not improve performance by itself; it can only reduce bandwidth usage and, as a result, power consumption. And it's not even a new thing like Nvidia made it out to be; mobile GPUs have used this technique for years.

Yes, it is not new, and it may not directly improve performance, but if the lower power consumption allows you to raise clocks, then you do get an actual performance increase (even if you wouldn't clock-for-clock).
 
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
Joined
Jan 8, 2017
Messages
8,860 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Don't turn this into a FUD thread.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
40,435 (6.61/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64

We understand your disdain because of Fury, but do you have to do this in every thread? It's starting to sound like beating a dead horse, no fun...

Go to the AMD forums and say it. To me it's starting to sound like a spam bot took over your profile.
 
Joined
Aug 20, 2007
Messages
20,709 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches
Software Windows 11 Enterprise (legit), Gentoo Linux x64
Joined
Mar 18, 2008
Messages
5,717 (0.98/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
Joined
Jul 15, 2006
Messages
972 (0.15/day)
Location
Malaysia
Processor AMD Ryzen 7 5700G
Motherboard Gigabyte B450M-S2H
Cooling Scythe Kotetsu Mark II
Memory 2 x 16GB SK Hynix OEM DDR4-3200 @ 3666 18-20-18-36
Video Card(s) Colorful RTX 2060 SUPER 8GB
Storage 250GB WD BLACK SN750 M.2 + 4TB WD Red Plus + 4TB WD Purple
Display(s) AOpen 27HC5R 27" 1080p 165Hz
Case COUGAR MX440 Mesh RGB
Audio Device(s) Creative X-Fi Titanium HD + Kurtzweil KS-40A bookshelf
Power Supply Corsair CX750M
Mouse Razer Deathadder Essential
Keyboard Cougar Attack2 Cherry MX Black
Software Windows 10 Pro 22H1 x64
Arranging the ALUs differently will do pretty much nothing for gaming performance at the end of the day, because the compilers are smart enough to optimize equally well for whatever arrangement there is.
You missed the point that more Shader Engines means more geometry processors. Yes, as you can see, the current design can fully utilize all the CUs, but that is only good for compute; that's why these cards are so good at mining. It doesn't solve the geometry performance of the design, and that hurts gaming performance. It's an unbalanced design.
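To put the geometry point in rough numbers: peak primitive throughput scales with the number of geometry engines times the clock (assuming roughly one primitive per geometry engine per clock, the commonly quoted figure for GCN, and an illustrative 1.5 GHz core clock):

```python
# Rough peak front-end throughput in billions of primitives per second.
# Assumes ~1 primitive per geometry engine per clock and a 1.5 GHz clock for
# illustration; real-world rates are lower.
def peak_prim_rate_gprim_s(geometry_engines: int, clock_ghz: float = 1.5) -> float:
    return geometry_engines * 1.0 * clock_ghz

print(peak_prim_rate_gprim_s(4))  # 6.0 -> a 4-Shader-Engine design like current Vega
print(peak_prim_rate_gprim_s(6))  # 9.0 -> the proposed 6-Shader-Engine layout
```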
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
40,435 (6.61/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
You missed the point that more Shader Engines means more geometry processors. Yes, as you can see, the current design can fully utilize all the CUs, but that is only good for compute; that's why these cards are so good at mining. It doesn't solve the geometry performance of the design, and that hurts gaming performance. It's an unbalanced design.

Yup, it's more focused on GPGPU/compute workloads than 3D graphics.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
You missed the point that more Shader Engines means more geometry processors. Yes, as you can see, the current design can fully utilize all the CUs, but that is only good for compute; that's why these cards are so good at mining. It doesn't solve the geometry performance of the design, and that hurts gaming performance. It's an unbalanced design.
ALL architectures are a compromise; you haven't solved the yield issue of roughly 30% more silicon per chip.
And you couldn't; no one can at this point. That's the singular reason all three big PC chip companies are ALL talking modular MCM chips from here on in: it increases die-area efficiency to do the separate bits separately, and more cheaply where possible.

And, as is obvious, big chips cost big money on small nodes.
 