
AMD's next-gen RDNA 2 rumor: 40-50% faster than GeForce RTX 2080 Ti

Joined: Mar 10, 2010
Messages: 7,960 (2.09/day)
Location: Manchester, UK
System Name RyzenGtEvo / Asus Strix Scar II
Processor AMD R7 3800X @ 4.350/525 / Intel 8750H
Motherboard Crosshair VII Hero @ BIOS 2703 / ?
Cooling EK 360 extreme rad + EK 360 slim, all push; CPU monoblock, GPU full cover, all EK
Memory Corsair Vengeance RGB Pro 3600 CL14, 16GB in two sticks / 16GB
Video Card(s) Sapphire reference RX Vega 64, EK waterblocked / RTX 2060
Storage Samsung NVMe PM981; Silicon Power 1TB; Samsung 840 as a PrimoCache drive for WD 2TB Green + 3TB Green
Display(s) Samsung U28E850R 28" 4K FreeSync, LG 49" 4K 60Hz, Oculus
Case Lian Li O11 Dynamic
Audio Device(s) Creative X-Fi 7.1 onboard, Yamaha DTS AV setup, Corsair Void Pro headset
Power Supply Corsair HX1200i
Mouse Roccat Kova / Logitech G wireless
Keyboard Roccat Isku Force FX
Software Win 10 Pro
Benchmark Scores 8726 Vega 3DMark Time Spy / laptop Time Spy 6506
RDNA 2 is 50% faster than the 2080 Ti, just like the 3080 Ti.
But the problem is the drivers....

I still buy the GTX 3070.
Strange input, so mythical cards are better than old cards, no proof yet, and driver issues on an unreleased card mean you're buying a different unicorn.

Great input, come again.

In fact I'm not sure you should bother.

Your mind's made up of unicorn poop.
 
Joined: Apr 14, 2019
Messages: 127 (0.26/day)
System Name Violet
Processor AMD Ryzen 3600 4.4GHz 1.41v
Motherboard ASRock X570 Phantom Gaming X
Cooling be quiet! Dark Rock Pro 4
Memory G.Skill Flare X 16GB 3400MHz
Video Card(s) MSI RTX 2070 Super Gaming X Trio 8GB
Storage Western Digital WD Black SN750 1TB
Display(s) 3440x1440
Case NZXT H440 New Edition Window
Power Supply Corsair RM850x
Mouse Razer Naga Trinity
Keyboard Corsair K95 RGB
Software Windows 10 64-bit
Strange input, so mythical cards are better than old cards, no proof yet, and driver issues on an unreleased card mean you're buying a different unicorn.

Great input, come again.

In fact I'm not sure you should bother.

Your mind's made up of unicorn poop.
Strange guy :kookoo:.
 
Joined: Mar 10, 2010
Messages: 7,960 (2.09/day)
Location: Manchester, UK
Strange guy :kookoo:.
You could have just said "I'm buying Nvidia regardless."

That represents what you said better and encompasses all the facts and known data points in your post.

But it's still pointless in this thread.
 
Joined: Sep 17, 2014
Messages: 12,799 (5.93/day)
Location: Mars
Processor i7 8700K 4.7GHz @ 1.26v
Motherboard ASRock Fatal1ty K6 Z370
Cooling be quiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) MSI GTX 1080 Gaming X @ 2100/5500
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Eizo Foris FG2421
Case Fractal Design Define C TG
Power Supply EVGA G2 750W
Mouse Logitech G502 Proteus Spectrum
Keyboard Sharkoon MK80 (Brown)
Software W10 x64
Strange guy :kookoo:.
Nah... he's seeing things pretty well if you ask me. If you're buying rumors that card XYZ is whatever percentage faster than whichever other card at this point in time, you simply didn't get it and you probably won't this gen either. Live and learn, come back for the next round and maybe you'll be wiser.
 
Joined: Jan 8, 2017
Messages: 5,373 (4.10/day)
System Name Good enough
Processor AMD Ryzen R7 1700X - 4.0GHz / 1.350V
Motherboard ASRock B450M Pro4
Cooling Deepcool Gammaxx L240 V2
Memory 16GB - Corsair Vengeance LPX - 3333MHz CL16
Video Card(s) OEM Dell GTX 1080 with Kraken G12 + Water 3.0 Performer C
Storage 1x Samsung 850 EVO 250GB, 1x Samsung 860 EVO 500GB
Display(s) 4K Samsung TV
Case Deepcool Matrexx 70
Power Supply GPS-750C
For a while, with each generation ... AMD lost a tier. With all cards OC'd, nVidia had the top 2 spots with 7xx / 2xx ... then they took a 3rd with the 950 ... then a 4th with the 1060.
You are delirious.

It's strange, isn't it? That although GPUs are super-parallel machines, it's far harder to make a chiplet GPU design than a chiplet CPU design.
GPUs are far easier to implement using chiplets: threads only need to communicate with other threads within the same CU, which greatly simplifies everything compared to a CPU, where you need to preserve performance when communicating across chiplets.
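To put that in code, here's a minimal CUDA sketch (my own illustration of the programming model, not anything from AMD's design docs): threads cooperate through on-chip __shared__ memory, which only exists inside one thread block, and a block always runs on a single SM/CU.

```cuda
#include <cuda_runtime.h>

// Threads cooperate through fast on-chip __shared__ memory, which only
// exists within one thread block -- and a block always runs on a single
// SM/CU. Blocks on other SMs (or, hypothetically, other chiplets) never
// see this memory, so no cross-chip traffic is needed for it.
// Assumes blocks of exactly 256 threads (a power of two).
__global__ void blockLocalSum(const float* in, float* out)
{
    __shared__ float tile[256];            // visible to this block only
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[g];
    __syncthreads();                       // syncs this block, not the GPU

    // Naive in-block reduction; no inter-block communication required.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];         // each block writes its own result
}
```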

but Nvidia can charge 700 USD for them just because there is no competition
It has been proven time and time again that this is not how it works. AMD could come out with a $2000 GPU faster than anything else out there and Nvidia would still sell something like a 2080 for $700; prices are set by what consumers are willing to pay, not by the competition. How is it that Apple charged, and still charges, so much for their phones all these years? Can you say they had no competition?

Here is another fallacy for you: if prices went down every time there was "competition", profits would slowly tend towards zero, because eventually there is always competition and prices would have to keep dropping no matter what, according to this wonderful logic. Obviously that doesn't happen, since these companies grow with each passing year, even the ones that aren't "competitive". Prices go down when consumers no longer buy the same volume.

Nvidia's volume kept increasing, and so did their prices.
AMD's volume either stagnated or went down, and their products either stayed the same price or became cheaper.

Coincidence? Nvidia's prices will only go down when they hit a plateau in volume sold, which they inevitably will at some point. There is a limited pool of potential consumers and cash.
 
Last edited:
Joined: Apr 14, 2019
Messages: 127 (0.26/day)
But it's still pointless in this thread
Uhmm, look at the title and the history of AMD and Nvidia... :wtf:
If you had a little more experience with the hardware, you would know why I react the way I do.
So it is not pointless.
 
Joined: Nov 11, 2016
Messages: 262 (0.19/day)
System Name The de-ploughminator
Processor i7 8700K @ 5.1GHz
Motherboard Gigabyte Z370 Gaming 5
Cooling Custom watercooling
Memory 4x8GB G.Skill Trident Z Neo 3600MHz 15-15-15-30
Video Card(s) RTX 2080 Ti + Heatkiller IV WB
Storage Plextor 512GB NVMe SSD
Display(s) LG 34GN850-B
Case Lian Li O11D Dynamic
Audio Device(s) Creative AE-5
Power Supply Corsair RM1000
It has been proven time and time again that this is not how it works. AMD could come out with a $2000 GPU faster than anything else out there and Nvidia would still sell something like a 2080 for $700; prices are set by what consumers are willing to pay, not by the competition. How is it that Apple charged, and still charges, so much for their phones all these years? Can you say they had no competition?

Here is another fallacy for you: if prices went down every time there was "competition", profits would slowly tend towards zero, because eventually there is always competition and prices would have to keep dropping no matter what, according to this wonderful logic. Obviously that doesn't happen, since these companies grow with each passing year, even the ones that aren't "competitive". Prices go down when consumers no longer buy the same volume.

Nvidia's volume kept increasing, and so did their prices.
AMD's volume either stagnated or went down, and their products either stayed the same price or became cheaper.

Coincidence? Nvidia's prices will only go down when they hit a plateau in volume sold, which they inevitably will at some point. There is a limited pool of potential consumers and cash.
Aren't you mixing up MSRP and price gouging? Retailers price-gouge all the time on products in high demand. Nvidia, AMD and even the AIBs don't get to pocket that money.

Well, there have been rumors that Nvidia and AMD have been mingling in some price-fixing scheme to maintain healthy profit margins. It's also vital for Nvidia to keep their opponent alive, otherwise they would be subject to antitrust law. That is the sad reality of a duopoly; had Intel been successful with their GPU, you would see much better price/performance.

Currently there is no competitor to the 2080 Super and 2080 Ti in terms of performance, so Nvidia can set whatever MSRP they want on those. But of course they have to know what price customers are willing to pay for that kind of GPU performance, and this is done through market research. The 2080 Ti has been selling at over 1200 USD for almost two years now; do you think that's because they are selling in big volume?
 
Last edited:

ARF

Joined: Jan 28, 2020
Messages: 1,228 (6.23/day)
System Name ARF System 1 (retro build) | Portable 1 (energy efficient and portable)
Processor AMD Athlon 64 X2 4400+ | AMD Ryzen 5 2500U
Motherboard ASRock 939A790GMH 790GX SATA2 |
Cooling Arctic Freezer 13 | Dual-fan, dual heat-pipe Acer inbuilt
Memory 4 x 1GB DDR-400 | 2 x 8GB DDR4-2400
Video Card(s) ASUS Radeon EAH4670/DI/512MD3 | Radeon RX 560X 4G & Vega 8
Storage ADATA XPG SX900 128GB SATA3@SATA2 SSD | Western Digital Blue 3D NAND M.2 SSD 500GB
Display(s) | LG 24UD58-B & Panasonic TX-50CX670E
Case Cooler Master HAF 912 Plus | 15-inch notebook chassis
Audio Device(s) Superlux HD681 EVO
Mouse Genius NetScroll 100X | Genius NetScroll 100X
Keyboard | Logitech Wave
Software Windows 7 Ultimate SP1 | Windows 10 Pro 2004
Benchmark Scores CPU-Z 17.01.64 - ST: 392.4, MT: 2075.1; CPU-Z 15.01.64 - ST: 2055, MT: 8120
Joined: Apr 24, 2020
Messages: 247 (2.23/day)
GPUs are far easier to implement using chiplets: threads only need to communicate with other threads within the same CU, which greatly simplifies everything compared to a CPU, where you need to preserve performance when communicating across chiplets.
I'll believe it when I see it.

There have been multiple CPUs implemented as chiplets. Not only the recent Zen chips, but also IBM's Power5 back in 2004 used a chiplet.

[Attached image: IBM's Power5 multi-chip module]


In contrast, there hasn't been a GPU or large-scale SIMD system, to my knowledge, that was ever made using a chiplet design.

Ergo: GPU / SIMD systems as chiplets must be harder. Otherwise, we would have made a SIMD chiplet by now. It's not like we're short on demand, either. The fastest multi-GPU / SIMD system available today is NVidia's DGX-2, but even that doesn't use chiplets yet.

----------

Again: this is because CPU-to-CPU communication is lower-bandwidth than GPU-to-GPU communication. The DGX-2's NVLink / NVSwitch system offers 300 GB/s of chip-to-chip bandwidth. In contrast, AMD Zen CPUs are only around ~50 GB/s for IFOP (Infinity Fabric On-Package) links.

GPUs need more bandwidth in their core-to-core communications than CPUs do. I expect this higher bandwidth requirement is what makes GPU chiplets harder in practice.
 
Last edited:

ARF

Joined: Jan 28, 2020
Messages: 1,228 (6.23/day)
How so? AMD and others have always claimed that they haven't found a way to implement GPU chiplets because of CrossFire-type issues.
 
Joined: Mar 10, 2010
Messages: 7,960 (2.09/day)
Location: Manchester, UK
I'll believe it when I see it.

There have been multiple CPUs implemented as chiplets. Not only the recent Zen chips, but also IBM's Power5 back in 2004 used a chiplet.

In contrast, there hasn't been a GPU or large-scale SIMD system, to my knowledge, that was ever made using a chiplet design.

Ergo: GPU / SIMD systems as chiplets must be harder. Otherwise, we would have made a SIMD chiplet by now. It's not like we're short on demand, either. The fastest multi-GPU / SIMD system available today is NVidia's DGX-2, but even that doesn't use chiplets yet.
Xe, Hopper?!

Intel uses tiles, i.e. chiplets.

Nvidia's Hopper features something similar but called something else no doubt, like game cores!?

Some rumours about RDNA3 point to chiplets.
 
Joined: Apr 24, 2020
Messages: 247 (2.23/day)
Xe, Hopper?!
I do think that Intel is onto something with its EMIB technology. If Intel is the first one to figure something out for GPU-chiplets, I wouldn't be surprised.

Hopper is pretty secretive. I haven't found much information on it.

------

I'm certain that GPUs will eventually be chiplets. The issues at 7nm and 5nm have made the chiplet methodology the clear path forward. But I don't believe that it will be an easy journey. There will be architectural changes and new issues brought up.
 

ARF

Joined: Jan 28, 2020
Messages: 1,228 (6.23/day)
Xe, Hopper?!

Intel uses tiles, i.e. chiplets.

Nvidia's Hopper features something similar but called something else no doubt, like game cores!?

Some rumours about RDNA3 point to chiplets.
The Zen chiplets are good for EPYC, not for Ryzen. For CDNA, not for RDNA.
Renoir (monolithic) is faster and more efficient than Matisse (chiplets).
 
Joined: Mar 23, 2005
Messages: 3,747 (0.67/day)
Location: Ancient Greece, Acropolis (Time Lord)
System Name RiseZEN Gaming PC
Processor AMD Ryzen 7 1700X @ stock - (Plus ZEN3 7nm+ Prototype)
Motherboard ASRock Fatal1ty X370 GAMING X AM4
Cooling Corsair H115i PRO RGB, 280mm Radiator, Dual 140mm ML Series PWM Fans
Memory G.Skill TridentZ 32GB (2 x 16GB) DDR4 3200
Video Card(s) Sapphire Radeon RX 580 8GB Nitro+ SE + (RDNA2 7nm+ Prototype)
Storage Corsair Force MP500 480GB M.2 (OS) + Force MP510 480GB M.2 (Steam/Games)
Display(s) Asus 27" (MG278Q) 144Hz WQHD 1440p + 1 x Asus 24" (VG245H) FHD 75Hz 1080p
Case Corsair Obsidian Series 450D Gaming Case
Audio Device(s) SteelSeries 5Hv2 w/ ASUS Xonar DGX PCI-E GX2.5 Audio Engine Sound Card
Power Supply Corsair TX750W Power Supply
Mouse Razer DeathAdder PC Gaming Mouse - Ergonomic Left Hand Edition
Keyboard Logitech G15 Classic Gaming Keyboard
Software Windows 10 Pro - 64-Bit Edition
Benchmark Scores WHO? I'm the Doctor. The Definition of Gaming is PC Gaming...
The Zen chiplets are good for EPYC, not for Ryzen. For CDNA, not for RDNA.
Renoir (monolithic) is faster and more efficient than Matisse (chiplets).
Chiplets are good for Ryzen; Zen 2 proved this by being a massive success.
But Zen 3 is going to be a new design, probably not fully chiplet-based.

If AMD can utilize chiplets with RDNA3 and get high performance and low power draw, it's a good thing.
 

ARF

Joined: Jan 28, 2020
Messages: 1,228 (6.23/day)
Chiplets are good for Ryzen; Zen 2 proved this by being a massive success.
But Zen 3 is going to be a new design, probably not fully chiplet-based.

If AMD can utilize chiplets with RDNA3 and get high performance and low power draw, it's a good thing.
Zen 2 proved it's better than the mediocre Intel counterparts.
But if you are wiser, just get your mighty Renoir laptop with a 15-watt APU that is as fast as a 65-watt desktop counterpart.
 
Joined: Mar 10, 2010
Messages: 7,960 (2.09/day)
Location: Manchester, UK
The Zen chiplets are good for EPYC, not for Ryzen. For CDNA, not for RDNA.
Renoir (monolithic) is faster and more efficient than Matisse (chiplets).
Economies of scale weigh against that argument once you scale up chiplets on the smallest node.

Monolithic may be better, but its scaling is limited, and possibly output too, and it doesn't help with heat management. It will remain the choice for most devices, though, due to costs.

And Ryzen works fine.
 

ARF

Joined: Jan 28, 2020
Messages: 1,228 (6.23/day)
Economies of scale weigh against that argument once you scale up chiplets on the smallest node.

Monolithic may be better, but its scaling is limited, and possibly output too, and it doesn't help with heat management. It will remain the choice for most devices, though, due to costs.

And Ryzen works fine.
What do you mean by economies of scale?
Ryzen with chiplets may or may not be cheaper to produce than the monolithic Renoir.
Renoir is 156 mm². Ryzen 3000 is 80 + 120 mm².

You can offer the 15-watt Ryzen 7 4800U as a full replacement for the 65-watt Ryzen 5 3600, and then use the chiplets for everything above.
 
Joined: Mar 10, 2010
Messages: 7,960 (2.09/day)
Location: Manchester, UK
What do you mean by economies of scale?
Ryzen with chiplets may or may not be cheaper to produce than the monolithic Renoir.
Renoir is 156 mm². Ryzen 3000 is 80 + 120 mm².

You can offer the 15-watt Ryzen 7 4800U as a full replacement for the 65-watt Ryzen 5 3600, and then use the chiplets for everything above.
We're talking about graphics cards!?

Attaining 8K 120Hz frame rates in a ray-traced version of Battlefield 7 is not going to be easy, yet some are working towards it; tile-based rendering helps because the tiles split the workload.

And as EPYC is proving, for high-throughput computation chiplets can work well.

On-chip mGPU that's invisible to the user.
 

ARF

Joined: Jan 28, 2020
Messages: 1,228 (6.23/day)
We're talking about graphics cards!?

Attaining 8K 120Hz frame rates in a ray-traced version of Battlefield 7 is not going to be easy, yet some are working towards it; tile-based rendering helps because the tiles split the workload.

And as EPYC is proving, for high-throughput computation chiplets can work well.

On-chip mGPU that's invisible to the user.
Let's focus on 4K first, because it hasn't become popular enough just yet.
 
Joined: Jan 8, 2017
Messages: 5,373 (4.10/day)
I expect this higher bandwidth requirement is what makes GPU chiplets harder in practice.
If you begin with 1 chiplet and n memory chips, you can then move on to m chiplets and n * m memory modules. It can all scale linearly if need be; this isn't a problem, especially now that we have HBM. In fact, if you take a look at existing GPUs you'll see that bandwidth usually does not need to increase linearly with the number of compute units; it's sublinear. I won't go into why that's the case, but the point is that memory bandwidth is not the reason why MCM GPUs haven't been made.

I'll believe it when I see it.

There have been multiple CPUs implemented as chiplets. Not only the recent Zen chips, but also IBM's Power5 back in 2004 used a chiplet.

GPU / SIMD systems as chiplets must be harder. Otherwise, we would have made a SIMD chiplet by now.
A SIMD chiplet doesn't even make sense: SIMD needs centralized instruction dispatch and logic, and modern GPUs aren't SIMD, meaning there isn't a 64 x 32-bit wide vector register physically on the chip. It's all scalar, and it's compartmentalized into CUs, which is why you can easily spread CUs across multiple chiplets. With a GPU you are guaranteed that a CU in one chiplet does not need to communicate with a CU in another chiplet. In other words, they wouldn't suffer from the same issues CPUs do, where a core may need to read/write a cache line owned by another core or, even worse, run into false sharing.

CPUs are undoubtedly harder to implement using chiplets than GPUs, but here's why they became a thing for CPUs first: there was a need that couldn't be resolved any other way. With GPUs, because of the way software is written for them, you can just scale the problem up by using multiple GPUs that don't necessarily need to communicate with each other (the way GPGPU algorithms have to be implemented makes it a requirement from the start that you can't communicate above a certain level, which is the CU).

Therefore you can stuff a lot of GPUs on a single motherboard. CPUs, on the other hand, are intended for different tasks that can't be scaled up in the same way; socket-to-socket communication is basically a death sentence for achieving high performance, so the only solution is to stuff as many CPUs as possible into a single socket. Those would be the "chiplets".
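As a rough sketch of that scaling in code (illustrative CUDA host code; the names are mine and error handling is omitted), each device works on its own slice of the data and never reads another device's memory:

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void scaleKernel(float* x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;               // purely element-local work
}

// Split one big array across every visible GPU; each device touches only
// its own slice, so no device-to-device communication ever happens.
// Assumes at least one device and a length divisible by the device count.
void scaleAcrossDevices(std::vector<float>& host)
{
    int devs = 0;
    cudaGetDeviceCount(&devs);
    int chunk = static_cast<int>(host.size()) / devs;
    for (int d = 0; d < devs; ++d) {
        cudaSetDevice(d);
        float* dbuf = nullptr;
        cudaMalloc(&dbuf, chunk * sizeof(float));
        cudaMemcpy(dbuf, host.data() + d * chunk,
                   chunk * sizeof(float), cudaMemcpyHostToDevice);
        scaleKernel<<<(chunk + 255) / 256, 256>>>(dbuf, chunk);
        cudaMemcpy(host.data() + d * chunk, dbuf,
                   chunk * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dbuf);
    }
}
```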

this is because CPU-to-CPU communication is lower-bandwidth than GPU-to-GPU communication.
GPUs don't have to communicate with each other the way CPUs do, as I explained above. That's why AMD and Nvidia largely gave up on putting multiple GPUs on a single board, where you could theoretically have achieved higher GPU-to-GPU bandwidth: it's a waste of time.
 
Joined: Apr 24, 2020
Messages: 247 (2.23/day)
If you begin with 1 chiplet and n memory chips, you can then move on to m chiplets and n * m memory modules.
While CUDA seems to have the programming API for this, I don't believe this is common in DirectX (11 or 12), OpenGL, or Vulkan code. Even then, I don't think people typically use CUDA's memory-management interface like this, because it's only relevant on the extremely niche DGX class of computers.

In contrast, CPU shared memory is almost completely transparent to the programmer. The OS could easily migrate your process to other chips (affinity settings notwithstanding). In fact, affinity settings were invented to prevent the OS from moving your process around.

I say "almost" completely transparent, because NUMA does exist if you really want to go there. But CPU programmers have gotten surprisingly far without ever worrying about NUMA details (unless you're the cream-of-the-crop optimizer. Its a very niche issue where most programmers simply trust the OS to do the right thing).

The software ecosystem that would support a multi-chiplet architecture, with each chiplet having an independent memory space, simply does not exist. Therein lies the problem: either we make a NUMA-like API where each chiplet has its own memory space that the programmer has to manage, or we build a crossbar, similar to AMD's Infinity Fabric (IFOP), which transparently copies data between chips, giving the programmer the illusion that all of the memory is in one space.
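To make the two options concrete, here is a sketch using CUDA's existing multi-GPU calls as a stand-in for hypothetical chiplets (the API calls are real; the chiplet framing is my assumption):

```cuda
#include <cuda_runtime.h>

// (a) NUMA-like: the programmer owns data placement. Each device gets its
//     own buffer, and any chip-to-chip movement is an explicit copy.
void numaStyle(size_t bytes)
{
    float *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0); cudaMalloc(&buf0, bytes);
    cudaSetDevice(1); cudaMalloc(&buf1, bytes);
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);  // programmer decides when/what
    cudaFree(buf1);
    cudaSetDevice(0); cudaFree(buf0);
}

// (b) Crossbar-like illusion: one managed allocation visible everywhere;
//     the runtime migrates pages on demand, and the interconnect bandwidth
//     quietly becomes the limiting factor.
void transparentStyle(size_t bytes)
{
    float* shared = nullptr;
    cudaMallocManaged(&shared, bytes);   // one pointer for all devices
    // ... kernels on any device may touch 'shared'; pages migrate as needed
    cudaFree(shared);
}
```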

50 GB/s is sufficient for AMD's Infinity Fabric. For the same thing to happen on GPUs, NVidia has demonstrated that 300 GB/s is needed in their DGX-2 computers.

This isn't an easy problem, by any stretch of the imagination. I do imagine it will be solved eventually, but I'm deeply interested in seeing how it's done. I'm betting that NVidia will shrink their NVLink and NVSwitch system down and make it cheaper somehow.

A SIMD chiplet doesn't even make sense: SIMD needs centralized instruction dispatch and logic, and modern GPUs aren't SIMD, meaning there isn't a 64 x 32-bit wide vector register physically on the chip.
We've discussed this before, Vya. Your understanding of GPU architecture is off.


[Attached image: AMD Vega compute unit memory hierarchy diagram]


AMD Vega (and all GCN processors) have the memory layout shown above. The 256 VGPRs are arranged in a 64 x 32-bit array called "SIMD0". Vega's compute units are pretty complicated: there are also SIMD1, SIMD2, and SIMD3, each with an independent instruction pointer.

The entire set of VGPRs operates in a SIMD fashion, as demonstrated in chapter 6.

[Attached image: excerpt from chapter 6 of the Vega ISA documentation]


So when you do a "v_add_f32 0, 1", all 64 values in VGPR#1 are added to the 64 values in VGPR#0, and the result is stored into VGPR#0. It's a 64-wide SIMD operation. All "v" operations on Vega are 64-wide. RDNA changed this to a 32-wide operation instead, but the concept is similar.
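Spelled out as plain scalar code (purely an illustration of the semantics, not of how the hardware is wired):

```cuda
// Semantics of the Vega "v_add_f32 0, 1" above: one instruction, 64 lanes,
// all executing in lock-step. (Illustrative expansion only.)
void v_add_f32_semantics(float vgpr0[64], const float vgpr1[64])
{
    for (int lane = 0; lane < 64; ++lane)
        vgpr0[lane] += vgpr1[lane];
}
```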

I'm less familiar with NVidia's architecture, but I assume a similar effect happens with their PTX instructions.

With a GPU you are guaranteed that the CU in one chiplet does not need to communicate with a CU in another chiplet.
At a minimum, video games share textures. If Chiplet#1 has the texture for Gordon Freeman's face, but Chiplet#2 doesn't have it, how do you expect Chiplet#2 to render Gordon Freeman's face?

GPUs, as currently architected, have a unified memory space where all information is shared. Crossfire halved the effective memory because, to solve the above issue, it simply copied the texture to both GPUs (i.e. two 4GB GPUs have a total of 4GB of usable VRAM, because every piece of data is replicated between the two systems). It was a dumb and crappy solution, but it worked for the purposes of Crossfire.

This is why inter-chip communications might happen. If you want Gordon Freeman's face to be rendered on Chiplet#1 and Chiplet#2 in parallel, you need a way to share that face-texture data between the two chips. This is the approach of NVidia's NVSwitch in the DGX-2 computer.

Alternatively, you could tell the programmer that Chiplet#2 cannot render Gordon Freeman's face because the data is unavailable. This would be a NUMA-like solution (the data exists only on Chiplet#1). It's a harder programming model, but it can be done.

Or maybe a mix of the two approaches can happen. Or maybe a new system is invented in the next year or two. I dunno, but it's a problem. And I'm excited to wait and see what the GPU architects will invent to solve it whenever chiplets arrive.
 
Last edited:
Joined: Jan 8, 2017
Messages: 5,373 (4.10/day)
In contrast, CPU shared memory is almost completely transparent to the programmer.
Which is why it's slow and why GPU kernels can run orders of magnitude faster.

We've discussed this before, Vya. Your understanding of GPU architecture is off.
SIMD "fashion" does not mean physical SIMD hardware; TeraScale was the last SIMD-like architecture, which is why it also relied on VLIW to work effectively. My understanding of GPU architecture isn't off, yours is simply outdated. You have to understand that the way CUs work in both Nvidia and AMD hardware is analogous to SIMD but not the same; there are things that are impossible to do with regular SIMD. A CU can issue the same instructions in lock-step but on data addressed indirectly from multiple places, and a single instruction can also generate multiple execution paths, which can't be done in regular SIMD. For this very reason there is physically no single "2048-bit" ALU, that would be insane; that's why they say "32 x 64", because that's how it's implemented: there are 64 separate ALUs/FPUs/etc. that execute wavefronts.

Think for a moment: in Turing you can have both integer and floating-point instructions issued within the same clock cycle using that "2048-bit unit", which wouldn't be possible with a SIMD arrangement.
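As a concrete illustration of the divergence point (a hedged CUDA sketch; the kernel itself is hypothetical):

```cuda
// Lanes of the same warp/wavefront take different branches; the hardware
// masks lanes on and off. A fixed vector ALU applying one opcode to one
// wide register has no equivalent of this, nor of the per-lane indirect
// addressing below. Assumes idx[] holds unique indices into x[].
__global__ void divergentKernel(float* x, const int* idx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int j = idx[i];            // per-lane gather: each lane forms its own address
    if (x[j] < 0.0f)           // per-lane branch: the warp diverges here
        x[j] = -x[j];
    else
        x[j] *= 0.5f;
}
```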

If Chiplet#1 has the texture for Gordon Freeman's face, but Chiplet#2 doesn't have it, how do you expect Chiplet#2 to render Gordon Freeman's face?
It seems that you don't understand how any of this works at all, or you are making a colossal confusion. Chiplet #1 or #2 has no problem accessing both textures because, as you said, global memory is shared; what isn't shared is the memory that each CU has. There is no reason why a CU would need to access the memory of a CU in another chiplet, because the programming model prohibits this; that's what you don't seem to understand. The premise from the beginning is that none of this stuff can happen. If you need a texture for something, why would two chiplets try to apply the same instructions to the same data? They wouldn't; at worst they would each pull just the portion of the texture that they need, because that's how shaders/kernels work.

No sharing of CU memory means no synchronization across CUs, which in the case of GPUs, where you can have thousands of threads in flight, means no performance penalty from using chiplets.
 