
AMD's next-gen RDNA 2 rumor: 40-50% faster than GeForce RTX 2080 Ti

Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
RDNA 2 is 50% faster than the 2080 Ti, just like the 3080 Ti.
But the problem is the drivers...

I'll still buy the RTX 3070.
Strange input: so mythical cards are better than old cards, with no proof yet, and driver issues on an unreleased card mean you're buying a different unicorn.

Great input, come again.

In fact, I'm not sure you should bother.

Your mind's made up of unicorn poop.
 
Joined
Dec 31, 2009
Messages
19,366 (3.71/day)
Benchmark Scores Faster than yours... I'd bet on it. :)
RDNA 2 is 50% faster than the 2080 Ti, just like the 3080 Ti.
But the problem is the drivers...

I'll still buy the RTX 3070.
(image: Carnac)
 
Joined
Apr 14, 2019
Messages
221 (0.12/day)
System Name Violet
Processor AMD Ryzen 5800X
Motherboard ASRock x570 Phantom Gaming X
Cooling Be quiet! Dark Rock Pro 4
Memory G.Skill Flare x 32GB 3400Mhz
Video Card(s) MSI 6900XT Gaming X Trio
Storage Western Digital WD Black SN750 1TB
Display(s) 3440x1440
Case Lian Li LANCOOL II MESH Performance
Power Supply Corsair RM850x
Mouse EVGA X15
Keyboard Corsair K95 RGB
Software Windows 10 64bit
Strange input: so mythical cards are better than old cards, with no proof yet, and driver issues on an unreleased card mean you're buying a different unicorn.

Great input, come again.

In fact, I'm not sure you should bother.

Your mind's made up of unicorn poop.
Strange guy :kookoo:.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Strange guy :kookoo:.
You could have just said, "I'm buying Nvidia regardless."

That represents what you said better, and it encompasses all the facts and known data points in your post.

But it's still pointless in this thread.
 
Joined
Sep 17, 2014
Messages
20,898 (5.97/day)
Location
The Washing Machine
Processor i7 8700k 4.6Ghz @ 1.24V
Motherboard AsRock Fatal1ty K6 Z370
Cooling beQuiet! Dark Rock Pro 3
Memory 16GB Corsair Vengeance LPX 3200/C16
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Samsung 850 EVO 1TB + Samsung 830 256GB + Crucial BX100 250GB + Toshiba 1TB HDD
Display(s) Gigabyte G34QWC (3440x1440)
Case Fractal Design Define R5
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse XTRFY M42
Keyboard Lenovo Thinkpad Trackpoint II
Software W10 x64
Strange guy :kookoo:.

Nah... he's seeing things pretty well if you ask me. If you're buying rumors that card XYZ is whatever percentage faster than whichever other card at this point in time, you simply didn't get it and you probably won't this gen either. Live and learn, come back for the next round and maybe you'll be wiser.
 
Joined
Jan 8, 2017
Messages
8,924 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
For a while, with each generation ... AMD lost a tier. With all cards OC'd, nVidia had the top 2 spots with 7xx / 2xx ... then they took a 3rd with the 950 ... then a 4th with the 1060.

You are delirious.

It's strange, isn't it? That although GPUs are super-parallel machines, it's far harder to make a chiplet GPU design than a chiplet CPU design.

GPUs are far easier to implement using chiplets: threads only need to communicate with other threads within the same CU, which greatly simplifies everything compared to a CPU, where you need to ensure performance when communicating across chiplets.
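A minimal CUDA sketch of that scoping, assuming a thread block stands in for a CU (the kernel name `blockReduce` and the sizes are made up for illustration): threads cooperate through on-chip shared memory inside their own block, and there is simply no API for touching another block's shared memory.

```cpp
// Illustrative only: a block runs on one CU/SM, and its __shared__ memory
// is visible only within that block. Launch with 256 threads per block.
__global__ void blockReduce(const float* in, float* out)
{
    __shared__ float tile[256];               // on-chip, per-CU storage
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = in[i];
    __syncthreads();                          // synchronizes this block only

    // Tree reduction entirely inside the block; no thread ever reads
    // another block's shared memory -- there is no way to express that.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];            // blocks meet only in global memory
}
```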

but Nvidia can charge $700 for them just because there is no competition

It has been proven time and time again that this is not how it works. AMD could come out with a $2,000 GPU faster than anything else out there and Nvidia would still sell something like a 2080 for $700; prices are set by what consumers are willing to pay, not by the competition. How is it that Apple charged, and still charges, so much for their phones all these years? Can you say they had no competition?

Here is another fallacy for you: if prices went down every time there was "competition", then profits would slowly tend towards zero, because eventually there will always be competition and prices would have to drop no matter what, according to this wonderful logic. Obviously that doesn't happen, since these companies grow with each passing year, even the ones that aren't "competitive". Prices go down when consumers no longer buy the same volume.

Nvidia's volume kept increasing, and so did their prices.
AMD's volume either stagnated or went down, and their products either stayed the same price or became cheaper.

Coincidence? Nvidia's prices will only go down when they hit a plateau in volume sold, which they inevitably will at some point. There is a limited pool of potential consumers and cash.
 
Joined
Apr 14, 2019
Messages
221 (0.12/day)
System Name Violet
Processor AMD Ryzen 5800X
Motherboard ASRock x570 Phantom Gaming X
Cooling Be quiet! Dark Rock Pro 4
Memory G.Skill Flare x 32GB 3400Mhz
Video Card(s) MSI 6900XT Gaming X Trio
Storage Western Digital WD Black SN750 1TB
Display(s) 3440x1440
Case Lian Li LANCOOL II MESH Performance
Power Supply Corsair RM850x
Mouse EVGA X15
Keyboard Corsair K95 RGB
Software Windows 10 64bit
But it's still pointless in this thread
Uhmm, look at the title and the history of AMD and Nvidia... :wtf:
If you had a little more experience with hardware, you would know why I react the way I do.
So it is not pointless.
 
Joined
Nov 11, 2016
Messages
3,062 (1.13/day)
System Name The de-ploughminator Mk-II
Processor i7 13700KF
Motherboard MSI Z790 Carbon
Cooling ID-Cooling SE-226-XT + Phanteks T30
Memory 2x16GB G.Skill DDR5 7200Cas34
Video Card(s) Asus RTX4090 TUF
Storage Kingston KC3000 2TB NVME
Display(s) LG OLED CX48"
Case Corsair 5000D Air
Power Supply Corsair HX850
Mouse Razor Viper Ultimate
Keyboard Corsair K75
Software win11
It has been proven time and time again that this is not how it works. AMD could come out with a $2,000 GPU faster than anything else out there and Nvidia would still sell something like a 2080 for $700; prices are set by what consumers are willing to pay, not by the competition. How is it that Apple charged, and still charges, so much for their phones all these years? Can you say they had no competition?

Here is another fallacy for you: if prices went down every time there was "competition", then profits would slowly tend towards zero, because eventually there will always be competition and prices would have to drop no matter what, according to this wonderful logic. Obviously that doesn't happen, since these companies grow with each passing year, even the ones that aren't "competitive". Prices go down when consumers no longer buy the same volume.

Nvidia's volume kept increasing, and so did their prices.
AMD's volume either stagnated or went down, and their products either stayed the same price or became cheaper.

Coincidence? Nvidia's prices will only go down when they hit a plateau in volume sold, which they inevitably will at some point. There is a limited pool of potential consumers and cash.

Aren't you mixing up MSRP and price gouging? Retailers price-gouge all the time on products in high demand; Nvidia, AMD, and even the AIBs don't get to pocket that money.

Well, there have been rumors that Nvidia and AMD have been mingling in some price-fixing scheme to maintain healthy profit margins. It's also vital for Nvidia to keep their opponent alive, otherwise they would be subject to antitrust law. That is the sad reality of a duopoly; had Intel been successful with their GPU, you would see much better price/performance.

Currently there is no competitor to the 2080 Super and 2080 Ti in terms of performance, so Nvidia can set whatever MSRP they want for those. But of course they have to know what price customers are willing to pay for that kind of GPU performance, and that is done through market research. The 2080 Ti has been selling at over $1,200 for almost two years now; do you think that's because it sells in big volume?
 
Joined
Apr 24, 2020
Messages
2,559 (1.76/day)
GPUs are far easier to implement using chiplets: threads only need to communicate with other threads within the same CU, which greatly simplifies everything compared to a CPU, where you need to ensure performance when communicating across chiplets.

I'll believe it when I see it.

There have been multiple CPUs implemented as chiplets. Not only the recent Zen chips; IBM's POWER5 back in 2004 also used a chiplet design.

(attachment: IBM POWER5 multi-chip module)


In contrast, to my knowledge, no GPU or large-scale SIMD system has ever been made using a chiplet design.

Ergo: GPU / SIMD systems as chiplets must be harder. Otherwise, we would have made a SIMD chiplet by now. It's not like we're short on demand either. The fastest multi-GPU / SIMD system available today is NVidia's DGX-2, and even that doesn't use chiplets yet.

----------

Again, this is because CPU-to-CPU communication is lower-bandwidth than GPU-to-GPU communication. The DGX-2's NVLink / NVSwitch system provides 300 GB/s of chip-to-chip bandwidth. In contrast, AMD Zen CPUs only get around ~50 GB/s over their IFOP (Infinity Fabric On-Package) links.

GPUs need more bandwidth for their core-to-core communication than CPUs do. I expect this higher bandwidth requirement makes GPU chiplets harder to build in practice.
 

ARF

Joined
Jan 28, 2020
Messages
3,928 (2.55/day)
Location
Ex-usa
How so? AMD and others have always claimed that they haven't found a way to implement GPU chiplets because of CrossFire-type issues.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
I'll believe it when I see it.

There have been multiple CPUs implemented as chiplets. Not only the recent Zen chips; IBM's POWER5 back in 2004 also used a chiplet design.

(attachment: IBM POWER5 multi-chip module)

In contrast, to my knowledge, no GPU or large-scale SIMD system has ever been made using a chiplet design.

Ergo: GPU / SIMD systems as chiplets must be harder. Otherwise, we would have made a SIMD chiplet by now. It's not like we're short on demand either. The fastest multi-GPU / SIMD system available today is NVidia's DGX-2, and even that doesn't use chiplets yet.
Xe, Hopper?!

Intel uses tiles, i.e. chiplets.

Nvidia's Hopper no doubt features something similar but called something else, like "game cores"!?

Some rumours about RDNA 3 point to chiplets.
 
Joined
Apr 24, 2020
Messages
2,559 (1.76/day)
Xe, Hopper?!

I do think that Intel is onto something with its EMIB technology. If Intel is the first one to figure something out for GPU-chiplets, I wouldn't be surprised.

Hopper is pretty secretive. I haven't found much information on it.

------

I'm certain that GPUs will eventually be chiplets. The issues at 7nm and 5nm have made the chiplet methodology the clear path forward. But I don't believe that it will be an easy journey. There will be architectural changes and new issues brought up.
 

ARF

Joined
Jan 28, 2020
Messages
3,928 (2.55/day)
Location
Ex-usa
Xe, Hopper?!

Intel uses tiles, i.e. chiplets.

Nvidia's Hopper no doubt features something similar but called something else, like "game cores"!?

Some rumours about RDNA 3 point to chiplets.

The Zen chiplets are good for EPYC, not for Ryzen. For CDNA, not for RDNA.
Renoir (monolithic) is faster and more efficient than Matisse (chiplets).
 
Joined
Mar 23, 2005
Messages
4,061 (0.58/day)
Location
Ancient Greece, Acropolis (Time Lord)
System Name RiseZEN Gaming PC
Processor AMD Ryzen 7 5800X @ Auto
Motherboard Asus ROG Strix X570-E Gaming ATX Motherboard
Cooling Corsair H115i Elite Capellix AIO, 280mm Radiator, Dual RGB 140mm ML Series PWM Fans
Memory G.Skill TridentZ 64GB (4 x 16GB) DDR4 3200
Video Card(s) ASUS DUAL RX 6700 XT DUAL-RX6700XT-12G
Storage Corsair Force MP500 480GB M.2 & MP510 480GB M.2 - 2 x WD_BLACK 1TB SN850X NVMe 1TB
Display(s) ASUS ROG Strix 34” XG349C 180Hz 1440p + Asus ROG 27" MG278Q 144Hz WQHD 1440p
Case Corsair Obsidian Series 450D Gaming Case
Audio Device(s) SteelSeries 5Hv2 w/ Sound Blaster Z SE
Power Supply Corsair RM750x Power Supply
Mouse Razer Death-Adder + Viper 8K HZ Ambidextrous Gaming Mouse - Ergonomic Left Hand Edition
Keyboard Logitech G910 Orion Spectrum RGB Gaming Keyboard
Software Windows 11 Pro - 64-Bit Edition
Benchmark Scores I'm the Doctor, Doctor Who. The Definition of Gaming is PC Gaming...
The Zen chiplets are good for EPYC, not for Ryzen. For CDNA, not for RDNA.
Renoir (monolithic) is faster and more efficient than Matisse (chiplets).
Chiplets are good for Ryzen; Zen 2 proved this by being a massive success.
But Zen 3 is going to be a new design, probably not fully chiplet-based.

If AMD can utilize chiplets with RDNA 3 and get high performance and low power draw, it's a good thing.
 

ARF

Joined
Jan 28, 2020
Messages
3,928 (2.55/day)
Location
Ex-usa
Chiplets are good for Ryzen; Zen 2 proved this by being a massive success.
But Zen 3 is going to be a new design, probably not fully chiplet-based.

If AMD can utilize chiplets with RDNA 3 and get high performance and low power draw, it's a good thing.

Zen 2 proved it's better than the mediocre Intel counterparts.
But if you are wiser, just get your mighty Renoir laptop with a 15-watt APU that is as fast as a 65-watt desktop counterpart.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
The Zen chiplets are good for EPYC, not for Ryzen. For CDNA, not for RDNA.
Renoir (monolithic) is faster and more efficient than Matisse (chiplets).
Economies of scale weigh against that argument once you scale up chiplets on the smallest node.

Monolithic may be better, but its scaling is limited, and possibly its output too, and it doesn't help with heat management. It will remain the choice for most devices, though, due to costs.

And Ryzen works fine.
 

ARF

Joined
Jan 28, 2020
Messages
3,928 (2.55/day)
Location
Ex-usa
Economies of scale weigh against that argument once you scale up chiplets on the smallest node.

Monolithic may be better, but its scaling is limited, and possibly its output too, and it doesn't help with heat management. It will remain the choice for most devices, though, due to costs.

And Ryzen works fine.

What do you mean by economies of scale?
Ryzen with chiplets may or may not be cheaper to produce than the monolithic Renoir.
Renoir is 156 mm^2. Ryzen 3000 is 80 + 120 mm^2.

You could offer the 15-watt Ryzen 7 4800U as a full replacement for the 65-watt Ryzen 5 3600, and then use the chiplets for everything above it.
 
Joined
Mar 10, 2010
Messages
11,878 (2.31/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
What do you mean by economies of scale?
Ryzen with chiplets may or may not be cheaper to produce than the monolithic Renoir.
Renoir is 156 mm^2. Ryzen 3000 is 80 + 120 mm^2.

You could offer the 15-watt Ryzen 7 4800U as a full replacement for the 65-watt Ryzen 5 3600, and then use the chiplets for everything above it.
We're talking about graphics cards!?

A ray-traced version of Battlefield 7 at 8K 120 Hz is not going to be easy to run, yet some are working towards exactly that, and tile-based rendering helps because the tiles split the workload.

And as EPYC is proving, chiplets can work well for high-throughput computation.

On-chip mGPU that's invisible to the user.
 

ARF

Joined
Jan 28, 2020
Messages
3,928 (2.55/day)
Location
Ex-usa
We're talking about graphics cards!?

A ray-traced version of Battlefield 7 at 8K 120 Hz is not going to be easy to run, yet some are working towards exactly that, and tile-based rendering helps because the tiles split the workload.

And as EPYC is proving, chiplets can work well for high-throughput computation.

On-chip mGPU that's invisible to the user.

Let's focus on 4K first, because it hasn't become popular enough just yet.
 
Joined
Jan 8, 2017
Messages
8,924 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
I expect this higher bandwidth requirement makes GPU chiplets harder to build in practice.

If you begin with 1 chiplet and n memory chips, you can then move on to m chiplets and n * m memory modules. It can all scale linearly if need be; this isn't a problem, especially now that we have HBM. In fact, if you look at existing GPUs you'll see that bandwidth usually does not need to increase linearly with the number of compute units; it's sublinear. I won't go into why that's the case, but the point is that memory bandwidth is not the reason MCM GPUs haven't been made.

I'll believe it when I see it.

There have been multiple CPUs implemented as chiplets. Not only the recent Zen chips; IBM's POWER5 back in 2004 also used a chiplet design.

GPU / SIMD systems as chiplets must be harder. Otherwise, we would have made a SIMD chiplet by now.

A SIMD chiplet doesn't even make sense; SIMD needs centralized instruction dispatch and logic, and modern GPUs aren't SIMD, meaning there isn't a 64 x 32-bit wide vector register physically on the chip. It's all scalar, and it's compartmentalized into CUs, which is why you can easily spread CUs across multiple chiplets. With a GPU you are guaranteed that a CU in one chiplet does not need to communicate with a CU in another chiplet. In other words, they wouldn't suffer from the same issues CPUs do, where a core might need to read/write a cache line in another core or, even worse, run into false sharing.

CPUs are undoubtedly harder to implement using chiplets than GPUs, but here's why they became a thing before GPUs: there was a need for it that couldn't be resolved in any other way. With GPUs, because of the way software is written for them, you can just scale the problem up by using multiple GPUs that don't necessarily need to communicate with each other (the way GPGPU algorithms have to be implemented makes it a requirement from the start that you can't communicate above a certain level, which is the CU).

Therefore you can stuff a lot of GPUs on a single motherboard. CPUs, on the other hand, are intended for different tasks that can't be scaled up in the same way; socket-to-socket communication is basically a death sentence for high performance, so the only solution is to stuff as many CPUs as possible into a single socket. Those would be the "chiplets".
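A hedged CUDA sketch of that scale-out model, assuming two or more visible devices (`process` is a made-up kernel): each GPU gets its own buffer and its own slice of the work, and nothing ever requires the GPUs to talk to each other.

```cpp
#include <cuda_runtime.h>

// Made-up kernel: each GPU processes its own independent slice.
__global__ void process(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    const int nPerGpu = 1 << 20;
    for (int d = 0; d < deviceCount; ++d) {
        cudaSetDevice(d);                     // select GPU d
        float* buf = nullptr;
        cudaMalloc(&buf, nPerGpu * sizeof(float));
        // Launches are asynchronous, so all GPUs run concurrently,
        // each on its own data, with no inter-GPU communication.
        process<<<(nPerGpu + 255) / 256, 256>>>(buf, nPerGpu);
    }
    for (int d = 0; d < deviceCount; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }
    return 0;
}
```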

this is because CPU-to-CPU communication is lower-bandwidth than GPU-to-GPU communication.

GPUs don't have to communicate with each other the way CPUs do, as I explained above. That's why AMD and Nvidia largely gave up on putting multiple GPUs on a single board, where you could theoretically have achieved higher GPU-to-GPU bandwidth: it's a waste of time.
 
Joined
Apr 24, 2020
Messages
2,559 (1.76/day)
If you begin with 1 chiplet and n memory chips, you can then move on to m chiplets and n * m memory modules.

While CUDA seems to have the programming API for this, I don't believe this is common in DirectX (11 or 12), OpenGL, or Vulkan code. Even then, I don't think people typically use CUDA's memory management interface like this, because it's only relevant on the extremely niche DGX class of computers.
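For reference, a rough sketch of what that explicit, niche CUDA interface looks like (`copyBetweenGpus` is a made-up helper; the calls are the standard CUDA runtime peer-access APIs): each device owns its own memory space, and moving data between devices is an explicit call rather than anything transparent.

```cpp
#include <cuda_runtime.h>

// Sketch of CUDA's explicit multi-device memory interface: each device
// has its own memory space, and inter-device copies are explicit.
void copyBetweenGpus(size_t bytes)
{
    float *buf0 = nullptr, *buf1 = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);                  // lives in GPU 0's memory

    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);                  // lives in GPU 1's memory

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0); // can GPU 1 reach GPU 0?
    if (canAccess)
        cudaDeviceEnablePeerAccess(0, 0);      // opt in, per device pair

    // Even with peer access enabled, the copy itself is explicit:
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
}
```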

In contrast, CPU shared memory is almost completely transparent to the programmer. The OS could easily migrate your process to other chips (affinity settings notwithstanding). In fact, affinity settings were invented to prevent the OS from moving your process around.

I say "almost" completely transparent, because NUMA does exist if you really want to go there. But CPU programmers have gotten surprisingly far without ever worrying about NUMA details (unless you're the cream-of-the-crop optimizer. Its a very niche issue where most programmers simply trust the OS to do the right thing).

The software ecosystem that would support a multi-chiplet architecture, with each chiplet having an independent memory space, simply does not exist. Therein lies the problem: either we make a NUMA-like API where each chiplet has a NUMA-like memory space that the programmer has to manage, or we build a crossbar, similar to AMD's Infinity Fabric (IFOP), which transparently copies data between chips... giving the programmer the illusion that all of the memory is in the same space.

50 GB/s is sufficient for AMD's Infinity Fabric. For the same thing to happen on GPUs, NVidia has demonstrated that 300 GB/s is needed in their DGX-2 computers.

This isn't an easy problem, by any stretch of the imagination. I do imagine it will be solved eventually, but I'm greatly interested in seeing how it's done. I'm betting that NVidia will shrink their NVLink and NVSwitch system down and make it cheaper somehow.

A SIMD chiplet doesn't even make sense; SIMD needs centralized instruction dispatch and logic, and modern GPUs aren't SIMD, meaning there isn't a 64 x 32-bit wide vector register physically on the chip.

We've discussed this before, Vya. Your understanding of GPU architecture is off.


(attachment: Vega / GCN compute unit memory diagram)


AMD Vega (and all GCN processors) had the memory diagram above. The 256 VGPR registers were arranged in a 64 x 32-bit array called "SIMD 0". Vega's compute units are pretty complicated, and there are also SIMD1, SIMD2, and SIMD3, with independent instruction pointers.

The entire class of VGPRs operates in SIMD fashion, as demonstrated in chapter 6.

(attachment: Vega ISA manual excerpt)


So when you do a "v_add_f32 0, 1", all 64 values in VGPR#1 are added to all 64 values in VGPR#0, and the result is stored in VGPR#0. It's a 64-wide SIMD operation. All "v" operations on Vega are 64-wide. RDNA changed this to a 32-wide operation instead, but the concept is similar.

I'm less familiar with NVidia's architecture, but I assume a similar effect happens with their PTX instructions.
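A small CUDA sketch of the analogous effect under the usual 32-lane warp model (`vecAdd` is a made-up kernel): the source looks scalar, but each warp issues the add as one vector instruction with all lanes in lockstep, much like the 64-wide `v_add_f32` above.

```cpp
// Scalar-looking source, SIMD execution: each 32-thread warp issues this
// floating-point add as one instruction, all lanes in lockstep.
__global__ void vecAdd(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];   // one warp-wide add, 32 lanes at once
}
```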

With a GPU you are guaranteed that a CU in one chiplet does not need to communicate with a CU in another chiplet.

At a minimum, video games share textures. If Chiplet#1 has the texture for Gordon Freeman's face, but Chiplet#2 doesn't have it, how do you expect Chiplet#2 to render Gordon Freeman's face?

GPUs, as currently architected, have a unified memory space where all information is shared. Crossfire halved the effective memory because, to solve the above issue, it simply copied the texture to both GPUs (i.e., two 4 GB GPUs have a total of 4 GB of VRAM, because every piece of data is replicated between the two systems). It was a dumb and crappy solution, but it worked for the purposes of Crossfire.
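A hedged sketch of that replication in CUDA terms (`replicateTexture` and `faceTexture` are hypothetical placeholders): the same bytes are copied into every GPU's memory, which is exactly why two 4 GB cards end up holding only 4 GB of unique data.

```cpp
#include <cuda_runtime.h>

// Crossfire-style replication: the same texture is copied into every
// GPU's memory, so total *unique* VRAM does not grow with more GPUs.
void replicateTexture(const void* faceTexture, size_t bytes, int numGpus)
{
    for (int d = 0; d < numGpus; ++d) {
        cudaSetDevice(d);
        void* gpuCopy = nullptr;
        cudaMalloc(&gpuCopy, bytes);
        // Every device receives the full texture; with two 4 GB cards,
        // replication leaves only ~4 GB of distinct storage overall.
        cudaMemcpy(gpuCopy, faceTexture, bytes, cudaMemcpyHostToDevice);
    }
}
```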

This is why inter-chip communications might happen. If you want Gordon Freeman's face to be rendered on Chiplet#1 and Chiplet#2 in parallel, you need a way to share that face-texture data between the two chips. This is the approach of NVidia's NVSwitch in the DGX-2 computer.

Alternatively, you could tell the programmer that Chiplet#2 cannot render Gordon Freeman's face because the data is unavailable. This would be a NUMA-like solution (the data exists only on Chiplet#1). It's a harder programming model, but it can be done.

Or maybe a mix of the two approaches can happen. Or maybe a new system is invented in the next year or two. I dunno, but it's a problem. And I'm excited to wait and see what the GPU architects will invent to solve the problem whenever chiplets arrive.
 
Joined
Jan 8, 2017
Messages
8,924 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
In contrast, CPU shared memory is almost completely transparent to the programmer.

Which is why it's slow and why GPU kernels can run orders of magnitude faster.

We've discussed this before, Vya. Your understanding of GPU architecture is off.

SIMD "fashion" does not mean physical SIMD hardware, Terrascale was the last SIMD-like architecture which is why it also relied on VLIW to work effectively. My understanding of GPU architecture isn't off, your is simply outdated. You have to understand the way CUs work in both Nvidia and AMD hardware is analogous to SIMD but not the same, there are things that are impossible to do with regular SIMD. A CU can issue the same instructions in lock-step but on data which was addressed from multiple places indirectly, a single instruction can also generate multiple paths which can't be done in regular SIMD. For this very reason, physically there is no single "2048 bit" ALU, that would be insane, that's why they say "32 x 64", because that's how it's implemented, there are 64 separate ALUs/FPUs/etc that execute wavefronts.

Think for a moment: in Turing you can have both integer and floating-point instructions issued within the same clock cycle using that "2048-bit unit", which wouldn't be possible with a SIMD arrangement.
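A CUDA sketch of the two behaviours this post is pointing at (`gatherAndBranch` is a made-up kernel): each lane computes its own load address (a gather), and lanes branch on their own data (divergence), which is the flexibility the post argues a plain fixed-width SIMD register file doesn't give you.

```cpp
// Per-lane indirect addressing plus data-dependent branching: each
// thread ("lane") fetches through its own index and takes its own path.
__global__ void gatherAndBranch(const float* data, const int* idx,
                                float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = data[idx[i]];   // indirect, per-lane addressing (gather)
    if (v < 0.0f)             // lanes may diverge on their own data
        v = -v;
    else
        v = v * 0.5f;
    out[i] = v;
}
```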

If Chiplet#1 has the texture for Gordon Freeman's face, but Chiplet#2 doesn't have it, how do you expect Chiplet#2 to render Gordon Freeman's face?

It seems that you don't understand how any of this works at all, or you are making a colossal confusion. Chiplet #1 or #2 has no problem accessing both textures because, as you said, global memory is shared; what isn't shared is the memory that each CU has. Now, there is no reason why a CU would need to access the memory of a CU from another chiplet, because the programming model prohibits this; that's what you don't seem to understand. The premise from the beginning is that none of this stuff can happen. If you need a texture for something, why would two chiplets try to apply the same instructions to the same data? They wouldn't; at worst they would each pull the portion of the texture that they need, because that's how shaders/kernels work.

No sharing of CU memory means no synchronization across CUs, which in the case of GPUs, where you can have thousands of threads in flight, means no performance penalty from using chiplets.
 

ARF

Joined
Jan 28, 2020
Messages
3,928 (2.55/day)
Location
Ex-usa
Joined
Jan 8, 2017
Messages
8,924 (3.36/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Chiphell leaks it:

(attachment: leaked performance chart)

What the hell is "non-RTX performance"? RTX is a brand name. Maybe non-DXR? Even then, what's that supposed to include?
 