
Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

This is cool. AMD CrossFire was much easier to run than SLI. AMD already wrote the software to make it work without needing a dedicated connector. This could be huge for a dual-GPU-based system. If it's baked into the driver, it would be great.
That's not how I remember it.
CrossFire was easier to purchase, since it didn't need special (sometimes NVIDIA-only) boards.
But getting CrossFire to actually do its job and improve performance was a mess. SLI was not much better, but I remember that with AMD I had to use specific driver versions, because they'd regress per game.
 
something like 5 TB/s or higher, then yes.
Categorically not, not even CPU L1 caches are that fast a lot of the time lol.
 
Categorically not, not even CPU L1 caches are that fast a lot of the time lol.

The Ryzen shows otherwise:
10.5 TB/s read - 5.3 TB/s write - 10.0 TB/s copy.

[attached screenshot: Ryzen cache bandwidth benchmark]
 
The Ryzen
It's like one of the fastest CPUs around; a lot of CPUs out there have slower caches. It doesn't matter, though: it's still totally absurd. You do not need TB/s of inter-chip bandwidth.
 
It doesn't matter

It matters.

[attached image]

it's still totally absurd. You do not need TB/s of inter-chip bandwidth.

Bandwidth is essential. Otherwise, the chiplets won't work as expected and will fail because of low performance.

Learn about inter-GPU bandwidths.

[attached charts: inter-GPU bandwidth figures]

 
Bandwidth is essential.
This is about hypothetical inter-chip communication requirements, not simply memory access speed. Those benchmarks have literally nothing to do with "inter-GPU bandwidths"; I don't know if you even properly read and understood what you posted.

Memory access patterns on GPUs are almost always contiguous: each core reads/writes its own separate chunk of VRAM, so if you break a monolithic die into chiplets, the memory bandwidth requirements stay the same. GPU threads do not communicate with each other the way CPU cores do; they don't even have the hardware for complex synchronization beyond simple barriers, and you can't synchronize threads globally. It's simply not how these things are designed. You can really only communicate between threads on the same GPU core, and that core always accesses the same chunk of memory its memory controller is attached to. GPU cores on different chiplets would not need to access VRAM that's only reachable through a different chiplet. You're the one who needs to read more on GPU architectures.
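
To make the barrier point concrete, here's a minimal CUDA sketch (the kernel name, sizes, and buffers are all invented for illustration): threads in the same block synchronize with __syncthreads(), each block touches its own contiguous chunk of memory, and a standard kernel has no device-wide barrier (outside the special cooperative-launch mode), so combining the per-block results takes a second kernel launch or host code.

```
// Minimal sketch: block-local synchronization and contiguous access.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void blockLocalSum(const float* in, float* out, int n)
{
    // Shared memory is visible only to the threads of this block,
    // i.e. threads running on the same GPU core (SM).
    __shared__ float tile[256];

    // Contiguous, coalesced index: adjacent threads hit adjacent addresses.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;

    // Barrier across THIS block only; blocks on other SMs (or, hypothetically,
    // other chiplets) never wait on each other inside a kernel.
    __syncthreads();

    // Tree reduction inside the block's private tile (blockDim must be a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    // Each block writes its own separate result; no cross-block traffic.
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];
}

int main()
{
    const int n = 1 << 20, threads = 256, blocks = n / threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockLocalSum<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    // Combining the partial sums needs another pass, precisely because
    // there is no device-wide barrier within a kernel.
    float total = 0.0f;
    for (int i = 0; i < blocks; ++i) total += out[i];
    printf("sum = %.0f\n", total);  // expect 1048576

    cudaFree(in);
    cudaFree(out);
}
```

Each block here only ever touches its own 256-float slice, which is exactly the access pattern that would let you split the memory system across chiplets without extra cross-traffic.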

CPUs with chiplets don't need more memory bandwidth either, so this makes no sense. On CPUs it is a different matter, though: there is usually a lot of inter-thread communication, and that does pose a problem for threads on different cores, but it's more a matter of latency than bandwidth.
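
A quick host-side sketch of that latency point (plain C++, no GPU involved; the iteration count is arbitrary): two threads bounce a single 4-byte flag back and forth, so almost no data moves and bandwidth is irrelevant, yet each hop pays the full cache-coherency round trip, which is what gets worse when the two threads land on different chiplets (e.g. two Ryzen CCDs).

```
// Ping-pong microbenchmark: inter-thread communication is latency-bound.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
    constexpr int kIters = 1'000'000;
    std::atomic<int> flag{0};

    auto t0 = std::chrono::steady_clock::now();

    std::thread other([&] {
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) {}  // wait for ping
            flag.store(0, std::memory_order_release);             // pong
        }
    });

    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);                 // ping
        while (flag.load(std::memory_order_acquire) != 0) {}      // wait for pong
    }
    other.join();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    // Only ~4 bytes move per hop; the time is almost pure round-trip latency.
    printf("avg round-trip: %.1f ns\n", double(ns) / kIters);
}
```

Pin the two threads to cores on the same CCD versus different CCDs and the round-trip time changes noticeably, even though the bandwidth used is negligible.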

Btw, SLI worked in a totally different manner, so it's completely irrelevant to this subject. Each GPU stored its own copy of the VRAM contents, so every time the frame buffers were updated, they had to be copied between the cards.
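
For scale (back-of-envelope, assuming a 4K frame at 4 bytes per pixel): 3840 × 2160 × 4 B ≈ 33 MB per frame, so copying the frame buffer even 60 times a second is only about 2 GB/s, nowhere near the TB/s figures being thrown around above.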
 
I will tell you one thing: Navi 31 failed, in the same way that CrossFire failed.

but it's more a matter of latency than bandwidth

Higher bandwidth means lower latency.
So, now explain how cutting a monolithic chip into partitions improves latency?
 