
Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

This is cool. AMD CrossFire was much easier to run than SLI. AMD already wrote the software to make it work without needing a dedicated connector. This could be huge for a dual-GPU-based system. If it's baked into the driver, it would be great.
That's not how I remember it.
CrossFire was easier to purchase, since it didn't need special (sometimes NVIDIA-only) boards.
But getting CrossFire to actually do its job and improve performance was a mess. SLI was not much better, but I remember that with AMD I had to use specific driver versions, because they'd regress per game.
 
something like 5 TB/s or higher, then yes.
Categorically not, not even CPU L1 caches are that fast a lot of the time lol.
 
Categorically not, not even CPU L1 caches are that fast a lot of the time lol.

The Ryzen shows otherwise:
10.5 TB/s read - 5.3 TB/s write - 10.0 TB/s copy.

[attached screenshot: Ryzen cache bandwidth benchmark]
 
The Ryzen
It's like one of the fastest CPUs around; a lot of CPUs out there have slower caches. It doesn't matter, though: it's still totally absurd. You do not need TB/s of inter-chip bandwidth.
 
It doesn't matter

It matters.

[attached image]

it's still totally absurd. You do not need TB/s of inter-chip bandwidth.

Bandwidth is essential. Otherwise, the chiplets won't work as expected and will fail because of low performance.

Learn about inter-GPU bandwidths.

[attached charts: inter-GPU bandwidth figures]

 
Bandwidth is essential.
This is about hypothetical inter-chip communication requirements, not simply memory access speed. Those benchmarks have literally nothing to do with "inter-GPU bandwidths"; I don't know if you even properly read and understood what you posted.

Memory access patterns on GPUs are almost always contiguous: each core reads/writes its own separate chunk of VRAM, so if you break a monolithic die into chiplets, the memory bandwidth requirements stay the same. GPU threads do not communicate with each other the way CPU cores do; they don't even have the hardware for complex synchronization beyond simple barriers, and you can't synchronize threads globally. It's simply not how these things are designed. You can really only communicate between threads on the same GPU core, and that core always accesses the same chunk of memory its memory controller is attached to. GPU cores on different chiplets would not need to access VRAM that's only reachable through a different chiplet. You're the one who needs to read more on GPU architectures.
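
To make the barrier point concrete, here's a minimal CUDA sketch (the kernel name, sizes, and buffers are all invented for illustration): threads in the same block synchronize with __syncthreads(), each block touches its own contiguous chunk of memory, and a standard kernel has no device-wide barrier (outside the special cooperative-launch mode), so combining the per-block results takes a second kernel launch or host code.

```
// Minimal sketch: block-local synchronization and contiguous access.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void blockLocalSum(const float* in, float* out, int n)
{
    // Shared memory is visible only to the threads of this block,
    // i.e. threads running on the same GPU core (SM).
    __shared__ float tile[256];

    // Contiguous, coalesced index: adjacent threads hit adjacent addresses.
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;

    // Barrier across THIS block only; blocks on other SMs (or, hypothetically,
    // other chiplets) never wait on each other inside a kernel.
    __syncthreads();

    // Tree reduction inside the block's private tile (blockDim must be a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    // Each block writes its own separate result; no cross-block traffic.
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];
}

int main()
{
    const int n = 1 << 20, threads = 256, blocks = n / threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    blockLocalSum<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    // Combining the partial sums needs another pass, precisely because
    // there is no device-wide barrier within a kernel.
    float total = 0.0f;
    for (int i = 0; i < blocks; ++i) total += out[i];
    printf("sum = %.0f\n", total);  // expect 1048576

    cudaFree(in);
    cudaFree(out);
}
```

Each block here only ever touches its own 256-float slice, which is exactly the access pattern that would let you split the memory system across chiplets without extra cross-traffic.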

CPUs with chiplets don't need more memory bandwidth either, so this makes no sense. On CPUs it is a different matter, though: there is usually a lot of inter-thread communication, and that does pose a problem for threads on different cores, but it's more a matter of latency than bandwidth.
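
A quick host-side sketch of that latency point (plain C++, no GPU involved; the iteration count is arbitrary): two threads bounce a single 4-byte flag back and forth, so almost no data moves and bandwidth is irrelevant, yet each hop pays the full cache-coherency round trip, which is what gets worse when the two threads land on different chiplets (e.g. two Ryzen CCDs).

```
// Ping-pong microbenchmark: inter-thread communication is latency-bound.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
    constexpr int kIters = 1'000'000;
    std::atomic<int> flag{0};

    auto t0 = std::chrono::steady_clock::now();

    std::thread other([&] {
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) {}  // wait for ping
            flag.store(0, std::memory_order_release);             // pong
        }
    });

    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);                 // ping
        while (flag.load(std::memory_order_acquire) != 0) {}      // wait for pong
    }
    other.join();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    // Only ~4 bytes move per hop; the time is almost pure round-trip latency.
    printf("avg round-trip: %.1f ns\n", double(ns) / kIters);
}
```

Pin the two threads to cores on the same CCD versus different CCDs and the round-trip time changes noticeably, even though the bandwidth used is negligible.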

Btw, SLI worked in a totally different manner, so it's completely irrelevant to this subject. Each GPU stored its own copy of the VRAM contents, so every time the frame buffers were updated, they had to be copied between the cards.
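
For scale (back-of-envelope, assuming a 4K frame at 4 bytes per pixel): 3840 × 2160 × 4 B ≈ 33 MB per frame, so copying the frame buffer even 60 times a second is only about 2 GB/s, nowhere near the TB/s figures being thrown around above.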
 
I will tell you one thing: Navi 31 failed, in the same way that CrossFire failed.

but it's more a matter of latency than bandwidth

Higher bandwidth means lower latency.
So, now explain how cutting a monolithic chip into partitions improves latency?
 