
AMD Patents Chiplet-based GPU Design With Active Cache Bridge

Raevenlord

News Editor
Staff member
Joined
Aug 12, 2016
Messages
3,485 (2.05/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 7 1700
Motherboard MSI X370 Gaming Pro Carbon
Cooling Arctic Cooling Liquid Freezer 120
Memory 16 GB G.Skill Trident Z F4-3200 (2x 8 GB)
Video Card(s) TPU's Awesome MSI GTX 1070 Gaming X
Storage Boot: Crucial MX100 128GB; Gaming: Crucial MX 300 525GB; Storage: Samsung 1TB HDD, Toshiba 2TB HDD
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case NOX Hummer MC Black
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
AMD on April 1st published a new patent application that seems to show where its chiplet GPU design is headed. Before you say it: it's a patent application, so there's no chance of an April Fools' joke here. The new patent builds on AMD's previous one, which featured only a passive bridge connecting the various GPU chiplets and their processing resources. If you want a slightly deeper dive into what chiplets are and why they matter for the future of graphics (and computing in general), check out this article here on TPU.

The new design implements the active bridge connecting the chiplets as a last-level cache - think of it as an L3, a unifying highway of data readily exposed to all the chiplets (a three-chiplet design in this patent). It's essentially AMD's RDNA 2 Infinity Cache, though here it isn't used only as a cache (and to good effect, if the Infinity Cache design on RDNA 2 and its performance uplift are anything to go by); it also serves as an active interconnect between the GPU chiplets, allowing information to be exchanged and synchronized whenever and however required. This also allows the register file and cache to be exposed to developers as a unified block, abstracting them from having to program for a system with a three-way cache design. There are of course yield benefits to be had here as well, as with AMD's Zen chiplet designs, plus the ability to scale up performance without monolithic designs and their heavy power requirements. The integrated, active cache bridge should also help reduce latency and maintain processing coherency across chiplets.
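
To make the topology easier to picture, here's a minimal toy model of the idea - three chiplets with private L2s whose misses all fall through to one shared cache on the bridge. Everything below (class names, the write-through policy) is illustrative only, not taken from the patent:

```python
# Toy model of the patent's topology: per-chiplet L2s in front of one
# shared last-level cache on the active bridge. Purely illustrative;
# sizes, names and the write-through policy are assumptions, not AMD's design.

class ActiveBridgeLLC:
    """Shared L3 sitting on the bridge die; visible to every chiplet."""
    def __init__(self):
        self.lines = {}                          # address -> data

    def read(self, addr, memory):
        if addr not in self.lines:               # LLC miss: fetch from GDDR/HBM
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, addr, data):
        self.lines[addr] = data                  # one copy, coherent for all chiplets

class GPUChiplet:
    def __init__(self, cid, bridge):
        self.cid, self.bridge = cid, bridge
        self.l2 = {}                             # private, per-chiplet

    def load(self, addr, memory):
        if addr in self.l2:
            return self.l2[addr]                 # fast local hit
        data = self.bridge.read(addr, memory)    # miss falls through to the bridge
        self.l2[addr] = data
        return data

    def store(self, addr, data):
        self.l2[addr] = data
        self.bridge.write(addr, data)            # write-through keeps siblings coherent

memory = {0x1000: "texel"}
bridge = ActiveBridgeLLC()
chiplets = [GPUChiplet(i, bridge) for i in range(3)]   # three-chiplet design, as in the patent
chiplets[0].store(0x1000, "shaded texel")
print(chiplets[2].load(0x1000, memory))   # sees chiplet 0's write via the bridge
```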



 

Mussels

Moderprator
Staff member
Joined
Oct 6, 2004
Messages
49,485 (8.21/day)
Location
Australalalalalaia.
System Name Rainbow Sparkles
Processor Ryzen R7 5800X
Motherboard Asus x570 Gaming-F
Cooling EK 240mm RGB AIO | Custom 280mm EK loop
Memory 64GB DDR4 3600 Corsair Vengeance RGB @ 3800 C16
Video Card(s) Galax RTX 3090 SG 24GB (0.8v 1.8GHz) - EK ARGB block
Storage 1TB Samsung 970 Pro NVMe + 500GB 850 Evo
Display(s) Gigabyte G32QC + Philips 328M6FJRMB (32" 1440p 165Hz/144Hz curved)
Case Fractal Design R6
Audio Device(s) Razer Leviathan + Corsair Void pro RGB, Blue Yeti mic
Power Supply Corsair HX 750i (Platinum, fan off til 300W)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE
Software Windows 10 pro x64 (all systems)
Benchmark Scores Lots of RGB, so you know it's fast.
I'll pretend I understand this and just say "wooo progress!"
 
Joined
Oct 22, 2014
Messages
11,608 (4.91/day)
Location
Sunshine Coast
System Name Black Box
Processor Intel i5-9600KF
Motherboard NZXT N7 Z370 Black
Cooling Cooler Master 240 RGB AIO / Stock
Memory Thermaltake Toughram 16GB 4400MHz DDR4 or Gigabyte 16GB 3600MHz DDR4 or Adata 8GB 2133Mhz DDR4
Video Card(s) Asus Dual 1060 6GB
Storage Kingston A2000 512Gb NVME
Display(s) AOC 24" Freesync 1m.s. 75Hz
Case Corsair 450D High Air Flow.
Audio Device(s) No need.
Power Supply FSP Aurum 650W
Mouse Yes
Keyboard Of course
Software W10 Pro 64 bit
Joined
Jan 8, 2017
Messages
6,593 (4.25/day)
System Name Good enough
Processor AMD Ryzen R7 1700X - 4.0 Ghz / 1.350V
Motherboard ASRock B450M Pro4
Cooling Deepcool Gammaxx L240 V2
Memory 16GB - Corsair Vengeance LPX - 3333 Mhz CL16
Video Card(s) OEM Dell GTX 1080 with Kraken G12 + Water 3.0 Performer C
Storage 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) 4K Samsung TV
Case Deepcool Matrexx 70
Power Supply GPS-750C
The cache hierarchy is already something programmers don't have to deal with directly; that mechanism is hidden from you.

So more MH/s?
Not really, hashing algorithms are memory-bound, so unless you increase the memory bandwidth it's not gonna matter how many chiplets there are.
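
A quick back-of-envelope check of that claim, using Ethash's well-known ~64 × 128 B = 8 KiB of DAG traffic per hash (the bandwidth figures below are ballpark, not exact):

```python
# Back-of-envelope: Ethash reads ~64 x 128 B = 8192 B from the DAG per hash,
# so hashrate is capped by memory bandwidth, not by how many compute chiplets you add.
BYTES_PER_HASH = 64 * 128   # Ethash's per-hash DAG traffic

def max_mhs(bandwidth_gbs):
    """Bandwidth-imposed hashrate ceiling in MH/s."""
    return bandwidth_gbs * 1e9 / BYTES_PER_HASH / 1e6

for name, bw in [("RX 6800 XT (512 GB/s)", 512), ("RTX 3080 (760 GB/s)", 760)]:
    print(f"{name}: ~{max_mhs(bw):.0f} MH/s ceiling")
# Doubling the chiplet count without doubling bandwidth leaves these ceilings unchanged.
```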
 
Joined
Sep 28, 2012
Messages
633 (0.20/day)
System Name Potato PC
Processor AMD Ryzen 5 3600
Motherboard ASRock B550M Steel Legend
Cooling ID Cooling SE 224XT Basic
Memory 32GB Team Dark Alpha DDR4 3600Mhz
Video Card(s) MSI RX 5700XT Mech OC
Storage Kingston A2000 1TB + 8 TB Toshiba X300
Display(s) Mi Gaming Curved 3440x1440 144Hz
Case Cougar MG120-G
Audio Device(s) Plantronic RIG 400
Power Supply Seasonic X650 Gold
Mouse Logitech G903
Keyboard Logitech G613
Benchmark Scores Who need bench when everything already fast?
At first glance I find it quite "challenging" to feed all the cores with data; there will be scenarios where GPU cores could "starve". But there is CPU access in the schematic, maybe as a command prefetcher or just DMA. AMD already has R-BAR, so the CPU could play a big part here.

-= edited =-
Reminds me of hUMA. It all makes sense now why they are waiting to bring this to the new AM5 platform with DDR5 RAM.
 
Joined
Feb 3, 2017
Messages
2,947 (1.93/day)
Processor R5 5600X
Motherboard ASUS ROG STRIX B550-I GAMING
Cooling Alpenföhn Black Ridge
Memory 2*16GB DDR4-2666 VLP @3800
Video Card(s) Geforce RTX 3070 FE
Storage 1TB Samsung 970 Pro, 2TB Intel 660p
Display(s) ASUS PG279Q, Eizo EV2736W
Case Dan Cases A4-SFX
Power Supply Corsair SF600
Mouse Corsair Ironclaw Wireless RGB
Keyboard Corsair K60
Not really, hashing algorithms are memory-bound, so unless you increase the memory bandwidth it's not gonna matter how many chiplets there are.
Sure it matters. As long as AMD has a 4+GB caching chiplet it'll be awesome for mining :D
 
Joined
Feb 20, 2019
Messages
2,362 (3.02/day)
System Name Flavour of the month. I roll through hardware like it's not even mine (it often isn't).
Processor 3900X, 5800X, 2700U
Motherboard Aorus X570 Elite, B550 DS3H
Cooling Alphacool CPU+GPU soft-tubing loop (Laing D5 360mm+140mm), AMD Wraith Prism
Memory 32GB Patriot 3600CL17, 32GB Corsair LPX 3200CL16, 16GB HyperX 2400CL14
Video Card(s) 2070S, 5700XT, Vega10
Storage 1TB WD S100G, 2TB Adata SX8200 Pro, 1TB MX500, 500GB Hynix 2242 bastard thing, 16TB of rust + backup
Display(s) Dell SG3220 165Hz VA, Samsung 65" Q9FN 120Hz VA
Case NZXT H440NE, Silverstone GD04 (almost nothing original left inside, thanks 3D printer!)
Audio Device(s) CA DacMagic+ with Presonus Eris E5, Yamaha RX-V683 with Q Acoustics 3000-series, Sony MDR-1A
Power Supply BeQuiet StraightPower E9 680W, Corsair RM550, and a 45W Lenovo DC power brick, I guess.
Mouse G303, MX Anywhere 2, Another MX Anywhere 2.
Keyboard CM QuickFire Stealth (Cherry MX Brown), Logitech MX Keys (not Cherry MX at all)
Software W10
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Oh, okay. I think I get it.

Infinity Cache is Infinity Fabric for GPUs.

So rather than Infinity Fabric being a unified transport giving all CPU chiplets access to the memory controllers, each GPU chiplet will have a baby/pseudo memory controller that seeds data into a massive shared L3 cache for all GPU chiplets to feed off.

Neat, probably. The move to chiplets will hurt overall IPC and efficiency slightly, but it moves away from the single biggest constraint GPUs have right now: manufacturing difficulties and yields on massive monolithic dies. You only have to look at the fact that a 64C/128T Threadripper is available on a consumer/mainstream platform for the masses at $4000, whilst Intel is struggling so hard to get more than 24C into a processor that they'll charge $10-14K for the privilege and sell it only to server integrators, as it's too much of a special snowflake to work in any non-proprietary mainstream platform using a regular, unified driver model.

AMD is shitting out 80mm² scalable chiplets at fantastic yields because of the small dies with 8C/16T and craploads of cache, whilst Intel's smallest 8C/16T part is 276mm² with zero scalability and half the cache.

Using the same silicon wafer yield calculator for both, AMD gets ~696 sellable dies per wafer compared to Intel's ~161. Four times easier to make, and the smaller die size also means that 92% of AMD's output is flawless 8-core parts, whilst around 25% of Intel's output needs to be harvested to make 6-core parts or worse.

So, if you take that example alone, GPU chiplets can't come soon enough.
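
For anyone who wants to reproduce that sort of dies-per-wafer math, here's the standard back-of-envelope version. The post doesn't say which calculator or defect density it used, so the D0 below is an assumption and the exact counts will differ a bit:

```python
import math

def dies_per_wafer(die_area_mm2, wafer_d_mm=300):
    # Common gross-die approximation: usable wafer area minus edge loss.
    r = wafer_d_mm / 2
    return (math.pi * r**2 / die_area_mm2
            - math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, d0_per_cm2=0.1):
    # Simple Poisson defect model; D0 ~0.1 defects/cm^2 is an assumed mature-node value.
    return math.exp(-d0_per_cm2 * die_area_mm2 / 100)

for name, area in [("AMD 80mm² chiplet", 80), ("Intel 276mm² 8C die", 276)]:
    gross = dies_per_wafer(area)
    y = poisson_yield(area)
    print(f"{name}: {gross:.0f} gross, {y:.0%} flawless, ~{gross * y:.0f} good dies/wafer")
# Smaller dies win twice: more gross candidates per wafer AND a higher
# probability that each one comes out defect-free.
```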
 
Joined
Jul 16, 2014
Messages
4,777 (1.94/day)
Location
SE Michigan
System Name Dumbass
Processor AMD-9370BE @4.6
Motherboard ASUS SABERTOOTH 990FX R2.0 +SB950
Cooling CM Nepton 280L
Memory G.Skill Sniper 16gb DDR3 2400
Video Card(s) GreenTeam 1080 Gaming X 8GB
Storage C:\SSD (240GB), D:\Seagate (2TB), E:\Western Digital (1TB)
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Logitech G700s
Keyboard Logitech G910 Orion Spark
Software windows 10
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
Raevenlord said:
Before you say it: it's a patent application, so there's no chance of an April Fools' joke here.

So this is a delayed April Fools' article? j/k :roll: :p

I expect the patent trolls are already digging for that one line of code or whatever so they can sue.

Infinity Cache is Infinity Fabric for GPUs
Not like they can use the same name for something that serves, essentially, the same function.
 
Joined
Feb 20, 2019
Messages
2,362 (3.02/day)
System Name Flavour of the month. I roll through hardware like it's not even mine (it often isn't).
Processor 3900X, 5800X, 2700U
Motherboard Aorus X570 Elite, B550 DS3H
Cooling Alphacool CPU+GPU soft-tubing loop (Laing D5 360mm+140mm), AMD Wraith Prism
Memory 32GB Patriot 3600CL17, 32GB Corsair LPX 3200CL16, 16GB HyperX 2400CL14
Video Card(s) 2070S, 5700XT, Vega10
Storage 1TB WD S100G, 2TB Adata SX8200 Pro, 1TB MX500, 500GB Hynix 2242 bastard thing, 16TB of rust + backup
Display(s) Dell SG3220 165Hz VA, Samsung 65" Q9FN 120Hz VA
Case NZXT H440NE, Silverstone GD04 (almost nothing original left inside, thanks 3D printer!)
Audio Device(s) CA DacMagic+ with Presonus Eris E5, Yamaha RX-V683 with Q Acoustics 3000-series, Sony MDR-1A
Power Supply BeQuiet StraightPower E9 680W, Corsair RM550, and a 45W Lenovo DC power brick, I guess.
Mouse G303, MX Anywhere 2, Another MX Anywhere 2.
Keyboard CM QuickFire Stealth (Cherry MX Brown), Logitech MX Keys (not Cherry MX at all)
Software W10
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Not like they can use the same name for something that serves, essentially, the same function.
That's what I was implying, though: they don't serve the same function.
  • Infinity Fabric connects cores to memory controllers, and cores manage their cache.
  • Infinity Cache connects cache to memory controllers, and cores manage their memory controllers.
I mean, sure - they both connect things, which is the same function - but so do nails, tape, and string, yet those things are allowed to have different names? :p
 
Joined
Mar 30, 2021
Messages
10 (0.91/day)
System Name Dell Alienware Aurora R10
Processor Ryzen 5600x
Motherboard Dell 570 or B550
Cooling Alienware AIO sandwiched between two Corsair ML120 Pro's
Memory G.SKILL Ripjaws V Series 32GB cl16
Video Card(s) Radeon RX 6800 XT
Storage Western Digital WD BLACK SN750 NVMe M.2 2280 2TB
Display(s) GIGABYTE G34WQC 34" 144Hz (plus 2 Dell 19" 1280x1024 to flank it)
Case Alienware Aurora R10
Audio Device(s) onboard
Power Supply Dell 1KW
Mouse Logitech Trackman Marble
Keyboard blue glowy thingy 104-key KB
So for those of you waiting for AMD to do to nVidia what they did to Intel....

Here it is.

Sounds like RDNA 3 will be an interesting generation for sure!
 
Joined
Dec 23, 2012
Messages
1,498 (0.49/day)
Location
Somewhere Over There!
System Name COVID Bonus Build
Processor Ryzen R9 3900XT 4.5 Ghz @ 1.275V
Motherboard Asus ROG Crosshair Viii Hero Wifi
Cooling Lian Li 360 Galahad
Memory G.Skill Trident Z Neo 32gb OC @ 3733 mhz CL14-13-13-21 1T @ 1.43V
Video Card(s) Sapphire RX 6900 XT Nitro+
Storage Seagate 520 1TB + Samsung 970 Evo Plus 1TB + lots of HDD's
Display(s) Samsung Odyssey G7
Case Lian Li PC-O11D XL White
Audio Device(s) Onboard
Power Supply Super Flower Leadex SE Platinum 1000W
Mouse Logitech MX Master 2S
Keyboard ABKO K660 ARC GAMING
Software Windows 10 Pro
Benchmark Scores Have tried but can't beat the leaders :)
So more MH/s?
I don't think so. Look at the 6000 series vs. RTX 3000: RTX 3000 cards have higher memory bandwidth, and that's why they get more MH/s. Miners care about memory speed more than core speed.
 
Joined
Apr 5, 2021
Messages
5 (0.83/day)
Location
Brazil - São Paulo
System Name Windows 10 Enterprise LTSC x64 modified by me
Processor AMD A10 7800
Motherboard Gigabyte GA-F2A88XM-D3HP
Cooling Air cooled
Memory 2x HyperX 8GB DDR3
Video Card(s) Radeon RX580 8GB PowerColor
Storage SSD SanDisk 240 GB + ST 1000LM 024 HN-M101MBB SATA Disk
Display(s) 1 Samsung 18.5" LCD + 1 AOC 18.5" LCD
Case Deep Cool Tesseract
Audio Device(s) Power Amplifier made by me
Power Supply Corsair CX 750M
Mouse Microsoft Wireless mobile mouse 3500
Keyboard LogitechY-ST39
Software several
Oh, okay. I think I get it.

Infinity Cache is Infinity Fabric for GPUs.
...
So, if you take that example alone, GPU chiplets can't come soon enough.
Hello, yes, I totally agree with your reasoning.
 
Joined
Oct 12, 2005
Messages
129 (0.02/day)
The main issue with multicore/multithreaded/multi-chip designs is how you get modified data spread across the other chips. This is where the latency comes from. The L3 cache in a CPU is there for that specific role.

Let's say you modify some data. You need the updated data to be available to the other execution units. The easy way is to save it to RAM and then read it back, but this adds huge latency.

They use the L3 cache for that, which saves a lot of time, but when you have multiple L3 caches you need a mechanism that detects whether the data is in another L3 cache and then fetches it. (Very simplified explanation.)

Having it in the bridge is probably the best solution, as the bridge will be aware of all the other chiplets. But connecting it to each chiplet will add latency and reduce bandwidth. Chip design is all about compromise, making the choices that give the best performance overall.

We will see
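
A toy illustration of the trade-off described above, with all the latency numbers invented purely for the example:

```python
# Invented cycle counts, just to illustrate the post's point: a write that
# another chiplet must see can be picked up from a shared bridge-resident L3
# far faster than doing the save-to-RAM-and-read-back dance.
LAT = {"local_l2": 20, "bridge_l3": 60, "dram": 350}   # cycles, assumed

def share_via_dram():
    # Writer flushes the dirty line to RAM, reader fetches it back.
    return LAT["dram"] + LAT["dram"]

def share_via_bridge():
    # Writer spills to the bridge L3, reader hits it there.
    return LAT["bridge_l3"] + LAT["bridge_l3"]

print(f"via DRAM round-trip: {share_via_dram()} cycles")
print(f"via bridge L3:       {share_via_bridge()} cycles")
# The bridge hop is slower than a local L2 hit, but much cheaper than DRAM --
# exactly the compromise the post describes.
```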
 
Joined
Apr 30, 2011
Messages
1,833 (0.50/day)
Location
Greece
Processor AMD Ryzen 5 2600X@95W
Motherboard MSI B450 Tomahawk MAX
Cooling Deepcool Gammaxx 400 Black
Memory 2*8GB PATRIOT PVS416G373C7K@3333MT_C16
Video Card(s) Sapphire Radeon RX 5700 Pulse 8GB
Storage Sandisk SSD 120GB, INTEL 540S SSDSCKKW180H6 180GB, Samsung F1 1TB, Hitachi HUS724040ALE640 4TB
Display(s) AOC 27G2U/BK IPS 144Hz
Case SHARKOON M25-W 7.1 BLACK
Audio Device(s) Realtek 7.1 onboard
Power Supply Zalman Z550
Mouse Sharkoon SHARK Force Black
Keyboard Trust GXT280
Software Win 7 sp1 64bit/Win 10 pro 64bit
Benchmark Scores CB R15 64bit: single core 173p, multicore 1306p
So more MH/s?
AMD's new cache for RDNA2 reduced mining performance, and methinks this one isn't going to help that type of workload either...
 
Joined
Jun 3, 2010
Messages
1,746 (0.44/day)
I think AMD is going to leverage Infinity Cache to compete with Nvidia, because they have been behind in the cache bandwidth race since Maxwell.
AMD has been successively expanding chip resources, but never found the medium to express what the architecture can do unequivocally.
 
Joined
Dec 29, 2010
Messages
2,087 (0.56/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1300 G2
Mouse Logitech G600
Keyboard Corsair K95
I think AMD is going to leverage Infinity Cache to compete with Nvidia, because they have been behind in the cache bandwidth race since Maxwell.
AMD has been successively expanding chip resources, but never found the medium to express what the architecture can do unequivocally.
Huh? Did you even read the OP? This is about GPU chiplets.
 
Joined
Jan 8, 2017
Messages
6,593 (4.25/day)
System Name Good enough
Processor AMD Ryzen R7 1700X - 4.0 Ghz / 1.350V
Motherboard ASRock B450M Pro4
Cooling Deepcool Gammaxx L240 V2
Memory 16GB - Corsair Vengeance LPX - 3333 Mhz CL16
Video Card(s) OEM Dell GTX 1080 with Kraken G12 + Water 3.0 Performer C
Storage 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) 4K Samsung TV
Case Deepcool Matrexx 70
Power Supply GPS-750C
The main issue with multicore/multithreaded/multi-chip designs is how you get modified data spread across the other chips. This is where the latency comes from. The L3 cache in a CPU is there for that specific role.

Let's say you modify some data. You need the updated data to be available to the other execution units. The easy way is to save it to RAM and then read it back, but this adds huge latency.
CPU cores often need to share data; GPU cores do not, as what they execute is usually data-independent.
 
Joined
Dec 29, 2010
Messages
2,087 (0.56/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1300 G2
Mouse Logitech G600
Keyboard Corsair K95
I'll pretend I understand this and just say "wooo progress!"
The biggest issue with GPU chiplets, like SLI, is the developers. Thus AMD has to architect a way to do it seamlessly without relying on devs to make it work. And here we are, one step closer.
 
Joined
Apr 5, 2021
Messages
5 (0.83/day)
Location
Brazil - São Paulo
System Name Windows 10 Enterprise LTSC x64 modified by me
Processor AMD A10 7800
Motherboard Gigabyte GA-F2A88XM-D3HP
Cooling Air cooled
Memory 2x HyperX 8GB DDR3
Video Card(s) Radeon RX580 8GB PowerColor
Storage SSD SanDisk 240 GB + ST 1000LM 024 HN-M101MBB SATA Disk
Display(s) 1 Samsung 18.5" LCD + 1 AOC 18.5" LCD
Case Deep Cool Tesseract
Audio Device(s) Power Amplifier made by me
Power Supply Corsair CX 750M
Mouse Microsoft Wireless mobile mouse 3500
Keyboard LogitechY-ST39
Software several
The main issue with multicore/multithreaded/multi-chip designs is how you get modified data spread across the other chips. This is where the latency comes from. The L3 cache in a CPU is there for that specific role.
...
We will see
Yes, I also agree with you, but in my view this goes back to the first chips. Remember that 512KB or even 1MB of memory was also very expensive, and I think this won't change any time soon, unfortunately. On the other hand, that's the price of constant evolution that we have to pay...
 
Joined
Jan 21, 2021
Messages
8 (0.10/day)
Location
Wales, UK
Processor AMD Ryzen 7 4750G
Motherboard Gigabyte B550I AORUS PRO AX
Cooling Noctua NH-L9a Low Profile
Memory Patriot Viper Steel DDR4-4400 16GB
Video Card(s) ATI Radeon RX Vega 56 (eGPU) | Vega 8 (renoir) (iGPU)
Storage 1x Seagate Firecuda 510 500GB and 1x Samsung Evo 970 250GB
Display(s) BenQ EW3270U 4K HDR 32 inch
Power Supply Corsair SF450
Software Windows 10 Pro x64
On one of the diagrams there's an arrow going from the CPU into the SDF. It appears the CPU will have direct access to the Scalable Data Fabric (which already makes up part of the Infinity Fabric we see on Ryzen, and on Vega-onwards GPUs), which would grant the CPU the ability to read and write data to, from, and between GPU chiplets, thus connecting everything together. This MAY allow for more efficient and coherent data transfer between the CPU and the GPU chiplets, and between the GPU chiplets themselves. The new (maybe?) interconnect within the GPU chiplet is the GDF; let's call it the Graphics Data Fabric. I don't know anything about it yet, but it appears to give all the WorkGroup Processors within the GPU chiplet coherency with the Level 2 cache. Interesting glimpse into the future.
 
Joined
Oct 12, 2005
Messages
129 (0.02/day)
CPU cores often need to share data; GPU cores do not, as what they execute is usually data-independent.
This is mostly true, although less and less so, as more and more techniques reuse generated data: temporal AA, screen-space reflections, etc. This is also why SLI/Crossfire is dead; the latency to move that data was just way too big.
 
Joined
Jul 13, 2016
Messages
1,009 (0.58/day)
Processor Ryzen 3700X
Motherboard ASRock X570 Taichi
Cooling Le Grand Macho
Memory 32GB DDR4 3600 CL16
Video Card(s) EVGA 1080 Ti
Storage Too much
Display(s) Acer 144Hz 1440p IPS 27"
Case Thermaltake Core X9
Audio Device(s) JDS labs The Element II, Dan Clark Audio Aeon II
Power Supply EVGA 850w P2
Mouse G305
Keyboard iGK64 w/ 30n optical switches
Oh, okay. I think I get it.

Infinity Cache is Infinity Fabric for GPUs.
...
So, if you take that example alone, GPU chiplets can't come soon enough.

Yes, bouncing data around between the dies will increase latency, but that's easily mitigated by keeping the data for each job within the die it's being worked on.
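
One way to picture that mitigation (a hypothetical scheduler, nothing from the patent): carve the frame into tiles and pin each tile to one chiplet, so its working set stays hot in that chiplet's local cache:

```python
# Hypothetical locality-aware split: each chiplet owns a contiguous band of
# screen tiles, so a tile's textures/geometry stay hot in that chiplet's own
# L2 and only the shared bridge cache sees cross-chiplet traffic.
def assign_tiles(width_tiles, height_tiles, n_chiplets):
    owner = {}
    band = height_tiles / n_chiplets          # rows of tiles per chiplet
    for ty in range(height_tiles):
        for tx in range(width_tiles):
            owner[(tx, ty)] = min(int(ty // band), n_chiplets - 1)
    return owner

# ~1920x1080 in 64px tiles, split across the patent's three chiplets
owner = assign_tiles(width_tiles=30, height_tiles=17, n_chiplets=3)
print(owner[(0, 0)], owner[(0, 8)], owner[(0, 16)])   # -> 0 1 2
```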
 
Joined
Mar 10, 2010
Messages
8,761 (2.16/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R7 3800X@4.350/525/ Intel 8750H
Motherboard Crosshair hero7 @bios 2703/?
Cooling 360EK extreme rad+ 360$EK slim all push, cpu Monoblock Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in two sticks./16Gb
Video Card(s) Sapphire refference Rx vega 64 EK waterblocked/Rtx 2060
Storage Silicon power qlc nvmex3 in raid 0/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd
Display(s) Samsung UAE28"850R 4k freesync, LG 49" 4K 60hz ,Oculus
Case Lianli p0-11 dynamic
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Oh, okay. I think I get it.

Infinity Cache is Infinity Fabric for GPUs.
...
So, if you take that example alone, GPU chiplets can't come soon enough.
While I agree with most of your points, I do think you're wrong on efficiency and IPC, because researchers (not AMD; I can't recall who, but including some at Nvidia) have already proven that chiplets can be both more efficient and give higher IPC. Forget researchers, even: AMD themselves proved it with the Zen architecture.
 