
AMD Patents Chiplet-based GPU Design With Active Cache Bridge

AMD may be experimenting with ways to separate the processing cores, built on the latest node they can get their hands on, from the cache. The cache could be built on a second-best process: GlobalFoundries' 12 nm now, something like TSMC 7 nm later. SRAM doesn't scale well with node shrinks - at least its area doesn't; I don't know about performance and power. So the cache is possibly a good candidate for being offloaded to a cheaper die. The latency would obviously go up, but maintaining cache coherence would be an easier task, the higher latency can be mitigated with increased size, and AMD needs to keep buying something from GloFo anyway.
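For illustration, a rough back-of-envelope average-memory-access-time model of that trade-off (all latencies and hit rates below are made-up assumptions, not figures from the patent):

```python
# Back-of-envelope AMAT (average memory access time) model.
# All latencies and hit rates are illustrative assumptions, not patent figures.

def amat(hit_latency_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """AMAT = hit latency + miss rate * miss penalty."""
    return hit_latency_ns + miss_rate * miss_penalty_ns

# Hypothetical small on-die cache: fast hits, but misses more often.
on_die = amat(hit_latency_ns=2.0, miss_rate=0.40, miss_penalty_ns=80.0)

# Hypothetical large off-die cache on a cheaper node: slower hits,
# but big enough to push the miss rate down.
off_die = amat(hit_latency_ns=6.0, miss_rate=0.15, miss_penalty_ns=80.0)

print(f"on-die cache AMAT:  {on_die:.1f} ns")   # 34.0 ns
print(f"off-die cache AMAT: {off_die:.1f} ns")  # 18.0 ns
```

Under those assumed numbers, the bigger, slower off-die cache still comes out ahead because the cut in miss rate outweighs the extra hit latency.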
 
The biggest gains will be in clock speed: with multiple clock domains across multiple chiplets, each can be engineered for IPC, clock speed, and/or latency as required.

Imagine 4 chiplets with 4 GHz boost speeds, a 2 GHz cache that is massively parallel with compression technology, and a couple of tiny chiplets for video encode/decode and low-power applications.

Now add on the stacked-die tech that has been developed to create a parallel pipeline for pure vector math for ray tracing, stacked on each of the four main chiplets and able to read and write the caches on the primary die. Ray tracing with the only performance penalties being extra heat and a fraction of the latency.
 
Possibly a glimpse of RDNA4's future; I doubt we'll see this in RDNA3. Most likely it will go up against Hopper, which was delayed and replaced by Lovelace for next gen.
 
Doesn't it say MCM adds as much as +1 GHz?

Correct. By separating the CPU cores onto a separate die, you gain the ability to further bin which CPU die ends up in which CPU. This is how AMD is able to have its 16-core 5950X consume less power than its 12-core part despite the extra cores. The 5950X is about 28% more power efficient than other Ryzen 5000 series CPUs through binning alone. AMD likely decided to go for efficiency instead of extra clocks for two reasons: 1) Intel doesn't have anything competitive to its 12- and 16-core mainstream CPUs, and 2) power consumption goes up much faster above the sweet spot. Increasing the GHz would improve ST performance but at a cost. AMD likely calculated that, given Intel's current prospects, it would be better to focus on efficiency.
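A toy dynamic-power model shows why the last few hundred MHz get so expensive; the voltage/frequency slope and baseline below are assumptions, not measured Ryzen data:

```python
# Toy model: dynamic power ~ C * V^2 * f, with voltage rising roughly
# linearly with frequency near the top of the curve.
# The 0.15 V/GHz slope and 4.0 GHz / 1.0 V baseline are assumptions.

def relative_power(freq_ghz, base_freq=4.0, base_volt=1.0, volt_per_ghz=0.15):
    volt = base_volt + volt_per_ghz * (freq_ghz - base_freq)
    return (volt ** 2) * freq_ghz / ((base_volt ** 2) * base_freq)

for f in (4.0, 4.4, 4.8):
    print(f"{f:.1f} GHz: {f / 4.0:.2f}x clock for {relative_power(f):.2f}x power")
```

Under those assumptions, the last 20% of clock costs roughly 50% more power, which is the sweet-spot argument in a nutshell.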
 
This is mostly true, although it's becoming less and less true as more and more techniques reuse previously generated data. This is also why SLI/Crossfire is dead: the latency of moving that data was just way too big. Temporal AA, screen-space reflections, etc.
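Rough arithmetic on why temporal techniques hurt alternate/split-frame multi-GPU: each GPU needs the other's previously rendered data every frame. The buffer format, link bandwidth, and frame rate below are assumptions:

```python
# Cost of shipping a 4K history buffer between GPUs every frame.
# Buffer format, link bandwidth, and frame rate are assumed for illustration.

width, height = 3840, 2160
bytes_per_pixel = 8                       # e.g. an RGBA16F history buffer
buffer_mb = width * height * bytes_per_pixel / 1e6

link_gb_per_s = 32                        # assumed inter-GPU link bandwidth
transfer_ms = buffer_mb / (link_gb_per_s * 1000) * 1000

frame_budget_ms = 1000 / 144              # 144 fps target
print(f"history buffer: {buffer_mb:.0f} MB")
print(f"transfer time:  {transfer_ms:.2f} ms of a {frame_budget_ms:.2f} ms frame budget")
```

Roughly 2 ms of a ~7 ms frame just moving one buffer, before any synchronization overhead - and that's only one of the resources modern techniques reuse.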

Can't you have one chiplet dealing with frame/scene level calculations after you've powered through the more easily parallelizable tasks? As in 1 Bigger (perhaps on the hub chip to reduce latency to the cache) + N Small(er)?
 
Increasing the GHz would improve ST performance but at a cost.
You're approaching this from a CPU standpoint. On a GPU, ST performance isn't the only factor; internal bandwidth is a major one. A GPU has a lot of total bandwidth, but bandwidth per CU takes a lot of effort to leverage fully, since the memory is external to the chip. Running the chip faster helps with that.
Bets: 3.5 GHz GPUs over the horizon, or not?
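Some back-of-envelope numbers on bandwidth per CU (roughly RDNA2-class figures, assumed rather than taken from the patent):

```python
# How many FLOPs each CU must do per byte fetched from external memory
# just to stay busy. All figures are rough RDNA2-class assumptions.

cus = 80
mem_bw_gb_s = 512                        # total external memory bandwidth
bw_per_cu = mem_bw_gb_s / cus            # average GB/s available per CU

lanes, clock_ghz = 64, 2.0
gflops_per_cu = lanes * 2 * clock_ghz    # FMA counts as 2 ops per lane per clock

print(f"bandwidth per CU: {bw_per_cu:.1f} GB/s")
print(f"compute per CU:   {gflops_per_cu:.0f} GFLOP/s")
print(f"FLOPs needed per byte of external traffic: {gflops_per_cu / bw_per_cu:.0f}")
```

That ~40:1 ratio is why big on-die caches and high internal clocks matter: external bandwidth alone can't keep the CUs fed.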
 
You're approaching this from a CPU standpoint. On a GPU, ST performance isn't the only factor; internal bandwidth is a major one. A GPU has a lot of total bandwidth, but bandwidth per CU takes a lot of effort to leverage fully, since the memory is external to the chip. Running the chip faster helps with that.
Bets: 3.5 GHz GPUs over the horizon, or not?
I'd say it's equally possible that we see MCM GPU architectures that simply target the frequency sweet spot and spend any extra power budget adding more cores, cache, etc. It really depends though; for all we know, AMD or Nvidia could design their GPU chiplets to clock very high, and then the sweet spot would follow suit. I'm not knowledgeable enough on the topic to say to what extent Nvidia/AMD and TSMC can influence the ideal GPU clock speed through design and node.
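A toy comparison of the two options under a fixed power budget (the CU counts, clocks, and per-CU power figure are all assumptions):

```python
# Throughput scales ~ CUs * clock, while per-CU power scales roughly with
# clock^3 near the top of the V/f curve. All numbers are assumptions.

def perf(cus, ghz):
    return cus * ghz                      # arbitrary throughput units

def power_w(cus, ghz, watts_per_cu_at_2ghz=2.0):
    return cus * watts_per_cu_at_2ghz * (ghz / 2.0) ** 3

for cus, ghz in [(80, 2.6), (120, 2.0)]:  # clock it high vs. add more CUs
    print(f"{cus} CUs @ {ghz} GHz: perf {perf(cus, ghz):.0f}, ~{power_w(cus, ghz):.0f} W")
```

In this toy model the wider, lower-clocked part is both faster and cheaper on power, which is the sweet-spot case; a design tuned to clock high would shift those curves.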
 
I'd say it's equally possible that we see MCM GPU architectures that simply target the frequency sweet spot and spend any extra power budget adding more cores, cache, etc. It really depends though; for all we know, AMD or Nvidia could design their GPU chiplets to clock very high, and then the sweet spot would follow suit. I'm not knowledgeable enough on the topic to say to what extent Nvidia/AMD and TSMC can influence the ideal GPU clock speed through design and node.
Me neither, although some would consider me an old timer.
GPUs do get associated with high frequency because the power cost is already paid for. Remember the Hawaii series? AMD never integrated tiled ('binned') rasterization until Vega, so the memory interface never slowed down since it was always running in immediate mode, whereas Nvidia can keep tabs on it at various memory clocks.
Running the caches faster could improve utilization if the shaders issue requests at a higher rate - GPUs are throughput oriented, after all...
 
Imagine 4 chiplets with 4 GHz boost speeds

It's going to be a long time before we see that, if ever. Every kind of chip seems to start scaling horribly past the 3 GHz mark, and a GPU in particular would be horrendous efficiency-wise at those kinds of speeds.
 
So for those of you waiting for AMD to do to nVidia what they did to Intel....

Here it is.

Sounds like RDNA 3 will be an interesting generation for sure!
What they did to Intel? You mean, as soon as they got competitive, they also became both more expensive and hard to get in the first place - what a fantastic prospect for the already beleaguered graphics cards market indeed!
 
What they did to Intel? You mean, as soon as they got competitive, they also became both more expensive and hard to get in the first place - what a fantastic prospect for the already beleaguered graphics cards market indeed!
You DO realize that this is market forces at work right?

Demand has outstripped supply so badly that even though TSMC is running FLAT OUT, they still cannot keep up!
They're now spending 100 BILLION DOLLARS over the next three years to build more fabs so they can deal with the demand.

Then you have people snapping them up within milliseconds with their bots, so fast that you cannot buy them through normal channels, making a bad situation even worse.
But hey, they do it because they can make 25 to 50% profit selling on eBay and through the gray market.

AMD made the decision to focus on supplying computer manufacturers rather than direct retailers like Newegg and Amazon.
I just got a 6800xt and 5600x from Dell.
Placed my order, waited a month and here it is! AND I got both for what appears to be MSRP or close to it.


Be sure you are looking at the BIG PICTURE before lambasting people and companies for things that are out of their control.
 
Every kind of chip seems to start scaling horribly past the 3 GHz mark, and a GPU in particular would be horrendous efficiency-wise at those kinds of speeds.
This could bring a split multiplier to run the internal caches faster than the GPU core. Don't dismiss it: the scaling isn't linear because memory is external and doesn't feed the GPU pipeline directly, whereas GPU (and cache) speed does. Nothing outside of cache speed changes that (maybe texture caching, too).
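A minimal sketch of what a split multiplier could buy: on-die SRAM bandwidth scales with its own clock, independently of external memory (the bank count, width, and clocks below are assumptions):

```python
# Internal cache bandwidth = banks * bytes per bank per clock * clock.
# Bank count, width, and clocks are assumed for illustration.

def sram_bw_gb_s(banks, bytes_per_bank_per_clock, clock_ghz):
    return banks * bytes_per_bank_per_clock * clock_ghz

core_clock = 2.0
for cache_clock in (core_clock, core_clock * 1.5):
    bw = sram_bw_gb_s(banks=16, bytes_per_bank_per_clock=64, clock_ghz=cache_clock)
    print(f"cache @ {cache_clock:.1f} GHz: {bw:.0f} GB/s internal bandwidth")
```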
 
This could bring a split multiplier to run the internal caches faster than the GPU core. Don't dismiss it: the scaling isn't linear because memory is external and doesn't feed the GPU pipeline directly, whereas GPU (and cache) speed does. Nothing outside of cache speed changes that (maybe texture caching, too).

Caches are power hogs with very high energy density per unit area; for that reason they usually run slower than the processor itself. The only portions of memory that run as fast as the processor are the registers; everything else, including the L1 caches, typically runs slower.
 
Caches are power hogs with very high energy density per unit area; for that reason they usually run slower than the processor itself. The only portions of memory that run as fast as the processor are the registers; everything else, including the L1 caches, typically runs slower.
Well, guess what consumes power at an even higher rate than the caches - the memory devices. The futility of trying to save power by cutting the effective cache rate is self-explanatory. There is a way that uses buffering to reduce accesses to memory, and texture caching to supplant memory with SRAM. That ties into the actual data flow across the die, whereas the memory devices don't solve any bottlenecks; they are the last level.
I'm not well versed enough, but there is no free lunch. SRAM offers much more than its substitutes.
 
Well, guess what consumes power at an even higher rate than the caches - the memory devices. The futility of trying to save power by cutting the effective cache rate is self-explanatory. There is a way that uses buffering to reduce accesses to memory, and texture caching to supplant memory with SRAM. That ties into the actual data flow across the die, whereas the memory devices don't solve any bottlenecks; they are the last level.
I'm not well versed enough, but there is no free lunch. SRAM offers much more than its substitutes.
Yes, access to global memory is very inefficient power-wise, and cache hits improve that. But the problem is that caches live on the die, need to be cooled, and eat away at the power budget of the chip.



Remember how the Infinity Cache is placed around the CUs rather than between them, as you might expect? I think it was a deliberate choice to place this huge chunk of cache at the extremities of the chip to reduce hot spots.
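Some order-of-magnitude energy-per-byte figures of the kind commonly cited for recent nodes; treat them as assumptions rather than measurements:

```python
# Rough energy cost of moving data at different levels of the hierarchy.
# The pJ/byte figures are order-of-magnitude assumptions, not measurements.

ENERGY_PJ_PER_BYTE = {
    "register file":            1,
    "large on-die SRAM cache": 10,
    "off-chip GDDR/DRAM":     300,
}

traffic_gb = 100                          # hypothetical data movement per second
for level, pj in ENERGY_PJ_PER_BYTE.items():
    watts = traffic_gb * 1e9 * pj * 1e-12
    print(f"{level:24s} ~{pj:3d} pJ/B -> ~{watts:5.1f} W for {traffic_gb} GB/s of traffic")
```

So a cache hit can be an order of magnitude cheaper in energy than going out to DRAM, but, as you say, all of that energy is now dissipated on the die itself, which is the thermal argument for pushing the cache out to the edges (or onto its own die).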
 
So for those of you waiting for AMD to do to nVidia what they did to Intel....

Here it is.

Sounds like RDNA 3 will be an interesting generation for sure!
Didn't they say they'll take this chiplet approach on CDNA first and not RDNA?

they also became both more expensive and hard to get in the first place
Wasn't the case until Zen 3 and this chipocalypse... Zen 2 swept the floor with Intel and it was a real market disruptor.

All AMD did was force Intel to get off their ass and make reasonable products at a more reasonable price, and even force down the price on their 10th gens, which is always good for everyone. If it weren't for them, I wouldn't have a 12-core in my system right now and would probably have to make do with 6 cores from Intel, like my old 8700.

Now, if they could make Ngreedia do the same, that'd be great... but I'm not having high hopes here. Unlike Intel, NVIDIA has never been sleeping. They are a worthy competitor to AMD. We'll see how this approach works on CDNA first - doubt the next RDNA gen will have this. Maybe the one after.
 
Yes, access to global memory is very inefficient power-wise, and cache hits improve that. But the problem is that caches live on the die, need to be cooled, and eat away at the power budget of the chip.

Remember how the Infinity Cache is placed around the CUs rather than between them, as you might expect? I think it was a deliberate choice to place this huge chunk of cache at the extremities of the chip to reduce hot spots.
Thanks for citing fancy references. I agree with most points, but I think we are being repetitive.
 