From what I can see of this 3D cache and the results, it can have a fairly dramatic effect on the 1% lows, and that's especially evident at lower resolutions. How that all translates with Infinity Cache and GPU upscaling should be neat as well. In fact I think GPU upscaling is only due to get better in future GPU generations, so this 3D stacked cache should help even further in the next generation of GPUs. Beyond upscaling, GPU tech like variable rate shading and/or mesh shading can bring down some of the peak on-demand bandwidth within scenes too, which helps this overall cache design, because smaller chunks of data that fit within the cache and don't have to be fetched from slower system memory are much more desirable for overall performance. Individual frames up to 96MB, or a touch below it, will be able to fit within the cache, while on another CPU with a smaller L3 cache that wouldn't be possible, and that's a big gain to overall latency across many frames. This chip could open up a lot of improvements to post-process techniques that otherwise might be more taxing on the CPU side.
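To put some rough numbers on the "fits in cache" part, here's a quick back-of-the-envelope calc. The bytes-per-pixel figure is just my assumption for a fairly heavy deferred setup (G-buffer + depth + HDR color), so treat it as illustrative, not a measurement:

```python
# Rough per-frame render target footprint vs. the 96MB L3 of a 5800X3D
# and the 64MB L3 of a 5950X. BYTES_PER_PIXEL is an assumption, not a
# measurement; real engines vary a lot.
L3_X3D = 96 * 1024**2
L3_5950X = 64 * 1024**2
BYTES_PER_PIXEL = 24  # assumed heavy deferred G-buffer + depth + HDR color

resolutions = {"720p": (1280, 720), "1080p": (1920, 1080),
               "1440p": (2560, 1440), "4K": (3840, 2160)}

for name, (w, h) in resolutions.items():
    size = w * h * BYTES_PER_PIXEL
    print(f"{name}: ~{size / 1024**2:6.1f} MB | "
          f"fits 96MB L3: {size <= L3_X3D} | fits 64MB L3: {size <= L3_5950X}")
```

Obviously the GPU has its own caches for the actual render targets; the point is just to give a feel for why working sets under 96MB are the sweet spot for this kind of stacked cache.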
Something else to mention is NTFS compression. I stumbled upon a review the other day at Igor's Lab that had some ATTO Disk Benchmark results for an NVMe drive on a 5950X CPU.
It was an NVMe review, but I don't see ATTO Disk Benchmark used very often in general, and I noticed a 5950X was used for the test system. The way that ties in with the results is right in line with what I'd suspected, but I hadn't seen anything to really verify it on a more capable system with a larger L3 cache. If you look at the results, they top out right at the 64MB mark, which is exactly the size of the 5950X's L3 cache. From the results it appears Igor didn't use NTFS compression, which I believe is the right call for an NVMe benchmark so as not to skew results. If you were to compress the test file with NTFS compression at Windows' highest compression setting and allocation unit size, though, read performance would improve dramatically right up to a 64MB I/O and file size, and beyond that it would drop off sharply as it starts fetching from slower system memory.
In essence the L3 cache serves as a bit of a dynamic RAM disk for anything at or below the L3 cache size. I guess in the case of PrimoCache for block-level caching it would do similarly with the block-level chunk sizes, and it's probably a bigger deal for older, slower mechanical drives. Still, a 96MB chunk size in the case of a 5800X3D is great for a mechanical drive and heavily alleviates its biggest drawback, and similarly for a 64MB chunk size with a 5950X.
View attachment 243363
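If anyone wants to poke at the "dynamic RAM disk" idea themselves, here's a rough sketch of the kind of test I mean. It's Windows-only, it leans on compact.exe's LZX mode as my guess at what the "highest" NTFS compression setting would be, and the repeated reads mostly hit the OS file cache in RAM, so any L3 benefit just shows up as the under-L3 file sustaining better throughput. Treat it as illustrative rather than conclusive:

```python
import os, subprocess, time

L3_BYTES = 64 * 1024**2  # 5950X L3 size; use 96 * 1024**2 for a 5800X3D

def make_file(path, size, chunk=8 * 1024**2):
    # Highly compressible content so the NTFS/LZX step has something to chew on.
    with open(path, "wb") as f:
        written = 0
        while written < size:
            step = min(chunk, size - written)
            f.write(b"A" * step)
            written += step

def best_throughput(path, passes=10):
    # Best-of-N sequential read throughput in bytes/second.
    size = os.path.getsize(path)
    best = 0.0
    for _ in range(passes):
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(1024**2):
                pass
        best = max(best, size / (time.perf_counter() - start))
    return best

for label, size in [("under_L3", L3_BYTES // 2), ("over_L3", L3_BYTES * 4)]:
    path = f"cache_test_{label}.bin"
    make_file(path, size)
    # compact.exe's LZX algorithm is the strongest file compression Windows
    # exposes; assuming that's what "highest NTFS compression" means here.
    subprocess.run(["compact", "/c", "/exe:lzx", path], capture_output=True)
    print(f"{label}: {size / 1024**2:.0f} MB -> {best_throughput(path) / 1024**3:.2f} GB/s")
    os.remove(path)
```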
How it translates to games is interesting: anything 96MB or below, compressed or uncompressed, will be very quick at low latency. The larger cache enables quicker access to bigger files directly from the L3, bypassing the additional latency of slower system memory. The CPU can fit a larger image in the L3 cache at or below 96MB, compressed or uncompressed, without having to even touch system memory. It also allows larger data sets for mesh shading/variable rate shading and upscaling, and general game data, including audio, at or below 96MB without having to access higher-latency system memory. Just imagine how those 768MB L3 cache EPYC parts behave in certain scenarios. Things are going to get really interesting in the coming years as more L3 cache is made available and at more consumer-friendly price levels.
I made a post on the prospects of what AMD could do with its take on big.LITTLE about a week or two ago. What AMD could possibly do is utilize OS processor scheduling and assign foreground/background work to individual chiplets in the same manner. They could have your highly parallel chiplet and another chiplet that has fewer cores but uses much of the remaining die area for a somewhat larger L2 cache and a 3D stacked L3 cache. Both of those caches could have TSVs to connect and share them with the higher core count parallel chiplet as well. It might be a bifurcated, segmented assignment between the two chiplets in a 25%/75% split, fixed in terms of which gets the larger swath of L3 cache, or perhaps a neutral, balanced 50%/50% split. AMD would probably want to work in tandem with Microsoft a little on how that can be done and operate, but it seems like it would work nicely. The foreground/background CPUs might also get a +1 to +2 / -1 to -2 adjustment to the boost multiplier depending on foreground/background, while neutral perhaps doesn't adjust it.
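Something along those lines can already be roughly approximated in user space today with CPU affinity, so here's a sketch of the idea. The CCD-to-logical-CPU mapping and the PIDs below are placeholders; the real layout depends on the SKU and SMT:

```python
import psutil

# Hypothetical mapping for a 16-core, 2-CCD part with SMT:
# logical CPUs 0-15 on the "cache" chiplet, 16-31 on the high-frequency chiplet.
CACHE_CCD = list(range(0, 16))
FREQ_CCD = list(range(16, 32))

def pin_to_ccd(pid, ccd_cpus):
    """Restrict a process to one chiplet - roughly what an OS-level
    foreground/background scheduler hint could do automatically."""
    p = psutil.Process(pid)
    p.cpu_affinity(ccd_cpus)
    return p.cpu_affinity()

# Example usage (PIDs are placeholders): pin a "foreground" game to the
# cache CCD and a "background" encode job to the other chiplet.
# pin_to_ccd(game_pid, CACHE_CCD)
# pin_to_ccd(encode_pid, FREQ_CCD)
```

The difference with a real scheduler hook is that the OS could swap those assignments on the fly as focus changes, rather than leaving it to the user.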
If they wanted, two BCLKs might even be possible, assigning a separate one to each chiplet for efficiency reasons and/or the silicon lottery, and letting the BIOS set each chiplet up with its own. The BIOS could sync them or make them both dynamic per chiplet. That could actually even allow you to mix different RAM speed kits together, using the faster RAM kit for the foreground chiplet. It would work equally well for performance and efficiency.
What I find interesting in the 5800X3D results is the 1% low percentile figures. How this chip performs at 720p is indicative of where things are headed more and more with GPU technology as a whole. It'll tie in nicely with Infinity Cache as well, and with NTFS compression and GPU upscaling from 720p to higher resolution points. It'll obviously help in turn for 1080p and upscaling as well, but it will be more pronounced at lower resolutions in particular, for now at least. Give it some time, however, and with better GPU compression they might eke a touch more out of it. I definitely anticipate even better upscaling in the coming years, and this cache will be able to readily make good use of it. How well we'll be able to upscale 720p upward in the next GPU architecture is something to look forward to. I look forward to seeing different example cases of where the cache makes a difference. I also wonder whether it's something that would affect how RAID scaling performance tapers off or not.