Thursday, May 15th 2025

AMD "Zen 7" Rumors: Three Core Classes, 2 MB L2, 7 MB V‑Cache, and TSMC A14 Node
AMD is already looking ahead to its Zen 7 generation and is planning the final details for its next generation of Zen IP. The first hints come from YouTuber "Moore's Law Is Dead," who points to a few interesting decisions. AMD plans to extend the multi‑class core strategy that began with Zen 4c and continued into Zen 5. Zen 7 will reportedly include three types of cores: the familiar performance cores, dense cores built for maximum throughput, and a new low‑power variant aimed at energy‑efficient tasks, much like Intel's LP/E-cores. The leak even mentions unspecified "PT" and "3D" cores. By swapping out pipeline modules and tweaking internal libraries, AMD can fine‑tune each core so it performs best in its intended role, from running virtual machines in the cloud to handling AI workloads at the network edge.
On the manufacturing front, Zen 7 compute chiplets (CCDs) are expected to be made on TSMC's A14 process, which will now include a backside power delivery network. This was initially slated for the N2 node but got shifted to the A16/A14 line. The 3D V‑Cache SRAM chiplets underneath the CCDs will remain on TSMC's N4 node. It is a conservative choice, since TSMC has talked up using N2‑based chiplets for stacked memory in advanced packaging, but AMD appears to be playing it safe. Cache sizes should grow, too. Each core will get 2 MB of L2 cache instead of the current 1 MB, and L3 cache per core could expand to 7 MB through stacked V‑Cache slices. Standard CCDs without V‑Cache will still have around 32 MB of shared L3. A bold rumor suggests an EPYC model could feature 33 cores per CCD, totaling 264 cores across eight CCDs. Zen 7 tape‑out is planned for late 2026 or early 2027, and we probably won't see products on shelves until 2028 or later. As always with early-stage plans, take these details with a healthy dose of skepticism. The final Zen 7 lineup could look quite different once AMD locks down its roadmap.
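As a quick sanity check, the rumored per-core cache figures and the 33-cores-per-CCD claim can be multiplied out (all inputs below are rumors from the leak, not confirmed specs):

```python
# Toy arithmetic on the rumored Zen 7 figures (rumors, not confirmed specs).
L2_PER_CORE_MB = 2    # rumored, up from 1 MB on Zen 5
L3_PER_CORE_MB = 7    # rumored maximum with stacked V-Cache
CORES_PER_CCD = 33    # rumored dense-core EPYC CCD
CCDS = 8

total_cores = CORES_PER_CCD * CCDS
total_l2_mb = total_cores * L2_PER_CORE_MB
total_l3_mb = total_cores * L3_PER_CORE_MB

print(f"{total_cores} cores, {total_l2_mb} MB L2, up to {total_l3_mb} MB L3")
# 264 cores, 528 MB L2, up to 1848 MB L3
```

That matches the 264-core figure in the rumor and shows why stacked V-Cache would be needed to reach the claimed L3 capacity.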
Sources:
Moore's Law Is Dead, via HardwareLuxx
113 Comments on AMD "Zen 7" Rumors: Three Core Classes, 2 MB L2, 7 MB V‑Cache, and TSMC A14 Node
I cannot see them realistically moving away from the current chiplet designs, due to the ease of scaling and the ability to grade dies between consumer and enterprise with ease.
Currently, AMD needs to bulk-order three things to cover the majority of its product stack: all of enterprise and HEDT sit under one IO die, all of consumer desktop sits under another, and both share a common CCD design. You can add Zen "c" CCDs to cater to specific enterprise designs, but those customers are willing to pay top money for such parts, so they are profit makers by comparison. Then there is mobile/specialist, the majority falling under Strix Point in this generation.
Adding another CCD design removes all the benefits of scale, and the new CCD design would be considerably larger, leading to both higher cost and a higher defect rate (increasing costs further).
For context, a Zen 5 CCD is ~71 mm², Strix Point is ~178 mm², and Strix Halo is massive in comparison at ~307 mm².
I thought about reworking the current CCD design to cut out some non-essential stuff, but unless you are willing to remove ALL GPU context from the IO die and go back to the days where your CPU was purely a CPU and couldn't give you a screen out for diagnostics or extra display output, there isn't really much to remove. All the "non-essential" aspects are either required by the standards expected to be used (audio DSP, etc.) or are "expected" by users in normal use cases (USB-C monitors/dongles requiring all the USB functionality on top of the display IO).
IF you did sacrifice the GPU aspect, there is a fair amount of IO die space that could be reused for more memory controllers or, more realistically, additional PCIe lanes. At that point, I would argue that AMD could push ALL of the misc IO into the chipset dies and either grant them a dedicated x4 link per chipset die (versus the daisy chain they use currently), or even widen the connections to x8 and make them capable of supporting high-speed networking (10GbE) or hosting multiple NVMe drives for bulk storage at a PCIe speed one grade lower.
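The x8-downlink idea above can be checked with rough bandwidth arithmetic, using the usual approximate per-lane PCIe throughput after encoding overhead (the link layout is hypothetical; the per-lane rates are published figures):

```python
# Rough bandwidth check for a hypothetical x8 chipset downlink at a
# lower PCIe generation. Per-lane rates in GB/s, after encoding overhead.
PER_LANE_GBPS = {3: 0.985, 4: 1.969, 5: 3.938}

def link_bw(gen: int, lanes: int) -> float:
    """Approximate usable bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

ten_gbe = 10 / 8  # a 10 Gb/s NIC needs 1.25 GB/s

x8_gen3 = link_bw(3, 8)  # ~7.9 GB/s
print(f"Gen3 x8 uplink: {x8_gen3:.1f} GB/s; 10GbE needs {ten_gbe:.2f} GB/s, "
      f"leaving ~{x8_gen3 - ten_gbe:.1f} GB/s for NVMe")
```

So even one generation behind, an x8 link has room for a 10GbE NIC plus a couple of NVMe drives before saturating.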
Currently, an Intel Core 3 N355 (the latest E-core design) is comparable to somewhere between a 9th and 11th Gen core at similar wattage if you use things like PassMark to compare them, but comparing "E-cores" only against other core types is pretty awkward.
So sure, nowhere near as good as 15th Gen P-cores, but not slow enough to prevent Discord/YouTube/a malware scan in the background from running while the P-cores do their thing on the really intense app in use. The problem was the way Intel utilised them and the ensuing bad rap they got from the initial deployments.
www.techpowerup.com/330317/intel-nx50-series-twin-lake-pure-e-core-processor-line-powered-by-skymont-surfaces
With 2x CCDs, one of them should be full fat cores (with 3D Cache, preferably), and the other should be fully compact cores.
No scheduling nonsense necessary, as the cores would all have the same instruction set support, and CPPC would be reporting which cores are faster.
Now, if Windows scheduler could handle it...
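The "CPPC reports which cores are faster" point can be sketched as a toy placement policy that simply prefers the highest-ranked idle core (the core IDs, rankings, and thread names below are invented for illustration; real OS schedulers are far more involved):

```python
# Toy scheduler sketch: assign runnable threads to the highest-performance
# idle cores, using a CPPC-style per-core ranking (higher = faster).
# All values below are invented for illustration, not real CPPC data.
cppc_rank = {0: 255, 1: 250, 2: 240, 3: 235,   # "full-fat" CCD
             4: 140, 5: 138, 6: 135, 7: 130}   # "compact" CCD

def place(threads: list) -> dict:
    """Map each thread to the fastest still-idle core, in order."""
    idle = sorted(cppc_rank, key=cppc_rank.get, reverse=True)
    return {t: idle[i] for i, t in enumerate(threads)}

print(place(["game_main", "render", "audio", "background_scan"]))
# demanding threads land on cores 0-3 (the fast CCD) first
```

Since both CCDs would support the same instruction set, a ranking like this is all the scheduler would need, which is the point being made above.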
Having a few ultra low powered cores in the IO Die, exclusively for OS background tasks, could really make sense.
Thanks to the cost of synchronization, it will never be feasible for every little interaction in a program to spawn dozens of worker threads, as that would just create more latency; it's only when you do a larger batch job that takes several seconds or more that you get this nice, nearly linear scaling with more cores. And what are the other threads doing? Various async helper tasks, etc. So having enough cores is important, but having faster cores will always be more important for user-interactive workloads. (Otherwise we'd all just buy old 60-core Xeons on eBay…) And if you dove deep into how multithreaded programming works, you'd also see that having faster cores with more consistent performance is actually key to increasing the multithreaded scaling in an application. :)

No, E-cores are extremely weak, as they share resources, so with very front-end-heavy workloads they are nowhere close.
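The scaling argument above is essentially Amdahl's law: the serial fraction of a workload caps the benefit of adding cores, which is why faster cores keep mattering. A quick toy calculation (the 80% parallel fraction is an arbitrary example) makes the diminishing returns concrete:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n) for a workload with
# parallel fraction p running on n cores. p = 0.8 is an arbitrary example.
def speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

for n in (4, 16, 64):
    print(f"{n} cores: {speedup(0.8, n):.2f}x")
# 4 cores: 2.50x, 16 cores: 4.00x, 64 cores: 4.71x -- diminishing returns
```

Going from 16 to 64 cores barely helps here, while a 20% faster core speeds up the serial fraction too.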
Please don't use big words like IPC when you don't even know what it means. It has never meant performance per clock; it's instructions per clock. IPC is just one of several factors. Another often-overlooked factor is ISA improvements, e.g. AVX and APX, but also smaller additions. These often achieve a new level of performance with fewer instructions, and it's completely nonsensical to estimate "IPC" across workloads with different instructions.
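The point that IPC alone doesn't determine performance can be put as simple arithmetic: execution time depends on instruction count, IPC, and clock, and an ISA extension like AVX can cut the instruction count. The numbers below are invented purely for illustration:

```python
# Execution time = instructions / (instructions-per-cycle * cycles-per-second).
# All figures are made up to illustrate the point, not measured.
def exec_time(instructions: float, ipc: float, ghz: float) -> float:
    return instructions / (ipc * ghz * 1e9)

scalar = exec_time(8e9, 4.0, 5.0)  # 8B scalar instructions at IPC 4
vector = exec_time(1e9, 3.0, 5.0)  # same work in 1B wide-SIMD instructions at IPC 3
print(f"scalar: {scalar:.3f}s, vector: {vector:.3f}s")
# the version with LOWER measured IPC finishes 6x faster
```

So a core can post a lower IPC number yet be much faster, which is why comparing "IPC" across workloads with different instruction mixes is meaningless.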
Their backend and overall memory subsystem is what makes them "weak".