Thursday, May 15th 2025

AMD "Zen 7" Rumors: Three Core Classes, 2 MB L2, 7 MB V‑Cache, and TSMC A14 Node

AMD is already looking ahead to its Zen 7 generation and is planning the final details for its next generation of Zen IP. The first hints come from YouTuber "Moore's Law Is Dead," who points to a few interesting decisions. AMD plans to extend the multi‑class core strategy that began with Zen 4c and continued into Zen 5. Zen 7 will reportedly include three types of cores: the familiar performance cores, dense cores built for maximum throughput, and a new low‑power variant aimed at energy‑efficient tasks, much like Intel's LP/E-cores. There are even mentions of unspecified "PT" and "3D" cores. By swapping out pipeline modules and tweaking their internal libraries, AMD can fine‑tune each core so it performs best in its intended role, from running virtual machines in the cloud to handling AI workloads at the network edge.

On the manufacturing front, Zen 7 compute chiplets (CCDs) are expected to be made on TSMC's A14 process, which will include a backside power delivery network. This feature was initially slated for the N2 node but got shifted to the A16/A14 line. The 3D V‑Cache SRAM chiplets underneath the CCDs will remain on TSMC's N4 node. It is a conservative choice: TSMC has talked up using N2‑based chiplets for stacked memory in advanced packaging, but AMD appears to be playing it safe. Cache sizes should grow, too. Each core will get 2 MB of L2 cache instead of the current 1 MB, and L3 cache per core could expand to 7 MB through stacked V‑Cache slices. Standard CCDs without V‑Cache will still have around 32 MB of shared L3. A bold rumor suggests an EPYC model could feature 33 cores per CCD, for a total of 264 cores across eight CCDs. Zen 7 tape‑out is planned for late 2026 or early 2027, and we probably won't see products on shelves until 2028 or later. As always with early-stage plans, take these details with a healthy dose of skepticism. The final Zen 7 lineup could look quite different once AMD locks down its roadmap.
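For a rough sense of scale, the rumored figures are easy to add up; here is a quick back-of-the-envelope sketch (every input below is a rumored value from this article, not a confirmed spec):

```go
package main

import "fmt"

func main() {
	// All figures below are rumored values, not confirmed specs.
	const (
		ccds        = 8  // rumored EPYC configuration
		coresPerCCD = 33 // rumored core count per CCD
		l2PerCoreMB = 2  // rumored L2 per core
		l3BaseMB    = 32 // rumored shared L3 per standard CCD (no V-Cache)
	)

	totalCores := ccds * coresPerCCD
	totalL2 := totalCores * l2PerCoreMB
	totalL3 := ccds * l3BaseMB

	fmt.Printf("Total cores: %d\n", totalCores)               // 264
	fmt.Printf("Total L2:    %d MB\n", totalL2)               // 528 MB
	fmt.Printf("Base L3:     %d MB (no V-Cache)\n", totalL3)  // 256 MB
}
```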
Sources: Moore's Law Is Dead, via HardwareLuxx

113 Comments on AMD "Zen 7" Rumors: Three Core Classes, 2 MB L2, 7 MB V‑Cache, and TSMC A14 Node

#101
Panther_Seraphin
Nhonho*SNIP*
This is pretty much just Strix Point with the TDP increased to desktop levels.

I cannot see them realistically moving away from the current chiplet designs, due to the ease of scalability and the benefit of being able to grade dies between consumer and enterprise.

Currently, AMD needs to bulk order three things to cover the majority of its product stack: all of enterprise and HEDT sit under one IO die, all of consumer desktop sits under another IO die, and both share a common CCD design. You can add Zen Xc CCDs to cater for specific enterprise designs, but those customers are willing to pay top money for such parts, so they are profit makers in comparison. Then there is mobile/specialist, the majority falling under Strix Point this generation.

Adding another CCD design removes all the benefits of scale, and the new CCD design would be considerably larger, leading to both higher cost and a higher defect rate (increasing costs further).

For context, a Zen 5 CCD is ~71 mm², Strix Point is ~178 mm², and Strix Halo is massive in comparison at 307 mm².

I thought about reworking the current CCD design to cut out some non-essential stuff, but unless you are willing to remove ALL GPU context from the IO die and go back to the days where your CPU was purely a CPU and could not give you a screen out for diagnostics or extra display output, there isn't really much to remove. All the "non-essential" aspects are either required by the standards expected to be used (audio DSP etc.) or are "expected" by users in normal use cases (USB-C monitors/dongles requiring all the USB functionality on top of the display IO).

IF you did sacrifice the GPU aspect, there is a fair amount of IO die space usable for more memory controllers or, more realistically, additional PCIe lanes. At that point I would argue that AMD could push ALL of the misc IO into the chipset dies and either grant them a dedicated x4 link per chipset die (vs the daisy chain they use currently) or even widen the connections to x8, making them capable of supporting high-speed networking (10 GbE) or hosting multiple NVMe drives for bulk storage at a PCIe speed one grade lower.
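To put rough numbers on that idea, here is a quick bandwidth sketch (the per-lane figures are the standard PCIe rates; the uplink width and device mix are hypothetical):

```go
package main

import "fmt"

func main() {
	// Approximate usable bandwidth per PCIe lane (GB/s), after 128b/130b encoding.
	laneGBs := map[string]float64{
		"PCIe 3.0": 0.985,
		"PCIe 4.0": 1.969,
	}

	// Hypothetical chipset uplink: a dedicated Gen4 x8 link per chipset die.
	uplink := laneGBs["PCIe 4.0"] * 8

	// Hypothetical downstream devices hanging off that chipset die.
	tenGbE := 10.0 / 8.0                  // 10 GbE NIC ~= 1.25 GB/s
	nvmeGen3x4 := laneGBs["PCIe 3.0"] * 4 // one "grade lower" Gen3 x4 NVMe drive

	fmt.Printf("Gen4 x8 uplink:     %.2f GB/s\n", uplink)
	fmt.Printf("10 GbE NIC:         %.2f GB/s\n", tenGbE)
	fmt.Printf("Gen3 x4 NVMe:       %.2f GB/s\n", nvmeGen3x4)
	fmt.Printf("Two NVMe + 10 GbE:  %.2f GB/s (fits under the uplink)\n",
		2*nvmeGen3x4+tenGbE)
}
```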
#102
rattlehead99
NhonhoDesktop applications are poorly optimized for multicore processing or cannot be optimized for more than 1 core or thread. They require cores with very high IPC.

Pile up "E-cores" on a desktop CPU is a dumb idea.

The ideal has always been to develop cores with very high IPC so they perform tasks as quickly as possible and then enter a low power consumption state.
Intel's new E-cores are on par with the old P-cores in terms of PPC (performance per clock, what you call IPC), and Darkmont will be on par with Lion Cove (Arrow Lake P-cores) per clock.
#103
Nhonho
rattlehead99Intel's new E-cores are on par with the old P-cores in terms of PPC (performance per clock, what you call IPC)
Show us.
#104
Panther_Seraphin
NhonhoShow us.
I personally disagree with P/E cores and want as much raw performance as I can get my hands on, but Intel was in such dire straits they had nothing else they could do.

Currently, an Intel Core 3 N355 (the latest E-core design) is comparable to somewhere between a 9th- and 11th-Gen core at similar wattage if you use things like PassMark to compare them, but trying to compare "E cores" only vs other core types is pretty awkward.

So sure, nowhere near as good as 15th-Gen P-cores, but not slow enough to prevent Discord/YouTube/a malware scan in the background from running while the P-cores do their thing on the really intense app in use. The problem was the way Intel utilised them and the ensuing bad rap they got from the initial deployments.
#105
rattlehead99
Panther_SeraphinI personally disagree with P/E cores and want as much raw performance as I can get my hands on, but Intel was in such dire straits they had nothing else they could do.

Currently, an Intel Core 3 N355 (the latest E-core design) is comparable to somewhere between a 9th- and 11th-Gen core at similar wattage if you use things like PassMark to compare them, but trying to compare "E cores" only vs other core types is pretty awkward.

So sure, nowhere near as good as 15th-Gen P-cores, but not slow enough to prevent Discord/YouTube/a malware scan in the background from running while the P-cores do their thing on the really intense app in use. The problem was the way Intel utilised them and the ensuing bad rap they got from the initial deployments.
Doesn't the Core 3 N355 have Crestmont cores and NOT Skymont?
#107
OkieDan
I'd be fine with a 12-core X3D CCD and ~24 little cores, to make it simple for the Windows scheduler. It should be simpler than an x950X3D, where the non-X3D CCD is higher frequency than the X3D CCD anyhow.
#108
outlw6669
OkieDanI'd be fine with a 12-core X3D CCD and ~24 little cores, to make it simple for the Windows scheduler. It should be simpler than an x950X3D, where the non-X3D CCD is higher frequency than the X3D CCD anyhow.
I have been thinking something similar as well.

With 2x CCDs, one of them should be full fat cores (with 3D Cache, preferably), and the other should be fully compact cores.
No scheduling nonsense necessary, as the cores would all have the same instruction set support, and CPPC would be reporting which cores are faster.

Now, if Windows scheduler could handle it...
Having a few ultra low powered cores in the IO Die, exclusively for OS background tasks, could really make sense.
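On Linux, the per-core performance ranking that CPPC provides is already exposed through sysfs; a minimal sketch that reads it (assumes a Linux system whose firmware exposes the acpi_cppc interface; the paths may be absent elsewhere):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	// Each CPU directory exposes its ACPI CPPC "highest_perf" value; faster
	// (or preferred) cores report higher numbers, which is what the
	// scheduler can use to rank cores.
	paths, err := filepath.Glob("/sys/devices/system/cpu/cpu[0-9]*/acpi_cppc/highest_perf")
	if err != nil || len(paths) == 0 {
		fmt.Println("no acpi_cppc sysfs entries found (needs a CPPC-capable system)")
		return
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		fmt.Printf("%s: %s\n", p, strings.TrimSpace(string(data)))
	}
}
```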
#111
rattlehead99
sLowEndThat article is wrong. Twin Lake is just Alder Lake-N with a clock speed bump, so still Gracemont
Damn, I want an E-core-only Skymont CPU. 8 of them.
#112
efikkan
NhonhoDesktop applications are poorly optimized for multicore processing or cannot be optimized for more than 1 core or thread. They require cores with very high IPC.

The ideal has always been to develop cores with very high IPC so they perform tasks as quickly as possible and then enter a low power consumption state.
This is a common misunderstanding. Programs aren't either "single-threaded" or scaling almost "infinitely" with more cores; when we talk about "single-threaded performance," you should think of it as peak performance per core. Pretty much all programs today use multiple threads, but the load across them varies.

Thanks to the cost of synchronization, it will never be feasible for every little interaction in a program to spawn dozens of worker threads, as that would just create more latency, so it's only when you do a larger batch job that takes several seconds or more that you get this nice, nearly linear scaling with more cores. And what are the other threads doing? Various async helper tasks etc. So having enough cores is important, but having faster cores will always be more important for user-interactive workloads. (Otherwise we'd all just buy old 60-core Xeons on eBay…) And if you dove deep into how multithreaded programming works, you'd also see that having faster cores with more consistent performance is actually a key to increasing the multithreaded scaling in an application. :)
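That synchronization-cost argument is easy to demonstrate; a minimal Go sketch (workload sizes are invented for illustration) comparing many tiny tasks run inline versus one spawned worker per task:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tinyTask stands in for a small piece of interactive work,
// far too short to be worth distributing across workers.
func tinyTask(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s += i
	}
	return s
}

func main() {
	const iterations = 10000
	const workSize = 100 // deliberately tiny per-task workload

	// Inline: one core runs all the tiny tasks back to back.
	start := time.Now()
	for i := 0; i < iterations; i++ {
		tinyTask(workSize)
	}
	inline := time.Since(start)

	// Fan-out: spawn a goroutine (and synchronize) per tiny task.
	start = time.Now()
	var wg sync.WaitGroup
	for i := 0; i < iterations; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tinyTask(workSize)
		}()
	}
	wg.Wait()
	fanOut := time.Since(start)

	fmt.Printf("inline:  %v\nfan-out: %v\n", inline, fanOut)
	// On typical hardware the fan-out version is slower: the spawn and
	// synchronization overhead dwarfs the tiny amount of useful work.
}
```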
rattlehead99Intel's new E-cores are on par with the old P-cores in terms of PPC (performance per clock, what you call IPC), and Darkmont will be on par with Lion Cove (Arrow Lake P-cores) per clock.
No, E-cores are extremely weak, as they share resources, so with very front-end-heavy workloads they are nowhere close.
Please don't use big words like IPC when you don't even know what it means. It has never meant performance per clock; it's instructions per clock. IPC is just one of several factors. Another often-overlooked factor is ISA improvements, e.g. AVX and APX, but also smaller additions. These often achieve a new level of performance with fewer instructions, and it's completely nonsensical to estimate "IPC" across workloads with different instructions.
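That ISA point can be made concrete with a toy model of execution time (all numbers invented): a vectorized build can finish sooner even while retiring fewer instructions per clock:

```go
package main

import "fmt"

// A toy model of execution time: time = instructions / (IPC * frequency).
// The figures below are made up purely to illustrate the point.
func seconds(instructions, ipc, ghz float64) float64 {
	return instructions / (ipc * ghz * 1e9)
}

func main() {
	// The same hypothetical workload compiled two ways:
	scalarInst := 8e9 // scalar code path: more, narrower instructions
	avxInst := 2e9    // vectorized (e.g. AVX) path: fewer, wider instructions

	// Suppose the vector path even retires fewer instructions per clock.
	fmt.Printf("scalar: %.2f s\n", seconds(scalarInst, 4.0, 5.0)) // 0.40 s
	fmt.Printf("vector: %.2f s\n", seconds(avxInst, 2.5, 5.0))    // 0.16 s
	// The "lower-IPC" vector run finishes first, which is why comparing
	// raw IPC across different instruction mixes is meaningless.
}
```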
#113
igormp
efikkanNo, E-cores are extremely weak, as they share resources, so with very front-end-heavy workloads they are nowhere close.
I kinda disagree a bit on that. The overall instruction dispatch performance is a bit poor since it lacks a µop cache or anything of the sort, along with the short-ish µop queue, but the decoder architecture is pretty impressive and still delivers really respectable performance even with those limitations taken into account.
Their backend and overall memory subsystem is what makes them "weak".