
AMD "Zen 7" Rumors: Three Core Classes, 2 MB L2, 7 MB V‑Cache, and TSMC A14 Node

Vermeer had chiplets and Alder Lake was monolithic

As I have already mentioned here, to achieve higher IPC easily, AMD should make its desktop CPUs with the memory controller (IMC) on the same die as the x86 cores, and make another, cheaper die for the other simpler components (USB, SATA, and other legacy interfaces).

Something like this: [attached diagram]

This is pretty much just Strix Point with the TDP increased to desktop levels.

I cannot see them realistically moving away from the current chiplet designs, due to the ease of scalability and the benefit of being able to grade dies between consumer and enterprise parts.

Currently AMD needs to bulk-order three things to cover the majority of their product stack: all of enterprise and HEDT sits under one IO die, all of consumer desktop sits under another IO die, and both share a common CCD design. You can add dense "c-core" CCDs to cater to specific enterprise designs, but those customers are willing to pay top money for such parts, so they are profit makers in comparison. Then there is mobile/specialist, the majority of which falls under Strix Point this generation.

Adding another CCD design removes all the benefits of scale, and the new CCD would be considerably larger, leading to both higher cost and a higher defect rate (increasing costs further).

For context, a Zen 5 CCD is ~71 mm², Strix Point is ~178 mm², and Strix Halo is massive in comparison at ~307 mm².
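The larger-die-costs-more point follows directly from standard yield math. A minimal sketch using a simple Poisson yield model; the defect density here is an assumed illustrative figure, not a published TSMC number:

```python
import math

def poisson_yield(die_area_mm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defect_density_per_cm2 * area_cm2)

D = 0.1  # assumed defects per cm^2, purely for illustration
for name, area in [("Zen 5 CCD", 71), ("Strix Point", 178), ("Strix Halo", 307)]:
    print(f"{name}: {area} mm^2 -> {poisson_yield(area, D):.1%} good dies")
# -> roughly 93.1%, 83.7%, 73.6% respectively
```

The small CCD loses only a few percent of dies to defects, while a monolithic die the size of Strix Halo throws away over a quarter of its wafer area at the same defect density.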




I thought about reworking the current CCD design to cut out some non-essential stuff, but unless you are willing to remove ALL GPU context from the IO die and go back to the days where your CPU was purely a CPU and could not give you a screen output for diagnostics/extra display output, there isn't much to remove. Realistically, all the "non-essential" aspects are either required by the standards expected to be supported (audio DSP etc.) or "expected" by users in normal use (USB-C monitors/dongles requiring all the USB functionality on top of the display IO).

IF you did sacrifice the GPU aspect, there is a fair amount of IO-die space that could be reused for more memory controllers or, more realistically, additional PCIe lanes. At that point I would argue that AMD could push ALL of the misc IO into the chipset dies and grant each of them a dedicated x4 link (versus the daisy chain they use currently), or even widen the connection to x8 and make them capable of supporting high-speed networking (10 GbE) or hosting multiple NVMe drives for bulk storage at a PCIe speed one grade lower.
 
Desktop applications are poorly optimized for multicore processing, or cannot be optimized for more than one core or thread. They require cores with very high IPC.

Piling up "E-cores" on a desktop CPU is a dumb idea.

The ideal has always been to develop cores with very high IPC, so they perform tasks as quickly as possible and then enter a low-power state.
Intel's new E-cores are on par with the old P-cores in terms of PPC, performance per clock (what you call IPC); Darkmont will be on par with Lion Cove (Arrow Lake P-cores) per clock.
 
I personally disagree with P/E cores and want as much raw performance as I can get my hands on, but Intel was in such dire straits they had nothing else they could do.

Currently an Intel Core 3 N355 (the latest E-core design) is comparable to somewhere between a 9th and 11th Gen core at similar wattage, if you use things like PassMark to compare them, though comparing "E cores" on their own against other core types is pretty awkward.

So sure, nowhere near as good as 15th Gen P cores, but not slow enough to prevent Discord/YouTube/a malware scan in the background from running while the P cores do their thing on the really intense app in use. The problem was the way Intel utilised them, and the ensuing bad rap they got from the initial deployments.
 
I personally disagree with P/E cores and want as much raw performance as I can get my hands on, but Intel was in such dire straits they had nothing else they could do.

Currently an Intel Core 3 N355 (the latest E-core design) is comparable to somewhere between a 9th and 11th Gen core at similar wattage, if you use things like PassMark to compare them, though comparing "E cores" on their own against other core types is pretty awkward.

So sure, nowhere near as good as 15th Gen P cores, but not slow enough to prevent Discord/YouTube/a malware scan in the background from running while the P cores do their thing on the really intense app in use. The problem was the way Intel utilised them, and the ensuing bad rap they got from the initial deployments.
Doesn't the Core 3 N355 have Crestmont cores and NOT Skymont?
 
Doesn't the Core 3 N355 have Crestmont cores and NOT Skymont?
From what I have read they are using Skymont in the 3xx

 
I'd be fine with a 12-core X3D CCD and ~24 little cores, to keep it simple for the Windows scheduler. It should be simpler than an x950X3D, where the non-X3D CCD runs at a higher frequency than the X3D CCD anyhow.
 
I'd be fine with a 12-core X3D CCD and ~24 little cores, to keep it simple for the Windows scheduler. It should be simpler than an x950X3D, where the non-X3D CCD runs at a higher frequency than the X3D CCD anyhow.
I have been thinking something similar as well.

With 2x CCDs, one of them should be full-fat cores (with 3D cache, preferably), and the other fully compact cores.
No scheduling nonsense necessary, as the cores would all have the same instruction set support, and CPPC would be reporting which cores are faster.

Now, if Windows scheduler could handle it...
Having a few ultra low powered cores in the IO Die, exclusively for OS background tasks, could really make sense.
 
From what I have read they are using Skymont in the 3xx

Would like to see how it performs.
 
From what I have read they are using Skymont in the 3xx

That article is wrong. Twin Lake is just Alder Lake-N with a clock speed bump, so still Gracemont
 
Desktop applications are poorly optimized for multicore processing, or cannot be optimized for more than one core or thread. They require cores with very high IPC.

The ideal has always been to develop cores with very high IPC, so they perform tasks as quickly as possible and then enter a low-power state.
This is a common misunderstanding. Programs aren't either "single-threaded" or scaling almost "infinitely" with more cores; when we talk about "single-threaded performance", you should think of it as peak performance per core. Pretty much all programs today use multiple threads, but the load across them varies.

Thanks to the cost of synchronization, it will never be feasible for every little interaction in a program to spawn dozens of worker threads, as that would just add latency; it's only when you run a larger batch job taking several seconds or more that you get nearly linear scaling with more cores. And what are the other threads doing? Various async helper tasks etc. So having enough cores is important, but having faster cores will always be more important for user-interactive workloads. (Otherwise we'd all just buy old 60-core Xeons on eBay…) And if you dove deep into how multithreaded programming works, you'd also see that having faster cores with more consistent performance is actually key to increasing the multithreaded scaling of an application. :)
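The intuition that more cores only pay off for large, mostly-parallel batch jobs can be put in numbers with Amdahl's law. This is a standard illustration with made-up fractions, not figures from the post:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup of a workload where only `parallel_fraction` scales with cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# An interactive task that is only 50% parallelizable barely benefits from
# many cores, while a long batch job (95% parallel) scales much better.
for p in (0.50, 0.95):
    for n in (2, 8, 64):
        print(f"p={p:.2f}, {n:2d} cores -> {amdahl_speedup(p, n):.2f}x")
```

A 50%-parallel task caps out below 2x no matter how many cores you throw at it (about 1.78x at 8 cores), which is exactly why per-core speed dominates for interactive work.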

Intel's new E-Cores are on par with the old P-Cores in terms of PPC - performance per clock(what you call IPC), Darkmonth will be on par with Lion Cove(Arrow Lake P-Cores) per clock.
No, E-cores are extremely weak, as they share resources, so with very front-end-heavy workloads they are nowhere close.
Please don't use big words like IPC when you don't even know what it means. It has never meant performance per clock; it's instructions per clock. IPC is just one of several factors. Another often-overlooked factor is ISA improvements, e.g. AVX and APX, but also smaller additions. These often achieve a new level of performance with fewer instructions, and it's completely nonsensical to estimate "IPC" across workloads with different instructions.
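The point about IPC being only one factor falls out of the classic performance equation, time = (instructions / IPC) / clock. A sketch with hypothetical numbers: code using a wider ISA (e.g. AVX) can finish the same task faster at a *lower* IPC, because it needs far fewer instructions, which is why cross-workload IPC comparisons break down:

```python
def runtime_seconds(instructions: int, ipc: float, clock_hz: float) -> float:
    """time = (instructions / IPC) / clock, the classic CPU performance equation."""
    return (instructions / ipc) / clock_hz

# Hypothetical instruction counts for the same task, scalar vs. SIMD build.
t_scalar = runtime_seconds(1_000_000, ipc=4.0, clock_hz=5e9)  # 50 microseconds
t_simd   = runtime_seconds(  300_000, ipc=3.0, clock_hz=5e9)  # 20 microseconds
print(f"scalar: {t_scalar*1e6:.0f} us, SIMD: {t_simd*1e6:.0f} us")
```

The SIMD build retires fewer instructions per clock yet finishes 2.5x faster, so quoting its "IPC" against the scalar build's tells you nothing useful.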
 