Wednesday, May 8th 2024

Core Configurations of Intel Core Ultra 200 "Arrow Lake-S" Desktop Processors Surface

Intel is giving its next-generation desktop processor lineup the Core Ultra 200 series model numbering. We detailed the processor numbering in an older report. The Core Ultra 200 series would be the company's first desktop processors with AI capabilities, thanks to an integrated 50 TOPS-class NPU. At the heart of these processors is the "Arrow Lake" microarchitecture. Its development is the reason the company had to refresh "Raptor Lake" to cover its 2023-24 processor lineup. The company's "Meteor Lake" microarchitecture topped out at a CPU core count of 6P+8E, which would have been a generational regression in multithreaded application performance from "Raptor Lake." The new "Arrow Lake-S" desktop processor has a maximum CPU core configuration of 8P+16E, which means consumers can expect at least the same core counts at given price points to carry over.

According to a report by Chinese tech publication BenchLife.info, the introduction of "Arrow Lake" would see Intel's desktop processor model numbering align with its mobile processor numbering, and incorporate the Core Ultra brand to denote the latest microarchitecture for a given processor generation. Since "Arrow Lake" is a generation ahead of "Meteor Lake," processor models in the series get numbered under the Core Ultra 200 series.

Intel will likely debut the lineup with overclocker-friendly K and KF SKUs. The lineup is led by the Core Ultra 9 285K (and possibly the 285KF), which comes with an 8P+16E core configuration, a processor base power value of 125 W, and a maximum P-core boost frequency of 5.50 GHz. This is followed by the Core Ultra 7 265K (and 265KF), with an 8P+12E core configuration; and the Core Ultra 5 245K, with a 6P+8E core configuration.

There are also some 65 W non-K models in the middle, although these don't have similar processor model numbers to the K/KF parts. There's the Core Ultra 9 275 (8P+16E, 65 W); the Core Ultra 7 255 (8P+12E, 65 W); and the Core Ultra 5 240 (6P+4E, 65 W).
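
For quick comparison, the leaked configurations above can be tabulated in a few lines of Python. This is only a sketch built from the figures in this report: the `Sku` class and its field names are made up for illustration, base power is left as `None` where the report does not state it, and the `p_core_smt` switch is an assumption reflecting the unconfirmed rumor, discussed in the comments, that the P-cores drop Hyper-Threading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sku:
    """One leaked processor model (illustrative container, not an Intel spec)."""
    name: str
    p_cores: int
    e_cores: int
    base_power_w: Optional[int]  # None where the report gives no figure

    @property
    def cores(self) -> int:
        return self.p_cores + self.e_cores

    def threads(self, p_core_smt: bool = False) -> int:
        # E-cores run one thread each; P-cores run two only with SMT/HT.
        return self.p_cores * (2 if p_core_smt else 1) + self.e_cores


# Core counts and power values exactly as listed in the report.
LEAKED = [
    Sku("Core Ultra 9 285K", 8, 16, 125),
    Sku("Core Ultra 7 265K", 8, 12, None),
    Sku("Core Ultra 5 245K", 6, 8, None),
    Sku("Core Ultra 9 275", 8, 16, 65),
    Sku("Core Ultra 7 255", 8, 12, 65),
    Sku("Core Ultra 5 240", 6, 4, 65),
]

if __name__ == "__main__":
    for sku in LEAKED:
        power = f"{sku.base_power_w} W" if sku.base_power_w is not None else "n/a"
        print(f"{sku.name:<18} {sku.p_cores}P+{sku.e_cores}E "
              f"cores={sku.cores:<2} threads(no SMT)={sku.threads():<2} base={power}")
```

With `p_core_smt=True`, an 8P+16E part works out to 32 threads, the same as the Core i9-14900K; without it, 24.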

"Arrow Lake" is a chiplet-based processor, just like "Meteor Lake." Its compute tile, the piece of silicon with the CPU cores, packs up to 8 "Lion Cove" performance cores (P-cores), and up to 16 "Skymont" efficiency cores (E-cores). The processor is also expected to feature a 50 TOPS-class NPU for on-device AI acceleration, and a truncated version of the Xe-LPG iGPU the company is using with "Meteor Lake," which could be branded differently from the Arc Graphics branding Intel is using on the Core Ultra 100 series mobile chips. "Arrow Lake" is also expected to debut a new CPU socket on the desktop platform, the LGA1851, with more I/O capabilities than the LGA1700 and "Raptor Lake."
Sources: BenchLife, VideoCardz

101 Comments on Core Configurations of Intel Core Ultra 200 "Arrow Lake-S" Desktop Processors Surface

#76
efikkan
As always, I'm a bit skeptical about leaked clock speeds for a simple reason: nobody knows the final clock speeds until the final stepping exists in significant volume, which leaves two logical possibilities:
a) The CPUs are ready for "imminent" release (within the next couple of months)
or
b) This is yet another fake leak
Please keep this in mind.
Zyll Goliat: Hmmm... So let me get this straight: no more Hyper-Threading. The U9 will have "only" 8 performance cores but 16 efficient cores. There are rumors around the net that we could expect IPC improvements of 5% to 15% for the P-cores, though some people claim the improvements will be much bigger for the E-cores. Then again, the U9 285 has 24 threads in total compared to the i9-14900K's 32 threads... hmmm, is it going to be better in multithreaded apps at all?
You are raising excellent questions, which no one can answer until we get a deep-dive into actual finalized products, but I can still point out a few important aspects most people miss;
Firstly, given Arrow Lake is presumably a very different microarchitecture, we don't know its performance characteristics at all. Even if we got confirmed IPC figures, base clocks and boost clocks, amount of cache etc., they would only give us an idea of the overall performance, and still tell us very little about whether this is an all-round excellent performer, only excels in computationally heavy (but logically simple) SIMD, or is very good at mixed loads but not at heavy SIMD. It may very well end up as a stellar performer in synthetic or very specific benchmarks and just a modest upgrade in real-world tasks; only time will tell. When it comes to E-cores, those are already mostly a gimmick. They serve two purposes: to make the specs look nice, like having 20 cores at >5 GHz in 65 W (the big PC vendors love this), and to make certain benchmarks like Cinebench look good (which have little or no relevance for end-users).

We also need to keep in mind that when they do (presumably) larger architectural overhauls there might actually be areas with significant downsides too, especially with the "first iteration", so be mentally prepared for that, and don't completely dismiss large advancements in some areas if there are some regressions too. Additionally, regardless of IPC and rated clock speeds, the microarchitecture and the node ultimately decide what performance will be achieved in specific workloads. Contrary to popular belief, IPC is an average number of instructions per clock, not performance in itself. Plus, the node and the microarchitecture might allow the CPU to run a specific workload at a higher sustained real clock speed than a competitor with similar or even higher "IPC". This was the case back with Zen 2 vs. Coffee Lake/Comet Lake, where in many multithreaded workloads Zen 2 achieved much higher actual clock speeds, while the Skylake family throttled heavily despite higher IPC, resulting in lower performance for Intel. And IPC estimates based on rated clock speeds are useless, as rated clock speeds on current CPUs are mostly a gimmick anyway. This is why I always say: performance is what ultimately matters, how it's achieved is just details for those interested. ;)
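
To put the IPC point in numbers: delivered throughput is roughly IPC multiplied by the clock speed the chip actually sustains under load. The sketch below uses two entirely made-up CPUs (the IPC and clock values are invented for illustration, not measurements of any real product) to show how a lower-"IPC" design can still come out ahead.

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Throughput in billions of instructions per second: IPC x clock (GHz)."""
    return ipc * clock_ghz


# Hypothetical chips; every number here is invented for illustration.
chip_a = {"ipc": 1.10, "rated_ghz": 4.7, "sustained_ghz": 3.6}  # throttles under load
chip_b = {"ipc": 1.00, "rated_ghz": 4.4, "sustained_ghz": 4.2}  # holds its clocks

# Estimate from the spec sheet (rated clocks) versus what is actually sustained.
rated_a = instructions_per_second(chip_a["ipc"], chip_a["rated_ghz"])
rated_b = instructions_per_second(chip_b["ipc"], chip_b["rated_ghz"])
real_a = instructions_per_second(chip_a["ipc"], chip_a["sustained_ghz"])
real_b = instructions_per_second(chip_b["ipc"], chip_b["sustained_ghz"])

if __name__ == "__main__":
    print(f"rated:     A={rated_a:.2f}  B={rated_b:.2f}")
    print(f"sustained: A={real_a:.2f}  B={real_b:.2f}")
```

On paper chip A looks faster, but with these assumed sustained clocks chip B finishes ahead, which is the same pattern described for Zen 2 against the Skylake family.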
AMDK11: Two different diagrams of the LionCove core from LunarLake graphics:
Lots of good info there. Just keep in mind that any graphics used in promotional material prior to release may very well be based on approximations, not the final design. ;)
#77
AMDK11
Previously, the Meteorlake graphic was very loose on RedwoodCove's core and cache structures, which is clearly visible in the graphic.

The Lunarlake graphic represents LionCove's diagrams very accurately. Why do I think it's very accurate? Because the main LionCove project has been completed for some time and will also be implemented in ArrowLake.

I dare say that the same diagrams come from the preparation for the presentation of the LionCove microarchitecture.
#78
Dr. Dro
Assimilator: Sorry for doing this but MCM literally means "multi-chiplet module", so you just wrote "multi-chiplet multi-chiplet module" :p
LOL but actually it's multi-chip module :laugh:
#79
persondb
mkppo: This is your quote: "In the age of massive core counts HT/SMT is not needed. Without it you can clock higher, use less voltage, and design more secure processors."

First of all, 'HT/SMT is not needed' is incorrect, regardless of core counts. It depends on the arch/workload.
Secondly, SMT will not automatically lower clocks/need higher voltages. Disabling SMT in a CPU that is designed with SMT in mind might allow higher clocks/lower voltages, but you lose performance and CPU utilization which in turn might allow those clocks. But if you design an arch without SMT, there are too many variables in play to determine whether clocks will actually increase or decrease. So no, not having SMT will not automatically increase clocks.

Security part is true, SMT does require added security measures. I guess given intel's track record it's probably a good thing they're not going to have HT.

Here's a link to one of his articles on SMT: www.anandtech.com/show/16261/investigating-performance-of-multithreading-on-zen-3-and-amd-ryzen-5000/5
The issue with what you are saying is that it's all entirely relative and we cannot know anything. The thing is that it does not matter if Zen 3 benefits from SMT in some workloads, because a core designed without SMT will have wildly different architectural decisions.

Modern Zen cores (as well as Intel ones) have highly refined SMT implementations, with as many resources as possible being competitively shared. This is going to cost a lot of transistors, engineering hours, validation effort, heat, etc.

Intel might have run simulations and concluded from the results that keeping SMT might just take more than it adds.
mkppo: but you lose performance and CPU utilization which in turn might allow those clocks
You are not guaranteed to lose performance by disabling SMT, and can even gain from it. It depends heavily on the workload: SMT benefits the most when there are stalls in places like memory, or when the threads have very different resource utilization (say, one is purely integer and the other is mostly float).

By disabling SMT, you are giving the entire core resources to a single thread, and that might have been the bottleneck for a lot of things, including some MT workloads.
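
A toy model makes the workload dependence concrete. Assume, purely for illustration, that a thread running alone keeps the core's execution resources busy for a fraction `u` of the time and stalls for the rest, and that SMT lets a second thread fill those idle slots at no cost. Neither assumption holds exactly on real hardware (shared caches, schedulers, and power limits all interfere), and the function names are made up.

```python
def time_sliced_throughput(u1: float, u2: float) -> float:
    """Two threads taking turns on a core without SMT: average of their utilizations."""
    return (u1 + u2) / 2


def smt_throughput(u1: float, u2: float) -> float:
    """Two threads sharing a core with idealized SMT: capped at a fully busy core."""
    return min(1.0, u1 + u2)


def smt_gain(u1: float, u2: float) -> float:
    """Speedup of idealized SMT over time-slicing the same two threads."""
    return smt_throughput(u1, u2) / time_sliced_throughput(u1, u2)


if __name__ == "__main__":
    # Hypothetical utilizations: stall-heavy (memory-bound) vs. compute-bound threads.
    for label, u1, u2 in [("stall-heavy", 0.40, 0.40), ("compute-bound", 0.95, 0.95)]:
        print(f"{label:<14} gain from SMT: {smt_gain(u1, u2):.2f}x")
```

In this model two stall-heavy threads (u = 0.4 each) double the core's throughput under SMT, while two threads that already keep the core 95% busy gain only about 5%, before accounting for any SMT overhead.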
#80
Assimilator
persondb: The issue with what you are saying is that it's all entirely relative and we cannot know anything. The thing is that it does not matter if Zen 3 benefits from SMT in some workloads, because a core designed without SMT will have wildly different architectural decisions.

Modern Zen cores (as well as Intel ones) have highly refined SMT implementations, with as many resources as possible being competitively shared. This is going to cost a lot of transistors, engineering hours, validation effort, heat, etc.

Intel might have run simulations and concluded from the results that keeping SMT might just take more than it adds.

You are not guaranteed to lose performance by disabling SMT, and can even gain from it. It depends heavily on the workload: SMT benefits the most when there are stalls in places like memory, or when the threads have very different resource utilization (say, one is purely integer and the other is mostly float).

By disabling SMT, you are giving the entire core resources to a single thread, and that might have been the bottleneck for a lot of things, including some MT workloads.
When SMT/HT was introduced, CPUs had only a couple of cores, so the synchronisation overhead, and thus the transistor budget required, was relatively minimal. Now that CPUs have many times more cores and need to synchronise across all of them, adding SMT/HT on top of that is liable to cause the required number of transistors to explode. While Intel's engineering decisions haven't been great in the past few years, I would expect that a clean-room(ish) core design lacking SMT/HT has been thoroughly researched and evaluated as more useful going forward than their current Skylake++++++ architecture. In other words, while Arrow Lake's loss of SMT/HT may not be made up for with IPC gains, future derivatives of its architecture will likely be able to achieve that IPC and more. As with all things in engineering, it's a tradeoff.

Honestly, what really interests me is what AMD chooses to do in response: will they stick with HT and their lighter-weight Zen cores, will they try their own heterogeneous approach, will they come up with something completely different, will they do all or none of the above?
#81
FoulOnWhite
Assimilator: When SMT/HT was introduced, CPUs had only a couple of cores, so the synchronisation overhead, and thus the transistor budget required, was relatively minimal. Now that CPUs have many times more cores and need to synchronise across all of them, adding SMT/HT on top of that is liable to cause the required number of transistors to explode. While Intel's engineering decisions haven't been great in the past few years, I would expect that a clean-room(ish) core design lacking SMT/HT has been thoroughly researched and evaluated as more useful going forward than their current Skylake++++++ architecture. In other words, while Arrow Lake's loss of SMT/HT may not be made up for with IPC gains, future derivatives of its architecture will likely be able to achieve that IPC and more. As with all things in engineering, it's a tradeoff.

Honestly, what really interests me is what AMD chooses to do in response: will they stick with HT and their lighter-weight Zen cores, will they try their own heterogeneous approach, will they come up with something completely different, will they do all or none of the above?
AMD will stick with X3D, it's the only bow to their fiddle.

At least Intel is trying something new, moving from monolithic, and no more HT. I think Arrow Lake will surprise us all.
#82
dirtyferret
To paraphrase a certain Long Island comedian: what's the deal with all these E-cores? How many E-cores does one person need?
#83
Dr. Dro
dirtyferret: To paraphrase a certain Long Island comedian: what's the deal with all these E-cores? How many E-cores does one person need?
I'm fairly sure we are not talking about the same comedian, but it's a busy couch :laugh:
Assimilator: When SMT/HT was introduced, CPUs had only a couple of cores, so the synchronisation overhead, and thus the transistor budget required, was relatively minimal. Now that CPUs have many times more cores and need to synchronise across all of them, adding SMT/HT on top of that is liable to cause the required number of transistors to explode. While Intel's engineering decisions haven't been great in the past few years, I would expect that a clean-room(ish) core design lacking SMT/HT has been thoroughly researched and evaluated as more useful going forward than their current Skylake++++++ architecture. In other words, while Arrow Lake's loss of SMT/HT may not be made up for with IPC gains, future derivatives of its architecture will likely be able to achieve that IPC and more. As with all things in engineering, it's a tradeoff.

Honestly, what really interests me is what AMD chooses to do in response: will they stick with HT and their lighter-weight Zen cores, will they try their own heterogeneous approach, will they come up with something completely different, will they do all or none of the above?
I agree, but the Skylake++++ thing ended with Comet. Rocket was a small regression because it was the first processor of the "Cove" era, and as the first-generation P-core design backported to 14 nm, it just didn't hold up to the established Skylake core back then. The "Cove" era will still live on in Core Ultra's first few generations, I reckon.
#84
Noyand
efikkan: When it comes to E-cores, those are already mostly a gimmick. They serve two purposes: to make the specs look nice, like having 20 cores at >5 GHz in 65 W (the big PC vendors love this), and to make certain benchmarks like Cinebench look good (which have little or no relevance for end-users).
You might as well say that high-core-count CPUs are a gimmick too. A Core i9 with 16 P-cores wouldn't have been as fast as many people seem to imagine... and Puget did those tests with PL1 = 125 W, PL2 = 253 W, 56 sec. The 16-core Xeon was chugging 240 W all the time. E-cores were really the only way for Intel to make a CPU that would be great at ST tasks and MT tasks at the same time. The latest and greatest Xeons are kind of shit if you don't absolutely require a lot of memory.

#85
persondb
Assimilator: When SMT/HT was introduced, CPUs had only a couple of cores, so the synchronisation overhead, and thus the transistor budget required, was relatively minimal. Now that CPUs have many times more cores and need to synchronise across all of them, adding SMT/HT on top of that is liable to cause the required number of transistors to explode. While Intel's engineering decisions haven't been great in the past few years, I would expect that a clean-room(ish) core design lacking SMT/HT has been thoroughly researched and evaluated as more useful going forward than their current Skylake++++++ architecture. In other words, while Arrow Lake's loss of SMT/HT may not be made up for with IPC gains, future derivatives of its architecture will likely be able to achieve that IPC and more. As with all things in engineering, it's a tradeoff.

Honestly, what really interests me is what AMD chooses to do in response: will they stick with HT and their lighter-weight Zen cores, will they try their own heterogeneous approach, will they come up with something completely different, will they do all or none of the above?
The first SMT/HT for Intel was on the Pentium 4, which made a lot of sense considering that it had a high clock speed but a lot of issues in terms of its pipeline and feeding the core. And well, it's not just the synchronization that is an issue but the core itself, and what is happening inside the core. As another example, you need to be able to fetch from two streams of instructions, so there can be consequences for the instruction cache or similar, and you also need to duplicate some of the architectural registers, like the PCs.

There are a lot of ways to implement SMT/HT and they have different cost-performance considerations. The PS3 and Xbox 360 had SMT too, and they did that because the architecture would have been too prone to single-core stalls otherwise.

It's not necessarily a lightweight thing to implement.

I think that honestly, in terms of architecture, Intel has never really fallen behind. Skylake was great for many years; they just really fell behind in terms of lithography. So if they are deciding that HT isn't worth it anymore, I would really assume they have good reasons to believe so.
#86
AnotherReader
Noyand: You might as well say that high-core-count CPUs are a gimmick too. A Core i9 with 16 P-cores wouldn't have been as fast as many people seem to imagine... and Puget did those tests with PL1 = 125 W, PL2 = 253 W, 56 sec. The 16-core Xeon was chugging 240 W all the time. E-cores were really the only way for Intel to make a CPU that would be great at ST tasks and MT tasks at the same time. The latest and greatest Xeons are kind of shit if you don't absolutely require a lot of memory.

To be fair, they should be comparing the 12900k to the Xeon as the Xeon uses Golden Cove cores.
persondb: The first SMT/HT for Intel was on the Pentium 4, which made a lot of sense considering that it had a high clock speed but a lot of issues in terms of its pipeline and feeding the core. And well, it's not just the synchronization that is an issue but the core itself, and what is happening inside the core. As another example, you need to be able to fetch from two streams of instructions, so there can be consequences for the instruction cache or similar, and you also need to duplicate some of the architectural registers, like the PCs.

There are a lot of ways to implement SMT/HT and they have different cost-performance considerations. The PS3 and Xbox 360 had SMT too, and they did that because the architecture would have been too prone to single-core stalls otherwise.

It's not necessarily a lightweight thing to implement.

I think that honestly, in terms of architecture, Intel has never really fallen behind. Skylake was great for many years; they just really fell behind in terms of lithography. So if they are deciding that HT isn't worth it anymore, I would really assume they have good reasons to believe so.
We aren't privy to their reasons, but SMT is lightweight from a die area perspective. However, keep in mind that a lot of Xeons are now sold to the cloud vendors, and their business model is built upon sharing resources. SMT is rather susceptible to side channel attacks so that would have been a major consideration. Validation time would have been another concern.
#87
dgianstefani
TPU Proofreader
persondb: The first SMT/HT for Intel was on the Pentium 4, which made a lot of sense considering that it had a high clock speed but a lot of issues in terms of its pipeline and feeding the core. And well, it's not just the synchronization that is an issue but the core itself, and what is happening inside the core. As another example, you need to be able to fetch from two streams of instructions, so there can be consequences for the instruction cache or similar, and you also need to duplicate some of the architectural registers, like the PCs.

There are a lot of ways to implement SMT/HT and they have different cost-performance considerations. The PS3 and Xbox 360 had SMT too, and they did that because the architecture would have been too prone to single-core stalls otherwise.

It's not necessarily a lightweight thing to implement.

I think that honestly, in terms of architecture, Intel has never really fallen behind. Skylake was great for many years; they just really fell behind in terms of lithography. So if they are deciding that HT isn't worth it anymore, I would really assume they have good reasons to believe so.
Yeah it really astounds me that some people/armchair critics seem to think they know better than Intel researchers and engineers, who have delivered some incredible advancements over the years, and are some of the smartest, most innovative people in the industry.

These guys look to be the first to bring backside power delivery (PowerVia) and gate-all-around (RibbonFET) transistors to the mass market. This is not an insignificant achievement. Intel were also the first to bring hybrid architecture to consumer mainstream x86 PCs with the Lakefield SoC, which incorporated Foveros, two types of cores, I/O and DRAM on a single chip in 2020, before M1.

No, they aren't choosing to end HT from a complete lack of understanding of how these things work. Please, get real.
#88
persondb
AnotherReader: We aren't privy to their reasons, but SMT is lightweight from a die area perspective
Well, can you source that?

A lot of the details are hard and require adjustments. I would suggest taking a look at the SMT section of this article:

Loongson 3A6000: A Star among Chinese CPUs – Chips and Cheese

It might not seem like much, but it can affect the schedulers and everything else, which becomes significantly more complex as they now have to check and schedule from two separate threads.

Obviously, when you consider L2, that is going to be as big if not bigger than the core. But the impact might be similar or somewhat greater than the PS5 FPU nerf. If they can then use that transistor budget for other things, you could see some benefit.

The Nerfed FPU in PS5’s Zen 2 Cores – Chips and Cheese
#89
AnotherReader
persondb: Well, can you source that?

A lot of the details are hard and require adjustments. I would suggest taking a look at the SMT section of this article:

Loongson 3A6000: A Star among Chinese CPUs – Chips and Cheese

It might not seem like much, but it can affect the schedulers and everything else, which becomes significantly more complex as they now have to check and schedule from two separate threads.

Obviously, when you consider L2, that is going to be as big if not bigger than the core. But the impact might be similar or somewhat greater than the PS5 FPU nerf. If they can then use that transistor budget for other things, you could see some benefit.

The Nerfed FPU in PS5’s Zen 2 Cores – Chips and Cheese
The most recent figure comes from Marvell. The corresponding figure for Intel's much larger CPUs should be less as they spend far more area on wide vector execution than the ThunderX3.

#90
mkppo
persondb: The issue with what you are saying is that it's all entirely relative and we cannot know anything. The thing is that it does not matter if Zen 3 benefits from SMT in some workloads, because a core designed without SMT will have wildly different architectural decisions.

Modern Zen cores (as well as Intel ones) have highly refined SMT implementations, with as many resources as possible being competitively shared. This is going to cost a lot of transistors, engineering hours, validation effort, heat, etc.

Intel might have run simulations and concluded from the results that keeping SMT might just take more than it adds.

You are not guaranteed to lose performance by disabling SMT, and can even gain from it. It depends heavily on the workload: SMT benefits the most when there are stalls in places like memory, or when the threads have very different resource utilization (say, one is purely integer and the other is mostly float).

By disabling SMT, you are giving the entire core resources to a single thread, and that might have been the bottleneck for a lot of things, including some MT workloads.
You actually agreed with my points. I never said that Intel is wrong to not have SMT in their future gens; that wasn't my point at all. I think from the link I provided you thought I was trying to argue that Intel is wrong to drop SMT in future gens? No no... I was responding to our resident TPU staff and his blanket statements that 'HT is not needed for modern CPUs with lots of cores' and 'CPUs can clock higher with lower volts without SMT', both of which are factually incorrect. Clocks can be higher or lower depending on the design of the arch, and whether SMT is worthwhile has nothing to do with the number of cores but rather with core utilization. That entirely depends on how they design the arch, which is why I was refuting the blanket statement I quoted earlier.
dgianstefani: Yeah it really astounds me that some people/armchair critics seem to think they know better than Intel researchers and engineers, who have delivered some incredible advancements over the years, and are some of the smartest, most innovative people in the industry.

These guys look to be the first to bring backside power delivery (PowerVia) and gate-all-around (RibbonFET) transistors to the mass market. This is not an insignificant achievement. Intel were also the first to bring hybrid architecture to consumer mainstream x86 PCs with the Lakefield SoC, which incorporated Foveros, two types of cores, I/O and DRAM on a single chip in 2020, before M1.

No, they aren't choosing to end HT from a complete lack of understanding of how these things work. Please, get real.
You were factually incorrect and I explained why. Rather than replying or discussing, you resort to indirectly calling others armchair critics. Have a discussion; there's nothing wrong with being incorrect and admitting to it. And I wasn't even saying they chose to end HT from a lack of understanding, so I'm not even sure who you are replying to in the last sentence.
#91
dgianstefani
TPU Proofreader
mkppo: You were factually incorrect and I explained why. Rather than replying or discussing, you resort to indirectly calling others armchair critics. Have a discussion; there's nothing wrong with being incorrect and admitting to it. And I wasn't even saying they chose to end HT from a lack of understanding, so I'm not even sure who you are replying to in the last sentence.
mkppo: This is your quote: "In the age of massive core counts HT/SMT is not needed. Without it you can clock higher, use less voltage, and design more secure processors."

First of all, 'HT/SMT is not needed' is incorrect, regardless of core counts. It depends on the arch/workload.
Secondly, SMT will not automatically lower clocks/need higher voltages. Disabling SMT in a CPU that is designed with SMT in mind might allow higher clocks/lower voltages, but you lose performance and CPU utilization which in turn might allow those clocks. But if you design an arch without SMT, there are too many variables in play to determine whether clocks will actually increase or decrease. So no, not having SMT will not automatically increase clocks.

Security part is true, SMT does require added security measures. I guess given intel's track record it's probably a good thing they're not going to have HT.

Here's a link to one of his articles on SMT: www.anandtech.com/show/16261/investigating-performance-of-multithreading-on-zen-3-and-amd-ryzen-5000/5


I mean, the disadvantage of having cores with different ISA's are on an entirely different level compared to having two different CCD's with the same cores. Even having Zen4c's on a different CCD is better than intel's approach for pretty much any server workload and sometimes causes issues on the client side as well.

"all the same" isn't really the same.
The first is an "it depends".
The second is about context: you're implying that when tuning and turning off HT, voltage/frequency improvements are "not automatic". Right, but this does not make my original statement "factually incorrect".

This is a different situation from having HT disabled out of the box or not architecturally designed in, where voltage/frequency improvements would be "baked in" to the microcode.

Your statement "but you lose performance" is also, how do you put it, "factually incorrect" in the same way my original statement was. It depends. In many processes and games, disabling HT, even with no other changes/tunes made, will improve performance on something like a 13900K.

Finally, you are assuming it's you I'm referring to when I say "people/armchair critics seem to think they know better than Intel researchers and engineers", this is a projection on your part. I was not even thinking of you when I wrote this.

The bottom line is HT benefits software when additional MT performance is needed, but has drawbacks when that MT performance is not needed and there are enough cores/threads even without HT. How often do you think that is the case in a 24 core CPU?

If you want to test the "Without it you can clock higher, use less voltage" assertion:

Get a Raptor Lake CPU, set a static frequency. Now tune the voltage until you're unstable. Note that voltage.

Now turn off HT.

Tune the voltage again.

Note the voltage.

You can do the same thing for clocks etc.

If Intel delivers a CPU without HT that performs better in applications and games than the previous generation, and is more secure, it's a win.
#92
mkppo
dgianstefani: The first is an "it depends".
The second is about context: you're implying that when tuning and turning off HT, voltage/frequency improvements are "not automatic". Right, but this does not make my original statement "factually incorrect".

This is a different situation from having HT disabled out of the box or not architecturally designed in, where voltage/frequency improvements would be "baked in" to the microcode.

Your statement "but you lose performance" is also, how do you put it, "factually incorrect" in the same way my original statement was. It depends. In many processes and games, disabling HT, even with no other changes/tunes made, will improve performance on something like a 13900K.

Finally, you are assuming it's you I'm referring to when I say "people/armchair critics seem to think they know better than Intel researchers and engineers", this is a projection on your part. I was not even thinking of you when I wrote this.

The bottom line is HT benefits software when additional MT performance is needed, but has drawbacks when that MT performance is not needed and there are enough cores/threads even without HT. How often do you think that is the case in a 24 core CPU?
Your first statement that was incorrect was that SMT is not needed for modern high-core-count CPUs. That's incorrect, because it depends on whether an architecture is designed with SMT in mind. I've explained it before so I won't go into further details.

Secondly, you said without SMT you can design CPUs that can clock higher and use less volts. That's not true as it, again, depends on the arch.

When I said lose performance, that's incorrect and it's not what I was trying to say. If you read my post, I was trying to imply that maybe an architecture designed with SMT in mind might lose performance which in turn will allow those clocks. Or it might be the transistors that are idle now and depending on the grey silicon can help with thermals/hotspots. It might be reduced utilization, or CPU cores not fighting for cache. It can be a multitude of things. But it doesn't change the fact that you said having no SMT will lead to higher clocks/lower volts which is still incorrect because, again, it depends on how a CPU is designed and may or may not lead to higher clocks.

Regarding your last sentence, alluding to HT only increasing multithreaded performance and that when you already have 24 cores you don't need more of it: that's not really true, is it? If the architecture is designed with SMT in mind, then regardless of whether consumers need more than 24 cores, it will be on by default, because it leads to better numbers per core: the design needs to extract TLP, as the core can handle two concurrent instruction streams and still not be starved of resources (relatively). But maybe Intel saw that they don't need SMT anymore because they can design an arch that works around the reasons SMT was needed in the first place, and that's fine.

edit: I see that you're still trying to argue with the clock higher part in an edited post of yours. Let me try to make it easy for you - you said without SMT you can design a core that uses less voltage and has higher clocks. I said that's not correct because it entirely depends on how the architecture is designed. Having an arch that is perfect and extracts the maximum from a thread will not require SMT, but it doesn't mean it'll also clock higher in the process. You're now trying to say disabling SMT leads to higher clocks, which is a different thing entirely, and I said as much - there are a number of reasons why that might be the case.
Posted on Reply
#93
dgianstefani
TPU Proofreader
mkppo: Your first statement that was incorrect was that SMT is not needed for modern high core count CPUs. That's incorrect, because it depends on whether an architecture is designed with SMT in mind. I've explained it before so I won't go into further details.

Secondly, you said without SMT you can design CPUs that can clock higher and use less voltage.

When I said lose performance, that's incorrect and it's not what I was trying to say. If you read my post, I was trying to imply that maybe an architecture designed with SMT in mind might lose performance which in turn will allow those clocks. Or it might be the transistors that are idle now and depending on the grey silicon can help with thermals/hotspots. It might be reduced utilization, or CPU cores not fighting for cache. It can be a multitude of things. But it doesn't change the fact that you said having no SMT will lead to higher clocks/lower volts which is still incorrect because, again, it depends on how a CPU is designed and may or may not lead to higher clocks.

Regarding your last sentence, alluding to HT only increasing multithreaded performance and that when you already have 24 cores you don't need more of it: that's not really true, is it? If the architecture is designed with SMT in mind, then regardless of whether consumers need more than 24 cores, it will be on by default, because it leads to better numbers per core: the design needs to extract TLP, as the core can handle two concurrent instruction streams. But maybe Intel saw that they don't need SMT anymore because they can design an arch that works around the reasons SMT was needed in the first place, and that's fine.

edit: I see that you're still trying to argue with the clock higher part in an edited post of yours. Let me try to make it easy for you - you said without SMT you can design a core that uses less voltage and has higher clocks. I said that's not correct because it entirely depends on how the architecture is designed. Having an arch that is perfect and extracts the maximum from a thread will not require SMT, but it doesn't mean it'll also clock higher in the process. You're now trying to say disabling SMT leads to higher clocks, which is a different thing entirely, and I said as much - there are a number of reasons why that might be the case.
I'm not interested in arguing hypotheticals with you.

Have a nice day.
Posted on Reply
#94
mkppo
dgianstefani: I'm not interested in arguing hypotheticals with you.

Have a nice day.
The hypotheticals exist because you were incorrect. If you were correct, there would be no discussing hypotheticals, or any of the 'it depends' which refute your initial claim.
Posted on Reply
#95
dgianstefani
TPU Proofreader
mkppo: The hypotheticals exist because you were incorrect. If you were correct, there would be no discussing hypotheticals, or any of the 'it depends' which refute your initial claim.
Whatever you say buddy.
Posted on Reply
#96
R0H1T
Assimilator: Sorry for doing this but MCM literally means "multi-chiplet module"
It's multi-chip module; chiplets are something we've only seen post-Zen.
Posted on Reply
#97
Six_Times
I've been waiting so long for this socket series my hair has turned gray.
Posted on Reply
#98
stimpy88
dirtyferret: To paraphrase a certain Long Island comedian: what's the deal with all these E-cores, how many E-cores does one person need?
When they can put enough of them onto the die, they will delete the P-cores.
Posted on Reply
#99
efikkan
Assimilator: When SMT/HT was introduced, CPUs had only a couple of cores, so the synchronisation overhead and thus transistor budget required was relatively minimal. Now that CPUs have multiple times more cores and need to synchronise across all of them, adding SMT/HT on top of that is liable to cause the required number of transistors to explode.
Correct, and I may add that the complexity of implementing SMT in the pipeline has grown greatly with ever more superscalar CPU designs. Not to mention the biggest problem: all the security issues, which impose lots of constraints on the designers. Thirdly, there is also the fact that modern CPUs have much more capable front-ends, which are better and better at keeping the execution units saturated. This was originally one of the core motivations of SMT, but going forward the potential gain here is going to shrink, relatively speaking.
Assimilator: While Intel's engineering decisions haven't been great in the past few years, I would expect that a clean-room-(ish) core design lacking SMT/HT has been thoroughly researched and evaluated as more useful going forward than their current Skylake++++++ architecture. In other words, while Arrow Lake's loss of SMT/HT may not be made up for with IPC gains, likely future derivatives of its architecture will be able to achieve that IPC and more.
If you're talking of architectural engineering decisions, then I disagree. Their designs have generally been held back 2-3 years due to production issues, which probably still cause some lasting delays. When it comes to their production, however, there have been lots of bad decisions…

As to a "clean room" design, I doubt any of the big CPU designers will start that much from scratch. They do, however, have to make the big design decisions at the very beginning of the design process, like how threading will work and how cores interact, as all other design decisions follow from that, although they probably don't have the resources to redesign and fine-tune every tiny part of the CPU design on the first try. So deciding to ditch SMT certainly was done early on, but I would expect them to need a few "attempts" to fully break free from all the design constraints and unleash new levels of IPC. :)

Looking forward, there will be a lot of advancements in superscalar execution. I know Intel are looking into strategies to lessen the impact of branch mispredictions and avoid pipeline stalls and flushes. I believe some of this was supposed to show up in Meteor Lake, but I haven't studied whether it did or how successful it was. But over the next generations, we should expect there to be significant gains.
Dr. Dro: Rocket was a small regression because it was the first processor of the "Cove" era, and as the first-generation P-core design backported to 14 nm, it just didn't hold up to the established Skylake core back then…
Just for the sake of being correct, Rocket Lake wasn't a regression in terms of overall performance: it offered ~19% IPC gains and similar clocks, but sacrificed 2 cores vs. Comet Lake, which leads people to think it was inferior. Rocket Lake, which was a "backport" of Ice Lake to 14 nm, was greatly held back by this "inferior" node. The whole family is called "Sunny Cove", with Ice Lake being released in 2019 (server only, very limited availability), followed by Tiger Lake, which was a small architectural improvement. Rocket Lake surprisingly seems to be a derivative of Ice Lake-S (never finalized) rather than Tiger Lake, I assume because Tiger Lake was never designed for this purpose and it was much quicker to backport Ice Lake-S instead.
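As a rough back-of-the-envelope sketch of that trade-off (not a benchmark): assuming throughput scales linearly with cores × IPC × clock, equal clocks, and ignoring SMT, memory and cache effects, the ~19% IPC figure and the 10-core vs. 8-core counts give the following. The function name and the idealized scaling model are illustrative assumptions only.

```python
def relative_throughput(cores: int, ipc: float, clock: float = 1.0) -> float:
    """Idealized throughput model: cores x IPC x clock (assumes perfect scaling)."""
    return cores * ipc * clock

# Comet Lake (10 cores) is the baseline with IPC normalized to 1.00.
comet_lake = relative_throughput(cores=10, ipc=1.00)
# Rocket Lake (8 cores) with the ~19% IPC gain quoted above, same clocks assumed.
rocket_lake = relative_throughput(cores=8, ipc=1.19)

single_thread_ratio = 1.19 / 1.00            # per-core advantage of Rocket Lake
multi_thread_ratio = rocket_lake / comet_lake  # all-core ratio under this model

print(f"single-thread: {single_thread_ratio:.2f}x, all-core: {multi_thread_ratio:.3f}x")
```

Under these assumptions the 8-core part is ahead per core but lands slightly below the 10-core part when every core is loaded, which is consistent with it being perceived as inferior despite the IPC gain.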
Posted on Reply
#100
Dr. Dro
efikkan: Correct, and I may add that the complexity of implementing SMT in the pipeline has grown greatly with ever more superscalar CPU designs. Not to mention the biggest problem: all the security issues, which impose lots of constraints on the designers. Thirdly, there is also the fact that modern CPUs have much more capable front-ends, which are better and better at keeping the execution units saturated. This was originally one of the core motivations of SMT, but going forward the potential gain here is going to shrink, relatively speaking.

If you're talking of architectural engineering decisions, then I disagree. Their designs have generally been held back 2-3 years due to production issues, which probably still cause some lasting delays. When it comes to their production, however, there have been lots of bad decisions…

As to a "clean room" design, I doubt any of the big CPU designers will start that much from scratch. They do, however, have to make the big design decisions at the very beginning of the design process, like how threading will work and how cores interact, as all other design decisions follow from that, although they probably don't have the resources to redesign and fine-tune every tiny part of the CPU design on the first try. So deciding to ditch SMT certainly was done early on, but I would expect them to need a few "attempts" to fully break free from all the design constraints and unleash new levels of IPC. :)

Looking forward, there will be a lot of advancements in superscalar execution. I know Intel are looking into strategies to lessen the impact of branch mispredictions and avoid pipeline stalls and flushes. I believe some of this was supposed to show up in Meteor Lake, but I haven't studied whether it did or how successful it was. But over the next generations, we should expect there to be significant gains.

Just for the sake of being correct, Rocket Lake wasn't a regression in terms of overall performance: it offered ~19% IPC gains and similar clocks, but sacrificed 2 cores vs. Comet Lake, which leads people to think it was inferior. Rocket Lake, which was a "backport" of Ice Lake to 14 nm, was greatly held back by this "inferior" node. The whole family is called "Sunny Cove", with Ice Lake being released in 2019 (server only, very limited availability), followed by Tiger Lake, which was a small architectural improvement. Rocket Lake surprisingly seems to be a derivative of Ice Lake-S (never finalized) rather than Tiger Lake, I assume because Tiger Lake was never designed for this purpose and it was much quicker to backport Ice Lake-S instead.
Apologies, I should have been more specific: I'm referring to gaming performance. Most games still favor the i9-10900K over the 11900K.
Posted on Reply