
Intel Core Ultra 9 285K

It's well known that CCX-to-CCX data transfers incur a latency penalty in multi-chiplet Ryzen processors, which is why it's a good idea to keep a program locked onto the same CCX. But if we are to take what Chips and Cheese say, there's a latency penalty going from just one core to another on Arrow Lake, even if the core is a next-door neighbor. YIKES!!!
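(For anyone who hasn't actually done the pinning themselves: here's a rough sketch of what it looks like on Linux with sched_setaffinity(). The 0-7 core IDs are just an assumption for a hypothetical dual-CCX part; check your real layout with lscpu before copying it.)

```cpp
// Rough sketch: pin this process to cores 0-7, i.e. one CCX on a
// hypothetical dual-CCX Ryzen. The 0-7 mapping is an assumption;
// SMT siblings and the actual CCX layout vary by SKU (see `lscpu -e`).
#include <sched.h>   // sched_setaffinity, CPU_ZERO, CPU_SET (glibc)
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 8; ++cpu)   // first CCX (assumed core IDs)
        CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {  // pid 0 = this process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::puts("Pinned; the scheduler can no longer migrate us across CCXs.");
    return 0;
}
```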
 
Yeah, it's a very surprising finding; I can't fathom why these transfers would be so slow given that they are all on one ring. Note that some P cores have slightly lower latency to E cores, perhaps because the neighbouring E cores are relatively close in terms of ring hops. Chips and Cheese thinks it's because of the increased levels in the P core cache hierarchy, but that's supposition.

I suspect Lion Cove’s cache design plays a large role too. Cache coherency protocols ensure only one core can have a particular address cached in modified state. Thus the requesting core in this test will miss in all of its private cache levels. When the core with modified data gets a probe, it has to check all of its own private cache levels, both to read out the modified data and ensure the address is invalidated from its own caches. Lion Cove goes from two levels of core-private data caches to three, adding another step in the process.
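(For the curious, numbers like these are typically gathered with a two-thread "ping-pong" over a single shared cache line, which exercises exactly the probe path described above. A minimal sketch of the idea - Linux-only, and core IDs 0 and 1 are placeholders; a real tool would sweep every core pair:)

```cpp
// Minimal core-to-core "ping-pong" sketch: two threads pinned to
// different cores pass a token through one shared atomic, so every
// handoff forces a cache line transfer between the two cores.
// Build with: g++ -O2 -pthread; core IDs 0/1 are assumptions.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <thread>

static std::atomic<int> token{0};
constexpr int kIters = 1'000'000;

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    std::thread responder([] {
        pin_to_core(1);                                            // assumed core ID
        for (int i = 0; i < kIters; ++i) {
            while (token.load(std::memory_order_acquire) != 1) {}  // wait for ping
            token.store(0, std::memory_order_release);             // pong
        }
    });

    pin_to_core(0);                                                // assumed core ID
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        token.store(1, std::memory_order_release);                 // ping
        while (token.load(std::memory_order_acquire) != 0) {}      // wait for pong
    }
    auto t1 = std::chrono::steady_clock::now();
    responder.join();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("one-way latency: ~%.1f ns\n", ns / (2.0 * kIters));  // 2 hops per iteration
    return 0;
}
```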
 
there's a latency penalty going from just one core to another on Arrow Lake, even if the core is a next-door neighbor. YIKES!!!
There is always a latency penalty when transferring a process from one CPU to another regardless of CPU type or model, even from one core to another on the same CPU die. This is especially true for CPUs with segregated L2/L3 caches, which is everything these days. Not so "yikes". Par for the course.
 
What I'm saying "Yikes!" about is that the latency penalty on Arrow Lake to transfer from one core to a neighboring core in on the order of the same latency penalty that Ryzen gets when going from one CCX to another. I can understand the latency hit going from one CCX to another, but damn... to have the same latency going from one core to a neighboring core is just... bad. How did Intel screw this up so badly?
 
What I'm saying "Yikes!" about is that the latency penalty on Arrow Lake to transfer from one core to a neighboring core in on the order of the same latency penalty that Ryzen gets when going from one CCX to another. I can understand the latency hit going from one CCX to another, but damn... to have the same latency going from one core to a neighboring core is just... bad. How did Intel screw this up so badly?
What I'm saying is that it's an over-reaction. Those latency times are not a problem.
Optimal? No. Anything close to a serious issue? Also no.
 
I don't think you understand what I'm trying to say. I understand there's going to be latency involved, but Intel is incurring the kind of latency that AMD only incurs when, again, going from one CCX to another on the same damn die! How is that something you don't seem to be understanding?

Core-to-core latency according to Chips and Cheese is as presented in this chart...
[Chart: Arrow Lake (Core Ultra 9 285K) core-to-core latency, from Chips and Cheese]

Note that going from just Core 1 to Core 2 incurs an 85 ns latency hit. P-Core to P-Core inter-core latency is well over double the latency that Zen 5 endures going from Core 1 to Core 2.

Meanwhile, this is AMD's latency chart from the same article...
[Chart: Zen 5 core-to-core latency, from the same article]

Note that within the same CCX, going from Core 1 to Core 2 is about 32 ns on average. It's only where you have to cross CCX boundaries that you incur a latency hit on par with Arrow Lake's 85 ns.

To quote the article...
Each P-Core gets a ring bus stop, as does each quad core E-Core cluster. Add another ring stop for cross-die transfer, and Arrow Lake’s ring bus likely has 13 stops. But latency is worst between Arrow Lake’s P-Cores, so ring bus length is only part of the story.

I suspect Lion Cove’s cache design plays a large role too. Cache coherency protocols ensure only one core can have a particular address cached in modified state. Thus the requesting core in this test will miss in all of its private cache levels. When the core with modified data gets a probe, it has to check all of its own private cache levels, both to read out the modified data and ensure the address is invalidated from its own caches. Lion Cove goes from two levels of core-private data caches to three, adding another step in the process.

Again, this is what I'm trying to get you to understand here dude. Intel really fucked up here and as Chips and Cheese surmise, it may be because the ring bus is so long. My thoughts are that Intel is going to have to design a new core-to-core interconnect to replace the aging ring bus.
 
How is that something you don't seem to be understanding?
I understand what you're saying; you're not following what I'm saying.

The latencies are not so dramatic that they are of ANY serious concern. Windows (or Linux) is compiled specifically to minimize dynamic latency, so the graphs above are a reference only for when the thread or OS demands a process change to another core. It just doesn't happen frequently enough to be a problem.

The graphs above show latencies in nanoseconds (billionths of a second). Processes changing cores/CPUs happens so infrequently that it's measured in full seconds, double-digit seconds or even minutes, depending on the process. Thus the latencies are such a small factor that they're effectively insignificant. As such, a few nanoseconds is just not important in the big picture. It's not worth debating.
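(Don't take my word for it either - on Linux, every process exposes its own context-switch counters in /proc/<pid>/status, so you can check how often it really happens. Quick sketch; it inspects itself by default, or pass any PID:)

```cpp
// Dump a process's context-switch counters from /proc/<pid>/status.
// voluntary = the process blocked or yielded; nonvoluntary = the
// scheduler preempted it (the case being argued about here).
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    std::string pid = (argc > 1) ? argv[1] : "self";   // default: this process
    std::ifstream status("/proc/" + pid + "/status");
    std::string line;
    while (std::getline(status, line))
        if (line.find("ctxt_switches") != std::string::npos)
            std::cout << line << '\n';
    return 0;
}
```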
 
As such, a few nanoseconds is just not important in the big picture.
I'm reminded of a phrase... "0.68 seconds sir. For an android, that is nearly an eternity."

These access latencies are the only thing that I can surmise is the reason why this new chip is just godawful compared to the older 14th gen.
 
I can surmise is the reason why this new chip is just godawful
What?!? How is this CPU "just godawful"?!? Did you actually read this review? And if so, how on this green Earth are you arriving at "godawful"??

Or am I missing something?

EDIT:
Seriously, how does this, this, this, this, this, or this qualify as "godawful"?!? And before you lay in with the game performance complaints, moose muffins. 1080p is STILL the most used gaming resolution on Earth and this was the result. For a CPU that is a shift in arch, that is not a bad shout. Still top tier!

So no, I haven't missed anything. Would you like to revise your nonsense statement?
 
When they’re 15% slower than a 14900K, there’s something wrong!!!
 
More than 15%. ~18%.

[Attached: three benchmark charts]


 

Nice cherry-pick there bud. Please embarrass yourself further with some additional cherry-picking. Also...
...read the WHOLE page..
 
The 285K really seems to be a step forward and two steps backwards vs. the i9-13900K (I won't even say "14th" because those are not a new architecture and don't really have any changes), and that is really bad considering Raptor Cove is basically just Golden Cove with an improved and enlarged cache subsystem - Alder Lake's Golden Cove architecture dates all the way back to 2021, placing it in and around the Zen 3 era. I did expect much more out of it, but it doesn't really offer me anything that I don't already have with my Raptor Lake chip; perhaps that is what is most bitterly disappointing.

Intel promised some microcode and firmware level updates that would supposedly improve Arrow Lake's performance figures where it seems to be falling woefully short of the previous generation's performance, but it has been some time and they have not delivered on this. Would not really be surprised if this did not come at all.

That is not to say that this product does not have some very interesting technology; I think it does. It just hasn't delivered the expected performance for a chip of this caliber and price. Sometimes architectures seemingly don't have anything wrong with them, and if you run the math they should perform exceptionally well, but for some reason it ultimately doesn't pan out the way it was expected to - RDNA 3 is another great example of this.

Chipmaking is complicated. But these misfires can hurt the company's reputation and bottom line a lot, and Intel certainly wasn't in their strongest position, especially after all the negative press about the Raptor Lake bugs, delays in their foundries and constant controversy surrounding their CEOs.
 
Note that going from just Core 1 to Core 2 incurs an 85 ns latency hit.
Core 1 may be adjacent to Core 2, but it appears that the ring bus is unidirectional and the signal has to go a full circle, unlike server processors, which have bidirectional half rings.
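(A quick toy model of that, taking the article's 13-stop estimate at face value: on a one-way ring, a next-door neighbor in the "wrong" direction costs almost a full lap.)

```cpp
// Back-of-the-envelope hop counts on a 13-stop ring (the article's
// estimate for Arrow Lake). Bidirectional rings take the shorter arc;
// a unidirectional ring is forced to travel one way around.
#include <cstdio>

constexpr int kStops = 13;   // from the Chips and Cheese estimate

int hops_unidirectional(int from, int to) {
    return (to - from + kStops) % kStops;   // one direction only
}

int hops_bidirectional(int from, int to) {
    int fwd = (to - from + kStops) % kStops;
    int bwd = kStops - fwd;
    return fwd < bwd ? fwd : bwd;           // take the shorter arc
}

int main() {
    // Neighboring stops, but in the "wrong" direction on a one-way ring:
    std::printf("unidirectional 1 -> 0: %d hops\n", hops_unidirectional(1, 0)); // 12
    std::printf("bidirectional  1 -> 0: %d hops\n", hops_bidirectional(1, 0));  // 1
    return 0;
}
```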
 
I understand what you're saying; you're not following what I'm saying.

The latencies are not so dramatic that they are of ANY serious concern. Windows (or Linux) is compiled specifically to minimize dynamic latency, so the graphs above are a reference only for when the thread or OS demands a process change to another core. It just doesn't happen frequently enough to be a problem.

The graphs above show latencies in nanoseconds (billionths of a second). Processes changing cores/CPUs happens so infrequently that it's measured in full seconds, double-digit seconds or even minutes, depending on the process. Thus the latencies are such a small factor that they're effectively insignificant. As such, a few nanoseconds is just not important in the big picture. It's not worth debating.
1- Context switches do happen quite often
2- This is not only about context switch, but about any communication across threads, which is also something common to happen.
3- At 5GHz, a single cycle takes 0.2ns, so at 50ns we're talking about 250 wasted cycles.

Depending on the task this may be irrelevant. But for others that's something that could have high impact. FWIW, games fit in the latter.
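(Putting point 3 against the chart numbers upthread - 85 ns P-core to P-core vs ~32 ns within a Zen 5 CCX, with 5 GHz as just a round assumed clock:)

```cpp
// Cycles burned per bounce = latency_ns * clock_GHz.
// 85 ns and 32 ns come from the charts quoted earlier in the thread;
// 5 GHz is simply a round number for a boosting P-core.
#include <cstdio>

int main() {
    const double ghz = 5.0;                  // assumed clock
    const double arrow_lake_p2p_ns = 85.0;   // worst case from the 285K chart
    const double zen5_same_ccx_ns  = 32.0;   // typical intra-CCX figure
    std::printf("Arrow Lake P-to-P: %.0f cycles\n", arrow_lake_p2p_ns * ghz);  // 425
    std::printf("Zen 5, same CCX:   %.0f cycles\n", zen5_same_ccx_ns * ghz);   // 160
    return 0;
}
```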
 

I wonder if this extreme latency is what's basically negating the IPC improvements this CPU is supposed to show in so many workloads
 
Maybe? For some productivity workloads it can flex its muscles, but those are the same workloads where multi-CCD AMD CPUs don't have any issues whatsoever as well.
 
1- Context switches do happen quite often
Yes, Context IS very important.. Example?
3- At 5GHz, a single cycle takes 0.2ns, so at 50ns we're talking about 250 wasted cycles.
This. Seriously? You are telling me that 250 cycles is important when we're talking about 5,000,000,000 cycles per second? Hmm?

Yes, context is important.. You kinda made my point for me.

Depending on the task this may be irrelevant.
May be?
But for others that's something that could have high impact. FWIW, games fit in the latter.
Ok, if that's what you want to go with, have at it. But let's keep in mind that the benchmarks show, very clearly, that gaming performance is on par and within single digit percentages of all of the other top tier CPUs, and that's highly dependent on the game.

This is of course setting aside everything non-gaming where this CPU is excellent in most tasks.

Also, let's keep in mind that all other CPU models have latencies in the same range, so even if we're going to knock this one, fairness and objectivity demand that we look at the others as well in the same context and (TADA!), all of them are at or near the same 250-cycle latency penalty. The differences amount to less than 100 cycles between them... out of 5 BILLION cycles.

So... What was that about context? Eh?
 
I'm under the impression that you totally misunderstood what a context switch even is, but I believe trying to argue about that with you is pointless since all you want is to frame this product as spotless for some reason, so let's leave it at that.
 
but I believe trying to argue about that with you is pointless since all you want is to frame this product as spotless for some reason
That's another failure at context. Spotless? No. But this CPU is most assuredly NOT the "godawful" thing certain people are trying (and failing) to make it out to be. Statements like that come off as little more than meritless fanboy-like nonsense.
so let's leave it at that.
That's probably best.
 
This. Seriously? You are telling me that 250 cycles is important when we're talking about 5,000,000,000 cycles per second? Hmm?

I see where Igor's coming from though; my surface-level understanding of how CPUs work tells me each instruction takes a certain number of cycles to complete, so if there's a cycle penalty involved for moving data or when context switching, it could add up very fast. I don't understand *why*, though. Lion Cove is clearly a very high-performance architecture, it's not like we're debating Prescott or Bulldozer here, and Skymont should have big gains over Gracemont. I really hope it's just microcode bugs.
 
it could add up very fast
For poorly optimized code, sure. The vast majority of programmers are not stupid or incompetent. They know that context-switching or core/CPU shifting comes with a penalty on any platform and as such it is avoided as much as possible. It just doesn't happen as frequently as they are implying. Therefore, it's a latency penalty that isn't a serious problem in most tasks. I knew what they were talking about and was deliberately looking past it.
 
The core itself seems to be pretty good, but the uncore part seems to be its Achilles' heel. As already mentioned above, it looks like the ring bus is way too long and unidirectional, which causes this cross-core issue.
Along with that, the memory subsystem is a downgrade over previous generations, with way higher latency overall.
I still have to go properly through that article from Chips and Cheese, but it does give some good insights on that:
Arrow Lake’s high latency, high bandwidth memory subsystem may be well suited to very parallel productivity workloads like image processing. But latency sensitive tasks with low thread counts may not do so well.
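(Memory latency itself is usually measured with a pointer chase: a randomized cycle of dependent loads that the prefetcher can't predict, so each load eats the full round trip. A minimal sketch of the technique; buffer size and iteration count are arbitrary picks for illustration:)

```cpp
// Minimal pointer-chase latency sketch. Linking the buffer into one
// random cycle defeats the hardware prefetcher, so each load's address
// depends on the previous load completing.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = std::size_t{1} << 24;   // ~128 MB of pointers, well past any L3
    std::vector<std::size_t> next(n), order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin() + 1, order.end(), std::mt19937_64{42});
    for (std::size_t i = 0; i + 1 < n; ++i)       // link one random cycle
        next[order[i]] = order[i + 1];
    next[order[n - 1]] = order[0];

    std::size_t p = 0;
    const std::size_t iters = 10'000'000;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i)
        p = next[p];                              // fully serialized loads
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("~%.1f ns per dependent load (p=%zu)\n", ns / iters, p);  // print p so the chain isn't optimized out
    return 0;
}
```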
 
You still seem to be missing the important point: the differences in worst-case latency are measured in single-digit nanoseconds.
Using the graphs you posted from the site above.
[Chart: Latency-285K.jpg - Arrow Lake core-to-core latency]

Worst case scenario for the 285K? 85.15 ns

[Chart: Latency-9900X.jpg - Ryzen 9 9900X core-to-core latency]

Worst case scenario for the 9900X? 80.65 ns

That is a difference of a whopping & earthshattering 4.5 ns!...:rolleyes: Good grief, whatever is Intel going to do?!? :laugh:
That's 4.5 billionths of a second for an event that happens infrequently.

You, @trparky and the site you linked to are making a big deal out of what is effectively nothing-sauce. Do you want to continue with this silly shtick?
 