
Intel Core Ultra 9 285K

It's well known that CCX-to-CCX data transfers incur a latency penalty in multi-chiplet Ryzen processors, which is why it's a good idea to keep a program locked onto the same CCX. But if we are to take what Chips and Cheese say, there's a latency penalty going from just one core to another on Arrow Lake, even if the core is a next-door neighbor. YIKES!!!
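(For anyone who hasn't actually done the pinning themselves: here's a rough sketch of what it looks like on Linux with sched_setaffinity(). The 0-7 core IDs are just an assumption for a hypothetical dual-CCX part; check your real layout with lscpu before copying it.)

```cpp
// Rough sketch: pin this process to cores 0-7, i.e. one CCX on a
// hypothetical dual-CCX Ryzen. The 0-7 mapping is an assumption;
// SMT siblings and the actual CCX layout vary by SKU (see `lscpu -e`).
#include <sched.h>   // sched_setaffinity, CPU_ZERO, CPU_SET (glibc)
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 8; ++cpu)   // first CCX (assumed core IDs)
        CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {  // pid 0 = this process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::puts("Pinned; the scheduler can no longer migrate us across CCXs.");
    return 0;
}
```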
 
Yeah, it's a very surprising finding; I can't fathom why these transfers would be so slow given that they are all on one ring. Note that some P cores have slightly lower latency to E cores, perhaps because the neighbouring E cores are relatively close in terms of ring hops. Chips and Cheese thinks it's because of the increased levels in the P core cache hierarchy, but that's supposition.

I suspect Lion Cove’s cache design plays a large role too. Cache coherency protocols ensure only one core can have a particular address cached in modified state. Thus the requesting core in this test will miss in all of its private cache levels. When the core with modified data gets a probe, it has to check all of its own private cache levels, both to read out the modified data and ensure the address is invalidated from its own caches. Lion Cove goes from two levels of core-private data caches to three, adding another step in the process.
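(For the curious, numbers like these are typically gathered with a two-thread "ping-pong" over a single shared cache line, which exercises exactly the probe path described above. A minimal sketch of the idea - Linux-only, and core IDs 0 and 1 are placeholders; a real tool would sweep every core pair:)

```cpp
// Minimal core-to-core "ping-pong" sketch: two threads pinned to
// different cores pass a token through one shared atomic, so every
// handoff forces a cache line transfer between the two cores.
// Build with: g++ -O2 -pthread; core IDs 0/1 are assumptions.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <thread>

static std::atomic<int> token{0};
constexpr int kIters = 1'000'000;

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    std::thread responder([] {
        pin_to_core(1);                                            // assumed core ID
        for (int i = 0; i < kIters; ++i) {
            while (token.load(std::memory_order_acquire) != 1) {}  // wait for ping
            token.store(0, std::memory_order_release);             // pong
        }
    });

    pin_to_core(0);                                                // assumed core ID
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        token.store(1, std::memory_order_release);                 // ping
        while (token.load(std::memory_order_acquire) != 0) {}      // wait for pong
    }
    auto t1 = std::chrono::steady_clock::now();
    responder.join();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("one-way latency: ~%.1f ns\n", ns / (2.0 * kIters));  // 2 hops per iteration
    return 0;
}
```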
 
there's a latency penalty going from just one core to another on Arrow Lake, even if the core is a next-door neighbor. YIKES!!!
There is always a latency penalty when transferring a process from one CPU to another regardless of CPU type or model, even from one core to another on the same CPU die. This is especially true for CPUs with segregated L2/L3 caches, which is everything these days. Not so "yikes". Par for the course.
 
What I'm saying "Yikes!" about is that the latency penalty on Arrow Lake to transfer from one core to a neighboring core in on the order of the same latency penalty that Ryzen gets when going from one CCX to another. I can understand the latency hit going from one CCX to another, but damn... to have the same latency going from one core to a neighboring core is just... bad. How did Intel screw this up so badly?
 
What I'm saying "Yikes!" about is that the latency penalty on Arrow Lake to transfer from one core to a neighboring core in on the order of the same latency penalty that Ryzen gets when going from one CCX to another. I can understand the latency hit going from one CCX to another, but damn... to have the same latency going from one core to a neighboring core is just... bad. How did Intel screw this up so badly?
What I'm saying is that it's an over-reaction. Those latency times are not a problem.
Optimal? No. Anything close to a serious issue? Also no.
 
I don't think you understand what I'm trying to say. I understand there's going to be latency involved, but Intel is incurring the kind of latency that AMD only incurs when, again, going from one CCX to another on the same damn die! How is that something you don't seem to be understanding?

Core-to-core latency according to Chips and Cheese is as presented in this chart...
[Chart: Arrow Lake (Core Ultra 9 285K) core-to-core latency, from Chips and Cheese]

Note that going from just Core 1 to Core 2 incurs an 85 ns latency hit. P-Core to P-Core inter-core latency is well over double the latency that Zen 5 endures going from Core 1 to Core 2.

Meanwhile, this is AMD's latency chart from the same article...
[Chart: Zen 5 core-to-core latency, from the same article]

Note that within the same CCX, going from Core 1 to Core 2 is about 32 ns on average. It's only where you have to cross CCX boundaries that you incur a latency hit on par with Arrow Lake's 85 ns.

To quote the article...
Each P-Core gets a ring bus stop, as does each quad core E-Core cluster. Add another ring stop for cross-die transfer, and Arrow Lake’s ring bus likely has 13 stops. But latency is worst between Arrow Lake’s P-Cores, so ring bus length is only part of the story.

I suspect Lion Cove’s cache design plays a large role too. Cache coherency protocols ensure only one core can have a particular address cached in modified state. Thus the requesting core in this test will miss in all of its private cache levels. When the core with modified data gets a probe, it has to check all of its own private cache levels, both to read out the modified data and ensure the address is invalidated from its own caches. Lion Cove goes from two levels of core-private data caches to three, adding another step in the process.

Again, this is what I'm trying to get you to understand here dude. Intel really fucked up here and as Chips and Cheese surmise, it may be because the ring bus is so long. My thoughts are that Intel is going to have to design a new core-to-core interconnect to replace the aging ring bus.
 
How is that something you don't seem to be understanding?
I understand what you're saying; you're not following what I'm saying.

The latencies are not so dramatic that they are of ANY serious concern. Windows (or Linux) is compiled specifically to minimize dynamic latency, so the graphs above are a reference only for when the thread or OS demands a process change to another core. It just doesn't happen frequently enough to be a problem.

The graphs above show latencies in nanoseconds (billionths of a second). Processes changing cores/CPUs happens so infrequently that it's measured in full seconds, double-digit seconds or even minutes, depending on the process. Thus the latencies are such a small factor that they're effectively insignificant. As such, a few nanoseconds is just not important in the big picture. It's not worth debating.
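(Don't take my word for it either - on Linux, every process exposes its own context-switch counters in /proc/<pid>/status, so you can check how often it really happens. Quick sketch; it inspects itself by default, or pass any PID:)

```cpp
// Dump a process's context-switch counters from /proc/<pid>/status.
// voluntary = the process blocked or yielded; nonvoluntary = the
// scheduler preempted it (the case being argued about here).
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    std::string pid = (argc > 1) ? argv[1] : "self";   // default: this process
    std::ifstream status("/proc/" + pid + "/status");
    std::string line;
    while (std::getline(status, line))
        if (line.find("ctxt_switches") != std::string::npos)
            std::cout << line << '\n';
    return 0;
}
```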
 
As such, a few nanoseconds is just not important in the big picture.
I'm reminded of a phrase... "0.68 seconds sir. For an android, that is nearly an eternity."

These access latencies are the only thing that I can surmise is the reason why this new chip is just godawful compared to the older 14th gen.
 
I can surmise is the reason why this new chip is just godawful
What?!? How is this CPU "just godawful"?!? Did you actually read this review? And if so, how on this green Earth are you arriving at "godawful"??

Or am I missing something?

EDIT:
Seriously, how does this, this, this, this, this, or this qualify as "godawful"?!? And before you lay in with the game performance complaints, moose muffins. 1080p is STILL the most used gaming resolution on Earth and this was the result. For a CPU that is a shift in arch, that is not a bad shout. Still top tier!

So no, I haven't missed anything. Would you like to revise your nonsense statement?
 
When they’re 15% slower than a 14900K, there’s something wrong!!!
 
More than 15%. ~18%.

[Attached: three benchmark charts]


 

Nice cherry-pick there bud. Please embarrass yourself further with some additional cherry-picking. Also...
...read the WHOLE page..
 
The 285K really seems to be a step forward and two steps backwards vs. the i9-13900K (I won't even say "14th" because those are not a new architecture and don't really have any changes), and that is really bad considering Raptor Cove is basically just Golden Cove with an improved and enlarged cache subsystem - Alder Lake's Golden Cove architecture dates all the way back to 2021, placing it in and around the Zen 3 era. I did expect much more out of it, but it doesn't really offer me anything that I don't already have with my Raptor Lake chip; perhaps that is what is most bitterly disappointing.

Intel promised some microcode and firmware level updates that would supposedly improve Arrow Lake's performance figures where it seems to be falling woefully short of the previous generation's performance, but it has been some time and they have not delivered on this. Would not really be surprised if this did not come at all.

That is not to say that this product does not have some very interesting technology; I think it does. It just hasn't delivered the expected performance for a chip of this caliber and price. Sometimes architectures seemingly don't have anything wrong with them, and if you run the math they should perform exceptionally well, but for some reason it ultimately doesn't pan out the way it was expected to - RDNA 3 is another great example of this.

Chipmaking is complicated. But these misfires can hurt the company's reputation and bottom line a lot, and Intel certainly wasn't in their strongest position, especially after all the negative press about the Raptor Lake bugs, delays in their foundries and constant controversy surrounding their CEOs.
 
Note that going from just Core 1 to Core 2 incurs an 85 ns latency hit.
Core 1 may be adjacent to Core 2, but it appears that the ring bus is unidirectional and the signal has to go a full circle, unlike server processors, which have bidirectional half rings.
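(A quick toy model of that, taking the article's 13-stop estimate at face value: on a one-way ring, a next-door neighbor in the "wrong" direction costs almost a full lap.)

```cpp
// Back-of-the-envelope hop counts on a 13-stop ring (the article's
// estimate for Arrow Lake). Bidirectional rings take the shorter arc;
// a unidirectional ring is forced to travel one way around.
#include <cstdio>

constexpr int kStops = 13;   // from the Chips and Cheese estimate

int hops_unidirectional(int from, int to) {
    return (to - from + kStops) % kStops;   // one direction only
}

int hops_bidirectional(int from, int to) {
    int fwd = (to - from + kStops) % kStops;
    int bwd = kStops - fwd;
    return fwd < bwd ? fwd : bwd;           // take the shorter arc
}

int main() {
    // Neighboring stops, but in the "wrong" direction on a one-way ring:
    std::printf("unidirectional 1 -> 0: %d hops\n", hops_unidirectional(1, 0)); // 12
    std::printf("bidirectional  1 -> 0: %d hops\n", hops_bidirectional(1, 0));  // 1
    return 0;
}
```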
 
I understand what you're saying; you're not following what I'm saying.

The latencies are not so dramatic that they are of ANY serious concern. Windows (or Linux) is compiled specifically to minimize dynamic latency, so the graphs above are a reference only for when the thread or OS demands a process change to another core. It just doesn't happen frequently enough to be a problem.

The graphs above show latencies in nanoseconds (billionths of a second). Processes changing cores/CPUs happens so infrequently that it's measured in full seconds, double-digit seconds or even minutes, depending on the process. Thus the latencies are such a small factor that they're effectively insignificant. As such, a few nanoseconds is just not important in the big picture. It's not worth debating.
1- Context switches do happen quite often
2- This is not only about context switch, but about any communication across threads, which is also something common to happen.
3- At 5GHz, a single cycle takes 0.2ns, so at 50ns we're talking about 250 wasted cycles.

Depending on the task this may be irrelevant. But for others that's something that could have high impact. FWIW, games fit in the latter.
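(Putting point 3 against the chart numbers upthread - 85 ns P-core to P-core vs ~32 ns within a Zen 5 CCX, with 5 GHz as just a round assumed clock:)

```cpp
// Cycles burned per bounce = latency_ns * clock_GHz.
// 85 ns and 32 ns come from the charts quoted earlier in the thread;
// 5 GHz is simply a round number for a boosting P-core.
#include <cstdio>

int main() {
    const double ghz = 5.0;                  // assumed clock
    const double arrow_lake_p2p_ns = 85.0;   // worst case from the 285K chart
    const double zen5_same_ccx_ns  = 32.0;   // typical intra-CCX figure
    std::printf("Arrow Lake P-to-P: %.0f cycles\n", arrow_lake_p2p_ns * ghz);  // 425
    std::printf("Zen 5, same CCX:   %.0f cycles\n", zen5_same_ccx_ns * ghz);   // 160
    return 0;
}
```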
 

I wonder if this extreme latency is what's basically negating the IPC improvements this CPU is supposed to show in so many workloads
 
Maybe? For some productivity workloads it can flex its muscles, but those are the same workloads where multi-CCD AMD CPUs don't have any issues whatsoever as well.
 
1- Context switches do happen quite often
Yes, Context IS very important.. Example?
3- At 5GHz, a single cycle takes 0.2ns, so at 50ns we're talking about 250 wasted cycles.
This. Seriously? You are telling me that 250 cycles is important when we're talking about 5,000,000,000 cycles per second? Hmm?

Yes, context is important.. You kinda made my point for me.

Depending on the task this may be irrelevant.
May be?
But for others that's something that could have high impact. FWIW, games fit in the latter.
Ok, if that's what you want to go with, have at it. But let's keep in mind that the benchmarks show, very clearly, that gaming performance is on par and within single digit percentages of all of the other top tier CPUs, and that's highly dependent on the game.

This is of course setting aside everything non-gaming where this CPU is excellent in most tasks.

Also, let's keep in mind that all other CPU models have latencies in the same range, so even if we're going to knock this one, fairness and objectivity demand that we look at the others as well in the same context and (TADA!), all of them are at or near the same 250-cycle latency penalty. The differences amount to less than 100 cycles between them... out of 5 BILLION cycles.

So... What was that about context? Eh?
 
I'm under the impression that you totally misunderstood what a context switch even is, but I believe trying to argue about that with you is pointless since all you want is to frame this product as spotless for some reason, so let's leave it at that.
 
but I believe trying to argue about that with you is pointless since all you want is to frame this product as spotless for some reason
That's another failure at context. Spotless? No. But this CPU is most assuredly NOT the "godawful" thing certain people are trying (and failing) to make it out to be. Statements like that come off as little more than meritless fanboy-like nonsense.
so let's leave it at that.
That's probably best.
 
This. Seriously? You are telling me that 250 cycles is important when we're talking about 5,000,000,000 cycles per second? Hmm?

I see where Igor's coming from though; my surface-level understanding of how CPUs work tells me each instruction takes a certain number of cycles to complete, so if there's a cycle penalty involved for moving data or when context switching, it could add up very fast. I don't understand *why*, though. Lion Cove is clearly a very high-performance architecture, it's not like we're debating Prescott or Bulldozer here, and Skymont should have big gains over Gracemont. I really hope it's just microcode bugs.
 
it could add up very fast
For poorly optimized code, sure. The vast majority of programmers are not stupid or incompetent. They know that context-switching or core/CPU shifting comes with a penalty on any platform and as such it is avoided as much as possible. It just doesn't happen as frequently as they are implying. Therefore, it's a latency penalty that isn't a serious problem in most tasks. I knew what they were talking about and was deliberately looking past it.
 
The core itself seems to be pretty good, but the uncore part seems to be its Achilles' heel. As already mentioned above, it looks like the ring bus is way too long and unidirectional, which causes this cross-core issue.
Along with that, the memory subsystem is a downgrade over previous generations, with way higher latency overall.
I still have to go properly through that article from Chips and Cheese, but it does give some good insights on that:
Arrow Lake’s high latency, high bandwidth memory subsystem may be well suited to very parallel productivity workloads like image processing. But latency sensitive tasks with low thread counts may not do so well.
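(Memory latency itself is usually measured with a pointer chase: a randomized cycle of dependent loads that the prefetcher can't predict, so each load eats the full round trip. A minimal sketch of the technique; buffer size and iteration count are arbitrary picks for illustration:)

```cpp
// Minimal pointer-chase latency sketch. Linking the buffer into one
// random cycle defeats the hardware prefetcher, so each load's address
// depends on the previous load completing.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = std::size_t{1} << 24;   // ~128 MB of pointers, well past any L3
    std::vector<std::size_t> next(n), order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::shuffle(order.begin() + 1, order.end(), std::mt19937_64{42});
    for (std::size_t i = 0; i + 1 < n; ++i)       // link one random cycle
        next[order[i]] = order[i + 1];
    next[order[n - 1]] = order[0];

    std::size_t p = 0;
    const std::size_t iters = 10'000'000;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i)
        p = next[p];                              // fully serialized loads
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    std::printf("~%.1f ns per dependent load (p=%zu)\n", ns / iters, p);  // print p so the chain isn't optimized out
    return 0;
}
```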
 
You still seem to be missing the important point: the differences in worst-case latency are measured in single-digit nanoseconds.
Using the graphs you posted from the site above.
[Chart: Latency-285K.jpg - Arrow Lake core-to-core latency]

Worst case scenario for the 285K? 85.15 ns

[Chart: Latency-9900X.jpg - Ryzen 9 9900X core-to-core latency]

Worst case scenario for the 9900X? 80.65 ns

That is a difference of a whopping & earthshattering 4.5 ns!...:rolleyes: Good grief, whatever is Intel going to do?!? :laugh:
That's 4.5 billionths of a second for an event that happens infrequently.

You, @trparky and the site you linked to are making a big deal out of what is effectively nothing-sauce. Do you want to continue with this silly shtick?
 