
Intel Arrow Lake-S 24 Thread CPU Leaked - Lacks Hyper-Threading & AVX-512 Support

Status
Not open for further replies.

T0@st

News Editor
An interesting Intel document leaked last month—it contained detailed pre-release information covering the company's upcoming 15th Gen Core Arrow Lake-S desktop CPU platform, including a possible best-case 8+16+1 core configuration. Close analysis of the spec sheet turned up a surprise—the next-generation Core processor family could "lack Hyper-Threading (HT) support." The rumor mill had produced similar claims in the past, but the internal technical memo pointed to Arrow Lake's "expected eight performance cores without any threads enabled via SMT." These specifications could be subject to change, but tipster InstLatX64 has now unearthed an Arrow Lake-S engineering sample: "I spotted (CPUID C0660, 24 threads, 3 GHz, without AVX 512) among the Intel test machines."

The leaker uncovered several pre-launch Meteor Lake SKUs last year—with 14th Gen laptop processors having hit the market recently, InstLatX64 has turned his attention to seeking out next-generation parts. Yesterday's Arrow Lake-S find has chins wagging about the 24-thread count (two more than the fanciest Meteor Lake Core Ultra 9 processor)—given the evident lack of Hyper-Threading on the leaked engineering sample, this could be an actual 24-core configuration. Tom's Hardware reckons that the AVX-512 instruction set could be disabled via firmware or motherboard UEFI—if InstLatX64's claim of "without AVX-512" support does ring true, PC users demanding such workloads are best advised to turn to Ryzen 7040 and 8040 series processors, or (less likely) Team Blue's own 5th Gen Xeon "Emerald Rapids" server CPUs.



View at TechPowerUp Main Site | Source
 
so 8p + 16e?
 
My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.
The point about AVX-512 could be correct, sort of, because Intel has replaced it with AVX10, which for Arrow Lake is rumoured to be AVX10.2 (see the Intel diagram below).

[attached image: intelavx102.png — Intel AVX10 version diagram]


The claim is that the Arrow Lake CPU will have P-cores supporting 512-bit vectors and E-cores supporting 256-bit vectors, but that functionally the chip will support the full AVX-512 instruction set.
 
I think Zen 5 is going to be a killer, especially if AMD is able to use TSMC 3 nm, but I do hope Intel is able to bring some competition to HEDT with their Intel 4 node. It doesn't sound like they will right now, but it's too early to know.
 
The claim is that the Arrow Lake CPU will have P-cores supporting 512 bit vectors, and E-cores 256 bit vectors but functionally the chip will support the full AVX-512 instruction set.
The claim? Hmm, the claim is about "future P-cores and E-cores". No exact timeframe or CPU series is mentioned.
 
The point about AVX512 could be correct, sort of, because Intel has replaced it with AVX 10, that for Arrow Lake is rumoured to be AVX 10.2 (see the Intel diagram below)

The claim is that the Arrow Lake CPU will have P-cores supporting 512 bit vectors, and E-cores 256 bit vectors but functionally the chip will support the full AVX-512 instruction set.
Unfortunately it doesn't work like that. Even with AVX10.2 you still have to choose the vector width at compile time; it's not like ARM's Scalable Vector Extensions, which are vector-register-width independent.

From the Intel AVX10 paper:
The converged version of the Intel AVX10 vector ISA will include Intel AVX-512 vector instructions with an AVX512VL feature flag, a maximum vector register length of 256 bits, as well as eight 32-bit mask registers and new versions of 256-bit instructions supporting embedded rounding. This converged version will be supported on both P-cores and E-cores.
While the converged version is limited to a maximum 256-bit vector length, Intel AVX10 itself is not limited to 256 bits, and optional 512-bit vector use is possible on supporting P-cores. Thus, Intel AVX10 carries forward all the benefits of Intel AVX-512 from the Intel® Xeon® with P-core product lines, supporting the key instructions, vector and mask register lengths, and capabilities that have comprised the ISA to date. Future P-core based Xeon processors will continue to support all Intel AVX-512 instructions ensuring that legacy applications continue to run without impact.
256-bit as the baseline, with 512-bit for P-cores. Further, it clarifies that the 512-bit length is only available on processors containing exclusively P-cores, so most likely only Xeons:
[...] with 128-bit and 256-bit vector lengths being supported across all processors, and 512-bit vector lengths additionally supported on P-core processors.
It would be nice if they allowed disablement of E-cores to make the CPU "fully P-core" to enable 512-bit vector registers, but we'll have to see. Intel wasn't very happy with early Alder Lake BIOS switches to do this.
You won't be able to use current AVX-512 software on AVX10 E-cores without recompilation either, and if they use 512-bit vectors you will need to make changes in code:
Existing Intel AVX-512 applications, many of them already using maximum 256-bit vectors, should see the same performance when compiled to Intel AVX10/256 at iso-vector length. For applications that can leverage greater vector lengths, Intel AVX10/512 will be supported on Intel P-cores, continuing to deliver the best-in-class performance for AI, scientific, and other high-performance codes.
Again, on P-core CPUs (Xeons) it will work without recompilation.

The GCC documentation also confirms that 512-bit register support is a separate feature.

AVX10 is bringing a lot of AVX-512 goodness to E-core designs, but it's not seamless nor fully backwards compatible with current AVX-512 software.
 
Is Intel planning on releasing new generation HEDT CPUs?
 
It would be nice if they allowed disablement of E-cores to make the CPU "fully P-core" to enable 512-bit vector registers, but we'll have to see. Intel wasn't very happy with early Alder Lake BIOS switches to do this.
You won't be able to use current AVX-512 software on AVX10 E-cores without recompilation either, and if they use 512-bit vectors you will need to make changes in code:
And I can't help but think that Intel is really holding back the rest of the industry with these kinds of shenanigans. We could have universal AVX-512 support but we can't because... Intel.
 
My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.

Just some notes:

- Extended features of AVX10-256 instructions over AVX2-256 instructions are more important for a desktop CPU like Arrow Lake than the lack of AVX10-512

- Zen 5 presumably won't have APX which is an instruction set extension more important for performance of general-purpose codes than AVX10-512 because most general-purpose codes cannot be vectorized with AVX10

Unfortunately it doesn't work like that. Even with AVX10.2 you still have to choose the vector width at compile time, it's not like ARM Scalable Vector Extensions which is vector register width-independent.

I think the previous post suggesting that E-cores might implement AVX10-512 meant that a large part of the AVX10-512 instruction set could (in theory) be implemented by 256-bit ALUs on E-cores.
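To make the "256-bit ALUs executing 512-bit instructions" idea concrete, here is a toy Python model (names and element widths are made up for illustration, not real silicon): one 512-bit vector add is issued as two back-to-back passes through a 256-bit ALU, which is roughly how "double-pumped" implementations like Zen 4's handle wider vectors.

```python
# Toy model of double-pumping: a 512-bit vector add executed as two
# 256-bit halves. Element width is fixed at 32 bits for simplicity.

LANE_BITS = 256
ELEM_BITS = 32
ELEMS_PER_LANE = LANE_BITS // ELEM_BITS  # 8 x 32-bit elements per 256-bit half

def add_256(a, b):
    """One pass through a 256-bit ALU: elementwise add of 8 elements."""
    assert len(a) == len(b) == ELEMS_PER_LANE
    return [(x + y) & 0xFFFFFFFF for x, y in zip(a, b)]

def add_512_double_pumped(a, b):
    """A 512-bit add issued as two back-to-back 256-bit micro-ops."""
    assert len(a) == len(b) == 2 * ELEMS_PER_LANE
    lo = add_256(a[:ELEMS_PER_LANE], b[:ELEMS_PER_LANE])
    hi = add_256(a[ELEMS_PER_LANE:], b[ELEMS_PER_LANE:])
    return lo + hi

a = list(range(16))
b = [100] * 16
print(add_512_double_pumped(a, b))  # identical result to a native 512-bit add
```

The result is bit-identical to a native 512-bit add; the cost is throughput (two micro-ops instead of one), not correctness, which is why narrow ALUs can in principle expose the wider instruction set.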

In either case, the past failure of heterogeneous x86 Intel CPUs is purely a software failure (operating systems, compilers).

And I can't help but think that Intel is really holding back the rest of the industry with these kinds of shenanigans. We could have universal AVX-512 support but we can't because... Intel.

Intel isn't holding back the industry. The architecture of operating systems and compilers, incapable of supporting heterogeneous CPUs, is holding it back.
 
Just some notes:

- Extended features of AVX10-256 instructions over AVX2-256 instructions are more important for a desktop CPU like Arrow Lake than the lack of AVX10-512
Agreed, however this fragments the ecosystem even further.
- Zen 5 presumably won't have APX which is an instruction set extension more important for performance of general-purpose codes than AVX10-512 because most general-purpose codes cannot be vectorized with AVX10
IMO Intel is playing a dangerous game with APX. It looks similar to the attempt made during the 32-bit to 64-bit transition: Itanium (ia64) was supposed to be the 64-bit architecture, obviously under Intel/HP control, while x86 remained 32-bit. The industry wasn't happy about such a prospect and chose the amd64 extension to x86 instead, which retained 100% software compatibility.

Implementing support for APX will touch every aspect of software, from operating systems through compilers to (specific) libraries. I'm not sure it will be a success for Intel. AVX-512 software only relatively recently started picking up, and with Intel consumer SKUs dropping it after Rocket/Ice/Tiger Lake, it looks like Intel can't stick to its own technology. It wouldn't be the first time either—SGX and TSX were also removed.

I think the previous post suggesting that E-cores might implement AVX10-512 meant that a large part of the AVX10-512 instruction set could (in theory) be implemented by 256-bit ALUs on E-cores.
Sure they can—that's what Centaur's CHA microarchitecture did; AMD's implementation in Zen 4 is a bit more complex, with more of the CPU being 512-bit optimized. Adjusting the decode stage for AVX10 should be relatively cheap area-wise, but only Intel knows for sure whether it's feasible.
In either case, the past failure of heterogeneous x86 Intel CPUs is purely a software failure (operating systems, compilers).

Intel isn't holding back the industry. The architecture of operating systems and compilers, incapable of supporting heterogeneous CPUs, are holding it back.
Not sure why you're blaming compilers when it's Intel that is responsible for their development and for wiring up support for its microarchitectures. As for operating systems, it's much the same—they did work closely with Microsoft to implement support in Windows 11, and it still sometimes fails to assign threads correctly. Linux support was also Intel's to complete, yet it's still not done, with no equivalent of Intel Thread Director support.
 
Not sure why you're blaming compilers when it's Intel that is responsible for their development and wiring up support for their microarchitectures. As for operating systems, it's kind of the same - they did work closely with Microsoft to implement support in Windows 11, and still it fails to assign threads correctly sometimes. Linux support was also Intel's to complete, yet it's still not done with no equivalent for Intel Thread Director support.

It is an operating system's choice whether to support or not to support some form of dynamic recompilation as a core feature of its architecture. This choice cannot be made by a CPU designed and manufactured to be heterogeneous. The "top killer" or "alpha predator" of heterogeneous x86 CPUs is the architecture of operating systems.
 
My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.
Where do the lower IPC and clocks come from? Isn't it on the new Intel 20A?
 
Implementing support for APX will touch every aspect of software, from operating systems through compilers to (specific) libraries.

In Linux, the existing infrastructure for distributing software packages could be used to get APX binaries. In Windows, adoption of APX might be slower than in Linux.

I'm not sure if it will be a success for Intel. AVX-512 software only relatively recently started picking up, and with the Intel consumer SKUs not supporting it after Rocket/Ice/Tiger Lakes did looks like Intel can't stick to its own technology. It wouldn't be the first time either - SGX and TSX also were removed.

The adoption curve of APX cannot be inferred from the historical record of the AVX-512 adoption rate.
 
The architecture of operating systems and compilers, incapable of supporting heterogeneous CPUs, are holding it back.
But wouldn't that ultimately lead to bigger executable binaries and associated DLLs since there would have to be two code paths? One for AVX-512 equipped CPUs and another for everything else.

dynamic recompilation as a core feature of its architecture.
That's a possibility but more disk space would be required since you essentially would want to cache the recompiled binary.
 
It is an operating system's choice whether to support or not to support some form of dynamic recompilation as a core feature of its architecture. This choice cannot be made by a CPU designed and manufactured to be heterogeneous. The "top killer" or "alpha predator" of heterogeneous x86 CPUs is the architecture of operating systems.
Sorry, I'm having problems understanding what you're trying to say. It's the CPU vendor's obligation to provide support, not the other way around.
Do you know of any mainstream operating system that actually supports heterogeneous ISAs? As far as I know it isn't done even on ARM—all the SoCs sporting big.LITTLE (and "midDLE" nowadays) cores support the same ARM specification levels on all of them, so that processes can be migrated—the same as Intel's E-/P-core designs. The problem here is proper scheduling.

In Linux, the existing infrastructure for distributing software packages could be used to get APX binaries. In Windows, adoption of APX might be slower than in Linux.
That's not enough as you need modifications to the lowest levels of the OS kernel. Intel outlines what's needed in their documentation. In order to do that you need hardware in the hands of kernel developers. In order to support APX software you also need the hardware to develop/port them in the first place. It's a common problem with all new technology.
The curvature of the adoption rate of APX cannot be inferred from the historical record of AVX-512 adoption rate.
Yes, but the history of Intel's additions to x86, and their subsequent removals, can make potential developers wary of supporting APX in the first place. This is also the case for AVX10.

But wouldn't that ultimately lead to bigger executable binaries and associated DLLs since there would have to be two code paths? One for AVX-512 equipped CPUs and another for everything else.
Yes, and that's what Intel's Clear Linux does:
To fully use the capabilities in different generations of CPU hardware, Clear Linux OS will perform multiple builds of libraries with CPU-specific optimizations. For example, Clear Linux OS builds libraries with Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512). Clear Linux OS can then dynamically link to the library with the newest optimization based on the processor in the running system. Runtime libraries used by ordinary applications benefit from these CPU specific optimizations.
Another method would be dynamic dispatching dependent on runtime detection of CPU flags, like in Intel IPP or common math libraries.
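That runtime-dispatch pattern can be sketched in a few lines of Python (the flag sets below are injected for illustration, and the "optimized" kernels are plain stand-ins; real code would query CPUID or parse /proc/cpuinfo on Linux):

```python
# Sketch of runtime dispatch on CPU feature flags: detect features once,
# then bind the fastest kernel the running CPU supports. The three kernels
# compute the same dot product; in real code they would differ in codegen.

def dot_scalar(a, b):
    return sum(x * y for x, y in zip(a, b))

def dot_avx2(a, b):        # stand-in for an AVX2-optimized kernel
    return sum(x * y for x, y in zip(a, b))

def dot_avx512(a, b):      # stand-in for an AVX-512-optimized kernel
    return sum(x * y for x, y in zip(a, b))

def select_dot(cpu_flags):
    """Pick the best implementation present in the detected flag set."""
    if "avx512f" in cpu_flags:
        return dot_avx512
    if "avx2" in cpu_flags:
        return dot_avx2
    return dot_scalar

dot = select_dot({"sse2", "avx2"})   # e.g. a part without AVX-512
print(dot([1, 2, 3], [4, 5, 6]))     # 32, via the AVX2 path
```

The selection happens once at startup, so the per-call overhead is a single indirect call—which is exactly why libraries like Intel IPP and common math libraries prefer this over shipping separate binaries.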
 
Sorry, I'm having problems understanding what you're trying to say. It's the CPU's vendor obligation to provide support, not the other way around.

Just to put this in another perspective: the idea that "it's the CPU vendor's obligation to provide [OS] support" would have seemed crazy around 1980. Should the Z80's vendor have been responsible for providing software support to every machine built with a Z80? Year 2024 isn't the end of history.

Do you know of any mainstream operating system that actually supports heterogenous ISAs?

How does non-existence of such operating systems invalidate my previous claim that the operating system architecture is the top "alpha predator" of heterogeneous CPUs?

That's not enough as you need modifications to the lowest levels of the OS kernel. Intel outlines what's needed in their documentation. In order to do that you need hardware in the hands of kernel developers. In order to support APX software you also need the hardware to develop/port them in the first place. It's a common problem with all new technology.

AVX10.1 support has already been posted to the gcc compiler. I don't know whether the developers who posted it have access to a physical CPU with AVX10.1.

Yes, but the history of Intel's additions to x86, and their subsequent removals, can make potential developers wary of supporting APX in the first place. This is also the case for AVX10.

No. The fact is that there hasn't been any such x86 ISA extension since the introduction of amd64. APX is the first ever extension on top of amd64 for general-purpose computation.

That's a possibility but more disk space would be required since you essentially would want to cache the recompiled binary.

Do you know what the size of Vulkan shader caches is on a gaming machine?
 
Another method would be dynamic dispatching dependent on runtime detection of CPU flags, like in Intel IPP or common math libraries.
There are also other techniques available, and I'm amazed that Intel didn't implement them in their P+E CPUs. If an E core encounters an AVX-512 instruction, and given proper OS support, it can do one of two things that don't kill the process: either emulate that instruction or suspend execution so the scheduler can migrate the thread to a P core. That's not how you gain performance of course, but you get stable execution, the processor can still run code that only some of its cores are able to execute, and large areas of silicon are not wasted.
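The suspend-and-migrate idea above can be simulated with a toy scheduler (core names, feature sets, and the instruction list are all made up for illustration; a real OS would resume the thread mid-stream rather than restart the program, which this sketch simplifies away):

```python
# Toy simulation of "suspend and migrate": if a core hits an instruction
# it cannot execute, the scheduler moves the thread to a capable core
# instead of killing the process.

class Core:
    def __init__(self, name, features):
        self.name = name
        self.features = features

    def run(self, program):
        for insn, feature in program:
            if feature not in self.features:
                raise NotImplementedError(feature)  # models a #UD fault

def run_with_migration(program, cores):
    """Try cores in order; on an unsupported instruction, migrate.
    (Restarts the program for simplicity.)"""
    for core in cores:
        try:
            core.run(program)
            return core.name           # ran to completion here
        except NotImplementedError:
            continue                   # migrate to the next core
    raise RuntimeError("no core supports this program")

e_core = Core("E-core", {"avx2"})
p_core = Core("P-core", {"avx2", "avx512f"})
program = [("vaddps ymm", "avx2"), ("vaddps zmm", "avx512f")]
print(run_with_migration(program, [e_core, p_core]))  # P-core
```

As the post says, this trades performance for stable execution: the process survives even though only some cores can run all of its code.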
 
My prediction for Arrow Lake:

Lower p core IPC
Lower p core clocks
No AVX512
No HT
Emphasis on AI, iGPU and e cores

Zen 5 is gonna wipe the floor with this thing.
Arrow Lake will reportedly have P-core clocks ~1 GHz lower than Raptor Lake. However, the real killer blow will be pricing, which is said to be very high for motherboards. AMD will obliterate them on performance per dollar with Zen 5. Zen 5 is already said to be faster than Zen 4 X3D in gaming.
 
The point about AVX512 could be correct, sort of, because Intel has replaced it with AVX 10, that for Arrow Lake is rumoured to be AVX 10.2 (see the Intel diagram below)


The claim is that the Arrow Lake CPU will have P-cores supporting 512 bit vectors, and E-cores 256 bit vectors but functionally the chip will support the full AVX-512 instruction set.
That "promote SSE to AVX" thing hasn't been working as well in recent years, since E-cores don't have AVX-512.
 
One thing they could do is 8 P-cores with HT along with 8 E-cores—not clusters. They should keep the E-core shared cache the same and reduce the clusters to pairs of two. That would give them more flexibility to insert more actual P-cores—something many have been desiring—and at the same time would provide more shared cache per E-core cluster than current designs. It would also reduce power and improve thermals when running fewer cores in total, while getting better ST performance across more cores and higher efficiency in exchange. It's not a bad trade-off. The E-cores would also have more consistent latency response with more cache per cluster to access.
 
Just to put this in another perspective: The idea "It's the CPU's vendor obligation to provide [OS] support" would seam crazy around year 1980. Z80 CPU vendor should be responsible for providing software support to machines built with Z80? ---- Year 2024 isn't the end of history.
From Wikipedia:
The first samples were returned from Mostek on 9 March 1976. By the end of the month, they had also completed an assembler-based development system.
So... it was the Z80's creators that provided support after all. Other operating systems obviously used this and the documentation to implement support, but it is the CPU vendor's job to provide the initial support, development environments, and documentation.
How does non-existence of such operating systems invalidate my previous claim that the operating system architecture is the top "alpha predator" of heterogeneous CPUs?
There is no such operating system because nobody actually created a CPU like that, as far as I know.
AVX10.1 support has already been posted to the gcc compiler. I don't know whether the developers who posted it have access to a physical CPU with AVX10.1.
If you had actually looked into it, you'd know who those developers were: Intel employees, who obviously have access to hardware. They are the ones who always do enablement of new parts in the Linux kernel and GCC/LLVM.
No. The fact is that there hasn't been any such x86 ISA extension since introduction of amd64. APX is the first ever extension on top amd64 for general-purpose computations.
I guess we'll have to see how far APX can go. There's still a risk that software vendors will simply not bother and continue to support amd64 only, or invest in ARM/RISC-V instead as the "next big thing". This is the danger I wrote about before. When Intel tried this with Itanium they were in a much stronger market position than they are now, yet it still failed.
On the other hand, if AMD is on board and has been implementing APX, AVX10, and X86S into ~Zen 5/6, it's going to bring good additions to x86 in general. They do have cross-licensing agreements (after Intel lost in court).

There are also other techniques available, and I'm amazed that Intel didn't implement them in their P+E CPUs. If an E core encounters an AVX-512 instruction, and given proper OS support, it can do one of two things that don't kill the process: either emulate that instruction or suspend execution so the scheduler can migrate the thread to a P core. That's not how you gain performance of course, but you get stable execution, the processor can still run code that only some of its cores are able to execute, and large areas of silicon are not wasted.
Yeah, there are many potential software solutions, and unfortunately all of them have downsides. IMO the proper way would have been equipping E-cores with AVX-512 capability, even if it's executed on 256-bit registers like Centaur's CHA. I'm not knowledgeable enough to know what it would take to modify the Atom cores to do it, but Intel did not think it worth it. They apparently still don't, given that even Arrow/Lunar/Panther Lake E-cores lack AVX-512 (per GCC patches).
 
While the move from 32-bit to 64-bit was slow, it was also constrained by the limited internet infrastructure of the time—we usually got service packs etc. on CDs, so backwards compatibility was very important.
Now patches for almost every new instruction set are released to the public a few months early, and you can just download an update for your OS, software, games, compilers, etc.


The downside is of course that nothing is properly tested anymore, although that's a different discussion.
AVX10 is crap; they should just let AVX-512 be the last "extension" and both start working on x86-S or APX (or the original idea behind AMD Fusion/HSA).

For AMD/Intel, adding a few tiny ARM accelerators on chiplets/tiles is a matter of cost-benefit; Qualcomm/Samsung/Apple, on the other hand, cannot use x86 at all. AMD has already had ARM co-processors on-chip since the FX era.

Then again maybe they both believe they're too big to fall, I guess we'll see in 5-10 years
 