Thursday, October 19th 2017

Intel "Cannon Lake" Could Bring AVX-512 Instruction-set to the Mainstream

Intel's next-generation "Cannon Lake" CPU micro-architecture could bring the AVX-512 instruction-set to the mainstream segments (MSDT, or mainstream desktop, and mobile). AVX-512 is currently available on the company's Core X "Skylake-X" HEDT processors, its Xeon "Skylake-W" and Xeon Scalable "Skylake-SP" processors, and, in a limited form, on the Xeon Phi "Knights Landing" and "Knights Mill" many-core compute chips.

The upcoming "Cannon Lake" mainstream silicon will feature the AVX512F, AVX512CD, AVX512DQ, AVX512BW, and AVX512VL subsets, plus the AVX512_IFMA and AVX512_VBMI extensions, making it a slightly broader implementation of AVX-512 than the "Skylake-SP" silicon. AVX-512 could vastly improve the performance of compute-intensive applications written to take advantage of it. It will also be a key component of future security standards.
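For readers who want to check which of these subsets a given machine reports, here is a minimal Python sketch. It assumes Linux, where `/proc/cpuinfo` lists CPU features on a `flags` line; the helper names `avx512_subsets` and `read_local_flags` are hypothetical, and the two sets simply encode the subset lists named in the article, spelled as Linux reports them.

```python
from __future__ import annotations

# AVX-512 subsets named in the article, spelled as Linux /proc/cpuinfo flags.
SKYLAKE_SP = {"avx512f", "avx512cd", "avx512dq", "avx512bw", "avx512vl"}
CANNON_LAKE = SKYLAKE_SP | {"avx512ifma", "avx512vbmi"}


def avx512_subsets(flags: str) -> set[str]:
    """Return every AVX-512 flag found in a space-separated flags string."""
    return {flag for flag in flags.split() if flag.startswith("avx512")}


def read_local_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the first 'flags' line of a Linux /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.partition(":")[2]
    except OSError:
        pass
    return ""


if __name__ == "__main__":
    print("Added over Skylake-SP:", sorted(CANNON_LAKE - SKYLAKE_SP))
    print("This machine reports:", sorted(avx512_subsets(read_local_flags())))
```

On a non-Linux system, or a CPU without AVX-512, the second line simply prints an empty list.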

Source: Anandtech

52 Comments on Intel "Cannon Lake" Could Bring AVX-512 Instruction-set to the Mainstream

#1
2901BitSlice
The more salient question is WHEN will Cannon Lake bring us the AVX512 Instruction Set?
#2
First Strike
From an architectural perspective, that makes a lot of sense. A Skylake (client) core has two 256-bit FMA units (on port 0 and port 1), which can be fused into one 512-bit FMA unit, as Intel has already done with the Skylake server variant: one of its two 512-bit FMAs is fused from the 2×256-bit units of the original client core, and the other 512-bit FMA is implemented by attaching an additional AVX section to the core.
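The FMA arithmetic in this post can be turned into peak-throughput numbers. A rough sketch, assuming two FMA units per core in both cases (some Skylake-SP SKUs enable only one 512-bit FMA), single-precision floats, and counting each fused multiply-add as two floating-point operations; `peak_flops_per_cycle` is a hypothetical helper:

```python
def peak_flops_per_cycle(fma_units: int, vector_bits: int, element_bits: int) -> int:
    """Peak floating-point operations per cycle for one core's FMA units."""
    lanes = vector_bits // element_bits
    # A fused multiply-add counts as two operations: one multiply and one add.
    return fma_units * lanes * 2


skylake_client = peak_flops_per_cycle(2, 256, 32)  # 2 FMAs x 8 lanes x 2
skylake_sp = peak_flops_per_cycle(2, 512, 32)      # 2 FMAs x 16 lanes x 2
print(skylake_client, skylake_sp)  # 32 64
```

These are theoretical ceilings per clock; the clock itself drops under heavy AVX-512 load, which is the point raised further down the thread.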

But the problem is, if they have already implemented this on Skylake-SP, why not on Coffee Lake or even Kaby Lake? And why did they change their mind with Cannon Lake? From a timeline perspective, it seems doubtful.
#3
Camm
AVX512 - where either the vector unit runs, or your CPU runs (as thermally, the vector unit throttles the shit out of the CPU).

Intel needs to solve that before I'll get excited about AVX512 (as let's be honest, it's only generally useful for the 1% of stuff I can't send to the GPU in the first place).
#4
StrayKAT
2901BitSlice said:
The more salient question is WHEN will Cannon Lake bring us the AVX512 Instruction Set?
When AMD gives them another scare.
#5
cucker tarlson
I've been hearing about AVX since Haswell, could someone explain it to a simpleton?
#6
londiste
cucker tarlson said:
I've been hearing about AVX since Haswell, could someone explain it to a simpleton?
An additional set of instructions that the processor can perform, operating on wider registers. The big bonus is being far more efficient when doing the exact same operation on a number of operands.
While the main x86 integer registers are 64-bit, SSE works with 128-bit registers, AVX with 256-bit and AVX-512 with 512-bit registers. Each register is packed with several ordinary numbers (for example, sixteen 32-bit floats in a 512-bit register), so the gain is in how many numbers are processed at once, not in integer range or floating-point precision.
The usefulness of these extensions relies heavily on both compilers and software being aware of them and using them. These operations are very useful for some types of software (usually productivity software with lots of calculations) and less useful for others.
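A sketch of what "packed" means, using Python's `struct` module to stand in for a 512-bit register: sixteen 32-bit floats fill the 64 bytes, and an add works lane by lane, each lane keeping ordinary float32 precision. The `packed_add` helper is illustrative only; real code would rely on a compiler or intrinsics rather than Python.

```python
import struct

REGISTER_BITS = 512
REGISTER_BYTES = REGISTER_BITS // 8  # 64 bytes, i.e. room for 16 float32 values

# Pack sixteen 32-bit floats into one 64-byte buffer, the way a ZMM register holds them.
zmm_a = struct.pack("<16f", *[float(i) for i in range(16)])
zmm_b = struct.pack("<16f", *([1.5] * 16))


def packed_add(a: bytes, b: bytes) -> bytes:
    """Lane-wise float32 add: every lane is an independent 32-bit number."""
    xs = struct.unpack("<16f", a)
    ys = struct.unpack("<16f", b)
    return struct.pack("<16f", *(x + y for x, y in zip(xs, ys)))


result = struct.unpack("<16f", packed_add(zmm_a, zmm_b))
print(result[:4])  # (1.5, 2.5, 3.5, 4.5)
```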

SSE is from Pentium 3 era.
AVX has been there since Sandy Bridge (and Bulldozer).
AVX2 since Haswell (and Excavator).

Wiki is actually pretty good on the topic:
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
#7
cucker tarlson
What software can benefit from AVX? Rendering? Games? Or just number crunching?
#8
RejZoR
So, in a nutshell, don't buy Kaby Lake now, wait for Cannon Lake next year...
#9
londiste
cucker tarlson said:
What software can benefit from AVX? Rendering? Games? Or just number crunching?
Good candidate is anything that does parallelization or vectorization, basically running same operations on a lot of data. Image or video processing (Adobe stuff), encoding/decoding (ffmpeg), compression (7zip, WinRAR) plus obviously anything that does a lot of math (Excel).
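As an illustration of "running the same operation on a lot of data", here is a hypothetical image-brightening routine that processes pixels in 64-byte chunks (the number of 8-bit values that fit in one 512-bit register) and clamps at 255, the way a saturating byte add such as AVX-512BW's VPADDUSB does. It models the data layout only; pure Python gains no SIMD speedup from it.

```python
LANES = 64  # one 512-bit register holds 64 one-byte pixel values


def brighten(pixels: bytes, amount: int) -> bytes:
    """Add `amount` to every pixel, clamping at 255, one register-sized chunk at a time."""
    if amount < 0:
        raise ValueError("this sketch only brightens (amount must be >= 0)")
    out = bytearray(len(pixels))
    for start in range(0, len(pixels), LANES):
        chunk = pixels[start:start + LANES]
        # The same operation is applied to every lane of the chunk (saturating at 255).
        out[start:start + len(chunk)] = bytes(min(p + amount, 255) for p in chunk)
    return bytes(out)


print(list(brighten(bytes([0, 100, 250]), 10)))  # [10, 110, 255]
```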
#10
R0H1T
RejZoR said:
So, in a nutshell, don't buy Kaby Lake now, wait for Cannon Lake next year...
It's supposed to be on track for release later this year, though it ought to be a paper launch. Also, CNL is not going to span Intel's full product stack (desktop, server, etc.), only a handful of low-power chips.
#11
londiste
What makes it a paper launch?
If I look at local shops, I could go and buy i7 8700 or i5 8400 right now for what seem to be MSRP prices. i7 8700K is not in stock but that is not very surprising.
#12
RejZoR
So, it's like the GeForce GTX 750 and Radeon R9 285: a forgettable chip with the latest tech not seen even in their most expensive top-of-the-line products. Well, that's garbage then. Don't buy either, and wait two more years for it to get into the "mainstream". I see it as important enough to be worth waiting for, unless you have a really prehistoric system that needs replacing NOW.
#13
bug
cucker tarlson said:
What software can benefit from AVX? Rendering? Games? Or just number crunching?
It's primarily about number crunching. But if you use that number crunching to improve AI (for example), this can also mean more challenging/fun games.
#14
Th3pwn3r
Intel drops more processors than I take dumps.
#15
piloponth
cucker tarlson said:
What software can benefit from AVX? Rendering? Games? Or just number crunching?
#16
piloponth
The x265 HEVC encoder can utilize AVX.
#17
EarthDog
RejZoR said:
So, in a nutshell, don't buy Kaby Lake now, wait for Cannon Lake next year...
If you use AVX-512, yes; otherwise, no.
#18
TheinsanegamerN
londiste said:
Good candidate is anything that does parallelization or vectorization, basically running same operations on a lot of data. Image or video processing (Adobe stuff), encoding/decoding (ffmpeg), compression (7zip, WinRAR) plus obviously anything that does a lot of math (Excel).
I'm also going to add emulation. Emulating more complex game consoles depends heavily on FPU calculations. PCSX2 got a nice boost from AVX.

Things like AVX-512 will probably be a near requirement for x360/PS3 emulation, if we ever get there.
#19
efikkan
Camm said:
AVX512 - where either the vector unit runs, or your CPU runs (as thermally, the vector unit throttles the shit out of the CPU).
Sure, the CPU clocks down when running AVX instructions, but the efficiency gains of AVX are so massive that it will still outperform scalar ALU/FPU code by a large factor.
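The trade-off can be put into rough numbers. The clocks below are made-up placeholders, not measured Intel frequencies, and the model ignores memory bandwidth, execution-port counts and mixed workloads; it only shows why a lower clock can still win when each instruction handles 16 floats. `vector_speedup` is a hypothetical helper.

```python
def vector_speedup(lanes: int, vector_ghz: float, scalar_ghz: float) -> float:
    """Elements per second with vector code divided by elements per second with scalar code."""
    return (lanes * vector_ghz) / scalar_ghz


# Hypothetical clocks: 4.0 GHz for scalar code, 3.0 GHz once AVX-512 throttles the core.
print(vector_speedup(16, 3.0, 4.0))  # 12.0

# The clock at which 16-lane AVX-512 would merely break even with 4.0 GHz scalar code.
print(4.0 / 16)  # 0.25
```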

Camm said:

Intel needs to solve that before I'll get excited about AVX512 (as let's be honest, it's only generally useful for the 1% of stuff I can't send to the GPU in the first place).
The AVX-512 units are massive, so they can't reach high clocks.
The real problem is that it will take years before consumer software utilizes it.
As with other CPU instructions, the software has to be compiled to use this feature. In some cases compilers can automatically vectorize certain structures (this requires compiler flags), but usually the programmer has to use specific intrinsics, which are basically functions mapping almost directly to assembly instructions.
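What an auto-vectorizing compiler (for example GCC at `-O3` with `-mavx512f`) does to a simple loop is often called strip-mining: a main loop that handles one register's worth of elements per step, plus a scalar remainder loop. The Python below only mirrors that structure for `out = a*x + y` with 16 float32 lanes; `saxpy_strip_mined` is a hypothetical name, and the inner `for lane` loop stands in for what would be a single vector FMA instruction.

```python
from __future__ import annotations

VECTOR_WIDTH = 16  # float32 lanes in a 512-bit register


def saxpy_strip_mined(a: float, x: list[float], y: list[float]) -> list[float]:
    """Compute a*x[i] + y[i] with a vector-sized main loop and a scalar remainder."""
    n = len(x)
    out = [0.0] * n
    main_end = n - n % VECTOR_WIDTH
    # Main loop: each outer iteration stands in for one 16-lane vector FMA.
    for i in range(0, main_end, VECTOR_WIDTH):
        for lane in range(VECTOR_WIDTH):
            out[i + lane] = a * x[i + lane] + y[i + lane]
    # Remainder loop: the last n % 16 elements are handled one at a time.
    for i in range(main_end, n):
        out[i] = a * x[i] + y[i]
    return out


print(saxpy_strip_mined(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # [3.0, 5.0, 7.0]
```

With 37 elements, for instance, the main loop covers the first 32 and the remainder loop the last 5.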

cucker tarlson said:
I've been hearing about AVX since Haswell, could someone explain it to a simpleton?
A vector unit is able to process multiple pieces of data at once, e.g. an AVX-512 unit can process up to 1×512-bit, 2×256-bit, 4×128-bit, 8×64-bit, 16×32-bit, etc. operations per clock. Each CPU core may contain multiple AVX and FMA units on different execution ports, and some of them only handle certain operations, e.g. multiplication.
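The lane counts in that list follow from dividing the register width by the element width. A trivial sketch with a hypothetical `lane_count` helper (the 8-bit and 16-bit integer cases on 512-bit registers are the ones AVX512BW adds):

```python
def lane_count(register_bits: int, element_bits: int) -> int:
    """How many elements of a given width fit side by side in one vector register."""
    if register_bits % element_bits:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


for width in (64, 32, 16, 8):
    print(f"{width:>2}-bit elements: {lane_count(512, width)} per 512-bit register")
```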

RejZoR said:
So, in a nutshell, don't buy Kaby Lake now, wait for Cannon Lake next year...
If you need AVX-512…
Also, Ice Lake will be the next desktop architecture.
#20
GoldenX
I bet Celeron and Pentium won't get it, as always.
#21
Octopuss
Did someone say Intel will do a paper launch of the next generation of CPUs this year? Coffee Lake has JUST been released.
#22
2901BitSlice
Here is a screenshot of a leaked table of potential Intel CPUs. It came out of China and there are spelling errors, e.g. 'cores/treads': the H got lost.
#23
Prima.Vera
RejZoR said:
So, in a nutshell, don't buy Kaby Lake now, wait for Cannon Lake next year...
That's what I keep saying. Especially since we'll get 8 cores and yet another new mobo...
#24
EarthDog
There's always something better around the corner. Take this advice and nobody will ever buy anything.
#25
efikkan
2901BitSlice said:
Here is a screenshot of a leaked table of potential Intel CPUs. It came out of China and there are spelling errors, e.g. 'cores/treads': the H got lost.


This is not a leak, just someone creating a table of guesses. This is certainly not anything from Intel.

Core configurations are usually decided during tapeout, and clocks and model names closer to launch. Even Intel doesn't know yet what the models will look like.
And I like the socket names; old socket +10 :p

-----

The source from Anandtech is actually quite an interesting read. It also provides some early indications on what Ice Lake will bring, both in terms of new AVX features and other instructions.

What I find most interesting is "Fast Short REP MOV", which speeds up short memory copies done with the REP MOVSB instruction. Those of you with assembly experience know that a CPU spends a lot of cycles not only moving data from memory to CPU registers, but also shuffling data around the registers to be able to execute the next ALU or FPU instruction. A single ALU/FPU operation may require up to 3-4 MOV operations. It may seem very wasteful to spend clock cycles just moving a few bits instead of calculating stuff, so anything which helps reduce these "wasteful" operations will help throughput without increasing computational resources.
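To make the instruction concrete, here is a Python model of what `REP MOVSB` does architecturally with the direction flag clear: copy RCX bytes from the address in RSI to the address in RDI, advancing both pointers and counting RCX down to zero. Memory is modeled as a `bytearray` and the registers as plain integers; the model captures semantics only, whereas "Fast Short REP MOV" is about making short copies of this kind faster in hardware.

```python
from __future__ import annotations


def rep_movsb(memory: bytearray, rdi: int, rsi: int, rcx: int) -> tuple[int, int, int]:
    """Model of REP MOVSB with the direction flag clear: copy RCX bytes from RSI to RDI.

    Returns the updated (RDI, RSI, RCX), as the instruction leaves them.
    """
    while rcx > 0:
        memory[rdi] = memory[rsi]
        rdi += 1
        rsi += 1
        rcx -= 1
    return rdi, rsi, rcx


mem = bytearray(b"hello" + bytes(5))
print(rep_movsb(mem, 5, 0, 5), mem)  # (10, 5, 0) bytearray(b'hellohello')
```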

Additionally, Cannon Lake will add support for SHA-NI, which accelerates SHA-1 and SHA-256 hashing. Surely this will bring a large speedup for such algorithms, but I'm a firm believer that algorithm-specific instructions don't belong in a general-purpose CPU. Whether it's algorithms for cryptography or compression, these algorithms keep evolving, making acceleration quickly outdated. SHA-1 and MD5 are already outdated in cryptography, so these instructions are surely added just to show some gains in specific benchmarks for enterprise customers. For general-purpose use, this acceleration is mostly a waste of die space and energy. How much of your CPU time is really spent on AES, SHA, MD5, etc.? Probably less than 1%, unless you run some kind of web server, which is why I believe these features belong in specialized processors for such workloads. Back in the 80s, Intel made specialized co-processors for math (8087, etc.); I think they should have used this approach for special enterprise features.
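One way to answer "how much of your CPU time is really spent on SHA" is to measure it. A minimal sketch using Python's `hashlib`, which is typically backed by OpenSSL; whether that build actually uses SHA-NI depends on the library version and the CPU, so treat the result as a rough local figure rather than a benchmark. `sha256_throughput_mb_s` is a hypothetical helper.

```python
import hashlib
import time


def sha256_throughput_mb_s(total_mb: int = 64, chunk_kb: int = 64) -> float:
    """Hash `total_mb` MiB of zero bytes with SHA-256 and return MiB per second."""
    chunk = bytes(chunk_kb * 1024)
    n_chunks = total_mb * 1024 // chunk_kb
    h = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(n_chunks):
        h.update(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return total_mb / elapsed


# Sanity check against the well-known SHA-256 test vector for b"abc".
assert hashlib.sha256(b"abc").hexdigest().startswith("ba7816bf")
print(f"SHA-256: {sha256_throughput_mb_s():.0f} MiB/s on this machine")
```

Dividing your real hashing volume by this rate gives a ballpark for how much CPU time SHA-256 actually costs you.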