
Intel Itanium Reaches End of the Road with Linux Kernel Stopping Updates

And did that automatically do the SIMD scheduling that was the only thing that could have made Itanium fast?

Yes. The default options of the Intel compiler for IA-64 were set to produce the fastest possible binaries. For example:
...
/O2 optimize for maximum speed (DEFAULT)
...
/Qvec[-] enables(DEFAULT)/disables vectorization
...

OpenMP was the easiest way to enable parallelization (in our code, almost all for-loops have #pragma omp ... directives):
...
/Qopenmp enable the compiler to generate multi-threaded code based on the OpenMP* directives
...
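As a rough illustration of the kind of loop those directives are attached to, here is a minimal sketch. The function name `sum_of_squares` is hypothetical and not from our code base. Built with an OpenMP option such as /Qopenmp, the loop is split across threads; built without it, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Hypothetical helper (an illustration, not code from the thread):
// sums the squares of the input values.
double sum_of_squares(const std::vector<double>& values) {
    double sum = 0.0;
    const long n = static_cast<long>(values.size());

    // Each thread accumulates a private partial sum; the reduction clause
    // combines the partial sums when the loop finishes.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += values[i] * values[i];
    }
    return sum;
}
```

The reduction clause matters here: without it, all threads would update `sum` concurrently, which is a data race.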

Auto-Parallelization was also available (!):
...
/Qparallel enable the auto-parallelizer to generate multi-threaded code for loops that can be safely executed in parallel
...

SIMD-like features for explicit vectorization were also available from fvec.h and dvec.h:
...
const union
{
int i[4];
__m128d m;
} __f64vec2_abs_mask_cheat = {0xffffffff, 0x7fffffff, 0xffffffff, 0x7fffffff};

#define _f64vec2_abs_mask ((F64vec2)__f64vec2_abs_mask_cheat.m)

/* EMM Functionality Intrinsics */

class I8vec16; /* 16 elements, each element a signed or unsigned char data type */
class Is8vec16; /* 16 elements, each element a signed char data type */
class Iu8vec16; /* 16 elements, each element an unsigned char data type */
class I16vec8; /* 8 elements, each element a signed or unsigned short */
class Is16vec8; /* 8 elements, each element a signed short */
class Iu16vec8; /* 8 elements, each element an unsigned short */
class I32vec4; /* 4 elements, each element a signed or unsigned long */
class Is32vec4; /* 4 elements, each element a signed long */
class Iu32vec4; /* 4 elements, each element a unsigned long */
class I64vec2; /* 2 element, each a __m64 data type */
class I128vec1; /* 1 element, a __m128i data type */
...
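To show what the abs mask in that excerpt does, here is a scalar sketch. It assumes IEEE-754 64-bit doubles, and the helper name `abs_via_mask` is hypothetical. Read in little-endian order, each pair of ints {0xffffffff, 0x7fffffff} in the union forms the 64-bit value 0x7fffffffffffffff, so ANDing a lane with it clears only the sign bit.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar helper (an illustration, not part of dvec.h):
// clears the sign bit of one double, which is the per-lane effect of
// ANDing an F64vec2 with the abs mask.
double abs_via_mask(double x) {
    const std::uint64_t mask = 0x7fffffffffffffffULL;  // every bit except the sign bit

    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // reinterpret the double's bytes safely
    bits &= mask;                         // drop the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

The vector version does the same thing to both doubles in the register with a single AND.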
 
Ah, I thought ia64 has been dead for a long time...
 

Yeah, but again, it doesn't introduce SIMD instructions when compiling code that is not using explicit vectorization. So any random program you compile on there is slow unless you put serious work into it.

And automatically threading is not specific to Itanium.
 
It's not just a different architecture, they literally made it closed, so there was no backward compatibility at all. The market was like, yeah, OK bro.
That's literally just a different architecture.

Anyways, an emulation layer existed but sucked IIRC.

Did Linux ever drop support for an entire processor architecture before?
Yep. Alpha comes to mind. PA-RISC as well.
 
Can you imagine what the computing landscape would have looked like if Intel and HP got away with it?
HP as a secondary IA-64 CPU source would not have been competitive, just like its PA-RISC.

Itanium supported a big-endian mode for HP's 68K and PA-RISC Unix product lines. It supports both x86's little-endian and 68K/PA-RISC's big-endian byte orders.

ARMv8 can also run in big-endian mode, and this has been proven in practice by the Amiga-related PiStorm Emu68 (a hypervisor-level 68K-to-ARM JIT translator).
 
Is that important? I mean, is there a significant overhead if a big-endian CPU emulates a little-endian CPU (or vice versa) and has to reorder the bytes?
 
I'm pretty sure that's a big emulation penalty, yes.
 
For HP-UX, a native big-endian mode means one less translation overhead.

Most Linux ARM and macOS ARM builds are little-endian, like x86. x86 can convert a big-endian value to little-endian with the 486-era BSWAP instruction.
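For anyone wondering what that byte reordering looks like in code, here is a minimal portable sketch of a 32-bit swap. The name `byte_swap32` is hypothetical. Optimizing compilers often recognize this shift-and-mask pattern and emit a single BSWAP on x86, but that depends on the compiler and flags.

```cpp
#include <cstdint>

// Hypothetical helper (an illustration): reverses the byte order of a
// 32-bit value, converting between big-endian and little-endian layouts.
std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |  // lowest byte moves to the top
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);   // highest byte moves to the bottom
}
```

An emulator for a CPU with the opposite byte order has to do something like this on every multi-byte memory access, which is where the overhead comes from.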
 