
Intel Officially Sinks the Itanic, Future of IA-64 Architecture Uncertain

It should work OK for the small instruction window that fits into the pipeline at any given moment.
The CPU can really only see the instruction stream within a short window. Any branch multiplies the potential instruction streams beyond the conditional, and it gets worse with multiple conditionals: even if the work is technically the same in both branches, each conditional still forks the stream wherever it occurs, so two conditionals may give up to 4 paths, three up to 8, and so on. This gets really hard with data dependencies, and when the CPU tries to execute things out of order, you quickly need more resources on the die than is realistic.
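A rough sketch of that path explosion (a hypothetical function, purely for illustration): two independent conditionals already give the front end up to four possible paths to keep track of, and every additional conditional doubles it again.
Code:
// Hypothetical sketch: two independent conditionals fork the instruction
// stream into up to four possible paths the CPU may have to speculate
// across; a third conditional would make it up to eight.
int paths(int a, int b, int x)
{
    if (a > 0)      // first fork: 2 possible paths
        x += 1;
    if (b > 0)      // second fork: each path splits again, up to 4 in total
        x *= 2;
    return x;
}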

Also keep in mind that the CPU can't see beyond a memory access until it's dereferenced, and the same goes for any memory access with a data dependency, like:
Code:
variable = array[some_number + some_other_number];
The CPU will try to execute these out of order as early as possible, but that only saves a few clock cycles of idle time. The cost of a cache miss is up to ~400 clocks on Skylake, and a misprediction costs up to 19 cycles for the flush plus any delay from fetching the new instructions, which can even mean an instruction cache miss if it's a long jump! The instruction window for Skylake is 224 entries, and I believe it can decode 6 instructions or so per cycle, so it doesn't take a lot before it's virtually "walking blindly".

And as you can see, even a single random memory access can't be discovered early enough to prefetch it, and often there are multiple data dependencies in a chain, leaving the CPU stalled most of the time. The only memory accesses it can do ahead of time without a stall are linear ones, where it guesses beyond the instruction window (see the sketch further down). Something as simple as a function call or a pointer dereference will in most cases cause a cache miss. The same goes for innocent-looking conditionals, like:
Code:
if (a && b && (c > 2)) {}
Put something like this inside a loop and you'll kill performance very quickly. Even worse are function calls through OOP inheritance (virtual dispatch); while it might suit your coding preferences, doing it in a critical part of the code can easily make a performance difference of >100×.
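To make the linear vs. data-dependent point concrete, here is a minimal sketch (hypothetical functions, assuming a working set large enough to fall out of cache): the array sum has predictable addresses the prefetcher can run ahead of, while the list walk can't even issue the next load until the previous one has completed.
Code:
#include <cstddef>

struct Node {
    long  value;
    Node* next;   // the address of the next load is only known after this load completes
};

// Linear access: addresses are predictable, so the hardware prefetcher can
// run far ahead of the instruction window and hide most of the latency.
long sum_array(const long* data, std::size_t n)
{
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}

// Data-dependent access: every iteration dereferences a pointer produced by
// the previous load, so nothing can be prefetched and each miss is paid in full.
long sum_list(const Node* head)
{
    long sum = 0;
    for (const Node* p = head; p != nullptr; p = p->next)
        sum += p->value;
    return sum;
}

Virtual calls through a base-class pointer add the same kind of data-dependent load (the vtable pointer has to be fetched before the branch target is even known), which is part of why they hurt so much in hot loops.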

But I'm just speculating (see what I did there?).
;)
 
Only benchmark I could find:
[attached benchmark chart]

Xeon: ~20.15
Itanium: ~8.82
Opteron: ~11.52
Itanium: ~13.10

That's per core. Itanium 2 is nothing to scoff at.

8-Core Itanium Poulson: 3.1 billion transistors
8-Core Xeon Nehalem-EX: 2.3 billion transistors

Interesting article about Poulson (newest Itanium architecture): https://www.realworldtech.com/poulson/

Itanium had 20% of the TOP500 supercomputers back in 2004. IA-64 gained traction because 32-bit x86 lacked address space; x86-64 reversed that trend thanks to backwards compatibility and not having to find Itanium software developers.

12 instructions per clock, 8 cores, and 16 threads at the end of 2012. It was a monster.

Pretty sad in real apps from what I recall. Also, rendered obsolete by Intel themselves with Nehalem the next year.
 
HP paid to keep it going for at least 18 years. It had its uses. Latest Itanium 2 processors actually came out in 2017.
 
Actual development of Itanium was discontinued shortly after the launch of Itanium 2. Intel did have long-term commitments though, so they kept tweaking it a bit for some time.
 
That would be wrong (see link above). Itanium 2 debuted in 2002 on 180nm. Poulson (released in 2012) was a huge makeover for the architecture on the 32nm node. Kittson (released in 2017) was supposed to be a 22nm node shrink of Poulson but, for reasons unknown, 22nm was abandoned and Kittson was produced on a matured 32nm node.
 