
Intel Officially Sinks the Itanic, Future of IA-64 Architecture Uncertain

Joined
Jun 10, 2014
Messages
1,823 (0.91/day)
It should work OK for the small instruction window that fits into the pipeline at any given moment.
The CPU can really only see the instruction stream within a short window. Any branching will of course multiply the potential instruction streams beyond the conditional, and it gets worse with multiple conditionals: even if the code is technically the same in both branches, every conditional still creates new paths wherever it occurs, so two conditionals may give up to 4 paths, three up to 8, and so on. This gets really hard once there are data dependencies and the CPU tries to execute things out-of-order; you would quickly need more resources on die than is realistic.

Also keep in mind that the CPU can't see beyond a memory access until it's dereferenced, and the same with any memory access with a data dependency, like
Code:
variable = array[some_number + some_other_number];
The CPU will try to execute these out-of-order as early as possible, but that will only save a few clock cycles of idle time. The cost of a cache miss is up to ~400 clock cycles on Skylake, and a branch misprediction costs up to 19 cycles for the pipeline flush, plus any delay from fetching the new instructions, which can even be an instruction cache miss if it's a long jump! The instruction window on Skylake is 224 entries, and I believe it can decode 6 instructions or so per cycle, so it doesn't take long before it's virtually "walking blindly".

And as you can see, even a single random memory access can't be discovered in time to prefetch it, and there are often multiple data dependencies in a chain, leaving the CPU stalled most of the time. The only memory accesses it can fetch ahead of time without a stall are linear accesses, where the hardware prefetcher guesses beyond the instruction window. Something as simple as a function call or a pointer dereference will in most cases cause a cache miss. The same goes for innocent-looking conditionals, like:
Code:
if (a && b && (c > 2)) {}
Put something like this inside a loop and you'll kill performance very quickly. Even worse are virtual function calls through inheritance in OOP; while they might suit your coding desires, using them in a critical part of the code can easily make a performance difference of >100×.

But I'm just speculating (see what I did there?).
;)
 
Joined
Sep 15, 2007
Messages
3,714 (0.83/day)
Location
Police/Nanny State of America
System Name More hardware than I use :|
Processor 4.7 8350 - 4.2 4560K - 4.4 4690K
Motherboard Sabertooth R2.0 - Gigabyte Z87X-UD4H-CF - AsRock Z97M KIller
Cooling Mugen 2 rev B push/pull - Hyper 212+ push/pull - Hyper 212+
Memory 16GB Gskill - 8GB Gskill - 16GB Ballistix 1.35v
Video Card(s) Xfire OCed 7950s - Powercolor 290x - Oced Zotac 980Ti AMP! (also have two 7870s)
Storage Crucial 250GB SSD, Kingston 3K 120GB, Sammy 1TB, various WDs, 13TB (actual capacity) NAS with WDs
Display(s) X-star 27" 1440 - Auria 27" 1440 - BenQ 24" 1080 - Acer 23" 1080
Case Lian Li open bench - Fractal Design ARC - Thermaltake Cube (still have HAF 932 and more ARCs)
Audio Device(s) Titanium HD - Onkyo HT-RC360 Receiver - BIC America custom 5.1 set up (and extra Klipsch sub)
Power Supply Corsair 850W V2 - EVGA 1000 G2 - Seasonic 500 and 600W units (dead 750W needs RMA lol)
Mouse Logitech G5 - Sentey Revolution Pro - Sentey Lumenata Pro - multiple wireless logitechs
Keyboard Logitech G11s - Thermaltake Challenger
Software I wish I could kill myself instead of using windows (OSX can suck it too).
Only benchmark I could find:

Xeon 20.1474609375
Itanium 8.8193359375
Opteron 11.516927083333333333333333333333
Itanium 13.1015625

That's per core. Itanium 2 is nothing to scoff at.

8-Core Itanium Poulson: 3.1 billion transistors
8-Core Xeon Nehalem-EX: 2.3 billion transistors

Interesting article about Poulson (newest Itanium architecture): https://www.realworldtech.com/poulson/

Itanium had 20% of the TOP500 supercomputers back in 2004. IA-64 gained traction because 32-bit x86 lacked memory address space. x86-64 reversed that pattern because of backwards compatibility and not having to find Itanium software developers.

12 instructions per clock, 8 cores, and 16 threads at the end of 2012. It was a monster.
Pretty sad in real apps, from what I recall. It was also rendered obsolete by Intel themselves with Nehalem the next year.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
25,565 (6.27/day)
Location
IA, USA
System Name BY-2015
Processor Intel Core i7-6700K (4 x 4.00 GHz) w/ HT and Turbo on
Motherboard MSI Z170A GAMING M7
Cooling Scythe Kotetsu
Memory 2 x Kingston HyperX DDR4-2133 8 GiB
Video Card(s) Sapphire Radeon RX 5500 XT Pulse 8 GiB
Storage Crucial MX300 275 GB, Seagate Exos X12 TB 7200 RPM
Display(s) Samsung SyncMaster T240 24" LCD (1920x1200 HDMI) + Samsung SyncMaster 906BW 19" LCD (1440x900 VGA)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse SteelSeries Sensei RAW
Keyboard Tesoro Excalibur
Software Windows 10 Pro 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
HP paid to keep it going for at least 18 years. It had its uses. The latest Itanium 2 processors actually came out in 2017.
 
Joined
Jun 10, 2014
Messages
1,823 (0.91/day)
Actual development of Itanium was discontinued shortly after the launch of Itanium 2. Intel did have long-term commitments though, so they kept tweaking it a bit for some time.
 

FordGT90Concept

That would be wrong (see link above). Itanium 2 debuted in 2002 on 180nm. Poulson (released in 2012) was a huge makeover for the architecture on the 32nm node. Kittson (released in 2017) was supposed to be a 22nm node shrink of Poulson but, for reasons unknown, 22nm was abandoned and Kittson was produced on a matured 32nm node.
 