
Apple to End the x86 Mac Era in 2020

Joined
Feb 18, 2005
Messages
5,238 (0.75/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Logitech G613
Software Windows 10 Professional x64
This will be interesting to watch.

Everyone claimed Apple going for their own CPU design would end in failure. Today, Apple's A series are the best ARM processors bar none; they flat-out embarrass the best Qualcomm can offer.

Then Apple went to design their own GPU, and everyone claimed that would end in failure too. The iPhone 8 is supposed to use that chip, and it promises to be quite advanced even compared to previous A-series chips.

So if Apple says they are going to move from x86 to their own chips, and they pull off the GPU, I'd place bets on them pulling it off. Again.

I have to agree. While ARM remains, quite frankly, s**t, if there's any company with the resources to make it competitive with x86, it's Apple. And they wouldn't be making this announcement unless they've already made significant progress in that regard.

... alternatively, they might have realised that it's not possible and are just trying to squeeze Intel into giving them a better deal on the next few generations of chips. We'll see.

Even if we do assume devs move over, it will not happen instantly. How long are creatives willing to wait while devs migrate their code? Given the complexity of modern Adobe software, I don't expect it would be quick. Sure, you can emulate x86, but the performance is terrible.

Apple's business model is selling overpriced tat to morons, and that extends to the software for that platform. If you write software for Mac, you can charge a lot more for it than you could if you wrote it for Windows. So developers will pay out of their own pockets to port their code from x86 to ARM for Apple, because they will make more money in the long run.

And for Apple, the beauty is that they don't have to care about any of this. When the people outside your walled garden are shouting to get in because the garden is made of solid gold, you can afford to sit back and let survival of the fittest (or deepest pockets) win.
 
Joined
Aug 20, 2007
Messages
20,789 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64
x86 is awesome because it's CISC.

You missed the part where x86 translates instructions to RISC in the micro-ops.

Instruction sets are overrated and mean little... All the big names are more than mature enough now. The microarchitecture behind them matters more.
 
Joined
Jan 16, 2008
Messages
1,349 (0.23/day)
Location
Milwaukee, Wisconsin, USA
Processor i7-3770K
Motherboard Biostar Hi-Fi Z77
Cooling Swiftech H20 (w/Custom External Rad Enclosure)
Memory 16GB DDR3-2400Mhz
Video Card(s) Alienware GTX 1070
Storage 1TB Samsung 850 EVO
Display(s) 32" LG 1440p
Case Cooler Master 690 (w/Mods)
Audio Device(s) Creative X-Fi Titanium
Power Supply Corsair 750-TX
Mouse Logitech G5
Keyboard G. Skill Mechanical
Software Windows 10 (X64)
I have to agree. While ARM remains, quite frankly, s**t, if there's any company with the resources to make it competitive with x86, it's Apple. And they wouldn't be making this announcement unless they've already made significant progress in that regard.

... alternatively, they might have realised that it's not possible and are just trying to squeeze Intel into giving them a better deal on the next few generations of chips. We'll see.



Apple's business model is selling overpriced tat to morons, and that extends to the software for that platform. If you write software for Mac, you can charge a lot more for it than you could if you wrote it for Windows. So developers will pay out of their own pockets to port their code from x86 to ARM for Apple, because they will make more money in the long run.

And for Apple, the beauty is that they don't have to care about any of this. When the people outside your walled garden are shouting to get in because the garden is made of solid gold, you can afford to sit back and let survival of the fittest (or deepest pockets) win.

At least those "morons" know how to spell and proofread.
 
Joined
Mar 6, 2017
Messages
3,211 (1.23/day)
Location
North East Ohio, USA
System Name My Ryzen 7 7700X Super Computer
Processor AMD Ryzen 7 7700X
Motherboard Gigabyte B650 Aorus Elite AX
Cooling DeepCool AK620 with Arctic Silver 5
Memory 2x16GB G.Skill Trident Z5 NEO DDR5 EXPO (CL30)
Video Card(s) XFX AMD Radeon RX 7900 GRE
Storage Samsung 980 EVO 1 TB NVMe SSD (System Drive), Samsung 970 EVO 500 GB NVMe SSD (Game Drive)
Display(s) Acer Nitro XV272U (DisplayPort) and Acer Nitro XV270U (DisplayPort)
Case Lian Li LANCOOL II MESH C
Audio Device(s) On-Board Sound / Sony WH-XB910N Bluetooth Headphones
Power Supply MSI A850GF
Mouse Logitech M705
Keyboard Steelseries
Software Windows 11 Pro 64-bit
Benchmark Scores https://valid.x86.fr/liwjs3
Even if we do assume devs move over, it will not happen instantly. How long are creatives willing to wait while devs migrate their code? Given the complexity of modern Adobe software, I don't expect it would be quick. Sure, you can emulate x86, but the performance is terrible.
Correct me if I'm wrong (because I probably am), but unless Adobe used low-level assembly code to do some of the hard-core image and video manipulation magic that their tools do, wouldn't it be just as simple as recompiling the C/C++ code against a new ISA (Instruction Set Architecture)? I mean, that is the reason why we have compilers, right? So that we can write in higher-level languages and be able to port them over to other architectures, right?
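For instance, something like this (a purely hypothetical sketch, nothing to do with Adobe's real code) would, as far as I understand it, build for either architecture just by switching toolchains:

/* blur.c - hypothetical example: plain C, no assembly, no intrinsics. */
#include <stddef.h>
#include <stdint.h>

/* Average each pixel with its horizontal neighbours (3-tap box blur). */
void box_blur_row(const uint8_t *src, uint8_t *dst, size_t width)
{
    for (size_t x = 1; x + 1 < width; x++)
        dst[x] = (uint8_t)((src[x - 1] + src[x] + src[x + 1]) / 3);
}

/* Nothing above depends on the ISA, so the same file builds for either
 * target just by switching compilers, e.g.:
 *   gcc -O2 -c blur.c                    (x86-64 host build)
 *   aarch64-linux-gnu-gcc -O2 -c blur.c  (64-bit ARM cross build)
 */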
 
Joined
Jan 8, 2017
Messages
8,944 (3.35/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
You missed the part where x86 translates instructions to RISC in the micro-ops.

Instruction sets are overrated and mean little... All the big names are more than mature enough now. The microarchitecture behind them matters more.

Doesn't really matter; it's still CISC at a higher level of abstraction, and those micro-ops still constitute more robust instructions. For example, ARM needs separate instructions for memory operations, whereas with x86 pretty much all instructions allow for complex addressing modes encoded within the instruction itself. Memory operations are slow, which means you would have to find ways to keep an ARM core busy more often than an x86 one, and inevitably you won't achieve the same efficiency.

ARM designs will always have inherent disadvantages which just can't be mitigated due to their RISC nature; instruction sets do matter, quite a lot.

That being said, either Apple will be shooting themselves in the foot attempting to become independent in a way which simply isn't fit for their current product stack, or they will just change said products, aka turning them into glorified iOS devices.

I mean, that is the reason why we have compilers right? So that we can write in higher level languages and be able to port it over to other architectures, right?

And with a potential overhead, which can be significant in some cases. He is right; x86 software on ARM will be atrocious.
 
Joined
Mar 6, 2017
Messages
3,211 (1.23/day)
Location
North East Ohio, USA
System Name My Ryzen 7 7700X Super Computer
Processor AMD Ryzen 7 7700X
Motherboard Gigabyte B650 Aorus Elite AX
Cooling DeepCool AK620 with Arctic Silver 5
Memory 2x16GB G.Skill Trident Z5 NEO DDR5 EXPO (CL30)
Video Card(s) XFX AMD Radeon RX 7900 GRE
Storage Samsung 980 EVO 1 TB NVMe SSD (System Drive), Samsung 970 EVO 500 GB NVMe SSD (Game Drive)
Display(s) Acer Nitro XV272U (DisplayPort) and Acer Nitro XV270U (DisplayPort)
Case Lian Li LANCOOL II MESH C
Audio Device(s) On-Board Sound / Sony WH-XB910N Bluetooth Headphones
Power Supply MSI A850GF
Mouse Logitech M705
Keyboard Steelseries
Software Windows 11 Pro 64-bit
Benchmark Scores https://valid.x86.fr/liwjs3
turning them into glorified iOS devices as laptops/desktops
That's what I see will eventually happen, not just to the Mac but to all general purpose computing. If you ask most industry pundits and analysts who are far smarter than I am they will tell you that the desktop as we know it today will be dead within the next ten years. The majority of us will be using mobile devices with walled gardens that can become a desktop using cradle-like accessories.
 
Joined
Aug 20, 2007
Messages
20,789 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64
Doesn't really matter; it's still CISC at a higher level of abstraction, and those micro-ops still constitute more robust instructions. For example, ARM needs separate instructions for memory operations, whereas with x86 pretty much all instructions allow for complex addressing modes encoded within the instruction itself. Memory operations are slow, which means you would have to find ways to keep an ARM core busy more often than an x86 one, and inevitably you won't achieve the same efficiency.

ARM designs will always have inherent disadvantages which just can't be mitigated due to their RISC nature; instruction sets do matter, quite a lot.

I think I know a bit more about this than you are giving me credit for (I've actually written assembly-level code for several platforms, all the way back to my NES, which in fact lacked a multiplication instruction).

What you just said made no sense. If it's being translated to RISC, how in the world can a CISC instruction run as anything but RISC at final runtime? If it was going to be busy during a memory access, it will be busy. As in, it's all the same at the end game; if anything, it's just easier on the compiler to "think" in CISC.

Illustration: I wrote a multiplication macro for my NES using a very light derivative of a homebrew BASIC someone made for it way back when. It multiplied using the old-school "additive method": adding the first number over and over, the number of times given by the second. It could be called in one line, but it still tied up the CPU for a godawful length of time. Conceptually, this was a "CISC" instruction of sorts, but the RISC back end was holding it up.
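In C terms, it boiled down to something like this (a rough sketch from memory, not the original macro):

#include <stdint.h>

/* Multiply by repeated addition -- what a software multiply on a
 * multiply-less CPU like the NES's 6502 can boil down to. It costs one
 * addition per unit of 'b', which is why it tied up the CPU for so long. */
uint16_t mul_additive(uint8_t a, uint8_t b)
{
    uint16_t sum = 0;
    while (b--)
        sum += a;   /* add 'a' to the running total, 'b' times */
    return sum;
}

(Real 6502 multiply routines typically use shift-and-add instead, which needs at most eight iterations for 8-bit operands, but the repeated-addition version matches what the macro did.)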
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
*cough* https://en.wikipedia.org/wiki/X86_instruction_listings

RISC is silicon efficient; CISC is process efficient.

Case in point, ARM has no instructions dedicated to virtual machines. I'm pretty sure that Windows 10 natively runs in a virtual machine on systems that support it for security reasons (you can't disable it).
 
Joined
Jan 8, 2017
Messages
8,944 (3.35/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
What you just said made no sense.

All I said is that arithmetic instructions which also perform memory accesses allow for better efficiency. You are insinuating, as far as I can tell, that they do not, and that nowadays x86 is pretty much interchangeable with ARM because they both use RISC-like micro-ops.

If it was going to be busy during a memory access, it will be busy.

Yes, but you need to fetch more instructions in order to achieve the same thing.

Conceptually, this was a "CISC" instruction of sorts, but the RISC back end was holding it up.

Of course that would happen; you are emulating CISC behavior on hardware that was not made for it. x86 cores, despite now being RISC-like under the hood, are still designed with the complex instructions in mind and with accompanying microcode optimizations, whereas that NES clearly was not.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Nothing RISC about x86. Compare the above link with ARM:
http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/DUI0068.pdf

When processing SIMD, some x86 instructions hijack the FPUs and ALUs. Sure, ALUs and FPUs only understand a reduced set of instructions, but it's the instruction decoder at the top of the processor that determines RISC/CISC, not the components inside.
 
Joined
Apr 19, 2013
Messages
296 (0.07/day)
System Name Darkside
Processor R7 3700X
Motherboard Aorus Elite X570
Cooling Deepcool Gammaxx l240
Memory Thermaltake Toughram DDR4 3600MHz CL18
Video Card(s) Gigabyte RX Vega 64 Gaming OC
Storage ADATA & WD 500GB NVME PCIe 3.0, many WD Black 1-3TB HD
Display(s) Samsung C27JG5x
Case Thermaltake Level 20 XL
Audio Device(s) iFi xDSD / micro iTube2 / micro iCAN SE
Power Supply EVGA 750W G2
Mouse Corsair M65
Keyboard Corsair K70 LUX RGB
Benchmark Scores Not sure, don't care
Bwahahahaha... Nothing but conjecture! Every few months this story pops up, and still NOTHING official from Apple nor its developers. It makes zero sense for Apple to ditch the x86 architecture, simply because the whole reason Apple switched to Intel in the first place was 100% compatibility with Windows software (all while making it easier on developers). I will bet the house that come 2020 yet another "Apple to ditch Intel" story pops up (TPU will jump on the bandwagon for clicks) stating that by 2024 Apple might use their own CPUs in a limited capacity for the notebook line.
 
Joined
Aug 20, 2007
Messages
20,789 (3.41/day)
System Name Pioneer
Processor Ryzen R9 7950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage 2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64
You know, of all the instruction sets I've played with, x86 ironically is not one of them. My assembly knowledge may be slightly out of play here, so I'll admit I could be out of my element and completely wrong. I will defer to those more in the know, as my knowledge is only second-hand and conceptual.
 
Joined
Sep 14, 2017
Messages
610 (0.25/day)
Never owned anything Apple and probably never will, but still, this is exciting, just for the fact that it'll shake up the industry a bit.
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Correct me if I'm wrong (because I probably am), but unless Adobe used low-level assembly code to do some of the hard-core image and video manipulation magic that their tools do, wouldn't it be just as simple as recompiling the C/C++ code against a new ISA (Instruction Set Architecture)? I mean, that is the reason why we have compilers, right? So that we can write in higher-level languages and be able to port them over to other architectures, right?
In theory, normal programs can be recompiled like you say. Inline assembly is not that common, but nearly all high-performance programs rely on intrinsics for AVX, SSE, FMA, etc. These intrinsics are low-level macros which map directly to assembly. If these features are implemented as optional, the developer can of course just disable them and recompile, but in cases such as Adobe's programs the performance will be terrible. Rewriting the program to use different intrinsics for a new architecture requires some effort, but is not extremely hard.
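To make the "maps directly to assembly" point concrete, here is a minimal sketch (the function and structure are mine, but the intrinsics are real ones): the same 4-wide float addition written once with SSE intrinsics and once with their NEON counterparts. Neither version compiles for the other ISA; porting means rewriting these lines.

#ifdef __SSE__
#include <xmmintrin.h>
/* x86: each intrinsic corresponds to one SSE instruction. */
void add4(const float *a, const float *b, float *c)
{
    __m128 va = _mm_loadu_ps(a);          /* movups */
    __m128 vb = _mm_loadu_ps(b);          /* movups */
    _mm_storeu_ps(c, _mm_add_ps(va, vb)); /* addps + movups */
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>
/* ARM: the NEON equivalents, with different types and names. */
void add4(const float *a, const float *b, float *c)
{
    float32x4_t va = vld1q_f32(a);        /* ld1 */
    float32x4_t vb = vld1q_f32(b);        /* ld1 */
    vst1q_f32(c, vaddq_f32(va, vb));      /* fadd + st1 */
}
#endif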

Nothing RISC about x86. Compare the above link with ARM:

http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/DUI0068.pdf
You are still confusing the ISA (Instruction Set Architecture) with the CPU's microarchitecture. x86 in its pure form is CISC, but all current implementations of x86 are RISC implementations which translate x86 into RISC-style micro-operations.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
You are still confusing the ISA (Instruction Set Architecture) with the CPU's microarchitecture. x86 in its pure form is CISC, but all current implementations of x86 are RISC implementations which translate x86 into RISC-style micro-operations.
Fundamentally, what separates CISC and RISC is that RISC must load the data into registers, execute an instruction on the registers, and store the result back to memory (load-store). A CISC instruction can address either a register or a memory address, and the processor subsystems will pull the necessary data from memory, execute the instruction, and push the result back to memory. Said differently, CISC memory operations are implicit, whereas in RISC they are explicit.
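A minimal sketch of that difference; the assembly in the comments is typical compiler output for each ISA, not an exact listing:

/* One C statement, two very different instruction sequences. */
void add_in_place(int *x, int y)
{
    *x += y;
    /* x86-64 (CISC): the memory access is implicit in the ALU op:
     *     add dword ptr [rdi], esi
     *
     * AArch64 (RISC, load-store): memory accesses are explicit:
     *     ldr w8, [x0]
     *     add w8, w8, w1
     *     str w8, [x0]
     */
}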
 
Joined
Mar 6, 2017
Messages
3,211 (1.23/day)
Location
North East Ohio, USA
System Name My Ryzen 7 7700X Super Computer
Processor AMD Ryzen 7 7700X
Motherboard Gigabyte B650 Aorus Elite AX
Cooling DeepCool AK620 with Arctic Silver 5
Memory 2x16GB G.Skill Trident Z5 NEO DDR5 EXPO (CL30)
Video Card(s) XFX AMD Radeon RX 7900 GRE
Storage Samsung 980 EVO 1 TB NVMe SSD (System Drive), Samsung 970 EVO 500 GB NVMe SSD (Game Drive)
Display(s) Acer Nitro XV272U (DisplayPort) and Acer Nitro XV270U (DisplayPort)
Case Lian Li LANCOOL II MESH C
Audio Device(s) On-Board Sound / Sony WH-XB910N Bluetooth Headphones
Power Supply MSI A850GF
Mouse Logitech M705
Keyboard Steelseries
Software Windows 11 Pro 64-bit
Benchmark Scores https://valid.x86.fr/liwjs3
Rewriting the program to use different intrinsics for a new architecture requires some effort, but is not extremely hard.
Wouldn't that be up to the compiler, though? The compiler does the heavy lifting when it comes to optimizing the resulting machine code; humans write the C/C++ code and the compiler does the hard work. One would think that with the giant leaps ARM has taken over the last couple of years, extensions like AVX, SSE, FMA, etc. are in the pipeline for ARM; it's just a matter of time until they reach the public. It would then be up to the compilers to take advantage of those new ARM extensions.
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Rewriting the program to use different intrinsics for a new architecture requires some effort, but is not extremely hard.
Wouldn't that be up to the compiler, though? The compiler does the heavy lifting when it comes to optimizing the resulting machine code; humans write the C/C++ code and the compiler does the hard work. One would think that with the giant leaps ARM has taken over the last couple of years, extensions like AVX, SSE, FMA, etc. are in the pipeline for ARM; it's just a matter of time until they reach the public. It would then be up to the compilers to take advantage of those new ARM extensions.
I think you are misunderstanding how these intrinsics work. Compilers can introduce optimizations themselves at compile time, and this will work fine, but that's not what I'm talking about here.

Most intrinsics (that I'm familiar with, at least) are closely or directly mapped to assembly instructions. If the specific ARM implementation has a comparable extension with matching parameters, then surely the compiler could convert them (in theory), but extensions like AVX are closely linked to how AVX is implemented on x86 designs; an automatic translation to another vector extension could result in sub-optimal use, or even a performance loss versus normal instructions.

It's important to understand that intrinsics are usually only used in the most performance-critical parts of a program's code. When used properly, the alignment of data in memory is meticulously designed to scale well with those specific intrinsics. Switching to another set of intrinsics may require realigning data structures and reworking code logic to get maximum performance. Vector extensions are especially sensitive, and using them well or not can easily make a >10× difference in performance.
 
Joined
Mar 6, 2017
Messages
3,211 (1.23/day)
Location
North East Ohio, USA
System Name My Ryzen 7 7700X Super Computer
Processor AMD Ryzen 7 7700X
Motherboard Gigabyte B650 Aorus Elite AX
Cooling DeepCool AK620 with Arctic Silver 5
Memory 2x16GB G.Skill Trident Z5 NEO DDR5 EXPO (CL30)
Video Card(s) XFX AMD Radeon RX 7900 GRE
Storage Samsung 980 EVO 1 TB NVMe SSD (System Drive), Samsung 970 EVO 500 GB NVMe SSD (Game Drive)
Display(s) Acer Nitro XV272U (DisplayPort) and Acer Nitro XV270U (DisplayPort)
Case Lian Li LANCOOL II MESH C
Audio Device(s) On-Board Sound / Sony WH-XB910N Bluetooth Headphones
Power Supply MSI A850GF
Mouse Logitech M705
Keyboard Steelseries
Software Windows 11 Pro 64-bit
Benchmark Scores https://valid.x86.fr/liwjs3
I think you are misunderstanding how these intrinsics work.
And I probably don't understand it at all; I've not written any C/C++ code. My experience is in much higher-level languages like C# and VB.NET. I always figured that when you write C/C++ code to do something, the compiler ultimately decides how that job is done in the machine code that's generated. Multiple optimization passes will of course result in much more optimized machine code, but of course that takes more time to compile.
 
Joined
Jan 8, 2017
Messages
8,944 (3.35/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Wouldn't that be up to the compiler, though? The compiler does the heavy lifting when it comes to optimizing the resulting machine code; humans write the C/C++ code and the compiler does the hard work. One would think that with the giant leaps ARM has taken over the last couple of years, extensions like AVX, SSE, FMA, etc. are in the pipeline for ARM; it's just a matter of time until they reach the public. It would then be up to the compilers to take advantage of those new ARM extensions.

Most compilers do a pretty bad job at vectorization because of how convoluted the code that translates to data-level parallelism can be. That was a rather unfortunate comparison; an x86 compiler won't vectorize most workloads, and neither will one that also translates x86 into ARM. And intrinsics are sparingly used anyway.
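A classic example of why (a hypothetical sketch): if the compiler can't prove the output doesn't overlap the inputs, it has to assume every store may feed the next load, and the loop often won't be vectorized without help:

#include <stddef.h>

/* 'c' might alias 'a' or 'b', so the compiler must be conservative;
 * at best it emits a runtime overlap check before taking any SIMD path. */
void add_maybe_aliased(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* C99 'restrict' promises no overlap, which is usually enough for
 * gcc/clang at -O2/-O3 to vectorize this loop. */
void add_no_alias(const float *restrict a, const float *restrict b,
                  float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}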
 
Joined
Mar 6, 2017
Messages
3,211 (1.23/day)
Location
North East Ohio, USA
System Name My Ryzen 7 7700X Super Computer
Processor AMD Ryzen 7 7700X
Motherboard Gigabyte B650 Aorus Elite AX
Cooling DeepCool AK620 with Arctic Silver 5
Memory 2x16GB G.Skill Trident Z5 NEO DDR5 EXPO (CL30)
Video Card(s) XFX AMD Radeon RX 7900 GRE
Storage Samsung 980 EVO 1 TB NVMe SSD (System Drive), Samsung 970 EVO 500 GB NVMe SSD (Game Drive)
Display(s) Acer Nitro XV272U (DisplayPort) and Acer Nitro XV270U (DisplayPort)
Case Lian Li LANCOOL II MESH C
Audio Device(s) On-Board Sound / Sony WH-XB910N Bluetooth Headphones
Power Supply MSI A850GF
Mouse Logitech M705
Keyboard Steelseries
Software Windows 11 Pro 64-bit
Benchmark Scores https://valid.x86.fr/liwjs3
Most compilers do a pretty bad job at vectorization because of how convoluted the code that translates to data-level parallelism can be.
SISO, which translates to "shit in, shit out". Yes, badly written C code is going to result in a badly compiled program (duh!). No amount of optimization at the compiler level is going to turn lead (badly written code) into gold. It's up to the human writing the C code to write better code; it always comes down to this very simple thing.
 
Joined
Jun 10, 2014
Messages
2,902 (0.80/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
And I probably don't understand it at all; I've not written any C/C++ code. My experience is in much higher-level languages like C# and VB.NET. I always figured that when you write C/C++ code to do something, the compiler ultimately decides how that job is done in the machine code that's generated. Multiple optimization passes will of course result in much more optimized machine code, but of course that takes more time to compile.
It's important to understand that the realm of automatic optimizations compilers can actually do is very limited. They do of course help a bit, and in some cases give a nice boost, but will never compare to writing proper low-level code.

Compilers are, for instance, very good at optimizing small things that come down to syntax, like unrolling small loops, rearranging some accesses, etc. But they can never deal with the "big stuff", like scaling problems resulting from your design choices.

If you want to hear some good explanations about how efficient code works, take a look at these:
CppCon 2014: Mike Acton "Data-Oriented Design and C++"
code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care
Even if you don't grasp all the details, it should still be an eye-opener as to how much the structure of the code matters.

It's up to the human writing the C code to write better code; it always comes down to this very simple thing.
Yes, it always comes down to the skills of the coder and the understanding of the problem to be solved.

As Vya Domus mentions, vector instructions exploit data-level parallelism. No compiler can ever optimize your code to create this parallelism; you have to make tightly packed data structures which match the way you are going to process them.

Let's say you have 100 calculations in the form A + B = C. Usually each one will be compiled to two instructions fetching A and B into registers, one instruction to do the addition, and then one instruction to copy the sum back to memory. If you want to exploit AVX, you'll first have to align your data structures,
not like this: A0 B0 C0 A1 B1 C1 …
But like this:
A0 A1 A2 A3 A4 …
B0 B1 B2 B3 B4 …
C0 C1 C2 C3 C4 …
If you are using AVX2 on 32-bit floats, you can compute 8 additions per cycle. But you can't do this if your data is fragmented, which it might be in a typical OOP structure with data scattered across hundreds of objects.
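In code, the two layouts and the AVX loop look roughly like this (a sketch with made-up names; it assumes the arrays hold 32-bit floats and n is a multiple of 8):

#include <immintrin.h> /* AVX intrinsics */
#include <stddef.h>

/* Fragmented layout: A, B, C interleaved per object (A0 B0 C0 A1 ...). */
struct abc { float a, b, c; };

/* SIMD-friendly layout: each stream contiguous (A0 A1 A2 ..., B0 B1 ...). */
struct abc_soa { float *a, *b, *c; };

/* 8 float additions per 256-bit AVX instruction over the contiguous layout. */
void add_all(struct abc_soa *t, size_t n)
{
    for (size_t i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(&t->a[i]);
        __m256 vb = _mm256_loadu_ps(&t->b[i]);
        _mm256_storeu_ps(&t->c[i], _mm256_add_ps(va, vb));
    }
}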

Applications using intrinsics may only use them in a few functions (typically some "tight" loops), but the data structures might be shared with major parts of the codebase. So the developers usually have to be aware of the constraints even when they are not touching these parts of the code.

I don't know what Vya Domus means by intrinsics being sparingly used. They are used in many applications that matter for productivity, like Adobe programs, (3D) modelers, simulators, encoders, etc., and in essential libraries for compression and the like. They're rarely used in games, and even if used, "never" impact rendering performance. But as I mentioned, even when they are used, it's usually just a small percentage of the code.

To get back on topic: many performance-critical applications can't simply be recompiled for another architecture and maintain acceptable performance without optimization work.
 
Joined
Feb 18, 2005
Messages
5,238 (0.75/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Logitech G613
Software Windows 10 Professional x64

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.63/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.