
Intel's IPO Program Supercharges Underperforming "Arrow Lake" Chips, but Only in China for Now

I foresee the return of dual, even triple GPU cards in the future, and dual-CPU boards for the mainstream, in order to keep performance increasing. Sadly, power consumption will increase exponentially, and I would not be surprised at all to see 2 kW PSUs being the norm 10 years from now, with system prices similar to what they were in the '70s and '80s. Sadly, this is the reality waiting for us in the next decade....
 
These aren't the '90s anymore. We are already at the limit of physical transistor size, gate size, number of transistors, etc.
Not quite. We might not be shrinking transistors much further, but we can stack many more of them on top of each other. Chips today have only a few layers of transistors. We are still in the early days of stacking, and while I don't expect progress comparable to the '90s, there will be huge advances over the next couple of decades.

I foresee the return of dual, even triple GPU cards in the future, and dual-CPU boards for the mainstream, in order to keep performance increasing. Sadly, power consumption will increase exponentially, and I would not be surprised at all to see 2 kW PSUs being the norm 10 years from now, with system prices similar to what they were in the '70s and '80s. Sadly, this is the reality waiting for us in the next decade....
Considering the Threadrippers, Xeon Ws, and server processors that already exist, we can already throw more cores at a single socket than is practically useful for non-server workloads. (If they wanted to, we could have 128+ cores, 8/12-channel RAM, and 128+ PCIe lanes on "mainstream" right now.)

But with interactive (non-batch) workloads there will always be diminishing returns from multithreading, so we need to make cores faster if we want performance scaling to continue. Increasing IPC is obviously the main contributor here, but let's not forget ISA improvements. Increasing core performance also has the advantage of making other overhead relatively smaller (kernel, drivers, libraries, etc.), so unless they keep adding more bloat, it may even help multithreading scale further.

When we get to the point where a single core can consist of multiple stacked "dies" closely connected together, a major design constraint for CPUs is removed, allowing for more significant advances in core design than we've seen in over a decade. Even today, CPU cores are underutilized due to branch mispredictions and cache misses, and even a small percentage improvement in this area will make more of a difference than adding another core. AMD tried something novel with the Zen 5 design, which apparently added prediction for two branches, but I haven't seen any clear improvement as a result. While I'm sure having an alternative branch prefetched helps in some edge cases, it still doesn't solve the penalty of stalling, flushing the pipeline, and re-executing instructions. (Interestingly, Meteor Lake introduced changes that improve recovery after a misprediction, attacking the problem from another angle.)

But even if these issues were improved upon, the bigger problem remains: there is often a lot of branching, and the number of possible paths doubles with each branch instruction, so predicting or even executing branches in parallel is not going to get you very far. Some branching is inevitable, and some of it can't possibly be predicted 100% of the time either, but the vast majority of branches don't affect the overall control flow. (I like to refer to these as "false branching" (my term).) This means that if the ISA were improved, the compiler should be able to remove them. This seems to be what Intel is trying to solve with APX, but time will tell whether they're successful. And if all the unnecessary branching is removed, then perhaps innovations like those in Zen 5 will start to be more effective too.

So, I'm saying all this to illustrate that we are nowhere close to the theoretical limit of performance per core. In some ways I'm more positive than I've been for many years, but in others I'm concerned that the entire field of CS is diverting all of its focus to "AI". And this goes for GPUs too; that "AI" bubble will burst at some point, and some ASIC is going to do the job much more efficiently. But let's not go down that rabbit hole…
 
Not quite. We might not be shrinking transistors much further, but we can stack many more of them on top of each other. Chips today have only a few layers of transistors. We are still in the early days of stacking, and while I don't expect progress comparable to the '90s, there will be huge advances over the next couple of decades.
The more transistors they pack, the more power the chips will consume, the more cooling will be required, and, most importantly, yields will become abysmally low. Right now they are less than 50% (ignore the fake marketing data claiming 60% or more); in the future, they might drop to around 10-20%... So that's that.
 
The more transistors they will pack, the more power, they will consume, the more cooling will be required, but most importantly, the yields will become abysmal low.
Not if newer and more efficient architectures get significantly higher performance per clock, allowing them to run at lower clock speeds while achieving higher performance than before. Imagine cores running at ~1.5-2 GHz but having several times more execution ports/resources than today, plus ISA changes like those I mentioned above, allowing for fewer stalls and much higher throughput.
 