
Intel's IPO Program Supercharges Underperforming "Arrow Lake" Chips, but Only in China for Now

I foresee the return of dual, even triple GPU cards in the future, and dual-CPU boards for the mainstream, in order to keep performance increasing. Sadly, power consumption will increase exponentially, and I would not be surprised at all to see 2 kW PSUs being the norm 10 years from now, with system prices similar to what they were in the '70s and '80s. Sadly, this is the reality waiting for us in the next decade....
 
These aren't the '90s anymore. We are already at the limit of physical transistor size, gate size, number of transistors, etc.
Not quite. We might not be shrinking transistors much further, but we can stack many more of them on top of each other. Chips today have only a few layers of transistors. We are still in the early days of stacking, and while I don't expect progress comparable to the '90s, there will be huge advances over the next couple of decades.

I foresee the return of dual, even triple GPU cards in the future, and dual-CPU boards for the mainstream, in order to keep performance increasing. Sadly, power consumption will increase exponentially, and I would not be surprised at all to see 2 kW PSUs being the norm 10 years from now, with system prices similar to what they were in the '70s and '80s. Sadly, this is the reality waiting for us in the next decade....
Considering the Threadrippers, Xeon Ws, and server processors that already exist, we can already throw more cores at a single socket than is practically useful for non-server workloads. (If they wanted to, we could have 128+ cores, 8/12-channel RAM, and 128+ PCIe lanes on "mainstream" right now.)

But with interactive (non-batch) workloads there will always be diminishing returns from multithreading, so we need to make cores faster if we want performance scaling to continue. Increasing IPC is obviously the main contributor here, but let's not forget ISA improvements. Increasing core performance also has the advantage of making other overhead relatively smaller (kernel, drivers, libraries, etc.), so unless they keep adding more bloat, it may even help multithreading scale further.

When we get to the point where a single core can consist of multiple stacked "dies" closely connected together, a major design constraint for CPUs is removed, allowing for more significant advances in core design than we've seen in over a decade. Even today, CPU cores are underutilized due to branch mispredictions and cache misses, and even a small percentage improvement in this area will make more of a difference than adding another core. AMD tried something novel with the Zen 5 design, which apparently added prediction for two branches, but I haven't seen any clear improvement as a result. While I'm sure having an alternative branch prefetched helps in some edge cases, it still doesn't solve the penalty of stalling, flushing the pipeline, and re-executing instructions. (Interestingly, Meteor Lake introduced changes that improve recovery after a misprediction, attacking the problem from another angle.)

But even if these issues were improved upon, the bigger problem remains: there is often a lot of branching, and the number of possible paths doubles with each branch instruction, so predicting or even executing branches in parallel is not going to get you very far. Some branching is inevitable, and some of it can't possibly be predicted 100% of the time either, but the vast majority of branches don't affect the overall control flow. (I like to refer to these as "false branching" (my term).) This means that if the ISA were improved, the compiler should be able to remove them. This seems to be what Intel is trying to solve with APX, but time will tell whether they're successful. And if all the unnecessary branching is removed, then perhaps innovations like those in Zen 5 will start to be more effective too.

So, I'm saying all this to illustrate that we are nowhere close to the theoretical limit of performance per core. In some ways I'm more positive than I've been for many years, but in others I'm concerned that the entire field of CS is diverting all of its focus to "AI". And this goes for GPUs too; that "AI" bubble will burst at some point, and some ASIC is going to do the job much more efficiently. But let's not go down that rabbit hole…
 
Not quite. We might not be shrinking transistors much further, but we can stack many more of them on top of each other. Chips today have only a few layers of transistors. We are still in the early days of stacking, and while I don't expect progress comparable to the '90s, there will be huge advances over the next couple of decades.
The more transistors they pack, the more power the chips will consume, the more cooling will be required, and, most importantly, yields will become abysmally low. Right now they are less than 50% (ignore the fake marketing data claiming 60% or more); in the future, they might drop to around 10-20%... So that's that.
 
The more transistors they will pack, the more power, they will consume, the more cooling will be required, but most importantly, the yields will become abysmal low.
Not if newer and more efficient architectures get significantly higher performance per clock, allowing them to run at lower clock speeds while achieving higher performance than before. Imagine cores running at ~1.5-2 GHz but having several times more execution ports/resources than today, plus ISA changes like those I mentioned above, allowing for fewer stalls and much higher throughput.
 