
Intel's Alder Lake Processors Could use Foveros 3D Stacking and Feature 16 Cores

A big.little strategy could work for x86, but the devil will be in the details of how quickly the CPU can transition processes between cores when there is such a performance disparity between the little and big cores. Big.little works as a power-saving measure because leakage current scales with transistor count, so very large cores have much higher leakage current than smaller cores. This puts a floor on how low processors can drop their power consumption during idle, and this effect gets worse with smaller process nodes. If the presence of small cores allows the processor to completely power down the larger cores during light usage scenarios, then power consumption during light usage will be lower. But for highly variable loads like gaming, the time it takes to move processes from small to large cores will likely lead to degraded performance and prevent any measurable power saving.
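To put some rough numbers on that trade-off, here's a quick back-of-envelope sketch in Python. Every figure in it (leakage per core, migration cost, burst length) is a made-up placeholder chosen only to show the shape of the argument, not a measurement of any real chip:

```python
# Rough back-of-envelope model of the big.LITTLE trade-off described above.
# All numbers are hypothetical placeholders, not measurements of any real CPU.

BIG_CORE_IDLE_LEAKAGE_W = 0.50     # assumed leakage of one big core left powered at idle
LITTLE_CORE_IDLE_LEAKAGE_W = 0.05  # assumed leakage of one little core
NUM_BIG, NUM_LITTLE = 8, 8

# Idle package power if the big cores stay powered vs. fully gated off
idle_without_gating = NUM_BIG * BIG_CORE_IDLE_LEAKAGE_W + NUM_LITTLE * LITTLE_CORE_IDLE_LEAKAGE_W
idle_with_gating = NUM_LITTLE * LITTLE_CORE_IDLE_LEAKAGE_W   # big cores powered down entirely

print(f"idle power, big cores powered: {idle_without_gating:.2f} W")
print(f"idle power, big cores gated:   {idle_with_gating:.2f} W")

# Migration penalty for a bursty (game-like) load: if a burst of work lasts
# 'burst_ms' and moving the thread to a big core costs 'migration_ms', the
# fraction of each burst lost to the switch grows as the bursts get shorter.
migration_ms = 1.0   # assumed little-to-big migration cost
for burst_ms in (100.0, 10.0, 2.0):
    overhead = migration_ms / (burst_ms + migration_ms)
    print(f"{burst_ms:6.1f} ms burst -> {overhead:5.1%} of the time spent migrating")
```

The point being: gating the big cores can cut idle power dramatically on paper, but once the bursts get short enough, the migration cost starts eating a meaningful slice of every burst.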

A more sophisticated way would be to allow each core to power off certain elements within the core when not required. For example, powering off the FPU when not required, or half the L2 and L3 cache when not needed. But that doesn't allow marketing to scream 'MOAR CORES!', so that option is off the table.
That's exactly what their competition does. As I said before, they gate power per core on or off: Ryzen does this, Intel does this, and Intel also developed race-to-idle so that they can turn cores off sooner.

Unfortunately for Intel, it's just as much an issue of power use under load, and I don't think this will fix that.
 
CPUs have been doing this for over a decade already...

I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing requirements dictate; effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.

That's exactly what their competition does. As I said before, they gate power per core on or off: Ryzen does this, Intel does this, and Intel also developed race-to-idle so that they can turn cores off sooner.

Unfortunately for Intel, it's just as much an issue of power use under load, and I don't think this will fix that.

Agree, it does nothing for power use under load. Like I said, this comes more out of Intel's marketing requirements than any actual improvement in end-user experience. It allows Intel to spruik bullsh*t 'up to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'Moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.little as a power-saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a 9-year-old idea from ARM.
 
ARM's marketing material promises up to a 75% savings in power usage for some activities.[2]



Serious answers are not available, and it's the same sentiment everywhere, right? We really don't know anything other than 'it uses big.little'. We can speculate :)

More cores equals more power used. And from that conclusion... its easy to draw other conclusions. Such as:
1. Windows scheduler and good allocation of workloads will be the key to gaining an advantage over other products
2. Intel's goal must be: faster when its needed (it can turbo high), fall back on little when possible (big cores can cool down and clear TDP budget for a new boost). Any other approach is not feasible, because then they are not competitive against stripped AND full fat performance cores.
3. A further reduction of base clocks on the BIG cores is likely, to clear more TDP headroom for turbo. Or maybe even dial back entirely to idle clock, some 800 MHz, and just have a turbo on top of that. Or maybe fully shut down, but then I'm thinking of latency problems.

So, using the cores at the same time will bring what advantage exactly? I'm not seeing it, do you? For this product to be viable, it needs to be better than either variant of the cores used in it. 8 fast and 8 slow cores are still worse than 16 regular ones at base clock, I reckon...

Interesting stuff indeed :) What I personally think is that Alder Lake is a way to get 10nm dies out that were planned anyway, and still keep competitive product across the whole stack. Forget 'glued together', Intel is going full scrapyard dive. It also confirms yet again that 10nm scales like shit into performance territory.


There are 3 ways of arranging and using BIG.little:
1. Clustered switching - the one you described - either big cores or small cores, never both at the same time;
2. In-kernel switcher - when a big and a small core are coupled into pairs, so with 8 + 8 you would have something like 8 big cores + hyper-threading enabled;
and the third:
3. Heterogeneous multi-processing (global scheduling):

The most powerful use model of big.LITTLE architecture is Heterogeneous Multi-Processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.[10][11]
This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430),[12][13] and Apple mobile application processors starting with the Apple A11.[14]
https://en.wikipedia.org/wiki/ARM_big.LITTLE#cite_note-14
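For what it's worth, here's a toy sketch of what the third model (global scheduling / HMP) boils down to, assuming a hypothetical 8+8 part and a per-thread 'intensity' hint. This is illustrative logic only, not how the Windows scheduler or ARM's GTS implementation actually works:

```python
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    compute_intensity: float  # 0.0 (background) .. 1.0 (heavy), a hypothetical hint

# Hypothetical core inventory for an 8 big + 8 little part
BIG_CORES = [f"big{i}" for i in range(8)]
LITTLE_CORES = [f"little{i}" for i in range(8)]

def hmp_assign(threads, threshold=0.5):
    """Toy global (HMP) scheduler: heavy threads go to big cores, light or
    background threads to little cores, and all cores are usable at once."""
    placement = {}
    big_free, little_free = list(BIG_CORES), list(LITTLE_CORES)
    # Place the most demanding threads first
    for t in sorted(threads, key=lambda t: t.compute_intensity, reverse=True):
        if t.compute_intensity >= threshold and big_free:
            placement[t.name] = big_free.pop(0)
        elif little_free:
            placement[t.name] = little_free.pop(0)
        elif big_free:                       # little cores exhausted, spill over to big
            placement[t.name] = big_free.pop(0)
        else:
            placement[t.name] = "runqueue"   # everything is busy, wait
    return placement

if __name__ == "__main__":
    demo = [Thread("game_render", 0.9), Thread("game_audio", 0.6),
            Thread("indexer", 0.1), Thread("updater", 0.05)]
    for name, core in hmp_assign(demo).items():
        print(f"{name:12s} -> {core}")
```

Getting that 'intensity' hint right in real time is exactly the part the OS scheduler has to nail, which is why the Windows scheduler keeps coming up in this thread.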




Another thing to consider is that there is a frequency wall on the 14nm process, so no matter the approach, more performance would not be possible.

And the whole approach will still be inferior to Zen 3 and Zen 4, especially with 16 big cores (or double) with SMT.
 
I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing requirements dictate; effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.



Agree, it does nothing for power use under load. Like I said, this comes more out of Intel's marketing requirements than any actual improvement in end-user experience. It allows Intel to spruik bullsh*t 'up to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'Moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.little as a power-saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a 9-year-old idea from ARM.
I think in time they might do that; AMD will certainly advance their power-saving systems in that direction. And I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing some things, as everything costs transistors and space, even a power gate.
 
I think in time they might do that; AMD will certainly advance their power-saving systems in that direction. And I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing some things, as everything costs transistors and space, even a power gate.


Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power-saving systems mean clocking the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
 
I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing requirements dictate; effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.

That just exposes their process and architectural failings in the past few years. Intel was the first, in Skylake and Kaby Lake, to push for a smarter, more responsive core with Speed Shift. Yet AMD completely showed them up with Matisse and how much of a practical difference can be made with a core that responds to loads in 2 ms. Renoir's monolithic die enabled a dynamically clocked IF, and that, combined with Zen 2's signature CPPC2 features, resulted in a 50-100% improvement in battery life in direct comparison to similarly specced Coffee Lake in light and moderate workloads. You don't see AMD resorting to Jaguar on half the die to maintain efficiency at low loads, or power gating half of a Zen 2 core to gimp it down to Puma-level performance just to save 2/10ths of a watt at low loads. Matisse and Renoir already have the best of both worlds; they don't need to be that desperate.

This "allowing a Skylake core to morph into a Goldmont core" isn't happening. Intel moved to a considerably larger core with Sunny Cove (and if the rumors are true, even bigger in Willow Cove) in order to leverage that performance over traditional Core, to stay competitive. All these Alder Lake rumors reek of Intel engineers finally giving up on trying to optimize this larger Core for efficiency because their 10nm+ process still isn't worth a damn and 7nm is nowhere in sight, and instead turning to shitty Atom for the lower end of the power spectrum.

Intel can forget trying to turn off half a core, running cores at 25 MHz, or juggling Atom and Core on the same substrate if they can't even get their own Speed Shift technology down to where it rivals AMD's CPPC2. That's a prerequisite to all this nonsense. And if they do in fact perfect that concept, that would just enable Tiger Lake to perform in an adaptive manner as Renoir does, so then what's the point of using Goldmont? Mainstream consumers want a thin and light notebook that draws power like it's not even on when at idle, but ramps up to provide the requisite performance at a moment's notice. What Renoir is capable of hits that nail right on the head.
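If anyone wants to eyeball that ramp-up behaviour themselves, here's a rough sketch, assuming a Linux box that exposes cpufreq through sysfs (paths and governor behaviour vary by kernel and vendor driver, and the act of sampling keeps the core awake, so treat the numbers as ballpark only):

```python
import os
import time

# Sample CPU0's reported clock while putting it under load, to get a rough
# feel for how quickly it ramps up from its idle frequency.
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"

def read_mhz():
    with open(FREQ_PATH) as f:
        return int(f.read().strip()) / 1000

def measure_ramp(duration_s=0.2):
    """Busy-loop on CPU0 and record (elapsed ms, reported MHz) samples."""
    try:
        os.sched_setaffinity(0, {0})           # pin ourselves to CPU0 (Linux only)
    except (AttributeError, OSError):
        pass                                   # best effort; keep going unpinned
    samples = []
    t0 = time.perf_counter()
    while (elapsed := time.perf_counter() - t0) < duration_s:
        _ = sum(i * i for i in range(50_000))  # keep the core busy between reads
        samples.append((elapsed * 1000, read_mhz()))
    return samples

if __name__ == "__main__":
    for ms, mhz in measure_ramp()[:25]:
        print(f"{ms:7.2f} ms  {mhz:7.1f} MHz")
```

On a responsive design the reported clock should hit its boost range within the first few samples; if it takes tens of milliseconds, that's the sluggishness being complained about here.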

And then there's the Windows scheduler, the worst cockblock of all.
 
The way I see it, Intel will use cluster switching to achieve low clocks and low power consumption during idle and low-intensity usage, switching to the big cores during heavy usage.
I could be wrong, but I don't think all cores will be usable at the same time.
 
Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power-saving systems mean clocking the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
They already do that too.

A lot of the obvious stuff has already been done.
 
They already do that too.

A lot of the obvious stuff has already been done.


Really? Has it, though?

I don't see either Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU?
 
Really? Has it, though?

I don't see either Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU?

Disabled cores don't *give* readings; the very act of reading from them forces them awake (see the dramas with 'waaah, my Ryzen reads high voltage at idle', because the one active core is boosting to its max).
 
Disabled cores don't *give* readings; the very act of reading from them forces them awake (see the dramas with 'waaah, my Ryzen reads high voltage at idle', because the one active core is boosting to its max).


Yes, I know this. So between the "disabled" state and 1496 MHz at 0.7-something volts, there are no other states in between?
 
Yes, I know this. So between the "disabled" state and 1496 MHz at 0.7-something volts, there are no other states in between?

Probably not, no. That's probably an extremely low-wattage state for the CPU, deemed an efficient enough point to just leave as the minimum.
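On Linux you can check what the hardware actually exposes; here's a quick sketch assuming the usual cpufreq/cpuidle sysfs layout (the intel_pstate driver, for one, doesn't publish a discrete frequency list). Below the lowest P-state the core doesn't run slower, it just drops into idle C-states:

```python
import glob
import os

# List the discrete frequency steps (P-states) and idle states (C-states)
# the kernel exposes for CPU0. Assumes Linux sysfs; file availability
# depends on the cpufreq driver in use.
CPU0 = "/sys/devices/system/cpu/cpu0"

freq_file = os.path.join(CPU0, "cpufreq/scaling_available_frequencies")
if os.path.exists(freq_file):
    with open(freq_file) as f:
        steps_mhz = sorted(int(khz) // 1000 for khz in f.read().split())
    print("P-state steps (MHz):", steps_mhz)
else:
    with open(os.path.join(CPU0, "cpufreq/scaling_min_freq")) as f:
        print("minimum P-state (MHz):", int(f.read()) // 1000)

for state_dir in sorted(glob.glob(os.path.join(CPU0, "cpuidle/state*"))):
    with open(os.path.join(state_dir, "name")) as f:
        name = f.read().strip()
    with open(os.path.join(state_dir, "usage")) as f:
        usage = f.read().strip()
    print(f"idle state {name}: entered {usage} times")
```

So the '1496 MHz floor' you see in monitoring tools is just the lowest P-state; the real savings below that come from the C-states, which the monitoring tools can't show without waking the core.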
 
Probably not, no. That's probably an extremely low-wattage state for the CPU, deemed an efficient enough point to just leave as the minimum.


It is not that extremely low: it drains my battery like no tomorrow, and it reaches only roughly 40-50% of the efficiency achieved in Renoir.
Renoir is the benchmark we should compare everything else to.


Actually, laptops have much larger batteries than phones, and despite this, phones can last for weeks in standby, while poor laptops can at best manage half a day in standby.
 
There's no way Alder Lake is using 3D die stacking.


But Lakefield, with its 22nm/10nm split, is exactly this: a 22nm base die and a 10nm compute die.
1 big Sunny Cove core + 4 small Tremont cores.
This is the so-called non-symmetric grouping of heterogeneous big.LITTLE cores.

 
Just because this uses big+small cores doesn't mean it's a 3D-stacked chip. Come on. Good luck cooling a 95W+ die-stacked CPU. Lakefield is only 5-7W TDP, so there's no problem cooling stacked dies at this low power.
 