
Intel's Alder Lake Processors Could use Foveros 3D Stacking and Feature 16 Cores

A big.little strategy could work for x86, but the devil will be in the details of how quickly the CPU can transition processes between cores when there is such a performance disparity between the little and big cores. Big.little works as a power-saving measure because leakage current scales with transistor count, so very large cores have much higher leakage current than smaller cores. This puts a floor on how low processors can drop their power consumption during idle, and this effect gets worse with smaller process nodes. If the presence of small cores allows the processor to completely power down the larger cores during light usage scenarios, then power consumption during light usage will be lower. But for highly variable loads like gaming, the time it takes to move processes from small to large cores will likely lead to degraded performance and prevent any measurable power saving.
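To put some rough numbers on that trade-off, here's a quick back-of-envelope sketch in Python. Every figure in it (leakage per core, migration cost, burst length) is a made-up placeholder chosen only to show the shape of the argument, not a measurement of any real chip:

```python
# Rough back-of-envelope model of the big.LITTLE trade-off described above.
# All numbers are hypothetical placeholders, not measurements of any real CPU.

BIG_CORE_IDLE_LEAKAGE_W = 0.50     # assumed leakage of one big core left powered at idle
LITTLE_CORE_IDLE_LEAKAGE_W = 0.05  # assumed leakage of one little core
NUM_BIG, NUM_LITTLE = 8, 8

# Idle package power if the big cores stay powered vs. fully gated off
idle_without_gating = NUM_BIG * BIG_CORE_IDLE_LEAKAGE_W + NUM_LITTLE * LITTLE_CORE_IDLE_LEAKAGE_W
idle_with_gating = NUM_LITTLE * LITTLE_CORE_IDLE_LEAKAGE_W   # big cores powered down entirely

print(f"idle power, big cores powered: {idle_without_gating:.2f} W")
print(f"idle power, big cores gated:   {idle_with_gating:.2f} W")

# Migration penalty for a bursty (game-like) load: if a burst of work lasts
# 'burst_ms' and moving the thread to a big core costs 'migration_ms', the
# fraction of each burst lost to the switch grows as the bursts get shorter.
migration_ms = 1.0   # assumed little-to-big migration cost
for burst_ms in (100.0, 10.0, 2.0):
    overhead = migration_ms / (burst_ms + migration_ms)
    print(f"{burst_ms:6.1f} ms burst -> {overhead:5.1%} of the time spent migrating")
```

The point being: gating the big cores can cut idle power dramatically on paper, but once the bursts get short enough, the migration cost starts eating a meaningful slice of every burst.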

A more sophisticated way would be to allow each core to power off certain elements within the core when not required. For example, powering off the FPU when not required, or half the L2 and L3 cache when not needed. But that doesn't allow marketing to scream 'MOAR CORES!', so that option is off the table.
That's exactly what their competition does. As I said before, they gate power per core on or off: Ryzen does this, Intel does this, and Intel also developed race-to-idle so that they can turn cores off sooner.

Unfortunately for Intel, it's just as much an issue of power use under load, and I don't think this will fix that.
 
CPUs have been doing this for over a decade already...

I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing requirements dictate; effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.

That's exactly what their competition does. As I said before, they gate power per core on or off: Ryzen does this, Intel does this, and Intel also developed race-to-idle so that they can turn cores off sooner.

Unfortunately for Intel, it's just as much an issue of power use under load, and I don't think this will fix that.

Agree, it does nothing for power use under load. Like I said, this comes more out of Intel's marketing requirements than any actual improvement in end-user experience. It allows Intel to spruik bullsh*t 'up to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'Moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.little as a power-saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a 9-year-old idea from ARM.
 
ARM's marketing material promises up to a 75% savings in power usage for some activities.[2]



Serious answers are not available, and it's the same sentiment everywhere, right? We really don't know anything other than 'it uses big.little'. We can speculate :)

More cores equals more power used. And from that conclusion... its easy to draw other conclusions. Such as:
1. Windows scheduler and good allocation of workloads will be the key to gaining an advantage over other products
2. Intel's goal must be: faster when its needed (it can turbo high), fall back on little when possible (big cores can cool down and clear TDP budget for a new boost). Any other approach is not feasible, because then they are not competitive against stripped AND full fat performance cores.
3. A further reduction of base clocks on the BIG cores is likely, to clear more TDP headroom for turbo. Or maybe even dial back entirely to idle clock, some 800 MHz, and just have a turbo on top of that. Or maybe fully shut down, but then I'm thinking of latency problems.

So, using the cores at the same time will bring what advantage exactly? I'm not seeing it, do you? For this product to be viable, it needs to be better than either variant of the cores used in it. 8 fast and 8 slow cores are still worse than 16 regular ones at base clock, I reckon...

Interesting stuff indeed :) What I personally think is that Alder Lake is a way to get 10nm dies out that were planned anyway, and still keep competitive product across the whole stack. Forget 'glued together', Intel is going full scrapyard dive. It also confirms yet again that 10nm scales like shit into performance territory.


There are 3 ways of arranging and using BIG.little:
1. Clustered switching - the one you described - either big cores or small cores, never both at the same time;
2. In-kernel switcher - when a big and a small core are coupled into pairs, so with 8 + 8 you would have something like 8 big cores + hyper-threading enabled;
and the third:
3. Heterogeneous multi-processing (global scheduling):

The most powerful use model of big.LITTLE architecture is Heterogeneous Multi-Processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.[10][11]
This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430),[12][13] and Apple mobile application processors starting with the Apple A11.[14]
https://en.wikipedia.org/wiki/ARM_big.LITTLE#cite_note-14
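For what it's worth, here's a toy sketch of what the third model (global scheduling / HMP) boils down to, assuming a hypothetical 8+8 part and a per-thread 'intensity' hint. This is illustrative logic only, not how the Windows scheduler or ARM's GTS implementation actually works:

```python
from dataclasses import dataclass

@dataclass
class Thread:
    name: str
    compute_intensity: float  # 0.0 (background) .. 1.0 (heavy), a hypothetical hint

# Hypothetical core inventory for an 8 big + 8 little part
BIG_CORES = [f"big{i}" for i in range(8)]
LITTLE_CORES = [f"little{i}" for i in range(8)]

def hmp_assign(threads, threshold=0.5):
    """Toy global (HMP) scheduler: heavy threads go to big cores, light or
    background threads to little cores, and all cores are usable at once."""
    placement = {}
    big_free, little_free = list(BIG_CORES), list(LITTLE_CORES)
    # Place the most demanding threads first
    for t in sorted(threads, key=lambda t: t.compute_intensity, reverse=True):
        if t.compute_intensity >= threshold and big_free:
            placement[t.name] = big_free.pop(0)
        elif little_free:
            placement[t.name] = little_free.pop(0)
        elif big_free:                       # little cores exhausted, spill over to big
            placement[t.name] = big_free.pop(0)
        else:
            placement[t.name] = "runqueue"   # everything is busy, wait
    return placement

if __name__ == "__main__":
    demo = [Thread("game_render", 0.9), Thread("game_audio", 0.6),
            Thread("indexer", 0.1), Thread("updater", 0.05)]
    for name, core in hmp_assign(demo).items():
        print(f"{name:12s} -> {core}")
```

Getting that 'intensity' hint right in real time is exactly the part the OS scheduler has to nail, which is why the Windows scheduler keeps coming up in this thread.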




Another thing to consider is that there is a frequency wall on the 14nm process, so no matter the approach, more performance would not be possible.

And the whole approach will still be inferior to Zen 3 and Zen 4, especially with 16 big cores (or double) with SMT.
 
I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing requirements dictate; effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.



Agree, it does nothing for power use under load. Like I said, this comes more out of Intel's marketing requirements than any actual improvement in end-user experience. It allows Intel to spruik bullsh*t 'up to XX hours battery life' metrics (bullsh*t because no one idles their laptop for 12 hours straight) and 'Moar cores', even if those extra cores perform like potatoes. My point was more that the principles of big.little as a power-saving measure are sound, but also crude, and I would expect someone with the R&D budget of Intel to implement something more sophisticated than rehashing a 9-year-old idea from ARM.
I think in time they might do that; AMD will certainly advance their power-saving systems in that direction. And I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing some things, as everything costs transistors and space, even a power gate.
 
I think in time they might do that; AMD will certainly advance their power-saving systems in that direction. And I agree with all you just said, though I think it's too extensive: once you're down to one core on, why would you want the complication of turning threads off dynamically? I can see some gain in turning off the FP units and some others, but there is a limit to the usefulness of doing some things, as everything costs transistors and space, even a power gate.


Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power-saving systems mean clocking the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
 
I think you misunderstood the level of power gating I am referring to, though yes, dynamic cache has been a feature for some time. I am referring to shutting down execution units, instruction decoders, branch prediction caches, switching from OoO to in-order processing, etc., dynamically as processing requirements dictate; effectively allowing a Skylake core to morph into a Goldmont core as required.

Big.little is a crude way of doing it, effectively throwing silicon at the problem. The benefit is that it is simpler to set up; the negative is that the coarser the power gating, the slower the responsiveness, as the time taken to 'turn on' and initialise the powered-up processor element increases.

That just exposes their process and architectural failings in the past few years. Intel was the first, in Skylake and Kaby Lake, to push for a smarter, more responsive core with Speed Shift. Yet AMD completely showed them up with Matisse and how much of a practical difference can be made with a core that responds to loads in 2 ms. Renoir's monolithic die enabled a dynamically clocked IF, and that, combined with Zen 2's signature CPPC2 features, resulted in a 50-100% improvement in battery life in direct comparison to similarly specced Coffee Lake in light and moderate workloads. You don't see AMD resorting to Jaguar on half the die to maintain efficiency at low loads, or power gating half of a Zen 2 core to gimp it down to Puma-level performance just to save 2/10ths of a watt at low loads. Matisse and Renoir already have the best of both worlds; they don't need to be that desperate.

This "allowing a Skylake core to morph into a Goldmont core" isn't happening. Intel moved to a considerably larger core with Sunny Cove (and if the rumors are true, even bigger in Willow Cove) in order to leverage that performance over traditional Core, to stay competitive. All these Alder Lake rumors reek of Intel engineers finally giving up on trying to optimize this larger Core for efficiency because their 10nm+ process still isn't worth a damn and 7nm is nowhere in sight, and instead turning to shitty Atom for the lower end of the power spectrum.

Intel can forget trying to turn off half a core, running cores at 25 MHz, or juggling Atom and Core on the same substrate if they can't even get their own Speed Shift technology down to where it rivals AMD's CPPC2. That's a prerequisite to all this nonsense. And if they do in fact perfect that concept, that would just enable Tiger Lake to perform in an adaptive manner as Renoir does, so then what's the point of using Goldmont? Mainstream consumers want a thin and light notebook that draws power like it's not even on when at idle, but ramps up to provide the requisite performance at a moment's notice. What Renoir is capable of hits that nail right on the head.
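If anyone wants to eyeball that ramp-up behaviour themselves, here's a rough sketch, assuming a Linux box that exposes cpufreq through sysfs (paths and governor behaviour vary by kernel and vendor driver, and the act of sampling keeps the core awake, so treat the numbers as ballpark only):

```python
import os
import time

# Sample CPU0's reported clock while putting it under load, to get a rough
# feel for how quickly it ramps up from its idle frequency.
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"

def read_mhz():
    with open(FREQ_PATH) as f:
        return int(f.read().strip()) / 1000

def measure_ramp(duration_s=0.2):
    """Busy-loop on CPU0 and record (elapsed ms, reported MHz) samples."""
    try:
        os.sched_setaffinity(0, {0})           # pin ourselves to CPU0 (Linux only)
    except (AttributeError, OSError):
        pass                                   # best effort; keep going unpinned
    samples = []
    t0 = time.perf_counter()
    while (elapsed := time.perf_counter() - t0) < duration_s:
        _ = sum(i * i for i in range(50_000))  # keep the core busy between reads
        samples.append((elapsed * 1000, read_mhz()))
    return samples

if __name__ == "__main__":
    for ms, mhz in measure_ramp()[:25]:
        print(f"{ms:7.2f} ms  {mhz:7.1f} MHz")
```

On a responsive design the reported clock should hit its boost range within the first few samples; if it takes tens of milliseconds, that's the sluggishness being complained about here.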

And then there's the Windows scheduler, the worst cockblock of all.
 
The way I see it, Intel will use cluster switching to achieve low clocks and low power consumption during idle and low-intensity usage, switching to the big cores during heavy usage.
I could be wrong, but I don't think all cores will be usable at the same time.
 
Yup, I think Bulldozer with its clustered multi-threading was designed with this in mind - saving transistors.

Power-saving systems mean clocking the cores anywhere between 0 MHz and 4800 MHz, and allowing the cores to execute tasks even at 50 or 25 MHz.
They already do that too.

A lot of the obvious stuff has already been done.
 
They already do that too.

A lot of the obvious stuff has already been done.


Really? Has it, though?

I don't see either Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU?
 
Really? Has it, though?

I don't see either Task Manager or a third-party program like Core Temp report anything lower than 1496 MHz on my APU?

Disabled cores don't *give* readings; the very act of reading from them forces them awake (see the dramas with 'waaah, my Ryzen reads high voltage at idle', because the one active core is boosting to its max).
 
Disabled cores don't *give* readings; the very act of reading from them forces them awake (see the dramas with 'waaah, my Ryzen reads high voltage at idle', because the one active core is boosting to its max).


Yes, I know this. So between the "disabled" state and 1496 MHz at 0.7-something volts, there are no other states in between?
 
Yes, I know this. So between the "disabled" state and 1496 MHz at 0.7-something volts, there are no other states in between?

Probably not, no. That's probably an extremely low-wattage state for the CPU, deemed an efficient enough point to just leave as the minimum.
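On Linux you can check what the hardware actually exposes; here's a quick sketch assuming the usual cpufreq/cpuidle sysfs layout (the intel_pstate driver, for one, doesn't publish a discrete frequency list). Below the lowest P-state the core doesn't run slower, it just drops into idle C-states:

```python
import glob
import os

# List the discrete frequency steps (P-states) and idle states (C-states)
# the kernel exposes for CPU0. Assumes Linux sysfs; file availability
# depends on the cpufreq driver in use.
CPU0 = "/sys/devices/system/cpu/cpu0"

freq_file = os.path.join(CPU0, "cpufreq/scaling_available_frequencies")
if os.path.exists(freq_file):
    with open(freq_file) as f:
        steps_mhz = sorted(int(khz) // 1000 for khz in f.read().split())
    print("P-state steps (MHz):", steps_mhz)
else:
    with open(os.path.join(CPU0, "cpufreq/scaling_min_freq")) as f:
        print("minimum P-state (MHz):", int(f.read()) // 1000)

for state_dir in sorted(glob.glob(os.path.join(CPU0, "cpuidle/state*"))):
    with open(os.path.join(state_dir, "name")) as f:
        name = f.read().strip()
    with open(os.path.join(state_dir, "usage")) as f:
        usage = f.read().strip()
    print(f"idle state {name}: entered {usage} times")
```

So the '1496 MHz floor' you see in monitoring tools is just the lowest P-state; the real savings below that come from the C-states, which the monitoring tools can't show without waking the core.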
 
Probably not, no. That's probably an extremely low-wattage state for the CPU, deemed an efficient enough point to just leave as the minimum.


It is not that extremely low: it drains my battery like no tomorrow, and it reaches only roughly 40-50% of the efficiency achieved in Renoir.
Renoir is the benchmark we should compare everything else to.


Actually, laptops have much larger batteries than phones, and despite this, phones can last for weeks in standby, while poor laptops can at best manage half a day in standby.
 
There's no way Alder Lake is using 3D die stacking.


But Lakefield, with its 22nm/10nm split, is exactly this: a 22nm base die and a 10nm compute die.
1 big Sunny Cove core + 4 small Tremont cores.
This is the so-called non-symmetric grouping of heterogeneous big.LITTLE cores.

 
Just because this uses big+small cores doesn't mean it's a 3D-stacked chip. Come on. Good luck cooling a 95W+ die-stacked CPU. Lakefield is only 5-7W TDP, so there's no problem cooling stacked dies at this low power.
 