Intel "Emerald Rapids" Die Configuration Leaks, More Details Appear

Squared · Dec 7, 2023

DavidC1 said:
According to Semianalysis, assuming zero defects, the number of CPUs that can be made per wafer is 34 for EMR vs 37 for SPR.

Does Semianalysis compare the cost of the EMIB dies and packaging? Sapphire Rapids maxes out at 10 EMIB interconnects whereas Emerald Rapids maxes out at 3. That brings the silicon parts down from 14 to 5. I wonder if the reduced EMIB die cost and packaging cost could more than offset the additional CPU die cost.

DavidC1 said:
In theory, Meteor Lake should do better. The LP E-cores will force tasks off the compute tile for bursty workloads and reduce SoC power. The Intel 4 process has a steeper curve, so while it won't do as well at higher power, it'll do quite well at the lower end. Hopefully, whatever low-level changes made Alder/Raptor regress are addressed in Meteor Lake too.

I was thinking that what you were saying about the frequency curves should mean that Alder Lake's laptop efficiency problem is solved in Meteor Lake. I would think that it'd actually be best if Intel 4 focused on high-frequency efficiency, because the near-idle CPU usage will be on cores in the SoC tile built with TSMC N6, and they will be low-frequency optimized (to an extent), so there's not as much need for the CPU tile to be efficient at low frequencies.

DavidC1 · Dec 7, 2023

Squared said:
Does Semianalysis compare the cost of the EMIB dies and packaging? Sapphire Rapids maxes out at 10 EMIB interconnects whereas Emerald Rapids maxes out at 3. That brings the silicon parts down from 14 to 5. I wonder if the reduced EMIB die cost and packaging cost could more than offset the additional CPU die cost.

No, but the die area includes EMIB. More EMIB bridges can increase packaging complexity, but since even having one bridge adds complexity versus having none, the difference would be small.

Squared said:
I was thinking that what you were saying about the frequency curves should mean that Alder Lake's laptop efficiency problem is solved in Meteor Lake. I would think that it'd actually be best if Intel 4 focused on high-frequency efficiency, because the near-idle CPU usage will be on cores in the SoC tile built with TSMC N6, and they will be low-frequency optimized (to an extent), so there's not as much need for the CPU tile to be efficient at low frequencies.

Actually, not really.

I should explain in more detail. CPU power is actually active power + static power. Active power is when the transistor is active and switching activity is going on. Static power is leakage. There's a crossover point.
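
To put rough numbers on that split, here is a minimal Python sketch. It assumes the textbook approximations that active power scales as C·V²·f and static power as V·I_leak. The function names and every numeric value are made up for illustration; they are not measurements of any real chip.

```python
def active_power(c_eff, voltage, freq_hz, activity=1.0):
    """Dynamic (switching) power: activity * C * V^2 * f."""
    return activity * c_eff * voltage ** 2 * freq_hz


def static_power(voltage, leakage_amps):
    """Leakage power: burned whenever the circuit is powered, even if idle."""
    return voltage * leakage_amps


def crossover_freq(c_eff, voltage, leakage_amps, activity=1.0):
    """Frequency at which active power equals static power at a fixed voltage."""
    return leakage_amps / (activity * c_eff * voltage)


# Hypothetical values, chosen only to show the shape of the tradeoff.
C_EFF = 1e-9       # effective switched capacitance, farads
VOLTAGE = 0.8      # supply voltage, volts
LEAKAGE = 0.5      # total leakage current, amps

for f in (0.2e9, 0.625e9, 3e9):
    dyn = active_power(C_EFF, VOLTAGE, f)
    stat = static_power(VOLTAGE, LEAKAGE)
    print(f"{f / 1e9:.3f} GHz: active {dyn:.2f} W, static {stat:.2f} W")

print(f"crossover at {crossover_freq(C_EFF, VOLTAGE, LEAKAGE) / 1e9:.3f} GHz")
```

Below the crossover frequency the leakage term dominates, which is why it matters so much at near-idle laptop loads. In a real chip voltage rises with frequency too, so the active term grows faster than this fixed-voltage sketch suggests.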

Simply put the tradeoffs are like this:
1. Lower voltages at high frequencies, larger area.
2. Higher voltages, much lower leakage.

This table illustrates it very well: https://www.realworldtech.com/includes/images/articles/iedm10-10.png?x97168

Let's look at Intel's 32nm process. There's High VT and Low VT. Low VT NMOS Idsat* is 1550 ua/um with 100 ua/um leakage, while High VT NMOS is 1357 ua/um Idsat with 10 ua/um leakage. This is just one aspect. The Low VT can drive higher current and switch faster, but leaks more, while for High VT it's the opposite. They can take this further with design differences like P and E cores.
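
As a quick check on how lopsided that trade is, here is a small Python sketch that only takes ratios of the figures quoted above. The dictionary and variable names are hypothetical, and the ratios do not depend on the units.

```python
# Figures as quoted in the post for Intel's 32nm NMOS transistors.
low_vt = {"idsat": 1550, "leakage": 100}
high_vt = {"idsat": 1357, "leakage": 10}

# Extra drive current Low VT offers over High VT, as a fraction.
drive_gain = low_vt["idsat"] / high_vt["idsat"] - 1

# How many times more Low VT leaks than High VT.
leak_ratio = low_vt["leakage"] / high_vt["leakage"]

print(f"Low VT drive current advantage: {drive_gain:.1%}")
print(f"Low VT leakage penalty: {leak_ratio:.0f}x")
```

On these numbers, Low VT buys roughly 14% more drive current at the cost of 10 times the leakage.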

At laptop power levels, and especially in Ultrabook-style designs where every component is chosen for lower power, leakage power becomes significant, because it's a constant cost regardless of load. Before power gating was introduced (with Nehalem), which lets certain sections of a circuit be turned off to near-zero power, they would clock gate, but that still left tons of leakage power.

So really you want a transistor that does better at the low end, plus ultra-low leakage, which is basically what laptop chips are compared to desktop ones. For desktops, since the system is using 30W at idle anyway, cutting a 5W CPU idle in half isn't worth it if it costs you 10% of peak frequency.
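
The desktop arithmetic can be spelled out in a few lines of Python. This is a sketch that assumes the 5W CPU idle figure is part of the 30W system idle figure; the variable names are hypothetical.

```python
system_idle_w = 30.0   # whole-desktop idle power, from the post
cpu_idle_w = 5.0       # CPU share of that idle power, from the post

saved_w = cpu_idle_w / 2                  # power saved by halving CPU idle
system_saving = saved_w / system_idle_w   # fraction of system idle power saved

print(f"Saved {saved_w:.1f} W, {system_saving:.1%} of system idle power")
```

That is a saving of well under a tenth of the system's idle power, set against a 10% loss in peak frequency.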

CPUs nowadays don't just switch between off and on. They step through multiple power and frequency levels as well. Some workloads are bursty enough that the CPU doesn't even reach max frequency because the burst is over so fast. The LP E-cores' primary purpose is to guarantee the compute tile stays off while handling minimal tasks. I would think they are clocked closer to 1GHz than 2GHz, which is more than enough for video playback. In other cases like web browsing, where more performance is required for a short amount of time, the idea is to keep the components on a tight leash so the compute tile actually powers down and hands back to the LP E-cores when demand drops again.

I can tell you that on my system you need to do a lot of work to keep the CPU in a lower power state, and from what I hear they regressed post-Icelake. On Icelake, for some reason you can get 25 hours (30-50% over its predecessor) of screen-on idle but only 10 hours (same) of battery life. So even that chip has a harder time keeping the cores off.

*Idsat means current at saturation, i.e. when it's at max. VT stands for threshold voltage, the point at which transistors turn on.

Squared · Dec 7, 2023

DavidC1 said:
High VT NMOS Idsat* 1550 ua/um with 100 ua/um leakage, while Low VT NMOS is 1357 ua/um Idsat with 10ua/um leakage. This is just one aspect.

It looks like you swapped the high and the low values; I think the values in the table make your point, that it's possible to have lower leakage at higher voltage. Actually the table seems to show that this is the norm, which makes sense to me because this is why power from the grid travels most of the distance at extremely high voltage.

So the same chip can be very efficient in one use case and inefficient in another, even if the type of work is the same but the load changes.

DavidC1 · Dec 8, 2023

Squared said:
It looks like you swapped the high and the low values; I think the values in the table make your point, that it's possible to have lower leakage at higher voltage. Actually the table seems to show that this is the norm, which makes sense to me because this is why power from the grid travels most of the distance at extremely high voltage.

So the same chip can be very efficient in one use case and inefficient in another, even if the type of work is the same but the load changes.

VT is threshold voltage, the point where transistors turn on. Yes, you are right; I made the mistake of swapping them around. Low VT is faster with higher leakage, High VT is slower with lower leakage.

Low VT transistors are faster because the voltage at which they turn on is lower, so they switch more easily. The sacrifice is increased leakage current.

In addition, there's this: https://images.anandtech.com/doci/15967/N1.png

For lower power, you want a steeper curve, while for higher clocks you want the opposite. This is the reason why AMD mobile chips are ahead of Intel's at lower power, but the curves cross over and the opposite is true at higher power levels. Intel 4 and Meteor Lake look to make the curve steeper based on preliminary data.

That graph makes my point a bit confusing, but what I mean is that the steeper curve allows for higher performance at lower power envelopes. As you increase clocks and power, though, the curves cross over: the product with the steeper curve ramps up in power so much faster that, despite the early advantage, it ends up using more power past that point. If you look at the rare power/voltage/clock comparisons of laptop and desktop chips, you see just that. Mobile chips need more voltage to reach the same frequency.
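
The crossover can be illustrated with a toy Python model. It assumes power is a fixed leakage term plus a term growing with the cube of frequency (a common rule of thumb when voltage has to rise along with clocks). Both parameter sets are hypothetical and chosen only so that the curves cross; they do not describe any AMD or Intel product.

```python
def power_w(freq_ghz, static_w, k):
    """Toy model: fixed leakage plus a term growing with the cube of frequency."""
    return static_w + k * freq_ghz ** 3


def crossover_ghz(steep, shallow):
    """Frequency where the two toy curves meet; each argument is (static_w, k)."""
    (s1, k1), (s2, k2) = steep, shallow
    return ((s2 - s1) / (k1 - k2)) ** (1 / 3)


# Hypothetical (static_w, k) pairs.
MOBILE_TUNED = (1.0, 1.0)    # low leakage, power climbs quickly with clocks
DESKTOP_TUNED = (5.0, 0.5)   # more leakage, power climbs more slowly

for f in (1.0, 2.0, 4.0):
    m = power_w(f, *MOBILE_TUNED)
    d = power_w(f, *DESKTOP_TUNED)
    print(f"{f:.1f} GHz: mobile-tuned {m:.1f} W, desktop-tuned {d:.1f} W")

print(f"curves cross near {crossover_ghz(MOBILE_TUNED, DESKTOP_TUNED):.2f} GHz")
```

Below the crossing point the low-leakage design uses less power at the same clock; above it, the steeper ramp makes it the more power-hungry of the two.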

If you read the Realworldtech articles, you'll also see that the lower-leakage transistors have a lower drive current limit, so there's a lower clock speed ceiling too. You can't just keep increasing voltage: not only will it increase power dramatically, the chip might just fail anyway.