
Intel Meteor Lake Technical Deep Dive

W1zzard

Today Intel is taking the wraps off their Meteor Lake architecture. Our tech preview tells you everything you need to know about Intel's new ideas that will power the company's processors for years to come. Just like AMD, Intel is betting on chiplets, which combine multiple silicon dies into a single CPU to build faster, more energy-efficient designs that are cheaper to manufacture.

 
Wouldn't Meteor Lake be focused on laptops? I don't understand the reason for comparing the iGPU with the desktop version, which only has 256 shaders (768 shaders on the laptop). Bigger numbers to show investors, of course...

 
I like the low power island which clocks the E cores more sensibly. I was expecting them to use smaller cores than their fairly large Crestmont cores for the low power island. Let's see when it starts shipping. The claims for power efficiency are lower than I expected: more than 20% compared to Raptor Lake mobile. Intel's presentation was rather light on microarchitectural details. The only hard numbers were cache sizes: I cache for the P core has doubled to 64 KB. Other caches stay at the same size as Raptor Lake. It's also interesting that they have opted to use TSMC's N5 rather than their own process for the GPU tile.
 
Maybe there was more buried in the slides but I feel like very little new information was presented here. I had heard that there was a 6P+8E core configuration plus 2 e-cores in the SoC that would only be used if the CPU tile was powered down. But now I know it's 4 cores on the SoC and they're available for use even when the CPU tile is active. Maybe this was in the slides but the article only said that the GPU tile used something from TSMC which is better than what Arc uses, which implies N5, N4, or N3 but still nothing more specific, which pretty much confirms what we already surmised. The article says the interposer has some logic, but what is that logic? Is it the rumored L4 cache? And what about the ISA, is AVX-512 supported?
 
They only claimed VNNI, which suggests 256-bit AVX.
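For anyone who wants to check this on actual hardware once it ships, here is a minimal sketch, assuming a Linux x86 system where `/proc/cpuinfo` exposes a `flags` line. The helper name `isa_summary` is made up for illustration. The flag names (`avx2`, `avx_vnni`, `avx512f`, `avx512_vnni`) are the ones the Linux kernel reports. A CPU showing `avx_vnni` without `avx512f` would match the "256-bit VNNI only" reading.

```python
def isa_summary(cpuinfo_text):
    """Report which AVX/VNNI flags appear in Linux /proc/cpuinfo text.

    Hypothetical helper: it only looks at the first "flags" line, which
    assumes every core reports the same feature set.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            # Everything after the colon is a space-separated flag list.
            flags.update(line.split(":", 1)[1].split())
            break
    wanted = ("avx2", "avx_vnni", "avx512f", "avx512_vnni")
    return {name: name in flags for name in wanted}
```

To use it on a real machine, pass it the contents of the file, e.g. `isa_summary(open("/proc/cpuinfo").read())`.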
 
AVX-512 is practically dead for Intel, at least post TGL till now. Don't expect anything like that until AMD also moves to AVX-512 without the double pumping.
 
Zen 4 has AVX-512 which means that for the foreseeable future, its successors will support it as well.
 
I'm not really sure what it is for, except I've heard some AI workloads can benefit from it. My reason for wondering is because Golden/Raptor Cove includes it, but Gracemont does not. So I'm wondering if Redwood Cove and Crestmont were actually designed with the same ISA instead of non-shared instructions being disabled.

From AnandTech, it sounds like the GPU tile is made with TSMC N5 and the SoC with TSMC N6.

I wonder today if Intel's new reliance on TSMC has to do with node optimization. Intel's newest nodes get used first for CPUs, so they're frequency-optimized. But TSMC favors density-optimized nodes because their early adopters make smartphone and graphics processors. So even if Intel and TSMC were keeping pace with one another, it'd still make more sense to build the CPU tile at Intel and the GPU and SoC tiles at TSMC. (I believe both companies tend to make alternate versions of their nodes that are optimized differently, but those come later and might be a little more expensive.)

AnandTech also said that the Crestmont cores in the SoC are optimized with a lower voltage-frequency curve. Perhaps that's because the TSMC N6 process they're built with is density-optimized?

Source:
 
I suspect you're right about N5 being denser than Intel 4 for GPUs, which don't need to clock very high. As far as VNNI is concerned, it's useful for algorithms used in AI. I'm quoting the relevant part from the link in the previous sentence:
Platforms not using VNNI require the vpmaddubsw, vpmaddwd and vpaddd instructions to complete the multiply-accumulate operations in INT8 convolution operation:

DL int-8


Platforms using VNNI require only one instruction, “vpdpbusd”, to complete the INT8 convolution operation:

DL int-8
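To make the difference concrete, here is a rough scalar model in Python of what one 32-bit lane does in each case. It is only a sketch of the instruction semantics, not real SIMD code, and the function names are made up for illustration. One detail it shows: `vpmaddubsw` saturates its intermediate sums to signed 16-bit, while `vpdpbusd` does not saturate, so the two paths only agree when that intermediate stays within range.

```python
def sat16(x):
    """Clamp to the signed 16-bit range, as vpmaddubsw does."""
    return max(-32768, min(32767, x))

def wrap32(x):
    """Wrap to a signed 32-bit value (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def legacy_lane(acc, u8, s8):
    """One 32-bit lane via vpmaddubsw -> vpmaddwd (by ones) -> vpaddd."""
    # vpmaddubsw: adjacent u8*s8 products summed, saturated to s16
    lo = sat16(u8[0] * s8[0] + u8[1] * s8[1])
    hi = sat16(u8[2] * s8[2] + u8[3] * s8[3])
    # vpmaddwd against a vector of 1s: adjacent s16 values summed into s32
    pair_sum = lo * 1 + hi * 1
    # vpaddd: wrapping 32-bit add into the accumulator
    return wrap32(acc + pair_sum)

def vnni_lane(acc, u8, s8):
    """One 32-bit lane via vpdpbusd: four u8*s8 products added to acc."""
    return wrap32(acc + sum(a * b for a, b in zip(u8, s8)))
```

With small inputs such as `u8 = [1, 2, 3, 4]` and `s8 = [5, -6, 7, -8]` both functions return the same accumulator. With `u8 = [255, 255, 0, 0]` and `s8 = [127, 127, 0, 0]` the three-instruction path clips at 32767 while the VNNI path does not.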
 
Wouldn't Meteor Lake be focused on laptops? I don't understand the reason for comparing the iGPU with the desktop version, which only has 256 shaders (768 shaders on the laptop). Bigger numbers to show investors, of course...

Not sure where you are getting that it's desktop Raptor Lake. It's a 1:1 comparison against mobile Raptor Lake. Or did you forget Raptor Lake is also in mobile?

I like the low power island which clocks the E cores more sensibly. I was expecting them to use smaller cores than their fairly large Crestmont cores for the low power island. Let's see when it starts shipping. The claims for power efficiency are lower than I expected: more than 20% compared to Raptor Lake mobile. Intel's presentation was rather light on microarchitectural details. The only hard numbers were cache sizes: I cache for the P core has doubled to 64 KB.
That's just for the process, not Meteor Lake. Also, the reason they were light on uarch details is that neither the P nor the E cores advance that much. It's basically:

P: Doubles L1i cache to 64KB
E: rename/allocate goes from 5 to 6

AnandTech also said that the Crestmont cores in the SoC are optimized with a lower voltage-frequency curve. Perhaps that's because the TSMC N6 process they're built with is density-optimized?
We always knew that. Intel 4 for Compute, TSMC N5 for GPU and N6 for IO and SoC.

The LP E cores in the SoC are just optimized for lower power, nothing to do with the process.
 
Cool! Can’t wait to see what Meteor Lake can do. As always, I will be getting the second generation for more optimization, etc.
 
It looks like Server and PC cores are diverging further.

Redwood Cove server in Granite Rapids: Enhanced branch prediction, double L1i
Sierra Glen (Crestmont counterpart) in Sierra Forest: Rename/Allocate is still at 5, same as Gracemont.

Redwood Cove client in Meteor Lake: Double L1i
Crestmont in Meteor Lake: Rename/Allocate is at 6, Enhanced branch prediction

The server E core is optimized for higher frequency, hence the slightly narrower uarch.
 
Not sure where you are getting that it's desktop Raptor Lake. It's a 1:1 comparison against mobile Raptor Lake. Or did you forget Raptor Lake is also in mobile?


That's just for the process, not Meteor Lake. Also, the reason they were light on uarch details is that neither the P nor the E cores advance that much. It's basically:

P: Doubles L1i cache to 64KB
E: rename/allocate goes from 5 to 6


We always knew that. Intel 4 for Compute, TSMC N5 for GPU and N6 for IO and SoC.

The LP E cores in the SoC are just optimized for lower power, nothing to do with the process.
You're right, I didn't even notice that they refreshed Alder Lake mobile and called it Raptor Lake.
 
Maybe there was more buried in the slides but I feel like very little new information was presented here. I had heard that there was a 6P+8E core configuration plus 2 e-cores in the SoC that would only be used if the CPU tile was powered down. But now I know it's 4 cores on the SoC and they're available for use even when the CPU tile is active.
It's 2x LP E cores not four. https://www.techpowerup.com/review/intel-meteor-lake-technical-deep-dive/3.html

On the slide "Meteorlake Low Power Island".

With Crestmont you can have 2-core clusters, rather than only 4-core clusters as with Alder Lake and Raptor Lake.

@AnotherReader Also, Intel 4 specifically only supports HP libraries and doesn't have enough to support a full IO block. I don't know why they would continue to rely heavily on TSMC in the future, though.
 
Yes, Intel 3 should be the full-featured version of Intel 4.
 
I bet we'll see this in a lot of NUC-like devices. That new iGPU might finally give AMD some competition on that front.
 
So if I get it right, the purpose of moving some e-cores onto the SoC tile is to save on power consumption by disabling the tile interconnects - which is basically what consumes a lot of power on Ryzen when idle. Interesting.
 
It's 2x LP E cores not four. https://www.techpowerup.com/review/intel-meteor-lake-technical-deep-dive/3.html

On the slide "Meteorlake Low Power Island".

With Crestmont you can have 2-core clusters, rather than only 4-core clusters as with Alder Lake and Raptor Lake.
Ah you're right. The article says, "Based on the same 'Crestmont' core architecture as the E-cores on the Compute tile although not being part of its ringbus or sharing its L3 cache; this E-core cluster has its own L2 cache shared among four cores." But I can see the slide you referred to from Intel which says 2 cores.

I bet we'll see this in a lot of NUC like devices. That new iGPU might finally give AMD some competition on that front.
Tiger Lake's iGPU was actually a bit faster than AMD's Vega iGPU at the time. But Alder Lake and Raptor Lake use the exact same iGPU as Tiger Lake so yeah this will be the first time in a while it's improved and with a 2x improvement it should rival the current RDNA3 iGPU. Intel's lower trims are also less cut down than AMD's, so certain i5 and i3 models I think still compete very favorably against AMD.
 
A very mobile-focused architecture, but it's pretty cool that we're finally getting rid of the chipset (even though technically in mobile CPUs the PCH was on package).

Will they do that for desktops too, or will there still be a PCH on the motherboard?
 
So if I get it right, the purpose of moving some e-cores onto the SoC tile is to save on power consumption by disabling the tile interconnects - which is basically what consumes a lot of power on Ryzen when idle. Interesting.

From what I read, it's only the Compute tile:

it allows Intel to power down the Compute tile when not needed

In essence the SoC tile acts as a smaller low end chip within a chip that only taps the compute / IO tiles as needed. Should reduce power consumption for idle or very light tasks.
 
Intel calls a "disaggregated chiplet-based processor" a SoP or System-on-Package chip architecture.

This article fails to mention, even once, the core IDM 2.0 components as Intel defined them

Dis-aggregated nonsense writing... Better check with DerBuyer.. lulz.
 
A very mobile-focused architecture, but it's pretty cool that we're finally getting rid of the chipset (even though technically in mobile CPUs the PCH was on package).

Will they do that for desktops too, or will there still be a PCH on the motherboard?
AMD has had an on-die chipset since the Bulldozer-derivative chip right before Zen. The I/O features are cut down, so you could call them PCH-lite. The desktops come with an additional chipset for more IO.

Intel could conceivably do the same thing.

Intel says 4-6% improvement for Crestmont E cores. Likely less for the P.
 
Disappointing that they neutered the AV1 encoder to 4:2:0. I was expecting them to make AV1 ubiquitous but they left room for improvement for future generations in classic Apple style.
 
Very nice article, I enjoyed reading this.
 