Tuesday, January 11th 2022

AMD's Lisa Su Confirms Zen 4 is Using Optimised TSMC 5 nm Node, 2D and 3D chiplets

During a meeting at CES, Anandtech asked AMD about the production nodes used to make its chips at TSMC and the importance of leading edge nodes for AMD to stay competitive, especially in light of the cost of using said nodes. Lisa Su confirmed in her answer that AMD is using an optimised high-performance 5 nm node for its upcoming Zen 4 processor chiplets, of which there interestingly appear to be both 2D and 3D versions. This is the first time we've heard a mention of two different chiplet types using the same architecture, and it could mean that we get to see Zen 4 based CPUs with and without 3D cache.

What strikes us as a bit odd about the Anandtech article is that they mention that several of TSMC's customers are already making 4 nm, and soon 3 nm, chips and question why AMD wouldn't want to be on these same nodes. It seems like Anandtech has forgotten that not all process nodes are universally applicable: just because you can make one type of chip on a smaller node doesn't mean that node will be suitable for a different type of chip. For the longest time, mobile SoCs and similar chips have been among the first things made on new nodes, with more complex things like GPUs and more advanced CPUs coming later, on tweaked versions of the specific node. The fact that TSMC has no fewer than three 7 nm nodes should be reason enough to realise that the leading edge node might not be the ideal node for all types of chips.
In related news, TSMC is said to have accepted advance payments of US$5.44 billion from at least 10 of its clients, among which AMD, Apple, Nvidia and Qualcomm are all mentioned. The payments were made to secure production capacity, although exactly how far into the future isn't clear. TSMC saw advance payments of US$3.8 billion in the first three quarters of last year, and it's likely that these kinds of deals will continue as long as there's more demand than supply.
Sources: Anandtech, @dnystedt, WikiChip

28 Comments on AMD's Lisa Su Confirms Zen 4 is Using Optimised TSMC 5 nm Node, 2D and 3D chiplets

#1
ratirt
So the 2 different chiplets with vcache and without are a thing. No shock here for me to be fair. The question is how those will be divided. 2 different desktop chips or are we talking about APU vs Desktop.
#2
TheLostSwede
ratirt wrote: So the 2 different chiplets with vcache and without are a thing. No shock here for me to be fair. The question is how those will be divided. 2 different desktop chips or are we talking about APU vs Desktop.
It does indeed seem like it. I really have no insight into what is what here, but it's interesting that she confirmed it, without seemingly being pressed about it.
#3
ratirt
TheLostSwede wrote: It does indeed seem like it. I really have no insight into what is what here, but it's interesting that she confirmed it, without seemingly being pressed about it.
I guess we would have to wait for the 5800x3D to have more recognition what it actually brings. But if Lisa confirmed it now, she must be confident about the 3dVcache for sure.
If the desktops are going to be divided w/wo Vcache, I'm really curious if it will be the entire lineups or other differentiation.
#4
big_glasses
ratirt wrote: So the 2 different chiplets with vcache and without are a thing. No shock here for me to be fair. The question is how those will be divided. 2 different desktop chips or are we talking about APU vs Desktop.
could be server vs non-server also, as it refers to chiplets
#5
Chomiq
big_glasses wrote: could be server vs non-server also, as it refers to chiplets
Could be that lower end SKUs won't use it.
#6
ratirt
big_glasses wrote: could be server vs non-server also, as it refers to chiplets
True, but Vcache will probably benefit the APU the most, to some extent. Maybe each segment will get vcache for the top models. Having all the lineups with and without is not practical.
#7
Testsubject01
big_glasses wrote: could be server vs non-server also, as it refers to chiplets
Chomiq wrote: Could be that lower end SKUs won't use it.
Both might be correct: server plus an X3D SKU of the R9 (and R7?) on high-end desktop, while R5, R3 and below get none.
I wonder how Zen4 APUs would work with 3D cache.
#8
Punkenjoy
Unless AMD add a way for faster chiplet to chiplet cache access (that would make it faster than going to RAM unlike right now), I do not see AMD making Zen 4 2 CCD with 3D V-Cache for consumers. It wouldn't just be good enough for current workload or at least the added price and lower yield wouldn't probably outweigh the benefits.

A CCD to CCD link would probably help there, but that is added cost and added complexity. There would also be the challenge of knowing what is in the second CCD and when to access it directly. AMD has a patent on that and is working on resolving this issue, as it will become more and more of a problem.

But we never know, maybe the price of the added cache will drop rapidly and it might be just few more bucks. Then yes, we will probably see a dual CCD with 3d v-cache. After all, AMD is supposedly producing a lot of those for the big cloud provider right now. This should reduce the cost in the end for consumers as production scale up.

The best scenario, I think, would be an 8-core CCD paired with something like Navi 24 (but with all video encoders/decoders, please). Each could have 64 MB of cache, and I suspect that could make this APU more than enough for good 1080p gaming. At the price of all GFX cards, they could probably price it in a way that makes sense.
#9
Dr_b_
Really looking forward to this more than Raptor Lake for some reason. Maybe it's these:

-AM5 socket will likely persist longer than socket 1700, so you don't have to replace the motherboard every year to use new CPUs

-Zen4 process node advantage over intel, means less power, and maybe more performance without hybrid core weirdness and the half baked win11

-Intel is still stuck on 8 core CPUs, PCores are still 8 Cores, just getting more eCores in RL, do i need those on desktop, personally would rather have all PCores and more of them, but intel can't pull that off in a reasonable power and thermal budget. The eCores for multi thread aren't impactful for *my* workloads, i do realize many want them for their particular use cases.

It's too bad about HEDT: x299 hasn't been updated and has been left for dead, and TR hasn't been updated either. Miss those PCIe lanes. If mainstream had a segment with even just 16 more CPU PCIe lanes....
#10
Wirko
ratirt wrote: True but the most benefit Vcache will have on the APU to some extent. Maybe each segment will get vcache for the top models. Having all the line ups with and without is not practical.
Yeah, something like that is what I think too. Look at the Epyc 7003 lineup: number of CCDs, number of cores, and amount of L3 come in many combinations, but far from all that would be possible with up to 8 chiplets.
#12
bug
Is it just me or does it really feel like companies are quick to announce an "enhanced" or "optimized" process if a cabinet was moved somewhere in the fab?
#13
Wirko
Punkenjoy wrote: A CCD to CCD link would probably help there but that is added cost and added complexity. There would be also the challenge of knowing what is in the second CCD and when to access it directly. AMD have pattern on that and is working on resolving this issue as it will become more and more a problem.
That challenge exists and has to be solved in all of existing AMD CPUs that have more than one chiplet. Cache synchronisation may generate a lot of traffic over the IF. However, I don't see how this issue would become much more severe with more cache.
Punkenjoy wrote: But we never know, maybe the price of the added cache will drop rapidly and it might be just few more bucks. Then yes, we will probably see a dual CCD with 3d v-cache. After all, AMD is supposedly producing a lot of those for the big cloud provider right now. This should reduce the cost in the end for consumers as production scale up.
Nah. The way the whole two-layer chip is manufactured means that it can't become much cheaper soon (or ever). The same is true for Intel's stacking and packing, of course.

On the other hand, there are cost optimisations that can be done too. Two dies (CCD + cache) have a better yield than a single bigger one. Cache die, they said, is made on a process optimised for its purpose. Also, *maybe* AMD can use the exact same cache die (6nm) for Zen 3 and Zen 4. It's often said that SRAM scales poorly when going to finer nodes, so they could possibly calculate that going to 5nm is not worth the cost.
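The point about two smaller dies yielding better than one bigger die can be sketched with the simple Poisson yield model, where the fraction of defect-free dies is exp(-area x defect density). This is only an illustration: the die areas and defect density below are hypothetical placeholders, not AMD or TSMC figures, and the sketch assumes each die is tested before stacking and ignores bonding losses, wafer edge loss and binning.

```python
import math


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson yield model."""
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_die(area_mm2: float, defects_per_mm2: float) -> float:
    """Average wafer area (mm^2) consumed to obtain one good die."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_mm2)


# Hypothetical numbers, chosen only to illustrate the argument.
CCD_AREA_MM2 = 80.0     # assumed compute die area
CACHE_AREA_MM2 = 40.0   # assumed cache die area
DEFECT_DENSITY = 0.001  # assumed defects per mm^2 (0.1 per cm^2)

# One big die holding both the cores and the extra cache.
monolithic = silicon_per_good_die(CCD_AREA_MM2 + CACHE_AREA_MM2, DEFECT_DENSITY)

# Two separate dies, each tested on its own, so a defect only scraps one of them.
split = (silicon_per_good_die(CCD_AREA_MM2, DEFECT_DENSITY)
         + silicon_per_good_die(CACHE_AREA_MM2, DEFECT_DENSITY))

print(f"monolithic: {monolithic:.1f} mm^2 of wafer per good part")
print(f"split dies: {split:.1f} mm^2 of wafer per good part")
```

Under this model the split design always consumes less wafer area per good part, because a defect scraps only the smaller die it lands on. What the sketch leaves out is the cost and yield of the stacking step itself, which is where the disagreement in this thread lies.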
Punkenjoy wrote: At the price of all GFX card, they could probably price it in a way that make sense.
So they could price it in any way they desire and it would still be competitive?
bug wrote: Is it just me or does it really feel like companies are quick to announce an "enhanced" or "optimized" process if a cabinet was moved somewhere in the fab?
Chipmakers can't afford to use non-optimised processes. Every 0.1% of yield gained or lost is many million $ gained or lost. So you can believe it when they say that.
#14
Punkenjoy
Wirko wrote: That challenge exists and has to be solved in all of existing AMD CPUs that have more than one chiplet. Cache synchronisation may generate a lot of traffic over the IF. However, I don't see how this issue would become much more severe with more cache.
More cache means a more complex lookup (meaning it takes more time).

But the main point is that on desktop, the main type of workload that benefits from larger cache is gaming (and maybe compression/decompression). For gaming, it's always best to run the game inside a single CCD as much as possible, so the game won't benefit from the larger cache on the second CCD. Having extra cache on another CCD that uses TDP and lowers the overall clock could even lead to slower performance. It may just not be worth the cost, although they could still do it for people that just want to buy the top of the top.

Right now, it's as fast to access RAM as to do another lookup on the next CCD, so if the data is not in the local L3, it just grabs it from RAM. It's really only for core to core communication that a cache access will be made on the second CCD.
Wirko wrote: Nah. The way the whole two-layer chip is manufactured means that it can't become much cheaper soon (or ever). Same is true for Intel's stacking and packing of course.
That seems a bit arbitrary. I really doubt that the current manufacturing process for stacked dies is the best it will ever be and that no cost reduction or volume saving can be made.
Wirko wrote: On the other hand, there are cost optimisations that can be done too. Two dies (CCD + cache) have a better yield than a single bigger one. Cache die, they said, is made on a process optimised for its purpose. Also, *maybe* AMD can use the exact same cache die (6nm) for Zen 3 and Zen 4. It's often said that SRAM scales poorly when going to finer nodes, so they could possibly calculate that going to 5nm is not worth the cost.
That is true, but 2 die without vcache will have better yield than 2 die with since there will always be some level of defect.
Wirko wrote: So they could price it in any way they desire and it would still be competitive?
Up to a certain point. But a decent APU could be sold in this market around 400-600$ without too much problem. There would also be a very good market for SFF desktop or "console like PC" if it can do up to 1440p, it could cost even higher.
#15
thewan
TheLostSwede wrote: It seems like Anandtech has forgotten
Even if you and me and Anandtech know the reason why, there will be people asking the question. So it is better to get an official answer straight from AMD, because people will ask the question anyway. By getting the answer from AMD, people get an official answer that should satisfy most of them if not all of them, rather than relying on guesses and speculation like yours. Unless you get an answer to your question from the source, you are just speculating. Speculation doesn't mean you are right. You may be wrong, or maybe besides yours there is more than one answer, or there is more to it than just your speculation.
I guess the Lost in your name means you are truly lost, thinking yourself as a know-it-all.
#16
docnorth
Optane Empire strikes back ;)
It is different, but I just couldn't resist...
#17
ShurikN
Wouldn't be surprised if they released a single cpu with 2 different dies. One die that has something like P-cores with 3d cache, and a die that has e-cores, which are similar to P-cores but downclocked and without 3d.
Also single die cpus can be split into 2 SKUs, with and without extra cache.
#18
dragontamer5788
Punkenjoy wrote: More cache mean a more complex lookup(meaning taking more time).
L3 cache is totally removed from the core. In practice, L3 cache is "accessed" over MESI-like protocols (en.wikipedia.org/wiki/MESI_protocol).

L1 and L2 cache exist for a reason. L3 cache can be made as arbitrarily complex as they want. As long as L3 cache is faster than DDR4 / DDR5, then it does the job.

IIRC, L3 cache even operates on its own clock: the Infinity Fabric clock instead of the core-clock on Zen+ processors. In Intel processors, L3 cache is also on its own clock IIRC. L3 is already incredibly complex with very high latency characteristics.
Punkenjoy wrote: Right now, it's as fast to access RAM than having to do another lookup on the next CCD so if the data is not in the local L3, it just grab it from RAM. It's really only for core to core communication that a cache access will be made on the second CCD.
Latency wise, that's correct. This is a big weak point of the multiple-die approach AMD has taken, and why it's so important for AMD to increase the size of its cache. In effect, an Intel chip with 40MB of L3 cache could have (in some use cases) better performance than AMD's 2x32MB of L3 cache. Since the "other 32MB" sits on another die and is no faster to reach than main RAM (DDR4/5), the other 32MB doesn't help.

Only in "fully parallel" situations (where the other socket is working on a completely different problem with no sharing of data) does the other 32MB help. Fortunately, that's called "Virtual Machines" today, so AMD EPYC / Zen works out for many people's problems. I expect the video-game market to be better on the "Intel-style 40MB L3" cache scenario, rather than "32x2 MB L3 cache".

Of course, AMD can fix that by just making every die have +64MB of L3 cache (96MB total per die). It's brute force and inelegant, but you can hardly disagree with the results or theory.
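The 32MB vs 40MB vs 96MB comparison above can be put into a toy average-access-time calculation. This is a rough sketch under loudly stated assumptions: accesses are spread uniformly over a fixed working set, only the local L3 counts as a hit (the other die's L3 is treated as no better than RAM, as argued above), and the latencies and working-set size are hypothetical placeholders rather than measured figures.

```python
def hit_rate(local_l3_mb: float, working_set_mb: float) -> float:
    """Toy model: uniform random accesses over the working set."""
    return min(1.0, local_l3_mb / working_set_mb)


def avg_access_ns(local_l3_mb: float, working_set_mb: float,
                  l3_ns: float = 10.0, miss_ns: float = 70.0) -> float:
    """Average latency of an access that has already missed L1/L2.

    l3_ns and miss_ns are assumed, illustrative latencies. A miss covers
    both RAM and the remote die's L3, which are treated as equally slow.
    """
    h = hit_rate(local_l3_mb, working_set_mb)
    return h * l3_ns + (1.0 - h) * miss_ns


WORKING_SET_MB = 64.0  # hypothetical single-game working set

for label, local_mb in [("2x32MB (only 32MB local)", 32.0),
                        ("40MB unified", 40.0),
                        ("96MB per die", 96.0)]:
    print(f"{label}: {avg_access_ns(local_mb, WORKING_SET_MB):.1f} ns")
```

In this model the second die's 32MB never shows up in the result for a workload pinned to one die, which is the whole argument: only growing the local L3 moves the number.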
#19
mama
PCi-E 6.0 specification just released...
#20
TheoneandonlyMrK
mama wrote: PCi-E 6.0 specification just released...
Probably a while before servers get it. We're looking at two to five years depending on Intel; AMD are on 5 for 5 years.
#21
trsttte
TheLostSwede wrote: What strikes us a bit odd about the Anandtech article, is that they mention the fact that several of TSMC's customers are already making 4 nm and soon 3 nm chips and are questioning why AMD wouldn't want to be on these same nodes. It seems like Anandtech has forgotten that not all process nodes are universally applicable and just because you can make one type of chip on a smaller node, doesn't mean it'll be suitable for a different type of chip. For the longest of times, mobile SoCs or other similar chips seem to always have been among the first things being made on new nodes, with more complex things like GPUs and more advanced CPUs coming later, to tweaked versions of the specific node. The fact that TSMC has no less than three 7 nm nodes, should be reason enough to realise that the leading edge node might not be the ideal node for all types of chips.
I think the price is the most important factor, smartphone dies are much smaller than gpus for example. Phones are also being sold at crazy prices and qualcomm/apple are enjoying some very fat margins which don't exist on the computer space (I mean, not in CPUs anyway, GPUs are all over the place). So for Apple/Samsung/Qualcomm to jump on the new flashy node is great for marketing even if yields are still on the low side, AMD on the other hand has to wait for yields to stabilize.

At least that's my theory anyway
#22
Minus Infinity
I reckon we'll see Zen 4c without 3D cache and higher end Zen 4 with it. Might not see it in 7600X, but Zen 4 has 2x the L2 cache as Zen 3.
#23
sillyconjunkie
ratirt wrote: So the 2 different chiplets with vcache and without are a thing. No shock here for me to be fair. The question is how those will be divided. 2 different desktop chips or are we talking about APU vs Desktop.
Crystal Ball......0

3DV for some number of skus not requiring an IO chiplet. ie..6 and 8 core. Multi-chiplet skus will all be non-3DV
#24
Kohl Baas
ratirt wrote: So the 2 different chiplets with vcache and without are a thing. No shock here for me to be fair. The question is how those will be divided. 2 different desktop chips or are we talking about APU vs Desktop.
AFAIK, APUs are still monolithic, so no chiplets there...
#25
TheLostSwede
thewan wrote: Even if you and me and Anandtech knows the reason why, there will be people asking the question. So it is better to get an official answer straight from AMD because people will ask the question anyway. By getting the answer from AMD, people get an official answer that should satisfy most of them if not all of them. Rather than rely on guesses and speculation like yours. Unless you get an answer to your question from the source, you are just speculating. Speculation doesn't mean you are right. You maybe wrong, or maybe besides yours, there is more than one answer, or there is more to it then just your speculation.
I guess the Lost in your name means you are truly lost, thinking yourself as a know-it-all.
Did you read the article on Anandtech? It talks about AMD using the 5nm node, then questions why AMD isn't using a more cutting edge node (since Apple and MediaTek are using them, why can't AMD?), when the writer knows full well that those nodes are not optimised for the kind of chips AMD makes. That's the issue I'm having with their article, as Ian Cutress should know better. On top of that, it's not normal to make huge node jumps, bypassing nodes that are the next step from the one you're on, presumably because it can lead to more problems than it's worth. I have zero issues with the question he put to AMD, as it was a sensible question; it's the bit before that which doesn't make sense.

If you read the comments on Anandtech, their own readers are pretty much saying the exact same thing.

You're having a go at me without knowing anything about me. I've been writing about this stuff for over 20 years, I've been to Intel's fabs, I've been to GloFo fab conferences and I've met the people that started a lot of the tech sites that carry their names to this day, but are no longer working there themselves. But yeah, I'm the one that's lost and that is a know-it-all, because I'm just some random person on a forum... Honestly dude, maybe at least check up on who you're having a go at first.

You don't have to agree with my thoughts on what Ian wrote, but as I said, he should really know better in this case, as low power nodes aren't suitable for making desktop CPUs and GPUs and that's common industry knowledge that he also has.
trsttte wrote: I think the price is the most important factor, smartphone dies are much smaller than gpus for example. Phones are also being sold at crazy prices and qualcomm/apple are enjoying some very fat margins which don't exist on the computer space (I mean, not in CPUs anyway, GPUs are all over the place). So for Apple/Samsung/Qualcomm to jump on the new flashy node is great for marketing even if yields are still on the low side, AMD on the other hand has to wait for yields to stabilize.

At least that's my theory anyway
Most important, maybe not, but it's obviously a top three factor. The most important one is that the foundry has a suitable node for your chip design, since you always have a design target and changing that design target is apparently a 6-12 month job in most cases. Then there's allocation, as if you get none, you're not making any chips. After that, cost, I would say. Qualcomm is actually quite far behind on the nodes, as they went with Samsung something or the other at 8 nm or below, whereas Apple is at TSMC's 4 nm and supposedly on whatever node TSMC is working on next, since Apple is pretty much paying for TSMC's push towards smaller and smaller nodes. Samsung is going to be behind MediaTek if they really are going to be on the 3 nm node this year.

It's also not really about a "flashy node"; most of these companies can't advance their products without a node shrink. That's actually what was quite impressive with Nvidia: they managed to squeeze out a lot of extra performance while being stuck on the same node for three generations (Fermi, Kepler and Maxwell), something that is quite rare. It goes to show that some companies are capable of making do with what's available. Obviously Intel was stuck for a very long time, but then again, we didn't see nearly as good performance advances from them as Nvidia managed.

As mentioned above, AMD has to wait because they need to be on what used to be called a high power version of the node, as the low power versions used for MCUs and ARM/RISC-V/MIPS based SoCs, are not suitable for desktop CPUs and GPUs. This has been the case for as long as I've been writing about this stuff, which is as I mentioned, over 20 years by now.
Minus Infinity wrote: I reckon we'll see Zen 4c without 3D cache and higher end Zen 4 with it. Might not see it in 7600X, but Zen 4 has 2x the L2 cache as Zen 3.
Will we even get quad core Zen 4 based CPUs?