Friday, June 19th 2020

AMD "Renoir" Die Annotation Raises Hopes of Desktop Chips Featuring x16 PEG

VLSI engineer Fritzchens Fritz, famous for high-detail EM photography of silicon dies and annotations of them, recently published his work on AMD's 7 nm "Renoir" APU silicon. His die-shots were annotated by Nemez, aka GPUsAreMagic. The floor-plan of the silicon shows that the CPU component finally dwarfs the iGPU component, thanks to double the CPU cores over the previous-gen "Picasso" silicon, spread over two CCXs (compute complexes). The CCX on "Renoir" is visibly smaller than the one on the "Zen 2" CCDs found in "Matisse" and "Rome" MCMs, as its L3 cache is smaller, at 4 MB compared to 16 MB. Since "Matisse" and "Rome" are MCMs with disaggregated memory controllers, it makes more sense for their CCDs to have more last-level cache per CCX.

We also see that the iGPU features no more than 8 "Vega" NGCUs, so there's no scope for "Renoir" based desktop APUs to feature more than 512 stream processors (8 NGCUs with 64 stream processors each). AMD attempted to compensate for the NGCU deficit by dialing up engine clocks of the iGPU by over 40% compared to those on "Picasso." What caught our eye in the annotation is the PCI-Express physical layer. Apparently, the die has 20 PCI-Express lanes, plus an additional 4 lanes that can be configured as two SATA 6 Gbps ports thanks to SerDes flexibility.
This would mean that "Renoir" can finally spare 16 lanes toward PEG (PCI-Express graphics, or the main x16 slot on your motherboard), alongside 4 lanes toward the chipset bus and the final four lanes allocated to the M.2 NVMe slot that's wired to the AM4 socket, on a typical desktop platform. On mobile platforms, "Renoir" processors spare no more than 8 lanes toward PEG (discrete graphics), even when paired with discrete GPUs such as the GeForce RTX 2060 (mobile), which is capable of gen 3.0 x16. Previous-generation desktop APUs such as "Picasso" and "Raven Ridge" also spare no more than 8 PCIe gen 3.0 lanes toward PEG, even on the desktop platform. x16 PEG capability would bolster the credentials of desktop "Renoir" processors for premium gaming PC builds, using some of the top SKUs such as the Ryzen 7 4700G.
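The lane arithmetic above can be sketched in a few lines of Python. This is only an illustration of the allocation described in this article, not anything taken from AMD documentation: the names `LaneBudget`, `fits` and `peg_bandwidth_gbs` are hypothetical, and the per-lane throughput assumes PCIe gen 3.0 signalling (8 GT/s per lane with 128b/130b encoding).

```python
from dataclasses import dataclass

# Lane counts as read from the die annotation described in the article:
# 20 PCIe lanes plus 4 flexible lanes (PCIe, or two SATA 6 Gbps ports).
DIE_LANES = 20 + 4

# Assumption: PCIe gen 3.0, 8 GT/s per lane with 128b/130b encoding.
# This gives roughly 0.985 GB/s per lane, per direction.
GEN3_GBS_PER_LANE = 8.0 * (128 / 130) / 8


@dataclass(frozen=True)
class LaneBudget:
    """Hypothetical split of CPU-attached lanes on a desktop platform."""
    peg: int      # main x16 slot (PCI-Express graphics)
    chipset: int  # chipset bus
    nvme: int     # M.2 NVMe slot wired to the socket

    @property
    def total(self) -> int:
        return self.peg + self.chipset + self.nvme


def fits(budget: LaneBudget, available: int = DIE_LANES) -> bool:
    """Return True if the requested split fits within the die's lane count."""
    return budget.total <= available


def peg_bandwidth_gbs(lanes: int) -> float:
    """Approximate one-direction PEG bandwidth in GB/s at gen 3.0."""
    return lanes * GEN3_GBS_PER_LANE


# The x16 split the article hopes for, and the x8 PEG width of earlier APUs.
desktop_x16 = LaneBudget(peg=16, chipset=4, nvme=4)
desktop_x8 = LaneBudget(peg=8, chipset=4, nvme=4)

if __name__ == "__main__":
    for name, budget in (("x16 PEG", desktop_x16), ("x8 PEG", desktop_x8)):
        print(
            f"{name}: {budget.total} lanes used, fits={fits(budget)}, "
            f"PEG ~{peg_bandwidth_gbs(budget.peg):.2f} GB/s per direction"
        )
```

Under these assumptions, the 16 + 4 + 4 split uses exactly the 24 lanes seen in the annotation, and an x16 PEG link offers twice the per-direction bandwidth of the x8 link on earlier APUs.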
Sources: Fritzchens Fritz (Flickr), Nemez (Twitter)

25 Comments on AMD "Renoir" Die Annotation Raises Hopes of Desktop Chips Featuring x16 PEG

#1
Chrispy_
I still don't quite see the rationale behind dropping 3 CUs from the graphics core. Especially not since the single most distinctive point of an APU is its graphics.

Sure, they've ramped clocks up but most of the laptops reviewed so far aren't sustaining those clocks, so really it's still the same Vega CUs as it was in Raven ridge and the GPU clocks are still at the mercy of the power budget and RAM bandwidth, but now we have 3 fewer of them and Tiger lake early silicon is already outperforming it on beta drivers, apparently.

As for having 20PCIe lanes - What's the point? Anyone wanting to run a dGPU will just buy a 3700X instead and enjoy the superior performance of more cache and even more PCIe lanes.

If it were me designing the APU I would have thrown out the die area wasted on 20 PCIe lanes and used it to increase the CU count to 12 or 15 instead.
Posted on Reply
#2
mainlate
ASRock, for example the B550M Steel Legend:

Gigabyte, for example the B550I AORUS PRO AX:
Posted on Reply
#3
theblackitglows
@Chrispy_

Have you even taken a glimpse at the die? 8 PCIe lanes barely take the space of one CU.

Also how do you know Renoir will be worse at gaming? Let's wait for benchmarks.
Posted on Reply
#4
cucker tarlson
x16 PEG capability would bolster the credentials of desktop "Renoir" processors for premium gaming PC builds
premium gaming pc build gotta have a 10tflop gpu at this point
not an integrated part
Posted on Reply
#5
Xajel
Chrispy_
I still don't quite see the rationale behind dropping 3 CUs from the graphics core. Especially not since the single most distinctive point of an APU is its graphics.

Sure, they've ramped clocks up but most of the laptops reviewed so far aren't sustaining those clocks, so really it's still the same Vega CUs as it was in Raven ridge and the GPU clocks are still at the mercy of the power budget and RAM bandwidth, but now we have 3 fewer of them and Tiger lake early silicon is already outperforming it on beta drivers, apparently.

As for having 20PCIe lanes - What's the point? Anyone wanting to run a dGPU will just buy a 3700X instead and enjoy the superior performance of more cache and even more PCIe lanes.

If it were me designing the APU I would have thrown out the die area wasted on 20 PCIe lanes and used it to increase the CU count to 12 or 15 instead.
That gave them more die area for the CPU cores and allowed them to fit two 4C CCXs. It also freed up area for the IMC to support both DDR4 and LPDDR4, plus more PCIe lanes and other features.
Graphics CUs consume a lot of area. Look at the APU without the GPU (which includes the CUs plus other GPU logic like the ROPs, the Media & Display Engines and the Display PHYs). They can't reduce the size of a CU, so the easier route is to use fewer CUs at higher clocks rather than more CUs at lower clocks.
Posted on Reply
#6
ppn
With an on-die memory controller, CL16 memory would perform like CL14, saving 2 cycles, but Zen 2 would still lose badly to Zen 3.
Posted on Reply
#7
AnarchoPrimitiv
cucker tarlson
premium gaming pc build gotta have a 10tflop gpu at this point
not an integrated part
Yeah, the article is saying that thanks to the x16 PCIe lanes on the APU, it can be teamed with discrete graphics... I thought that was obvious.
Posted on Reply
#8
ARF
Chrispy_
I still don't quite see the rationale behind dropping 3 CUs from the graphics core. Especially not since the single most distinctive point of an APU is its graphics.

Sure, they've ramped clocks up but most of the laptops reviewed so far aren't sustaining those clocks, so really it's still the same Vega CUs as it was in Raven ridge and the GPU clocks are still at the mercy of the power budget and RAM bandwidth, but now we have 3 fewer of them and Tiger lake early silicon is already outperforming it on beta drivers, apparently.

As for having 20PCIe lanes - What's the point? Anyone wanting to run a dGPU will just buy a 3700X instead and enjoy the superior performance of more cache and even more PCIe lanes.

If it were me designing the APU I would have thrown out the die area wasted on 20 PCIe lanes and used it to increase the CU count to 12 or 15 instead.
Ryzen 7 3700X is not faster than the Ryzen 7 4700G.

Look:

www.tomshardware.com/news/amd-ryzen-4000-renoir-desktop-benchmarks

Amd/comments/c7ejdf
About 20% faster than the 2700X. Very nice.

About 5~6% behind ~5GHz 9900k with a ~13.6% frequency deficit.
The 3700x is about 12-19% faster than a 2700x.
Posted on Reply
#10
Fouquin
cucker tarlson
are going just by 3dmark score ? synthetics are not a good measurement of a cpu's gaming performance.
So go read one of the Zephyrus G14 w/ Ryzen 9 4900HS reviews that's already out to gauge CPU gaming performance.
Posted on Reply
#11
cucker tarlson
Fouquin
So go read one of the Zephyrus G14 w/ Ryzen 9 4900HS reviews that's already out to gauge CPU gaming performance.
isn't 4900 different from 4700 ?
and how exactly do you suggest I compare it to 3700x ?
Posted on Reply
#12
Chrispy_
theblackitglows
@Chrispy_
Have you even taken a glimpse at the die? 8 PCIe lanes barely take the space of one CU.
I'd argue that it's pretty close to two CUs. Eyeballing the annotated die area I think a PCIe x4 rectangle is about 10-20% more area than a single CU, so removing 8 lanes would make room for a Vega 10 or Vega 11. That wasn't really my point though because if you can believe it, I'm not actually an AMD APU layout designer!
theblackitglows
Also how do you know Renoir will be worse at gaming? Let's wait for benchmarks.
I think you've been caught sleeping; Renoir APU graphics benchmarks have been out for a while. I think the Anand review landed about a month ago and people on these very forums have been buying Renoir laptops since before that.

Whilst it's true that Renoir's launch has been hampered by COVID, the earliest benchmarks actually go back as far as February, with more concrete stuff appearing in April, and May seeing general availability of several models of Renoir laptop. I'm slightly annoyed that the 4800U with LPDDR4X hasn't been reviewed yet, but I don't really think it matters at this point. Vega8 isn't good enough to make a meaningful upgrade over Raven Ridge and anyone wanting actual GPU performance is going to have to wait until the next generation or settle for a dGPU. The benchmarks of 4700U + DDR4-3200 are within 20% of my own 2700U using CL14 DDR4-2400 and a 22W power limit (stock was 20W).

As for pre-release Tiger Lake vs current APUs? 30fps on beta/pre-release drivers vs 25fps for Renoir.
[MEDIA=twitter]1273352056208850944[/MEDIA]
Like it or not, AMD APUs have always been slightly short of achieving AAA gaming at reasonable native-res 30fps. Renoir is such a sidegrade that AMD still haven't quite achieved that basic goal and now it's looking like Intel will beat them to it after AMD have been tantalising gamers with "almost enough" for three years. AMD had all the ingredients to bake a great APU and failed this generation. It's a CPU first and the GPU is clearly an afterthought with cut-down specs and less die area than before. I'm going to enjoy watching Intel kick their ass because as much as I want AMD to succeed, they've really fudged their APUs this generation and they need a kick up the ass.
Posted on Reply
#13
Fouquin
cucker tarlson
isn't 4900 different from 4700 ?
and how exactly do you suggest I compare it to 3700x ?
No. It's full Renoir on both. You're right in one way though, the current rumors surrounding 4700G suggest it'll boost higher.

Put as much energy into finding a good comparison point as you do arguing on tech forums and I'm sure you'll find a way.
Posted on Reply
#14
R0H1T
cucker tarlson
premium gaming pc build gotta have a 10tflop gpu at this point
not an integrated part
So you're saying Intel's entire MSDT lineup is junk :rolleyes:
Posted on Reply
#15
THANATOS
Xajel
That gave them more die area for the CPU cores and allowed them to fit two 4C CCXs. It also freed up area for the IMC to support both DDR4 and LPDDR4, plus more PCIe lanes and other features.
Graphics CUs consume a lot of area. Look at the APU without the GPU (which includes the CUs plus other GPU logic like the ROPs, the Media & Display Engines and the Display PHYs). They can't reduce the size of a CU, so the easier route is to use fewer CUs at higher clocks rather than more CUs at lower clocks.
I am looking at the included die shot and 3 CUs are not much bigger than 1 Zen core, so it certainly didn't allow them to add one more 4C CCX, IMC, PCIe lanes and other stuff.
Chrispy_
I still don't quite see the rationale behind dropping 3 CUs from the graphics core. Especially not since the single most distinctive point of an APU is its graphics.

Sure, they've ramped clocks up but most of the laptops reviewed so far aren't sustaining those clocks, so really it's still the same Vega CUs as it was in Raven ridge and the GPU clocks are still at the mercy of the power budget and RAM bandwidth, but now we have 3 fewer of them and Tiger lake early silicon is already outperforming it on beta drivers, apparently.
....
The clocks it can sustain are still higher than the previous generation's.
What would happen if they kept those 3 CUs or added another one for a total of 12 CUs? It would clock even lower than an 8CU IGP within a limited TDP, because 11-12 CUs consume more power than 8 CUs. The limit is still power budget and bandwidth, so why bother adding or keeping more CUs if it won't significantly increase performance? If they resolve the bandwidth limit, then they can increase the CU count and set the TDP higher, and for 15W TDP there will be a cut-down version.
Posted on Reply
#16
john_
Chrispy_
I still don't quite see the rationale behind dropping 3 CUs from the graphics core. Especially not since the single most distinctive point of an APU is its graphics.
Renoir with 8 CUs offers the best iGPU for now, and AMD also sells discrete graphics cards. So they offered as much performance as necessary to
- be faster than the competition
- not be slower than last gen APUs
- not threaten their discrete GPU sales


As an AMD fan for more than 20 years, I hope Intel teaches them a lesson.
Chrispy_
As for having 20PCIe lanes - What's the point? Anyone wanting to run a dGPU will just buy a 3700X instead and enjoy the superior performance of more cache and even more PCIe lanes.
You would totally cripple the expandability options of the motherboard with only 8 lanes.
Posted on Reply
#17
king of swag187
Chrispy_
I still don't quite see the rationale behind dropping 3 CUs from the graphics core. Especially not since the single most distinctive point of an APU is its graphics.

Sure, they've ramped clocks up but most of the laptops reviewed so far aren't sustaining those clocks, so really it's still the same Vega CUs as it was in Raven ridge and the GPU clocks are still at the mercy of the power budget and RAM bandwidth, but now we have 3 fewer of them and Tiger lake early silicon is already outperforming it on beta drivers, apparently.

As for having 20PCIe lanes - What's the point? Anyone wanting to run a dGPU will just buy a 3700X instead and enjoy the superior performance of more cache and even more PCIe lanes.

If it were me designing the APU I would have thrown out the die area wasted on 20 PCIe lanes and used it to increase the CU count to 12 or 15 instead.
The point is, assuming you can sustain the clock speed (which a lot of laptops can, when the TDP is unlocked), you can get significantly more performance.

As for 20 PCIe lanes, it's something for people who are interested in it; more is always better, I guess. There really isn't a reason to dunk on it IMO, it's better than the x8 we were fed previously.

As for the 3700X vs the (potentially named) 4700G, the biggest deficit of the 3700X as well as other Zen 2 CPUs is the core-to-core latency, which wouldn't be present here. See the 3100 to 3300X differences: assuming that moving to a single CCX benefits the (potential) 4700G in the same way, it should perform the same, if not better.
Posted on Reply
#18
Tom Yum
Chrispy_
I still don't quite see the rationale behind dropping 3 CUs from the graphics core. Especially not since the single most distinctive point of an APU is its graphics.

Sure, they've ramped clocks up but most of the laptops reviewed so far aren't sustaining those clocks, so really it's still the same Vega CUs as it was in Raven ridge and the GPU clocks are still at the mercy of the power budget and RAM bandwidth, but now we have 3 fewer of them and Tiger lake early silicon is already outperforming it on beta drivers, apparently.

As for having 20PCIe lanes - What's the point? Anyone wanting to run a dGPU will just buy a 3700X instead and enjoy the superior performance of more cache and even more PCIe lanes.

If it were me designing the APU I would have thrown out the die area wasted on 20 PCIe lanes and used it to increase the CU count to 12 or 15 instead.
It could be that AMD consider (somewhat validly) that the graphics performance provided is enough for the target market (after all, Intel has sold garbage iGPUs for years without it being an impediment to market success). It could also be that current memory bandwidth available from DDR4 is not enough to support higher iGPU performance, whether from a wider but slower iGPU like in Picasso, or a narrower but faster iGPU from Renoir. Remember you have an 8-core high-performance CPU and an MX250-level GPU sharing less bandwidth than a GT 1030. The alternative is AMD starting to embed dedicated memory, but then that runs into the first issue: the market won't pay extra for performance it doesn't need. Gamers will continue to go with dGPUs, and businesses couldn't give a stuff. The hobbyist ITX crowd that might care about higher-performance iGPUs are a minuscule market for a company as resource-constrained as AMD to target.
Posted on Reply
#19
john_
Tom Yum
It could also be that current memory bandwidth available from DDR4 is not enough to support higher iGPU performance, whether from a wider but slower igpu like in Picasso, or a narrower but faster igpu from Renoir.
The difference in performance between for example, 3200G and 3400G, even with slower Zen+ cores, shows that 11 CUs can make a difference compared to 8 CUs even with that bandwidth that a dual channel DDR4 can provide.
The alternative is AMD starting to embed dedicated memory but then that runs into the first issue, the market won't pay extra for performance it doesn't need.
Sideport memory was a feature that someone could find in $50 AM3 motherboards. So, it's not really that expensive and the technology has definitely been available for over 15 years. They don't have to develop something new.
Posted on Reply
#20
R0H1T
I always wanted to get those sideport memory boards, alas :ohwell:
Posted on Reply
#21
Tom Yum
john_
The difference in performance between for example, 3200G and 3400G, even with slower Zen+ cores, shows that 11 CUs can make a difference compared to 8 CUs even with that bandwidth that a dual channel DDR4 can provide.

Sideport memory was a feature that someone could find in $50 AM3 motherboards. So, it's not really that expensive and the technology has definitely been available for over 15 years. They don't have to develop something new.
That's irrelevant to my argument, 8CU Renoir in mobile APU's performs better than desktop 11CU Picasso, because of the clockspeed difference.
It is likely that desktop Renoir will perform higher due to higher clocks from having a higher power budget than the 4800HS in the video (which also only has 7CU's, not 8).

My point is AMD may have determined that the APU performance provided by 8 CU Renoir is the peak that can be achieved with DDR4 which is why they cut it down to 8CU max, saving silicon for more CPU cores.

Regarding Sideport memory, I remember that as I had a mobo with it (256mb, 32 bit from memory). But again, you miss my point, it is not a technical challenge that prevents them, it is an economic one. AMD is providing more iGPU power than people are willing to pay for, so it makes no sense to increase mobo cost to make a faster iGPU when people who care get a dGPU and people that don't are happy with what is provided (which is double what Intel are currently providing, considering tiger lake isn't released yet).
Posted on Reply
#22
john_
Tom Yum
That's irrelevant to my argument, 8CU Renoir in mobile APU's performs better than desktop 11CU Picasso, because of the clockspeed difference.
It is likely that desktop Renoir will perform higher due to higher clocks from having a higher power budget than the 4800HS in the video (which also only has 7CU's, not 8).

My point is AMD may have determined that the APU performance provided by 8 CU Renoir is the peak that can be achieved with DDR4 which is why they cut it down to 8CU max, saving silicon for more CPU cores.

Regarding Sideport memory, I remember that as I had a mobo with it (256mb, 32 bit from memory). But again, you miss my point, it is not a technical challenge that prevents them, it is an economic one. AMD is providing more iGPU power than people are willing to pay for, so it makes no sense to increase mobo cost to make a faster iGPU when people who care get a dGPU and people that don't are happy with what is provided (which is double what Intel are currently providing, considering tiger lake isn't released yet).
It's not just clockspeed. Renoir comes with Zen 2 cores that perform much better in applications like games. Not to mention that we also talk about real cores, not multithreading and also Renoir comes with lower latency between the cores.

And no, the argument wasn't irrelevant. You just didn't understand it. Even with slower Zen+ cores the difference between a 3200G and a 3400G is there, even in games where not more than 4 cores/threads are needed. This means that 11 CUs in Renoir could offer more performance than 8 CUs. And because of that 65W TDP that you mention, the iGPU in a desktop Renoir could really shine. But then, who needs an RX 550? Probably no one.

Sideport memory could be used in mini ITX and micro ATX motherboards that were going to be used in mini PCs without discrete graphics.

Look, you make a lot of assumptions to support your point of view, even about what people want. No problem with that. But assumptions are not facts and no, what people need is not what you believe they need. And people sometimes know what they need, but they don't know how to get it. So you see examples where someone goes out and buys a GT 730 with 4GB DDR3 memory on a 64bit data bus, to play games because the iGPU is slow.

Features like sideport memory are there as an option. Offering sideport memory doesn't mean that every motherboard out there will rush to implement it making it immediately more expensive. You are wrong here, again.

P.S. If Intel manages to win the iGPU battle and next AMD APUs come with 12 CUs, sideport memory and Hybrid graphics, then let's see what you will say then.
Posted on Reply
#23
Tom Yum
john_
It's not just clockspeed. Renoir comes with Zen 2 cores that perform much better in applications like games. Not to mention that we also talk about real cores, not multithreading and also Renoir comes with lower latency between the cores.

And no, the argument wasn't irrelevant. You just didn't understand it. Even with slower Zen+ cores the difference between a 3200G and a 3400G is there, even in games where not more than 4 cores/threads are needed. This means that 11 CUs in Renoir could offer more performance than 8 CUs. And because of that 65W TDP that you mention, the iGPU in a desktop Renoir could really shine. But then, who needs an RX 550? Probably no one.

Sideport memory could be used in mini ITX and micro ATX motherboards that were going to be used in mini PCs without discrete graphics.

Look, you make a lot of assumptions to support your point of view, even about what people want. No problem with that. But assumptions are not facts and no, what people need is not what you believe they need. And people sometimes know what they need, but they don't know how to get it. So you see examples where someone goes out and buys a GT 730 with 4GB DDR3 memory on a 64bit data bus, to play games because the iGPU is slow.

Features like sideport memory are there as an option. Offering sideport memory doesn't mean that every motherboard out there will rush to implement it making it immediately more expensive. You are wrong here, again.

P.S. If Intel manages to win the iGPU battle and next AMD APUs come with 12 CUs, sideport memory and Hybrid graphics, then let's see what you will say then.
Are you seriously trying to claim that the Picasso iGPU is CPU limited? Because that is the only way that Renoir's equal performance compared to higher CU Picasso could be attributed to its Zen 2 cores. Why are you comparing 3200g and 3400g in this discussion? Your point has been that AMD shrinking down the CU count in Renoir reflects them making a conscious decision to keep the status quo, my point has been that 1) 8CU Renoir likely beats 11CU Picasso (given power constrained 7CU Renoir equals 11 CU Picasso), and 2) AMD's decision to not press for 11CU Renoir is likely because they are hitting up against bandwidth limits from DDR4. It is the same as strapping a 2080 with 64b GDDR3, it wouldn't perform any better than a 1650 because it would be so bandwidth limited. You can already see that even with Picasso by seeing how much performance goes up with higher memory speeds, it's clearly starved of bandwidth, and Renoir would be even more starved when it has a 8 core high performance CPU to also feed. Tiger Lake may perform better, I wait to see that actually benched rather than iffy leaks with little detail. If Intel does release a 12 CU equivalent iGPU that fits within a similar power limit and out performs Renoir, then I'll congratulate Intel, and still think AMD made a sound technical decision to set the CU count as they did with Renoir.
Posted on Reply
#24
Chrispy_
Tom Yum
It could be that AMD consider (somewhat validly) that the graphics performance provided is enough for the target market (after all, Intel has sold garbage igpu for years without it being an impediment to market success).
This is both the most depressing and the most likely possibility.

"Intel got filthy rich by serving up hot garbage for decades, let's see if it works for us!"
- AMD
Posted on Reply
#25
john_
Tom Yum
Are you seriously trying to claim that the Picasso iGPU is CPU limited? Because that is the only way that Renoir's equal performance compared to higher CU Picasso could be attributed to its Zen 2 cores.
Games are a certain type of application where IPC plays an important role. I think when you are talking about "CPU limited" you have, for example, Blender in mind, not games. IPC plays an important role, and don't forget that these types of GPUs are usually tested at low settings. They are not tested at 4K and ultra settings. So, yes, Zen 2 can give a boost in those kinds of benchmarks.

No one said that Zen 2 cores are the only reason for the extra performance from those 8 CUs. I don't have time to keep repeating what I wrote, because you try to force your own conclusions while twisting the meaning of my posts. I am going to ignore these kinds of conclusions and tricky questions from now on.
Why are you comparing 3200g and 3400g in this discussion?
I am not going to keep repeating myself.
Your point has been that AMD shrinking down the CU count in Renoir reflects them making a conscious decision to keep the status quo, my point has been that 1) 8CU Renoir likely beats 11CU Picasso (given power constrained 7CU Renoir equals 11 CU Picasso), and 2) AMD's decision to not press for 11CU Renoir is likely because they are hitting up against bandwidth limits from DDR4.
No one will point the finger at you if you permit someone else to have a different opinion than yours. I believe that it has been already proven that the bandwidth is enough to feed 11 CUs for low/mid settings 720p/1080p gaming, you believe it's not enough. OK.
It is the same as strapping a 2080 with 64b GDDR3, it wouldn't perform any better than a 1650 because it would be so bandwidth limited.
11CUs are NOT a 2080 Ti. I understand what you mean, but when you feel the need to use the fastest card in the world to make a point, I don't buy it.
You can already see that even with Picasso by seeing how much performance goes up with higher memory speeds, it's clearly starved of bandwidth, and Renoir would be even more starved when it has a 8 core high performance CPU to also feed.
Let's agree that we disagree here.
Tiger Lake may perform better, I wait to see that actually benched rather than iffy leaks with little detail. If Intel does release a 12 CU equivalent iGPU that fits within a similar power limit and out performs Renoir, then I'll congratulate Intel, and still think AMD made a sound technical decision to set the CU count as they did with Renoir.
"Sound technical decision" to lose the iGPU advantage. OK.
Posted on Reply