Thursday, July 20th 2023

AMD "Strix Point" Zen 5 Monolithic Silicon has a 12-core CPU?

It looks like the monolithic silicon that succeeds "Phoenix," codenamed "Strix Point," will finally introduce an increase in CPU core counts for the thin-and-light and ultraportable mobile platforms. "Strix Point" is the codename for the next-generation APU die being developed at AMD, which, according to a leaked MilkyWay@Home benchmark result, comes with a 12-core/24-thread CPU.

The silicon is identified by MilkyWay@Home with the OPN "AMD Eng Sample: 100-000000994-03_N," and CPU identification string "AuthenticAMD Family 26 Model 32 Stepping 0 -> B20F00." The "Strix Point" CPU could be the second time AMD has increased CPU core-counts per CCX. From "Zen 3" onward, the company increased the cores per CCX from 4 to 8, allowing a single "Zen 3" CCX on the "Cezanne" monolithic silicon to come with 8 cores. It's highly likely that with "Zen 5," the company is increasing the cores/CCX to 12, and that "Strix Point" has one of these CCXs.
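The identification string quoted above can be cross-checked against the standard x86 CPUID signature layout. The following is a minimal Python sketch, assuming that "B20F00" is the hexadecimal EAX value returned by CPUID leaf 1; the function name is hypothetical and only serves the illustration.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf-1 EAX signature into family, model and stepping.

    Assumption: the value follows the usual x86 layout, with the AMD rule
    that the extended fields only apply when the base family is 0xF.
    """
    stepping = eax & 0xF                # bits 3:0
    base_model = (eax >> 4) & 0xF       # bits 7:4
    base_family = (eax >> 8) & 0xF      # bits 11:8
    ext_model = (eax >> 16) & 0xF       # bits 19:16
    ext_family = (eax >> 20) & 0xFF     # bits 27:20

    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


# Signature taken from the leaked identification string (assumed to be hex EAX).
signature = decode_cpuid_signature(0xB20F00)
print(signature)
```

Under that assumption the hex value decodes to Family 26, Model 32, Stepping 0, which agrees with the decimal fields in the same string.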
"Strix Point" processors will be branded under the Ryzen 8000 series. Besides the 12-core Zen 5 CPU, it is expected to feature an updated iGPU based on the RDNA3 Gen 2 graphics architecture, and an upgraded memory interface, with support for higher DDR5 and LPDDR5 memory speeds. It's likely that the AMD Radiance Display Engine finds its way to the silicon, as well as an updated XDNA Ryzen AI accelerator. AMD is expected to debut Zen 5 in 2024, with "Strix Point" squaring off against Intel's Core "Meteor Lake" processors.
Sources: MilkyWay@home database, BenchLeaks (Twitter), VideoCardz

45 Comments on AMD "Strix Point" Zen 5 Monolithic Silicon has a 12-core CPU?

#26
wNotyarD
Count von Schwalbe said: Reduced prices as the die area is much much lower. 4-core CPUs don't make any sense as TSMC N5 is pretty mature, so there is not much reason to disable two more working cores and reduce profit margins.

Now a 6-core Zen 4c CCD would make sense, and disabled units for Athlons.
I mean, as Intel is (FINALLY) raising the core count on their i3's, it would make sense for AMD to follow suit, be it with 4C Athlons and 6C R3's (also, while we're at it, why not make R5's go 8C and R7's go 12C, and reserve R9 for 16C).
TumbleGeorge said: APU's are... monolithic design...
We are considering hypothetical Zen 4c designs, so that's where the CCX/CCD lingo came from.
Posted on Reply
#27
AnotherReader
Count von Schwalbe said: Reduced prices as the die area is much much lower. 4-core CPUs don't make any sense as TSMC N5 is pretty mature, so there is not much reason to disable two more working cores and reduce profit margins.

Now a 6-core Zen 4c CCD would make sense, and disabled units for Athlons.
That would be a good replacement for the Zen 2 based SKUs. Despite the anticipated clock speed handicap, they should be faster than those.
Posted on Reply
#28
Count von Schwalbe
TumbleGeorge said: APU's are... monolithic design...
Yep. EPYC is not.
AnotherReader said: That would be a good replacement for the Zen 2 based SKUs. Despite the anticipated clock speed handicap, they should be faster than those.
Much much faster.
Posted on Reply
#29
qcmadness
wNotyarD said: That's the point. Zen 4c has half the L3 cache of Zen 4, that's what makes it smaller. Clock speeds are then to be defined by binning, core count and power target.
The APUs have half the L3 cache of desktop / server variants.
Posted on Reply
#30
HD64G
I guess that a monolithic APU with 12 Zen5c cores will have slightly lower clocks than the Zen5 cores in the normal dual-chiplet parts. But since it will have a relatively powerful iGPU in it, the graphics performance won't be hindered by the cache being halved. Price will be the point to discuss, because technologically that APU will be a marvelous product.
Posted on Reply
#31
Darmok N Jalad
wNotyarD said: Actually, for an Athlon it makes a lot of sense. If memory doesn't fail me K10 Phenoms and Athlons were separated similarly.
My first Socket A CPU was a Duron, which was the Athlon with less L2.

All this got me thinking though. AMD is the only vendor NOT doing asymmetric cores. Mobile devices have it, Intel has it, Apple has it. I think AMD is going to have to do something here, or they might just lose the core race they started. Maybe them saying they have no intention of an E core is really that they will do something like the C core. Curious which one has more cache anyway, an AMD C core or an Intel E core?
Posted on Reply
#32
AnotherReader
Darmok N Jalad said: My first Socket A CPU was a Duron, which was the Athlon with less L2.

All this got me thinking though. AMD is the only vendor NOT doing asymmetric cores. Mobile devices have it, Intel has it, Apple has it. I think AMD is going to have to do something here, or they might just lose the core race they started. Maybe them saying they have no intention of an E core is really that they will do something like the C core. Curious which one has more cache anyway, an AMD C core or an Intel E core?
An AMD C core and the Raptor Lake E core (Gracemont) have the same amount of L2 cache: 1 MB per core. In Raptor Lake's case, this L2 is shared across 4 cores for a total of 4 MB per cluster. Of course, the C core has L3 as well.
Posted on Reply
#33
Minus Infinity
Chrispy_ said: IIRC these aren't likely to be 12 full "P-cores" to use Intel's existing nomenclature.
No one can give reliable information on the config. AMD is testing both 8P + 4E and 12P versions. AMD probably don't know what they'll ship yet this far out from release.
Posted on Reply
#34
persondb
If Phoenix is any indication, then AMD really needs to focus on driver support for their APUs as Phoenix has been a shitshow. Been months since products with it have been rolling off and there is no official support, only beta drivers from OEMs that basically barely work and no indication that AMD will ever actually integrate full support of it within the main drivers.

So a mega APU sounds great, but what is the point if AMD can't deliver the support for it?
Posted on Reply
#35
Minus Infinity
persondb said: If Phoenix is any indication, then AMD really needs to focus on driver support for their APUs as Phoenix has been a shitshow. Been months since products with it have been rolling off and there is no official support, only beta drivers from OEMs that basically barely work and no indication that AMD will ever actually integrate full support of it within the main drivers.

So a mega APU sounds great, but what is the point if AMD can't deliver the support for it?
Never buy AMD products on first release is the simple rule. Wait 6 months at least and then see what the driver feedback is.
Posted on Reply
#36
TumbleGeorge
persondb said: If Phoenix is any indication, then AMD really needs to focus on driver support for their APUs as Phoenix has been a shitshow. Been months since products with it have been rolling off and there is no official support, only beta drivers from OEMs that basically barely work and no indication that AMD will ever actually integrate full support of it within the main drivers.

So a mega APU sounds great, but what is the point if AMD can't deliver the support for it?
I searched the forum titles for "Phoenix problem" and found nothing.
Posted on Reply
#37
persondb
TumbleGeorge said: I searched the forum titles for "Phoenix problem" and found nothing.
You just didn't look hard enough. Examples:

community.amd.com/t5/drivers-software/amd-ryzen-7-7840u-adrenaline-drivers/m-p/614112

ROGAlly/comments/154ct2p
Amd/comments/14gngoj
People have a lot of instability with those systems and, as an example, report crashing and being completely unable to use the AI tools in Photoshop and/or Lightroom, which is a big oof in my view, as part of the point of Phoenix was the hardware engine for AI stuff.
Minus Infinity said: Never buy AMD products on first release is the simple rule. Wait 6 months at least and then see what the driver feedback is.
It has already been months since the first Phoenix products hit the market. I think they started to come in by May? It was supposed to be earlier, I think, but AMD had delays.
Posted on Reply
#38
TumbleGeorge
persondb said: You just didn't look hard enough. Examples:

community.amd.com/t5/drivers-software/amd-ryzen-7-7840u-adrenaline-drivers/m-p/614112

ROGAlly/comments/154ct2p
Amd/comments/14gngoj
People have a lot of instability with those systems and, as an example, report crashing and being completely unable to use the AI tools in Photoshop and/or Lightroom, which is a big oof in my view, as part of the point of Phoenix was the hardware engine for AI stuff.

It has already been months since the first Phoenix products hit the market. I think they started to come in by May? It was supposed to be earlier, I think, but AMD had delays.
Thanks.
I meant here in the TPU forum: I searched the thread titles for the key phrase from my previous comment.
Posted on Reply
#39
qcmadness
AnotherReader said: An AMD C core and the Raptor Lake E core (Gracemont) have the same amount of L2 cache: 1 MB per core. In Raptor Lake's case, this L2 is shared across 4 cores for a total of 4 MB per cluster. Of course, the C core has L3 as well.
Zen 4 and Zen 4c literally have the same IPC, while the P and E cores in 12th / 13th Gen have a 50%+ IPC difference (the P-core is 50% faster than the E-core at the same frequency).
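To put that in numbers, single-thread throughput can be modeled very roughly as IPC times clock speed. The sketch below is only an illustration: the 1.5x IPC ratio is the figure from this comment, while the 5.0 GHz and 3.5 GHz clocks are hypothetical placeholders, not announced specifications.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float) -> float:
    """Rough single-thread model: performance scales with IPC x clock."""
    return ipc_ratio * clock_ratio


# Intel P-core vs E-core at the same frequency, using the 50% IPC gap above.
p_vs_e = relative_performance(ipc_ratio=1.5, clock_ratio=1.0)

# Zen 4 vs Zen 4c: same IPC, so only the clock matters.
# The 5.0 GHz and 3.5 GHz values are hypothetical examples.
zen4_vs_zen4c = relative_performance(ipc_ratio=1.0, clock_ratio=5.0 / 3.5)

print(f"P-core vs E-core at equal clocks: {p_vs_e:.2f}x")
print(f"Zen 4 vs Zen 4c (assumed clocks): {zen4_vs_zen4c:.2f}x")
```

In this simple model, any gap between Zen 4 and Zen 4c comes from clocks alone, whereas the Intel gap exists even at equal clocks.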
Posted on Reply
#40
Chrispy_
Minus Infinity said: No one can give reliable information on the config. AMD is testing both 8P + 4E and 12P versions. AMD probably don't know what they'll ship yet this far out from release.
True, this is all speculation, but monolithic products are mobile parts and I'm not sure 12P fits the 15-45W envelope in a viable way. At least not with today's heat density and manufacturing capabilities.

Historically, monolithic mobile-first parts have been clocked up to 65W TDPs as desktop APUs later, but their primary focus is on performance/Watt because cooling and battery life are always more important than outright performance in a laptop part. That's why I'm agreeing with the rumour that these are 4x Zen5 and 8x Zen5c. Maybe I'm wrong, but at least you have my reasoning now.
Posted on Reply
#41
AnotherReader
qcmadness said: Zen 4 and Zen 4c literally have the same IPC, while the P and E cores in 12th / 13th Gen have a 50%+ IPC difference (the P-core is 50% faster than the E-core at the same frequency).
That's true, but the question was about the amount of cache per core for Zen 4c and Gracemont in Raptor Lake.
Posted on Reply
#42
Count von Schwalbe
AnotherReader said: That's true, but the question was about the amount of cache per core for Zen 4c and Gracemont in Raptor Lake.
Zen 4c and Zen 4 mobile have the same cache. Gracemont is weird as it has shared L2 and L3. Raptor Lake Gracemont has 4MB per cluster of 4 cores, so 1MB each. As these are strictly for MT tasks, I suppose it is better as shared?

Zen 4c is just a space-optimized Zen 4 core, rather than a clockspeed-optimized one. L3 is halved like the mobile Zen 4 parts.
Posted on Reply
#43
AnotherReader
Count von Schwalbe said: Zen 4c and Zen 4 mobile have the same cache. Gracemont is weird as it has shared L2 and L3. Raptor Lake Gracemont has 4MB per cluster of 4 cores, so 1MB each. As these are strictly for MT tasks, I suppose it is better as shared?

Zen 4c is just a space-optimized Zen 4 core, rather than a clockspeed-optimized one. L3 is halved like the mobile Zen 4 parts.
They are lower performance cores so sharing the L2 is good from two points of view:
  • higher capacity means more hits in L2
  • fewer misses to the much slower L3
Drawbacks compared to a private L2 are:
  • higher latency
  • more wiring (probably a 4 port crossbar) requires more area
  • lower bandwidth when more than one requester is active, i.e. when more than one core is accessing L2
Chips and Cheese did a deep dive on Gracemont in the 12900k. This was when it had 2 MB L2 per cluster. Let's compare to an AMD SOC that had 4 core clusters too. The latencies are:

Cache level | Gracemont size (KiB) | Zen 2 size (KiB) | Gracemont latency (cycles) | Zen 2 latency (cycles)
L1D | 32 | 32 | 3 | 4
L2 | 2048 | 512 | 17 | 12
L3 | 30 | 16384 | 74 | 44

Let's look at bandwidth from the cache in a multithreaded test.

Cache level | Gracemont size (KiB) | Zen 2 size (KiB) | Gracemont bandwidth (GB/s) | Zen 2 bandwidth (GB/s)
L1D | 32 | 32 | 499 | 1057
L2 | 2048 | 512 | 208 | 535
L3 | 30 | 16384 | 61 | 321

Now this makes it clear why Intel chose a large, shared L2. Intel's L3 has poor bandwidth for a Gracemont cluster, but it doesn't matter because the L2 makes up for it.
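The latency side of that trade-off can be sketched with a simple expected-latency calculation. The cycle counts below are the ones quoted above; the hit rates and the 300-cycle memory latency are made-up illustrative assumptions, not measurements, and the function name is hypothetical.

```python
# Load-to-use latencies in cycles, as quoted in the latency comparison above.
GRACEMONT = {"L1D": 3, "L2": 17, "L3": 74}
ZEN2 = {"L1D": 4, "L2": 12, "L3": 44}


def average_load_latency(latency, hit_rate, memory_cycles):
    """Expected load latency in cycles for a three-level cache hierarchy.

    hit_rate[level] is the fraction of accesses reaching that level
    which are served there; the rest fall through to the next level.
    """
    remaining = 1.0
    total = 0.0
    for level in ("L1D", "L2", "L3"):
        served = remaining * hit_rate[level]
        total += served * latency[level]
        remaining -= served
    return total + remaining * memory_cycles


# Hypothetical hit rates: the larger shared L2 is assumed to catch more misses.
small_l2 = {"L1D": 0.9, "L2": 0.5, "L3": 0.5}
large_l2 = {"L1D": 0.9, "L2": 0.8, "L3": 0.5}
MEMORY_CYCLES = 300  # assumed DRAM latency, for illustration only

print("Gracemont, lower L2 hit rate: ", average_load_latency(GRACEMONT, small_l2, MEMORY_CYCLES))
print("Gracemont, higher L2 hit rate:", average_load_latency(GRACEMONT, large_l2, MEMORY_CYCLES))
print("Zen 2, lower L2 hit rate:     ", average_load_latency(ZEN2, small_l2, MEMORY_CYCLES))
```

With these assumed numbers, a higher L2 hit rate more than pays for the slower L2, because fewer accesses reach the much slower L3 and memory.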

As an aside, Gracemont has a large L1 I-cache, which is something that the bigger cores should copy. This figure from Hirki et al.'s paper on Haswell's power consumption also shows the benefit of a larger L1.

Posted on Reply
#44
Count von Schwalbe
AnotherReader said: As an aside, Gracemont has a large L1 I-cache, which is something that the bigger cores should copy. This figure from Hirki et al.'s paper on Haswell's power consumption also shows the benefit of a larger L1.
I reckon the production challenges of that are the main reason this is less common. Looking at the strict voltage and thermal requirements of the X3D chips. Burying more cache inside the core itself would be an engineering challenge.

It seems that a Gracemont cluster is more like a 4-thread core than 4 individual cores. I am sure that it technically isn't, but I get FX-8350 vibes...
Posted on Reply
#45
AnotherReader
Count von Schwalbe said: I reckon the production challenges of that are the main reason this is less common. Looking at the strict voltage and thermal requirements of the X3D chips. Burying more cache inside the core itself would be an engineering challenge.

It seems that a Gracemont cluster is more like a 4-thread core than 4 individual cores. I am sure that it technically isn't, but I get FX-8350 vibes...
Sharing the L2 cache doesn't make it like Bulldozer. Bulldozer shared much more, including the front-end and the floating point execution units.

Apple has enormous L1 caches, and Power 6, the first 5 GHz CPU, had 64 KiB L1 caches. The Power 6, unlike Apple, didn't have the advantage of current processes, but it did have the advantage of a high power budget. The first Zen had a 64 KiB L1 I-cache as well.
Posted on Reply