Saturday, January 7th 2023

AMD Ryzen 7040 Series "Phoenix Point" Mobile Processor I/O Detailed: Lacks PCIe Gen 5

The online datasheets of some of the first AMD Ryzen 7040 series "Phoenix Point" mobile processors went live, detailing the processor's I/O feature-set. We learn that AMD has decided to give PCI-Express Gen 5 a skip with this silicon, at least in its mobile avatar. The Ryzen 7040 SoC puts out a total of 20 PCI-Express Gen 4 lanes, all of which are "usable" (i.e. don't count 4 lanes toward chipset-bus). This would mean that the silicon has a full PCI-Express 4.0 x16 interface for discrete graphics, and a PCI-Express 4.0 x4 link for a CPU-attached M.2 NVMe slot; unlike the "Raphael" desktop MCM and the "Dragon Range" mobile MCM, whose client I/O dies put out a total of 28 Gen 5 lanes (24 usable, with x16 PEG + two x4 toward CPU-attached M.2 slots).
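
For a rough sense of what these lane counts mean in bandwidth terms, the sketch below computes theoretical one-direction throughput. The per-lane transfer rates (16 GT/s for Gen 4, 32 GT/s for Gen 5) and the 128b/130b encoding come from the PCI-Express specifications rather than from AMD's datasheets, and the function name is made up for illustration. Real-world throughput is lower because of protocol overhead.

```python
# Rough per-direction PCIe bandwidth, ignoring packet/protocol overhead.
# Transfer rates and 128b/130b encoding are from the PCIe specs, not the article.
GT_PER_S = {4: 16.0, 5: 32.0}  # giga-transfers per second, per lane
ENCODING = 128 / 130           # 128b/130b line encoding (Gen 3 and later)


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a link, in GB/s."""
    return GT_PER_S[gen] * ENCODING * lanes / 8  # 8 bits per byte


# "Phoenix Point": 20 usable Gen 4 lanes (x16 graphics + x4 M.2)
phoenix = pcie_gb_per_s(4, 16) + pcie_gb_per_s(4, 4)
# "Raphael" / "Dragon Range": 24 usable Gen 5 lanes (x16 + two x4)
raphael = pcie_gb_per_s(5, 16) + 2 * pcie_gb_per_s(5, 4)

print(f"Phoenix Point usable lanes: {phoenix:.1f} GB/s")
print(f"Raphael usable lanes:       {raphael:.1f} GB/s")
```

Under these assumptions the 20 Gen 4 lanes add up to roughly 39 GB/s per direction, against roughly 95 GB/s for the 24 usable Gen 5 lanes of the MCM parts.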

Another interesting aspect of "Phoenix Point" is its memory controllers. The SoC features a dual-channel (four sub-channel) DDR5 memory interface, besides support for LPDDR5 and LPDDR5x. DDR5-5600 and LPDDR5x-7500 are the native speeds supported. What's really interesting is the maximum amount of memory supported, which stands at 256 GB, double that of "Raphael" and "Dragon Range," which top out at 128 GB. This bodes well for the eventual Socket AM5 APUs AMD will design based on the "Phoenix Point" silicon. The older Ryzen 5000G "Cezanne" desktop APUs are known to overclock memory better than the Ryzen 5000X "Vermeer" processors, with the monolithic nature of the silicon favoring latencies. Something similar could be expected from "Phoenix Point."
The iGPU of the Ryzen 7040 series in its top avatar will have the branding "Radeon 780M," an upgrade from the "Radeon 680M" of the top iGPU option available with the "Rembrandt" silicon and its RDNA2-based iGPU. The new 780M is based on the latest RDNA3 graphics architecture, and packs 12 compute units (768 stream processors), with the same dual-instruction issue rate capabilities as the desktop Radeon RX 7900 series GPUs; and matrix-math accelerators (these are besides the dedicated XDNA AI accelerator present on the "Phoenix Point" silicon). The iGPU has engine clocks as high as 2.90 GHz.

The iGPU of "Phoenix Point" is confirmed to feature AMD's latest Radiance Display Engine, with support for DisplayPort 2.1 UHBR10 and HDMI 2.1, with native support for 8K 60 Hz displays with a single cable. It also features the latest VCN media engine, with hardware-accelerated AV1 encoding up to 4K @ 240 Hz 10 bpc, and 4320p @ 175 Hz 8 bpc H.265; and hardware-accelerated decoding of nearly all standard resolutions/bit-depth/framerates of MPEG2, VC1, VP9, H.264, H.265, and AV1.
Built on the 4 nm EUV foundry node at TSMC, the "Phoenix Point" monolithic silicon has a die-area of 178 mm², and a transistor-count of 25 billion. Besides the iGPU, it features a single 8-core "Zen 4" CCX. Each of the 8 CPU cores has 1 MB of dedicated L2 cache, and the cores share 32 MB of L3 cache.

Many Thanks to TumbleGeorge for the tip!
Source: AMD

83 Comments on AMD Ryzen 7040 Series "Phoenix Point" Mobile Processor I/O Detailed: Lacks PCIe Gen 5

#76
ToTTenTranz
Wirko said: "Some of the cache has evaporated overnight, or something..."
On one hand it's strange that they'd make such a big mistake on something like an official presentation slide, and then repeat that same mistake across all 3 models.
On the other hand, that slide only mentions "total cache", and doesn't specify whether it's L3+L2+L1 or something else.

At the same time, they haven't really said much about that RDNA3 iGPU with 6 WGPs running at ~3GHz (despite being the highest-clocked RDNA3 GPU to date and the only one at N4).
The Radeon 780M seems to have >150% of the processing throughput of the 680M in Rembrandt while the total RAM bandwidth only goes up from 6400MT/s to 7500MT/s.
This is only a 17% boost which is hardly enough to keep up with the faster CPU cores which will undoubtedly demand more bandwidth by themselves.

One thing that could make sense is that the new RDNA3 iGPU actually has 16MB of Infinity Cache all to itself. It would explain the 40MB "total cache" typo, and provide an adequate boost to effective bandwidth to the substantially more powerful iGPU.

Or that really was just a mistake in those slides and this iGPU is going to be massively bottlenecked by memory bandwidth.
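
As a rough check on the bandwidth figures in this post, the sketch below converts the quoted transfer rates into peak bandwidth. The 128-bit bus width is an assumption (it is the usual width for dual-channel DDR5 and LPDDR5 laptop platforms and is not stated in the thread), and the function name is only illustrative.

```python
# Peak memory bandwidth sketch. The 128-bit bus width is an assumption
# (typical for dual-channel DDR5 / LPDDR5 laptop platforms); the transfer
# rates are the ones quoted in the post.
BUS_WIDTH_BITS = 128


def peak_gb_per_s(mt_per_s: int, bus_bits: int = BUS_WIDTH_BITS) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second * bytes per transfer."""
    return mt_per_s * 1e6 * (bus_bits / 8) / 1e9


rembrandt = peak_gb_per_s(6400)  # about 102.4 GB/s
phoenix = peak_gb_per_s(7500)    # about 120.0 GB/s
gain = phoenix / rembrandt - 1   # relative uplift, about 0.17

print(f"{rembrandt:.1f} -> {phoenix:.1f} GB/s (+{gain:.1%})")
```

The relative uplift does not depend on the assumed bus width, since the same width applies to both chips; only the absolute GB/s figures do.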
#77
trsttte
I think it was a simple copy-paste mistake carried over from the Dragon Range mobile CPUs, which are based on the desktop silicon. That's the simplest explanation.
#78
THANATOS
Mawkzin said: "Looking at the Ryzen 5 7640, the iGPU has 8 CUs this time instead of 6 CUs, so maybe with RDNA3 it uses 3 WGPs with 4 CUs each instead of 2 WGPs with 6 CUs each as in the Rembrandt series."

A WGP is a dual compute unit (two CUs). Phoenix has 6 WGPs, or in other words 6 dual compute units; people like to say 12 CUs. There is no difference compared to Rembrandt.
There should be 2 shader engines (arrays) with 3 WGPs per shader engine, just like what you see in that block diagram. The Ryzen 5 7640 will lose one WGP per shader engine, which leaves 4 WGPs or 8 CUs.
It looks like in Rembrandt the 6 CU version has half of the shader engines deactivated, which is why only half of the CUs are left.
TumbleGeorge said: "Maybe there is some difference between the models for China and for... the outside world, where some of them will be artificially limited.

PS. I have an idea why there would be limitations: official support for more RAM, maybe even more cache, from Windows is more expensive. Recently, in China, they are betting on their own operating systems..."
Phoenix has only 16MB of L3 cache physically, the same as Rembrandt.
There is no artificial difference based on region. AMD marketing team did a poor job by not noticing they wrote the wrong amount of cache in some press slides, which caused this mess about L3 cache.
ToTTenTranz said: "On one hand it's strange that they'd make such a big mistake on something like an official presentation slide, and then repeat that same mistake across all 3 models.
On the other hand, that slide only mentions 'total cache', and doesn't specify whether it's L3+L2+L1 or something else.

At the same time, they haven't really said much about that RDNA3 iGPU with 6 WGPs running at ~3GHz (despite being the highest-clocked RDNA3 GPU to date and the only one at N4).
The Radeon 780M seems to have >150% of the processing throughput of the 680M in Rembrandt while the total RAM bandwidth only goes up from 6400MT/s to 7500MT/s.
This is only a 17% boost which is hardly enough to keep up with the faster CPU cores which will undoubtedly demand more bandwidth by themselves.

One thing that could make sense is that the new RDNA3 iGPU actually has 16MB of Infinity Cache all to itself. It would explain the 40MB 'total cache' typo, and provide an adequate boost to effective bandwidth to the substantially more powerful iGPU.

Or that really was just a mistake in those slides and this iGPU is going to be massively bottlenecked by memory bandwidth."
There was no mention anywhere about Phoenix having Infinity cache. 40MB is a typo.

That processing throughput increase is in reality only 25% (3GHz vs 2.4GHz). The rest of the increase comes from dual-issue CUs, and for that to work you need VOPD instructions; even then you can't use it for everything, as there are limitations and shaders also need to be optimized for it. Even AMD says the RDNA3 architecture is only 17.4% faster per clock than RDNA2. There is a reason why they say N31 has only 6144 stream processors, which is only 20% more than Navi21.
BTW, if the WGPs really performed that much better and AMD couldn't meet the required bandwidth, they wouldn't have used 6 WGPs.

Chips and Cheese made a great RDNA3 architectural analysis. Go check it out.
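
The clock-versus-dual-issue split described in this post can be sketched as a peak FP32 calculation. The CU counts and clocks are the figures quoted in the thread; 64 lanes per CU follows from the article's 768 stream processors across 12 CUs, and counting a fused multiply-add as 2 FLOPs is the usual convention, stated here as an assumption. The function name is illustrative, and peak numbers say nothing about how often dual issue is actually usable.

```python
# Peak FP32 throughput sketch. CU counts and clocks are from the thread;
# 64 lanes per CU (768 stream processors / 12 CUs) and 2 FLOPs per FMA
# are assumptions used for the usual "peak TFLOPS" arithmetic.
LANES_PER_CU = 64
FLOPS_PER_FMA = 2


def peak_tflops(cus: int, clock_ghz: float, dual_issue: bool = False) -> float:
    """Theoretical peak FP32 TFLOPS; dual issue doubles the per-lane rate."""
    issue_width = 2 if dual_issue else 1
    return cus * LANES_PER_CU * FLOPS_PER_FMA * issue_width * clock_ghz / 1000


r680m = peak_tflops(12, 2.4)                        # Rembrandt, RDNA2
r780m_single = peak_tflops(12, 3.0)                 # Phoenix, no dual issue
r780m_dual = peak_tflops(12, 3.0, dual_issue=True)  # Phoenix, ideal dual issue

print(f"Clock-only uplift:  {r780m_single / r680m:.2f}x")
print(f"With dual issue:    {r780m_dual / r680m:.2f}x")
```

The clock-only ratio is 1.25x, matching the 25% figure above, while the ideal dual-issue case gives 2.5x, which is where the ">150%" paper figure comes from.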
#79
TumbleGeorge
Maybe the Xilinx IP will help with training on how to set hardware parameters for better performance, depending on the game or other app running on the PC.