
8-Core AMD Ryzen AI Max Pro 385 Benchmark Appears As Cheaper Strix Halo APU Launch Nears

The average user does not give a flying monkey about the 5090, a halo GPU that is neither a relevant piece of hardware nor available to the vast majority of everyday users. It's an irrelevant comparison in this context. An on-processor NPU will be available to the average Joe and will handle tasks within its own domain of capability. That's what it is designed for and meant to do.

In addition to developing an NPU utilization ecosystem, Microsoft has many other zombies to sort out, such as the USB-C port chaos.

Oh, no doubt about that. However, it's the perfect showcase for what I'm talking about. Most high-performance PCs have a GPU already; maybe not one that's nearly that fast, but certainly one that will do, say, 5 times that work. Why not allow the user to leverage that performance?
 
Most high-performance PCs have a GPU already; maybe not one that's nearly that fast, but certainly one that will do, say, 5 times that work. Why not allow the user to leverage that performance?
Users are, of course, allowed to leverage the benefit of discrete GPUs. That is, however, a minority in the laptop market. Most people buy laptops with a class-60 GPU, like on desktop. The mega-APU is meant to disrupt this segment a bit and gain traction there. It will take some time and effort.
 
I'm getting a Vivobook OLED with the HX 370; what is the main difference with these new AI APUs anyway?

Can I use Copilot offline with the help of the NPU?
 
Why do they have to buy GPUs where all that is prebuilt?
Using a socket would drive up development cost, including having to ensure that multiple types of GPUs work in multiple boards. Memory in slots would be slower.

Cards would get bigger, heavier, slower, and more expensive.
 
I'm getting a Vivobook OLED with the HX 370; what is the main difference with these new AI APUs anyway?

Can I use Copilot offline with the help of the NPU?
Strix Halo APUs have much stronger graphics and all big cores compared to Strix Point.

You can use Copilot locally with NPU.
 
There's no real useful use case for an NPU at the moment, so it's mostly marketing.
For anything interesting you'll be using the iGPU, which supports more data formats and is faster than the NPU anyway.

The main reason to have an NPU is local processing without using much energy. Think of text suggestions in your phone's keyboard, or gallery features such as searching for people or objects.
At the moment there's no such use case in the desktop world.
Apps like Blender, DaVinci Resolve, Visual Studio Code, Audacity, and Affinity Photo can use the NPU. It's overmarketed, but the NPU is supposed to be as useful as hardware acceleration for video: nothing that will blow your mind, but you won't hog compute resources whenever you use something AI-accelerated. That goes especially for sustained workloads like real-time video/sound enhancement for video calls. PC laptop GPUs are notoriously bad when unplugged; performance tanks a lot, and they still drain your battery like crazy.

Also, those are mostly laptop chips, not desktop chips; sure, they've been repurposed as small desktops in some cases, but they've been engineered to be an efficient platform for laptops.
 
Apps like Blender, DaVinci Resolve, Visual Studio Code, Audacity, and Affinity Photo can use the NPU.
They can, but most of those don't. There are some external plugins available for the software you listed, but they are not native, nor do they support NPUs in a hardware-agnostic way, which totally defeats the purpose of such a thing to begin with.
It's overmarketed, but the NPU is supposed to be as useful as hardware acceleration for video: nothing that will blow your mind, but you won't hog compute resources whenever you use something AI-accelerated. That goes especially for sustained workloads like real-time video/sound enhancement for video calls.
Yeah, I totally agree with that. It's something you forget even exists, but if you didn't have it you'd instantly notice. Your video-acceleration example is perfect and showcases how important a common API is; fragmentation doesn't help at all.
Currently NPUs still face that fragmentation scenario, even though DirectML is right there, so it's hard for a developer to integrate such features without knowing each vendor's specifics and implementing all the different SDKs out there instead of a common API.
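A common runtime illustrates the point: with ONNX Runtime, for example, an app lists execution providers in preference order and the runtime falls back when one isn't available, instead of the developer hand-integrating each vendor SDK. Here's a minimal, stand-alone sketch of that fallback-selection logic (the provider names mirror ONNX Runtime's, but this is an illustrative function, not the library's actual API):

```python
def pick_providers(available: set[str], preferred: list[str]) -> list[str]:
    """Return the preferred execution providers that are actually present,
    in preference order, always keeping CPU as the last-resort fallback."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# On a machine without an NPU driver, the NPU provider is silently skipped:
print(pick_providers(
    available={"DmlExecutionProvider", "CPUExecutionProvider"},
    preferred=["VitisAIExecutionProvider",  # AMD NPU (XDNA)
               "DmlExecutionProvider",      # DirectML (any GPU)
               "CPUExecutionProvider"],
))  # ['DmlExecutionProvider', 'CPUExecutionProvider']
```

That's the whole appeal of a common API: the app ships one code path and degrades gracefully from NPU to GPU to CPU, rather than shipping one build per vendor SDK.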

You can use Copilot locally with NPU.
Wait, can you? The actual MS Copilot stuff? Do you have a source for that?
So far all I've heard is that they're planning on it, but haven't deployed it yet.

They've had models like their Phi series out for quite some time, but I haven't seen them used within the OS yet, only through 3rd-party stuff such as LM Studio and whatnot.
 
Using a socket would drive up development cost, including having to ensure that multiple types of GPUs work in multiple boards. Memory in slots would be slower.

Cards would get bigger, heavier, slower, and more expensive.
I know, but I still WANT a GPU mobo with a CPU AIC, because it would be cool.
 
I remember the time when the Ryzen 5700G came out, and its powerful 8CU Vega iGPU's performance was the main topic. Now look at these 30CU+ APUs today...
 
Maybe I'm too optimistic, but I believe that 64CU APUs are possible, and that's 9070 XT-level performance with current hardware. And just 40 CUs on the more efficient UDNA arch that the next gen might bring could give us even more performance than that.
I am sure that AMD has had several dies on their testing bench. 64 CUs would push the package beyond 600 mm² and decrease yields; the trade-off would be fewer functional dies. Below is roughly how much bigger the 395 would look if we added another 24 CUs.

If they add another 8 CUs on RDNA4 or UDNA, with 24 big CPU cores, it will be a beastly APU anyway. AMD does things incrementally. They know they need to add more IO next time, as the iGPU is already very good. Medusa Halo needs as many PCIe lanes as AM5 chips have, plus integrated USB4 v2 at 80 Gbps. XDNA3 will be bigger too, as will a wider DRAM bus. So that's a lot of stuff to fit in.
[Image: AMD APU Z5 Strix Halo MAX die mock-up]
 
I am sure that AMD has had several dies on their testing bench. 64 CUs would push the package beyond 600 mm² and decrease yields; the trade-off would be fewer functional dies. Below is roughly how much bigger the 395 would look if we added another 24 CUs.

If they add another 8 CUs on RDNA4 or UDNA, with 24 big CPU cores, it will be a beastly APU anyway. AMD does things incrementally. They know they need to add more IO next time, as the iGPU is already very good. Medusa Halo needs as many PCIe lanes as AM5 chips have, plus integrated USB4 v2 at 80 Gbps.
I'm sure they're pushing the limits. TBH, when I saw 40 CUs on the Halo, I was a bit shocked at first. It's already a massive breakthrough, and maybe with some optimization and even a node shrink (though I doubt that), it might become a decent low/midrange alternative in the near future. But the fact that it comes in integrated form only, not socketed, kinda limits its availability, and the pricing isn't doing it justice either, obviously.
 
I'm sure they're pushing the limits. TBH, when I saw 40 CUs on the Halo, I was a bit shocked at first. It's already a massive breakthrough, and maybe with some optimization and even a node shrink (though I doubt that), it might become a decent low/midrange alternative in the near future. But the fact that it comes in integrated form only, not socketed, kinda limits its availability, and the pricing isn't doing it justice either, obviously.
It cannot be socketed right now, as it's too big for the AM5 socket and it has a 256-bit bus. A new socket for this line of products only would not be economical and would not fit with any other line-up. They might have a new offering on the AM6 socket, a tier above the desktop G SKUs. It looks like they will have two different iGPU-IO dies for either Medusa Halo or later on Zen 7. One of those could become a 'desktop G MAX'.

Strix Halo is primarily a vehicle for mobile platforms right now, and for testing the new Infinity Fabric ahead of Zen 6 adoption. They are clearly going after some users of Apple devices with a similar system, but on x86. They could hit a jackpot with it in the future if they manage to offer a variant for DIY users on a mainstream socket.
 
Maybe I'm too optimistic, but I believe that 64CU APUs are possible, and that's 9070 XT-level performance with current hardware. And just 40 CUs on the more efficient UDNA arch that the next gen might bring could give us even more performance than that.
Good luck feeding that. Strix Halo needs a 256-bit LPDDR5X bus to feed itself, and even then the few benchmarks we've seen show the 8060S running into some scaling issues.

Also, that is going to be a $1500+ APU at that point. Not to mention that if they can shrink the GPU cores small enough to make sense on an APU, the dGPUs will also benefit and 64 will be the new 40. The same way everyone predicted that APUs would dominate at 1080p after Llano released, and 10 years later the current Strix Point can't consistently manage 1080p across the board.
It cannot be socketed right now, as it's too big for the AM5 socket and it has a 256-bit bus. A new socket for this line of products only would not be economical and would not fit with any other line-up. They might have a new offering on the AM6 socket, a tier above the desktop G SKUs. It looks like they will have two different iGPU-IO dies for either Medusa Halo or later on Zen 7. One of those could become a 'desktop G MAX'.

Strix Halo is primarily a vehicle for mobile platforms right now, and for testing the new Infinity Fabric ahead of Zen 6 adoption. They are clearly going after some users of Apple devices with a similar system, but on x86. They could hit a jackpot with it in the future if they manage to offer a variant for DIY users on a mainstream socket.
To really feed a proper big APU with DDR5 you would need a 4000+ pin server socket that supports 8 channels of DDR5. Even then bandwidth would be an issue.

The EPYC Genoa lineup supports 12 DDR5 channels, for up to 480GB/s of bandwidth. The RX 9070 XT has a 640+GB/s memory bus just for itself.

It could see more widespread usage if CAMM2 ever gets out of the goofing-off stage and becomes a viable memory system, since each CAMM2 module is 128-bit on its own; 4 of them would be a lot easier to fit than 8 DDR5 modules, and they can hypothetically hit LPDDR5X speeds.
 
The EPYC Genoa lineup supports 12 DDR5 channels, for up to 480GB/s of bandwidth
The latest EPYC is Turin with 691.2GB/s bandwidth (DDR5-6000). Next-generation EPYC with Zen 6 will get 16 DDR5 channels and, I think, much faster RAM than DDR5-6000. Not sure, but it's possible it goes up to the newest (well, next year's) server DDR5-12800... Please do the math for me: what would the bandwidth be with 16 channels of DDR5-12800! ;)
 
The latest EPYC is Turin with 691.2GB/s bandwidth (DDR5-6000). Next-generation EPYC with Zen 6 will get 16 DDR5 channels and, I think, much faster RAM than DDR5-6000. Not sure, but it's possible it goes up to the newest (well, next year's) server DDR5-12800... Please do the math for me: what would the bandwidth be with 16 channels of DDR5-12800! ;)

Do you mean DDR6? Even then it seems like a tall order; I find it extremely unlikely that the next generation of servers will achieve 12800 MT/s once error correction and conservative timings are considered. It's not something even a binned 285K on a Z890 Apex with binned CUDIMMs will reliably achieve. In any case, these machines will obviously be priced way outside of most people's reach, and even in the enterprise realm, with their budgets it makes much more sense to simply buy specialized accelerators instead.

Strix Halo's unified memory advantage here is about the same as that of the Apple M-series chips: it's not nearly as much about raw bandwidth or compute power as it is about attainable capacity, and large LLMs love capacity, which takes precedence over both BW and compute. A 5090 or R9700 will run a model that fits under 32 GB faster than pretty much anything else in and around its price range, since they've got both the performance and the bandwidth, but they won't run a model that an SoC with 128 GB of unified memory handles comfortably. If you want that out of a traditional GPU, prepare your wallet for an RTX Pro 6000; pricing starts at a nice 5 digits.
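To put rough numbers on the capacity argument: a model's weight footprint is roughly parameter count × bits per weight / 8. A back-of-the-envelope sketch (this ignores KV cache and runtime overhead, and the model sizes are illustrative, not tied to any specific product):

```python
def model_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight footprint in GB: billions of params * bits / 8.
    Ignores KV cache, activations and runtime overhead."""
    return params_billions * bits_per_weight / 8

# A 70B model at 4-bit quantization: ~35 GB. Too big for a 32 GB card,
# but comfortable on a 128 GB unified-memory SoC.
print(model_size_gb(70, 4))   # 35.0
# The same model at 16-bit: ~140 GB.
print(model_size_gb(70, 16))  # 140.0
```

That's why capacity wins here: no amount of bandwidth helps if the weights don't fit in memory in the first place.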
 
Latest EPYC is Turin with 691.2GB/s bandwidth(DDR5-6000).
No, it has a max theoretical of 576GB/s using DDR5-6000 across its 12 channels.

Next generation EPYC with ZEN 6 will get 16 DDR5 channels and I think that also with much faster RAM than DDR5-6000. Not sure but is possible to be up to newest (ups next year)server DDR5 12800... Please make math for me what will be bandwidth with 16 channel DDR5 12800!
I don't think 12800 is realistic whatsoever for Venice. The JEDEC top spec is currently 8800 MT/s.
I believe Venice will support 7200 MT/s at best.

Nonetheless, the math for memory bandwidth is (number of channels × bits per channel × transfer rate) / 8. So for 16 channels of DDR5-12800 we would have ~1.6TB/s.
Using a more realistic DDR5-6400 scenario, it'd be close to 820GB/s.
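That formula is easy to sanity-check in code (a quick sketch assuming the standard 64 bits per DDR5 channel):

```python
def bandwidth_gbs(channels: int, mts: int, bits_per_channel: int = 64) -> float:
    """Peak theoretical bandwidth in GB/s:
    channels * bits per channel * transfer rate (MT/s) / 8 / 1000."""
    return channels * bits_per_channel * mts / 8 / 1000

# 16 channels of DDR5-12800 -> ~1.6 TB/s
print(bandwidth_gbs(16, 12800))  # 1638.4
# 16 channels of the more realistic DDR5-6400 -> ~820 GB/s
print(bandwidth_gbs(16, 6400))   # 819.2
# Today's 12 channels of DDR5-6000 for comparison
print(bandwidth_gbs(12, 6000))   # 576.0
```

These are theoretical peaks; real-world sustained bandwidth lands noticeably below them.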
Micron expects to offer 128GB – 256GB MCRDIMM modules with a data transfer rate of 8800 MT/s in 2025, and then MRDIMMs with capacities of over 256GB and a data transfer rate of 12800 MT/s in 2026 or 2027.
Venice is supposed to be a 2026 product, so I doubt it'll have MRDIMM support to begin with.
 
No, it has a max theoretical of 576GB/s using DDR5-6000 across its 12 channels.


I don't think 12800 is realistic whatsoever for Venice. The JEDEC top spec is currently 8800 MT/s.
I believe Venice will support 7200 MT/s at best.

Nonetheless, the math for memory bandwidth is (number of channels × bits per channel × transfer rate) / 8. So for 16 channels of DDR5-12800 we would have ~1.6TB/s.
Using a more realistic DDR5-6400 scenario, it'd be close to 820GB/s.


Venice is supposed to be a 2026 product, so I doubt it'll have MRDIMM support to begin with.
The rumor for AMD Zen 6 is a dual IMC, i.e. double today's situation, if the number of memory controllers on EPYC gets doubled too... We'll see what it looks like when it's ready for use.
 

Well, unless a major breakthrough is achieved by then, I think this 2023 projection of theirs was quite optimistic and is unlikely to be met, especially by Micron (with SK hynix A-die established as this generation's "Samsung B-die" from the DDR4 days) :D

9600 MT/s CKD sticks should be much more prevalent and easy to obtain at this point if they were on track to achieve that, especially at those densities, but the best we can do right now is a nearly perma-out-of-stock $500 2x24 kit that pretty much requires the Z890 Apex and a lucky 285K processor to break the 10000 MT/s ceiling :(
 
The latest EPYC is Turin with 691.2GB/s bandwidth (DDR5-6000). Next-generation EPYC with Zen 6 will get 16 DDR5 channels and, I think, much faster RAM than DDR5-6000. Not sure, but it's possible it goes up to the newest (well, next year's) server DDR5-12800... Please do the math for me: what would the bandwidth be with 16 channels of DDR5-12800! ;)
Well, if it were 12800 on 16 channels, at 1024-bit, that would be 13,107.2Gbit/s, i.e. about 1,638GB/s (~1.6TB/s).

Personally, I highly doubt we will ever see anything close to that on DDR5. We can't even get 9500 to consistently run stable in soldered LPDDR5X form; to hit higher than that on standard SODIMMs? I don't think so.
 