
AMD Unveils Vision for an Open AI Ecosystem, Detailing New Silicon, Software and Systems at Advancing AI 2025

Nomad76

News Editor
Staff member
AMD delivered its comprehensive, end-to-end integrated AI platform vision and introduced its open, scalable rack-scale AI infrastructure built on industry standards at its 2025 Advancing AI event.

AMD and its partners showcased:
  • How they are building the open AI ecosystem with the new AMD Instinct MI350 Series accelerators
  • The continued growth of the AMD ROCm ecosystem
  • The company's powerful, new, open rack-scale designs and roadmap that bring leadership rack-scale AI performance beyond 2027



"AMD is driving AI innovation at an unprecedented pace, highlighted by the launch of our AMD Instinct MI350 series accelerators, advances in our next generation AMD 'Helios' rack-scale solutions, and growing momentum for our ROCm open software stack," said Dr. Lisa Su, AMD chair and CEO. "We are entering the next phase of AI, driven by open standards, shared innovation and AMD's expanding leadership across a broad ecosystem of hardware and software partners who are collaborating to define the future of AI."

AMD Delivers Leadership Solutions to Accelerate an Open AI Ecosystem
AMD announced a broad portfolio of hardware, software and solutions to power the full spectrum of AI:
  • AMD unveiled the Instinct MI350 Series GPUs, setting a new benchmark for performance, efficiency and scalability in generative AI and high-performance computing. The MI350 Series, consisting of both Instinct MI350X and MI355X GPUs and platforms, delivers a 4x generation-on-generation AI compute increase and a 35x generational leap in inferencing, paving the way for transformative AI solutions across industries. MI355X also delivers significant price-performance gains, generating up to 40% more tokens-per-dollar compared to competing solutions. More details are available in this blog from Vamsi Boppana, AMD SVP, AI.
  • AMD demonstrated end-to-end, open-standards rack-scale AI infrastructure—already rolling out with AMD Instinct MI350 Series accelerators, 5th Gen AMD EPYC processors and AMD Pensando Pollara NICs in hyperscaler deployments such as Oracle Cloud Infrastructure (OCI) and set for broad availability in 2H 2025.
  • AMD also previewed its next-generation AI rack, codenamed "Helios." It will be built on next-generation AMD Instinct MI400 Series GPUs, which are expected to deliver up to 10x more performance than the previous generation when running inference on Mixture of Experts models, alongside "Zen 6"-based AMD EPYC "Venice" CPUs and AMD Pensando "Vulcano" NICs. More details are available in this blog post.
  • The latest version of the AMD open-source AI software stack, ROCm 7, is engineered to meet the growing demands of generative AI and high-performance computing workloads—while dramatically improving developer experience across the board. ROCm 7 features improved support for industry-standard frameworks, expanded hardware compatibility and new development tools, drivers, APIs and libraries to accelerate AI development and deployment. More details are available in this blog post from Anush Elangovan, AMD CVP of AI Software Development.
  • The Instinct MI350 Series exceeded AMD's five-year goal to improve the energy efficiency of AI training and high-performance computing nodes by 30x, ultimately delivering a 38x improvement. AMD also unveiled a new 2030 goal to deliver a 20x increase in rack-scale energy efficiency from a 2024 base year, enabling a typical AI model that today requires more than 275 racks to be trained using less than one fully utilized rack by 2030, consuming 95% less electricity. More details are available in this blog post from Sam Naffziger, AMD SVP and Corporate Fellow.
  • AMD also announced the broad availability of the AMD Developer Cloud for the global developer and open-source communities. Purpose-built for rapid, high-performance AI development, it gives users access to a fully managed cloud environment with the tools and flexibility to get started with AI projects and grow without limits. With ROCm 7 and the AMD Developer Cloud, AMD is lowering barriers and expanding access to next-gen compute. Strategic collaborations with leaders like Hugging Face, OpenAI and Grok are proving the power of co-developed, open solutions.

Broad Partner Ecosystem Showcases AI Progress Powered by AMD
Today, seven of the 10 largest model builders and AI companies are running production workloads on Instinct accelerators. Among them are Meta, OpenAI, Microsoft and xAI, who joined AMD and other partners at Advancing AI to discuss how they are working with AMD on AI solutions to train today's leading AI models, power inference at scale and accelerate AI exploration and development:
  • Meta detailed how Instinct MI300X is broadly deployed for Llama 3 and Llama 4 inference. Meta shared excitement for MI350 and its compute power, performance-per-TCO and next-generation memory. Meta continues to collaborate closely with AMD on AI roadmaps, including plans for the Instinct MI400 Series platform.
  • OpenAI CEO Sam Altman discussed the importance of holistically optimized hardware, software and algorithms and OpenAI's close partnership with AMD on AI infrastructure, with research and GPT models on Azure in production on MI300X, as well as deep design engagements on MI400 Series platforms.
  • Oracle Cloud Infrastructure (OCI) is among the first industry leaders to adopt the AMD open rack-scale AI infrastructure with AMD Instinct MI355X GPUs. OCI leverages AMD CPUs and GPUs to deliver balanced, scalable performance for AI clusters, and announced it will offer zettascale AI clusters accelerated by the latest AMD Instinct processors with up to 131,072 MI355X GPUs to enable customers to build, train and inference AI at scale.
  • HUMAIN discussed its landmark agreement with AMD to build open, scalable, resilient and cost-efficient AI infrastructure leveraging the full spectrum of computing platforms only AMD can provide.
  • Microsoft announced Instinct MI300X is now powering both proprietary and open-source models in production on Azure.
  • Cohere shared that its high-performance, scalable Command models are deployed on Instinct MI300X, powering enterprise-grade LLM inference with high throughput, efficiency and data privacy.
  • Red Hat described how its expanded collaboration with AMD enables production-ready AI environments, with AMD Instinct GPUs on Red Hat OpenShift AI delivering powerful, efficient AI processing across hybrid cloud environments.
  • Astera Labs highlighted how the open UALink ecosystem accelerates innovation and delivers greater value to customers and shared plans to offer a comprehensive portfolio of UALink products to support next-generation AI infrastructure.
  • Marvell joined AMD to highlight its collaboration as part of the UALink Consortium developing an open interconnect, bringing the ultimate flexibility for AI infrastructure.

View at TechPowerUp Main Site | Source
 
  • AMD also previewed its next-generation AI rack, codenamed "Helios." It will be built on next-generation AMD Instinct MI400 Series GPUs, which are expected to deliver up to 10x more performance than the previous generation when running inference on Mixture of Experts models, alongside "Zen 6"-based AMD EPYC "Venice" CPUs and AMD Pensando "Vulcano" NICs. More details are available in this blog post.
That's nice, but their comparisons against the B200 NVL72 are kinda moot given that the Blackwell rack is already available, Blackwell Ultra should be here soon, and once that Helios rack comes out next year it'll be competing against Rubin instead.


Apart from that, it's nice to see they are getting some wins when it comes to large-scale inference; now to see if they'll manage a good success case for training those models.
 
"We are entering the next phase of AI, driven by open standards, shared innovation

A shame that open standards are valued so low these days.

But who knows, maybe they will be embraced.
 
A shame that open standards are valued so low these days.
But who knows, maybe they will be embraced.
There is no question of 'if' anymore, as the two new global standards have been published: UALink and Ultra Ethernet.
Now the entire industry can work with those standards, and clients will be able to mix and match their racks with multiple vendors.

That's nice, but their comparisons against the B200 NVL72 are kinda moot given that the Blackwell rack is already available, Blackwell Ultra should be here soon, and once that Helios rack comes out next year it'll be competing against Rubin instead.
It does not matter. If you watched the presentation, you would have seen that raw performance is no longer the main concern in this industry; it's tokens/Watt and tokens/$. And this opens up new possibilities and opportunities, especially in countries that do not have a huge energy surplus.
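To make the tokens/Watt and tokens/$ point concrete, here is a minimal sketch of how those normalized metrics compare two accelerators. All figures are invented for illustration; they are not AMD or NVIDIA numbers:

```python
# Hypothetical illustration: ranking accelerators by throughput normalized
# to cost and power, rather than by raw tokens/s. All numbers are made up.

def tokens_per_dollar(tokens_per_second: float, cost_usd: float) -> float:
    """Throughput normalized by hardware cost (tokens/s per dollar)."""
    return tokens_per_second / cost_usd

def tokens_per_watt(tokens_per_second: float, power_watts: float) -> float:
    """Throughput normalized by power draw (tokens/s per watt)."""
    return tokens_per_second / power_watts

# Accelerator B has 20% less raw throughput than A, but is cheaper and
# draws less power, so it wins on both normalized metrics.
a = {"tps": 10000.0, "cost": 40000.0, "power": 1400.0}
b = {"tps": 8000.0, "cost": 25000.0, "power": 1000.0}

print(tokens_per_dollar(a["tps"], a["cost"]))  # 0.25
print(tokens_per_dollar(b["tps"], b["cost"]))  # 0.32
print(tokens_per_watt(a["tps"], a["power"]))
print(tokens_per_watt(b["tps"], b["power"]))   # 8.0
```

The point being: a buyer constrained by budget or by grid capacity picks B, even though A leads every raw-throughput benchmark.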
 
That's nice, but their comparisons against the B200 NVL72 are kinda moot given that the Blackwell rack is already available, Blackwell Ultra should be here soon, and once that Helios rack comes out next year it'll be competing against Rubin instead.


Apart from that, it's nice to see they are getting some wins when it comes to large-scale inference; now to see if they'll manage a good success case for training those models.
They aren't comparing Helios to B200; they are comparing an MI355X rack to B200, as both deploy this year, and MI400 Helios to Rubin.
This is a very condensed article; there was a lot in the keynote.
[keynote slide attachment]


And 3 more articles if you don't want to watch the keynote.
 
Interesting, it's a press release for a multi-billion-dollar company, but there aren't multiple people complaining about it in here. I wonder what the difference is between this and the other press releases :confused:

On the topic itself, I wonder how AMD's [potential] growth and large-scale success in the data centre market could shift their priority away from Radeon, or at the very least wafer allocation... they already seem to barely put in effort as it is, and although the new 9000 series is a step in the right direction, at least in some ways, that [potentially continued] success itself will put more pressure on divided priorities or allocation.
 
Interesting, it's a press release for a multi-billion-dollar company, but there aren't multiple people complaining about it in here. I wonder what the difference is between this and the other press releases :confused:

On the topic itself, I wonder how AMD's [potential] growth and large-scale success in the data centre market could shift their priority away from Radeon, or at the very least wafer allocation... they already seem to barely put in effort as it is, and although the new 9000 series is a step in the right direction, at least in some ways, that [potentially continued] success itself will put more pressure on divided priorities or allocation.
Nothing wrong with the release, but the two pictures TPU chose (which aren't in the release) are two different products, leading to some confusion for igor.
And for some reason TPU snipped off all the hyperlinks in AMD's release for more details on each product.
So this is condensed... because there was just so much announced. I'm guessing Nomad just doesn't have expertise in this area, and that's fine; TPU isn't really an enterprise-focused site.
STH is, hence my suggestion if you are interested in knowing more.
 
Nothing wrong with the release
I'm referring to the regular occurrence of users complaining that too many press releases get posted as news here. I believe I'm noticing a common theme in the ones that draw complaints: they're always about a particular company (not AMD, clearly).
 
I'm referring to the regular occurrence of users complaining that too many press releases get posted as news here. I believe I'm noticing a common theme in the ones that draw complaints: they're always about a particular company (not AMD, clearly).
Ah, yes, I do detest marketing speak, and after having just completed some NVIDIA training... this keynote was a bit of fresh air, with AMD stating which precisions they were comparing rather than just saying "AI performance is X"; they said "we got this speedup going from FP8 to FP4." (shrugs) AMD still needs to deliver on the software, heavily. It's getting there, for LLM workloads at least. But the number of libraries NVIDIA has is just insane. Kind of why the marketing speak in their training drives me nuts; they act insecure. Let the product speak for itself. And NVIDIA's recent (partial) opening of NVLink shows they are fearful of UALink.
 
Ah, Yes, I do detest marketing speak....
Interesting insights on the particular ones you've experienced, but that wasn't the point I was making; I was talking about complaints in news posts that are almost entirely irrelevant to the actual content of the press release.
 
Interesting insights on the particular ones you've experienced, but that wasn't the point I was making; I was talking about complaints in news posts that are almost entirely irrelevant to the actual content of the press release.
It would be nice if all info from an event like this were condensed into a single super article up at the top of the page in the review article slider. This could be done for ALL companies as I’ve suggested before. Instead we get multiple and sometimes overlapping mini articles that you have to scroll through. That’s fine too I guess.
 
They aren't comparing Helios to B200; they are comparing an MI355X rack to B200, as both deploy this year, and MI400 Helios to Rubin.
This is a very condensed article; there was a lot in the keynote.
View attachment 403555

And 3 more articles if you don't want to watch the keynote.
Thanks for the clarification. I had seen that exact slide that you posted in another thread today and got really confused.

Will look into that page and may give the presentation a go.
I'll for sure look into the servethehome link, those usually have all the relevant information, thanks!

It would be nice if all info from an event like this were condensed into a single super article up at the top of the page in the review article slider. This could be done for ALL companies as I’ve suggested before. Instead we get multiple and sometimes overlapping mini articles that you have to scroll through. That’s fine too I guess.
Agreed, something similar to the liveblog I've seen happening with some other events.
The way it's done currently, we end up with lots of smaller pieces spread all over the place, with many duplicates.
 