
NVIDIA's Dominance Challenged as Largest AI Lab Adopts Google TPUs

AleksandarK
News Editor, Staff member
NVIDIA's AI hardware dominance is being challenged as the world's leading AI lab, OpenAI, taps Google TPU hardware in a significant effort to move away from single-vendor solutions. In June 2025, OpenAI began leasing Google Cloud's Tensor Processing Units to handle ChatGPT's growing inference workload. This is the first time OpenAI has relied on non-NVIDIA chips in large-scale production. Until recently, NVIDIA GPUs powered both model training and inference for OpenAI's products. Training large language models on those cards remains costly, but it is a periodic expense. Inference, by contrast, runs continuously and carries its own substantial cost. ChatGPT now serves more than 100 million daily active users, including 25 million paid subscribers, and inference operations account for nearly half of OpenAI's estimated $40 billion annual compute budget. Google's TPUs, such as the v6e "Trillium," offer a more cost-effective solution for steady-state inference, as they are designed specifically for high throughput and low latency.
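For a rough sense of what those figures imply, here is a back-of-envelope sketch in Python. Only the ~$40 billion budget, the roughly 50% inference share, and the 100 million daily active users come from the reporting above; everything else is an illustrative assumption.

```python
# Back-of-envelope sketch of OpenAI's inference economics.
# Only the $40B budget, ~50% inference share, and 100M DAU figures
# come from the article; the derived numbers are just arithmetic.

annual_compute_budget = 40e9   # USD/year (article estimate)
inference_share = 0.5          # ~half spent on inference (article)
daily_active_users = 100e6     # ChatGPT DAU (article)

annual_inference_cost = annual_compute_budget * inference_share
cost_per_user_per_year = annual_inference_cost / daily_active_users
cost_per_user_per_day = cost_per_user_per_year / 365

print(f"Inference spend: ${annual_inference_cost / 1e9:.1f}B/year")
print(f"Per daily user:  ${cost_per_user_per_year:.2f}/year "
      f"(~${cost_per_user_per_day:.3f}/day)")
# -> Inference spend: $20.0B/year
# -> Per daily user:  $200.00/year (~$0.548/day)
```

At roughly half a dollar of inference compute per active user per day, even modest per-token savings from purpose-built inference silicon compound quickly, which is the economic logic behind the TPU move.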

Beyond cost savings, this decision reflects OpenAI's desire to reduce reliance on any single vendor. Microsoft Azure has been its primary cloud provider since Microsoft's early investments and collaborations. However, GPU supply shortages and price fluctuations exposed the weakness of relying too heavily on a single source. By adding Google Cloud to its infrastructure mix, OpenAI gains greater flexibility, avoids vendor lock-in, and can scale more smoothly during usage peaks. For Google, winning OpenAI as a TPU customer offers strong validation of its in-house chip development. TPUs were once reserved almost exclusively for internal projects such as powering the Gemini model. Now, they are attracting leading organizations like Apple and Anthropic. Note that beyond the inference-oriented v6e, Google also designs TPUs for training (the yet-to-be-announced v6p), which means companies can scale entire training runs on Google's infrastructure on demand.



View at TechPowerUp Main Site | Source
 
Here's to hoping that Nvidia's monopoly falls as fast as it arose.
That's unlikely to happen. NVDA offers the best software stack, the best resources for devs, the best support, and the most performant hardware. Until someone matches it, the monopoly remains, especially in AI training. Plus, Google doesn't have enough capacity internally and at TSMC to produce enough TPUs for everyone.
 
We definitely don't want any one company to make everything for everyone. That's what a monopoly is. Since we have over 100 years of data showing that monopolies lead to stagnation and the stifling of innovation, a combination of things has to happen to finally get to a near-monopoly-free market in the technology world. We need larger fab capacity spread over three or more fab companies. We need IT buying agents to consider multiple vendors for all large-scale deployments by requiring them to find the best technological offering, not buddy-buddy buying. We need more open source and open standards to ensure that no one gets locked into proprietary technology. Finally, we need to allow workers the freedom to move to better and better jobs, not get locked in by anti-poaching rules and strict contracts. Companies that enable the best working conditions and salaries would be rewarded with the best results from our hard-working software and hardware developers.
 
I don't want a monopoly either. It's just that everyone else is now the second choice. Until others catch up, NVIDIA remains the king of AI infra.
 
Be aware of this company: Zhipu AI from China.
We will hear from these folks more often soon.

 
Plus, Google doesn't have enough capacity internally and at TSMC to produce enough TPUs for everyone.
Fwiw, I don't think Google will be ramping up TPU production whatsoever. OpenAI is just renting the existing, available TPU nodes from GCP, and that'd be it.
That's a different endeavor from the deals where OpenAI is actually getting partners, such as CoreWeave, to build new data centers full of GPUs to meet its demand.

OpenAI has also been using AMD Instinct GPUs for some inference workloads. They are really eager for any kind of tensor compute they can get, no matter the vendor. Nvidia is their top choice, but given that they haven't been able to get enough of those, anything else at scale will do.
 
Whenever news like this comes out, there's always a bunch of misguided debates and hot takes that can be settled by one sentence: "Training ≠ Inference".

Microsoft and Meta have long been ordering (some) AMD GPUs for training, and Anthropic uses its in-house dataflow chip (similar to TPUs) for training. THIS IS NOTHING NEW.

The problem is that training a model is a one-time cost that is dwarfed by the cost of serving that model to millions of users every day, let alone scaling that service to hundreds of millions of users.
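To put hypothetical numbers on that, a minimal sketch; both figures below are made-up round numbers, not estimates for any real model:

```python
# Illustrative crossover between a one-time training cost and an ongoing
# serving cost. Both figures are hypothetical round numbers.

training_cost = 500e6        # one-time, USD (hypothetical)
serving_cost_per_day = 5e6   # ongoing inference, USD/day (hypothetical)

days_to_crossover = training_cost / serving_cost_per_day
print(f"Serving spend passes training spend after {days_to_crossover:.0f} days")
# -> Serving spend passes training spend after 100 days
```

Past the crossover point, every additional day of serving widens the gap, and the gap grows with the user base.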

Unlike in training, Nvidia has practically no competition in inference. Some accelerators may offer competitive performance, but they can't be combined into massive systems like the NVL72 (72 GPUs, 36 CPUs), let alone the upcoming NVL576 (576 GPUs, 144 CPUs).

And if a company has an NVL576 competitor, does it have priority status with TSMC so it can order something like 50k wafers MONTHLY? Can it order enough High Bandwidth Memory? All HBM suppliers are fully booked through the second half of 2026.
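As a rough illustration of what a wafer allocation like that buys, here is some dies-per-wafer arithmetic; the die size, yield, and edge-loss figures are assumptions picked for the sketch, not TSMC or Nvidia numbers:

```python
# Rough dies-per-wafer arithmetic for the "50k wafers monthly" point.
# Die size, yield, and edge loss are illustrative assumptions.

import math

wafer_diameter_mm = 300
die_area_mm2 = 800            # reticle-limit-class AI die (assumed)
yield_rate = 0.7              # usable dies after defects/binning (assumed)
wafers_per_month = 50_000     # figure from the post above

wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2   # ~70,686 mm^2
gross_dies = int(wafer_area / die_area_mm2 * 0.9)     # ~10% edge loss (assumed)
good_dies = int(gross_dies * yield_rate)

print(f"~{good_dies} good dies per wafer")
print(f"~{good_dies * wafers_per_month / 1e6:.1f}M accelerators per month")
# -> ~55 good dies per wafer
# -> ~2.8M accelerators per month
```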

You get the picture. If news like "Microsoft announces deal with [not Nvidia] to build 10 data centers for AI inference" came out, my jaw would genuinely drop because of its implications, but everything else doesn't have many implications.
 
Microsoft and Meta have long been ordering (some) AMD GPUs for training
Afaik it's the other way around: those AMD GPUs are being used for inference.
Meta has its own in-house accelerator for some of its training workloads, but as far as I'm aware, most of its training still runs on Nvidia GPUs.

The problem is that training a model is a one-time cost that is dwarfed by the cost of serving that model to millions of users every day, let alone scaling that service to hundreds of millions of users.
Training is the part that requires the most compute and is the hardest to scale, given the communication required between devices/nodes/racks.
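To illustrate the communication burden: in plain data-parallel training, every device all-reduces a full set of gradients each step. A rough sketch, with the model size, gradient precision, and step count all assumed for illustration:

```python
# Rough gradient-traffic estimate for data-parallel training.
# Ring all-reduce moves ~2x the gradient payload per device per step.
# Model size, precision, and step count are illustrative assumptions.

params = 70e9            # model parameters (assumed)
bytes_per_grad = 2       # bf16 gradients (assumed)
allreduce_factor = 2     # ring all-reduce sends+receives ~2x payload
steps = 100_000          # training steps (assumed)

per_step_gb = params * bytes_per_grad * allreduce_factor / 1e9
total_pb = per_step_gb * steps / 1e6
print(f"~{per_step_gb:.0f} GB of gradient traffic per device per step")
print(f"~{total_pb:.0f} PB per device over the whole run")
# -> ~280 GB of gradient traffic per device per step
# -> ~28 PB per device over the whole run
```

Inference exchanges no gradients at all; once the weights are loaded, traffic is mostly activations and KV-cache, which is part of why the scaling pressure is so different.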

Unlike in training, Nvidia has practically no competition in inference. Some accelerators may offer competitive performance, but they can't be combined into massive systems like the NVL72 (72 GPUs, 36 CPUs), let alone the upcoming NVL576 (576 GPUs, 144 CPUs).
No, it's the exact opposite. Inference is where the other players are making inroads. You don't need to scale across racks for inference; you do for training.
For inference, a single node (not even a full rack) is often more than enough to serve those full-blown models. The bandwidth required is also way lower for inference than for training.
It's also way easier to optimize weights for inference on specific devices.
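For intuition on why one node can be enough: low-batch decode is roughly memory-bandwidth-bound, so an upper bound on throughput is bandwidth divided by the bytes read per token. A sketch with assumed hardware and model figures:

```python
# Rough upper bound on single-device decode throughput for a
# memory-bandwidth-bound LLM. All figures are illustrative assumptions.

hbm_bandwidth = 3.35e12   # bytes/s, ~3.35 TB/s class accelerator (assumed)
params = 70e9             # model parameters (assumed)
bytes_per_param = 1       # 8-bit quantized weights (assumed)

weight_bytes = params * bytes_per_param
tokens_per_sec = hbm_bandwidth / weight_bytes  # each token reads all weights once
print(f"~{tokens_per_sec:.0f} tokens/s per device (batch size 1, upper bound)")
# -> ~48 tokens/s per device (batch size 1, upper bound)
```

Batching pushes effective throughput well past this bound, since one pass over the weights serves many concurrent requests; the point is that inference wants high local memory bandwidth, not the cross-rack interconnect that training needs.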


You get the picture. If news like "Microsoft announces deal with [not Nvidia] to build 10 data centers for AI inference" came out, my jaw would genuinely drop because of its implications, but everything else doesn't have many implications.
Sure:
https://techcommunity.microsoft.com...md-mi300-a-collaborative-breakthrough/4407673

I guess you're confusing training with inference. You won't be seeing many labs using AMD for training.
 
nGreedia is brute-forcing their AI chips, just boosting clock speeds and adding more tiles. One day that will not be possible anymore, and I believe that is when they will be overtaken by every other major AI player. I look forward to that day.
 