- Most AI GPUs run at surprisingly low utilization across production systems
- Companies pay for twenty times more GPU capacity than necessary
- Overprovisioning is rising sharply year over year instead of improving
Companies across the tech industry are rushing to buy massive amounts of AI infrastructure, but most of it barely does any useful work.
A report from Cast AI, based on tens of thousands of Kubernetes clusters on AWS, Azure, and GCP, found that average GPU utilization is just 5%.
Many teams deploy sophisticated AI tools to manage their applications, but those same tools are not used to optimize the underlying infrastructure.
The numbers are getting worse, not better
Organizations pay for approximately 20 times more GPU capacity than their workloads are actually using at any given time.
The numbers come from direct measurements of production clusters and millions of computing resources before any optimization was applied.
“This is the third year we have published this report. The numbers are worse,” said Laurent Gil, co-founder and president of Cast AI. “CPU utilization fell from 10% to 8%. Memory fell from 23% to 20%.”
The report also measured something called overprovisioning, which is the gap between what workloads actually need and what teams allocate to them.
CPU overprovisioning increased from 40% to 69% year over year, while memory overprovisioning now reaches 79%.
This means that organizations reserve almost twice the CPU and four times the memory their workloads actually consume.
In short, organizations are paying for infrastructure that their workloads aren't even requesting, and the trend is accelerating rather than improving.
The situation becomes even more expensive when directly comparing CPU and GPU costs. An idle CPU core costs just cents per hour, but an idle GPU costs dollars per hour.
For the first time since the launch of EC2 in 2006, GPU prices are rising instead of falling.
In January 2026, AWS raised prices for H200 Capacity Blocks by 15%, citing supply and demand and breaking a two-decade precedent.
“At 5% utilization, the math doesn't work,” the report states. The instinct to hoard makes sense because delivery times are long, but that same hoarding fuels the cycle of scarcity that drives up prices even more.
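The report's arithmetic can be made concrete: at 5% utilization, every useful GPU-hour also carries the cost of the idle 95% around it. The hourly price below is an illustrative assumption, not a figure from the report:

```python
# Effective cost of a useful GPU-hour at low utilization.
# Assumption: the $4.00/hour GPU price is illustrative only;
# the 5% utilization figure is the Cast AI average.

def effective_cost_per_useful_hour(hourly_price: float, utilization: float) -> float:
    """Price paid per hour of GPU time that actually does work."""
    return hourly_price / utilization

list_price = 4.00   # assumed $/hour, not a quoted cloud rate
utilization = 0.05  # 5% average GPU utilization (Cast AI)

print(effective_cost_per_useful_hour(list_price, utilization))  # 80.0: $80 per useful hour
print(1 / utilization)  # 20.0: paying for 20x the capacity actually used
```

The 1/utilization multiplier is also where the "20 times more capacity than necessary" figure comes from: 1 / 0.05 = 20.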
Not all clusters perform this poorly: one organization achieved 49% utilization on H200s and 30% on H100s, well above the 5% average.
The difference comes down to automation rather than luck or better hardware. Tools to fix this problem already exist, including automatic rightsizing, GPU sharing via time-slicing, and Spot instance management.
Most teams never get there, because over-provisioning feels safer than running out of capacity; that safety comes at a high price.
Teams that closed the gap stopped treating resource efficiency as a one-time manual task and started treating it as a continuous, automated process.
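That continuous, automated process can be sketched as a rightsizing loop: derive each workload's resource request from recently observed usage plus headroom, rather than a static allocation. The percentile and headroom factor below are illustrative assumptions, not Cast AI's actual algorithm:

```python
# Minimal rightsizing sketch: recommend a resource request from observed
# usage instead of a static over-provisioned allocation.
# Assumption: a p95-plus-20%-headroom policy (illustrative only).

def recommend_request(usage_samples: list[float], percentile: float = 0.95,
                      headroom: float = 1.2) -> float:
    """Recommend allocation = observed usage percentile * headroom factor."""
    samples = sorted(usage_samples)
    idx = min(int(percentile * len(samples)), len(samples) - 1)
    return samples[idx] * headroom

# CPU usage samples (in cores) for a workload statically allocated 4 cores
usage = [0.4, 0.5, 0.6, 0.5, 0.7, 0.8, 0.6, 0.5, 0.9, 0.6]
print(recommend_request(usage))  # ~1.08 cores, far below the 4-core allocation
```

Run continuously against fresh metrics, a loop like this keeps requests tracking real usage instead of drifting toward permanent over-allocation.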
Yet Cast AI's data suggests that most companies would rather keep paying inflated bills than change their habits.