Microsoft and Uber Are Running Into an AI Cost Problem

The pitch was impressive. AI tools would make developers faster, reduce headcount costs, and pay for themselves many times over. Companies that moved early would have a structural advantage over those that waited.

Microsoft believed it. So did Uber. Both pushed hard on AI coding tool adoption across their engineering teams. Both are now dealing with the same problem: the faster their employees embraced the tools, the faster the bills grew. In some cases those bills have started to exceed what the same work would have cost with human labor.

The problem is what happens to the economics when thousands of employees use something that charges per unit of thought.

The token trap nobody planned for

AI models charge per token, the basic unit of text the model processes and generates.

When Uber’s CTO disclosed that the company had burned through its entire 2026 AI coding budget in four months, the detail that got less attention was how it happened. Uber had been actively pushing adoption, running internal leaderboards to rank teams by AI tool usage. More encouragement meant more usage. More usage meant more tokens. More tokens meant more compute. The budget math that looked reasonable in January looked catastrophic by April.

Amazon has been telling staff to “tokenmaxx,” meaning use as many tokens as possible. Meta built an internal tracking tool called Claudeonomics to monitor which employees were using AI most heavily. These are companies treating token consumption as a metric to maximize, which is exactly backwards if the goal is cost efficiency.

The paradox is structural. Agentic AI systems, the ones that work autonomously across multiple steps, consume more tokens per task than standard models. Goldman Sachs forecasts a 24-fold increase in enterprise token consumption by 2030 as agentic deployments scale. Gartner projects that inference costs will fall nearly 90% by the same year. But Gartner also warned that cheaper tokens will not produce cheaper bills, because consumption growth will outpace price declines and AI providers are unlikely to pass through the full benefit of cost reductions to business customers.

Cheaper per token. Higher total bill. The more you use it, the worse the math gets.
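
To make that arithmetic concrete, here is a rough sketch that combines the two forecasts quoted above: 24 times the tokens, at a per-token cost that falls by about 90%. The function name and the `pass_through` parameter are illustrative assumptions, not something Goldman Sachs or Gartner published, and the 0.5 pass-through value in the second call is purely hypothetical.

```python
def projected_bill_multiplier(consumption_growth, cost_decline, pass_through=1.0):
    """Estimate how much the total bill changes relative to today.

    consumption_growth: factor by which token usage grows (24 in the forecast quoted above)
    cost_decline: fractional drop in provider inference cost (0.90 for "nearly 90%")
    pass_through: HYPOTHETICAL share of that cost drop that reaches customer prices
    """
    if not 0.0 <= pass_through <= 1.0:
        raise ValueError("pass_through must be between 0 and 1")
    # Price per token the customer pays, relative to today's price.
    price_multiplier = 1.0 - cost_decline * pass_through
    # Total bill = tokens consumed x price per token.
    return consumption_growth * price_multiplier


# Best case: the full 90% cost decline shows up in customer prices.
full = projected_bill_multiplier(24, 0.90)

# Hypothetical case: only half of the decline is passed through.
partial = projected_bill_multiplier(24, 0.90, pass_through=0.5)

print(f"Full pass-through: bill grows about {full:.1f}x")
print(f"Half pass-through (assumed): bill grows about {partial:.1f}x")
```

Even in the best case, where every bit of the cost decline reaches the customer, the bill still lands at roughly 2.4 times today's level. Any shortfall in pass-through pushes it higher.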

When compute costs more than the employee

The most uncomfortable acknowledgment of where this is heading came from Bryan Catanzaro, Vice President of applied deep learning at Nvidia, the company that supplies the chips powering essentially all of this infrastructure.

“For my team, the cost of compute is far beyond the costs of the employees,” he said.

That statement carries weight because of who said it. Nvidia has more financial interest in AI compute spending than almost any other company on earth. When its own executive acknowledges that compute costs are exceeding labor costs for his team, it is not a bearish take on AI. It is an honest description of the current economics from someone with no incentive to understate them.

Microsoft’s situation illustrates the same point from a different angle. The company cancelled most of its direct Claude Code licences after thousands of employees adopted the tool faster than anyone anticipated. The move doesn’t touch Microsoft’s $5 billion investment in Anthropic or its commercial relationship with the company. It’s a pure cost control decision on a tool its own engineers had grown to depend on. When the company that built GitHub Copilot, owns the dominant AI coding platform, and made one of the largest AI bets in the industry pulls back on AI coding spend, the economics are the only explanation that makes sense.

Where the math actually works

MIT research found AI is only economically viable in a limited number of job roles at current pricing. The tasks where it clears the bar tend to share common characteristics: well-defined scope, high repetition, low need for judgment across long sessions. Boilerplate generation, test scaffolding, documentation, straightforward refactors. Tasks where a developer might spend twenty minutes doing something mechanical and the AI does it in thirty seconds.

The tasks where the math breaks down are the ones that require sustained context, iterative judgment, and long agentic sessions. Those are also the tasks the industry has been most aggressively promoting AI for. The gap between where AI is cost-effective and where it is being deployed is where the Microsoft and Uber problem lives.
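
One way to reason about that gap is a break-even check: how many tokens can a task consume before it costs more than the developer time it replaces? The sketch below is a simplification under stated assumptions. The hourly cost, the token price, and the token counts are all hypothetical placeholders, not figures from Microsoft, Uber, or the MIT research.

```python
def break_even_tokens(dev_minutes, dev_hourly_cost, price_per_million_tokens):
    """Return the token count at which the AI cost equals the human labor cost."""
    human_cost = dev_minutes * dev_hourly_cost / 60
    return human_cost / price_per_million_tokens * 1_000_000


def ai_is_cheaper(tokens_used, dev_minutes, dev_hourly_cost, price_per_million_tokens):
    """True when the task's token bill is below the cost of the developer time."""
    limit = break_even_tokens(dev_minutes, dev_hourly_cost, price_per_million_tokens)
    return tokens_used < limit


# All inputs below are HYPOTHETICAL, chosen only to illustrate the comparison.
HOURLY_COST = 90.0       # assumed fully loaded developer cost, dollars per hour
PRICE_PER_MILLION = 10.0  # assumed blended token price, dollars per million tokens

# A twenty-minute mechanical task that takes a small number of tokens.
short_task = ai_is_cheaper(50_000, 20, HOURLY_COST, PRICE_PER_MILLION)

# A long agentic session replacing four hours of work but consuming far more tokens.
long_session = ai_is_cheaper(60_000_000, 240, HOURLY_COST, PRICE_PER_MILLION)

print("Short mechanical task cheaper with AI:", short_task)
print("Long agentic session cheaper with AI:", long_session)
```

The comparison ignores review time, retries, and quality differences, so it is a lower bound on the real cost of the AI path. The point is only that the answer depends on tokens per task, not on the price per token alone.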

AI coding tools are currently better described as expensive productivity multipliers for specific task types than as wholesale replacements for engineering labor costs. The companies that figure out how to use them right, rather than encouraging blanket maximum adoption, will likely see the economics work. The ones that ran internal leaderboards rewarding token consumption are learning that lesson the hard way.

The bill is coming due

AI was sold as the great labor cost reduction play. The early returns from two companies that believed that part hardest suggest the reality is more complicated.

The tools work. The economics at scale don’t, at least not yet. Cheaper tokens haven’t produced cheaper bills. Encouraged adoption has produced budget crises. And the executive most invested in AI compute spending just admitted his compute costs exceed his payroll.

Jensen Huang has said he imagines 100 AI agents working alongside every human employee at Nvidia one day. That future may still arrive. But if token consumption keeps rising faster than unit costs fall, it will arrive with a price tag nobody has fully reckoned with yet. Microsoft and Uber just got the first invoice.
