Microsoft and Uber Are Running Into an AI Cost Problem

The pitch was impressive. AI tools would make developers faster, reduce headcount costs, and pay for themselves many times over. Companies that moved early would have a structural advantage over those that waited.

Microsoft believed it. So did Uber. Both pushed hard on AI coding tool adoption across their engineering teams. Both are now dealing with the same problem: the faster their employees embraced the tools, the faster the bills grew. In some cases those bills have started to exceed what the same work would have cost with human labor.

The problem is what happens to the economics when thousands of employees use something that charges per unit of thought.

The token trap nobody planned for

AI models charge per token, the basic unit of text the model processes and generates.

When Uber’s CTO disclosed that the company had burned through its entire 2026 AI coding budget in four months, the detail that got less attention was how it happened. Uber had been actively pushing adoption, running internal leaderboards to rank teams by AI tool usage. More encouragement meant more usage. More usage meant more tokens. More tokens meant more compute. The budget math that looked reasonable in January looked catastrophic by April.

Amazon has been telling staff to “tokenmaxx,” meaning use as many tokens as possible. Meta built an internal tracking tool called Claudeonomics to monitor which employees were using AI most heavily. These are companies treating token consumption as a metric to maximize, which is exactly backwards if the goal is cost efficiency.

The paradox is structural. Agentic AI systems, the ones that work autonomously across multiple steps, consume more tokens per task than standard models. Goldman Sachs forecasts a 24-fold increase in enterprise token consumption by 2030 as agentic deployments scale. Gartner projects that inference costs will fall nearly 90% by the same year. But Gartner also warned that cheaper tokens will not produce cheaper bills, because consumption growth will outpace price declines and AI providers are unlikely to pass through the full benefit of cost reductions to business customers.

Cheaper per token. Higher total bill. The more you use it, the worse the math gets.
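A rough back-of-the-envelope sketch makes the arithmetic concrete. It assumes a bill is simply tokens consumed times price per token, and it combines the Goldman Sachs and Gartner figures quoted above even though they come from separate forecasts. The function name and the pass_through parameter (the share of the cost decline that providers actually hand to customers) are hypothetical, introduced only for illustration.

```python
def relative_bill(consumption_multiple: float,
                  cost_decline: float,
                  pass_through: float = 1.0) -> float:
    """Return the future bill as a multiple of today's bill.

    Assumes bill = tokens * price per token.
    consumption_multiple: future token volume relative to today (24 = 24-fold).
    cost_decline: fractional fall in inference cost (0.90 = 90% cheaper).
    pass_through: hypothetical share of that decline customers actually receive.
    """
    price_multiple = 1.0 - cost_decline * pass_through
    return consumption_multiple * price_multiple


# Full pass-through: customers get the entire 90% price cut.
full = relative_bill(24, 0.90)

# Illustrative assumption only: customers see half of the cost decline.
partial = relative_bill(24, 0.90, pass_through=0.5)

# Consumption growth at which a 90% price cut leaves the bill flat.
break_even = 1.0 / (1.0 - 0.90)

print(f"Bill multiple, full pass-through: {full:.1f}x")
print(f"Bill multiple, half pass-through: {partial:.1f}x")
print(f"Break-even consumption growth:    {break_even:.0f}x")
```

Under these assumptions, even a full 90% price cut leaves the bill at roughly 2.4 times today's level, because 24-fold consumption growth is well past the roughly 10-fold growth that such a price cut can absorb. Any shortfall in pass-through widens the gap.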

When compute costs more than the employee

The most uncomfortable acknowledgment of where this is heading came from Bryan Catanzaro, Vice President of applied deep learning at Nvidia, the company that supplies the chips powering essentially all of this infrastructure.

“For my team, the cost of compute is far beyond the costs of the employees,” he said.

That statement carries weight because of who said it. Nvidia has more financial interest in AI compute spending than almost any other company on earth. When its own executive acknowledges that compute costs are exceeding labor costs for his team, it is not a bearish take on AI. It is an honest description of the current economics from someone with no incentive to understate them.

Microsoft’s situation illustrates the same point from a different angle. The company cancelled most of its direct Claude Code licences after thousands of employees adopted the tool faster than anyone anticipated. The move doesn’t touch Microsoft’s $5 billion investment in Anthropic or its commercial relationship with the company. It’s a pure cost control decision on a tool its own engineers had grown to depend on. When the company that built GitHub Copilot, owns the dominant AI coding platform, and made one of the largest AI bets in the industry pulls back on AI coding spend, the economics are the only explanation that makes sense.

Where the math actually works

MIT research found AI is only economically viable in a limited number of job roles at current pricing. The tasks where it clears the bar tend to share common characteristics: well-defined scope, high repetition, low need for judgment across long sessions. Boilerplate generation, test scaffolding, documentation, straightforward refactors. Tasks where a developer might spend twenty minutes doing something mechanical and the AI does it in thirty seconds.

The tasks where the math breaks down are the ones that require sustained context, iterative judgment, and long agentic sessions. Those are also the tasks the industry has been most aggressively promoting AI for. The gap between where AI is cost-effective and where it is being deployed is where the Microsoft and Uber problem lives.

AI coding tools are currently better described as expensive productivity multipliers for specific task types than as wholesale replacements for engineering labor costs. The companies that figure out how to use them right, rather than encouraging blanket maximum adoption, will likely see the economics work. The ones that ran internal leaderboards rewarding token consumption are learning that lesson the hard way.

The bill is coming due

AI was sold as the great labor cost reduction play. The early returns from two companies that believed that part hardest suggest the reality is more complicated.

The tools work. The economics at scale don’t, at least not yet. Cheaper tokens haven’t produced cheaper bills. Encouraged adoption has produced budget crises. And an executive at the company with the most to gain from AI compute spending just said his team’s compute costs exceed what it spends on its employees.

Jensen Huang has said he imagines 100 AI agents working alongside every human employee at Nvidia one day. That future may still arrive. But if token consumption keeps rising faster than unit costs fall, it will arrive with a price tag nobody has fully reckoned with yet. Microsoft and Uber just got the first invoice.
