Claude users have spent months playing a very specific game: how much work can you squeeze out of Opus before the rate limits slam shut?
Anthropic is finally loosening things up. The company says it’s doubling Claude Code limits, removing peak-hour reductions for paid users, and significantly raising Opus API caps. The reason is sitting right there in the same announcement: Anthropic now has access to all of the compute capacity at SpaceX’s Colossus 1 data center. That’s over 220,000 NVIDIA GPUs.
That’s the kind of announcement that makes you realize AI companies aren’t just shipping models anymore. They’re building power infrastructure.
This isn’t really about chatbots anymore
The easiest way to understand Anthropic’s announcement is to look at what people are actually using Claude for now.
A normal chatbot session doesn’t burn through compute like this. You ask a few questions, maybe upload a file, then leave. That’s not the workload forcing companies to secure hundreds of thousands of GPUs.
Claude Code is different. People are leaving it running for hours inside terminals. They’re asking it to refactor projects, debug broken dependencies, analyze huge codebases, write tests, retry failed tasks, call tools, and keep context alive across long sessions. One coding agent can easily consume far more tokens than a casual chat user ever would.
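To make that difference concrete, here’s a toy sketch of the token math. Every number in it is an illustrative assumption, not a measurement; the point is the shape of the cost. An agent resends its whole growing context on every step, so usage compounds instead of staying flat.

```python
# Toy token accounting: a short chat vs. a long-running coding agent.
# Every number here is an illustrative assumption, not a measurement.

def chat_session(turns=5, prompt_tokens=200, reply_tokens=400):
    """A casual chat: a few short turns, small context."""
    total = context = 0
    for _ in range(turns):
        context += prompt_tokens             # user message joins the context
        total += context + reply_tokens      # full context is billed as input each call
        context += reply_tokens              # model reply joins the context too
    return total

def agent_session(steps=200, tool_tokens=1_500, reply_tokens=600, codebase_tokens=30_000):
    """A coding agent: hundreds of steps, big context, tool output fed back in."""
    total = 0
    context = codebase_tokens                # project files loaded up front
    for _ in range(steps):
        total += context + reply_tokens      # every step resends the whole context
        context += tool_tokens + reply_tokens  # tool results and replies accumulate
    return total

print(f"chat:  ~{chat_session():,} tokens")   # ~9,000 tokens
print(f"agent: ~{agent_session():,} tokens")  # tens of millions
```

Under these made-up but plausible settings, the agent session consumes thousands of times more tokens than the chat, which is exactly the gap between a casual user and a terminal left running all afternoon.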
That’s probably why the doubled-limits part of Anthropic’s announcement matters more than it first appears.
For months, Claude had this weird split reputation among developers. The model itself was excellent, especially for coding, but heavy users kept running into walls. Long sessions would suddenly slow down. Opus users learned to ration prompts. Some developers even changed workflows entirely around avoiding rate limits.
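For illustration, here’s the kind of workaround heavy users ended up writing: a retry wrapper that backs off when the API rate-limits a request. It’s a minimal sketch assuming the official `anthropic` Python SDK and its `RateLimitError`; the model id and retry numbers are placeholders, not recommendations.

```python
import time

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_with_backoff(prompt: str, max_retries: int = 5) -> str:
    delay = 2.0
    for _ in range(max_retries):
        try:
            resp = client.messages.create(
                model="claude-opus-4-20250514",  # placeholder model id
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.content[0].text
        except anthropic.RateLimitError:
            # The workaround in code form: wait out the window instead of
            # hammering it, doubling the pause on each failure.
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("still rate-limited after retries")
```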
Now Anthropic is signaling something pretty clearly: it expects usage to keep getting heavier. And honestly, that tracks with where AI tooling is heading. The industry keeps talking about chatbots, but the real compute monster may end up being autonomous agents quietly running in the background for hours at a time.
| Tier | Maximum Input Tokens per Minute | Maximum Output Tokens per Minute |
| --- | --- | --- |
| 1 | 30,000 → 500,000 | 8,000 → 80,000 |
| 2 | 450,000 → 2,000,000 | 90,000 → 200,000 |
| 3 | 800,000 → 5,000,000 | 160,000 → 400,000 |
| 4 | 2,000,000 → 10,000,000 | 400,000 → 800,000 |
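To put those per-minute caps in more familiar terms, here’s a quick back-of-the-envelope conversion to hourly budgets. The figures come straight from the table above; nothing else is assumed.

```python
# Converting the new per-minute caps from the table into hourly budgets.

TIERS = {  # tier: (input tokens/min, output tokens/min) after the increase
    1: (500_000, 80_000),
    2: (2_000_000, 200_000),
    3: (5_000_000, 400_000),
    4: (10_000_000, 800_000),
}

for tier, (itpm, otpm) in TIERS.items():
    print(f"Tier {tier}: ~{itpm * 60:,} input / ~{otpm * 60:,} output tokens per hour")
```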
Those new limits aren’t small bumps either: some Opus tiers are seeing output caps increase by 10x. Which also explains what Anthropic announced right alongside them: a compute partnership with SpaceX.
The real flex wasn’t the higher limits
The company says its SpaceX partnership gives it access to all of the compute capacity at Colossus 1, adding more than 300 megawatts of capacity and over 220,000 NVIDIA GPUs within the month.
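As a rough sanity check, dividing one announced figure by the other gives the power budget per GPU. The division is just arithmetic on the stated numbers; reading the headroom above a single accelerator’s draw as cooling, networking, and the rest of the facility is my assumption.

```python
# Dividing the two announced figures: megawatts of capacity by GPU count.
announced_mw = 300
announced_gpus = 220_000

watts_per_gpu = announced_mw * 1_000_000 / announced_gpus
print(f"~{watts_per_gpu:,.0f} W per GPU")  # roughly 1,360 W each
```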
That’s an absurd amount of compute for what most people still casually describe as “a chatbot.”
And Anthropic isn’t talking like this in isolation anymore. Over the past year, major AI companies have increasingly started announcing infrastructure deals the way they used to announce model launches. Amazon, Google, Microsoft, NVIDIA, xAI: everybody suddenly sounds half software company, half utility provider.
The shift makes sense once you look at where AI usage is heading. Reasoning models are expensive to run. Coding agents stay active for long stretches. Enterprise workloads don’t disappear after a few prompts. The more capable these systems become, the more compute they consume in the background.
A year ago, companies competed on benchmark screenshots. Now they’re competing on who can secure enough GPUs to keep the systems online at scale.
This feels like the next phase of the AI race
For a while, AI companies competed mostly on model quality: better benchmarks, reasoning, coding. But now the conversation is shifting toward who can actually keep these systems running at scale without constantly slamming users into limits.
Anthropic partnering with SpaceX for compute sounds unusual today. A year from now, it probably won’t. Because at this point, the bottleneck isn’t really ideas anymore. It’s power, GPUs, and who can secure enough infrastructure before everyone else does.