Google’s Next AI Bet Isn’t on Chatbots. It’s on Agents That Do the Work.

For the last three years, Google has been playing catch-up in the chatbot race. ChatGPT arrived, Gemini followed, and the conversation quickly became about which AI could answer questions better, faster, and more accurately.

Google I/O this week suggested the company is done competing on chat alone.

Gemini 3.5 Flash launched Tuesday, and Google barely framed it as a conversational product. Instead, the company focused on coding pipelines, autonomous research, multi-agent coordination, and one demo that stood out across the industry: building an operating system from scratch with minimal human input.

The model can reportedly operate autonomously for hours. Google says it’s up to 4× faster than other frontier models, with an optimized version reaching 12× faster speeds at similar quality.

What 3.5 Flash is built for

The speed numbers Google is citing aren’t just marketing. They reflect architectural decisions that only make sense if you’re building for agents rather than conversations.

Figure: Gemini 3.5 evaluation chart.

A chatbot doesn’t need to be 12× faster than competing models. A response that takes two seconds instead of 24 doesn’t meaningfully change the experience of asking a question and reading an answer. But in an agentic workflow, where multiple AI instances run in parallel on different components of the same task and each step waits on the one before it, latency compounds. Slow agents create bottlenecks. Fast agents create throughput.
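To make the compounding concrete, here is a back-of-the-envelope sketch. The 2-second and 24-second figures come from the comparison above; the 200-step chain length is an assumed number picked for illustration, not something Google reported.

```python
# Rough latency arithmetic for a chain of dependent agent steps.
# STEPS is an assumption for illustration; only the 2 s / 24 s pair is from the text.

def chain_seconds(steps: int, seconds_per_call: float) -> float:
    """Total wall-clock time when each step must wait for the previous one."""
    return steps * seconds_per_call

STEPS = 200  # hypothetical number of sequential model calls in one task

fast_total = chain_seconds(STEPS, 2.0)
slow_total = chain_seconds(STEPS, 24.0)

print(f"fast: {fast_total / 60:.1f} min")  # fast: 6.7 min
print(f"slow: {slow_total / 60:.1f} min")  # slow: 80.0 min
```

A 22-second gap that is barely noticeable in a single chat turn becomes more than an hour of difference once the calls are chained.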

Gemini 3.5 Flash was co-developed with Antigravity, Google’s agentic development platform, specifically so agents would have what DeepMind’s chief technologist Koray Kavukcuoglu described as “a native environment where they can live, work, and execute.” That’s a different design philosophy than building a model and then figuring out what to do with it afterward. The model and the environment were built together with agents in mind from the start.

The benchmarks back the direction. Kavukcuoglu told reporters ahead of I/O that 3.5 Flash outperforms Gemini 3.1 Pro on nearly all benchmarks including coding, agentic tasks, and multimodal reasoning. A Flash model beating the previous generation’s Pro model on capability benchmarks while being significantly faster is the kind of result that makes the agentic bet look credible rather than aspirational.

The OS demo

The demonstration that got the most attention at I/O was Google engineer Varun Mohan showing agents spawning off inside Antigravity to work on separate components before coming together to build a full operating system.

It’s easy to dismiss demos like this. Labs have been staging impressive controlled environments for years and the gap between what works in a keynote and what works in production is well documented.

What makes this one worth paying attention to is the coordination pattern. Multiple agents running simultaneously on distinct subtasks, merging outputs into a coherent whole. That’s the architecture that makes long-horizon agentic work possible. A single agent working sequentially hits context limits and coherence problems on complex tasks. A fleet of specialized agents working in parallel and combining results is a fundamentally different approach.
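The fan-out-and-merge pattern described here can be sketched with Python's standard `asyncio` module. This is a minimal illustration, not how Antigravity works internally: `run_agent` is a hypothetical stand-in that sleeps instead of calling a model, and the component names are made up for the example.

```python
import asyncio

async def run_agent(subtask: str, seconds: float) -> str:
    """Hypothetical stand-in for one agent; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{subtask}: done"

async def build(subtasks: list[str]) -> list[str]:
    # Fan out: start every agent at once, then wait for all of them.
    # gather() returns results in the order the subtasks were passed in.
    return await asyncio.gather(*(run_agent(s, 0.01) for s in subtasks))

def merge(parts: list[str]) -> str:
    """Merge step: combine the per-agent outputs into one artifact."""
    return "\n".join(parts)

subtasks = ["kernel", "filesystem", "shell"]  # illustrative component names
result = merge(asyncio.run(build(subtasks)))
print(result)
```

Because the agents run concurrently, total time tracks the slowest subtask rather than the sum of all of them. The hard part in practice is the merge step, which this sketch reduces to joining strings.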

Google says 3.5 Flash is already producing real results for partners outside the demo environment: banks and fintechs automating multi-week workflows, and data science teams surfacing insights in complex environments. Those are production claims, and the next few months will show whether they hold up.

How Pro and Flash work together

Google’s senior director Tulsee Doshi framed it clearly. Pro becomes the orchestrator, the model doing high-level planning, reasoning through what needs to happen and in what order. Flash becomes the executor, the sub-agents carrying out specific tasks at speed. The reasoning power sits at the top of the hierarchy where it’s needed. The brute force tool use sits at the execution layer where throughput matters more than deliberation.

That’s a meaningful architecture for anyone building serious agentic systems. You’re not choosing between a smart slow model and a fast capable one. You’re using both in the roles they’re actually suited for. Pro thinks, Flash does.
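As a rough sketch of that role split, the example below separates a planner from an executor. Everything in it is hypothetical: `plan` and `execute` are stubs standing in for calls to a reasoning model and a fast model, and the step names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    instruction: str

def plan(goal: str) -> list[Step]:
    """Orchestrator role (the 'Pro' tier): decide what happens and in what order.
    Hypothetical stub; a real planner would call a reasoning model here."""
    return [
        Step("collect", f"gather inputs for: {goal}"),
        Step("transform", f"process inputs for: {goal}"),
        Step("report", f"summarize results for: {goal}"),
    ]

def execute(step: Step) -> str:
    """Executor role (the 'Flash' tier): carry out one narrow step quickly.
    Hypothetical stub; a real executor would call a fast model with tools."""
    return f"[{step.name}] ok"

def run(goal: str) -> list[str]:
    # The planner is called once; the executor is called once per step.
    return [execute(step) for step in plan(goal)]

outputs = run("monthly reconciliation")
print(outputs)  # ['[collect] ok', '[transform] ok', '[report] ok']
```

The design point is the call ratio: one expensive planning call, many cheap execution calls, which is why throughput matters most at the execution layer.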

3.5 Flash is available today through Antigravity, the Gemini API, Gemini Enterprise, the Gemini app, and AI Mode in Search globally. It’s also the model powering Gemini Spark, Google’s new personal agent, which is designed to run continuously and help users manage their digital lives. 3.5 Pro doesn’t have a release date yet.

The part Google didn’t lead with

Autonomous agents that run for hours, spawn sub-agents, and execute multi-step workflows without human input are genuinely useful. They’re also a category of technology that makes the safety question harder than it was when AI was just answering questions.

Google is currently facing a lawsuit after a man nearly committed a mass casualty event and died by suicide following extended conversations with Gemini. That case involved a chatbot. The implications compound when the same underlying model is running autonomously for hours with access to tools, code execution, and real systems.

Google says Gemini 3.5 has strengthened safeguards around cyber threats and CBRN (chemical, biological, radiological, and nuclear) risks. It has also been calibrated to engage with sensitive questions rather than refuse them outright, which is a reasonable approach to making the model more useful but creates its own tradeoffs.

The model will pause and ask for human input when it hits decision points or permission issues that require judgment. That’s a meaningful design choice — keeping humans in the loop at the moments that matter most. Whether that’s sufficient for the level of autonomy Google is describing is a question the industry hasn’t fully answered yet.
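One common way to implement this kind of pause is an approval gate in front of sensitive actions. The sketch below is an assumption about the general pattern, not a description of Google's mechanism: the `NEEDS_APPROVAL` set, the action names, and the `approve` callback are all hypothetical.

```python
from typing import Callable

# Hypothetical set of actions that require a human decision; not from Google's docs.
NEEDS_APPROVAL = {"delete_file", "send_payment"}

def run_action(action: str, approve: Callable[[str], bool]) -> str:
    """Run an action, pausing for approval first when the action is sensitive."""
    if action in NEEDS_APPROVAL and not approve(action):
        return f"{action}: skipped (not approved)"
    return f"{action}: executed"

def deny_all(action: str) -> bool:
    """Stand-in for a human who declines every request."""
    return False

def allow_all(action: str) -> bool:
    """Stand-in for a human who approves every request."""
    return True

log = [run_action(a, deny_all) for a in ["read_file", "delete_file"]]
print(log)  # ['read_file: executed', 'delete_file: skipped (not approved)']
```

In an interactive setting the `approve` callback could prompt a person instead of returning a fixed answer. The open question raised above still applies: the gate only helps if the list of sensitive actions is the right one.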

Gemini Spark, the personal agent running 24/7 to help consumers manage their digital lives, brings this question closest to home. Most people using Spark won’t think about autonomous agents or safety architecture. They’ll just have something running continuously in the background with access to their calendar, email, and files. What that looks like when something goes wrong hasn’t been written yet.

Google is moving fast. That’s the point. The responsibility that comes with that speed is the part I/O didn’t spend much time on.
