Google’s Next AI Bet Isn’t on Chatbots. It’s on Agents That Do the Work.

For the last three years, Google has been playing catch-up in the chatbot race. ChatGPT arrived, Gemini followed, and the conversation quickly became about which AI could answer questions better, faster, and more accurately.

Google I/O this week suggested the company is done competing on chat alone.

Gemini 3.5 Flash launched Tuesday, and Google barely framed it as a conversational product. Instead, the company focused on coding pipelines, autonomous research, multi-agent coordination, and one demo that stood out across the industry: building an operating system from scratch with minimal human input.

The model can reportedly operate autonomously for hours. Google says it’s up to 4× faster than other frontier models, with an optimized version reaching up to 12× the speed at similar quality.

What 3.5 Flash is built for

The speed numbers Google is citing aren’t just marketing. They reflect architectural decisions that only make sense if you’re building for agents rather than conversations.

[Figure: Gemini 3.5 evaluation chart]

A chatbot doesn’t need to be 12× faster than other frontier models. A response that takes two seconds instead of 24 seconds doesn’t meaningfully change the experience of asking a question and reading an answer. But in an agentic workflow, where multiple AI instances run in parallel on different components of the same task and each one makes many model calls in sequence, latency compounds. Slow agents create bottlenecks. Fast agents create throughput.
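To put rough numbers on the compounding, here is a back-of-the-envelope sketch in Python. The 2-second and 24-second figures are the illustrative ones from the paragraph above, not measurements, and the step count of 200 sequential model calls is a hypothetical chosen only to show the scale.

```python
def chain_seconds(steps: int, seconds_per_call: float) -> float:
    """Total wall-clock time for an agent making `steps` model calls one after another."""
    return steps * seconds_per_call


STEPS = 200  # hypothetical number of sequential model calls in one agentic task

slow = chain_seconds(STEPS, 24.0)  # 24 s per call
fast = chain_seconds(STEPS, 2.0)   # 2 s per call

print(f"slow: {slow / 60:.0f} min, fast: {fast / 60:.1f} min")
```

Under these assumptions the same task takes 80 minutes at 24 seconds per call and under 7 minutes at 2 seconds per call. The ratio is still 12×, but the difference is now the gap between waiting on a result and getting on with the next one.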

Gemini 3.5 Flash was co-developed with Antigravity, Google’s agentic development platform, specifically so agents would have what DeepMind’s chief technologist Koray Kavukcuoglu described as “a native environment where they can live, work, and execute.” That’s a different design philosophy than building a model and then figuring out what to do with it afterward. The model and the environment were built together with agents in mind from the start.

The benchmarks back the direction. Kavukcuoglu told reporters ahead of I/O that 3.5 Flash outperforms Gemini 3.1 Pro on nearly all benchmarks including coding, agentic tasks, and multimodal reasoning. A Flash model beating the previous generation’s Pro model on capability benchmarks while being significantly faster is the kind of result that makes the agentic bet look credible rather than aspirational.

The OS demo

The demonstration that got the most attention at I/O was Google engineer Varun Mohan showing agents spinning up inside Antigravity to work on separate components before coming together to build a full operating system.

It’s easy to dismiss demos like this. Labs have been staging impressive controlled environments for years and the gap between what works in a keynote and what works in production is well documented.

What makes this one worth paying attention to is the coordination pattern. Multiple agents running simultaneously on distinct subtasks, merging outputs into a coherent whole. That’s the architecture that makes long-horizon agentic work possible. A single agent working sequentially hits context limits and coherence problems on complex tasks. A fleet of specialized agents working in parallel and combining results is a fundamentally different approach.
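The coordination pattern described here is often called fan-out/fan-in. A minimal sketch using Python’s standard `asyncio` library follows; it is an illustration of the general pattern, not Antigravity’s implementation. `run_agent` and `build` are hypothetical names, and the sleep stands in for a real model call.

```python
import asyncio


async def run_agent(subtask: str) -> str:
    """Hypothetical stand-in for one agent; a real one would call a model."""
    await asyncio.sleep(0.01)  # simulate model latency
    return f"{subtask}: done"


async def build(subtasks: list[str]) -> dict[str, str]:
    # Fan out: every subtask gets its own agent, all running concurrently.
    results = await asyncio.gather(*(run_agent(s) for s in subtasks))
    # Fan in: merge the per-agent outputs into one structure.
    # gather() returns results in the same order as its inputs.
    return dict(zip(subtasks, results))


if __name__ == "__main__":
    merged = asyncio.run(build(["scheduler", "filesystem", "shell"]))
    print(merged)
```

The hard part in practice is the merge step, which here is a trivial dictionary. Combining independently produced components into something coherent is where a real system would spend most of its effort.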

Google says 3.5 Flash is already producing results for partners outside the demo environment: banks and fintechs automating multi-week workflows, and data science teams surfacing insights in complex environments. These are production claims, and the next few months will show whether they hold up.

How Pro and Flash work together

Google’s senior director Tulsee Doshi framed it clearly. Pro becomes the orchestrator, the model doing high-level planning, reasoning through what needs to happen and in what order. Flash becomes the executor, the sub-agents carrying out specific tasks at speed. The reasoning power sits at the top of the hierarchy where it’s needed. The brute force tool use sits at the execution layer where throughput matters more than deliberation.

That’s a meaningful architecture for anyone building serious agentic systems. You’re not choosing between a smart slow model and a fast capable one. You’re using both in the roles they’re actually suited for. Pro thinks, Flash does.
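As a rough illustration of that split, the sketch below calls a planner once to produce ordered steps and an executor once per step. It assumes nothing about Google’s APIs: `Model`, `run_task`, and the two stub functions are hypothetical, and the stubs exist only so the example runs without any network access.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


def run_task(goal: str, planner: Model, executor: Model) -> list[str]:
    # The planner (the "Pro" role) is called once to decide what happens, in order.
    plan = planner(f"Break into steps, one per line: {goal}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    # The executor (the "Flash" role) is called once per step.
    return [executor(step) for step in steps]


# Stub models so the sketch runs without any API.
def fake_planner(prompt: str) -> str:
    return "write parser\nwrite tests\n"


def fake_executor(step: str) -> str:
    return f"[done] {step}"


print(run_task("ship a CSV parser", fake_planner, fake_executor))
```

The point of the shape is the call counts: one expensive planning call per task, many cheap execution calls per plan, which is why speed matters most at the lower tier.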

3.5 Flash is available today through Antigravity, the Gemini API, Gemini Enterprise, the Gemini app, and AI Mode in Search globally. It’s also the model powering Gemini Spark, Google’s new personal agent designed to run continuously helping users manage their digital life. 3.5 Pro doesn’t have a release date yet.

The part Google didn’t lead with

Autonomous agents that run for hours, spawn sub-agents, and execute multi-step workflows without human input are genuinely useful. They’re also a category of technology that makes the safety question harder than it was when AI was just answering questions.

Google is currently facing a lawsuit after a man who had extended conversations with Gemini nearly carried out a mass-casualty event and died by suicide. That case involved a chatbot. The implications compound when the same underlying model is running autonomously for hours with access to tools, code execution, and real systems.

Google says Gemini 3.5 has strengthened safeguards around cyber threats and CBRN (chemical, biological, radiological, and nuclear) risks. It has also been calibrated to engage with sensitive questions rather than refuse them outright, which is a reasonable way to make the model more useful but creates its own tradeoffs.

The model will pause and ask for human input when it hits decision points or permission issues that require judgment. That’s a meaningful design choice: it keeps humans in the loop at the moments that matter most. Whether that’s sufficient for the level of autonomy Google is describing is a question the industry hasn’t fully answered yet.
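One common way to build this kind of checkpoint is an approval gate in the agent’s action loop. The sketch below is a generic illustration, not a description of how Gemini does it. `Action`, `needs_approval`, and `run_actions` are hypothetical names, and skipping a denied action (rather than stopping the whole run) is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    needs_approval: bool  # hypothetical flag set by the agent or by policy


def run_actions(actions: list[Action], approve: Callable[[Action], bool]) -> list[str]:
    """Run actions in order, pausing for a human decision on flagged ones."""
    log = []
    for action in actions:
        # Flagged actions run only if the human-supplied callback says yes.
        if action.needs_approval and not approve(action):
            log.append(f"skipped {action.name}")
            continue
        log.append(f"ran {action.name}")
    return log
```

In an interactive setting, `approve` could be something like `lambda a: input(f"Allow {a.name}? [y/n] ") == "y"`. The open question the paragraph raises is which actions get flagged in the first place, and that is a policy decision the code cannot answer.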

Gemini Spark, the personal agent running 24/7 to help consumers manage their digital lives, brings this question closest to home. Most people using Spark won’t think about autonomous agents or safety architecture. They’ll just have something running continuously in the background with access to their calendar, email, and files. What happens when something goes wrong is a story that hasn’t been written yet.

Google is moving fast. That’s the point. The responsibility that comes with that speed is the part I/O didn’t spend much time on.
