For the last three years, Google has been playing catch-up in the chatbot race. ChatGPT arrived, Gemini followed, and the conversation quickly became about which AI could answer questions better, faster, and more accurately.
Google I/O this week suggested the company is done competing on chat alone.
Gemini 3.5 Flash launched Tuesday, and Google barely framed it as a conversational product. Instead, the company focused on coding pipelines, autonomous research, multi-agent coordination, and one demo that stood out across the industry: building an operating system from scratch with minimal human input.
The model can reportedly operate autonomously for hours. Google says it’s up to 4× faster than other frontier models, with an optimized version reaching 12× faster speeds at similar quality.
What 3.5 Flash is built for
The speed numbers Google is citing aren’t just marketing. They reflect architectural decisions that only make sense if you’re building for agents rather than conversations.

A chatbot doesn’t need to be 12× faster. A response that takes two seconds instead of 24 seconds doesn’t meaningfully change the experience of asking a question and reading an answer. But in an agentic workflow, where each agent makes many model calls in sequence and multiple AI instances run in parallel on different components of the same task, latency compounds. Slow agents create bottlenecks. Fast agents create throughput.
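To put rough numbers on how latency compounds, here is a back-of-the-envelope sketch in Python. The 2-second and 24-second response times are the ones used above; the 50-call chain length is an assumption chosen purely for illustration, not a figure from Google.

```python
def chain_seconds(steps: int, seconds_per_call: float) -> float:
    """Wall-clock time for one agent making `steps` model calls back to back."""
    return steps * seconds_per_call


def tasks_per_hour(seconds_per_task: float) -> float:
    """How many such chains a single agent can finish in an hour."""
    return 3600 / seconds_per_task


STEPS = 50  # assumed chain length, for illustration only

fast = chain_seconds(STEPS, 2.0)   # 2 s per call
slow = chain_seconds(STEPS, 24.0)  # 24 s per call

print(f"fast: {fast / 60:.1f} min per task, {tasks_per_hour(fast):.0f} tasks/hour")
print(f"slow: {slow / 60:.1f} min per task, {tasks_per_hour(slow):.0f} tasks/hour")
```

Under those assumptions the same task takes under two minutes at the fast rate and 20 minutes at the slow one, which is the gap between an agent you wait for and one you walk away from.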
Gemini 3.5 Flash was co-developed with Antigravity, Google’s agentic development platform, specifically so agents would have what DeepMind’s chief technologist Koray Kavukcuoglu described as “a native environment where they can live, work, and execute.” That’s a different design philosophy than building a model and then figuring out what to do with it afterward. The model and the environment were built together with agents in mind from the start.
The benchmarks back the direction. Kavukcuoglu told reporters ahead of I/O that 3.5 Flash outperforms Gemini 3.1 Pro on nearly all benchmarks including coding, agentic tasks, and multimodal reasoning. A Flash model beating the previous generation’s Pro model on capability benchmarks while being significantly faster is the kind of result that makes the agentic bet look credible rather than aspirational.
The OS demo
The demonstration that got the most attention at I/O was Google engineer Varun Mohan showing agents spawning off inside Antigravity to work on separate components before coming together to build a full operating system.
It’s easy to dismiss demos like this. Labs have been staging impressive controlled environments for years and the gap between what works in a keynote and what works in production is well documented.
What makes this one worth paying attention to is the coordination pattern. Multiple agents running simultaneously on distinct subtasks, merging outputs into a coherent whole. That’s the architecture that makes long-horizon agentic work possible. A single agent working sequentially hits context limits and coherence problems on complex tasks. A fleet of specialized agents working in parallel and combining results is a fundamentally different approach.
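The coordination pattern described here is often called fan-out/fan-in. The sketch below shows its general shape with Python’s asyncio. It is not Antigravity’s API: `run_agent`, `build`, and the component names are hypothetical stand-ins, and the “agent” only sleeps briefly instead of calling a model.

```python
import asyncio


async def run_agent(subtask: str) -> str:
    """Hypothetical stand-in for one agent; a real one would call a model and tools."""
    await asyncio.sleep(0.01)  # simulated work
    return f"{subtask}: done"


async def build(subtasks: list[str]) -> dict[str, str]:
    # Fan out: start every agent at once. gather() returns results in input order.
    results = await asyncio.gather(*(run_agent(s) for s in subtasks))
    # Fan in: merge the per-agent outputs into one structure.
    return dict(zip(subtasks, results))


components = ["kernel", "filesystem", "shell"]  # hypothetical subtasks
merged = asyncio.run(build(components))
print(merged)
```

Because the agents run concurrently, total time is roughly that of the slowest subtask rather than the sum of all of them. The hard part in practice is the merge step, which here is a trivial dictionary.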
Google says 3.5 Flash is already producing real results for partners outside the demo environment. Banks and fintechs automating multi-week workflows. Data science teams surfacing insights in complex environments. These are production claims, and whether they hold up will become clear over the next few months.
How Pro and Flash work together
Google’s senior director Tulsee Doshi framed it clearly. Pro becomes the orchestrator, the model doing high-level planning, reasoning through what needs to happen and in what order. Flash becomes the executor, the sub-agents carrying out specific tasks at speed. The reasoning power sits at the top of the hierarchy where it’s needed. The brute force tool use sits at the execution layer where throughput matters more than deliberation.
That’s a meaningful architecture for anyone building serious agentic systems. You’re not choosing between a smart slow model and a fast capable one. You’re using both in the roles they’re actually suited for. Pro thinks, Flash does.
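As a rough illustration of that split, the sketch below separates a planner role from an executor role. Everything in it is hypothetical: `plan` and `execute` are hard-coded stand-ins for what would be calls to a slower reasoning model and a faster model, and none of the names come from Google’s API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    instruction: str


def plan(goal: str) -> list[Step]:
    """Orchestrator role: decide what happens and in what order (hard-coded stand-in)."""
    return [
        Step("gather", f"collect inputs for: {goal}"),
        Step("draft", f"produce a first version of: {goal}"),
        Step("check", f"verify the result of: {goal}"),
    ]


def execute(step: Step) -> str:
    """Executor role: carry out one narrow step quickly (stand-in)."""
    return f"[{step.name}] ok"


def run(
    goal: str,
    planner: Callable[[str], list[Step]] = plan,
    executor: Callable[[Step], str] = execute,
) -> list[str]:
    # One planning call up front, then many cheap execution calls.
    return [executor(step) for step in planner(goal)]


print(run("quarterly report"))
```

The design point is the call ratio: the expensive model is consulted once per goal, while the fast model is invoked once per step, so execution speed dominates total running time.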
3.5 Flash is available today through Antigravity, the Gemini API, Gemini Enterprise, the Gemini app, and AI Mode in Search globally. It’s also the model powering Gemini Spark, Google’s new personal agent designed to run continuously helping users manage their digital life. 3.5 Pro doesn’t have a release date yet.
The part Google didn’t lead with
Autonomous agents that run for hours, spawn sub-agents, and execute multi-step workflows without human input are genuinely useful. They’re also a category of technology that makes the safety question harder than it was when AI was just answering questions.
Google is currently facing a lawsuit after a man who had extended conversations with Gemini nearly carried out a mass casualty event and died by suicide. That case involved a chatbot. The implications compound when the same underlying model is running autonomously for hours with access to tools, code execution, and real systems.
Google says Gemini 3.5 has strengthened safeguards around cyber threats and CBRN (chemical, biological, radiological, and nuclear) risks. It’s also been calibrated to engage with sensitive questions rather than refuse them outright, which is a reasonable approach to making the model more useful but creates its own tradeoffs.
The model will pause and ask for human input when it hits decision points or permission issues that require judgment. That’s a meaningful design choice: keeping humans in the loop at the moments that matter most. Whether that’s sufficient for the level of autonomy Google is describing is a question the industry hasn’t fully answered yet.
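A pause-for-approval gate of this kind can be sketched in a few lines. This is an illustration of the general idea, not a description of how Gemini implements it: the action names, the `SENSITIVE` set, and the `approve` callback are all assumptions.

```python
from typing import Callable

# Hypothetical actions that should never run without a human decision.
SENSITIVE = {"delete_file", "send_payment"}


def run_actions(actions: list[str], approve: Callable[[str], bool]) -> list[str]:
    """Run actions in order, stopping at the first sensitive one that is not approved."""
    log = []
    for action in actions:
        if action in SENSITIVE and not approve(action):
            log.append(f"{action}: blocked, waiting for human")
            break  # pause the whole run at the decision point
        log.append(f"{action}: executed")
    return log


queue = ["read_file", "delete_file", "write_report"]
print(run_actions(queue, approve=lambda action: False))
print(run_actions(queue, approve=lambda action: True))
```

Even in this toy version the open question is visible: the gate is only as good as the list of actions someone thought to mark as sensitive.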
Gemini Spark, the personal agent running 24/7 to help consumers manage their digital lives, brings this question closest to home. Most people using Spark won’t think about autonomous agents or safety architecture. They’ll just have something running continuously in the background with access to their calendar, email, and files. What that looks like when something goes wrong hasn’t been written yet.
Google is moving fast. That’s the point. The responsibility that comes with that speed is the part I/O didn’t spend much time on.




