back to top
HomeTechMistral Just Replaced Three of Its Own Models With One. Meet Medium...

Mistral Just Replaced Three of Its Own Models With One. Meet Medium 3.5

- Advertisement -

Mistral has been shipping specialized models for a while now. One for coding. One for reasoning. One for chat. Each one doing its thing separately and requiring a different deployment decision.

Medium 3.5 ends that confusion. One 128B dense model, one set of weights, handling instruction following, reasoning, and coding together. Mistral didn’t just release a new model, they retired three existing ones to make room for it. Devstral 2, Magistral and even Medium 3.1 is gone. Medium 3.5 is what replaced all of them.

That’s either a sign of real confidence or a very expensive consolidation bet. Looking at the benchmarks, it’s starting to look like the former.

What they actually built

128B parameters, dense architecture, no mixture of experts routing. That’s a deliberate choice in an era where everyone is going sparse. Dense means predictable, same compute per token every time, no routing overhead, no expert load balancing headaches in production.

The context window sits at 256K tokens. Vision is built in. Mistral trained the encoder from scratch to handle variable image sizes and aspect ratios rather than bolting on an existing one. Reasoning effort is configurable per request, meaning the same model handles a quick chat reply and a complex multi-step agentic run without switching between deployments.

But before anything else: the license is a modified MIT, not standard MIT. Commercial use is allowed but companies above a certain revenue threshold hit exceptions. Mistral has been loose with the word open source here and that’s worth knowing before you build on it. More on that later.

The three models it replaced and why

Devstral 2 was Mistral’s dedicated coding agent. It lived inside their Vibe CLI and handled agentic coding tasks specifically. Medium 3.5 now powers Vibe instead and according to Mistral’s own benchmarks, it outperforms Devstral 2 across every agentic benchmark they ran.

Magistral was their reasoning model. The one you reached for when you needed the model to think carefully through a problem. Medium 3.5 replaces it in Le Chat with configurable reasoning effort that toggles between fast response mode and deep reasoning mode per request. Same deployment, two behaviors.

Medium 3.1 was the previous generation of this exact tier. The internal comparison tells that cleanly. TAU3 Telecom went from 60.5 on Medium 1.2 to 91.4 on Medium 3.5. TAU3 Airline from 53.5 to 72.0. TAU3 Retail from 70.2 to 76.1. These are the kind of numbers that make you question why you’d keep the old version around at all.

Mistral retire those models because Medium 3.5 made them redundant.

Related: Mistral Small 4: The Open Source Model Replacing Three of Mistral’s Own AI Models

The agentic numbers

Agentic Benchmarks of Mistral Medium 3.5 with competing models

SWE-bench Verified at 77.6 is the headline coding result. For context Claude Sonnet 4.6 sits at 79.6, Claude Sonnet 4.5 at 77.2, and Kimi K2.5 at 76.8. Medium 3.5 is in that conversation, not leading it but genuinely competitive with models that get significantly more attention.

Where it pulls ahead is TAU3 Telecom. 91.4 against GLM-5.1’s 98.7 and Qwen3.5’s 97.8 those two lead, Medium 3.5 is third. On TAU3 Airline it scores 72.0, matching Claude Sonnet 4.5 exactly. TAU3 Retail at 76.1 sits between Sonnet 4.5 at 72.4 and Sonnet 4.6 at 75.9.

BrowseComp is where it gets more honest. 48.6 against Kimi K2.5’s 74.7 and Claude Sonnet 4.5’s 43.9. Ahead of Sonnet 4.5, well behind Kimi. TAU3 Banking at 13.4 is the weakest number. Claude Sonnet 4.5 at 22.4 and Kimi at 14.9 both tell you this category is hard across the board, but 13.4 is still the bottom of that chart.

Medium 3.5 competes in agentic coding and multi-step execution. It doesn’t dominate. It earns its place in the conversation without being the clear winner of it.

Benchmarks

BenchmarkWhat it testsMistral Medium 3.5Claude Sonnet 4.6Kimi K2.5GLM-5.1
SWE-bench VerifiedReal GitHub issue resolution77.679.676.880.2
TAU3 TelecomMulti-step agent execution91.470.486.898.7
TAU3 AirlineMulti-step agent execution72.083.076.579.5
TAU3 RetailMulti-step agent execution76.175.972.876.3
TAU3 BankingMulti-step agent execution13.428.414.916.2
BrowseCompWeb browsing and research48.643.974.774.9

All comparisons are against Mistral’s own published benchmarks. Self-reported results, standard condition applies.

One thing to note that TAU3 Banking at 13.4 is low across every model on this chart. That’s less a Mistral problem and more a signal that banking domain agent tasks are genuinely hard for current models regardless of who built them.

You May Like: SenseNova-U1: Open Source AI That Understands and Generates Images in One Model

The license question

The actual license is a modified MIT with revenue exceptions for companies above a certain size. That’s not standard MIT and it’s not Apache 2.0.

For individual developers, small teams, and most startups this won’t matter day to day. Commercial use is allowed within the license terms. But if you’re at a company with significant revenue and you’re planning to build production systems on top of this, read the full modified MIT license before committing. The exceptions are real and the definition of what triggers them matters.

Mistral has been consistently loose with the “open source” label across their releases. This one follows that pattern. Open weight with a permissive custom license is the accurate description. Worth knowing before you architect around it.

How to try it

Available on Ollama for the quickest start. GGUFs from Unsloth work through llama.cpp and LM Studio support is listed as work in progress. For production serving vLLM and SGLang both have day-one support. EAGLE speculative decoding is available for both to speed up local inference if latency matters.

For most developers the Mistral API is the practical starting point before committing to local infrastructure. The model powers Le Chat now so you can get a feel for it there without any setup.

One important note if you’re using GGUFs: there was a bug in the original Transformers config that caused long-context performance degradation. Make sure any GGUF you download was generated after the fix was merged. Older files will give you subpar results especially on long sessions.

Who should care

If you were already using Devstral 2 for agentic coding this is your direct upgrade. Same workflow, better numbers, one fewer deployment to maintain.

If you were using Magistral for reasoning tasks, Medium 3.5 covers that with configurable effort. The convenience argument alone is worth evaluating.

If you’re building production agentic systems and need a model that handles coding, reasoning, and instruction following in one deployment without managing multiple specialized models, this is the most direct answer Mistral has given to that problem.

128B dense is not consumer hardware. You’re looking at serious GPU infrastructure or the Mistral API for anything production grade. If that’s not your situation, the model is still accessible via Ollama for evaluation but local inference at full precision needs decent hardware.

Also if revenue thresholds matter to your legal team, get the modified MIT terms reviewed before building on this.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

YOU MAY ALSO LIKE
Elon Musk Lost His OpenAI Lawsuit. The Jury Never Actually Decided If He Was Right

Elon Musk Lost His OpenAI Lawsuit. The Bigger Question Was Never Put to the...

0
Elon Musk spent months in a California courtroom trying to prove that Sam Altman stole a charity. He got nine jurors, weeks of testimony from some of the biggest names in Silicon Valley, and a front row seat to the most revealing airing of OpenAI's founding history ever put on public record. Then the jury came back in under two hours and told him he'd filed too late. Not that he was wrong. Not that Altman and Brockman acted properly. Just that whatever happened between them and Musk, the legal clock had already run out before he decided to do something about it. The question of whether OpenAI actually betrayed its founding mission, the question that made this case worth following in the first place never got answered.
Apple New Siri Could Auto-Delete Chats. Google Gemini Is Reportedly Under the Hood

Apple’s New Siri Could Auto-Delete Chats. Google Gemini Is Reportedly Under the Hood.

0
Apple has a Siri problem and everyone knows it. ChatGPT became a verb. Gemini is powering half the Android ecosystem. Claude is showing up in enterprise workflows. Meanwhile Siri is still struggling to set timers reliably. WWDC is in June and Apple is reportedly planning its biggest Siri overhaul yet. A standalone app, a proper chatbot experience, and a privacy pitch front and center. According to Bloomberg's Mark Gurman, Apple executives plan to argue they're taking a more privacy-friendly approach than every other AI company out there. That argument gets complicated quickly. The model powering this new Siri is Google Gemini.
zero language for ai agents

Vercel Built a Programming Language for AI Agents. The Compiler Speaks JSON.

0
Every serious coding agent including Claude Code, Cursor, Copilot, whatever you're using shares the same quiet problem. The agent writes code, the compiler throws an error, and the agent has to read text written for a human engineer to figure out what went wrong and how to fix it. That sounds like a minor inconvenience. In practice it's one of the main reasons agentic coding loops break down. Error message formats change between compiler versions. The same underlying problem gets described differently depending on context. There's no built-in concept of a repair action, just prose that an agent has to parse and hope it understood correctly. Vercel Labs just released Zero, an experimental systems language built from day one around the idea that the compiler should talk to agents as clearly as it talks to humans. Its Apache 2.0 licensed, available now and genuinely interesting even at v0.1.1.

Don’t miss any Tech Story

Subscribe To Firethering NewsLetter

You Can Unsubscribe Anytime! Read more in our privacy policy