back to top
HomeTechAI ModelsSarvam's New Open Source Models Match GPT-OSS-120B and One Only Uses 2.4B...

Sarvam’s New Open Source Models Match GPT-OSS-120B and One Only Uses 2.4B Active Parameters

- Advertisement -

Most open source model releases follow a predictable pattern. A lab drops weights, publishes benchmark numbers, and the community spends the next week figuring out if any of it holds up in real use. Sarvam’s 30B and 105B are different in one specific way — both are already in production.

The 105B is powering Indus, Sarvam’s reasoning and agentic assistant. The 30B is handling live multilingual voice calls on Samvaad, their conversational agent platform. These aren’t research models waiting to be tested. They shipped first and released the weights after.

What makes them technically interesting is the architecture. Both use Mixture of Experts, which means despite the parameter counts the models are only activating a fraction of their weights on any given token. The 105B activates 10.3B parameters. The 30B activates just 2.4B. That gap between total size and active compute is where the interesting performance story lives.

Here is what they actually do.

What Are These Models Actually Built For?

Sarvam built two models for two very different jobs. The 30B is a deployment model. It was designed to run fast, stay cheap, and handle real-time interactions without breaking a sweat. If you need an AI that can take a phone call in Hindi, understand a tool request mid-conversation, and respond before the user notices a delay, that’s what the 30B was built for.

The 105B is a reasoning model. It was built for the tasks where you need the AI to think, plan multiple steps ahead, use web search, write code, and execute complex workflows. It powers Indus, Sarvam’s AI assistant for complex queries.

Think of it this way. The 30B is what you deploy. The 105B is what you use when the problem is hard.

What Sarvam 105B Can Do?

Sarvam105B

The 105B is built for tasks that require actual thinking. Complex math problems, multi-step reasoning, coding challenges, and agentic workflows where the model needs to plan and execute across several turns. On those fronts it holds up well against models significantly larger than itself.

Where it genuinely stands out is web search and agentic tasks. On BrowseComp, a benchmark that tests how well a model finds real answers through live web search, it scored 49.5 against GLM-4.5-Air’s 21.3.

On Beyond AIME, which tests deep mathematical reasoning, the 105B scored 69.1 against GPT-OSS-120B’s 51.0. On τ² Bench, which measures long horizon agentic task completion, it scored 68.3 against GPT-OSS-120B’s 65.8. A 105B model outperforming a 120B one on the benchmarks that actually matter for real work is worth paying attention to.

That said GPT-OSS-120B still leads on LiveCodeBench, GPQA Diamond, and Arena Hard v2. Both are strong, just in different areas.

Limitations

The honest limitation is writing and instruction following. If your primary use case is creative writing or highly structured outputs, stronger options exist in this class.

But for reasoning, tool use, and long horizon tasks it punches well above what a 105B model should realistically deliver.

What Sarvam 30B Can Do?

sarvam30b

The 30B is built for real world deployment where speed and efficiency matter. It handles live multilingual voice calls, executes tool calls mid-conversation, and does all of this on resource constrained hardware without stuttering. On Samvaad (Sarvam’s conversational agent platform) it is already managing real phone conversations in Hindi and Tamil. The 2.4B active parameter design is not a compromise, it is the whole point.

Where it genuinely stands out is how it competes against much larger models on coding and math. It scores 97.0 on Math500 and 70.0 on LiveCodeBench, outperforming several models with significantly more active compute. For a deployment focused model those numbers are unexpected.

Limitations

SWE-Bench Verified at 34.0 is where the 30B shows its ceiling. Complex real world software engineering tasks remain challenging. If you are building something that requires deep code understanding across large repositories, the 30B will struggle. The 105B handles that better, and even then stronger options exist for pure coding workloads.

But for conversational deployment, voice, multilingual tool use, and real time applications it is genuinely hard to find an open source alternative at this size that performs as consistently.

How to run them locally?

Not on Ollama. If that’s where you were heading, there’s nothing there yet except Sarvam-1, which is their older model and not what we’re talking about.

The official options are HuggingFace with Transformers, SGLang, and vLLM. The vLLM path is the messiest of the three right now. Native support isn’t merged yet so you’re either building from source or running a hotpatch script. It works, but it’s not a five minute setup.

SGLang is the cleanest path at the moment. HuggingFace Transformers works too if you just want to get something running quickly.

Both models are available for download on HuggingFace and AI Kosh. The HuggingFace model pages have the most up to date setup instructions since they get updated as support improves.

If you don’t have the hardware or just don’t want to deal with the setup, Sarvam has an official API for both models. It’s OpenAI-compatible & Worth checking out their official API documentation if that’s the easier path for you.

Also Read: How GLM-5 Became the Most Talked-About “Nvidia-Free” AI Model This Week

So where does this leave us?

Two production-ready open source models with Apache 2.0 licenses that you can download today. One handles real-time voice calls in Hindi and Tamil on constrained hardware. The other matches frontier closed models on agentic benchmarks. Both came out of the same lab, trained entirely in India on Indian compute.

Whether that impresses you or not probably depends on what you expected open source to look like in 2026. For me it’s getting harder to argue that you need a paid API for most workloads.

Don’t miss any Tech Story

Subscribe To Firethering NewsLetter

You Can Unsubscribe Anytime! Read more in our privacy policy

LEAVE A REPLY

Please enter your comment!
Please enter your name here

YOU MAY ALSO LIKE
glm 5.2 ai open weights

GLM-5.2 Is the Closest an Open Model Has Come to Claude

0
What does it take for an open-weight model to stop chasing Claude and actually beat it? Every open-weight release for two years has told some version of the same story: closer, but not quite. The chart shrinks, the wording softens to "competitive with," and the conversation moves on until the next model repeats the cycle. GLM-5.2 breaks that pattern. The model is built to survive long, messy coding work, the kind that runs for hours without losing the thread. That's the pitch its maker is leading with. But scroll down their own benchmark table and something else is sitting there quietly: on a couple of standard math evals, this open model isn't approaching Claude Opus 4.8, GPT-5.5, or Gemini 3.1 Pro. It's beating all three, on the same table. It loses plenty of ground elsewhere, and that part matters just as much as the wins. But a model anyone can download under an MIT license, with no usage restrictions attached, coming out ahead of the lab everyone else measures themselves against, is worth pausing on before getting to what the rest of the numbers actually say.
Open-Source AI Tools Worth Trying Right Now

5 Open-Source AI Tools You Probably Haven’t Tried Yet

0
Every week brings another open source AI release, and most of them require setting up a Python environment. Find out the model card lied about VRAM requirements. By the time something actually runs, the appeal has mostly worn off. The five tools below skip most of that. One turns image and video generation into something closer to a desktop app. One gives DeepSeek an actual workspace instead of a browser tab. One builds UI prototypes using coding agents you probably already have installed. One quietly builds a memory system out of your own apps. And one is, literally, a desktop pet.
Claude Mythos 5 and Claude Fable 5

Claude Mythos 5 Was Too Powerful to Ship. Anthropic Released Fable 5 Instead.

0
Anthropic gave stripe early access to Fable 5 and set it loose on a 50 million line Ruby codebase. The migration that would have taken a full engineering team over two months got done in a day. That's a real company's real codebase and a task with real consequences if it goes wrong. Anthropic leads with it because it's the kind of result that's hard to argue with & because it sets up everything else they need to tell you about why this launch looks the way it does. Because here's the thing. The model Anthropic actually built Claude Mythos 5, isn't what most people are getting today. What's going live for general use is Claude Fable 5. Same underlying model. Different version. The parts Anthropic decided were too dangerous for public release got a separate wrapper, a separate name, and a separate approval process controlled in part by the US government.