
Sarvam’s New Open Source Models Match GPT-OSS-120B and One Only Uses 2.4B Active Parameters


Most open source model releases follow a predictable pattern. A lab drops weights, publishes benchmark numbers, and the community spends the next week figuring out if any of it holds up in real use. Sarvam’s 30B and 105B are different in one specific way — both are already in production.

The 105B is powering Indus, Sarvam’s reasoning and agentic assistant. The 30B is handling live multilingual voice calls on Samvaad, their conversational agent platform. These aren’t research models waiting to be tested. They shipped first and released the weights after.

What makes them technically interesting is the architecture. Both use Mixture of Experts, which means that, despite the headline parameter counts, the models only activate a fraction of their weights on any given token. The 105B activates 10.3B parameters per token; the 30B activates just 2.4B. That gap between total size and active compute is where the interesting performance story lives.
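To make the active-versus-total distinction concrete, here is a minimal, generic top-k MoE routing sketch in Python. This illustrates the mechanism, not Sarvam's actual router; the expert functions and router logits here are made up for the example:

```python
import math

def top_k_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize
    their scores so the selected routing weights sum to 1."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:k]
    exps = [math.exp(logits[i]) for i in chosen]
    total = sum(exps)
    return {i: e / total for i, e in zip(chosen, exps)}

def moe_forward(x, experts, router_logits, k):
    """Run only the k selected experts and mix their outputs by
    routing weight -- the other experts are skipped entirely,
    which is why active compute is far below total parameters."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy setup: 8 "experts" (simple scaling functions), 2 active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
out = moe_forward(10.0, experts, [0.1, 2.0, 0.0, 1.5, -1.0, 0.3, 0.2, 0.0], k=2)
```

In a real MoE transformer the experts are feed-forward blocks and the router is a learned linear layer, but the accounting is the same: with 2 of 8 experts firing per token, most of the expert weights sit idle on any single forward pass.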

Here is what they actually do.

What Are These Models Actually Built For?

Sarvam built two models for two very different jobs. The 30B is a deployment model. It was designed to run fast, stay cheap, and handle real-time interactions without breaking a sweat. If you need an AI that can take a phone call in Hindi, understand a tool request mid-conversation, and respond before the user notices a delay, that’s what the 30B was built for.

The 105B is a reasoning model. It was built for the tasks where you need the AI to think, plan multiple steps ahead, use web search, write code, and execute complex workflows. It powers Indus, Sarvam’s AI assistant for complex queries.

Think of it this way. The 30B is what you deploy. The 105B is what you use when the problem is hard.

What Can Sarvam 105B Do?


The 105B is built for tasks that require actual thinking. Complex math problems, multi-step reasoning, coding challenges, and agentic workflows where the model needs to plan and execute across several turns. On those fronts it holds up well against models significantly larger than itself.

Where it genuinely stands out is web search and agentic tasks. On BrowseComp, a benchmark that tests how well a model finds real answers through live web search, it scored 49.5 against GLM-4.5-Air’s 21.3.

On Beyond AIME, which tests deep mathematical reasoning, the 105B scored 69.1 against GPT-OSS-120B’s 51.0. On τ² Bench, which measures long horizon agentic task completion, it scored 68.3 against GPT-OSS-120B’s 65.8. A 105B model outperforming a 120B one on the benchmarks that actually matter for real work is worth paying attention to.

That said, GPT-OSS-120B still leads on LiveCodeBench, GPQA Diamond, and Arena Hard v2. Both are strong, just in different areas.

Limitations

The honest limitation is writing and instruction following. If your primary use case is creative writing or highly structured outputs, stronger options exist in this class.

But for reasoning, tool use, and long horizon tasks it punches well above what a 105B model should realistically deliver.

What Can Sarvam 30B Do?


The 30B is built for real-world deployment where speed and efficiency matter. It handles live multilingual voice calls, executes tool calls mid-conversation, and does all of this on resource-constrained hardware without stuttering. On Samvaad (Sarvam’s conversational agent platform) it is already managing real phone conversations in Hindi and Tamil. The 2.4B active parameter design is not a compromise; it is the whole point.

Where it genuinely stands out is how it competes against much larger models on coding and math. It scores 97.0 on Math500 and 70.0 on LiveCodeBench, outperforming several models with significantly more active compute. For a deployment focused model those numbers are unexpected.

Limitations

SWE-Bench Verified at 34.0 is where the 30B shows its ceiling. Complex real world software engineering tasks remain challenging. If you are building something that requires deep code understanding across large repositories, the 30B will struggle. The 105B handles that better, and even then stronger options exist for pure coding workloads.

But for conversational deployment, voice, multilingual tool use, and real time applications it is genuinely hard to find an open source alternative at this size that performs as consistently.

How to Run Them Locally

Not on Ollama. If that’s where you were heading, there’s nothing there yet except Sarvam-1, which is their older model and not what we’re talking about.

The official options are HuggingFace with Transformers, SGLang, and vLLM. The vLLM path is the messiest of the three right now. Native support isn’t merged yet so you’re either building from source or running a hotpatch script. It works, but it’s not a five minute setup.

SGLang is the cleanest path at the moment. HuggingFace Transformers works too if you just want to get something running quickly.
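If SGLang is your route, serving the model is roughly a two-step affair. Treat the commands below as a sketch: the repository id is a placeholder, not a verified model path, so copy the exact one from the HuggingFace model page before running anything.

```shell
# Install SGLang with its serving extras.
pip install "sglang[all]"

# Replace sarvamai/MODEL-ID with the exact HuggingFace repo id.
# --tp sets tensor parallelism; size it to your GPU count.
python -m sglang.launch_server --model-path sarvamai/MODEL-ID --port 30000 --tp 2
```

Once the server is up, it exposes an OpenAI-compatible endpoint on localhost:30000, so the same client code works locally and against a hosted API.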

Both models are available for download on HuggingFace and AI Kosh. The HuggingFace model pages have the most up to date setup instructions since they get updated as support improves.

If you don’t have the hardware or just don’t want to deal with the setup, Sarvam has an official API for both models. It’s OpenAI-compatible, and their official API documentation is worth checking out if that’s the easier path for you.
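Because the API is OpenAI-compatible, any OpenAI-style client should work once pointed at Sarvam's endpoint. The sketch below only builds a standard chat-completions request with the Python standard library and never sends it; the base URL and model id are assumptions for illustration, so check the official API docs for the real values:

```python
import json
import urllib.request

# Hypothetical values -- take the real base URL, model id, and
# auth header format from Sarvam's API documentation.
BASE_URL = "https://api.sarvam.ai/v1"   # assumption, not verified
API_KEY = "YOUR_API_KEY"

def build_chat_request(model, messages):
    """Build (but do not send) an OpenAI-style /chat/completions request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request(
    "sarvam-105b",  # placeholder model id
    [{"role": "user", "content": "Summarise this call in Hindi."}],
)
```

Sending `req` with `urllib.request.urlopen` (or swapping in the `openai` client with `base_url` set) is all that remains once you have a key.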


So where does this leave us?

Two production-ready open source models with Apache 2.0 licenses that you can download today. One handles real-time voice calls in Hindi and Tamil on constrained hardware. The other matches frontier closed models on agentic benchmarks. Both came out of the same lab, trained entirely in India on Indian compute.

Whether that impresses you or not probably depends on what you expected open source to look like in 2026. For me it’s getting harder to argue that you need a paid API for most workloads.
