
Sarvam’s New Open Source Models Match GPT-OSS-120B and One Only Uses 2.4B Active Parameters


Most open source model releases follow a predictable pattern. A lab drops weights, publishes benchmark numbers, and the community spends the next week figuring out if any of it holds up in real use. Sarvam’s 30B and 105B are different in one specific way — both are already in production.

The 105B is powering Indus, Sarvam’s reasoning and agentic assistant. The 30B is handling live multilingual voice calls on Samvaad, their conversational agent platform. These aren’t research models waiting to be tested. They shipped first and released the weights after.

What makes them technically interesting is the architecture. Both use Mixture of Experts, which means that despite the headline parameter counts, each model activates only a fraction of its weights on any given token. The 105B activates 10.3B parameters; the 30B activates just 2.4B. That gap between total size and active compute is where the interesting performance story lives.
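To make the total-versus-active distinction concrete, here is a toy sketch of top-k expert routing. This is not Sarvam's actual architecture; the expert count, gating scheme, and k value are illustrative assumptions, but the mechanism is the same: a gate scores all experts, only the top k run, and the rest stay idle for that token.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route a token vector x to the top-k experts by gate score."""
    scores = x @ gate_w                      # one score per expert
    top_k = np.argsort(scores)[-k:]          # indices of the k best experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only k expert matrices touch this token; the others contribute nothing.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top_k)), top_k

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)

out, used = moe_forward(x, experts, gate_w, k=2)
# 2 of 16 experts fire per token, so per-token compute is a small slice
# of the total parameter budget -- the same idea behind 2.4B active of 30B.
print(len(used) / n_experts)  # 0.125
```

The ratio is why a 30B-parameter model can respond with the latency of a much smaller one: per-token FLOPs scale with active parameters, not total.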

Here is what they actually do.

What Are These Models Actually Built For?

Sarvam built two models for two very different jobs. The 30B is a deployment model. It was designed to run fast, stay cheap, and handle real-time interactions without breaking a sweat. If you need an AI that can take a phone call in Hindi, understand a tool request mid-conversation, and respond before the user notices a delay, that’s what the 30B was built for.

The 105B is a reasoning model. It was built for the tasks where you need the AI to think, plan multiple steps ahead, use web search, write code, and execute complex workflows. It powers Indus, Sarvam’s AI assistant for complex queries.

Think of it this way. The 30B is what you deploy. The 105B is what you use when the problem is hard.

What Can Sarvam 105B Do?


The 105B is built for tasks that require actual thinking. Complex math problems, multi-step reasoning, coding challenges, and agentic workflows where the model needs to plan and execute across several turns. On those fronts it holds up well against models significantly larger than itself.

Where it genuinely stands out is web search and agentic tasks. On BrowseComp, a benchmark that tests how well a model finds real answers through live web search, it scored 49.5 against GLM-4.5-Air’s 21.3.

On Beyond AIME, which tests deep mathematical reasoning, the 105B scored 69.1 against GPT-OSS-120B’s 51.0. On τ² Bench, which measures long horizon agentic task completion, it scored 68.3 against GPT-OSS-120B’s 65.8. A 105B model outperforming a 120B one on the benchmarks that actually matter for real work is worth paying attention to.

That said, GPT-OSS-120B still leads on LiveCodeBench, GPQA Diamond, and Arena Hard v2. Both are strong, just in different areas.

Limitations

The honest limitation is writing and instruction following. If your primary use case is creative writing or highly structured outputs, stronger options exist in this class.

But for reasoning, tool use, and long horizon tasks it punches well above what a 105B model should realistically deliver.

What Can Sarvam 30B Do?


The 30B is built for real-world deployment where speed and efficiency matter. It handles live multilingual voice calls, executes tool calls mid-conversation, and does all of this on resource-constrained hardware without stuttering. On Samvaad (Sarvam's conversational agent platform) it is already managing real phone conversations in Hindi and Tamil. The 2.4B active parameter design is not a compromise; it is the whole point.

Where it genuinely stands out is how it competes against much larger models on coding and math. It scores 97.0 on Math500 and 70.0 on LiveCodeBench, outperforming several models with significantly more active compute. For a deployment focused model those numbers are unexpected.

Limitations

SWE-Bench Verified at 34.0 is where the 30B shows its ceiling. Complex real world software engineering tasks remain challenging. If you are building something that requires deep code understanding across large repositories, the 30B will struggle. The 105B handles that better, and even then stronger options exist for pure coding workloads.

But for conversational deployment, voice, multilingual tool use, and real time applications it is genuinely hard to find an open source alternative at this size that performs as consistently.

How to Run Them Locally

Not on Ollama. If that’s where you were heading, there’s nothing there yet except Sarvam-1, which is their older model and not what we’re talking about.

The official options are HuggingFace with Transformers, SGLang, and vLLM. The vLLM path is the messiest of the three right now. Native support isn’t merged yet so you’re either building from source or running a hotpatch script. It works, but it’s not a five minute setup.

SGLang is the cleanest path at the moment. HuggingFace Transformers works too if you just want to get something running quickly.
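As a rough sketch of what the SGLang path looks like, the commands below should be close to what the model page describes. The model ID is a placeholder and the flags shown are generic SGLang defaults, not Sarvam-specific settings; treat the HuggingFace model card as the source of truth.

```shell
# Install SGLang with its serving extras.
pip install "sglang[all]"

# Launch an OpenAI-compatible server.
# Replace <model-id> with the actual repo name from the
# HuggingFace model page (the 30B or 105B card).
python -m sglang.launch_server \
  --model-path sarvamai/<model-id> \
  --port 30000
```

Once the server is up, anything that speaks the OpenAI chat-completions protocol can point at `http://localhost:30000/v1`.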

Both models are available for download on HuggingFace and AI Kosh. The HuggingFace model pages have the most up to date setup instructions since they get updated as support improves.

If you don’t have the hardware or just don’t want to deal with the setup, Sarvam has an official API for both models. It’s OpenAI-compatible, and their official API documentation is worth checking out if that’s the easier path for you.
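Because the API is OpenAI-compatible, calling it is a matter of posting a standard chat-completions payload. A minimal stdlib sketch follows; the base URL, model name, and environment variable are assumptions for illustration, so check Sarvam's API documentation for the real values.

```python
import json
import os
import urllib.request

# Assumed values -- consult Sarvam's API docs for the real endpoint and names.
BASE_URL = "https://api.sarvam.ai/v1"   # hypothetical endpoint
MODEL = "sarvam-105b"                   # hypothetical model identifier

def build_chat_request(prompt):
    """Build a standard OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("Summarise this call transcript in Hindi.")

# Only send the request if an API key is actually configured.
if __name__ == "__main__" and os.environ.get("SARVAM_API_KEY"):
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['SARVAM_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same payload shape works against a local SGLang or vLLM server, which is the practical upside of the OpenAI-compatible convention: one client, swappable backends.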


So Where Does This Leave Us?

Two production-ready open source models with Apache 2.0 licenses that you can download today. One handles real-time voice calls in Hindi and Tamil on constrained hardware. The other matches frontier closed models on agentic benchmarks. Both came out of the same lab, trained entirely in India on Indian compute.

Whether that impresses you or not probably depends on what you expected open source to look like in 2026. For me it’s getting harder to argue that you need a paid API for most workloads.
