
Sarvam’s New Open Source Models Match GPT-OSS-120B and One Only Uses 2.4B Active Parameters


Most open source model releases follow a predictable pattern. A lab drops weights, publishes benchmark numbers, and the community spends the next week figuring out if any of it holds up in real use. Sarvam’s 30B and 105B are different in one specific way — both are already in production.

The 105B is powering Indus, Sarvam’s reasoning and agentic assistant. The 30B is handling live multilingual voice calls on Samvaad, their conversational agent platform. These aren’t research models waiting to be tested. They shipped first and released the weights after.

What makes them technically interesting is the architecture. Both use Mixture of Experts, which means that despite the headline parameter counts, each model activates only a fraction of its weights on any given token. The 105B activates 10.3B parameters. The 30B activates just 2.4B. That gap between total size and active compute is where the interesting performance story lives.
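The mechanics behind that gap are top-k expert routing: a small router scores every expert per token, and only the k highest-scoring experts actually run. A minimal sketch with toy sizes (the expert count and k below are illustrative assumptions, not Sarvam's actual configuration):

```python
import numpy as np

def topk_moe_forward(x, experts, gate_w, k=2):
    """Route one token through the top-k of n experts.

    x: (d,) token activation; experts: list of (d, d) expert weight matrices;
    gate_w: (d, n_experts) router weights. Only k experts run per token,
    so active expert compute is k/n_experts of the expert parameters.
    """
    logits = x @ gate_w
    topk = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()              # softmax over the selected experts only
    # Mix only the selected experts' outputs; the other experts never run.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts, k = 64, 16, 2
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)

y = topk_moe_forward(x, experts, gate_w, k)
total, active = n_experts * d * d, k * d * d
print(f"total expert params: {total}, active per token: {active} ({active / total:.0%})")
```

With 2 of 16 experts active, only 12.5% of the expert weights are touched per token, which is the same shape of trick that lets the 30B serve real-time voice traffic on 2.4B active parameters.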

Here is what they actually do.

What Are These Models Actually Built For?

Sarvam built two models for two very different jobs. The 30B is a deployment model. It was designed to run fast, stay cheap, and handle real-time interactions without breaking a sweat. If you need an AI that can take a phone call in Hindi, understand a tool request mid-conversation, and respond before the user notices a delay, that’s what the 30B was built for.

The 105B is a reasoning model. It was built for the tasks where you need the AI to think, plan multiple steps ahead, use web search, write code, and execute complex workflows. It powers Indus, Sarvam’s AI assistant for complex queries.

Think of it this way. The 30B is what you deploy. The 105B is what you use when the problem is hard.

What Can Sarvam 105B Do?


The 105B is built for tasks that require actual thinking. Complex math problems, multi-step reasoning, coding challenges, and agentic workflows where the model needs to plan and execute across several turns. On those fronts it holds up well against models significantly larger than itself.

Where it genuinely stands out is web search and agentic tasks. On BrowseComp, a benchmark that tests how well a model finds real answers through live web search, it scored 49.5 against GLM-4.5-Air’s 21.3.

On Beyond AIME, which tests deep mathematical reasoning, the 105B scored 69.1 against GPT-OSS-120B’s 51.0. On τ² Bench, which measures long horizon agentic task completion, it scored 68.3 against GPT-OSS-120B’s 65.8. A 105B model outperforming a 120B one on the benchmarks that actually matter for real work is worth paying attention to.

That said, GPT-OSS-120B still leads on LiveCodeBench, GPQA Diamond, and Arena Hard v2. Both are strong, just in different areas.

Limitations

The honest limitation is writing and instruction following. If your primary use case is creative writing or highly structured outputs, stronger options exist in this class.

But for reasoning, tool use, and long horizon tasks it punches well above what a 105B model should realistically deliver.

What Can Sarvam 30B Do?


The 30B is built for real-world deployment where speed and efficiency matter. It handles live multilingual voice calls, executes tool calls mid-conversation, and does all of this on resource-constrained hardware without stuttering. On Samvaad (Sarvam's conversational agent platform) it is already managing real phone conversations in Hindi and Tamil. The 2.4B active-parameter design is not a compromise; it is the whole point.

Where it genuinely stands out is how it competes with much larger models on coding and math. It scores 97.0 on Math500 and 70.0 on LiveCodeBench, outperforming several models with significantly more active compute. For a deployment-focused model those numbers are unexpected.

Limitations

SWE-Bench Verified at 34.0 is where the 30B shows its ceiling. Complex real world software engineering tasks remain challenging. If you are building something that requires deep code understanding across large repositories, the 30B will struggle. The 105B handles that better, and even then stronger options exist for pure coding workloads.

But for conversational deployment, voice, multilingual tool use, and real time applications it is genuinely hard to find an open source alternative at this size that performs as consistently.

How to Run Them Locally

Not on Ollama. If that’s where you were heading, there’s nothing there yet except Sarvam-1, which is their older model and not what we’re talking about.

The official options are HuggingFace with Transformers, SGLang, and vLLM. The vLLM path is the messiest of the three right now. Native support isn’t merged yet so you’re either building from source or running a hotpatch script. It works, but it’s not a five minute setup.

SGLang is the cleanest path at the moment. HuggingFace Transformers works too if you just want to get something running quickly.
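For reference, the two server paths look roughly like this. The repo IDs below are placeholders, not confirmed names; take the exact repo name and any model-specific flags from the HuggingFace model pages, which the article notes are kept current:

```shell
# Repo IDs are placeholders -- check the HuggingFace model pages
# for the exact names and any updated flags before running.

# SGLang (currently the cleanest path):
pip install "sglang[all]"
python -m sglang.launch_server --model-path sarvamai/<model-repo> --port 30000

# vLLM (native support not merged yet -- expect a source build or hotpatch first):
vllm serve sarvamai/<model-repo> --port 8000
```

Both commands expose an OpenAI-compatible HTTP endpoint on the given port, so the same client code works against either server.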

Both models are available for download on HuggingFace and AI Kosh. The HuggingFace model pages have the most up to date setup instructions since they get updated as support improves.

If you don’t have the hardware or just don’t want to deal with the setup, Sarvam has an official API for both models. It’s OpenAI-compatible, and their official API documentation is worth checking out if that’s the easier path for you.
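OpenAI-compatible means the request shape is the familiar `/chat/completions` POST with a bearer token. A minimal stdlib sketch of building such a request; the base URL and model name below are placeholders (take the real values from Sarvam's API docs), which is also why the request is built but not sent:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Build an OpenAI-compatible /chat/completions request.

    base_url and model are placeholders in this sketch -- substitute
    the real values from Sarvam's API documentation.
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # standard OpenAI-style auth header
        },
        method="POST",
    )

req = build_chat_request(
    "https://<sarvam-api-base>/v1",  # placeholder base URL
    "YOUR_API_KEY",
    "<model-name>",                  # e.g. the 30B or 105B repo name, per the docs
    [{"role": "user", "content": "Summarize Mixture of Experts in one line."}],
)
# urllib.request.urlopen(req) would send it; omitted here because the
# endpoint and key above are placeholders.
print(req.full_url, req.get_method())
```

Because the format is OpenAI-compatible, the official OpenAI client libraries should also work by pointing their `base_url` at Sarvam's endpoint.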


So where does this leave us?

Two production-ready open source models with Apache 2.0 licenses that you can download today. One handles real-time voice calls in Hindi and Tamil on constrained hardware. The other matches frontier closed models on agentic benchmarks. Both came out of the same lab, trained entirely in India on Indian compute.

Whether that impresses you or not probably depends on what you expected open source to look like in 2026. For me it’s getting harder to argue that you need a paid API for most workloads.


