
How Sarvam AI Outscored Gemini in India’s Toughest Document Test

How an Indian AI model outscored Google's Gemini at reading documents despite being 600x smaller


Google’s spending billions training AI models. OpenAI’s hiring armies of engineers. And somehow, a startup in Bengaluru just outperformed both of them.

Sarvam AI’s new Vision model scored 84.3% on olmOCR-Bench, a brutal test that makes AI models read messy scanned documents, handwritten notes, and complex tables. Google Gemini 3 Pro got 80.2%. ChatGPT? A distant 69.8%.

If you’re thinking “okay, cool benchmark, but who cares?”—fair. Here’s why this matters: billions of documents across India are locked away in regional languages. Government records in Gujarati. Medical files in Tamil. Historical archives in Bengali. The big AI models can read these languages, but they mess up constantly—wrong characters, mangled words, useless output.

Sarvam doesn’t. It’s specifically trained on Indian scripts, and the results show it. For the first time, Indian companies have an AI tool that can reliably digitize documents in 22 languages without sending everything to Google or OpenAI’s servers.

But there’s a catch, and it’s a big one.

What Sarvam Vision Actually Did

On February 5, Sarvam AI co-founder Pratyush Kumar dropped benchmarks that turned heads across the AI world. The company’s Vision model didn’t just compete with the big players; it beat them.

Here’s how the scores broke down on olmOCR-Bench, a test designed to measure how well AI can extract text from real-world documents:

AI Model | Accuracy Score
Sarvam Vision | 84.3%
Chandra | 82.0%
Mistral OCR 3 | 81.7%
Google Gemini 3 Pro | 80.2%
PaddleOCR VL 1.5 | 79.3%
DeepSeek OCR v2 | 78.8%
Gemini 3 Flash | 77.5%
GPT 5.2 | 69.8%

Sarvam edged out Gemini and comfortably beat ChatGPT on this benchmark.

But what does olmOCR-Bench actually test? Think of it as the nightmare scenario for document AI: scanned PDFs with smudged text, complex tables with merged cells, old typewritten forms, handwritten notes, and mixed-language content.

The test runs pass-fail unit tests that are deterministic and machine-verifiable. Either the AI extracts the right text, or it doesn’t. No partial credit.
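To make the pass/fail idea concrete, here is an illustrative sketch in Python of how a deterministic OCR unit test works. This is not the actual olmOCR-Bench code; the helper names, the invoice strings, and the model outputs are all made up for illustration. The point is that each test either passes or fails, so the score is just the fraction of tests passed.

```python
# Illustrative sketch of deterministic pass/fail OCR unit tests,
# in the spirit of olmOCR-Bench (not the actual benchmark code).

def text_present(extracted: str, required: str) -> bool:
    """Pass only if the required text appears verbatim in the output."""
    return required in extracted

def score(results: list[bool]) -> float:
    """Benchmark score = percentage of unit tests passed (no partial credit)."""
    return 100.0 * sum(results) / len(results)

# Hypothetical outputs from two models on the same three documents
tests   = ["Invoice No. 4521", "Total: Rs. 1,200", "Date: 05/02"]
model_a = ["Invoice No. 4521", "Total: Rs. 1,200", "Date: 05/02"]
model_b = ["Invoice No. 4S21", "Total: Rs. 1,200", "Date: 05/02"]  # OCR confused 5 with S

a_score = score([text_present(out, req) for out, req in zip(model_a, tests)])
b_score = score([text_present(out, req) for out, req in zip(model_b, tests)])
```

Note how a single confused character (5 read as S) fails an entire test. That all-or-nothing scoring is exactly why sloppy handling of unfamiliar scripts tanks a model's score.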

Sarvam Vision also crushed another benchmark called OmniDocBench V1.5, scoring 93.28% overall. It particularly excelled at the stuff that breaks most AI models: technical tables, mathematical formulas, and documents with complex layouts.

The Catch: Sarvam Isn’t Trying to Replace ChatGPT

Here’s where the story gets interesting.

Sarvam Vision has 3 billion parameters. That’s the number of individual adjustable values the AI uses to make decisions. Google Gemini 3 Pro? Rumored to have nearly 2 trillion parameters.

In AI, more parameters generally mean more capability. A bigger model can handle more tasks, understand more context, and solve more complex problems. So how did a 3-billion-parameter model beat a 2-trillion-parameter giant?
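The size gap is worth pinning down. Taking the article's figures at face value (and remembering Gemini 3 Pro's parameter count is only a rumor), the ratio works out to roughly the "600x" in the headline:

```python
# Rough sanity check on the size gap. Gemini 3 Pro's count is rumored,
# so treat this as an order-of-magnitude comparison, not an exact figure.
sarvam_params = 3e9    # Sarvam Vision: 3 billion parameters
gemini_params = 2e12   # Gemini 3 Pro: rumored ~2 trillion parameters

ratio = gemini_params / sarvam_params
print(f"Gemini is roughly {ratio:.0f}x larger than Sarvam Vision")
```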

Because Sarvam Vision isn’t trying to do everything.

Gemini can generate a mock JEE test paper, explain quantum physics, write Python code, analyze X-ray images, and chat about philosophy. Sarvam Vision can’t do any of that.

What it can do is read Indian documents better than anything else.


The Indian Advantage: Why Sarvam Won

Sarvam Vision was built specifically for Indic scripts, the writing systems used across India’s 22 official languages: Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, and 14 others.

These scripts are fundamentally different from English. Different character sets, different writing directions, different rules for how letters combine. And most importantly, they’re written differently in the real world.

Sarvam created what they call the Sarvam Indic OCR Bench, a dataset of 20,267 document samples spanning all 22 Indian languages, from historical manuscripts dating back to the 1800s to modern government forms.
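A benchmark spanning 22 languages is only useful if it reports results per language, so weak scripts can't hide behind strong ones. Here is a hedged sketch of that kind of per-language aggregation; the languages shown and every pass/fail value are invented for illustration, and only the overall framing (many languages, many samples) comes from the article:

```python
# Hypothetical per-language aggregation for a multi-language OCR benchmark.
# All results below are made up for illustration.
from collections import defaultdict

# (language, test_passed) pairs for a handful of hypothetical samples
results = [
    ("Hindi", True), ("Hindi", True), ("Hindi", False),
    ("Tamil", True), ("Tamil", True),
    ("Bengali", True), ("Bengali", False),
]

by_lang = defaultdict(list)
for lang, passed in results:
    by_lang[lang].append(passed)

# Percentage of tests passed, broken down by language
per_lang = {lang: 100.0 * sum(r) / len(r) for lang, r in by_lang.items()}
```

Breaking scores down this way is what lets a benchmark like this expose exactly where a global model mangles, say, Bengali conjuncts while handling Hindi fine.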

Sarvam’s “Sovereign Stack”

Sarvam Vision isn’t alone. It’s part of what the company calls a “Sovereign Stack” for India: AI tools built specifically for Indian use cases.

Alongside Vision, Sarvam just released Bulbul V3, a text-to-speech model that generates natural-sounding Indian voices. In blind listening tests across 11 languages, Bulbul beat ElevenLabs (the global leader in AI voice) on telephony-grade audio and matched it on high-quality output.

The company is also working with state governments. Partnerships with Odisha and Tamil Nadu are already signed. The goal: use Sarvam’s AI stack to digitize public services, automate document processing, and make government systems accessible in regional languages.

This is the part that gets called “AI independence” in headlines, but let’s be clear about what it actually means. It’s not about nationalism or shutting out foreign tech. It’s about having AI tools that actually work for Indian contexts—because right now, the global models don’t.


The Takeaway

Sarvam’s win proves something important: you don’t need to build the biggest model to build the best model.

The AI race has been dominated by a simple equation: more data + more GPUs + more parameters = better AI. Companies pour billions into training ever-larger models, assuming scale solves everything.

Sarvam took a different path. Build smaller. Train smarter. Focus on what matters.

The result? A 3-billion-parameter model that beats 2-trillion-parameter giants at a specific task. That’s not a fluke—it’s a strategy.

It also highlights what limits AI development in India. It’s not talent or capability. Indian engineers are building world-class models. The bottleneck is infrastructure. Training large models requires compute resources that aren’t available locally yet.

But Sarvam Vision and Bulbul prove something else: maybe you don’t need trillion-parameter models for most real-world problems.

Maybe the future of AI isn’t one giant model that does everything poorly, but specialized models that do specific things brilliantly.
