back to top
HomeTechCohere Open-Sourced Command A+, a 218B MoE Model Built for Enterprise Agents

Cohere Open-Sourced Command A+, a 218B MoE Model Built for Enterprise Agents

- Advertisement -

Cohere spent the past year deploying North, its enterprise AI workspace, with actual customers doing actual work. Agentic question answering over company file systems. Data analysis across spreadsheets. Multi-session memory that has to hold up in production. Command A+ is what came out of that, a model shaped by a year of watching enterprise workflows break and figuring out why.

The result is a 218B mixture-of-experts model with 25B active parameters at inference time, available today on Hugging Face under Apache 2.0. It replaces five separate models in the Command A family, each of which handled one thing. This one handles all of them, and on most of the tasks those specialist models were built for, it wins.

Five models became one

The Command A family going into this release was fragmented. Command A for general use, Reasoning for complex problem solving, Vision for multimodal, Translate for multilingual and tool use comes in separately. Five models with five sets of infrastructure to manage.

Command A+ consolidates all of it. One model, 48 language support up from 23, multimodal reasoning included, tool use built in, reasoning mode available. For an enterprise team managing private deployments that matters. Fewer models means fewer hardware configurations, fewer versioning headaches.

The consolidation only works if the unified model actually matches the specialists. On the agentic tasks that matter most for North, it doesn’t just match them. Agentic QA accuracy improved 20% over Command A Reasoning. Spreadsheet analysis quality improved 32%. Memory performance, testing whether the model can use context from a previous session to answer questions in a new one, jumped from 39% to 54%. They’re meaningful gains over the specialist it replaced.

The efficiency numbers

218B total parameters sounds like a cluster problem. It isn’t, and that distinction is the whole point of the MoE architecture here.

In a dense model every parameter fires for every token. Command A+ activates 25B parameters at inference time and leaves the rest idle. The practical result is that it runs on two NVIDIA H100s at W4A4 quantization, or a single Blackwell GPU, with what Cohere describes as imperceptible quality difference versus the full precision version. For teams trying to deploy privately, on their own hardware, without routing sensitive data through an external API, that minimum spec changes the conversation.

Speed is also meaningfully better than its predecessor. Against Command A Reasoning at the same quantization and concurrency levels, Command A+ delivers up to 63% higher output tokens per second and cuts time to first token by up to 17%. The W4A4 quantization adds another 47% speed increase on top of that. Cohere also used speculative decoding optimized specifically for the MoE architecture, adding a further 1.5 to 1.6x inference speedup.

There’s also a new tokenizer. Command A+ is the first Cohere model to use it, and the compression gains matter especially for non-European languages, Arabic tokenization improved 20%, Korean 16%, Japanese 18%. Fewer tokens per response means lower inference cost per query, which compounds quickly at enterprise scale.

You May Like: ZAYA1-8B Matches DeepSeek-R1 on Math with Less Than 1B Active Parameters.

Where it’s genuinely strong

The benchmark Cohere is most confident about is the one that’s hardest to fake: τ²-Bench Telecom, which tests multi-step agentic task completion in realistic enterprise scenarios. Command A Reasoning scored 37% on it. Command A+ scores 85%. That’s not a incremental gain, that’s a different category of capability on the task the model was explicitly built for.

Terminal-Bench Hard went from 3% to 25%. That’s still not a number that makes Command A+ a coding specialist, but it reflects what happens when a model designed around real workflow completion gets properly trained on the full agentic loop rather than just code generation in isolation.

Multimodal reasoning is new to this model and the numbers are solid. MMMU Pro at 63%, MathVista at 80.6% up from 73.5% with Command A Vision, CharXiv reasoning at 52.7% up from 46.9%. Document understanding across charts, tables, and mixed-format files is where enterprise multimodal use actually lives, and these benchmarks test exactly that.

The multilingual part is also genuinely expanded. 48 languages versus 23 in the previous generation, with reasoning capability extending to Arabic, Japanese, and Korean in a way the earlier models didn’t support. Cohere tested this with an internal Arabic, Japanese, and Korean translation of AIME 2025, a mathematics benchmark, to verify that reasoning quality holds across languages, not just translation fluency. That’s a meaningful distinction for global enterprise deployments.

On the Artificial Analysis Intelligence Index, Command A+ scores 37, which Cohere says outperforms other leading open models. That index is a composite of general capability across tasks, and the score reflects a model that’s genuinely strong across multiple dimensions rather than optimized narrowly for one benchmark category.

What it doesn’t do well

General chat quality is not a priority here. If you’re evaluating this as a conversational assistant or a writing tool, the benchmarks will disappoint. That’s not a flaw in the model, it’s a design choice, but it’s worth being clear about before someone deploys it expecting a well-rounded assistant and gets a very capable but narrowly focused one instead.

The model also requires vLLM or Transformers for inference. That’s standard for open weights models at this scale, but enterprise teams running custom inference stacks should verify compatibility before assuming it drops into existing infrastructure cleanly.

Hardware is the other honest constraint. Two H100s is the minimum, and minimum specs in practice often mean acceptable performance rather than good performance. Teams expecting to run demanding agentic workflows at scale will likely need more than the floor. A single Blackwell GPU works too, but Blackwell hardware is still not cheap or widely available outside major cloud providers.

The agentic coding number, 25% on Terminal-Bench Hard, is better than its predecessor but still limited in absolute terms. For teams where coding is the primary use case, there are open models better suited to that specific task.

Who is this for

The Apache 2.0 license and the two H100 minimum spec are doing a lot of work here, and they’re pointing at the same customer.

Enterprise teams who need to keep data on their own infrastructure. Companies in regulated industries where sending queries to an external API isn’t an option. Organizations that have been told sovereign AI matters but haven’t had an open model with this capability profile available to actually deploy.

Command A+ is not trying to be the best general purpose chatbot. The useful part is agentic task completion, private deployment, multilingual reasoning, and multimodal document understanding, packaged into a single model that a team with two H100s can actually run.

For developers who want to try it before committing to infrastructure, the weights are on Hugging Face in BF16, FP8, and W4A4 quantizations. Cohere also has a free Space to test it and a managed inference option through Model Vault for teams that want enterprise-grade deployment without managing the hardware themselves.

The open source release also means the community gets visibility into how the model is built, something Cohere has been less forthcoming about in previous generations. Whether that translates into meaningful community contributions or just more informed evaluation remains to be seen.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

YOU MAY ALSO LIKE
AI Was Used to Recreate the Voices of Dead Pilots. The NTSB Responded by Locking Down Its Database

AI Was Used to Recreate the Voices of Dead Pilots. The NTSB Responded by...

0
Last year, a UPS cargo plane went down in Louisville, Kentucky. The crew didn't survive. The NTSB opened an investigation, as it does with every major crash, and added the case files to its public docket system, as it also does. Transcripts, data, findings, all of it accessible to anyone who wanted to look. What nobody thought about was the spectrogram. A spectrogram is a visual representation of sound. It takes audio signals, breaks them down into frequencies, and renders them as an image. The NTSB included one in the Flight 2976 docket because federal law prohibits it from releasing actual cockpit voice recordings. The spectrogram felt like a reasonable middle ground, you could see that audio existed without being able to hear it. Then Scott Manley, a YouTuber with a background in physics, pointed out on X that spectrograms encode enough data to work backwards from. The image wasn't just a picture of sound. It contained the sound. People ran with it. Using AI tools, they took the spectrogram and the publicly available transcript and reconstructed approximations of what the cockpit voice recorder actually captured. The voices of two pilots who died in that crash started circulating online. The NTSB shut its entire public docket system down.
Meta Quietly Built a Reddit Competitor Around Facebook Groups

Meta Quietly Built a Reddit Competitor Around Facebook Groups

0
Meta launched a new standalone app called Forum this week, and the easiest way to describe it is: Facebook Groups trying to become Reddit. The app revolves around discussions instead of algorithmic feeds. Users can post with nicknames, follow conversations across communities, and use an AI-powered “Ask” feature that pulls answers from discussions happening in different groups. Meta says the goal is helping people see “what real people are saying, not just what’s trending.” A few years ago, this probably would have looked like another random Meta side project destined for the company’s graveyard of abandoned apps. Right now though, the timing feels more interesting. Social platforms are running into a weird problem in the AI era. Feeds are getting flooded with synthetic content, engagement bait, AI generated replies, and recommendation systems that increasingly feel detached from actual human conversation. At the same time, places built around real discussions, Reddit, Discord communities, niche forums, even group chats, suddenly feel more valuable again. And now Meta, the company that spent years optimizing social media around scale and algorithmic feeds, is building a product around smaller communities and conversation quality instead.
ByteDance Just Released a 3B Model That Handles Images, Video, Editing, and Reasoning Together

ByteDance Open-Sourced a 3B Model for Images, Video, Editing, and Reasoning

0
Most multimodal AI systems today are still collections of separate tools pretending to be one product. One model generates images. Another edits them. A different one handles video. The entire stack works, but it often feels stitched together behind the scenes. ByteDance just used a different approach. The company just released Lance, a new open multimodal model that tries to handle image generation, video generation, editing, and visual reasoning inside one native framework. The surprising part is not just the scope. It is the size. Lance runs with only 3 billion active parameters while still posting competitive numbers across image, video, and editing benchmarks. The industry has spent the last two years building specialized AI systems for every separate media task imaginable. Lance is part of a growing push in the opposite direction: fewer models, more unified behavior, and systems that can move between understanding and generation.

Don’t miss any Tech Story

Subscribe To Firethering NewsLetter

You Can Unsubscribe Anytime! Read more in our privacy policy