Cohere Open-Sourced Command A+, a 218B MoE Model Built for Enterprise Agents

Cohere spent the past year deploying North, its enterprise AI workspace, with actual customers doing actual work. Agentic question answering over company file systems. Data analysis across spreadsheets. Multi-session memory that has to hold up in production. Command A+ is what came out of that, a model shaped by a year of watching enterprise workflows break and figuring out why.

The result is a 218B mixture-of-experts model with 25B active parameters at inference time, available today on Hugging Face under Apache 2.0. It replaces five separate models in the Command A family, each of which handled one thing. This one handles all of them, and on most of the tasks those specialist models were built for, it wins.

Five models became one

The Command A family going into this release was fragmented: Command A for general use, Reasoning for complex problem solving, Vision for multimodal, Translate for multilingual, and tool use handled separately. Five models, with five sets of infrastructure to manage.

Command A+ consolidates all of it: one model, support for 48 languages (up from 23), multimodal reasoning included, tool use built in, and a reasoning mode available. For an enterprise team managing private deployments, that matters. Fewer models means fewer hardware configurations and fewer versioning headaches.

The consolidation only works if the unified model actually matches the specialists. On the agentic tasks that matter most for North, it doesn’t just match them, it beats them. Agentic QA accuracy improved 20% over Command A Reasoning. Spreadsheet analysis quality improved 32%. Memory performance, which tests whether the model can use context from a previous session to answer questions in a new one, jumped from 39% to 54%. These are meaningful gains over the specialist it replaced.

The efficiency numbers

218B total parameters sounds like a cluster problem. It isn’t, and the gap between total and active parameters is the whole point of the MoE architecture here.

In a dense model every parameter fires for every token. Command A+ activates 25B parameters at inference time and leaves the rest idle. The practical result is that it runs on two NVIDIA H100s at W4A4 quantization, or a single Blackwell GPU, with what Cohere describes as imperceptible quality difference versus the full precision version. For teams trying to deploy privately, on their own hardware, without routing sensitive data through an external API, that minimum spec changes the conversation.
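A rough back-of-the-envelope check shows why the two-H100 figure is plausible. The sketch below counts weight storage only and assumes the 80 GB H100 variant; it ignores KV cache, activations, and quantization metadata, and the helper name is illustrative, not something from Cohere.

```python
# Rough weight-memory estimate for a 218B-parameter model.
# Assumptions (not from Cohere): 80 GB per H100, weights only,
# no KV cache, activations, or quantization scale overhead.

TOTAL_PARAMS = 218e9
ACTIVE_PARAMS = 25e9
H100_MEMORY_GB = 80  # assumed 80 GB variant


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 8)
w4_gb = weight_memory_gb(TOTAL_PARAMS, 4)
two_h100_gb = 2 * H100_MEMORY_GB

print(f"BF16 weights:  {bf16_gb:.0f} GB")
print(f"FP8 weights:   {fp8_gb:.0f} GB")
print(f"4-bit weights: {w4_gb:.0f} GB (two H100s: {two_h100_gb} GB)")
print(f"Active share per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions the 4-bit weights come to about 109 GB, which fits in 160 GB with room left for cache, while BF16 at about 436 GB does not. All 218B parameters still have to sit in memory even though only 25B are used per token, so the MoE design saves compute per token rather than storage.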

Speed is also meaningfully better than its predecessor. Against Command A Reasoning at the same quantization and concurrency levels, Command A+ delivers up to 63% higher output tokens per second and cuts time to first token by up to 17%. The W4A4 quantization adds another 47% speed increase on top of that. Cohere also used speculative decoding optimized specifically for the MoE architecture, adding a further 1.5 to 1.6x inference speedup.
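For a sense of what those figures could add up to, here is a small sketch that multiplies them together. Treating the gains as independent factors that compose cleanly is an assumption; the article's numbers are all "up to" peaks, and Cohere does not state a combined figure, so read the result as a best-case ceiling rather than a measured speedup.

```python
# Quoted gains from the article, expressed as multipliers.
# Assumption: the factors are independent and multiply cleanly.
MOE_VS_PREDECESSOR = 1.63          # "up to 63% higher output tokens per second"
W4A4_GAIN = 1.47                   # "another 47% speed increase"
SPEC_DECODING_RANGE = (1.5, 1.6)   # "1.5 to 1.6x inference speedup"


def stacked_speedup(*multipliers: float) -> float:
    """Multiply speedup factors together (assumes they compose)."""
    result = 1.0
    for multiplier in multipliers:
        result *= multiplier
    return result


low = stacked_speedup(MOE_VS_PREDECESSOR, W4A4_GAIN, SPEC_DECODING_RANGE[0])
high = stacked_speedup(MOE_VS_PREDECESSOR, W4A4_GAIN, SPEC_DECODING_RANGE[1])
print(f"Best-case combined throughput: {low:.2f}x to {high:.2f}x")
```

If the gains did stack perfectly, that would be roughly 3.6x to 3.8x the predecessor's output throughput. Real workloads will likely land below that, since peak gains from different optimizations rarely coincide.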

There’s also a new tokenizer. Command A+ is the first Cohere model to use it, and the compression gains matter especially for non-European languages: Arabic tokenization improved 20%, Korean 16%, and Japanese 18%. Fewer tokens per response means lower inference cost per query, which compounds quickly at enterprise scale.
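To make the cost link concrete, the sketch below reads each tokenizer improvement as that fraction fewer tokens for the same text, which is an assumption about what the figures mean. The baseline token count and the per-token price are made-up placeholders, not Cohere numbers.

```python
# Illustrative only: treats each "improvement" as a fractional
# reduction in token count for the same text (an assumption).
TOKEN_REDUCTION = {"Arabic": 0.20, "Korean": 0.16, "Japanese": 0.18}

BASELINE_TOKENS = 1_000       # hypothetical tokens per response
PRICE_PER_1K_TOKENS = 0.01    # hypothetical cost unit per 1,000 tokens


def tokens_after(baseline: float, reduction: float) -> float:
    """Token count once the tokenizer emits `reduction` fewer tokens."""
    return baseline * (1 - reduction)


def cost_per_response(tokens: float, price_per_1k: float) -> float:
    """Cost of one response when billing scales linearly with tokens."""
    return tokens / 1000 * price_per_1k


baseline_cost = cost_per_response(BASELINE_TOKENS, PRICE_PER_1K_TOKENS)
for language, reduction in TOKEN_REDUCTION.items():
    new_tokens = tokens_after(BASELINE_TOKENS, reduction)
    new_cost = cost_per_response(new_tokens, PRICE_PER_1K_TOKENS)
    print(f"{language}: {new_tokens:.0f} tokens, "
          f"cost {new_cost:.4f} vs {baseline_cost:.4f}")
```

Because cost scales linearly with token count in this model, the saving per response is the same percentage as the token reduction, and it applies to every query in that language.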

Where it’s genuinely strong

The benchmark Cohere is most confident about is the one that’s hardest to fake: τ²-Bench Telecom, which tests multi-step agentic task completion in realistic enterprise scenarios. Command A Reasoning scored 37% on it. Command A+ scores 85%. That’s not an incremental gain. It’s a different category of capability on the task the model was explicitly built for.

Terminal-Bench Hard went from 3% to 25%. That’s still not a number that makes Command A+ a coding specialist, but it reflects what happens when a model designed around real workflow completion gets properly trained on the full agentic loop rather than just code generation in isolation.

Multimodal reasoning is new to this model and the numbers are solid. MMMU Pro at 63%, MathVista at 80.6% up from 73.5% with Command A Vision, CharXiv reasoning at 52.7% up from 46.9%. Document understanding across charts, tables, and mixed-format files is where enterprise multimodal use actually lives, and these benchmarks test exactly that.
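The before-and-after scores quoted in this article read differently depending on whether you look at percentage points or relative change. The sketch below computes both from the figures given here; the `gains` helper is illustrative, and the relative numbers are plain arithmetic, not figures published by Cohere.

```python
# Before/after scores quoted in the article, in percent.
SCORES = {
    "tau2-Bench Telecom": (37.0, 85.0),
    "Terminal-Bench Hard": (3.0, 25.0),
    "MathVista": (73.5, 80.6),
    "CharXiv reasoning": (46.9, 52.7),
    "Cross-session memory": (39.0, 54.0),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    points = after - before
    relative = points / before * 100
    return points, relative


for name, (before, after) in SCORES.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points:.1f} points ({relative:.0f}% relative)")
```

A low starting score inflates the relative figure: Terminal-Bench Hard gains 22 points, which is a large relative jump from a 3% baseline but still leaves the absolute score at 25%.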

The multilingual part is also genuinely expanded. 48 languages versus 23 in the previous generation, with reasoning capability extending to Arabic, Japanese, and Korean in a way the earlier models didn’t support. Cohere tested this with an internal Arabic, Japanese, and Korean translation of AIME 2025, a mathematics benchmark, to verify that reasoning quality holds across languages, not just translation fluency. That’s a meaningful distinction for global enterprise deployments.

On the Artificial Analysis Intelligence Index, Command A+ scores 37, which Cohere says outperforms other leading open models. That index is a composite of general capability across tasks, and the score reflects a model that’s genuinely strong across multiple dimensions rather than optimized narrowly for one benchmark category.

What it doesn’t do well

General chat quality is not a priority here. If you’re evaluating this as a conversational assistant or a writing tool, the benchmarks will disappoint. That’s a design choice rather than a flaw in the model, but it’s worth being clear about before someone deploys it expecting a well-rounded assistant and gets a very capable but narrowly focused one instead.

The model also requires vLLM or Transformers for inference. That’s standard for open weights models at this scale, but enterprise teams running custom inference stacks should verify compatibility before assuming it drops into existing infrastructure cleanly.

Hardware is the other honest constraint. Two H100s is the minimum, and minimum specs in practice often mean acceptable performance rather than good performance. Teams expecting to run demanding agentic workflows at scale will likely need more than the floor. A single Blackwell GPU works too, but Blackwell hardware is still not cheap or widely available outside major cloud providers.

The agentic coding number, 25% on Terminal-Bench Hard, is better than its predecessor but still limited in absolute terms. For teams where coding is the primary use case, there are open models better suited to that specific task.

Who is this for

The Apache 2.0 license and the two H100 minimum spec are doing a lot of work here, and they’re pointing at the same customer.

Enterprise teams who need to keep data on their own infrastructure. Companies in regulated industries where sending queries to an external API isn’t an option. Organizations that have been told sovereign AI matters but haven’t had an open model with this capability profile available to actually deploy.

Command A+ is not trying to be the best general purpose chatbot. The useful part is agentic task completion, private deployment, multilingual reasoning, and multimodal document understanding, packaged into a single model that a team with two H100s can actually run.

For developers who want to try it before committing to infrastructure, the weights are on Hugging Face in BF16, FP8, and W4A4 quantizations. Cohere also has a free Space to test it and a managed inference option through Model Vault for teams that want enterprise-grade deployment without managing the hardware themselves.

The open source release also means the community gets visibility into how the model is built, something Cohere has been less forthcoming about in previous generations. Whether that translates into meaningful community contributions or just more informed evaluation remains to be seen.
