
OpenMythos: The Closest Thing to Claude Mythos You Can Run (And It’s Open Source)


Anthropic hasn’t told anyone how Claude Mythos works. No architecture paper or model card with details. Just a product that keeps surprising people and a company that stays quiet about why.

That silence has been driving the research community a little crazy. So one developer, Kye Gomez, did something about it. He read every public paper he could find on recurrent transformers, looped architectures, and inference-time scaling. He studied the behavioral patterns people were reporting from Mythos. Then he built what he thinks is inside it, published the code under MIT, and made it pip installable.

It’s called OpenMythos. It is not Claude Mythos, and Gomez is explicit about that, but the hypothesis behind it is serious, the architecture is real, and the reasoning for why Mythos might work this way is harder to dismiss than you’d expect.

What OpenMythos actually is

Most open source model releases give you weights. OpenMythos gives you a blueprint.

No pretrained weights exist yet. What Gomez published is the full architecture he believes Mythos is built on, a training script to actually build it yourself, and seven size options from 1B to 1T parameters. You pick your scale, point it at your data, and train it. The pip install takes seconds. The training takes considerably longer.

What’s sitting inside that blueprint is where things get genuinely interesting. To understand it, you need to understand one design decision that separates this from every other open model you’ve probably heard of.

The architecture theory

Every model you’ve used before, Llama, Gemma, Mistral, whatever, stacks layers. Hundreds of them, each running once, passing results to the next one down the line. More layers means a smarter model, but also a bigger, heavier, more expensive one to run.

Gomez’s theory is that Mythos doesn’t stack. It loops. Instead of hundreds of unique layers each running once, a small set of layers runs through the same computation multiple times before the model produces any output. Same weights, repeated passes, progressively deeper reasoning without the parameter explosion that usually comes with depth.

Think of it like drafting an answer in your head. First pass you get the rough shape. Second pass you catch what you missed. Third pass you refine. By the time you speak, you’ve already worked through several versions internally. Nobody watching saw any of that, they just got the final answer.

That’s roughly what’s happening here. Each loop updates the model’s internal state, building on the previous pass. The original input gets re-injected at every loop so the model stays anchored to what you actually asked; without that, it would drift. After enough passes it produces output. All the intermediate work happened silently, never becoming visible tokens.
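To make that concrete, here is a minimal PyTorch sketch of the looping idea. It is not the OpenMythos code; the names (LoopedBlock, n_loops, inject) are made up for illustration, and a real implementation would handle attention masks, caching, and much more.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """One shared transformer layer, applied n_loops times before any output."""
    def __init__(self, d_model: int, n_heads: int, n_loops: int):
        super().__init__()
        self.n_loops = n_loops
        # A single layer whose weights are reused on every pass.
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # Folds the original input back in at each loop so the state stays anchored.
        self.inject = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        state = x
        for _ in range(self.n_loops):
            state = self.inject(torch.cat([state, x], dim=-1))  # re-inject the prompt
            state = self.layer(state)                           # same weights, another pass
        return state  # only the final state reaches the output head; the loops stay invisible

# Same parameter count whether you loop 3 times or 30; only compute grows.
block = LoopedBlock(d_model=256, n_heads=8, n_loops=3)
hidden = block(torch.randn(2, 16, 256))  # (batch, seq_len, d_model)
```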

This is why the theory fits Mythos behavior so well. Mythos consistently handles hard multi-step problems without showing its work by default. A looped architecture would do exactly that, the reasoning lives inside the loops, not in the output stream.

There’s a practical upside too. A model that reasons through looping can be dramatically more parameter-efficient than one that reasons through sheer layer depth. You get deeper thinking without paying for it in model size.

The catch is that looped models are historically painful to train: the internal state can spiral out of control across iterations. OpenMythos implements a fix from recent research that constrains the architecture so stability is guaranteed by design, not by luck. The repo even prints a stability check at runtime so you can verify it’s behaving.
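What that runtime check looks like inside the repo, I can’t say. But as a rough sketch of the kind of sanity check you could run yourself against the hypothetical LoopedBlock above: iterate a few extra passes and confirm the hidden state’s norm isn’t growing without bound.

```python
import torch

@torch.no_grad()
def loop_is_stable(block, x, extra_loops: int = 8, growth_limit: float = 1.05) -> bool:
    """Crude divergence check: the state norm should not keep growing pass after pass."""
    state = x
    prev_norm = state.norm()
    for _ in range(extra_loops):
        state = block.inject(torch.cat([state, x], dim=-1))
        state = block.layer(state)
        norm = state.norm()
        if norm > growth_limit * prev_norm:  # state is blowing up across iterations
            return False
        prev_norm = norm
    return True

print("stability check:", loop_is_stable(block, torch.randn(2, 16, 256)))
```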


Why this might actually explain Mythos

To be clear, this is speculation. Educated, well-researched speculation, but nobody outside Anthropic actually knows.

That said, four things about Mythos’s behavior map oddly well onto this theory. Mythos handles problems it’s never seen before better than models of comparable size, and looped transformers are specifically good at this: the capability doesn’t emerge gradually, it phase-transitions in after enough training. Mythos also handles deeply compositional problems, ten-step math, long arguments, multi-layer code, without explicit chain-of-thought.

More loops at inference means deeper reasoning chains, which is exactly the mechanism a looped model would use. The reasoning also happens silently, in continuous space, which matches how Mythos behaves when it’s not in extended thinking mode. And the parameter efficiency story fits: a model that reasons through looping needs far fewer parameters to achieve the same depth as a stacked architecture.
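If that’s right, inference-time scaling is almost trivial to express. Continuing the hypothetical LoopedBlock sketch from earlier, spending more compute on a harder prompt is just raising the loop count before the forward pass:

```python
# Illustrative only: more passes over the same weights, no change to model size.
block.n_loops = 8                        # built with 3 passes above; run 8 for a harder prompt
hidden = block(torch.randn(1, 16, 256))  # same parameters, roughly 2.7x the loop compute
```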

None of this proves anything. It’s a theory that fits the observed behavior. Which is exactly what makes OpenMythos interesting to follow.


What you can run today

Seven model scales ship with the repo, 1B through 1T, each preconfigured so you’re not tuning architecture by hand. The 1B and 3B variants are realistic on consumer hardware. Anything above 50B needs a proper cluster.

The training script for the 3B on FineWeb-Edu is included and works single-GPU or multi-GPU out of the box via torchrun. The tokenizer comes from OpenAI’s gpt-oss-20b. Training runs in bfloat16 on modern GPUs, and in float16 with gradient scaling on older ones.
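That precision switch is standard PyTorch mixed precision rather than anything OpenMythos-specific. A sketch of what the branch typically looks like, not the repo’s actual training loop:

```python
import torch

# bfloat16 where the hardware supports it, float16 plus gradient scaling elsewhere.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
amp_dtype = torch.bfloat16 if use_bf16 else torch.float16
scaler = torch.cuda.amp.GradScaler(enabled=not use_bf16)  # no-op when bf16 is in use

def train_step(model, batch, optimizer):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=amp_dtype):
        loss = model(batch)  # assumes the model returns its training loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```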

Attention is your choice of MLA or GQA, set in the config before you initialize. MLA is closer to what DeepSeek uses and is more parameter efficient. GQA is simpler and better supported across inference engines.
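As a rough idea of what that choice looks like in practice, here is an illustrative config; the real OpenMythos config fields almost certainly differ, so treat the names and numbers as placeholders.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    d_model: int = 2048
    n_heads: int = 16
    n_loops: int = 4
    attention: str = "gqa"   # "mla" (DeepSeek-style latent attention) or "gqa"
    n_kv_heads: int = 4      # GQA only: fewer key/value heads than query heads

config = ModelConfig(attention="mla")  # decide before you initialize the model
```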

There are no pretrained weights to download. You’re training from scratch. That’s where this project is today.

Is this for you?

If you research transformer architectures or study inference-time scaling, clone the repo tonight. The Parcae stability implementation alone is worth reading through.

If you build on open models and keep hitting a ceiling on complex reasoning tasks, this gives you a genuinely different architectural direction to experiment with.

And if you’re just someone who finds it fascinating that a developer sat down, read every public paper he could find, and tried to reconstruct one of the most capable closed models in existence, that’s reason enough to bookmark this one.

The weights don’t exist yet. The theory might be wrong. But the code is real, the license is clean, and the question it’s asking is one Anthropic still hasn’t answered.
