OpenMythos: The Closest Thing to Claude Mythos You Can Run (And It’s Open Source)

Anthropic hasn’t told anyone how Claude Mythos works. No architecture paper or model card with details. Just a product that keeps surprising people and a company that stays quiet about why.

That silence has been driving the research community a little crazy. So one developer, Kye Gomez, did something about it. He read every public paper he could find on recurrent transformers, looped architectures, and inference-time scaling. He studied the behavioral patterns people were reporting from Mythos. Then he built what he thinks is inside it, published the code under the MIT license, and made it pip-installable.

It’s called OpenMythos. It is not Claude Mythos, and Gomez is explicit about that. But the hypothesis behind it is serious, the architecture is real, and the reasoning for why Mythos might work this way is harder to dismiss than you’d expect.

What OpenMythos actually is

Most open source model releases give you weights. OpenMythos gives you a blueprint.

No pretrained weights exist yet. What Gomez published is the full architecture he believes Mythos is built on, a training script to actually build it yourself, and seven size options from 1B to 1T parameters. You pick your scale, point it at your data, and train it. The pip install takes seconds. The training takes considerably longer.

What’s sitting inside that blueprint is where things get genuinely interesting, and to understand it, you need to understand one design decision that separates this from every other open model you’ve probably heard of.

The architecture theory

Every model you’ve used before, whether Llama, Gemma, Mistral, or whatever else, stacks layers. Dozens of them, sometimes more than a hundred, each running once and passing its results to the next one down the line. More layers generally means a smarter model, but also a bigger, heavier one that costs more to run.

Gomez’s theory is that Mythos doesn’t stack. It loops. Instead of a tall stack of unique layers each running once, a small set of layers runs through the same computation multiple times before the model produces any output. Same weights, repeated passes, progressively deeper reasoning without the parameter explosion that usually comes with depth.

Think of it like drafting an answer in your head. First pass you get the rough shape. Second pass you catch what you missed. Third pass you refine. By the time you speak, you’ve already worked through several versions internally. Nobody watching saw any of that; they just got the final answer.

That’s roughly what’s happening here. Each loop updates the model’s internal state, building on the previous pass. The original input gets re-injected at every loop so the model stays anchored to what you actually asked. Without that, it would drift. After enough passes it produces output. All the intermediate work happened silently, never becoming visible tokens.
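To make the loop concrete, here is a minimal toy sketch in plain Python. It is not the OpenMythos code: the function names, the tanh update, and the tiny 2x2 weights are all assumptions picked for illustration. It only shows the two ideas described above, which are one shared set of weights applied repeatedly and the original input added back in on every pass.

```python
import math

def matvec(matrix, vector):
    """Multiply a small dense matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def looped_forward(embedded_input, state_weights, input_weights, num_loops):
    """Run one shared block num_loops times, re-injecting the input each pass."""
    state = [0.0] * len(embedded_input)
    for _ in range(num_loops):
        # Same weights on every pass; the original input is added back each time.
        carried = matvec(state_weights, state)
        injected = matvec(input_weights, embedded_input)
        state = [math.tanh(c + i) for c, i in zip(carried, injected)]
    return state  # only this final state would be decoded into visible output

# Hypothetical toy values, chosen only for illustration.
STATE_WEIGHTS = [[0.5, -0.2], [0.1, 0.4]]
INPUT_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
prompt_embedding = [0.3, -0.7]

for loops in (1, 4, 16):
    print(loops, looped_forward(prompt_embedding, STATE_WEIGHTS, INPUT_WEIGHTS, loops))
```

With these toy weights the state settles as the loop count grows, so extra passes refine the answer rather than replace it. Nothing in between is ever emitted.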

This is why the theory fits Mythos behavior so well. Mythos consistently handles hard multi-step problems without showing its work by default. A looped architecture would do exactly that: the reasoning lives inside the loops, not in the output stream.

There’s a practical upside too. A model that reasons through looping can be dramatically more parameter-efficient than one that reasons through sheer layer depth. You get deeper thinking without paying for it in model size.
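A rough back-of-the-envelope count shows where the savings come from. The formula below is a common approximation for one transformer block (it ignores biases, norms, and embeddings), and the widths and layer counts are hypothetical, not OpenMythos configs.

```python
def block_params(d_model, ffn_mult=4):
    """Approximate weight count for one transformer block (attention + MLP)."""
    attention = 4 * d_model * d_model        # Q, K, V and output projections
    mlp = 2 * ffn_mult * d_model * d_model   # up and down projections
    return attention + mlp

def stacked_params(d_model, num_layers):
    """Every layer has its own weights."""
    return num_layers * block_params(d_model)

def looped_params(d_model, unique_layers):
    """Loop count does not appear: repeated passes reuse the same weights."""
    return unique_layers * block_params(d_model)

d_model = 2048
stacked = stacked_params(d_model, num_layers=32)
looped = looped_params(d_model, unique_layers=4)   # e.g. 4 layers run 8 times each
print(stacked, looped, stacked // looped)
```

Both setups apply 32 blocks per token, so the compute is about the same. What shrinks is the number of weights you have to store and train.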

The catch is that looped models are historically painful to train: the internal state can spiral out of control across iterations. OpenMythos implements a fix from recent research that constrains the architecture so stability is guaranteed by design, not by luck. The repo even prints a stability check at runtime so you can verify it’s behaving.
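One generic way to get "stable by design" is to parameterize the recurrence so it cannot grow, whatever values training lands on. The sketch below is an assumption about the general idea, not the method the repo implements: it uses a diagonal linear update, maps unconstrained parameters into decay factors between 0 and 1, and then runs the kind of spectral-radius check a runtime stability printout could be based on. All names and values are hypothetical.

```python
import math

def softplus(x):
    # Numerically stable softplus; positive for any input of moderate size.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def decay_from_raw(raw_params):
    """Map unconstrained parameters to per-channel decay factors in (0, 1)."""
    return [math.exp(-softplus(r)) for r in raw_params]

def spectral_radius(decay):
    # For a diagonal state matrix, the spectral radius is the largest |entry|.
    return max(abs(a) for a in decay)

def run_loops(decay, injected, num_loops):
    """Iterate state = decay * state + injected, one channel at a time."""
    state = [0.0] * len(decay)
    for _ in range(num_loops):
        state = [a * h + e for a, h, e in zip(decay, state, injected)]
    return state

raw = [-8.0, 0.0, 5.0]   # hypothetical learned values
decay = decay_from_raw(raw)
radius = spectral_radius(decay)
print("spectral radius:", radius, "stable:", radius < 1.0)
print(run_loops(decay, injected=[1.0, 1.0, 1.0], num_loops=200))
```

Because every decay factor stays below 1, each channel heads toward a finite fixed point instead of blowing up, no matter how many loops you run.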

Why this might actually explain Mythos

To be clear, this is speculation. Educated, well-researched speculation, but nobody outside Anthropic actually knows.

That said, four things about Mythos behavior map oddly well onto this theory. First, Mythos handles problems it’s never seen before better than models of comparable size. Looped transformers are specifically good at this: the capability doesn’t emerge gradually, it phase-transitions in after enough training. Second, Mythos handles deeply compositional problems, like ten-step math, long arguments, and multi-layer code, without explicit chain-of-thought. More loops at inference means deeper reasoning chains, which is exactly the mechanism a looped model would use.

Third, the reasoning happens silently, in continuous space, which matches how Mythos behaves when it’s not in extended thinking mode. And fourth, the parameter efficiency story fits: a model that reasons through looping needs far fewer parameters to achieve the same depth as a stacked architecture.

None of this proves anything. It’s a theory that fits the observed behavior. Which is exactly what makes OpenMythos interesting to follow.

What you can run today

Seven model scales ship with the repo, 1B through 1T, each preconfigured so you’re not tuning architecture by hand. The 1B and 3B variants are realistic on consumer hardware. Anything above 50B needs a proper cluster.

The training script for the 3B model on FineWeb-Edu is included and works on a single GPU or multiple GPUs out of the box via torchrun. The tokenizer is the one from OpenAI’s gpt-oss-20b. Training runs in bfloat16 on modern GPUs, and in float16 with gradient scaling on older ones.
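Why does float16 need gradient scaling when bfloat16 doesn’t? Half precision has a narrow exponent range, so very small gradients round to zero. The sketch below demonstrates that with Python’s standard struct module, which can round-trip a number through IEEE 754 half precision. The gradient value and the scale factor are hypothetical, and this is not the repo’s training code.

```python
import struct

def to_float16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", value))[0]

tiny_gradient = 1e-8
loss_scale = 65536.0   # hypothetical scale factor; typical scalers adjust it during training

# Stored directly in float16, the gradient underflows to zero.
unscaled = to_float16(tiny_gradient)

# Scaled up first, it survives in float16 and is divided back out in full precision.
rescued = to_float16(tiny_gradient * loss_scale) / loss_scale

print(unscaled, rescued)
```

bfloat16 keeps the same exponent range as float32, which is why the scaling step is normally skipped on GPUs that support it.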

Attention is your choice: MLA (multi-head latent attention) or GQA (grouped-query attention), set in the config before you initialize the model. MLA is closer to what DeepSeek uses and is more parameter-efficient. GQA is simpler and better supported across inference engines.
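To get a feel for the trade-off, here is a small sketch of what each option keeps in memory per token, per layer. The head counts and dimensions are hypothetical, and the MLA formula (one compressed latent plus a small positional part) follows the general shape of DeepSeek’s published design as an assumption; the OpenMythos configs may differ. Note that it measures cache size rather than weight count, which is only one place the difference shows up.

```python
def kv_cache_per_token(num_kv_heads, head_dim):
    """Cached elements per token per layer when full keys and values are stored."""
    return 2 * num_kv_heads * head_dim

def mla_cache_per_token(latent_dim, rope_dim):
    """Cached elements per token per layer when one compressed latent is stored."""
    return latent_dim + rope_dim

def kv_head_for_query(query_head, num_query_heads, num_kv_heads):
    """GQA: consecutive query heads share a single key/value head."""
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size

NUM_QUERY_HEADS, HEAD_DIM = 32, 128   # hypothetical sizes
full_mha = kv_cache_per_token(NUM_QUERY_HEADS, HEAD_DIM)   # one KV head per query head
gqa = kv_cache_per_token(8, HEAD_DIM)                      # 8 shared KV heads
mla = mla_cache_per_token(latent_dim=512, rope_dim=64)
print(full_mha, gqa, mla)
print([kv_head_for_query(q, NUM_QUERY_HEADS, 8) for q in range(NUM_QUERY_HEADS)])
```

GQA saves memory by having groups of query heads share keys and values. MLA goes further by caching a compressed representation, at the cost of a more involved implementation.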

There are no pretrained weights to download. You’re training from scratch. That’s where this project is today.

Is this for you?

If you research transformer architectures or study inference-time scaling, clone the repo tonight. The Parcae stability implementation alone is worth reading through.

If you build on open models and keep hitting a ceiling on complex reasoning tasks, this gives you a genuinely different architectural direction to experiment with.

And if you’re just someone who finds it fascinating that a developer sat down, read every public paper he could find, and tried to reconstruct one of the most capable closed models in existence, that’s reason enough to bookmark this one.

The weights don’t exist yet. The theory might be wrong. But the code is real, the license is clean, and the question it’s asking is one Anthropic still hasn’t answered.
