Nucleus-Image: 17B Open-Source MoE Image Model Delivering GPT-Image Level Performance

The mixture-of-experts trick changed how people think about LLMs. Instead of running every parameter on every token, you activate a small fraction of the network per forward pass and somehow the quality stays competitive while the compute drops. It’s the reason models like Mixtral punched above their weight. Everyone in the LLM space understood it immediately. Nobody had done it openly for image generation. Until now.

Nucleus-Image is a 17B parameter diffusion transformer that activates roughly 2B parameters per forward pass. It beats Imagen4 on OneIG-Bench, sits at number one on DPG-Bench overall, and matches Qwen-Image on GenEval.

It’s also a base model. No fine-tuning, reinforcement learning or human preference tuning. What you’re seeing in those benchmarks is raw pre-training performance. That’s either impressive or a caveat depending on what you need it for, probably both.

17B Parameters, 2B Doing the Work

Nucleus-Image AI image generations
Via: huggingface/Nucleus-Image

If you’ve used any of the recent MoE language models you already understand the basic idea. Instead of running every part of the network on every input, a router decides which experts (specialized sub-networks) are most relevant for this particular input and activates only those. The rest sit idle. You get the capacity of a large model at the compute cost of a much smaller one.

Nucleus-Image brings that same logic to image generation. It has 32 transformer layers. The first three stay dense for training stability; every other layer replaces the standard feed-forward network with 64 routed experts plus one shared expert. For any given forward pass only a small fraction of those experts activate, keeping the active parameter count around 2B despite the total sitting at 17B.
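The layer layout described above can be sketched in a few lines of NumPy. This is a toy, not the Nucleus-Image implementation: the top-k token-choice router, the value `top_k=4`, the tiny dimensions, and every name here (`ToyMoE`, `active_fraction`) are assumptions made for illustration, since the article does not say how experts are selected or how many run per token.

```python
import numpy as np


class ToyMoE:
    """Toy MoE feed-forward block: routed experts plus one always-on shared expert."""

    def __init__(self, dim=16, hidden=32, num_experts=64, top_k=4, seed=0):
        rng = np.random.default_rng(seed)
        self.num_experts, self.top_k = num_experts, top_k
        self.router = rng.normal(0, 0.02, (dim, num_experts))
        # One (w_in, w_out) pair per routed expert.
        self.w_in = rng.normal(0, 0.02, (num_experts, dim, hidden))
        self.w_out = rng.normal(0, 0.02, (num_experts, hidden, dim))
        # The shared expert runs on every token regardless of routing.
        self.shared_in = rng.normal(0, 0.02, (dim, hidden))
        self.shared_out = rng.normal(0, 0.02, (hidden, dim))

    def __call__(self, x):
        # x has shape (tokens, dim).
        logits = x @ self.router                                # (tokens, num_experts)
        chosen = np.argsort(logits, axis=-1)[:, -self.top_k:]   # assumed top-k routing
        picked = np.take_along_axis(logits, chosen, axis=-1)
        gates = np.exp(picked - picked.max(axis=-1, keepdims=True))
        gates /= gates.sum(axis=-1, keepdims=True)              # softmax over chosen experts

        out = np.maximum(x @ self.shared_in, 0) @ self.shared_out
        for t in range(x.shape[0]):
            for slot in range(self.top_k):
                e = chosen[t, slot]
                h = np.maximum(x[t] @ self.w_in[e], 0)          # ReLU feed-forward
                out[t] += gates[t, slot] * (h @ self.w_out[e])
        return out, chosen

    def active_fraction(self):
        """Share of expert networks touched per token (routed picks plus shared)."""
        return (self.top_k + 1) / (self.num_experts + 1)


moe = ToyMoE()
tokens = np.random.default_rng(1).normal(size=(8, 16))
output, chosen = moe(tokens)
print(output.shape, chosen.shape)
print(f"experts used per token: {moe.active_fraction():.1%}")
```

The fraction printed here only counts expert networks. Attention layers and the dense early layers always run, so it is not the same quantity as the 2B-of-17B figure.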

What makes the routing design interesting is how it handles timesteps. Diffusion models denoise images across many steps, and the network sees very different inputs at each stage. Most routing approaches let the timestep embedding influence which experts get selected, which sounds reasonable but actually causes experts to specialize by timestep rather than by content or spatial region. Nucleus-Image separates those two things. The router sees the timestep to make its selection decision, but the experts themselves receive the fully modulated representation. The result is experts that specialize in actual image semantics.

There’s also a practical inference benefit built directly into diffusers. Text tokens never enter the transformer backbone; they only contribute as key-value pairs in the attention layers. Those KV projections get cached across all denoising steps automatically when you enable TextKVCacheConfig. One flag, no changes to your inference loop, free speedup.
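Why the cache is free is easier to see in a toy loop: the prompt does not change between denoising steps, so its key and value projections only need computing once. The sketch below is an illustration of that idea in NumPy, not the diffusers implementation; `CachedTextKV`, `cross_attend`, the single attention layer, and the fake update rule are all hypothetical.

```python
import numpy as np


class CachedTextKV:
    """Projects text embeddings to keys/values once and reuses them every step."""

    def __init__(self, w_k, w_v):
        self.w_k, self.w_v = w_k, w_v
        self._cache = None
        self.projection_calls = 0

    def get(self, text_emb):
        # Assumes the same prompt embeddings are passed on every call.
        if self._cache is None:
            self.projection_calls += 1
            self._cache = (text_emb @ self.w_k, text_emb @ self.w_v)
        return self._cache


def cross_attend(image_tokens, w_q, keys, values):
    """Image tokens query the text keys/values with scaled dot-product attention."""
    q = image_tokens @ w_q
    scores = q @ keys.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values


rng = np.random.default_rng(0)
dim = 8
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
text_emb = rng.normal(size=(5, dim))    # prompt embeddings: fixed across steps
latents = rng.normal(size=(12, dim))    # image tokens: change every step

kv = CachedTextKV(w_k, w_v)
for step in range(50):
    keys, values = kv.get(text_emb)     # a cache hit on every step after the first
    latents = latents + 0.01 * cross_attend(latents, w_q, keys, values)

print("text K/V projections computed:", kv.projection_calls)
```

Without the cache the same two matrix products would run on all 50 steps in every attention layer, which is the work the article says the diffusers flag removes.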

What No Post-Training Actually Means

Everything you see in the benchmarks (number one on DPG-Bench, beating Imagen4 on OneIG, matching Qwen-Image on GenEval) is pre-training performance only. No DPO, RL or human preference tuning was applied. The team is explicit about this and it’s not a small detail.

Post-training is what takes a capable base model and makes it feel production ready. It’s what smooths out the weird outputs, aligns the model to what humans actually find appealing, and improves consistency across different types of prompts. Models like Seedream 4.5 and Nano Banana 2.0 have gone through that process. Nucleus-Image hasn’t. That means two things depending on who you are.

If you’re a researcher or someone who wants to fine-tune a strong foundation on your own data or aesthetic preferences, this is genuinely exciting. You’re starting from a base that already competes with post-trained models before any preference optimization. The headroom from here is real.

If you want something you can point at a prompt and get a consistently great result from right now, you might find the outputs less predictable than those of models that have been through full post-training. It’s not that it produces bad images. It’s that a polished fine-tuned model will feel more reliable in everyday use.

The training code on GitHub is listed as coming soon, so for now the weights on Hugging Face are your entry point. Apache 2.0 license, ready to use and build on commercially.

What the Benchmarks Show

Three benchmarks, three different stories, and Nucleus-Image holds up across all of them, which is harder than it sounds.

On DPG-Bench it sits at number one overall with 88.79, leading Qwen-Image at 88.32 and Seedream 3.0 at 88.27. Leading four of six categories including entity, attribute, and overall while activating 2B parameters against models running 20B is the part worth stopping on. The weakest category is Global at 85.10, sitting 9.21 behind the leader there, so it’s not a clean sweep. But the overall result is impressive.

GenEval tells a spatial reasoning story. Nucleus-Image scores 0.87 overall, tied for first with Qwen-Image and CogView 4. The standout numbers are position accuracy at 0.85 and two-object handling at 0.95, among the strongest spatial reasoning results in the current field. Qwen-Image achieves that same 0.87 with 20B active parameters. CogView 4 gets there with 6B. Nucleus-Image does it with 2B. That efficiency gap is the actual headline.

The efficiency story gets sharper when you look at score per active billion parameters. Nucleus-Image scores 0.380, four times above the median across all models in the comparison. Qwen-Image, despite matching it on GenEval overall, scores 0.038 per billion. FLUX.1 Dev scores 0.053. The MoE architecture isn’t just a technical curiosity here. It’s producing a measurably different performance-to-compute ratio than anything else in the open-source field right now.
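The per-billion figure is a plain ratio of score to active parameters. The article does not say which score feeds its 0.380 and 0.038 numbers, so the sketch below uses the reported GenEval scores and active-parameter counts as a stand-in. The absolute values therefore differ from the ones quoted above, but the roughly tenfold gap between Nucleus-Image and Qwen-Image comes out the same.

```python
# GenEval overall score and active parameters (in billions), as reported in the article.
models = {
    "Nucleus-Image": (0.87, 2.0),
    "CogView 4": (0.87, 6.0),
    "Qwen-Image": (0.87, 20.0),
}


def score_per_active_billion(score, active_billions):
    """Benchmark score divided by the parameters active in one forward pass."""
    return score / active_billions


ratios = {name: score_per_active_billion(s, p) for name, (s, p) in models.items()}

# Print the models from most to least efficient.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:14s} {ratio:.4f}")

gap = ratios["Nucleus-Image"] / ratios["Qwen-Image"]
print(f"Nucleus-Image vs Qwen-Image: {gap:.1f}x")
```

Because all three models tie on GenEval, the ratio here is driven entirely by the active-parameter count, which is the point the efficiency argument rests on.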

On OneIG-Bench it scores 0.522, beating Imagen4 at 0.515 and Recraft V3 at 0.502 with strong style scores at 0.430.

As always these are self-reported numbers. Take them as directional rather than definitive. But the consistency across three different benchmarks and the efficiency angle make them harder to dismiss than a single cherry-picked result.

Model            | DPG-Bench  | GenEval   | OneIG-Bench | Active Params
Nucleus-Image    | 88.79 (#1) | 0.87 (#1) | 0.522       | 2B / 17B total
Qwen-Image       | 88.32      | 0.87      | -           | 20B
Seedream 3.0     | 88.27      | 0.84      | -           | Undisclosed
CogView 4        | 87.29      | 0.87      | -           | 6B
GPT Image 1 High | 85.15      | 0.84      | -           | Undisclosed
HiDream-1-Full   | 85.89      | 0.83      | -           | 13.2B
Imagen4          | -          | -         | 0.515       | -

How to Run It

Diffusers is the only path right now. Install the latest version from GitHub, load NucleusAI/Nucleus-Image, and you’re generating in a few lines of Python.

The one thing worth enabling immediately is Text KV caching. It’s built into the diffusers pipeline natively: just pass TextKVCacheConfig and call enable_cache on the transformer before your first inference. No changes to your generation loop, automatic speedup across all denoising steps.

The recommended starting point is 1024×1024 at 50 inference steps with a guidance scale of 4.0. Seven aspect ratios are supported out of the box, from 1:1 through 16:9 and 9:16, so you’re not locked into square outputs.
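Put together, the settings above might look like the sketch below. Treat it as a starting point under stated assumptions rather than the official recipe. The import path for `TextKVCacheConfig`, its no-argument constructor, the use of the generic `DiffusionPipeline` loader, and the `guidance_scale` keyword are assumptions based on common diffusers conventions. The `dims_for_aspect` helper is hypothetical: it just picks sizes near one megapixel, which may not match the model’s supported resolutions.

```python
import math


def dims_for_aspect(ratio, target_pixels=1024 * 1024, multiple=16):
    """Return (width, height) near target_pixels for a 'W:H' ratio string."""
    w_part, h_part = (int(p) for p in ratio.split(":"))
    width = math.sqrt(target_pixels * w_part / h_part)
    height = width * h_part / w_part

    def snap(value):
        # Round to the nearest multiple; the multiple of 16 is an assumption.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


def generation_kwargs(prompt, ratio="1:1"):
    """Collect the article's recommended settings into pipeline keyword arguments."""
    width, height = dims_for_aspect(ratio)
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "num_inference_steps": 50,
        "guidance_scale": 4.0,  # keyword name assumed from diffusers convention
    }


def generate(prompt, ratio="1:1"):
    """Load the model and render one image (needs a GPU and the weights)."""
    # Heavy imports live here so the helpers above work without torch installed.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers import TextKVCacheConfig  # assumed import path

    pipe = DiffusionPipeline.from_pretrained(
        "NucleusAI/Nucleus-Image", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")
    pipe.transformer.enable_cache(TextKVCacheConfig())  # constructor args assumed
    return pipe(**generation_kwargs(prompt, ratio)).images[0]


print(generation_kwargs("a red fox in fresh snow", "16:9"))
```

Calling `generate(...)` downloads the full 17B checkpoint, so check the model card for the exact pipeline class and cache arguments before running it.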

Training code is listed as coming soon on GitHub. For now the weights are your entry point and they’re enough to start building. The dataset release is planned as part of the full open-source package, so it’s worth watching the repo if that matters for your use case.

Who It’s Actually For

Nucleus-Image isn’t trying to be the most polished tool for casual image generation right now. If you want something you can use immediately with consistent pretty results, a post-trained model like Seedream 4.5 or Nano Banana Pro will feel more reliable today.

Where Nucleus-Image gets interesting is everything that comes after the base model. Researchers who want to study MoE architectures in diffusion models now have a fully open implementation to work with: weights today, and soon training code and data. Fine-tuners who want a strong starting point before applying their own preference optimization are starting from a base that already competes with post-trained models on three benchmarks.

It’s also the first fully open MoE diffusion model at this quality tier. That has value independent of whether it’s the best image generator you’ve ever used. Someone has to be first, and they shipped the weights under Apache 2.0 with training code and data promised to follow.

Worth Your Attention

Base models don’t usually generate this much excitement because the gap between a capable base and a production ready product is real and most people feel it immediately. Nucleus-Image is different because that gap is unusually small here.

Number one on DPG-Bench with no post-training. Beating Imagen4 on OneIG with no human preference tuning. Matching Qwen-Image on GenEval before a single RL step. Whatever the post-trained version of this looks like when it arrives (and the architecture makes it a genuinely interesting fine-tuning target), the base is already doing things that took other models full alignment pipelines to reach.
