VOID: Netflix’s open source AI removes objects and fixes the physics they break

Netflix has a visual effects budget most film studios would kill for. They do not release open source AI tools for fun. When they do ship something publicly, it is worth paying attention.

VOID is their latest release. The name stands for Video Object and Interaction Deletion. Point at an object in a video, and VOID removes it, along with everything that object was doing to the world around it.

That last part is where every other tool has failed for years. Remove a person carrying a stack of boxes and the boxes hang in midair. Remove a chair someone is sitting on and the person hovers. The physics of the scene breaks and the edit becomes unusable. Film editors have been cleaning this up by hand for as long as video editing has existed.

VOID does not just erase. It reasons about what should happen next. A vision language model looks at the scene first, identifies everything the removed object was physically affecting, and only then does the diffusion model generate what the world looks like without it. Remove the person, the boxes fall. Remove the chair, the person sits on the floor. The scene stays physically coherent.

The physics breakthrough

Most video removal tools work like a smart eraser. They look at the pixels around the removed object and fill the gap with something plausible. That works fine when the object is just sitting there. It falls apart the moment the object is doing something.

VOID approaches this differently. Before any inpainting happens, a vision language model reads the scene and asks a question most tools never bother with. What is this object actually affecting? A person carrying boxes affects the boxes. A ball mid-collision affects the trajectory of everything it is about to hit. A chair someone is sitting on affects where that person’s weight is going.

The answer to that question gets encoded into something called a quadmask: four values marking four regions. 0 marks what gets removed. 63 marks overlap regions. 127 marks everything causally affected by the removed object: the falling boxes, the displaced items, the changed trajectories. 255 marks what stays untouched. The diffusion model then generates a physically coherent version of the scene using that map as a guide.
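To make the four-value encoding concrete, here is a minimal sketch that merges two boolean masks into a single quadmask. The values 0, 63, 127 and 255 come from the description above. Everything else is an assumption for illustration: the function name `build_quadmask` is hypothetical, the masks are plain nested lists rather than image arrays, and "overlap" is read as pixels that fall in both the removal mask and the affected mask. It is not VOID's own code.

```python
# Quadmask values as described in the article.
REMOVE, OVERLAP, AFFECTED, KEEP = 0, 63, 127, 255


def build_quadmask(object_mask, affected_mask):
    """Merge two same-sized boolean grids into one grid of quadmask values.

    object_mask   -- True where the object to delete sits (hypothetical input)
    affected_mask -- True where the scene is causally affected (hypothetical input)
    """
    quad = []
    for obj_row, aff_row in zip(object_mask, affected_mask):
        row = []
        for is_object, is_affected in zip(obj_row, aff_row):
            if is_object and is_affected:
                row.append(OVERLAP)   # assumption: overlap = pixel is in both masks
            elif is_object:
                row.append(REMOVE)
            elif is_affected:
                row.append(AFFECTED)
            else:
                row.append(KEEP)
        quad.append(row)
    return quad


# Toy 3x5 frame: the object covers columns 1-2 of the middle row,
# and the region it affects covers columns 2-3 of the same row.
object_mask = [
    [False, False, False, False, False],
    [False, True,  True,  False, False],
    [False, False, False, False, False],
]
affected_mask = [
    [False, False, False, False, False],
    [False, False, True,  True,  False],
    [False, False, False, False, False],
]

quadmask = build_quadmask(object_mask, affected_mask)
for row in quadmask:
    print(row)
```

In a real pipeline the same per-pixel rule would be applied to every frame of the clip, with the affected region supplied by the vision language model step rather than written by hand.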

That is the difference. Other tools guess what was behind the object. VOID reasons about what the scene should look like if the object had never been there.

For most videos one pass is enough. For longer clips where object shapes start drifting over time, an optional second pass uses optical flow to warp the first pass output and stabilize shapes along the newly generated trajectories. It is not always necessary but it is there when you need it.
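For readers unfamiliar with flow-based warping, the sketch below shows the basic idea on a tiny grid: each output pixel is looked up in the source frame at a position shifted by that pixel's flow vector. It is a simplified, nearest-neighbour backward warp written in plain Python. The function name `warp_with_flow`, the `(dx, dy)` flow layout, and the edge clamping are all assumptions made for illustration, and this does not reproduce how VOID's second pass is actually implemented.

```python
def warp_with_flow(frame, flow):
    """Backward-warp a 2D frame with a dense flow field.

    frame -- list of rows of pixel values
    flow  -- same shape as frame, each entry a (dx, dy) displacement
    Each output pixel is sampled from frame[y + dy][x + dx],
    clamped to the frame borders (nearest-neighbour sampling).
    """
    height, width = len(frame), len(frame[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            # Round to the nearest source pixel and clamp to the frame edges.
            src_x = min(max(int(round(x + dx)), 0), width - 1)
            src_y = min(max(int(round(y + dy)), 0), height - 1)
            row.append(frame[src_y][src_x])
        warped.append(row)
    return warped


# Toy example: every pixel samples from one column to its right.
frame = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
]
flow = [[(1, 0)] * 4 for _ in range(2)]

warped = warp_with_flow(frame, flow)
for row in warped:
    print(row)
```

Production code would typically use bilinear sampling on GPU tensors instead of nested loops, but the lookup rule is the same.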

What this unlocks for creators

Film editors have been doing this work by hand for decades. A continuity error, an unwanted extra walking through the background, a prop that should not be in the shot. Fixing any of these in post has meant hours of frame-by-frame rotoscoping. VOID cuts that down to a masking step and an inference run.

For YouTubers and independent filmmakers the value is more immediate. Professional object removal has lived behind expensive software and even more expensive VFX artists. A tool that understands physics and runs on a single A100 changes that calculation significantly.

The less obvious use case is video dataset generation. Researchers building training data for robotics or autonomous systems need clean counterfactual examples, videos showing what a scene looks like with and without specific objects. VOID generates those automatically with physically consistent outcomes.

Hardware requirements and how to run it

VOID requires a GPU with 40GB or more of VRAM. That means an A100 or equivalent. If you are on a consumer GPU, a 3090, a 4090, even a workstation card under 40GB, you cannot run this locally right now.

If you have access to the hardware, setup is straightforward. Clone the repo and run the included notebook, which downloads the models, runs inference on a sample video, and shows you the result. A CLI is also available for production pipelines.

You will need two things from HuggingFace. The base CogVideoX-Fun model at around 10GB and the VOID checkpoints at 22.3GB total. Pass 1 is the core inpainting model. Pass 2 is optional, only needed for longer clips where shape consistency becomes an issue.
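Those two downloads add up to roughly 32GB, so it can be worth checking free disk space before starting. The sketch below uses only the Python standard library. The sizes come from the figures above (the base model size is approximate), while the helper names and the 5GB headroom are assumptions chosen for illustration.

```python
import shutil

BASE_MODEL_GB = 10.0        # "around 10GB" per the article, so approximate
VOID_CHECKPOINTS_GB = 22.3  # Pass 1 and Pass 2 checkpoints combined


def required_download_gb():
    """Approximate total download size for the base model plus VOID checkpoints."""
    return BASE_MODEL_GB + VOID_CHECKPOINTS_GB


def has_room_for_models(path=".", headroom_gb=5.0):
    """Return True if `path` has space for the downloads plus some headroom.

    headroom_gb is an arbitrary safety margin (an assumption, not a VOID requirement).
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_download_gb() + headroom_gb


print(f"Need about {required_download_gb():.1f} GB of downloads")
print("Enough free space here:", has_room_for_models("."))
```

If you only plan to run Pass 1, the real requirement will be lower than this total, since the 22.3GB figure covers both passes.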

For most people without access to an A100, the demo on the project page is the realistic option right now. Cloud GPU rentals on platforms like RunPod or Lambda Labs bring it within reach if you need to run it on real footage without owning the hardware.

When tech giants contribute to open source

Netflix did not have to release this. They have the infrastructure to keep VOID internal, use it for their own productions, and leave independent creators with the same expensive workarounds they have always had. They shipped it anyway. Apache 2.0, weights on HuggingFace, full training code on GitHub.

That matters beyond this specific tool. When a company with Netflix’s resources open-sources serious research, it sets a bar. Runway charges subscription fees for inferior object removal. VOID does something Runway cannot: it understands the physics of what it is removing, and it costs nothing to use if you have the hardware.

The 40GB VRAM requirement keeps it out of reach for most people today. That will change. Models get quantized, hardware gets cheaper, and someone will have a consumer-friendly wrapper running within months. The foundation Netflix shipped is serious enough that it will be worth revisiting when that happens.

If you are in video production or research right now and have access to an A100, there is no reason not to try it today. Everyone else, bookmark it and watch the GitHub repo.
