VOID: Netflix’s open source AI removes objects and fixes the physics they break

Netflix has a visual effects budget most film studios would kill for. They do not release open source AI tools for fun. When they do ship something publicly, it is worth paying attention.

VOID, short for Video Object and Interaction Deletion, is their latest release. Point at an object in a video and VOID removes it, along with everything that object was doing to the world around it.

That last part is where every other tool has failed for years. Remove a person carrying a stack of boxes and the boxes hang in mid air. Remove a chair someone is sitting on and the person hovers. The physics of the scene breaks and the edit becomes unusable. Film editors have been cleaning this up by hand since video editing existed.

VOID does not just erase. It reasons about what should happen next. A vision language model looks at the scene first, identifies everything the removed object was physically affecting, and only then does the diffusion model generate what the world looks like without it. Remove the person, the boxes fall. Remove the chair, the person sits on the floor. The scene stays physically coherent.

The physics breakthrough

Most video removal tools work like a smart eraser. They look at the pixels around the removed object and fill the gap with something plausible. That works fine when the object is just sitting there. It falls apart the moment the object is doing something.

VOID approaches this differently. Before any inpainting happens, a vision language model reads the scene and asks a question most tools never bother with. What is this object actually affecting? A person carrying boxes affects the boxes. A ball mid-collision affects the trajectory of everything it is about to hit. A chair someone is sitting on affects where that person’s weight is going.

The answer to that question gets encoded into something called a quadmask: a single mask with four values for four regions. Zero marks what gets removed. 63 marks overlap regions. 127 marks everything causally affected by the removed object: the falling boxes, the displaced items, the changed trajectories. 255 marks what stays untouched. The diffusion model then generates a physically coherent version of the scene using that map as a guide.
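To make the four values concrete, here is a minimal sketch of how such a mask could be assembled with NumPy. It is an illustration, not VOID's actual code: the function name `build_quadmask` and its two boolean inputs are hypothetical, and it assumes "overlap" means pixels where the removed object and the affected region coincide, which the description above does not spell out.

```python
import numpy as np

# The four region values as described in the article.
REMOVE, OVERLAP, AFFECTED, KEEP = 0, 63, 127, 255


def build_quadmask(object_mask, affected_mask):
    """Combine two boolean HxW masks into one uint8 quadmask.

    object_mask   -- True where the object to delete is (hypothetical input)
    affected_mask -- True where the scene is causally affected (hypothetical input)
    """
    object_mask = np.asarray(object_mask, dtype=bool)
    affected_mask = np.asarray(affected_mask, dtype=bool)
    if object_mask.shape != affected_mask.shape:
        raise ValueError("masks must have the same shape")

    # Start with everything untouched, then paint the regions on top.
    quad = np.full(object_mask.shape, KEEP, dtype=np.uint8)
    quad[affected_mask] = AFFECTED
    quad[object_mask] = REMOVE
    # Assumption: "overlap" is where both masks are set at once.
    quad[object_mask & affected_mask] = OVERLAP
    return quad
```

Calling it once per frame and stacking the results would give a mask video with the same length as the clip, which is the kind of guide the paragraph above describes.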

That is the difference. Other tools guess what was behind the object. VOID reasons about what the scene should look like if the object had never been there.

For most videos one pass is enough. For longer clips where object shapes start drifting over time, an optional second pass uses optical flow to warp the first pass output and stabilize shapes along the newly generated trajectories. It is not always necessary but it is there when you need it.
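For readers unfamiliar with flow-based warping, the general idea can be sketched in a few lines of NumPy. This is a generic nearest-neighbor backward warp, not VOID's second pass: the function name `warp_with_flow` and the `(dx, dy)` flow layout are assumptions made for the example.

```python
import numpy as np


def warp_with_flow(frame, flow):
    """Backward-warp a frame using a dense flow field.

    frame -- array of shape (H, W) or (H, W, C)
    flow  -- array of shape (H, W, 2) holding (dx, dy) per pixel (assumed layout)

    Output pixel (y, x) is sampled from frame at (y + dy, x + dx),
    rounded to the nearest pixel and clamped to the image borders.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]
```

A production pipeline would likely use bilinear sampling and an estimated flow field rather than rounding to whole pixels, but the principle is the same: pixels are pulled along the motion so that shapes stay consistent from frame to frame.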

What this unlocks for creators

Film editors have been doing this work by hand for decades. A continuity error, an unwanted extra walking through the background, a prop that should not be in the shot. Fixing any of these in post has meant hours of frame by frame rotoscoping. VOID cuts that down to a masking step and an inference run.

For YouTubers and independent filmmakers the value is more immediate. Professional object removal has lived behind expensive software and even more expensive VFX artists. A tool that understands physics and runs on a single A100 changes that calculation significantly.

The less obvious use case is video dataset generation. Researchers building training data for robotics or autonomous systems need clean counterfactual examples, videos showing what a scene looks like with and without specific objects. VOID generates those automatically with physically consistent outcomes.

Hardware requirements and how to run it

VOID requires a GPU with 40GB or more of VRAM. That means an A100 or equivalent. If you are on a consumer GPU, a 3090, a 4090, even a workstation card under 40GB, you cannot run this locally right now.

If you have access to the hardware, setup is straightforward. Clone the repo and run the included notebook, which handles everything: it downloads the models, runs inference on a sample video, and shows you the result. A CLI is also available for production pipelines.

You will need two things from HuggingFace: the base CogVideoX-Fun model at around 10GB and the VOID checkpoints at 22.3GB total. Pass 1 is the core inpainting model. Pass 2 is optional and only needed for longer clips where shape consistency becomes an issue.
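The numbers above add up to roughly 32GB of downloads on top of the 40GB VRAM floor. A small preflight check along these lines can catch an unsuitable machine before a long download starts. It is a sketch using only the figures quoted in this article; the function names are hypothetical, and real disk needs will be higher once outputs and caches are counted.

```python
import shutil

# Figures quoted in the article (decimal gigabytes assumed).
MIN_VRAM_GB = 40.0
BASE_MODEL_GB = 10.0        # "around 10GB"
VOID_CHECKPOINTS_GB = 22.3  # both passes combined


def free_disk_gb(path="."):
    """Free space on the drive holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def preflight(vram_gb, disk_gb):
    """Return a list of human-readable problems; empty means good to go."""
    problems = []
    if vram_gb < MIN_VRAM_GB:
        problems.append(
            f"GPU has {vram_gb:g}GB VRAM, need at least {MIN_VRAM_GB:g}GB"
        )
    needed = BASE_MODEL_GB + VOID_CHECKPOINTS_GB
    if disk_gb < needed:
        problems.append(
            f"only {disk_gb:.1f}GB of disk free, downloads need about {needed:.1f}GB"
        )
    return problems
```

For example, `preflight(24, free_disk_gb())` reports a VRAM problem, since a 3090 or 4090 has 24GB, while `preflight(80, 100)` returns an empty list.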

For most people without access to an A100, the demo on the project page is the realistic option right now. Cloud GPU rentals on platforms like RunPod or Lambda Labs bring it within reach if you need to run it on real footage without owning the hardware.

When tech giants contribute to open source

Netflix did not have to release this. They have the infrastructure to keep VOID internal, use it for their own productions, and leave independent creators with the same expensive workarounds they have always had. They shipped it anyway. Apache 2.0, weights on HuggingFace, full training code on GitHub.

That matters beyond this specific tool. When a company with Netflix’s resources open sources serious research, it sets a bar. Runway charges subscription fees for inferior object removal. VOID does something Runway cannot: it accounts for the physics of what it is removing, and it costs nothing to use if you have the hardware.

The 40GB VRAM requirement keeps it out of reach for most people today. That will change. Models get quantized, hardware gets cheaper, and someone will have a consumer-friendly wrapper running within months. The foundation Netflix shipped is serious enough that it will be worth revisiting when that happens.

If you are in video production or research right now and have access to an A100, there is no reason not to try it today. Everyone else, bookmark it and watch the GitHub repo.
