
4 Open Source AI Video Models for Editing and Generation


If you have been looking for open source tools to work with video using AI, you have probably noticed something. Most of what gets covered is generation: creating new videos from scratch. The editing side, actually modifying existing footage with AI, has been much quieter. That is starting to change.

There are now open source models that can swap outfits, replace backgrounds, remove objects, change characters and apply styles to existing video using plain text instructions. Some are built specifically for editing. Others are generation models that fit naturally into a creative video workflow.

Either way they are all worth your time.

1. Kiwi-Edit

Text based video editing sounds simple until you actually try it. Most models either ignore your instruction, change too much, or lose the original motion entirely. Kiwi-Edit can handle all three problems reasonably well.

You give it a video, a text instruction, and optionally a reference image. It makes the edit. The motion stays. The scene stays. What changes is what you asked to change.

The reference image part is what genuinely surprised me. You can hand it a photo of a specific background and tell it to swap the scene to match that image. Not a text description, an actual image. That level of control is rare even in paid tools.

On OpenVE-Bench it scores 3.02, the best among open source methods, beating the 14B VACE and 14B DITTO models despite being only 5B itself. At 1280×720, the output quality is actually usable.

Features of Kiwi-Edit

  • Text instruction guided editing
  • Reference image guided editing
  • Style transfer, background replacement, object removal and insertion
  • Preserves original motion and composition

VRAM requirements: 16-24GB for comfortable inference at 720p. A single RTX 4090 or A100 handles it.
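Those numbers check out with simple arithmetic: in fp16/bf16, a model needs two bytes per parameter just to hold its weights, before activations, video latents and the text encoder are counted. A quick sketch (the 2-bytes-per-parameter figure assumes half-precision inference, which is the common default):

```python
def weight_memory_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Rough GiB needed just to hold model weights (fp16/bf16 = 2 bytes each)."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

# A 5B model needs ~9.3 GiB for weights alone; activations and the
# latents for 1280x720 frames push the practical total toward 16-24GB.
print(round(weight_memory_gib(5), 1))   # -> 9.3
# A 14B model like VACE already needs ~26 GiB for weights, which is
# part of why a 5B editor that beats it is notable.
print(round(weight_memory_gib(14), 1))  # -> 26.1
```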

2. Lucy-Edit-Dev

Lucy-Edit-Dev does one thing really well: changing what someone is wearing in a video while keeping everything else exactly the same. The motion stays. The person stays. Just the outfit changes.

That sounds narrow until you see what it actually covers. Swap an outfit for a kimono. Turn a person into Harley Quinn. Replace a character with a polar bear. Change a shirt into a sports jersey. All from a plain text instruction.

It is built on top of Wan 2.2 5B, so the architecture is solid, and it plugs into existing Diffusers workflows without much friction.

A few honest caveats though. Color changes are hit or miss: sometimes subtle, sometimes way too aggressive. Adding objects tends to attach them to the subject rather than placing them naturally in the scene. Global transformations, like turning a beach into a snowfield, can mess with the subject's identity. The model's own docs are upfront about these limitations, which I appreciate.

One thing to check before using it: the license is non-commercial. It is free for personal use, research and experimentation, but if you are building a product, read the terms first.

Features of Lucy-Edit-Dev

  • Clothing and outfit changes with motion preservation
  • Character replacement and object swaps
  • Scene and style transformations
  • Pure text instructions, no masks or finetuning required
  • Built on Wan 2.2 5B, Diffusers compatible

VRAM requirements: Similar to Wan 2.2; 16GB minimum, 24GB recommended for comfortable inference.
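One practical wrinkle with Wan-based pipelines is that they expect frames at fixed resolutions, so arbitrary footage usually needs a resize and crop first. Here is a small helper, my own preprocessing utility rather than anything from the model's codebase, that computes a centered crop box matching a target aspect ratio before downscaling:

```python
def center_crop_box(w: int, h: int, target_w: int, target_h: int) -> tuple:
    """Largest centered crop of a (w, h) frame with the target aspect ratio.

    Returns (left, top, right, bottom), the box convention used by
    image libraries such as Pillow. Resize the cropped region to
    (target_w, target_h) afterwards.
    """
    target_ar = target_w / target_h
    if w / h > target_ar:              # frame too wide: trim the sides
        new_w = int(h * target_ar)
        x0 = (w - new_w) // 2
        return (x0, 0, x0 + new_w, h)
    else:                              # frame too tall: trim top and bottom
        new_h = int(w / target_ar)
        y0 = (h - new_h) // 2
        return (0, y0, w, y0 + new_h)

# 1080p footage already matches a 1280x720 target, so the crop is the full frame:
print(center_crop_box(1920, 1080, 1280, 720))  # -> (0, 0, 1920, 1080)
```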

3. MatAnyone 2

Background removal on video has always had one problem that nobody solved cleanly. Hair. Thin strands, flyaways, curly edges, every tool either chops them off or leaves a messy halo around the subject.

MatAnyone 2 handles this differently. Instead of just detecting where the subject ends it evaluates the quality of every pixel in the matte and corrects the ones it got wrong. The result is clean edges even on hair that would defeat most commercial tools.

Drop your video, click a few points on the first frame to mark the subject, and it handles the rest. Supports mp4, mov and avi.

It is worth noting that the training code and full dataset are still on the way. What is available right now is inference only, which is enough to use it practically. Check the license before using it commercially: it is the NTU S-Lab License 1.0, not Apache or MIT.

I’ve covered MatAnyone 2 in detail in a dedicated article if you want the full breakdown including how it compares to other tools.

Features of MatAnyone 2

  • Pixel level quality evaluation for clean edge detection
  • Preserves hair strands and fine details other tools miss
  • Interactive demo on HuggingFace, no install needed to test
  • Supports mp4, mov and avi

VRAM requirements: Minimum 10GB. Runs on consumer GPUs comfortably.
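Once a matting model like this gives you a soft alpha matte, placing the subject on a new background is a per-pixel blend, and the fractional alpha values on hair strands are exactly what keeps those edges clean. A minimal numpy sketch of the compositing step (standard alpha blending, not MatAnyone 2's own code):

```python
import numpy as np

def composite(fg: np.ndarray, alpha: np.ndarray, bg: np.ndarray) -> np.ndarray:
    """Blend a foreground frame over a new background with a soft matte.

    fg, bg: (H, W, 3) float arrays in [0, 1].
    alpha:  (H, W, 1) float array in [0, 1]; values between 0 and 1 on
            hair strands mix subject and background instead of producing
            a hard, haloed edge.
    """
    return alpha * fg + (1.0 - alpha) * bg
```

Run this per frame over the video; a hard binary mask in place of `alpha` is what produces the chopped-off strands the article describes.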

4. LTX 2.3

LTX 2.3 is not a dedicated editing model. It generates video and audio together from scratch. But it earns its place here because synchronized audio-video generation opens up creative workflows that pure editing models cannot cover.

Most video AI tools handle video only. You generate the clip, then figure out audio separately. LTX 2.3 generates both in a single pass synchronized from the same model, same prompt, same generation run. For content creators building scenes from scratch that is a genuinely different workflow.

22 billion parameters. ComfyUI support built in. Distilled version available for faster generation at 8 steps. Spatial and temporal upscalers for higher resolution and frame rate output. Fully trainable base model if you want to fine-tune for a specific style or motion.

If you need to modify existing footage Kiwi-Edit or Lucy-Edit are the right tools. LTX 2.3 is for building new video content from a prompt.

Features of LTX 2.3

  • Synchronized audio and video generation in one model
  • Text to video, image to video generation
  • ComfyUI and PyTorch support
  • Distilled version for faster inference
  • Spatial and temporal upscalers included

VRAM requirements: High-end GPU recommended; 24GB VRAM for comfortable generation at reasonable resolutions.
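Sampling time in diffusion models scales roughly linearly with the number of denoising steps, which is where the distilled version's speed comes from. A back-of-envelope sketch (the 40-step base count is my assumption for illustration; the source only states the distilled count of 8):

```python
def distillation_speedup(base_steps: int, distilled_steps: int) -> float:
    """Approximate speedup from a step-distilled sampler.

    Per-step cost is roughly constant, so the speedup is about the
    step ratio. This ignores fixed costs such as text encoding and
    VAE decode, so the real-world gain is somewhat smaller.
    """
    return base_steps / distilled_steps

# Hypothetical 40-step base sampler vs the 8-step distilled version:
print(distillation_speedup(40, 8))  # -> 5.0
```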

The gap is closing

Dedicated video editing models are still rare. Kiwi-Edit and Lucy-Edit are genuinely impressive but the space is thin compared to what exists for image editing or even video generation.

The generation side is more mature. LTX 2.3 and the models in our dedicated video generation article show how far open source has come in the last year. Editing is catching up but it is not there yet.

If you are building a video workflow today, the practical approach is combining what exists: generate with LTX 2.3, remove backgrounds with MatAnyone 2, edit specific elements with Kiwi-Edit. No single tool does everything yet.
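In practice, chaining these tools is mostly file plumbing: each stage reads the previous stage's output video. A generic skeleton for that kind of workflow (the stage functions below are placeholders you would back with the actual tool invocations):

```python
from typing import Callable

# A stage takes an input video path and returns the output video path.
Stage = Callable[[str], str]

def run_pipeline(input_path: str, stages: list[Stage]) -> str:
    """Run each editing/generation stage on the previous stage's output."""
    path = input_path
    for stage in stages:
        path = stage(path)
    return path

# Placeholder stages standing in for real tool calls:
stages = [
    lambda p: p.replace(".mp4", "_nobg.mp4"),    # e.g. MatAnyone 2
    lambda p: p.replace(".mp4", "_edited.mp4"),  # e.g. Kiwi-Edit
]
print(run_pipeline("clip.mp4", stages))  # -> clip_nobg_edited.mp4
```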

That will change. The pace of open source video AI in 2026 suggests the gap between editing and generation capabilities will close faster than most people expect.
