
Ovi AI Video + Audio Generator in ComfyUI — Best Open-Source Alternative to Veo 3 & Sora 2


File Information

Name: ComfyUI-Ovi
Version: Latest
Platform: Windows, Linux, macOS (via ComfyUI)
File Type: Custom Node Workflow
License: Open Source (GitHub)
Repository: ComfyUI-Ovi
Dependencies: PyTorch 2.4+, CUDA 12.x
VRAM Requirement: 16–24 GB (FP8) or >32 GB (BF16)
Category: AI Video + Audio Generation Workflow

Description

Experience next-generation AI video and audio generation locally with Ovi in ComfyUI, an open-source workflow that rivals Google’s Veo 3 and OpenAI’s Sora 2.
With Ovi’s multimodal fusion engine and seamless ComfyUI integration, you can create AI-generated videos with synchronized sound, without depending on cloud services.

It builds on Character.AI’s Ovi model and integrates seamlessly into the ComfyUI node environment, offering a fully modular, GPU-accelerated, and privacy-friendly experience.

Think of it as a self-hosted alternative to proprietary systems like Veo 3 or Sora 2, giving you total creative freedom and zero cloud dependency.

Features of Ovi: Open Source Veo 3 & Sora 2 Alternative

Self-Bootstrapping Loader: Automatically downloads and manages MMAudio assets and Ovi fusion weights.
Precision Control: Choose between BF16 (for 32 GB+ GPUs) or FP8 (for 16–24 GB cards).
Attention Selector: Switch dynamically between FlashAttention, SDPA, Sage, and more.
Multi-GPU Optimization: Targets specific GPUs in multi-card setups for faster inference.
Component Reuse: Reuses your existing Wan 2.2 VAE and UMT5 text encoder without duplication.
CPU Offload Option: Moves larger modules to RAM when VRAM is limited.
Automatic Directory Setup: Places all required files (weights, encoders, VAEs) in the proper directories automatically.
Fully Node-Based: Integrated directly into ComfyUI as custom nodes, accessible under the “Ovi” category.
Fast & Flexible Generation: Supports text-to-video, image-to-video, video + audio fusion, and custom first-frame prompts.

Screenshots

Generation From Ovi AI Video + Audio Generator

System Requirements

Component | Minimum | Recommended
GPU | 16 GB VRAM (FP8 with offload) | 32 GB+ (BF16 without offload)
CPU | 8-core | 12+ cores
RAM | 32 GB | 64 GB+ for large projects
Storage | 30 GB free | SSD preferred
CUDA | 12.x | 12.4+
PyTorch | 2.4+ | Latest stable
OS Support | Windows, Linux, macOS (via ComfyUI) | Windows/Linux preferred for CUDA acceleration

Directory Structure

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   ├── Ovi-11B-bf16.safetensors
│   │   └── Ovi-11B-fp8.safetensors
│   ├── text_encoders/umt5-xxl-enc-bf16.safetensors
│   └── vae/wan2.2_vae.safetensors
└── custom_nodes/ComfyUI-Ovi/ckpts/MMAudio/ext_weights/...

Available Ovi Nodes

Ovi Engine Loader: Downloads missing weights, builds the fusion engine, and exposes OVI_ENGINE with selectable precision and device.
Ovi Wan Component Loader: Connects Ovi to existing Wan 2.2 VAE and UMT5 encoders.
Ovi Attention Selector: Dynamically changes the attention backend (FlashAttention, SDPA, etc.).
Ovi Video Generator: Generates video + audio latents from text prompts.
Ovi Latent Decoder: Converts latents into viewable video + audio output.

How to Install Ovi Using ComfyUI

  1. Navigate to your ComfyUI custom nodes folder:
     cd ComfyUI/custom_nodes
  2. Clone the Ovi repository:
     git clone https://github.com/snicolast/ComfyUI-Ovi.git
     cd ComfyUI-Ovi
  3. Install dependencies:
     pip install -r requirements.txt
  4. Restart ComfyUI. The Ovi nodes will now appear under the “Ovi” category in ComfyUI’s node search.

Workflow Example

  1. Drop Ovi Engine Loader — select your precision and enable CPU offload if needed.
  2. (Optional) Connect Ovi Wan Component Loader if your encoder/VAE is stored elsewhere.
  3. Add Attention Selector — pick FlashAttention, SDPA, or Auto.
  4. Generate Video — input your prompt (supports <S> speech and <AUDCAP> audio tags).
  5. Decode Latents — feed results into Ovi Latent Decoder for video + audio output.
  6. Export & Save — connect the outputs to your preferred save nodes in ComfyUI.
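Step 4 mentions the <S> speech and <AUDCAP> audio tags. Here is a hypothetical prompt showing how they might be combined; the closing-tag names (<E>, <ENDAUDCAP>) follow the upstream Ovi repository's prompt format and are assumptions here, so check the README for your version's exact syntax:

```shell
# Hypothetical Ovi prompt: visual description, a line of speech wrapped
# in <S>...<E>, and an audio caption wrapped in <AUDCAP>...<ENDAUDCAP>.
PROMPT='A hiker reaches a cliff edge at golden hour.
<S>We finally made it.<E>
<AUDCAP>Soft wind, distant birdsong, footsteps on gravel.<ENDAUDCAP>'
printf '%s\n' "$PROMPT"
```

Paste the whole string into the Ovi Video Generator prompt field; the speech line drives the synthesized voice while the audio caption shapes the background soundtrack.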

Troubleshooting & Tips

  • High VRAM after render: Use ComfyUI’s Unload Models button.
  • Missing weights: Place manually in the appropriate folders — loader will skip downloads if found.
  • Switching precision: Change in dropdown; no restart needed.
  • Backend errors: If FlashAttention/xFormers are missing, Ovi automatically falls back to native.
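For the "missing weights" tip, a quick shell sketch can confirm where each file landed. The paths are taken from the Directory Structure section above; COMFY_DIR is a hypothetical variable for your install location:

```shell
# Check that the manually placed Ovi weights are where the loader
# expects them (paths mirror the Directory Structure section).
COMFY_DIR="${COMFY_DIR:-ComfyUI}"
check_ovi_weights() {
  for f in \
    models/diffusion_models/Ovi-11B-fp8.safetensors \
    models/text_encoders/umt5-xxl-enc-bf16.safetensors \
    models/vae/wan2.2_vae.safetensors
  do
    if [ -f "$COMFY_DIR/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
    fi
  done
}
check_ovi_weights
```

If a file shows as missing, move it into the listed folder and restart the Ovi Engine Loader node; the loader skips downloads for files it finds.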

Why Ovi + ComfyUI is the Best Sora 2 & Veo 3 Alternative

Unlike closed-source AI video systems, ComfyUI-Ovi:

  • Is 100% open source and customizable
  • Runs completely offline
  • Reuses existing ComfyUI assets (Wan 2.2, MMAudio)
  • Supports multi-GPU rendering
  • Lets you control precision and choose the attention backend for performance

Download Ovi AI Video Generator ComfyUI Workflow

Install the Ovi AI Video + Audio Generator (Best Veo 3 & Sora 2 Alternative) Directly

If you want to download and install Ovi directly and run it through its Gradio interface, follow this Ovi Installation Guide. Enjoy!

