back to top

Ovi: Open Source AI Video & Audio Generator Like Sora 2 and Veo 3

An Open Source Alternative to Veo 3 & Sora 2 AI Video Generator for free

- Advertisement -

File Information

PropertyDetails
NameOvi
VersionLatest commit on GitHub
File TypeGit repository (zip available)
LicenseOpen Source (Apache 2.0)
RepositoryGitHub Repository
PlatformWindows, Linux, macOS
Required Python3.10+
DependenciesPyTorch, torchvision, torchaudio, flash_attn, others

Description

Ovi is a groundbreaking open-source AI model developed by Character AI that revolutionizes the way creators generate audiovisual content. Ovi can simultaneously produce synchronized video and audio directly from text or a combination of text and images. Inspired by advanced models like Veo 3 and Sora 2, it empowers users to bring their ideas to life with stunning cinematic results, without the need for expensive software or high-end hardware.

With Ovi, creators can produce 5-second high-quality videos with audio at 24 frames per second, supporting multiple aspect ratios including 9:16, 16:9, and 1:1. This makes it perfect for social media clips, marketing content, and even educational short videos. The model’s flexible input system allows you to simply provide a text prompt describing the scene, or enhance it further by adding an image reference, giving you full creative control over the generated content.

One of Ovi’s most impressive features is its ability to generate synchronized audio and video seamlessly. Using special tags like <S> for speech and <AUDCAP> for audio descriptions, users can define exactly what dialogue or sound effects appear in the video. This ensures that every generated clip has perfectly timed narration, effects, or background sounds, opening endless possibilities for storytelling, music videos, or even animated scenes.

Ovi is also designed to be highly accessible. The model is open-source, meaning developers can modify it, integrate it into their own projects, or experiment with custom enhancements. Its performance is optimized for modern GPUs, but it also works with slightly older hardware, making it approachable for both hobbyists and professional creators.

For creators who love experimenting, Ovi supports multiple modes including text-to-video, image-to-video, and hybrid workflows. You can generate clips in bulk, test variations, and even fine-tune the output using configuration files. Multi-GPU setups are supported for faster generation, while single GPU usage is straightforward for beginners.

Usage of Ovi AI Video Generator

Whether you’re a filmmaker, content creator, or AI enthusiast, Ovi provides a powerful, flexible, and fully open-source tool to produce high-quality audiovisual content. With its combination of ease-of-use, creative freedom, and advanced AI capabilities, Ovi truly sets itself apart as one of the most exciting video generation tools available today.

You can easily find some of the video Generations from Sora 2 & veo3 alternative Ovi on their official github repository

How Ovi was Developed??

Ovi is built with contributions and ideas from some of the best open-source video and audio generation projects. Its video branch is initialized from Wan2.2, ensuring cutting-edge capabilities for text-to-video generation. The audio encoder and decoder components are borrowed from MMAudio, which allows Ovi to synchronize audio with video seamlessly.

Features of Ovi AI Video Generator

FeatureDescription
Video+Audio GenerationCreate synchronized video and audio clips in one go
Flexible InputSupports text-only or text+image prompts
Multi-Aspect Ratios9:16, 16:9, 1:1 and more
High FPSGenerates 5-second clips at 24 FPS
Customizable PromptsUse <S> tags for speech and <AUDCAP> tags for audio descriptions
Multi-GPU SupportAccelerated generation with multiple GPUs
Open-SourceFully modifiable and extensible
Easy ConfigurationYAML files to control generation quality, audio/video balance, and output directory
Example PromptsReady-to-use CSV examples for quick start
Flexible Memory UsageSupports fp8 quantization and CPU offloading for lower VRAM GPUs

Screenshots

System Requirements

ComponentMinimumRecommended
GPU32 GB VRAM for standard model, 24 GB for fp8 quantizedRTX 40XX, 30XX, 20XX, GTX 10XX, or equivalent
CPU8 cores12+ cores for multi-GPU setups
RAM16 GB32+ GB
Disk Space10 GB50+ GB for models, videos, and caches
OSWindows, Linux, macOSSame
Python3.10+Same

How to Install Ovi: An Open Source Alternative to Sora 2 & Veo3 Locally

Step-by-Step Installation

  1. Clone the repository

You can clone the repository from command below or download it manually from download section of the page.

git clone https://github.com/character-ai/Ovi.git
cd Ovi
  1. Create and activate a virtual environment
virtualenv ovi-env
source ovi-env/bin/activate
  1. Install PyTorch
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1
  1. Install dependencies
pip install -r requirements.txt
  1. Install Flash Attention
pip install flash_attn --no-build-isolation

Alternative method (if needed)

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
cd ../..
  1. Download pre-trained weights
python3 download_weights.py
# Optional: custom directory
python3 download_weights.py --output-dir <custom_dir>
  1. Run Ovi
python3 inference.py --config-file ovi/configs/inference/inference_fusion.yaml

For multi-GPU setups:

torchrun --nnodes 1 --nproc_per_node 8 inference.py --config-file ovi/configs/inference/inference_fusion.yaml

You can also launch the Gradio interface:

python3 gradio_app.py
# Optional flags: --cpu_offload, --use_image_gen, --fp8

Download Ovi: An Open Source Veo3 & Sora2 Alernative to generate AI video with audio for free

Tips & Advantages of Ovi AI Video + Audio Generator

  • Generate videos directly from text prompts in seconds.
  • Include synchronized speech or sound effects using special tags.
  • Fine-tune video quality, denoising steps, and audio/video balance via config files.
  • Share .lset style prompt files with embedded configurations for collaboration.
  • Works efficiently on both older and newer GPUs, with fp8 quantization reducing VRAM usage.
  • Open-source and fully modifiable, perfect for research or personal projects.

Ovi is an exceptional choice for anyone exploring AI-driven video creation. It offers the same capabilities as Sora 2 or Veo 3, but with full open-source flexibility, multi-GPU support, and customizable inputs for truly unique results. Whether you are making short clips for social media, experimental AI films, or storytelling videos, Ovi provides unmatched control and efficiency.

- Advertisement -
YOU MAY ALSO LIKE

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -

Most Popular