VibeVoice AI Voice & Podcast Generator Download and Install Locally Using ComfyUI

File Information

Name: VibeVoice with ComfyUI Integration
Version: Latest Release
License: MIT License (Free & Open Source)
Platforms: Windows, macOS, Linux
File Types: Source code, Python dependencies
Category: Text-to-Speech & Conversational AI

Description

VibeVoice, developed by Microsoft, is a cutting-edge open source framework for generating expressive, multi-speaker conversational audio. By integrating it into ComfyUI’s modular workflow, you can now build natural, podcast-like dialogue with up to 4 speakers in one audio file. Whether you want to produce lifelike conversations, narrations, or long-form content, VibeVoice excels at delivering clarity, consistency & realism.

Unlike traditional TTS systems, VibeVoice supports zero-shot voice cloning: provide a short audio sample in .wav or .mp3 format, and it reproduces that speaker’s timbre with no training step. You can also choose the attention implementation (eager, sdpa, flash_attention_2 or the newer SageAttention) to trade off speed, memory usage & compatibility.
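To make the idea of choosing an attention mode with a fallback more concrete, here is a minimal sketch. It assumes that sage needs the `sageattention` package and flash_attention_2 needs the `flash_attn` package, and that a sensible fallback order is sage, flash_attention_2, sdpa, eager. The function names and the fallback order are illustrative assumptions, not the node's actual logic.

```python
import importlib.util

# Preference order when a requested backend is unavailable (an assumption
# for illustration, not the node's documented behaviour).
FALLBACK_ORDER = ["sage", "flash_attention_2", "sdpa", "eager"]

# Python module each optional backend needs; None means no extra package.
REQUIRED_MODULE = {
    "sage": "sageattention",
    "flash_attention_2": "flash_attn",
    "sdpa": None,
    "eager": None,
}


def is_available(mode, has_module=None):
    """Return True if the module backing `mode` can be found."""
    if has_module is None:
        has_module = lambda name: importlib.util.find_spec(name) is not None
    module = REQUIRED_MODULE[mode]
    return module is None or has_module(module)


def pick_attention_mode(requested, has_module=None):
    """Return `requested` if usable, else the next usable mode after it."""
    if requested not in REQUIRED_MODULE:
        raise ValueError(f"unknown attention mode: {requested!r}")
    start = FALLBACK_ORDER.index(requested)
    for mode in FALLBACK_ORDER[start:]:
        if is_available(mode, has_module):
            return mode
    return "eager"  # not reached in practice: eager needs no extra package


print(pick_attention_mode("sage"))
```

On a machine without `sageattention` or `flash_attn` installed, this sketch settles on sdpa. The `has_module` argument exists only so the lookup can be replaced when experimenting.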

ComfyUI manages the heavy lifting by automatically downloading & optimizing models, so you don’t need to worry about manual setup. With optional 4-bit quantization, even GPUs with limited VRAM can run large VibeVoice models efficiently.
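A rough way to see why 4-bit quantization helps is to estimate the memory taken by the weights alone: parameter count times bits per weight, divided by 8 to get bytes. The sketch below uses a hypothetical 7-billion-parameter model purely as an illustration. Real usage will be higher, since activations, caches and any layers left unquantized are not counted.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 7-billion-parameter model, for illustration only.
params = 7_000_000_000
for bits in (16, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

For this hypothetical model the estimate drops from about 14 GB at 16 bits to about 3.5 GB at 4 bits, which is the kind of difference that decides whether a model fits on a mid-range GPU.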

This combination makes VibeVoice with ComfyUI one of the best free alternatives to commercial AI speech tools, giving you the power to create professional-grade audio locally on your own machine, with full privacy & no vendor lock-in. You can also try a demo of the large model in a Hugging Face Space.

Scroll down, follow the installation steps, & start creating expressive multi-speaker dialogues today.

Features of VibeVoice by Microsoft

| Feature | Description | Benefit |
|---|---|---|
| Multi-Speaker TTS | Generate conversations with up to 4 unique voices in one audio output. | Perfect for podcasts, dialogues & storytelling. |
| Zero-Shot Voice Cloning | Clone any voice instantly from a .wav or .mp3 file. | No training required, highly natural results. |
| Advanced Attention Modes | Choose from eager, sdpa, flash_attention_2, or sage for optimized performance. | Flexibility between speed, memory efficiency & stability. |
| 4-Bit Quantization | Run large models in 4-bit mode with optimized configurations. | Save VRAM, run large models on mid-range GPUs. |
| Automatic Model Management | ComfyUI handles model download & VRAM management automatically. | Hassle-free setup, faster experimentation. |
| Fine-Grained Control | Adjust CFG scale, temperature, top_k, top_p & inference steps. | Customize speech style & performance easily. |
| Robust Compatibility | Works across eager, sdpa, & SageAttention with smart fallbacks. | Stable performance across different hardware. |
| Emergent Creativity | May generate music, spontaneous sounds, or expressive tones. | Adds natural, human-like spontaneity to generated audio. |
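For readers unfamiliar with temperature, top_k and top_p, the sketch below shows how these three settings are commonly applied to a list of raw scores (logits) before sampling. It is a generic illustration of the technique under the usual definitions, not VibeVoice's own code, and the function name is made up for this example. CFG scale and inference steps are not covered here.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return sampling probabilities after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature rescales logits: <1 sharpens, >1 flattens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank candidate indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # keep only the k most likely candidates

    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise the survivors; everything else gets probability zero.
    kept_total = sum(probs[i] for i in kept)
    return [probs[i] / kept_total if i in kept else 0.0
            for i in range(len(probs))]


print(filter_distribution([2.0, 1.0, 0.0], top_k=2))
```

Lower temperature, smaller top_k and smaller top_p all narrow the set of likely outcomes, which tends to give more consistent output at the cost of variety.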

System Requirements

| Component | Minimum Requirement | Recommended Requirement |
|---|---|---|
| Operating System | Windows 10 or later, macOS 11+, Linux (64-bit) | Latest Windows 11, macOS Ventura, Ubuntu 22.04 |
| Processor | Intel i5 / AMD Ryzen 5 | Intel i7 / Ryzen 7 or higher |
| RAM | 8 GB | 16 GB or more |
| Storage | 4 GB free space | SSD for faster processing |
| GPU | 6 GB VRAM (NVIDIA recommended) | 12 GB+ VRAM for large models |
| Python | Version 3.10+ | Latest stable Python |
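Two of the minimum requirements above, the Python version and free storage, can be checked with the standard library alone. The following is a small sketch of such a check. The function name is illustrative, and RAM and VRAM are left out because the standard library has no portable way to query them.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum Python version listed above
MIN_FREE_GB = 4       # minimum free storage listed above


def check_requirements(path="."):
    """Return a dict of requirement name -> bool for the local machine."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "storage": free_gb >= MIN_FREE_GB,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'below minimum'}")
```

Pass the drive or folder where ComfyUI lives as `path` so the storage check looks at the right disk.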

How to Download & Install VibeVoice with ComfyUI?

Before installing, download a supported version of ComfyUI.

1. Install via ComfyUI Manager

  1. Open ComfyUI Manager.
  2. Search for ComfyUI-VibeVoice.
  3. Click Install.
  4. Restart ComfyUI & find the new VibeVoice TTS node under audio/tts.

2. Manual Installation

  1. Navigate to your ComfyUI/custom_nodes/ directory.
  2. Open a terminal & clone the repository: git clone https://github.com/wildminder/ComfyUI-VibeVoice.git
  3. Navigate into the folder: cd ComfyUI-VibeVoice
  4. Install dependencies: pip install -r requirements.txt
  5. (Optional) Install SageAttention for advanced performance: pip install sageattention
  6. Restart ComfyUI. The VibeVoice TTS node will now be available.
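The manual steps above can also be scripted. The sketch below builds the same commands as data and runs them in order. The helper names are made up for this example, and it assumes that `git` and `pip` are on your PATH and that `pip` belongs to the Python environment ComfyUI actually runs in.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/wildminder/ComfyUI-VibeVoice.git"


def build_install_commands(comfyui_root, with_sage=False):
    """Return (working_directory, command) pairs for the manual install."""
    custom_nodes = Path(comfyui_root) / "custom_nodes"
    node_dir = custom_nodes / "ComfyUI-VibeVoice"
    steps = [
        (custom_nodes, ["git", "clone", REPO_URL]),
        (node_dir, ["pip", "install", "-r", "requirements.txt"]),
    ]
    if with_sage:
        # Optional step 5: SageAttention.
        steps.append((node_dir, ["pip", "install", "sageattention"]))
    return steps


def run_install(comfyui_root, with_sage=False):
    """Run each step in order, stopping at the first failure."""
    for cwd, command in build_install_commands(comfyui_root, with_sage):
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_install("path/to/ComfyUI", with_sage=True)` would perform steps 1 to 5. Restarting ComfyUI is still a manual step.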

3. First Use

  • Load reference audio files with ComfyUI’s Load Audio node.
  • Connect them to the speaker inputs on the VibeVoice TTS node.
  • Write your dialogue script in the text field (Speaker 1: Hello, Speaker 2: Hi).
  • Queue the workflow to generate your conversation.
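Before queuing a long script, it can help to check that every line follows the "Speaker N: text" pattern. The sketch below assumes one turn per line and speakers numbered 1 to 4. Both are assumptions made for this example, and the node's own parser may accept other layouts.

```python
import re

MAX_SPEAKERS = 4  # limit stated in the article
LINE_PATTERN = re.compile(r"^Speaker\s+(\d+)\s*:\s*(.+)$")


def parse_script(script):
    """Split a dialogue script into (speaker_number, text) turns."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"expected 'Speaker N: text', got: {line!r}")
        speaker = int(match.group(1))
        if not 1 <= speaker <= MAX_SPEAKERS:
            raise ValueError(f"speaker {speaker} is outside 1..{MAX_SPEAKERS}")
        turns.append((speaker, match.group(2).strip()))
    return turns


script = """
Speaker 1: Hello
Speaker 2: Hi
"""
print(parse_script(script))  # [(1, 'Hello'), (2, 'Hi')]
```

A line that does not match, or a speaker number above 4, raises a `ValueError` that names the offending line, so mistakes surface before any audio is generated.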
