
HeartMuLa: An Open-Source Suno-Style AI Music Generator You Can Run Locally with ComfyUI


If you’ve been playing with AI music tools lately, here’s some genuinely good news.

HeartMuLa has released an open-source AI music foundation model that comes surprisingly close to what tools like Suno AI can do, but with a very different philosophy. It gives you something many creators actually want: full control.

With this model, you can generate music directly on your own PC, offline, with no usage limits. What you run, you own, and once it's set up, you can generate as much music as your hardware allows.

In this guide, I'll show you exactly how to run HeartMuLa on your PC, step by step, without skipping the confusing parts.

Before we get into the setup, let’s quickly look at what this model can do and why it’s worth trying in the first place.

Demo of HeartMuLa

Below is a demo video of the HeartMuLa music model showcasing some of its music generations in different styles and languages.

Features of HeartMuLa

| Feature | What It Does | Why It Matters |
| --- | --- | --- |
| Open-Source (Apache 2.0) | Fully open code and model weights | Open-source freedom: no subscriptions |
| Suno-Style Music Scripting | Supports [Verse], [Chorus], [Bridge], etc. | Structure control: custom song generation |
| 12.5 Hz HeartCodec | Ultra-efficient audio encoding & decoding | High fidelity: pro-level sound on consumer GPUs |
| ComfyUI Integration | Visual node-based workflow | Creator friendly: no scripts, easy experimentation |
| Full-Length Music Output | Generates tracks up to ~6 minutes | Long-form ready: songs, not just short clips |
| Multilingual Engine | Supports EN, ZH, JP, KR, ES | Global reach: localized music & ads |
| Expressive Vocal Control | Lyrics formatting affects vocal style | More emotion: singing, spoken, and hybrid vocals |
| HeartTranscriptor | Whisper-tuned audio-to-text model | Sync-ready: lyrics, subtitles, karaoke |
| Local & Offline Execution | Runs 100% on your PC | Data sovereignty: prompts never leave your system |
| VRAM-Optimized Loading | Lazy loading + BF16 pipeline | Accessible power: works on 12–16 GB GPUs |

Before You Start

To keep things simple, this guide assumes you’re using ComfyUI’s portable Windows build. If you’re new to ComfyUI, this is the easiest and safest way to get started.

Recommended ComfyUI Version (Windows)

Use the portable build with CUDA 12.6 (CU126). Why CU126? It's more widely compatible and tends to be more stable with custom nodes and AI audio models right now.

Minimum System Requirements

  • GPU: NVIDIA
  • VRAM: 12 GB minimum; 16 GB recommended (best audio quality)
  • Disk: enough free space for the model downloads

If ComfyUI runs on your system, you’re good to continue.

Check if Hugging Face CLI Is Installed

HeartMuLa uses Hugging Face to download model files.

  1. Open Command Prompt or Terminal
  2. Navigate to your ComfyUI folder
  3. Run one of the following commands:
hf --help

or

huggingface-cli --help

What to expect:

  • If you see a list of commands → you’re ready
  • If you see command not found → install it

Install Hugging Face CLI (If Needed)

Run this inside the same Python environment ComfyUI uses:

pip install huggingface-hub
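To confirm the install landed in the interpreter ComfyUI actually uses, you can run a small check with the embedded Python (python_embeded\python.exe in the portable build). This is just a sketch, not part of HeartMuLa:

```python
# Checks whether the huggingface_hub package is importable from the
# current interpreter; run it with ComfyUI's python_embeded\python.exe
# to verify that interpreter sees the package you just installed.
import importlib.util

def has_huggingface_hub() -> bool:
    """True if huggingface_hub can be imported by this interpreter."""
    return importlib.util.find_spec("huggingface_hub") is not None

if __name__ == "__main__":
    print("huggingface-hub installed:", has_huggingface_hub())
```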

This ComfyUI workflow and custom node integration was created by Benji, and it’s an excellent contribution to the open-source community. His work makes it possible to run HeartMuLa directly inside ComfyUI with a clean, minimal workflow. We’ll use Benji’s HeartMuLa ComfyUI workflow to install and run HeartMuLa locally.

Step 1: Install HeartMuLa ComfyUI Custom Nodes

HeartMuLa uses custom nodes in ComfyUI for music generation and lyrics/audio transcription. Follow these steps:

  1. Open your ComfyUI folder in File Explorer, type cmd in the address bar, and press Enter to open Command Prompt there. Then run:
git clone custom_nodes
  2. Download the custom nodes from GitHub:
git clone https://github.com/benjiyaya/HeartMuLa_ComfyUI
  3. Install the required Python dependencies. Stay in the custom_nodes folder and run:
..\..\python_embeded\python.exe -m pip install -r .\HeartMuLa_ComfyUI\requirements.txt

This ensures all the libraries needed for HeartMuLa nodes are installed in your ComfyUI environment.

  4. Check that everything is ready:
  • Start ComfyUI by double-clicking the file named run_nvidia_gpu.bat
  • Look for messages confirming the custom nodes loaded successfully

File Structure

ComfyUI/custom_nodes/HeartMuLa_ComfyUI/
├── __init__.py              <- the custom node code
├── util/                    <- create this folder
│   └── heartlib/            <- paste the heartlib SOURCE CODE here
│       ├── __init__.py
│       ├── pipelines.py
│       ├── models.py
│       └── … (other python files)
└── requirements.txt (Optional: torch, transformers, torchaudio, etc.)
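For orientation, here is a minimal sketch of what a ComfyUI custom-node __init__.py generally looks like. The class name, input fields, and defaults below are illustrative only, not the actual HeartMuLa_ComfyUI code; use the __init__.py shipped in Benji's repo.

```python
# Minimal sketch of a ComfyUI custom node definition. Names here are
# hypothetical; the real HeartMuLa node lives in Benji's repository.
class HeartMuLaGenerateSketch:
    CATEGORY = "audio/HeartMuLa"   # where the node appears in the menu
    RETURN_TYPES = ("AUDIO",)      # output socket type
    FUNCTION = "generate"          # method ComfyUI calls on execution

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the node's input widgets/sockets.
        return {
            "required": {
                "lyrics": ("STRING", {"multiline": True}),
                "tags": ("STRING", {"default": "piano,happy"}),
            }
        }

    def generate(self, lyrics, tags):
        # A real node would call into the heartlib pipeline here.
        raise NotImplementedError

# ComfyUI discovers custom nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"HeartMuLaGenerateSketch": HeartMuLaGenerateSketch}
NODE_DISPLAY_NAME_MAPPINGS = {"HeartMuLaGenerateSketch": "HeartMuLa Generate (sketch)"}
```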

You’re now ready for Step 2

Step 2: Download the HeartMuLa Model Files

HeartMuLa has multiple model components: the music generator, 3B model, codec, and transcriptor. We’ll use the Hugging Face CLI to download them directly into the correct folder.

1. Go to your ComfyUI models folder


ComfyUI\models

2. Look for a HeartMuLa folder; if it doesn't exist, create it:

Create a folder named HeartMuLa, but stay in the models folder: the download commands in the next step are run from ComfyUI\models, not from inside HeartMuLa.


3. Download the model files using Hugging Face CLI

Open Command Prompt in the ComfyUI\models folder and run these commands one by one:

hf download HeartMuLa/HeartMuLaGen --local-dir ./HeartMuLa
hf download HeartMuLa/HeartMuLa-oss-3B --local-dir ./HeartMuLa/HeartMuLa-oss-3B
hf download HeartMuLa/HeartCodec-oss --local-dir ./HeartMuLa/HeartCodec-oss
hf download HeartMuLa/HeartTranscriptor-oss --local-dir ./HeartMuLa/HeartTranscriptor-oss
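If you prefer Python, the same four downloads can be scripted with the huggingface_hub API (the library installed earlier) instead of the CLI. A sketch, assuming you run it from inside ComfyUI\models:

```python
# Python equivalent of the four `hf download` commands above, using
# huggingface_hub's snapshot_download. Repo IDs and target folders
# mirror the CLI commands; run from inside ComfyUI\models.
MODELS = {
    "HeartMuLa/HeartMuLaGen": "HeartMuLa",
    "HeartMuLa/HeartMuLa-oss-3B": "HeartMuLa/HeartMuLa-oss-3B",
    "HeartMuLa/HeartCodec-oss": "HeartMuLa/HeartCodec-oss",
    "HeartMuLa/HeartTranscriptor-oss": "HeartMuLa/HeartTranscriptor-oss",
}

def download_all(models: dict) -> None:
    """Download each repo snapshot into its target local folder."""
    # Imported here so the mapping can be inspected without the library.
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in models.items():
        print(f"Downloading {repo_id} -> {local_dir}")
        snapshot_download(repo_id=repo_id, local_dir=local_dir)

# Call download_all(MODELS) to start the downloads.
```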

These commands automatically place the files into the correct subfolders inside ComfyUI\models\HeartMuLa. The next step shows how the folder should look.

Step 3: Verify the folder structure

ComfyUI
└── models
    └── HeartMuLa
        ├── HeartMuLa-oss-3B
        ├── HeartCodec-oss
        ├── HeartTranscriptor-oss
        ├── gen_config.json
        └── tokenizer.json

This structure is required for the custom nodes to find the models correctly.
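If you want to double-check before launching ComfyUI, a small sanity-check script (a hypothetical helper, not part of HeartMuLa) can confirm the folders and files from the tree above exist:

```python
# Verifies the HeartMuLa folder layout under ComfyUI\models matches
# the structure shown above. Purely a convenience check.
from pathlib import Path

REQUIRED = [
    "HeartMuLa-oss-3B",
    "HeartCodec-oss",
    "HeartTranscriptor-oss",
    "gen_config.json",
    "tokenizer.json",
]

def check_heartmula_models(models_dir: str) -> list:
    """Return the names from REQUIRED missing under models_dir/HeartMuLa."""
    root = Path(models_dir) / "HeartMuLa"
    return [name for name in REQUIRED if not (root / name).exists()]

if __name__ == "__main__":
    missing = check_heartmula_models(r"ComfyUI\models")
    print("OK" if not missing else f"Missing: {missing}")
```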


Tip

  • If your GPU has 12 GB VRAM, lazy loading will help manage memory.
  • The 7B model isn’t released yet — stick with 3B for now.


Step 4: Run Your First Music Generation in ComfyUI

  1. Run ComfyUI
  2. In the HeartMuLa custom nodes folder, you’ll find example workflows:
    • Generate Music.json → Music generation
    • Lyrics Transcriber.json → Audio-to-text transcription
  3. Drag & drop the workflow into ComfyUI.
  4. For music generation:
    • In the lyrics node, type your lyrics
    • Below it, type music styles as keywords/tags (piano,happy,wedding)
    • Adjust any settings you want and run → enjoy your generated song
  5. For lyrics transcription:
    • Import Lyrics Transcriber.json
    • Load any audio into the input node
    • Run → get a transcribed text output

That’s it! Play around with the nodes, tweak lyrics or styles, and see what your AI can create!
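The Suno-style section tags from the features table ([Verse], [Chorus], [Bridge]) go directly into the lyrics node. Here is a made-up example of the formatting, not taken from the HeartMuLa docs:

```text
[Verse]
Morning light on an empty street
I hum a tune to the traffic beat

[Chorus]
Sing it loud, let the day begin
Every note is a place we've been

[Bridge]
Slow it down, let the piano breathe
```

Pair it with style tags like piano,happy,wedding in the tags field below the lyrics.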


Need Help or Have Questions?

If you run into any issues, get stuck, or just want tips on better results, drop a comment below, and I’ll do my best to help you out.

Wrapping Up

HeartMuLa brings Suno-style AI music generation fully offline, open-source, and ComfyUI-friendly. With portable ComfyUI, drag-and-drop workflows, and simple lyric + style inputs, you can go from idea to full track in minutes.

Install it once, experiment freely, tweak the settings, and let the model do the heavy lifting. If this guide helped you, try pushing the limits: different genres, structures, and languages.

Happy creating!
