Version 2 of Higgs Audio Generation

1 slacka 1 7/27/2025, 7:59:48 PM boson.ai ↗

Comments (1)

slacka · 18m ago
Higgs Audio V2 is an advanced, open-source audio generation model developed by Boson AI, designed to produce highly expressive and lifelike speech with robust multi-speaker dialogue capabilities.

Some highlights:

* Trained on 10M hours of diverse audio — speech, music, sound events, and natural conversations

* Built on top of Llama 3.2 3B for deep language and acoustic understanding

* Runs in real time and supports edge deployment; the smallest versions run on Jetson Orin Nano

* Outperforms GPT-4o-mini-tts and ElevenLabs v2 in prosody, emotional expressiveness, and multi-speaker dialogue

* Zero-shot natural multi-speaker dialogues — voices adapt tone, energy, and emotion automatically

* Zero-shot voice cloning with melodic humming and expressive intonation — no fine-tuning needed

* Multilingual support with automatic prosody adaptation for narration and dialogue

* Simultaneous speech and background music generation — a first for open audio foundation models

* High-fidelity 24 kHz audio output for studio-quality sound on any device

* Open source and commercially usable — no barriers to experimentation or deployment
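
For anyone wanting to sanity-check the "real time" and "24 kHz" points on their own hardware, here is a rough back-of-the-envelope sketch. It is not from Boson AI and does not call the model. The function names are made up for illustration, and mono 16-bit PCM output is my assumption; only the 24 kHz figure comes from the list above.

```python
# Back-of-the-envelope helpers; nothing here calls the actual model.
SAMPLE_RATE_HZ = 24_000  # 24 kHz output rate, as stated in the post


def num_samples(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a mono clip of the given duration."""
    return round(duration_s * sample_rate_hz)


def pcm16_bytes(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Clip size as uncompressed mono 16-bit PCM (assumed format)."""
    return num_samples(duration_s, sample_rate_hz) * 2  # 2 bytes per sample


def real_time_factor(generation_time_s: float, audio_duration_s: float) -> float:
    """Generation time divided by audio length; below 1.0 is faster than real time."""
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return generation_time_s / audio_duration_s
```

So a 10 second clip is 240,000 samples, or about 480 KB as raw 16-bit mono. If generating it took a hypothetical 4 seconds of wall-clock time, the real-time factor would be 0.4, i.e. faster than real time. You would have to time generation yourself on the target device to get the actual number.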

Model on Hugging Face: https://huggingface.co/bosonai/higgs-audio-v2-generation-3B-...