Higgs Audio V2 is an advanced, open-source audio generation model developed by Boson AI, designed to produce highly expressive and lifelike speech with robust multi-speaker dialogue capabilities.
Some Highlights:
* Trained on 10M hours of diverse audio — speech, music, sound events, and natural conversations
* Built on top of Llama 3.2 3B for deep language and acoustic understanding
* Runs in real-time and supports edge deployment — smallest versions run on Jetson Orin Nano
* Outperforms GPT-4o-mini-tts and ElevenLabs v2 in prosody, emotional expressiveness, and multi-speaker dialogue
* Zero-shot natural multi-speaker dialogues — voices adapt tone, energy, and emotion automatically
* Zero-shot voice cloning with melodic humming and expressive intonation — no fine-tuning needed
* Multilingual support with automatic prosody adaptation for narration and dialogue
* Simultaneous speech and background music generation — a first for open audio foundation models
* High-fidelity 24 kHz audio output for studio-quality sound on any device
* Open source and commercially usable — no barriers to experimentation or deployment
Model on Hugging Face: https://huggingface.co/bosonai/higgs-audio-v2-generation-3B-...