Show HN: Open-source Voice Cloning at 16x real-time: Porting Chatterbox to vLLM

2 RandomBK 0 8/3/2025, 4:00:08 PM github.com ↗
Chatterbox TTS from ResembleAI (https://github.com/resemble-ai/chatterbox) is one of the most accessible and highest-quality Voice Cloning models available today. However, its implementation via HF Transformers left a lot of performance on the table.

This is a pet project I've been building on-and-off. It ports the core of Chatterbox - a 0.5B Llama-architecture model - to vLLM. A lot of ugly hacks and workarounds were needed to make it work, but the end result works.

Outputting at around the same quality level as the original implementation, this port is roughly 5-10x faster, generating a 40min benchmark output in around 2min30s wall time on a 3090 (or 4m30s on a 3060ti). That's almost 16x faster than real-time.

High throughput like this can be itself transformative, enabling scale and efficiency that unblocks new use-cases. I look forward to seeing what the community can do with this!

Disclaimer: This is a personal community project not affiliated with ResembleAI, my employer, or any other entity. The project is based solely on publicly-available information. All opinions are my own and do not necessarily represent the views of my employer.

Comments (0)

No comments yet