Show HN: Make your own voice AI in two clicks
18 unmute-sh 5 5/13/2025, 11:36:34 AM unmute.sh
Upload a voice and write a personality prompt, or try the pre-made characters.
Built by augmenting Gemma 3 12B with our new text-to-speech and speech-to-text models, both of which we will release as open-source soon. Stay tuned.
To the author: what happens to my voice after I upload it? What is your plan moving forward? I am too far out in left field to understand how to build a business around and monetize an open-source product like this, even though I found it fun to play around with.
edit: Ah yes, and we do not store the voice sample on our server. The voice embedding is cached for 24 hours.
The latency is about 500ms once we detect that it's the bot's turn to speak (roughly 200ms for the LLM's time to first token and 300ms for the TTS audio to start), plus a variable time for the semantic pause detection (VAD).
If it's clear that you're done talking, like when you ask a question, the model will reply very fast. If you stop mid-sentence as if you have more to say, it will wait longer to avoid interrupting you.
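To make the latency arithmetic above concrete, here is a rough Python sketch of how a variable pause wait could add to the fixed 200ms + 300ms budget. This is not the authors' implementation: the `p_done` score (a hypothetical "the speaker has finished" probability), the linear interpolation, and the 100ms and 2000ms bounds are all assumptions made up for illustration. Only the 200ms and 300ms figures come from the comment.

```python
# Fixed costs quoted in the comment above.
LLM_TTFT_MS = 200    # LLM time to first token
TTS_START_MS = 300   # time until TTS audio starts

def pause_wait_ms(p_done, min_wait_ms=100.0, max_wait_ms=2000.0):
    """Hypothetical pause policy: wait less the more confident we are
    that the speaker has finished (p_done in [0, 1]).
    The bounds and the linear mapping are illustrative assumptions."""
    if not 0.0 <= p_done <= 1.0:
        raise ValueError("p_done must be between 0 and 1")
    return max_wait_ms - p_done * (max_wait_ms - min_wait_ms)

def total_latency_ms(p_done):
    """Pause wait plus the fixed ~500ms (LLM + TTS) described above."""
    return pause_wait_ms(p_done) + LLM_TTFT_MS + TTS_START_MS

if __name__ == "__main__":
    # A clear question (high p_done) versus a mid-sentence stop (low p_done).
    for p in (1.0, 0.5, 0.0):
        print(f"p_done={p}: reply after about {total_latency_ms(p):.0f} ms")
```

Under these made-up bounds, a clearly finished question gets a reply after roughly 600ms, while a mid-sentence stop waits up to about 2.5 seconds before the bot takes its turn.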