Show HN: KVoiceWalk – Voice cloning for Kokoro TTS using random walk algorithms

13 robviren 0 5/21/2025, 3:07:35 PM github.com ↗

I was blown away by Kokoro and what it manages to do in so little space. I became curious whether it would be possible to create new voices by directly manipulating the style tensors. After many failed attempts I finally landed on a method for scoring the similarity of two audio segments that works well enough to random walk toward similar voices for Kokoro. I plan on using this scoring as part of a genetic algorithm, but wanted to baseline test it with this code.
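To make the random walk idea concrete, here is a minimal sketch, not the actual KVoiceWalk code. It assumes the style tensor can be treated as a flat list of floats, and `toy_score` is a hypothetical stand-in for "generate audio with this style vector and score it against the target". The step size, step count, and keep-only-if-better rule are assumptions for illustration.

```python
import random

def random_walk(start, score_fn, steps=200, step_size=0.1, seed=0):
    """Greedy random walk: perturb the vector, keep the change only if the score improves."""
    rng = random.Random(seed)
    best = list(start)
    best_score = score_fn(best)
    for _ in range(steps):
        # Add small Gaussian noise to every component of the current best vector.
        candidate = [x + rng.gauss(0.0, step_size) for x in best]
        candidate_score = score_fn(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

# Hypothetical target; a real run would score generated audio, not compare vectors.
TARGET = [0.5, -0.2, 0.8]

def toy_score(vec):
    # Higher is better: negative squared distance to the toy target.
    return -sum((a - b) ** 2 for a, b in zip(vec, TARGET))

start = [0.0, 0.0, 0.0]
best, best_score = random_walk(start, toy_score)
```

Because a candidate is only accepted when it scores higher, the final score can never be worse than the starting score, which is what makes this a reasonable baseline before trying a genetic algorithm.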

The scoring mechanism uses Resemblyzer to calculate two things: similarity to the target audio, and similarity to another segment of audio the model generates itself (self similarity). This self similarity was key in keeping the model stable and the audio consistent across inputs, but it was not enough to prevent overfitting to Resemblyzer.
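A rough sketch of those two scores, assuming each audio clip has already been turned into a speaker embedding (in a real run these would presumably come from Resemblyzer, for example its `VoiceEncoder.embed_utterance`). The plain lists, the function names, and the use of cosine similarity here are assumptions for illustration, not the project's actual code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_scores(target_emb, generated_emb, second_generated_emb):
    """Compare one generated clip to the target and to a second generated clip."""
    return {
        "target_similarity": cosine_similarity(generated_emb, target_emb),
        "self_similarity": cosine_similarity(generated_emb, second_generated_emb),
    }

# Toy 3-dimensional embeddings; real speaker embeddings are much larger.
scores = similarity_scores([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.6, 0.8, 0.0])
```

The second generated clip would be the same candidate voice reading different text, so a low self similarity flags a voice that drifts between inputs.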

I had to create a third metric, which uses a normalized difference between a variety of audio features and the corresponding target features. Summing those gives a feature similarity metric, which is useful in keeping audio quality from degrading too much and prevents overfitting.
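One way such a feature metric could look, as a sketch under stated assumptions: the feature names (`pitch_hz`, `spectral_centroid_hz`, `rms_energy`) are hypothetical, each difference is normalized by the target value, and the summed penalty is mapped into (0, 1] with `1 / (1 + penalty)`. The post does not say which features or which mapping the real code uses.

```python
def feature_penalty(generated, target, eps=1e-8):
    """Sum of per-feature absolute differences, each normalized by the target value."""
    total = 0.0
    for name, target_value in target.items():
        diff = abs(generated[name] - target_value)
        total += diff / (abs(target_value) + eps)
    return total

def feature_similarity(generated, target):
    """Map the summed penalty to (0, 1]; identical features give exactly 1.0."""
    return 1.0 / (1.0 + feature_penalty(generated, target))

# Hypothetical feature values for illustration only.
target_features = {"pitch_hz": 120.0, "spectral_centroid_hz": 2000.0, "rms_energy": 0.1}
generated_features = {"pitch_hz": 132.0, "spectral_centroid_hz": 1800.0, "rms_energy": 0.1}

similarity = feature_similarity(generated_features, target_features)
```

Normalizing by the target value keeps features with large units, such as a spectral centroid in hertz, from dominating small ones such as RMS energy.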

The last challenge was weighting the score while keeping it flexible enough to explore the complex text-to-speech style space. Using a weighted harmonic mean allowed for backsliding on some metrics in exchange for significant improvement in others, which reduced stagnation and worked well enough for the random walk to make progress.
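The weighted harmonic mean can be sketched as follows. The metric names and the weights are hypothetical, chosen only to show the trade-off; the actual weights are not given in the post.

```python
def weighted_harmonic_mean(scores, weights, eps=1e-8):
    """Weighted harmonic mean: sum(w) / sum(w / s), with scores clamped away from zero."""
    total_weight = sum(weights[name] for name in scores)
    denominator = sum(weights[name] / max(scores[name], eps) for name in scores)
    return total_weight / denominator

# Hypothetical weights: target similarity counts double.
weights = {"target": 2.0, "self": 1.0, "feature": 1.0}

before = {"target": 0.70, "self": 0.95, "feature": 0.90}
# Candidate step: self and feature similarity slip a little, target similarity jumps.
after = {"target": 0.85, "self": 0.93, "feature": 0.88}

score_before = weighted_harmonic_mean(before, weights)
score_after = weighted_harmonic_mean(after, weights)
accept_step = score_after > score_before
```

Here the step is accepted even though two of the three metrics got slightly worse, which is the backsliding behavior described above. The harmonic mean also punishes any single metric collapsing toward zero much harder than an arithmetic mean would.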

The results are fairly good. I would say it ends up in the uncanny valley of similarity rather than producing a proper clone of the target voice: it sounds like it might be the target voice. Still, it does well enough to improve similarity from 70% to around 90%. There are probably limits in Kokoro's architecture on how close it can sound to other voices, but there is probably more progress to be made using a more advanced genetic algorithm.

Check out the code, make some new voices, and let me know if you have any ideas on ways to improve.

