Fast dialog-aware sentence splitter

SteveJS · 7/19/2025, 5:54:07 AM · github.com

Comments (1)

SteveJS · 17h ago
I made this to try out Claude Code. I have the $17/mo plan, and I don't know Rust (though I do know plenty of other languages). Used this way, Rust felt like a scripting language. I used a task system to force each task to reach a git commit before auto-compact kicked in. The completed tasks are in the repo, so you can see which starting context kicked off each change. It worked much of the time. It's 8k of Rust and 12k of Markdown, and I think the Markdown helps agents interact with a codebase correctly, much as unit tests help with refactoring.

On my i9 with a local Gutenberg mirror, the end-to-end run discovers 20k+ English novels, splits them into sentences, normalizes the sentences, keeps each sentence's original text span, and writes everything out as TSV files. It takes 7 seconds to do that for 7 GB of novels. Most importantly, it splits sentences the way I needed for the start of my pipeline.
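The splitting rules themselves aren't described here, so below is a minimal Rust sketch of the general idea: a dialog-aware splitter that keeps each sentence's byte span in the source text alongside a normalized copy, then writes `start`, `end`, `text` rows as TSV. This is not the repo's implementation. The functions `split_sentences`, `normalize` and `to_tsv`, the five-entry abbreviation list, and the rule "a lowercase next word means the sentence continues" are all hypothetical choices for illustration. That last rule is what keeps `"Stop!" she said.` together as one sentence instead of breaking after the closing quote.

```rust
// Hypothetical sketch, not the repo's code: every heuristic below is an assumption.

/// One sentence: its byte span in the source text plus a normalized copy.
#[derive(Debug, PartialEq)]
struct Sentence {
    start: usize, // byte offset of the first character
    end: usize,   // byte offset one past the last character
    text: String, // whitespace-collapsed copy of source[start..end]
}

// Tiny illustrative abbreviation list; a real splitter needs far more.
const ABBREVIATIONS: [&str; 5] = ["Mr.", "Mrs.", "Ms.", "Dr.", "St."];

fn is_terminator(c: char) -> bool {
    matches!(c, '.' | '!' | '?')
}

fn is_closing_quote(c: char) -> bool {
    matches!(c, '"' | '\'' | '\u{201D}' | '\u{2019}')
}

/// Collapse every whitespace run (spaces, tabs, newlines) into one space.
fn normalize(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn split_sentences(text: &str) -> Vec<Sentence> {
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let mut out = Vec::new();
    // Skip leading whitespace so the first span starts on a visible character.
    let mut start = text.len() - text.trim_start().len();
    let mut i = 0;
    while i < chars.len() {
        if !is_terminator(chars[i].1) {
            i += 1;
            continue;
        }
        // Absorb runs like "?!" or "..." plus any closing quotes, e.g. `Stop!"`.
        let mut j = i + 1;
        while j < chars.len() && (is_terminator(chars[j].1) || is_closing_quote(chars[j].1)) {
            j += 1;
        }
        let end = chars.get(j).map_or(text.len(), |&(pos, _)| pos);
        // Skip whitespace to find the first character of the next word.
        let mut k = j;
        while k < chars.len() && chars[k].1.is_whitespace() {
            k += 1;
        }
        let next = chars.get(k).map(|&(_, ch)| ch);
        // Only split where whitespace follows and more text remains ("3.15" never splits).
        let at_boundary = k > j && next.is_some();
        // Dialog rule: a lowercase next word (`"Stop!" she said.`) continues the sentence.
        let continues = matches!(next, Some(ch) if ch.is_lowercase());
        // Don't split after a known abbreviation such as "Mr.".
        let abbrev = text[start..end]
            .split_whitespace()
            .last()
            .map_or(false, |w| ABBREVIATIONS.iter().any(|&a| a == w));
        if at_boundary && !continues && !abbrev {
            out.push(Sentence { start, end, text: normalize(&text[start..end]) });
            start = chars[k].0;
            i = k;
        } else {
            i = j;
        }
    }
    // Flush whatever follows the last split, minus trailing whitespace.
    let rest = text[start..].trim_end();
    if !rest.is_empty() {
        out.push(Sentence { start, end: start + rest.len(), text: normalize(rest) });
    }
    out
}

/// One `start<TAB>end<TAB>text` line per sentence. Normalized text cannot
/// contain tabs or newlines, so no escaping is needed.
fn to_tsv(sentences: &[Sentence]) -> String {
    sentences
        .iter()
        .map(|s| format!("{}\t{}\t{}\n", s.start, s.end, s.text))
        .collect()
}

fn main() {
    let text = "\"Stop!\" she said. Mr. Holmes\nnodded.\n\"Why?\" He frowned.";
    print!("{}", to_tsv(&split_sentences(text)));
}
```

On that sample it prints four rows. The second row's span (bytes 18 to 36) still points at the original `Holmes\nnodded.` while its text column holds the collapsed `Mr. Holmes nodded.`, which is the "keep the original span" property. `"Why?" He frowned.` comes out as two rows because the capitalized `He` looks like the start of a new sentence; spots where dialog meets a capitalized word are where a heuristic splitter is most likely to disagree with a human reader.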

Definitely interested if anyone finds cases where it mis-splits English text from a novel.