Launch HN: Chonkie (YC X25) – Open-Source Library for Advanced Chunking

67 points by snyy | 6/9/2025, 4:09:03 PM | 21 comments
Hey HN! We're Shreyash and Bhavnick. We're building Chonkie (https://chonkie.ai), an open-source library for chunking and embedding data.

Python: https://github.com/chonkie-inc/chonkie

TypeScript: https://github.com/chonkie-inc/chonkie-ts

Here's a video showing our code chunker: https://youtu.be/Xclkh6bU1P0.

Bhavnick and I have been building personal projects with LLMs for a few years. For much of this time, we found ourselves writing our own chunking logic to support RAG applications. We often hesitated to use existing libraries because they either had only basic features or felt too bloated (some are 80MB+).

We built Chonkie to be lightweight, fast, extensible, and easy to use. The space is evolving rapidly, and we wanted Chonkie to be able to quickly support the newest strategies. We currently support Token Chunking, Sentence Chunking, Recursive Chunking, and Semantic Chunking, plus the strategies below (there's a short usage sketch after the list):

- Semantic Double Pass Chunking: Chunks text semantically first, then merges closely related chunks.

- Code Chunking: Chunks code files by creating an AST and finding ideal split points.

- Late Chunking: Based on the late chunking paper (https://arxiv.org/abs/2409.04701). The full document is embedded first, and chunk embeddings are pooled from that long-context pass, so each chunk carries document-level context.

- Slumber Chunking: Based on the "LumberChunker" paper (https://arxiv.org/abs/2406.17526). It uses recursive chunking to propose candidate splits, then an LLM verifies the split points, aiming for high-quality chunks with reduced token usage and LLM costs.
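To give a feel for the API, here's roughly what swapping strategies looks like (a minimal sketch; check the README for exact signatures):

    # Minimal sketch; exact parameter names may differ from the current release.
    from chonkie import RecursiveChunker, SemanticChunker

    text = open("report.txt").read()

    # Recursive chunking: split along document structure down to the token level.
    chunks = RecursiveChunker(chunk_size=512)(text)

    # Semantic chunking: split where embedding similarity between sentences drops.
    chunks = SemanticChunker(embedding_model="minishlab/potion-base-8M")(text)

    for chunk in chunks:
        print(chunk.token_count, chunk.text[:60])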

You can see how Chonkie compares to LangChain and LlamaIndex in our benchmarks: https://github.com/chonkie-inc/chonkie/blob/main/BENCHMARKS....

Some technical details about the Chonkie package:

- ~15MB default install vs. ~80-170MB for some alternatives.

- Up to 33x faster token chunking compared to LangChain and LlamaIndex in our tests.

- Works with major tokenizers (transformers, tokenizers, tiktoken).

- Zero external dependencies for basic functionality.

- Implements aggressive caching and precomputation.

- Uses running mean pooling for efficient semantic chunking.

- Modular dependency system (install only what you need).
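For example, the token chunker can be driven by whichever tokenizer you already use (a sketch; see the docs for the exact constructor):

    # Sketch: TokenChunker with a tiktoken backend; it also accepts
    # tokenizers from the `tokenizers` and `transformers` libraries.
    import tiktoken
    from chonkie import TokenChunker

    encoding = tiktoken.get_encoding("cl100k_base")
    chunker = TokenChunker(encoding, chunk_size=512, chunk_overlap=64)

    chunks = chunker("Long document text goes here...")
    print(len(chunks), "chunks")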

In addition to chunking, Chonkie also provides an easy way to create embeddings. For supported providers (SentenceTransformer, Model2Vec, OpenAI), you just specify the model name as a string. You can also create custom embedding handlers for other providers.
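A quick sketch of what that looks like (a minimal example; see the docs for exact helper names and signatures):

    # Sketch: AutoEmbeddings resolves a provider from the model-name string.
    from chonkie import AutoEmbeddings

    embeddings = AutoEmbeddings.get_embeddings("minishlab/potion-base-8M")
    vector = embeddings.embed("What is chunking?")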

RAG is still the most common use case. However, Chonkie makes chunks that are optimized for creating high-quality embeddings and vector retrieval, so it isn't really tied to the "generation" part of RAG. In fact, we're seeing more and more people use Chonkie to implement semantic search and/or to set context for agents.

We are currently focused on building integrations to simplify the retrieval process. We've created "handshakes": thin functions that let you write chunks to vector DBs like pgVector, Chroma, TurboPuffer, and Qdrant with minimal setup. If there's an integration you'd like to see (vector DB or otherwise), please let us know.
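For illustration, writing chunks to Chroma looks roughly like this (a sketch; the argument and method names here are illustrative, so check the docs for the real signatures):

    # Illustrative sketch of a handshake; names may differ from the real API.
    from chonkie import RecursiveChunker, ChromaHandshake

    chunks = RecursiveChunker()(open("doc.txt").read())

    handshake = ChromaHandshake(collection_name="docs")  # illustrative kwarg
    handshake.write(chunks)  # embed and upsert the chunks into Chroma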

We also offer hosted and on-premise versions with OCR, extra metadata, all embedding providers, and managed vector databases for teams that want a fully managed pipeline. If you're interested, reach out at shreyash@chonkie.ai or book a demo: https://cal.com/shreyashn/chonkie-demo.

We're eager to hear your feedback and comments! Thanks!

Comments (21)

mritchie712 · 32m ago
We (https://www.definite.app/) have a use case I'd imagine is common for people building agents.

When a user works with our agent, they may end up with a large conversation thread (e.g. 200k+ tokens) with many SQL snippets, query results and database metadata (e.g. table and column info).

For example, a user might ask "show me any companies that were heavily engaged at one point, but that I haven't talked to in the last 90 days." This will pull in their schema (e.g. Hubspot), run a bunch of SQL, show them results, etc.

I want to allow the agent to search previous threads for answers so they don't need to have the conversation again, but chunking up the existing thread is non-trivial (e.g. you don't want to separate the question and answer, you may want to remove errors while retaining the correction, etc.).
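Roughly the kind of grouping I end up hand-rolling today (names purely illustrative):

    # Purely illustrative: group an OpenAI-style message list into
    # question/answer units so a user turn never gets separated from
    # the assistant turns (tool calls, results) that answer it.
    def group_into_qa_units(messages: list[dict]) -> list[list[dict]]:
        units, current = [], []
        for msg in messages:
            if msg["role"] == "user" and current:
                units.append(current)  # close out the previous Q/A unit
                current = []
            current.append(msg)
        if current:
            units.append(current)
        return units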

Do you have any plans to support "auto chunking" for AI message[0] threads?

0 - e.g. https://platform.openai.com/docs/api-reference/messages/crea...

yawnxyz · 1h ago
I'm curious whether chunking is different for embeddings vs. "agentic retrieval", where an AI or a person operates like a librarian: they consult an index to decide which resources to look up, get the relevant bits, then piece them together into a cohesive narrative whole. Would we do any chunking at all for this, or does it rely purely on how the DB is set up? I think for certain use cases even a single DB record could be too large for a context window, so maybe chunking would need to be applied to the record itself (e.g. a DB of research papers)?
snyy · 38m ago
Great questions!

Chunking fundamentals remain the same whether you're doing traditional semantic search or agentic retrieval. The key difference lies in the retrieval strategy, not the chunking approach itself.

For quality agentic retrieval, you still need to create a knowledge base by chunking documents, generating embeddings, and storing them in a vector database. You can add organizational structure here, like creating separate collections for different document categories (Physics papers, Biology papers, etc.), though the importance of this organization depends on the size and diversity of your source data.

The agent then operates exactly as you described: it queries the vector database, retrieves relevant chunks, and synthesizes them into a coherent response. The chunking strategy should still optimize for semantic coherence and appropriate context window usage.

Regarding your concern about large DB records: you're absolutely right. Even individual research papers often exceed context windows, so you'd still need to chunk them into smaller, semantically meaningful pieces (perhaps by section, abstract, methodology, etc.). The agent can then retrieve and combine multiple chunks from the same paper or across papers as needed.

The main advantage of agentic retrieval is that the agent can make multiple queries, refine its search strategy, and iteratively build context. But it still relies on well-chunked, embedded content in the underlying vector database.
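Roughly what that pipeline looks like, with Chroma standing in as an example store (a sketch, not production code):

    # Sketch of the knowledge-base side; Chroma is just an example store.
    import chromadb
    from chonkie import RecursiveChunker

    papers = {"paper-1": open("paper1.txt").read()}  # id -> full text

    chunker = RecursiveChunker(chunk_size=512)
    collection = chromadb.Client().create_collection("physics_papers")

    for paper_id, paper_text in papers.items():
        chunks = chunker(paper_text)
        collection.add(
            documents=[c.text for c in chunks],
            ids=[f"{paper_id}-{i}" for i in range(len(chunks))],
        )

    # The agent can then issue multiple, refined queries against this base.
    results = collection.query(query_texts=["methodology section"], n_results=5)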

amir_karbasi · 1h ago
Looks great! I had looked at Chonkie a few months back, but didn't need it in our pipelines. I was just writing a POC for an agentic chunker this week to handle various formatting and chunking requirements. I'll give Chonkie a shot!
snyy · 37m ago
Awesome! Keep us posted :)
pj_mukh · 1h ago
Super cool!

It looks like size and speed are your major advantages. In our RAG pipeline we run the chunking process async as an onboarding-type step. Is Chonkie primarily for people looking to process documents in some sort of real-time scenario?

snyy · 1h ago
In addition to size and speed, we also offer the widest variety of chunking strategies!

Typically, our current users fall into one of two categories:

- People who are running async chunking but need access to a strategy not supported in LangChain/LlamaIndex. Sometimes speed matters here too, especially if the user has a high volume of documents.

- People who need real-time chunking. Super useful for apps like codegen/code-review tools.

_epps_ · 1h ago
Excited to try this out! Also +1 for Moo Deng-ish mascot.
greymalik · 2h ago
You’re part of YC but this is open source - how do you plan to make money off of it?
snyy · 2h ago
As mentioned in the other reply, we have a cloud/on-prem offering that comes with a managed ETL pipeline built on top of the OSS library.
tevon · 2h ago
Looks like they'll have a cloud offering; the post also mentions on-prem and managed offerings.
Andugal · 2h ago
Congratulations on the launch!

You said that Chonkie works with multiple vector stores. I was wondering: what RAG database does HN use? Do you need a specialized one (like Chroma), or is Postgres just fine?

gavmor · 1h ago
Does HN even use a RAG database? What for? They don't even maintain their own search[0].

0. https://hn.algolia.com/

snyy · 2h ago
Not sure what HN uses :)

If you want agents/LLMs to be able to find relevant data based on similarity to queries, vector DBs like Chroma (or even pgVector) are great.

elliot07 · 2h ago
Chonkie is great software. Congrats on the launch! Has been a pleasure to use so far.
snyy · 2h ago
Thank you :)
pzo · 2h ago
Is this only for Node (how about Bun/Deno)? Has it been tested to work with React Native?
snyy · 2h ago
Node and Bun should work. Haven't tested on Deno yet.

We rely on the huggingface/transformers library, which might be too heavy for a React Native app.

tevon · 2h ago
Was just looking into chunking strategies today, this looks great! Will update with any feedback.
snyy · 2h ago
Awesome! Keep us posted!
babuloseo · 57m ago
I like the mascot.