Show HN: Vectorless RAG

4 points by mingtianzhang | 1 comment | 8/20/2025, 6:54:23 PM | colab.research.google.com

Comments (1)

jimmytucson · 59m ago
So if I understand this correctly, this works on a single large document whose size exceeds what you can or want to put into a single context window for answering a question? It first "indexes" the document by feeding successive "proto-chunks" to an LLM, along with an accumulator, which is like a running table of contents into the document, with "sections" that the indexer LLM decides on and summarizes, until the table of contents is complete. (What we're calling "sections" here are still "chunks"; they're just not a fixed size and are decided on by the indexer at build time?)
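
Something like this pseudo-Python is what I'm picturing for the indexing pass (the function name, the chunk size, and the llm() helper are all my own guesses, not from the notebook; I'm also assuming llm() hands back the updated table of contents as a parsed Python list):

    # Hypothetical sketch of the indexing loop: an LLM sees each raw
    # "proto-chunk" plus the running table of contents (the accumulator)
    # and decides how to extend or revise it.
    def build_toc(document_text, llm, chunk_chars=4000):
        toc = []  # accumulator: running table of contents
        for start in range(0, len(document_text), chunk_chars):
            proto_chunk = document_text[start:start + chunk_chars]
            prompt = (
                f"Table of contents so far:\n{toc}\n\n"
                f"Next part of the document (chars {start}-{start + len(proto_chunk)}):\n"
                f"{proto_chunk}\n\n"
                "Extend or revise the table of contents and return the full "
                "updated list, each entry with title, summary, start, end."
            )
            toc = llm(prompt)  # the indexer LLM decides on the sections
        return toc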

Then for the retrieval stage, it presents the table of contents to a "retriever" LLM, which decides which sections are relevant to the question based on the summaries the indexer LLM created. Then for the answer generation stage, it just presents those relevant sections along with the question.
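
And the retrieval and answer stages would look roughly like this (same caveats as above; I'm assuming llm() returns parsed JSON where the prompt asks for it):

    # Hypothetical sketch of retrieval + answer generation.
    def answer(question, toc, document_text, llm):
        # The retriever LLM only sees section titles/summaries, not the full text.
        indices = llm(
            f"Question: {question}\n\n"
            f"Table of contents: {toc}\n\n"
            "Return a JSON list of indices of the sections needed to answer."
        )
        # Pull the raw text of just those sections into the final prompt.
        sections = "\n\n".join(
            document_text[toc[i]["start"]:toc[i]["end"]] for i in indices
        )
        return llm(
            f"Answer the question using only these sections:\n{sections}\n\n"
            f"Question: {question}"
        )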

That's pretty clever. Does it work with a corpus of documents as well, or just a single large document? Does the "indexer" know the question ahead of time, or is the creation of sections and section summarization supposed to be question-agnostic? What if your table of contents gets too big? It seems like then it just becomes normal RAG, where you have to store the summaries and document-chunk pointers in some vector or lexical database?