Vectorless: open-source PDF chatbot without RAG

3 richardmeng 0 8/11/2025, 4:09:04 AM
Open-sourcing "Vectorless", a new PDF chatbot without embedding vectors.

GitHub repo: https://github.com/roe-ai/vectorless-chatbot
Demo app: https://vectorless-chatbot.vercel.app/

How it works:
1. Select the best docs: feed the LLM high-level descriptions plus doc names, and it picks which docs to use.
2. Select the best pages: the Agent goes through the pages of those docs and pulls out the ones most relevant to your question.
3. Gather and answer: the Agent takes all the relevant pages from step 2 and gives you the final answer.
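
The three steps can be sketched in Python as follows. This is a minimal sketch, not the repo's implementation: the `LLM` callable (prompt in, text out), the `Doc` type, the prompt wording, and the choice to ask about one page at a time in step 2 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical LLM interface: takes a prompt string, returns the model's text reply.
LLM = Callable[[str], str]


@dataclass
class Doc:
    name: str
    description: str  # high-level summary shown to the LLM in step 1
    pages: list[str] = field(default_factory=list)  # extracted text, one entry per page


def select_docs(llm: LLM, docs: list[Doc], question: str) -> list[Doc]:
    """Step 1: show the LLM only names and descriptions; it replies with doc names."""
    catalog = "\n".join(f"- {d.name}: {d.description}" for d in docs)
    reply = llm(f"Question: {question}\nDocuments:\n{catalog}\n"
                "Reply with the relevant document names, comma-separated.")
    chosen = {name.strip() for name in reply.split(",")}
    return [d for d in docs if d.name in chosen]


def select_pages(llm: LLM, doc: Doc, question: str) -> list[str]:
    """Step 2 (assumed variant): ask the LLM about each page with a yes/no prompt."""
    relevant = []
    for i, page in enumerate(doc.pages, start=1):
        reply = llm(f"Question: {question}\n{doc.name}, page {i}:\n{page}\n"
                    "Is this page relevant? Answer yes or no.")
        if reply.strip().lower().startswith("yes"):
            relevant.append(page)
    return relevant


def answer(llm: LLM, docs: list[Doc], question: str) -> str:
    """Step 3: gather every relevant page from the chosen docs and ask for the answer."""
    pages = [p for d in select_docs(llm, docs, question)
             for p in select_pages(llm, d, question)]
    context = "\n\n".join(pages)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

Passing the model in as a plain function keeps the sketch provider-agnostic and lets you exercise the whole flow with a scripted fake before wiring in a real LLM.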

Advantages:
1. It's more predictable than vector search. You can tell the Agent exactly how you want your files analyzed.
2. You can ask abstract questions like: “How does NVIDIA compare to AMD in terms of risk?”
3. You can ask aggregate questions like: “How many questions in this SOC 2 report are marked negative?”
4. It naturally supports multimodal questions and documents.

Disadvantages:
1. To scale, step 1 relies on high-quality metadata about the documents.
2. Step 2 can be wasteful when the user asks a simple follow-up question whose context could be reused.
3. It is slower than vector-search chat.
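
One way to soften the second disadvantage is to keep the pages gathered for the previous question and skip the expensive page scan when a follow-up can be answered from them. The sketch below is hypothetical throughout: the `Conversation` class, the LLM-based yes/no router, and the `gather_pages` function (standing in for steps 1 and 2) are illustrative names, not part of the repo.

```python
from typing import Callable

# Hypothetical LLM interface: prompt string in, text reply out.
LLM = Callable[[str], str]


class Conversation:
    """Keeps the pages gathered for the last full run so follow-ups can reuse them."""

    def __init__(self, llm: LLM, gather_pages: Callable[[str], list[str]]):
        self.llm = llm
        self.gather_pages = gather_pages  # steps 1-2: question -> relevant page texts
        self.pages: list[str] = []        # context from the most recent full run

    def is_followup(self, question: str) -> bool:
        # Assumed router: ask the LLM whether the cached pages are enough.
        if not self.pages:
            return False
        reply = self.llm("Pages:\n" + "\n\n".join(self.pages)
                         + f"\n\nQuestion: {question}\n"
                         "Can it be answered from these pages alone? Answer yes or no.")
        return reply.strip().lower().startswith("yes")

    def ask(self, question: str) -> str:
        if not self.is_followup(question):
            self.pages = self.gather_pages(question)  # expensive: rescan the documents
        context = "\n\n".join(self.pages)
        return self.llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

The router adds one cheap LLM call per question, which only pays off when rescanning pages costs considerably more than that check.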

How it will scale:
1. For step 1, we envision structured metadata retrieval via text-to-SQL to locate document paths based on the user's question.
2. Step 2 can be improved by caching. We envision that once a document has been queried, a table of contents can be stored, evolved, and leveraged as future questions come in.
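
The two scaling ideas might look roughly like this. The sketch assumes a hypothetical `docs` metadata table in SQLite (via Python's built-in `sqlite3`), an LLM that returns a single SELECT statement, and a `TocCache` that maps each document path to topics and page numbers. The schema, names, and topic-keyed layout are illustrative assumptions, not the project's design.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical LLM interface: prompt string in, text reply out.
LLM = Callable[[str], str]

# Assumed metadata schema; a real deployment would describe its own columns.
SCHEMA = "CREATE TABLE docs (path TEXT, company TEXT, doc_type TEXT, year INTEGER)"


def locate_docs(llm: LLM, conn: sqlite3.Connection, question: str) -> list[str]:
    """Step 1 at scale: the LLM writes a SELECT over metadata; SQLite returns paths."""
    sql = llm(f"Schema: {SCHEMA}\nWrite one SQLite SELECT returning path for: {question}")
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("expected a SELECT statement")  # basic guard only, not a sandbox
    return [row[0] for row in conn.execute(sql)]


class TocCache:
    """Step 2 at scale: per-document table of contents, built on first query, refined later."""

    def __init__(self) -> None:
        self._tocs: dict[str, dict[str, list[int]]] = {}  # path -> {topic: page numbers}

    def lookup(self, path: str, topic: str) -> Optional[list[int]]:
        # None means this topic was never scanned for this document: fall back to step 2.
        return self._tocs.get(path, {}).get(topic)

    def record(self, path: str, topic: str, pages: list[int]) -> None:
        # Evolve the ToC by merging newly found pages without duplicates.
        entry = self._tocs.setdefault(path, {}).setdefault(topic, [])
        for p in pages:
            if p not in entry:
                entry.append(p)
```

Running model-written SQL against a read-only connection or a restricted database user is safer than the simple SELECT check shown here.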
