Show HN: VerbatimRAG – RAG that returns only exact text from documents
Instead of letting an LLM generate responses based on retrieved context, VerbatimRAG extracts and returns exact text spans from source documents. Every word in the output exists verbatim in your documents.
Technical approach:
- Fine-tuned ModernBERT on the RAGBench dataset for span classification (relevant/not relevant)
- Documents chunked with Docling/Chonkie and indexed with SPLADE for sparse retrieval
- Query-time: retrieve → classify spans → compose response from exact quotes using dynamic templates (rough sketch after this list)
- Each span includes a citation back to its source document
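For a concrete picture, here's roughly what the query-time flow looks like in Python. This is a hand-wavy sketch, not our actual API: the chunk format and helper names are made up, and it assumes the HF checkpoint loads as a standard sequence-classification head over (question, sentence) pairs. Check the repo for the real code.

```python
# Illustrative sketch only; the chunk format, label mapping, and the
# assumption that the checkpoint is a sequence-classification model
# are all simplifications, not the verbatim-rag API.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "KRLabsOrg/verbatim-rag-modern-bert-v1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def is_relevant(question: str, span: str) -> bool:
    """Classify one (question, span) pair with the fine-tuned extractor."""
    inputs = tokenizer(question, span, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1  # assumes index 1 == "relevant"

def answer(question: str, chunks: list[dict]) -> str:
    """Compose a response from exact quotes only, each cited to its source."""
    lines = []
    for chunk in chunks:  # chunks come from the sparse retriever
        for sent in chunk["sentences"]:
            if is_relevant(question, sent):
                # every word here exists verbatim in the source document
                lines.append(f'"{sent}" [{chunk["source"]}]')
    return "\n".join(lines)
```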
Trade-offs:
- Responses can be choppy since they're composed of exact quotes
- No summarization or synthesis across documents
- Works poorly for conversational/creative tasks
One interesting property: you can run the entire pipeline without any LLM, just embeddings plus our ModernBERT extractor. With SPLADE embeddings, it runs entirely on CPU.
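To illustrate the LLM-free path, here's a rough SPLADE encoder sketch. The checkpoint name is just one public SPLADE model used for illustration (not necessarily what we ship), and real usage would build an inverted index over the sparse vectors rather than scoring pairs like this toy does.

```python
# CPU-only sketch of SPLADE sparse encoding plus dot-product scoring.
# The checkpoint is an assumption; see the repo for the actual encoder.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

SPLADE_ID = "naver/splade-cocondenser-ensembledistil"  # one public SPLADE model
tok = AutoTokenizer.from_pretrained(SPLADE_ID)
mlm = AutoModelForMaskedLM.from_pretrained(SPLADE_ID)
mlm.eval()

def splade_vector(text: str) -> torch.Tensor:
    """Encode text as a vocab-sized sparse term-weight vector (no GPU needed)."""
    inputs = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = mlm(**inputs).logits                   # (1, seq_len, vocab)
    # SPLADE pooling: max over tokens of log(1 + relu(logit)), padding masked out
    weights = torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)         # (vocab,)

def score(query: str, doc: str) -> float:
    """Sparse dot product; production code would use an inverted index."""
    return float(splade_vector(query) @ splade_vector(doc))
```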
- Code: https://github.com/KRLabsOrg/verbatim-rag (MIT)
- Paper: https://aclanthology.org/2025.bionlp-share.8.pdf
- HuggingFace model: https://huggingface.co/KRLabsOrg/verbatim-rag-modern-bert-v1
We see this approach fitting best in applications where accuracy matters more than fluency (e.g. compliance-heavy domains).
Curious if others have tried similar "constrained generation" approaches.