Show HN: VerbatimRAG – RAG that returns only exact text from documents

1 justacoolname 0 8/12/2025, 3:31:36 PM
I built VerbatimRAG to solve a specific problem: RAG systems that retrieve the right documents but then paraphrase incorrectly, introducing factual errors (hallucinations).

Instead of letting an LLM generate responses based on retrieved context, VerbatimRAG extracts and returns exact text spans from source documents. Every word in the output exists verbatim in your documents.

Technical approach:

- Fine-tuned ModernBERT on RAGBench dataset for span classification (relevant/not relevant)

- Documents chunked with Docling/Chonkie and indexed with SPLADE for sparse retrieval

- Query-time: retrieve → classify spans → compose response from exact quotes using dynamic templates

- Each span includes citation back to source document
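The compose step above can be sketched in a few lines. This is a minimal illustration, not the library's actual API: the `Span` type, `compose_response`, and the bracketed citation format are all hypothetical, and in the real pipeline the `relevant` flag would come from the fine-tuned ModernBERT classifier rather than being hand-set.

```python
from dataclasses import dataclass

# Hypothetical types for illustration -- the real library's API differs.
@dataclass
class Span:
    text: str       # exact text lifted verbatim from the source document
    doc_id: str     # citation back to the source document
    relevant: bool  # span classifier's decision (here hand-set for the demo)

def compose_response(spans: list[Span]) -> str:
    """Compose a response purely from relevant spans, so every word
    in the output exists verbatim in some source document."""
    quotes = [f'"{s.text}" [{s.doc_id}]' for s in spans if s.relevant]
    return " ".join(quotes)

spans = [
    Span("The warranty lasts 24 months.", "doc-7", True),
    Span("Unrelated marketing copy.", "doc-7", False),
    Span("Claims must be filed within 30 days.", "doc-2", True),
]
print(compose_response(spans))
# -> "The warranty lasts 24 months." [doc-7] "Claims must be filed within 30 days." [doc-2]
```

The point of the structure: since the output is a pure filter-and-join over retrieved text, there is no generation step where a paraphrase could introduce an error.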

Trade-offs:

- Responses can be choppy since they're composed of exact quotes

- No summarization or synthesis across documents

- Works poorly for conversational/creative tasks

An interesting property: you can run the entire pipeline without any LLM, using just embeddings plus our ModernBERT extractor. With SPLADE embeddings, it runs entirely on CPU.
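To make the CPU-only claim concrete: SPLADE represents queries and documents as sparse term-to-weight maps, and relevance scoring is just a sparse dot product. The sketch below illustrates only that scoring step with hand-made weights; in the real pipeline a SPLADE model produces the weights, and the term vocabulary here is invented for the demo.

```python
# Toy sparse retrieval in the SPLADE style: score = sparse dot product
# between a query's term weights and each document's term weights.
# Weights here are hand-made stand-ins for model-produced expansions.

def sparse_dot(q: dict[str, float], d: dict[str, float]) -> float:
    """Dot product over the (small) set of nonzero query terms."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())

docs = {
    "doc-1": {"warranty": 1.8, "months": 0.9, "coverage": 1.1},
    "doc-2": {"refund": 1.5, "days": 0.7},
}
query = {"warranty": 1.2, "coverage": 0.4}

ranked = sorted(docs, key=lambda d: sparse_dot(query, docs[d]), reverse=True)
print(ranked[0])  # doc-1: 1.2*1.8 + 0.4*1.1 = 2.6 vs. doc-2: 0.0
```

Because both the sparse scoring and the ModernBERT extractor run comfortably on CPU, no GPU or hosted LLM is needed anywhere in the pipeline.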

- Code: https://github.com/KRLabsOrg/verbatim-rag (MIT)

- Paper: https://aclanthology.org/2025.bionlp-share.8.pdf

- HuggingFace model: https://huggingface.co/KRLabsOrg/verbatim-rag-modern-bert-v1

We see this approach fitting best in applications where accuracy matters more than fluency (e.g. compliance-heavy domains).

Curious if others have tried similar "constrained generation" approaches.
