Show HN: Hybrid Knowledge Graph and RAG for Legal Documents (Learning Project)

5 srijanshukla18 2 7/20/2025, 11:09:38 AM github.com ↗
Built this as a toy project to understand knowledge graphs by tackling a real problem: traditional RAG fails badly on legal documents because it misses interconnections between sections.

The system combines both approaches on every query: it gets semantic matches via TF-IDF, retrieves structural relationships from Neo4j, then feeds both contexts to OpenAI for a comprehensive answer.
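A minimal sketch of that per-query flow, not the repo's actual code. To stay dependency-free it swaps in a small hand-rolled TF-IDF for scikit-learn's and a plain dict for the Neo4j graph, and it stops at building the prompt instead of calling OpenAI. The section texts, `REFERENCES`, `hybrid_context`, and `build_prompt` are all hypothetical names and data made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: section id -> text (NOT the real Act wording).
SECTIONS = {
    "80C": "Deduction for life insurance premium and provident fund contributions.",
    "80CCE": "Limit on deductions under section 80C, 80CCC and 80CCD.",
    "10": "Incomes not included in total income.",
}

# Hypothetical stand-in for the Neo4j graph: citing section -> cited sections.
REFERENCES = {"80CCE": ["80C"], "80C": [], "10": []}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return a sparse TF-IDF dict per document, plus the idf table."""
    counts = {key: Counter(tokenize(text)) for key, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vecs = {k: {t: tf * idf[t] for t, tf in c.items()} for k, c in counts.items()}
    return vecs, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_context(query, top_k=2):
    """Semantic hits from TF-IDF, plus who cites each hit from the graph."""
    vecs, idf = tfidf_vectors(SECTIONS)
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    ranked = sorted(SECTIONS, key=lambda s: cosine(q_vec, vecs[s]), reverse=True)
    hits = ranked[:top_k]
    cited_by = {
        hit: sorted(src for src, targets in REFERENCES.items() if hit in targets)
        for hit in hits
    }
    return {"semantic": hits, "cited_by": cited_by}


def build_prompt(query, context):
    """Merge both contexts into one prompt string for the LLM."""
    lines = [f"Question: {query}", "Relevant sections:"]
    for sec in context["semantic"]:
        lines.append(f"- Section {sec}: {SECTIONS[sec]}")
        citing = context["cited_by"][sec]
        if citing:
            lines.append("  Referenced by: " + ", ".join(citing))
    return "\n".join(lines)


context = hybrid_context("life insurance premium deduction")
prompt = build_prompt("life insurance premium deduction", context)
```

The point of the design is that the text ranking and the reference lookup run independently, so either side can be swapped out (a real vector index, a real Cypher query) without touching the other.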

Used the Indian Income Tax Act as test data since legal documents have natural graph structures. Queries like "What sections reference Section 80C?" get both the reference network AND content explanations.
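To make the "reference network" idea concrete, here is a rough sketch of how cross-references could be pulled out of section text and inverted into a "who references X?" lookup. It is an assumption about one possible approach, not how the repo does it: the regex only catches mentions written as "section 80C", the sample texts are invented, and `build_reference_graph` is a hypothetical helper. In the real project this lookup would be a Neo4j query.

```python
import re
from collections import defaultdict

# Matches "section 80C" / "Sections 10" style mentions; a simplification
# that misses lists such as "sections 80C, 80CCC and 80CCD".
SECTION_REF = re.compile(r"\b[Ss]ections?\s+(\d+[A-Z]*)")


def build_reference_graph(sections):
    """Map each section id to the set of sections whose text mentions it."""
    referenced_by = defaultdict(set)
    for source, text in sections.items():
        for target in SECTION_REF.findall(text):
            # Skip self-references and mentions of sections we don't have.
            if target != source and target in sections:
                referenced_by[target].add(source)
    return referenced_by


# Invented sample texts, for illustration only.
sample = {
    "80C": "Deduction in respect of life insurance premia and similar payments.",
    "80CCC": "Deduction for contributions to certain pension funds.",
    "80CCE": "The aggregate of deductions under section 80C and section 80CCC is capped.",
}

graph = build_reference_graph(sample)
who_references_80c = sorted(graph["80C"])
```

With the sample data above, `who_references_80c` contains only `80CCE`, which is the structural half of the answer; the content explanation would come from the text retrieval side.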

Full transparency: includes some AI-assisted code as I was learning Neo4j/graph concepts, but the hybrid architecture and problem framing are mine.

Tech stack: Python, Neo4j, OpenAI API, scikit-learn (TF-IDF), numpy. Docker + Makefile for easy setup.

Would love feedback on this pattern for other structured documents.

Comments (2)

tushr · 3h ago
This is an interesting project. Did you get to explore how to do the same with dynamic documents? For instance, a Google Doc or MS Word file that keeps changing, where the graph must update with every change.
srijanshukla18 · 2h ago
Oh, great question, I hadn't thought of that. In this particular project, the TF-IDF vectors and KG building are all local, and main.py builds both on every start. I guess I could add a hook that listens for file changes and rebuilds the KG and vectors.
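One way such a hook might look, as a rough sketch only: compare a content hash of the source file on each check and trigger a rebuild when it changes. `RebuildOnChange` and the `rebuild` callback are hypothetical names; the callback stands in for whatever main.py does to rebuild the KG and vectors. A real setup might use OS file-change notifications instead of polling.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents; changes whenever the text changes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class RebuildOnChange:
    """Call `rebuild(path)` whenever the file's contents differ from last check."""

    def __init__(self, path, rebuild):
        self.path = path
        self.rebuild = rebuild  # hypothetical callback: rebuilds KG + vectors
        self._last_digest = None

    def check(self):
        digest = file_digest(self.path)
        if digest == self._last_digest:
            return False  # nothing changed, skip the rebuild
        self._last_digest = digest
        self.rebuild(self.path)
        return True
```

Calling `check()` in a loop with a short `time.sleep` between iterations would give a simple polling watcher. Hashing the contents rather than relying on modification times avoids rebuilding when a file is re-saved without any actual change.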