Show HN: We create visual codebase maps that scale (static analysis and LLMs)
Alex and I are devs, and we've noticed that recently we've been super productive at writing code (prompting :D). But when it comes to understanding big systems, prompting doesn't work that well; for that, diagrams are best imo. Most tools out there don't scale to big projects (e.g. PyTorch), so we're building CodeBoarding, a recursive visualizer for codebases. It starts from the highest level of abstraction and lets you dive deeper. We use static analysis and LLM agents. The control-flow graph is our starting point, and we validate the LLM's analysis against the static analysis output. LLMs alone often hallucinate or apply familiar architectural patterns that don't actually exist in the code.
Since this is a concise representation of a codebase, we also added an MCP server to provide our docs for the libs your project depends on, reducing hallucinations and avoiding blowing up the context window. The vision: with agents writing more and more code, we think we also need a concise representation for it: diagrams. But for that to work, the diagrams have to be accurate, and that's why static analysis has to take part in the fun ;d.
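To make the "validate the LLM against static analysis" idea concrete, here's a minimal sketch. It is not CodeBoarding's actual implementation. It assumes the static analysis gives you function-level edges, that you have a mapping from functions to components, and that the LLM proposes component-to-component relations. All names here (validate_relations, component_of, the sample data) are hypothetical.

```python
def validate_relations(static_edges, component_of, llm_relations):
    """Split LLM-proposed component relations into supported / unsupported.

    static_edges:  iterable of (caller, callee) pairs from static analysis
    component_of:  dict mapping a function name to its component name
    llm_relations: list of (source_component, target_component) pairs
    """
    # Lift function-level edges to component-level edges,
    # ignoring edges that stay inside a single component.
    component_edges = set()
    for src, dst in static_edges:
        a, b = component_of.get(src), component_of.get(dst)
        if a is not None and b is not None and a != b:
            component_edges.add((a, b))

    supported = [r for r in llm_relations if r in component_edges]
    unsupported = [r for r in llm_relations if r not in component_edges]
    return supported, unsupported


# Hypothetical sample data, for illustration only.
static_edges = {("api.handle", "core.run"), ("core.run", "db.save")}
component_of = {"api.handle": "API", "core.run": "Core", "db.save": "Storage"}
llm_relations = [("API", "Core"), ("Core", "Storage"), ("API", "Storage")]

supported, unsupported = validate_relations(static_edges, component_of, llm_relations)
print("supported:", supported)
print("unsupported:", unsupported)
```

Here the API -> Storage relation has no backing edge in the static graph, so it would be flagged as a likely hallucination rather than drawn in the diagram.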
Would love to hear what you think about diagram representations for code, what problems you've run into with hallucinations while vibe-coding (even with tools like gitingest/context7), and any general feedback :)
But comparing the control-flow graph probably makes more sense for big refactor commits, as it might blow the context. However, so far we haven't seen this be an issue.
Curious to hear what your approach was when building diagram representations!
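For what it's worth, a rough sketch of what comparing graphs across a commit could look like, assuming each graph is just a set of (source, target) edges. This is an illustration under that assumption, not how any particular tool does it, and diff_edges and the sample edges are made up.

```python
def diff_edges(before, after):
    """Return the edges added and removed between two versions of a graph."""
    before, after = set(before), set(after)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical edges before and after a refactor commit.
before = [("api.handle", "core.run"), ("core.run", "db.save")]
after = [("api.handle", "core.run"), ("core.run", "cache.put")]

changes = diff_edges(before, after)
print(changes)
```

Only the changed edges would need to go into the prompt, which keeps the context small even when the full graph is large.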