Show HN: Sourcebot, the self-hosted Perplexity for your codebase
We’re Brendan and Michael, the creators of Sourcebot (https://www.sourcebot.dev/), a self-hosted code understanding tool for large codebases. We originally launched on HN 9 months ago with code search (https://news.ycombinator.com/item?id=41711032), and we’re excited to share our newest feature: Ask Sourcebot.
Ask Sourcebot is an agentic search tool that lets you ask complex questions about your entire codebase in natural language, and returns a structured response with inline citations back to your code. Some types of questions you might ask:
- “How does authentication work in this codebase? What library is being used? What providers can a user log in with?” (https://demo.sourcebot.dev/~/chat/cmdpjkrbw000bnn7s8of2dm11)
- “When should I use channels vs. mutexes in go? Find real usages of both and include them in your answer” (https://demo.sourcebot.dev/~/chat/cmdpiuqhu000bpg7s9hprio4w)
- “How are shards laid out in memory in the Zoekt code search engine?” (https://demo.sourcebot.dev/~/chat/cmdm9nkck000bod7sqy7c1efb)
- “How do I call C from Rust?” (https://demo.sourcebot.dev/~/chat/cmdpjy06g000pnn7ssf4nk60k)
You can try it yourself on our demo site (https://demo.sourcebot.dev/~) or check out our demo video (https://youtu.be/olc2lyUeB-Q).
How is this any different from existing tools like Cursor or Claude Code?
- Sourcebot focuses solely on code understanding. We believe that, more than ever, the main bottleneck development teams face is not writing code but acquiring the context needed to make quality changes that are cohesive within the wider codebase. This is true regardless of whether the author is a human or an LLM.
- As opposed to being in your IDE or terminal, Sourcebot is a web app. This allows us to play to the strengths of the web: rich UX and ubiquitous access. We put a ton of work into taking the best parts of IDEs (code navigation, file explorer, syntax highlighting) and packaging them with a custom UX (rich Markdown rendering, inline citations, @ mentions) that is easily shareable between team members.
- Sourcebot can maintain an up-to-date index of thousands of repos hosted on GitHub, GitLab, Bitbucket, Gerrit, and other hosts. This allows you to ask questions about repositories without checking them out locally. This is especially helpful when ramping up on unfamiliar parts of the codebase or working with systems that are typically spread across multiple repositories, e.g., microservices.
- You can BYOK (Bring Your Own API Key) to any supported reasoning model. We currently support 11 different model providers (like Amazon Bedrock and Google Vertex), and plan to add more.
- Sourcebot is self-hosted, fair source, and free to use.
Under the hood, we expose our existing regular expression search, code navigation, and file reading APIs to an LLM as tool calls. We instruct the LLM via a system prompt to gather the necessary context via these tools to sufficiently answer the user's question, and then to provide a concise, structured response. This includes inline citations, which are just structured data that the LLM can embed into its response and that can then be identified on the client and rendered appropriately. We built this on some amazing libraries like the Vercel AI SDK v5, CodeMirror, react-markdown, and Slate.js, among others.
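To make the inline citation idea concrete, here is a rough TypeScript sketch of how a client might pull citations out of a model response before rendering. The marker syntax (`@file:{path:start-end}`), the `Citation` and `Segment` types, and the `splitIntoSegments` function are all made up for illustration and are not Sourcebot's actual format or code; the only point is that citations are plain structured data that a client can find and replace with a richer UI element.

```typescript
// Hypothetical citation marker embedded by the LLM: @file:{path:startLine-endLine}
// This syntax is an assumption for illustration, not Sourcebot's real format.
const CITATION_SOURCE = "@file:\\{([^:}]+):(\\d+)-(\\d+)\\}";

interface Citation {
  path: string;
  startLine: number;
  endLine: number;
}

// A response is split into plain text runs and citation objects so the
// client can render each kind differently (e.g. text as Markdown,
// citations as clickable links into the code viewer).
type Segment =
  | { kind: "text"; text: string }
  | { kind: "citation"; citation: Citation };

function splitIntoSegments(response: string): Segment[] {
  const segments: Segment[] = [];
  // A fresh regex per call avoids sharing lastIndex state between calls.
  const pattern = new RegExp(CITATION_SOURCE, "g");
  let cursor = 0;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(response)) !== null) {
    // Emit any plain text that precedes this citation.
    if (match.index > cursor) {
      segments.push({ kind: "text", text: response.slice(cursor, match.index) });
    }
    segments.push({
      kind: "citation",
      citation: {
        path: match[1],
        startLine: Number(match[2]),
        endLine: Number(match[3]),
      },
    });
    cursor = match.index + match[0].length;
  }

  // Emit trailing text after the last citation, if any.
  if (cursor < response.length) {
    segments.push({ kind: "text", text: response.slice(cursor) });
  }
  return segments;
}

// Example: one citation in the middle of a sentence yields three segments
// (text, citation, text).
const exampleSegments = splitIntoSegments(
  "Sessions are created in @file:{src/auth.ts:10-24} and then validated."
);
console.log(exampleSegments);
```

Keeping the citation as a small, regular marker inside the text means the model does not need a separate output channel, and a response with no markers degrades to a single plain text segment.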
This architecture is intentionally simple. We decided not to introduce any additional techniques like vector embeddings, multi-agent graphs, etc. since we wanted to push the limits of what we could do with what we had on hand. We plan on revisiting our approach as we get user feedback on what works (and what doesn’t).
We are really excited about pushing the envelope of code understanding. Give it a try: https://github.com/sourcebot-dev/sourcebot. Cheers!