Show HN: Sourcebot – Self-hosted Perplexity for your codebase

82 bshzzle 20 7/30/2025, 2:44:13 PM github.com ↗
Hi HN,

We’re Brendan and Michael, the creators of Sourcebot (https://www.sourcebot.dev/), a self-hosted code understanding tool for large codebases. We originally launched on HN 9 months ago with code search (https://news.ycombinator.com/item?id=41711032), and we’re excited to share our newest feature: Ask Sourcebot.

Ask Sourcebot is an agentic search tool that lets you ask complex questions about your entire codebase in natural language, and returns a structured response with inline citations back to your code. Some types of questions you might ask:

- “How does authentication work in this codebase? What library is being used? What providers can a user log in with?” (https://demo.sourcebot.dev/~/chat/cmdpjkrbw000bnn7s8of2dm11)

- “When should I use channels vs. mutexes in go? Find real usages of both and include them in your answer” (https://demo.sourcebot.dev/~/chat/cmdpiuqhu000bpg7s9hprio4w)

- “How are shards laid out in memory in the Zoekt code search engine?” (https://demo.sourcebot.dev/~/chat/cmdm9nkck000bod7sqy7c1efb)

- "How do I call C from Rust?" (https://demo.sourcebot.dev/~/chat/cmdpjy06g000pnn7ssf4nk60k)

You can try it yourself here on our demo site (https://demo.sourcebot.dev/~) or checkout our demo video (https://youtu.be/olc2lyUeB-Q).

How is this any different from existing tools like Cursor or Claude code?

- Sourcebot solely focuses on code understanding. We believe that, more than ever, the main bottleneck development teams face is not writing code, it’s acquiring the necessary context to make quality changes that are cohesive within the wider codebase. This is true regardless if the author is a human or an LLM.

- As opposed to being in your IDE or terminal, Sourcebot is a web app. This allows us to play to the strengths of the web: rich UX and ubiquitous access. We put a ton of work into taking the best parts of IDEs (code navigation, file explorer, syntax highlighting) and packaging them with a custom UX (rich Markdown rendering, inline citations, @ mentions) that is easily shareable between team members.

- Sourcebot can maintain an up-to date index of thousands of repos hosted on GitHub, GitLab, Bitbucket, Gerrit, and other hosts. This allows you to ask questions about repositories without checking them out locally. This is especially helpful when ramping up on unfamiliar parts of the codebase or working with systems that are typically spread across multiple repositories, e.g., micro services.

- You can BYOK (Bring Your Own API Key) to any supported reasoning model. We currently support 11 different model providers (like Amazon Bedrock and Google Vertex), and plan to add more.

- Sourcebot is self-hosted, fair source, and free to use.

Under the hood, we expose our existing regular expression search, code navigation, and file reading APIs to a LLM as tool calls. We instruct the LLM via a system prompt to gather the necessary context via these tools to sufficiently answer the users question, and then to provide a concise, structured response. This includes inline citations, which are just structured data that the LLM can embed into it’s response and can then be identified on the client and rendered appropriately. We built this on some amazing libraries like the Vercel AI SDK v5, CodeMirror, react-markdown, and Slate.js, among others.

This architecture is intentionally simple. We decided not to introduce any additional techniques like vector embeddings, multi-agent graphs, etc. since we wanted to push the limits of what we could do with what we had on hand. We plan on revisiting our approach as we get user feedback on what works (and what doesn’t).

We are really excited about pushing the envelope of code understanding. Give it a try: https://github.com/sourcebot-dev/sourcebot. Cheers!

Comments (20)

nkmnz · 8h ago
How does this compare to ingesting all your code into some RAG tool and using that in a chat? I understand the citations part, which is a cool feature indeed, but especially tools for graph-RAG, such as graphiti https://github.com/getzep/graphiti can deliver so much more information that can be stored in a graph versus the code-repository alone, such as info about collaborators, infrastructure, metrics, logs, etc. pp.
bshzzle · 8h ago
You certainly could create an embedding of your code and then hooking it up to OpenWeb UI or equivalent as a chat interface - we've actually spoked to some teams that have rolled their own custom solution like that!

From a product POV: our main focus with Sourcebot is providing a world-class DX and UX so that it is really easy to use. Practically speaking, for DX: a sys-admin should be able to throw Sourcebot up into their cluster in minutes with minimal maintenance overhead. For UX: provide a snappy interface that is minimal and gets out of your way.

From a technology POV: vector embeddings (and techniques like graph-RAG) are definitely something we are going to investigate as a means of improving the agent's ability to find relevant context fast. Bringing in additional context sources (like git history, logs, GitHub issues, etc.) is also something we plan to investigate. It's a really fascinating problem :)

bravura · 6h ago
I was very excited for a strong off-the-shelf code vector embedding search tool.

I wanted to encourage you to explore that direction, since it's a) very powerful, b) annoying to hand-roll, and thus c) sorely needed as open source.

cobbzilla · 8h ago
Love this idea, docs are good I just need to read them better :)

Trying it out now. Keep it fully open source and nicely pluggable and I'll keep being a fan!

bshzzle · 8h ago
Ah I was just replying to your previous comment - I'm guessing you found this? ;) https://docs.sourcebot.dev/docs/connections/local-repos

Thanks for the support!

cobbzilla · 5h ago
Yes, thanks! I opened an issue on your support site. I got stuck on a file ownership error when trying to mount local repos. Excited to try it if I can get it to work :)
prepend · 6h ago
So can I use Functional Source licensed code in internal products if I’m a commercial org?
msukkarieh · 6h ago
hey I'm Michael (the other cofounder). If the products are purely internal[1] then you're able to use, modify, and distribute the code as you please (even if you're a commercial org). If you have any additional questions about the license feel free to reach out at license@sourcebot.dev

The Fair Source website is a great resource to learn more: https://fair.io/

[1] The only restriction on the code is that it cannot be used for a commercial product that substitutes for our software. We have a few teams that have connected Sourcebot into internal dev dashboards! This is 100% allowed by the license

dchuk · 7h ago
In reading the docs, it doesn't look like the MCP server supports the Ask Sourcebot capability. Is that correct or am I missing something in the docs? Is that planned to be added?
bshzzle · 7h ago
Yea they are currently separate - the MCP server exposes out the same tools that Ask Sourcebot uses, but the actual LLMs call is on the MCP client. It would be interesting to merge them though - maybe have a Exa style MCP tool that lets MCP clients ask questions similar to how we are doing it with Ask Sourcebot.

Would be great to hear more about your use case though.

cuzinluver · 1d ago
Love that it’s free to use
Alifatisk · 9h ago
I thought this had anything to do with Perplexity
bshzzle · 9h ago
We used Perplexity as a mental mapping since there is some overlap, e.g., LLMs using search and citing its sources, it's a webapp, etc.
drcongo · 8h ago
This looks pretty neat. Just spotted in the docs that it has an MCP server too, however, I haven't found anything in the docs about using a locally hosted model. Running this on a box in the corner of the office would be great, but external AI providers would be a deal breaker.
bshzzle · 8h ago
Running Sourcebot with a self-hosted LLM is something we plan to support and have documented in the golden path very soon, so stay tuned.

We are using the Vercel AI SDK which supports Ollama via a community provider, but doesn't V5 yet (which Sourcebot is on): https://v5.ai-sdk.dev/providers/community-providers/ollama

hahaxdxd123 · 7h ago
I got this set up and working in basically 5 minutes. Going to try to set it up at work. Super cool! It seems like the open source version already has a bunch of features, how do you plan on making sure you can sustainably support it?
bshzzle · 6h ago
awesome glad to hear! We are monetizing enterprise features like audit logging and SSO. The core product will remain free and under a FSL license.
cweagans · 2h ago
SSO is not an enterprise feature :( https://sso.tax

I'm using OIDC SSO (via Pocket ID) just for my own sanity. I don't want or need multiple sets of credentials for my home lab applications.

skybrian · 1h ago
Why not use a password manager instead?
cweagans · 1h ago
That is an orthogonal solution to SSO. I have many apps in my home lab. It doesn't make sense to have individual credentials for everything, even if it is effectively free to keep track of them. Rotating dozens of passwords (even spread out over time) is not my idea of a fun day, nor is supporting individual logins for friends/family who use the apps in my network.

SSO is the quick and easy way, especially when other people are involved.