Show HN: A web browser agent in your Chrome side panel

45 points · parsabg · 17 comments · 5/18/2025, 11:48:42 AM · github.com
Hey HN,

I'm excited to share BrowserBee, a privacy-first AI assistant in your browser that allows you to run and automate tasks using your LLM of choice (currently supports Anthropic, OpenAI, Gemini, and Ollama). Short demo here: https://github.com/user-attachments/assets/209c7042-6d54-4fc...

BrowserBee is inspired by projects like Browser Use and Playwright MCP; its main advantage is the browser extension form factor, which makes it more convenient for day-to-day use, especially for less technical users. It's also a bit less cumbersome to use on websites that require you to be logged in, as it attaches to the same browser instance you use. (On privacy: the only data that leaves your browser is the communication with the LLM; there is no tracking or data collection of any sort.)

Some of its core features are as follows:

- a memory feature that lets users save common and useful pathways, making later repetitions of those tasks faster and cheaper

- real-time token counting and cost tracking (inspired by Cline)

- an approval flow for critical tasks such as posting content or making payments (also inspired by Cline)

- tab management allowing the agent to execute tasks across multiple tabs

- a range of browser tools for navigation, tab management, interactions, etc., which are broadly in line with Playwright MCP
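The approval flow for critical tasks could be sketched as a gate that sits between the model's tool call and its execution. This is a minimal illustration under stated assumptions, not BrowserBee's implementation: the tool names in `CRITICAL_TOOLS`, the `runWithApproval` function, and the `confirm` callback (standing in for whatever UI the side panel shows) are all hypothetical.

```typescript
// Hypothetical shape of a tool call emitted by the model.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolResult = { ok: true; output: string } | { ok: false; reason: string };

// Hypothetical set of tools treated as critical (posting, paying, submitting).
const CRITICAL_TOOLS = new Set(["browser_submit_form", "browser_post_content"]);

// Runs a tool call, but asks the user first when the tool is critical.
// `execute` performs the action; `confirm` asks the user and resolves to their answer.
async function runWithApproval(
  call: ToolCall,
  execute: (call: ToolCall) => Promise<string>,
  confirm: (call: ToolCall) => Promise<boolean>,
): Promise<ToolResult> {
  if (CRITICAL_TOOLS.has(call.name)) {
    const approved = await confirm(call);
    if (!approved) {
      // Report the refusal back to the model instead of silently dropping the call.
      return { ok: false, reason: `User declined ${call.name}` };
    }
  }
  return { ok: true, output: await execute(call) };
}
```

Returning the refusal as a normal result lets the agent explain or re-plan rather than retry the same action blindly.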

I'm actively developing BrowserBee and would love to hear any thoughts, comments, or feedback.

Feel free to reach out via email: parsa.ghaffari [at] gmail [dot] com

-Parsa

Comments (17)

nico · 1h ago
Looks amazing, love it. I see that the top item on your roadmap is saving/replaying sessions.

Related to that, I'd suggest also adding the ability to "templify" sessions, i.e., turn sessions into something like email templates, with placeholder tags that either ask the user for input or can be fed input from somewhere else (like a mail merge).

For example, if I need to get certain data from 10 different websites, the macro/session could either ask me for a new website 10 times (or until I stop it), or let me just feed it a list.

Anyway, great work! Oh, also: if you want to be truly privacy-first, you could add support for local LLMs via Ollama.
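The templating idea could be sketched as follows: a saved prompt with placeholders, a helper that lists the placeholders so the extension knows what to ask the user for, and an expansion step that produces one filled prompt per input row. The `{{name}}` syntax and the function names `placeholders`, `render`, and `expand` are assumptions for illustration, not an existing BrowserBee feature.

```typescript
// Matches placeholders such as {{site}} or {{ field }} in a saved prompt.
const PLACEHOLDER = /\{\{\s*(\w+)\s*\}\}/g;

// Lists distinct placeholder names in order of first appearance,
// so the UI can prompt the user for each one.
function placeholders(template: string): string[] {
  const names = new Set<string>();
  const re = new RegExp(PLACEHOLDER.source, "g"); // fresh regex: no shared lastIndex
  let m: RegExpExecArray | null;
  while ((m = re.exec(template)) !== null) {
    names.add(m[1]);
  }
  return Array.from(names);
}

// Fills every placeholder; throws on a missing value rather than
// sending a half-filled prompt to the LLM.
function render(template: string, values: Record<string, string>): string {
  return template.replace(PLACEHOLDER, (_whole: string, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`Missing value for placeholder "${name}"`);
    }
    return value;
  });
}

// Mail-merge style: one filled prompt per row of input (e.g. a list of websites).
function expand(template: string, rows: Record<string, string>[]): string[] {
  return rows.map((row) => render(template, row));
}
```

Feeding `expand` a list covers the "just feed it a list" case; calling `placeholders` and asking the user once per run covers the interactive case.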

parsabg · 12m ago
Thank you!

I like that suggestion. Saved prompts seem like an obvious addition, and having templating within them makes sense. I wonder how well "for each of the following websites, do X" prompts would work (i.e., having the LLM do the enumeration rather than the client). My intuition is that it won't be as robust because of the long accumulated context.
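The trade-off described here can be made concrete. If the client does the enumeration, each item starts from a fresh, short conversation; if the LLM enumerates, everything observed on earlier sites stays in the context. A minimal sketch of the client-side version, where `RunAgent` is a hypothetical stand-in for one complete agent run and the message shape is simplified:

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical signature for a single agent run: takes the starting
// conversation and resolves to the full conversation once the task is done.
type RunAgent = (messages: Message[]) => Promise<Message[]>;

// Client-side enumeration: every item gets its own conversation, so
// observations from site 1 never inflate the context used for site 10.
async function forEachItem(
  systemPrompt: string,
  task: (item: string) => string,
  items: string[],
  runAgent: RunAgent,
): Promise<Message[][]> {
  const transcripts: Message[][] = [];
  for (const item of items) {
    // Items run sequentially; each starts with only the system prompt and its task.
    const fresh: Message[] = [
      { role: "system", content: systemPrompt },
      { role: "user", content: task(item) },
    ];
    transcripts.push(await runAgent(fresh));
  }
  return transcripts;
}
```

The cost is repeating the system prompt per item; the benefit is that a long or failed run on one site cannot crowd out or confuse the next.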

dbdoskey · 54m ago
Looks amazing. Would love something like this in Firefox or Zen. Mozilla released Orbit, but it never ended up being really useful.

parsabg · 7m ago
Thank you! :)

Would love to explore a FF port. Right now, there are a couple of tight Chrome dependencies:

- CDP: mostly abstracted away by Playwright, so perhaps not a big lift

- IndexedDB for storing memories and potentially other user data: not sure if there's a FF equivalent

tux3 · 42m ago
Firefox already has something similar natively, but it's not enabled by default. If you turn on the new sidebar, there's an AI panel, which basically looks like an iframe of the Claude/OpenAI/Gemini/etc. chat interface. Different from Orbit.

barbazoo · 43m ago
> Since BrowserBee runs entirely within your browser (with the exception of the LLM), it can safely interact with logged-in websites, like your social media accounts or email, without compromising security or requiring backend infrastructure.

Does it send the content of the website to the LLM?

parsabg · 18m ago
Yes, the LLM can invoke observation tools (e.g., read the text/DOM or take a screenshot) to retrieve the context it needs to take the next action.

dataviz1000 · 53m ago
You might be able to reduce the amount of information sent to the LLM 100-fold if you use a stacking context. Here is an example of one on GitHub (not mine) [0]. Moreover, you would be able to parse the DOM yourself, or have strategies that parse it. For example, if you are only concerned with video, find all the videos and send only that information. You could also parse a page once to find its structure and cache that, so that next time only the required data is used. (I see you are storing tool sequences, but I didn't find an example of storing a DOM structure so that requests to subsequent pages are optimized.)
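The "send only what matters, and cache the structure" idea could be sketched like this. It is a hypothetical illustration on a simplified `PageNode` tree rather than the real DOM; it is not the stacking-context approach from [0] and not BrowserBee code. It keeps only the element types a task needs and records their positions so a later page with the same layout can skip the full walk.

```typescript
// Hypothetical, simplified stand-in for a DOM element. In a content script
// it could be built by walking `document.body`; here it is plain data.
interface PageNode {
  tag: string;
  attrs: Record<string, string>;
  children: PageNode[];
}

// What would be sent to the LLM per relevant element, plus its path of
// child indices from the root, which can be cached per page layout.
interface Extracted {
  tag: string;
  attrs: Record<string, string>;
  path: number[];
}

// Depth-first walk keeping only elements whose tag is in `wanted`
// (e.g. just "video") and discarding everything else.
function extract(root: PageNode, wanted: Set<string>): Extracted[] {
  const out: Extracted[] = [];
  const walk = (node: PageNode, path: number[]): void => {
    if (wanted.has(node.tag)) {
      out.push({ tag: node.tag, attrs: node.attrs, path });
    }
    node.children.forEach((child, i) => walk(child, [...path, i]));
  };
  walk(root, []);
  return out;
}

// On a later page with the same layout, jump straight to the cached paths.
// Entries whose tag no longer matches are dropped, a cheap signal that the
// cache is stale and a full `extract` is needed again.
function followCached(root: PageNode, cached: Extracted[]): PageNode[] {
  const found: PageNode[] = [];
  for (const entry of cached) {
    let node: PageNode | undefined = root;
    for (const i of entry.path) {
      node = node?.children[i];
    }
    if (node && node.tag === entry.tag) {
      found.push(node);
    }
  }
  return found;
}
```

Position-based caching is brittle when layouts shift between pages, which is why the tag check is there; a real version would likely need a more robust key than child indices.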

If someone visits a website I control while using your Chrome extension, I will 100% be able to find a way to drain all their accounts, probably in the background without them even knowing. Here are some ideas about how to mitigate that.

The problem with Playwright is that it requires the Chrome DevTools Protocol (CDP), which opens massive security problems for a browser that people use for banking and for managing anything that involves credit cards or sensitive accounts. At one point, I took the injected folder out of Playwright and injected it into a Chrome extension because I thought I needed its tools; however, I quickly abandoned it, as it was easy to create workflows from scratch. You get a lot immediately by using Playwright, but you will likely find it much lighter and safer to implement that functionality yourself.

The only benefit of CDP for normal use is that it lets the Chrome extension automate actions that require trusted events, e.g. playing sound, going fullscreen, or banking websites that require a trusted event to transfer money. In my opinion, people just want a large part of the workflow automated and don't mind being prompted to click a button when a trusted event is required. Since it doesn't matter which button is clicked, you can inject a big button that says "Continue" (or whatever is required) after prompting the user. Trusted events are there for a reason.

[0] https://github.com/andreadev-it/stacking-contexts-inspector
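The "prompt the user to click" idea could be sketched as a click handler that only resumes the paused workflow on a genuine user gesture. In the DOM, `Event.isTrusted` is true for events generated by the user agent and false for events a script creates with `dispatchEvent()` or triggers with `element.click()`. `makeContinueHandler` and the `ClickLike` type are hypothetical names used for illustration; the DOM wiring is shown only as a comment.

```typescript
// Minimal view of a click event; a real DOM MouseEvent satisfies this.
interface ClickLike {
  isTrusted: boolean;
}

// Returns a click listener that resumes the paused workflow only on a real
// user click. Script-generated clicks (isTrusted === false), e.g. from the
// visited page's own scripts, are ignored.
function makeContinueHandler(resume: () => void): (event: ClickLike) => void {
  let resumed = false;
  return (event) => {
    if (!event.isTrusted || resumed) return;
    resumed = true; // resume at most once per prompt
    resume();
  };
}

// In a content script this might be wired up roughly as:
//   const button = document.createElement("button");
//   button.textContent = "Continue";
//   button.addEventListener("click", makeContinueHandler(() => { /* resume */ }));
//   document.body.append(button);
```

Keeping the check in a plain function makes it easy to test without a browser, and the at-most-once guard stops a double click from resuming two steps.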

A4ET8a8uTh0_v2 · 29m ago
Interesting. I can't play with it now since I'm out on a grocery run, but can it interact with elements on the page if asked directly?

parsabg · 10m ago
Yes, you can ask it to both observe (e.g., query an element) and interact with (e.g., click on) elements, for example using selectors or a high-level reference like the label or the color of a button.
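One way to resolve a high-level reference such as "the Submit button" is to match against roles and accessible names instead of CSS selectors. A minimal sketch over a hypothetical flattened list of (role, name) nodes; `AxNode` and `resolveByLabel` are illustrative names, and matching by color is not covered.

```typescript
// Hypothetical flattened accessibility node: an id, a role, and an accessible name.
interface AxNode {
  id: string;
  role: string;
  name: string;
}

// Finds the element a phrase like "the Submit button" refers to.
// An exact (case-insensitive) name match wins; otherwise a unique substring
// match is accepted. Ambiguous or missing matches return undefined so the
// agent can observe more or ask the user instead of clicking the wrong thing.
function resolveByLabel(nodes: AxNode[], role: string, label: string): AxNode | undefined {
  const wanted = label.trim().toLowerCase();
  const sameRole = nodes.filter((n) => n.role === role);
  const exact = sameRole.find((n) => n.name.trim().toLowerCase() === wanted);
  if (exact) return exact;
  const partial = sameRole.filter((n) => n.name.toLowerCase().includes(wanted));
  return partial.length === 1 ? partial[0] : undefined;
}
```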
rizs12 · 37m ago
Aren't browsers starting to ship with built-in LLMs? I don't know much about this, but if so, surely your extension won't need to send queries to LLM APIs?

m0rde · 2h ago
This looks fun, thanks for sharing. Will definitely give it a shot soon.

I read over the repo docs and was amazed at how clean and thorough it all looks. Can you share your development story for this project? How long did it take you to get here? How much did you lean on AI agents to write this?

Also, any plans for monetization? Are you taking donations? :)

parsabg · 7s ago
Thanks a lot! :)

I might write a short post on the development process, but in short:

- started development during Easter, so roughly a month so far

- developed mostly using Cline and Claude 3.7

- inspired by and borrowed heavily from Cline, Playwright MCP, and Playwright CRX, which had already done a lot of the heavy lifting; in a sense, this project is those three glued together

I don't plan to monetize it directly, but I've thought about an opt-in model for contributing useful memories to a central repository that other users might benefit from. My main aim with it is to promote open source AI tools.

donclark · 1h ago
Looks great. Any plans for this to work in Firefox?
parsabg · 5m ago
I'll be exploring a FF port. There are a couple of tight Chrome dependencies that need to be rethought (IndexedDB for storage and CDP for most actions).

dmos62 · 1h ago
I presume this works by processing the HTML and feeding it to the LLM. What approaches did you take for doing this? Or am I wrong?

fermuch · 44m ago
Under the "tools" section of the README, it lists the following observation tools:

- browser_snapshot_dom

- browser_query

- browser_accessible_tree

- browser_read_text

- browser_screenshot

So most likely the LLM can choose how to "see" the page?
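That reading matches parsabg's earlier answer: the model picks an observation tool by name, and the tool's output is added to the conversation, which is how page content reaches the LLM. A minimal sketch of that dispatch; the tool names come from the README list quoted above, but the `Page` interface, the `OBSERVERS` table, and the handler bodies are hypothetical (the real tools go through Playwright/CDP, which is not modelled here).

```typescript
interface ToolCall {
  name: string;
  args: Record<string, string>;
}

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical page access layer standing in for the real browser connection.
interface Page {
  html(): string;
  text(): string;
  query(selector: string): string[];
}

type Observer = (page: Page, args: Record<string, string>) => string;

// A few of the observation tools named in the README, with made-up bodies.
const OBSERVERS: Record<string, Observer> = {
  browser_snapshot_dom: (page) => page.html(),
  browser_read_text: (page) => page.text(),
  browser_query: (page, args) => page.query(args.selector ?? "").join("\n"),
};

// Runs whichever observation tool the model asked for and appends the result
// as a tool message; that history is what is sent to the LLM on the next turn.
function observe(page: Page, call: ToolCall, history: Message[]): Message[] {
  const tool = OBSERVERS[call.name];
  const content = tool ? tool(page, call.args) : `Unknown tool: ${call.name}`;
  return [...history, { role: "tool", content }];
}
```

Letting the model choose is what makes cheap observations (reading text, querying one selector) possible before falling back to a full DOM snapshot or screenshot.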