Show HN: Nxtscape – an open-source agentic browser
-- Why bother building a new browser?
For the first time since Netscape was released in 1994, it feels like we can reimagine browsers from scratch for the age of AI agents. The web browser of tomorrow might not look like what we have today.
We saw how tools like Cursor gave developers a 10x productivity boost, yet the browser—where everyone else spends their entire workday—hasn't fundamentally changed.
And honestly, we feel like we're constantly fighting the browser we use every day. It's not one big thing, but a series of small, constant frustrations. I'll have 70+ tabs open from three different projects and completely lose my train of thought. And simple stuff like reordering Tide Pods from Amazon or filling out forms shouldn't need our full attention anymore. AI can handle all of this, and that's exactly what we're building.
Here’s a demo of our early version: https://dub.sh/nxtscape-demo
-- What makes us different
We know others are exploring this space (Perplexity, Dia), but we want to build something open-source and community-driven. We're not a search or ads company, so we can focus on being privacy-first: Ollama integration, BYOK (Bring Your Own Keys), an ad blocker.
Btw we love what Brave started and stood for, but they've now spread themselves too thin across crypto, search, etc. We are laser-focused on one thing: making browsers work for YOU with AI. And unlike Arc (which we loved too but got abandoned), we're 100% open source. Fork us if you don't like our direction.
-- Our journey hacking a new browser
To build this, we had to fork Chromium. Honestly, it feels like the only viable path today. We've seen others like Brave (which started with Electron) and Microsoft Edge learn this the hard way.
We also started by asking: why not just build an extension? But we realized we needed more control, similar to the reason Cursor forked VS Code. For example, Chrome has this thing called the Accessibility Tree, basically a cleaner, semantic version of the DOM that screen readers use. It's perfect for AI agents to understand pages, but you can't use it through extension APIs.
That said, working with the 15M-line C++ Chromium codebase has been an adventure. We've both worked on infra at Google and Meta, but Chromium is a different beast. Tools like Cursor's indexing completely break at this scale, so we've had to get really good with grep and vim. And the build times are brutal: even with our maxed-out M4 Max MacBook, a full build takes about 3 hours.
Full disclosure: we are still very early, but we have a working prototype on GitHub. It includes an early version of a "local Manus" style agent that can automate simple web tasks, plus an AI sidebar for questions, and other productivity features (grouping tabs, saving/resuming sessions, etc.).
Looking forward to any and all comments!
You can download the browser from our GitHub page: https://github.com/nxtscape/nxtscape
Is this a common and well-defined term that people use? I've never heard it.
It would appear to me from the context that it means something like "web browser with AI stuff tacked on".
By "agentic browser" we basically mean a browser with AI agents that can do web navigation tasks for you. So instead of you manually clicking around to reorder something on Amazon or fill out forms, the AI agent can actually navigate the site and do those tasks.
Does having access to Chromium internals give you any super powers over connecting over the Chrome Devtools Protocol?
A few ideas we were thinking of: integrating a small LLM, building an MCP store into the browser, building a more AI-friendly DOM, etc.
Even today, we use Chrome's accessibility tree (a better representation of the DOM for LLMs), which is not exposed via Chrome extension APIs.
You might consider the Accessibility Tree and its semantics. Plain divs are basically filtered out so you're left with interactive objects and some structural/layout cues.
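To make the "plain divs are filtered out" idea concrete, here is a minimal sketch in Python. It is not Chromium's implementation: the `Node` type, the role sets, and the `prune`/`render` helpers are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Which roles count as "interesting" is an illustrative assumption,
# not Chromium's actual filtering rules.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}
STRUCTURAL_ROLES = {"heading", "list", "listitem", "navigation", "main", "form"}


@dataclass
class Node:
    role: str                      # "generic" stands in for a plain div
    name: str = ""                 # accessible name, if any
    children: list["Node"] = field(default_factory=list)


def prune(node: Node) -> list[Node]:
    """Drop generic wrappers and hoist their useful descendants upward."""
    kept_children = [kept for child in node.children for kept in prune(child)]
    if node.role in INTERACTIVE_ROLES or node.role in STRUCTURAL_ROLES:
        return [Node(node.role, node.name, kept_children)]
    return kept_children


def render(nodes: list[Node], depth: int = 0) -> list[str]:
    """Flatten the pruned tree into indented lines that could go into a prompt."""
    lines = []
    for n in nodes:
        lines.append("  " * depth + f"{n.role}: {n.name}".rstrip())
        lines.extend(render(n.children, depth + 1))
    return lines


# A toy page: nested divs around a heading, a text field and a button.
page = Node("generic", children=[
    Node("generic", children=[Node("heading", "Checkout")]),
    Node("form", children=[
        Node("generic", children=[Node("textbox", "Email")]),
        Node("button", "Place order"),
    ]),
])
outline = render(prune(page))
```

The wrapper divs vanish from `outline`, while the heading, form, text field and button keep their nesting, which is roughly the shape an agent wants to reason over.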
The tl;dr is that it's AI that makes decisions on its own.
A complicated workflow may involve other tools. For example, the input to the LLM may produce something that tells it to set the user-agent to such-and-such a string:
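A minimal sketch of what that could look like, assuming a made-up JSON tool-call format and hypothetical tool names (`set_user_agent`, `click`); none of this is taken from Nxtscape's code:

```python
import json

# Hypothetical browser state and tools, invented for illustration.
browser_state = {"user_agent": "default", "clicked": []}


def set_user_agent(value: str) -> str:
    browser_state["user_agent"] = value
    return f"user-agent set to {value}"


def click(selector: str) -> str:
    browser_state["clicked"].append(selector)
    return f"clicked {selector}"


TOOLS = {"set_user_agent": set_user_agent, "click": click}


def dispatch(llm_output: str) -> str:
    """Parse one JSON tool call emitted by the model and run the matching tool."""
    call = json.loads(llm_output)
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return f"unknown tool: {call['tool']}"
    return tool(**call["args"])


# What the model might emit after reading the user's request.
llm_output = '{"tool": "set_user_agent", "args": {"value": "Mozilla/5.0 (compatible; ExampleAgent/1.0)"}}'
result = dispatch(llm_output)
```

The model only ever produces text; the surrounding program decides which of those texts are allowed to turn into actions.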
Other tools could be clicking on things in the page, or even injecting custom JavaScript when a page loads.
A chat interface works for ChatGPT, because most folks use it as a pseudo-search, but productivity tools are (broadly speaking) not generative, and therefore shouldn't be using freeform inputs. I have many thoughts on fixing this, and it's a very hard problem, but simply slapping an LLM on Chrome is just lazy. I don't mean to be overly negative, but it's kind of wild to see YC funding slop like this.
We have Linux next on our radar. What build do you want?
What is the tech behind the thing that segments out DOM elements automatically and shows the visual representation? I think something like this would be great for automated UI testing agents.
While reviewing the prompt's capabilities, I had an idea: implementing a Greasemonkey/Userscript-style system, where users could inject custom JavaScript or prompts based on URLs, could be a powerful way to enhance website interactions.
For instance, consider a banking website with a cumbersome data export process that requires extra steps to make the data usable. Imagine being able to add a custom button to their UI (or define a custom MCP function) specifically for that URL, which could automatically pull and format the data into a more convenient format for plain text accounting.
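A rough sketch of the URL-matching half of that idea, in Python. The `SiteRule` type, the example URLs, and the script and prompt bodies are all hypothetical; a real system would also need sandboxing and a permission model.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class SiteRule:
    url_pattern: str   # shell-style glob, e.g. "https://bank.example/*"
    kind: str          # "script" (JavaScript to inject) or "prompt" (extra agent instructions)
    body: str


# Hypothetical rules; the URLs and bodies are placeholders.
RULES = [
    SiteRule("https://bank.example/export*", "script", "addExportButton();"),
    SiteRule("https://bank.example/*", "prompt", "Format amounts for plain text accounting."),
    SiteRule("https://news.example/*", "prompt", "Summarise the article first."),
]


def rules_for(url: str) -> list[SiteRule]:
    """Return every rule whose glob matches the URL, in registration order."""
    return [rule for rule in RULES if fnmatchcase(url, rule.url_pattern)]


matched = rules_for("https://bank.example/export?year=2024")
```

Here `matched` holds both the export script and the site-wide prompt, so the browser could inject the button and brief the agent in the same page load.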
Good luck, but in your place I would at least start with something that a certain ICP needs more. Many, many man-hours have been wasted by ambitious technical founders on taking down Chrome (many also starting from a Chrome fork itself). But none of them succeeded. We only have limited energy.
Definitely agree there is a good amount of competition here.
But we do think there is a gap in the market for an open-source, community-driven and privacy-first AI browser. (Something like Brave?)
Sort of like a backwards Perplexity search. (LLM context is from open tabs rather than the tool that brings you to those tabs.)
I built a tab manager extension a long time ago that people used, but I ran into the same problem: the concept of tab management runs deeper than just the tabs themselves.
I added a few features which I felt would be useful: an easy way to organise and group tabs, and a simple way to save and resume sessions with selective context.
What are your problems that you would like to see solved?
This would of course apply to not just open tabs but tabs I used to have open, where the LLM knows about my browsing history.
But I think I would want a non-chat interface for this. (of course at any time I could chat/ask a question as well)
Resist the call to open in a tab every link in this article, overcome the fear of losing something if all these tabs lagging behind are closed right now without further consideration.
But I wonder if it matters if the agent is mostly being used for "human" use cases and not scraping?
If your browser behaves, it's not going to be excluded in robots.txt.
If your browser doesn't behave, you should at least respect robots.txt.
If your browser doesn't behave, and you continue to ignore robots.txt, that's just... shitty.
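For what it's worth, respecting robots.txt is cheap. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleAgent` token, and the `may_fetch` helper are assumptions for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and user-agent token.
ROBOTS_TXT = """\
User-agent: ExampleAgent
Disallow: /account/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, agent: str = "ExampleAgent") -> bool:
    """True if the parsed robots.txt rules allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


blocked = may_fetch("https://shop.example/account/orders")
allowed = may_fetch("https://shop.example/products")
```

An agent could run a check like this before each automated navigation and fall back to asking the user when the answer is no.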
To get the page content we parse the accessibility tree.
Also what's the business model?
> what's the reason for no Linux/Windows?
Sorry, just lack of time. Also, we use Sparkle for distributing updates, which is macOS-only.
> Also what's the business model?
We are considering an enterprise version of the browser for teams.
Appreciate the AGPLv3 licence, kudos on that.
I get the general sentiment. But Cursor for sure has improved productivity by a huge multiplicative factor, especially for simpler stuff (like building a Chrome extension).
Instead of manually hunting across half a dozen different elements, then copy/paste and retype to put something into a format I want…
I can just get Dia to do it. In fact, I can create a shortcut to get it to do it the same way every single time. It’s the first time I’ve used something that actually feels like an extension of the web, instead of a new way to simply act on it at the surface level.
I think the obvious extension of that is agentic browsers. I can’t wait for this to get built to a standard where I can use it every day… But how well is it going to run on my 16GB M1 Pro?
Download from https://www.nxtscape.ai/ or our GitHub page.
Google, being one of the biggest of those companies, would soon side with those companies and not with the users; that has been their modus operandi. Just recently, some people were threatened with a ban from YouTube if they didn't stop using ad blockers.
Oh cool, will look into basic.tech to understand more.
edit: Just read about the accessibility thing, but that's thin. Is there any use case in the future that a browser can handle, but an extension can't?
https://developer.chrome.com/docs/extensions/ai
Don't any of these fit the bill? Are they Gemini-locked and you want something else? I am not familiar with the Chrome API, so pardon my ignorance.
The only reason to use a browser over a Chrome extension is to bypass security features, for example, trusted events. If a user wants the browser window to go full screen or play a video, a physical mouse click or key press is required. Moreover, some websites do not want to be automated, like the ChatGPT web console and Chase.com, which check whether an event was a trusted event before accepting a button click or key press. This means that a Chrome extension cannot automate voice commands inferred via audio-to-text. However, getting a trusted event only requires the user to press a button, any button, so a message or dialog prompt that says, "Press to go full screen," is all that is required. This can also be done with a remote Bluetooth keyboard.
The way I see it, these limitations are in place for very, very good reasons and should not be bypassed. Moreover, there are much larger security issues with an agentic browser that sends the entire contents of a bank website, or health records in a hospital patient portal, to a third-party server. It is possible to run OpenAI's Whisper on WebGPU on a MacBook Pro M3, but most text generation models over 300M parameters will cause it to heat up enough to cook a steak. There are even bigger issues with potential prompt injection attacks from third-party websites that know agentic browsers are visiting their sites.
The first step in mitigating these security vulnerabilities is preventing the automation from doing anything a Chrome extension can't already do. The second is blacklisting, or opt-in-only access, so that agents cannot read and especially cannot write to (filling in a form is a write) any webpage without explicit permission. I've started to use VS Code's Copilot for command-line actions, and it handles permissions the same way, for example with session-only access.
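A minimal sketch of that opt-in model in Python, assuming per-origin grants and two hypothetical actions, "read" and "write". The `PermissionStore` class is made up for illustration and is not how any existing browser or extension implements it:

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class PermissionStore:
    """Opt-in, per-origin grants; everything is denied until the user says yes."""
    session: dict[str, set[str]] = field(default_factory=dict)     # cleared when the session ends
    persistent: dict[str, set[str]] = field(default_factory=dict)  # would be saved to disk

    @staticmethod
    def origin(url: str) -> str:
        parts = urlsplit(url)
        return f"{parts.scheme}://{parts.netloc}"

    def grant(self, url: str, action: str, session_only: bool = True) -> None:
        store = self.session if session_only else self.persistent
        store.setdefault(self.origin(url), set()).add(action)

    def allowed(self, url: str, action: str) -> bool:
        o = self.origin(url)
        return action in self.session.get(o, set()) or action in self.persistent.get(o, set())

    def end_session(self) -> None:
        self.session.clear()


perms = PermissionStore()
before = perms.allowed("https://bank.example/login", "read")        # nothing granted yet
perms.grant("https://bank.example/login", "read")                   # session-only by default
can_read = perms.allowed("https://bank.example/accounts", "read")   # same origin
can_write = perms.allowed("https://bank.example/accounts", "write") # never granted
perms.end_session()
after = perms.allowed("https://bank.example/accounts", "read")
```

Keeping read and write as separate grants means an agent that may summarise a page still has to ask before filling in a form on it.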
I've already solved a lot of the problems associated with using a Chrome extension for agentic browser automation. I really would like to be having this conversation with people.
EDIT: I forgot the most important part. There are 3,500,000,000 Chrome users on Earth. Getting them to install a Chrome extension is much, much easier than getting them to install a new browser.
- Ship a small LLM along with the browser
- MCP store built in
Feel free to add new ones or upvote. Want to build what people want :)
Thank you! We have Ollama integration already; you can run models locally and use that for AI chat.
- https://tsdr.uspto.gov/#caseNumber=76017078&caseSearchType=U...
> PROVIDING MULTIPLE-USER ACCESS TO A GLOBAL COMPUTER INFORMATION NETWORK FOR THE TRANSFER AND DISSEMINATION OF A WIDE RANGE OF INFORMATION; ELECTRONIC TRANSMISSION OF DATA, IMAGES, AND DOCUMENTS VIA COMPUTER NETWORKS; [ELECTRONIC MAIL SERVICES; PROVIDING ON-LINE CHAT ROOMS FOR TRANSMISSION OF MESSAGES AMONG COMPUTER USERS CONCERNING A WIDE VARIETY OF FIELDS]
- https://tsdr.uspto.gov/#caseNumber=76017079&caseSearchType=U...
> PROVIDING INFORMATION IN THE FIELD OF COMPUTERS VIA A GLOBAL COMPUTER NETWORK; PROVIDING A WIDE RANGE OF GENERAL INTEREST INFORMATION VIA COMPUTER NETWORKS
- https://tsdr.uspto.gov/#caseNumber=74574057&caseSearchType=U...
> computer software for use in the transfer of information and the conduct of commercial transactions across local, national and world-wide information networks
Also the fact that it's AGPL means this project is very copyleft and not compatible with business models.
I'm not saying that there is no place for copyleft open source anymore, but when it's in a clearly commercial project that makes me question the utility of it being open source.
https://www.gnu.org/licenses/why-affero-gpl.html
This means that if this company is successful and sells me 1 license, in theory I can request the source code and spin up (in Dr. Evil's voice) 1 billion clones and not pay licenses for those.
With other forms of GPL you only have to release the source code if you release the software to the user.
Saying that such behavior encompasses all possible business models is like saying dictatorship is the only form of governance.
It was cute when the internet was cute but now it's just boring.
But not gonna lie, as a tiny startup, we don’t have the marketing budget of Perplexity or Dia, so we picked a name and icon that at least hinted at “browser” right away. Definitely not trying to mislead anyone -- just needed something recognizable out of the gate.