Show HN: Blast – Fast, multi-threaded serving engine for web browsing AI agents

145 calebhwin 66 5/2/2025, 5:42:28 PM github.com ↗

Hi HN!

BLAST is a high-performance serving engine for browser-augmented LLMs, designed to make deploying web-browsing AI easy, fast, and cost-manageable.

The goal with BLAST is to ultimately achieve google search level latencies for tasks that currently require a lot of typing and clicking around inside a browser. We're starting off with automatic parallelism, prefix caching, budgeting (memory and LLM cost), and an OpenAI-Compatible API but have a ton of ideas in the pipe!

Website & Docs: https://blastproject.org/ https://docs.blastproject.org/

MIT-Licensed Open-Source: https://github.com/stanford-mast/blast

Hope some folks here find this useful! Please let me know what you think in the comments or ping me on Discord.

— Caleb (PhD student @ Stanford CS)

Comments (66)

diggan · 138d ago

What measures are you using to make sure you're not bombarding websites with a ton of requests, since you seem to automatically "scale up" the concurrency to create even more requests/second? Does it read any of the rate-limit headers from the responses or do something else to back-off in case what it's trying to visit suddenly becomes offline or starts having slower response times?

Slightly broader question: Do you feel like there is any ethical considerations one should think about before using something like this?

calebhwin · 138d ago

The main sort of parallelism we exploit is across distinct websites. For example "find me the cheapest rental" spawning tasks to look at many different websites. There is another level of parallelism that could be exploited within a web site/app. And yes we would have to make our planner rate limit aware for that.

Absolutely agree there are ethical considerations with web browsing AI in general. (And the whole general ongoing shift from using websites to using chatgpt/perplexity)

rollcat · 138d ago

> There is another level of parallelism that could be exploited within a web site/app. And yes we would have to make our planner rate limit aware for that.

People are already deploying tools like Anubis[1] or go-away[2] to cope with the insane load that bots put on their server infrastructure. This is an arms race. In the end, the users lose.

[1]: https://anubis.techaro.lol

[2]: https://git.gammaspectra.live/git/go-away

calebhwin · 138d ago

IMO it depends on how this tech is deployed. One way I see this being extremely useful is for developers to quickly build AI automation for their own sites.

E.g. if I'm the developer of a workforce management app (e.g. https://WhenIWork.com) I could deploy BLAST to quickly provide automation for users of my app.

rollcat · 138d ago

That's my point. You can use a knife to slice bread or to stab your neighbor. We're seeing an unprecedented amount of stabbings. People are getting away with murder, there's no accountability. Refining the stilettos doesn't help the problem.

spiderfarmer · 137d ago

I just block every AI bot that doesn’t give me traffic or other benefits. I know I can’t block them all, but it won’t be for a lack of trying.

lostmsu · 138d ago

> In the end, the users lose.

I think it would take more substantiation to claim this. Maybe 10 out of 1000 websites will get closed, but the users will be able to use AI tools to use the remaining 990. Not sure about you, but sounds like a win for users to me.

taskforcegemini · 137d ago

>Maybe 10 out of 1000 websites will get closed

this may be true for the time being (or not), but will sure change if/when more [websites] become aware of what is going on. The result will be 10 out of 1000 websites will remain open, and not the ones you actually want. The more pressure there is on the sites/servers, the more these will have to act to stay online to begin with.

lostmsu · 136d ago

I find it extremely unlikely that 99% websites will shutdown just because they discover they are being used via AI. I don't think even 1% would shut down without replacement.

rollcat · 137d ago

Yes, and these are the websites that aren't behind CloudFlare or some other CDN. The holdouts of the open, independent Internet. Sure, let them burn.

lostmsu · 136d ago

Do you have any stats?

A few vocal people on HN barely scratch the surface.

diggan · 138d ago

> Absolutely agree there are ethical considerations with web browsing AI in general.

I'm personally not sure there are, but I'm curious to hear what those are for you :)

calebhwin · 138d ago

Maybe more of a legal than ethical consideration but web browsing AI makes scraping trivial. You could use that for surveillance, profiling (get a full picture of a user's whole online life before they even hit Sign Up), cutting egress cost in certain cases. Right now CAPTCHA is actually holding up pretty well against web browsing AI for sites that really want to protect their IP but it will be interesting to see if that devolves into yet another instance of an AI vs AI "arms race".

No comments yet

xena · 138d ago

How do I block your service? Do you read robots.txt and have an identifiable user agent?

diggan · 138d ago

Seems Blast uses browser-use (https://github.com/browser-use/browser-use) which seems to be some client specifically for AIs to connect to/run browser runtimes.

Unfortunately, it seems like browser-use tries to hide that it's controlled by an AI, and uses a typical browser user-agent: https://github.com/browser-use/browser-use/blob/d8c4d03d9ea9...

I'm guessing because of the amount of flags, you could probably come up with a unique fingerprint for browser-use, based on available features, screen/canvas size and so on, that could be reused for blocking everyone using Blast/browser-use.

If calebhwin wanted to make Blast easier to identify, they could set a custom user-agent for browser-use that makes it clear it's Blast doing the browsing for the user.

ATechGuy · 138d ago

Can browser-use be blocked using Anubis or other anti-bot measures?

razemio · 137d ago

I think these kind of requests can not be blocked. It is like asking if Claude computer control can be blocked from visiting websites. It is not detectable. You could ofc display captchas which are difficult to solve for an AI.

diggan · 137d ago

As mentioned, browser-use doesn't seem "out of this world" hard to fingerprint since all instances would be using the exact same settings.

ZeroTalent · 137d ago

Browser-use just switched to patchright https://github.com/Kaliiiiiiiiii-Vinyzu/patchright which is an “undetectable” version of Playwright and can rotate/randomize fingerprints. Another extension uses residential proxies and pulls real fingerprints of real people.

ATechGuy · 137d ago

Sorry if I'm missing something, but why can't Anubis not detect/block such AI agents? Is it because they use headful browsers?

ZeroTalent · 136d ago

Yes, they are doing the Proof of Work. If need be, we will completely emulate human-like mouse movements. These are essentially clients no different from real users.

But the thing is, the more they push for this, the less accessible websites will get, so there has to be another solution. Not everyone uses a mouse.

Source: I work in the industry.

pal9000i · 138d ago

The whole point of AI browser automation is mimicking human behaviour, fighting the anti-bot detection systems. If the point is interacting with systems, we'd be using APIs

subscribed · 137d ago

How can I find for myself an affordable accommodation in Iceland across several websites aggregating cheap accommodation but without signing API access contract with all of them and without building a middleware to abstract them?

I don't have API for _that_.

jlpom · 137d ago

No, it's to automate tasks that can't be done using an API, like RPA.

nikisweeting · 135d ago

you're saying essentiall the same thing as the parent comment, both are true

anti-bot fingerprinting exists to prevent you from using the human-facing site like an API

croemer · 137d ago

...when there is an API. Often there isn't.

calebhwin · 138d ago

Good point, we should probably integrate that. Feel free to submit a PR!

BLAST can also be used to add automation to your own site/app FWIW.

dejobaan · 138d ago

> Feel free to submit a PR!

I think it's cool that you're experimenting in this area, but I'm not a huge fan of this as answer to a question about responsible/respectful web crawling. This stuff seems like it should be table stakes (even if you wanted to make it optional for the end user), but "yeah probably; learn the codebase, fork it, make changes, then we'll review it" really puts the onus onto the original poster.

calebhwin · 137d ago

Ah you're right, my bad. Hope I didn't sound dismissive because I think some sort of robots.txt needs to exist for AI that's scraping the web both at train or test time.

I'm really not excited at all about the "scrape other people's data" use case for BLAST and if we can prevent it then awesome. I'm excited about BLAST automating science, legacy web apps, internal tools, adding AI automation to your own app, etc.

lostmsu · 138d ago

Curious: if a user has an ad blocker, are they browsing responsibly?

subscribed · 137d ago

Very. Malvertising is a thing. Adtech surveillance is a thing.

Ad blocker is the least user can do.

lostmsu · 137d ago

That's not why most users use ad blockers.

xena · 138d ago

I will for my typical hourly rate plus the consulting bonus.

BrenBarn · 134d ago

Feel free to take down your project until you comply with basic ethical standards.

adrmtu · 138d ago

Cool project! How does the prefix cache work exactly? What’s your invalidation strategy when the page’s structure drifts (and how often do you refresh)? And how do you match an incoming question or task to the correct cached prefix? What criteria or fingerprinting logic do you use to ensure high hit rates without false positives

calebhwin · 137d ago

Thank you! It's currently based on task lineage, exact match of task descriptions, and an optional user-provided cache_control argument that can control whether results or plans are cached.

One use-case for this is conversations: So for example if I invoke /chat/completions with [{"role": "user", "content": "Go to google.com"}] and later with [{"role": "user", "content": "Go to google.com"}, {"role": "user", "content": "Search for gorilla vs 100 human"}] then we cache the browser state from the first invocation so it can be quickly restored (or reuse the browser if not evicted).

Caching will get much more sophisticated in a future version, it's the piece we're most actively working on.

smcleod · 137d ago

It's not immediately clear to me if this is a tool for ages (e.g. a MCP server), a browser engine (e.g. browserless) or some sort of OpenAI compatible LLM proxy that injects a web browser tool? It appears to expose itself via an openAI compatible API which makes me think the latter?

calebhwin · 137d ago

I would really think about it as a serving engine like vllm but for browsers+LLMs. It handles caching, parallelism, scheduling, budget constraints for LLM cost and browser memory usage. Yes it currently has an OpenAI-compatible API but we will also implement MCP. (though we're working on something that will be way better than "MCP for web browsers")

pal9000i · 138d ago

Great work! I just tried it and Google immediately captcha'd me on the first attempt. Is it using playwright or patchright? patchright using chrome and not chromium is more robust

pal9000i · 138d ago

Also any plans to add remote browser control feature? For Human in the loop tasks, for example advanced captcha bypassing and troubleshooting tasks that are stuck

calebhwin · 137d ago

Yes, human-in-the-loop is definitely on the roadmap. It's orthogonal to the central goal of low latency but necessary for completeness. Either via VNC or something simpler we have in mind.

triyambakam · 137d ago

Someone above said it's using browser-use which uses patchright

mtrovo · 138d ago

I don't work close to LLM APIs so not sure what exactly is the use case here? Is it something that could be adapted to work as a deep research feature on a custom product?

calebhwin · 137d ago

The use cases are (1) integrating AI automation into my app (2) automating workflows inside web browser (3) personal use. The value is in optimizing for low latency under user-defined constraints such as LLM cost budget or maximum browser memory usage.

joshstrange · 138d ago

This looks really cool but wouldn't this be better as an MCP server? It feels like it's mixing too many concepts and can't be plugged into another system. What if I want to extend my agent to use this but I already have MCP servers tied in or I'm going through another OpenAI proxy-type thing? I wouldn't want to stack proxies.

calebhwin · 138d ago

Great point, we are working on an MCP server implementation which should address this. The main benefit of having a serving engine here is to abstract away browser-LLM specific optimizations like parallelism, caching, browser memory management, etc. It's closer to vllm but I agree an MCP server implementation will make integration easier.

Though ultimately I think the web needs something better than MCP and we're actively working on that as well.

barbazoo · 138d ago

Looking forward to hearing more about that MCP successor you’re working on.

TheTaytay · 138d ago

Cool!

I read through the docs and want to try this. I couldn’t figure out what you were using g under the covers for the actual webpage “use” I did see: “ What we’re not focusing on is building a better Browser-Use, Notte, Steel, or other vision LLM. Our focus is serving these systems in a way that is optimized under constraints”

Cool! That makes sense!but I was still curious what your default AI-driven browser use library was.

If I were to use your library right now on my MacBook, is it using “browser-use” under the covers by default? (I should poke around the source more. I just thought it might be helpful to ask here in case I misunderstand or in case others had the same question)

calebhwin · 138d ago

Yes! And browser-use is great though I'm hoping at some point we can swap it out for something leaner, maybe one day it'll just be a vision language model. All we'll have to do within BLAST is implement a new Executor and all the scheduling/planning/resource management stays the same.

anxman · 138d ago

I was a little unclear at first, after looking at the source code, it looks like Blast uses Browser Use which uses your local browser (in dev) under the hood

otabdeveloper4 · 137d ago

I read all the documentation and I still have no idea what this does.

(I make AI agents as my day job, among many other things.)

calebhwin · 137d ago

Do you build agents that interface with web browsers? BLAST is sort of like vllm for browser+LLM. The motivation for this is that browser+LLM is slow and we can do a lot of optimization with an engine that manages browser+LLM together - e.g. prefix caching, auto-parallelism, data parallelism, request hedging, scheduling policy, and more coming soon.

Now the API is what may be throwing folks off. Right now it's an OpenAI-compatible API. We will implement MCP. But really the core thing is abstracting away optimizations required to efficiently run browser+LLM.

otabdeveloper4 · 137d ago

I have no idea what you mean by "browser+LLM". Image models to process pictures of a webpage? A wrapper around Python's "requests"?

calebhwin · 137d ago

A system that does the following given a task_description:

while LLM("is <task_description> not done?"): Browser.run(LLM("what should the browser do next given <Browser.get_state()>"))

This simple loop turns out to be very powerful, achieving the highest performance on some of OpenAI's latest benchmarks. But it's also heavily unoptimized compared to a system that is just LLM("<task_description>") for which we already have things like vllm. BLAST is a first step towards optimizing this while loop.

badmonster · 138d ago

How does BLAST handle browser instance isolation and resource contention under high concurrency?

ivape · 138d ago

resource contention under high concurrency

A queue? What else can you really do. Your server is at the mercy of OpenAI, so all you can do is queue up everyone's requests. I don't know how many parallel requests you can send out to OpenAI (infinite?), so that bottleneck is probably just dependent on your server stack (how many threads).

There's a lot of language being thrown out here, and I'm trying to see if we're using too much language to discuss basic concepts.

calebhwin · 138d ago

There's definitely opportunities to parallelize. BLAST exploits these with an LLM-planner and tool calls to dynamically spawn/join subtasks (there's also data parallelism and request hedging which further reduce latency).

Now you are right that at some point you'll get throttled either by LLM rate limits or a set budget for browser memory usage or LLM cost. BLAST's scheduler is aware of these constraints and uses them to effectively map tasks to resources (resource=browser+LLM).

grahamgooch · 138d ago

Interesting. Could I use this to automate testing of massive web applications (100s of screens). And potentially load test?

diggan · 138d ago

> And potentially load test?

You wanna load test the local DOM rendering or what? Otherwise, whatever endpoint is serving the HTML, you configure your load tests to hit that, if anything. Although you'd just be doing the same testing your HTTP server probably already doing before doing releases, usually you wanna load test your underlying APIs or similar instead.

calebhwin · 137d ago

Yes. And you can give BLAST an LLM cost budget or max browser memory usage and BLAST takes care of scheduling.

gitroom · 137d ago

Looks sick tbh, way more power than I'd ever need for my own stuff - you think stuff like this ever just outpaces all the anti-bot blockers or nah?

nikisweeting · 135d ago

absolutely, because of the bearproof trashcan problem (smartest bears are smarter than the dumbest users, make your trashcans too bear-proof and the humans cant use them either). the AIs will be able to solve more CAPTCHAs than some humans can pretty soon.

the next step is something like worldcoin/ID or biometric-gated access, or an extreme increase in the number of paywalls (not that I like either of those options). I don't see any other way forward on the decade+ timescale.

lgiordano_notte · 138d ago

Looks really cool. Curious how you're handling action abstraction? We've found that semantically parsing the DOM to extract high-level intents—like "click 'Continue'" instead of 'click div#xyz' helps reduce hallucination and makes agent planning more robust.

debo_ · 138d ago

I know it's impossible to avoid name collisions at this stage of the game, but BLAST is basically the Google of biological sequence alignment / search:

https://blast.ncbi.nlm.nih.gov/Blast.cgi

calebhwin · 138d ago

Right I figured there isn't a huge overlap of interested communities so hopefully not a point of confusion. I guess that could change!

esafak · 138d ago

The bigger collision is https://withblast.com/ (found via https://kagi.com/search?q=blast+llm )

Event Horizon Labs (YC W24) Is Hiring (ycombinator.com)

Adam (YC W25) Is Hiring to Build the Future of CAD (ycombinator.com)

Piramidal (YC W24) Is Hiring Back End Engineer (ycombinator.com)

Mux (YC W16) Is Hiring Engineering ICs and Managers (mux.com)

Bild AI (YC W25) Is Hiring (ycombinator.com)

Infracost (YC W21) Is Hiring First Product Manager to Shift FinOps Left (ycombinator.com)

Crimson (YC X25) is hiring founding engineers in London (ycombinator.com)

Weave (YC W25) is hiring a founding AI engineer (ycombinator.com)

Nango (YC W23) Is Hiring a Staff Back End Engineer (Remote) (jobs.ashbyhq.com)

Gym Class VR (YC W22) Is Hiring – UX Design Engineer (ycombinator.com)

Relace (YC W23) Is Hiring for Code LLMs (SF)

Artie (YC S23) Is Hiring Engineers, AES, and Senior PMM (ycombinator.com)

Depot (YC W23) Is Hiring a Solutions Engineer (Remote US and Canada) (ycombinator.com)

Svix (webhooks as a service) is hiring for a founding marketing lead (svix.com)

Dynamo AI (YC W22) Is Hiring for AI Product Managers (ycombinator.com)

Kapa.ai (YC S23) is hiring research and software engineers (ycombinator.com)

Optery (YC W22) Is Hiring in Engineering, Legal, Sales, Marketing (U.S., Latam) (optery.com)

Telli (YC F24) is hiring engineers, designers, and interns (on-site in Berlin) (hi.telli.com)

Infisical (YC W23) Is Hiring Solutions Engineers to Scale the OSS Security Stack (ycombinator.com)

Channel3 (YC S25) Is Hiring a Founding Engineer, NYC (channel3.notion.site)

Thunder Compute (YC S24) Is Hiring (ycombinator.com)

Deepnote (YC S19) is hiring engineers to build a better Jupyter notebook (deepnote.com)

Prosper AI (YC S23) Is Hiring Founding Account Executives (NYC) (jobs.ashbyhq.com)

The Forecasting Company (YC S24) Is Hiring a Software Engineer (ycombinator.com)

Lago – Open-Source Usage Based Billing – Is Hiring in Sales, Eng, Ops (EU, US) (ycombinator.com)

Ember (YC F24) Is Hiring Full Stack Engineer (ycombinator.com)

LiteLLM (YC W23) is hiring a back end engineer (ycombinator.com)

SigNoz (YC W21, Open Source Datadog) Is Hiring Platform Engineers (Remote) (jobs.ashbyhq.com)

Motion (YC W20) Is Hiring Principal Software Engineers (jobs.ashbyhq.com)

Bild AI (YC W25) Is Hiring an Applied AI Engineer (workatastartup.com)

Text.ai (YC X25) Is Hiring Founding Full-Stack Engineer (ycombinator.com)

Show HN: Blast – Fast, multi-threaded serving engine for web browsing AI agents

Comments (66)