Closer to the Metal: Leaving Playwright for CDP

63 points by gregpr07 | 8/20/2025, 3:32:44 PM | browser-use.com

Comments (34)

steveklabnik · 1h ago
Describing "2011–2017" as "the dark ages" makes me feel so old.

There was a ton of this stuff before Chrome or WebKit even existed! Back in my day, we used Selenium and hated it. (I was lucky enough to start after Mercury...)

hugs · 1h ago
selenium creator here. hi!
hu3 · 1h ago
selenium helped my team so much back in the day. Thank you for it!

We had a complex user registration workflow that supported multiple nationalities and languages on an international bank's website.

I set up Selenium tests to detect breakages because it was almost humanly impossible to retest all the workflows after every sprint.

It brought sanity back to the team and the QA folks.

Tools that came after certainly benefited from Selenium's lessons.

moi2388 · 19m ago
I just wanted to say I absolutely love your product. Thank you!
Aurornis · 40m ago
I always enjoyed Selenium, for what it’s worth.
gregpr07 · 1h ago
Hi, the first version of Browser Use was actually built on Selenium, but we quite quickly switched to Playwright.
hugs · 1h ago
yeah, i noticed that. apologies if i missed a post about it... what do you wish didn't suck about selenium?
moi2388 · 17m ago
Scrolling to an element doesn't always work because the element might not be ready yet. You need to add IDs to the elements and select by those to ensure it works properly.
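Something like an explicit wait before the scroll is usually what it takes; a minimal sketch with the official selenium-webdriver JS bindings (the URL and the ID are placeholders):

    import { Builder, By, until } from 'selenium-webdriver';

    const driver = await new Builder().forBrowser('chrome').build();
    try {
      await driver.get('https://example.com/register');
      // Wait until the element actually exists before touching it.
      const field = await driver.wait(until.elementLocated(By.id('nationality')), 10000);
      // Scrolling via the DOM API is more predictable than relying on implicit scrolling.
      await driver.executeScript('arguments[0].scrollIntoView({block: "center"})', field);
      await field.click();
    } finally {
      await driver.quit();
    }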
arm32 · 1h ago
Uh, ahem, <clears throat>, we meant the _other_ Selenium.
hugs · 1h ago
that's what i thought. :) personal life accomplishment was seeing wikipedia add a disambiguation link on the element's page. you know, because it's right up there in importance as the periodic table, obviously.
patrickhogan1 · 2m ago
Selenium was very usable before 2011.

This post is like talking about Grafana and never mentioning Nagios.

saberience · 5m ago
Talk about "not invented here" mentality. This is a project doomed to failure. Using VC money to rewrite better-built software that has been around for years.

Good luck guys!

johnsmith1840 · 1m ago
Why not a CDP snapshot?
dataviz1000 · 1h ago
I made this comment yesterday, but it really applies to this conversation.

> In the past 3 weeks I ported Playwright to run completely inside a Chrome extension, without the Chrome DevTools Protocol (CDP), using purely DOM APIs and Chrome extension APIs. I ported a TypeScript port of Browser Use to run in a Chrome extension side panel using my port of Playwright. In 2 days I ported Selenium ChromeDriver to run inside a Chrome extension using chrome.debugger APIs, which I call ChromeExtensionDriver. And today I'm porting Stagehand to also run in a Chrome extension using the Playwright port. This follows using VSCode's core libraries in a Chrome extension and having them drive a Chrome extension instead of an Electron app.

The most difficult part is managing the lifecycle of Windows, Pages, and Frames and handling race conditions when automating a user's browser, where, for example, the user switches to another tab or closes the tab.
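For a flavor of that bookkeeping, a minimal sketch using the standard chrome.tabs events (the session map and its shape are illustrative, not my actual code):

    // Track in-flight automation per tab so a closed or switched tab
    // doesn't leave a command targeting the wrong page.
    const sessions = new Map(); // tabId -> { cancel(): void, pause(): void } (illustrative)

    chrome.tabs.onRemoved.addListener((tabId) => {
      // Tab closed mid-automation: abort anything pending against it.
      sessions.get(tabId)?.cancel();
      sessions.delete(tabId);
    });

    chrome.tabs.onActivated.addListener(({ tabId }) => {
      // User switched tabs: pause sessions on background tabs instead of
      // letting them click into a page the user can't see.
      for (const [id, session] of sessions) {
        if (id !== tabId) session.pause();
      }
    });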

Tsarp · 9m ago
Wouldn't having chrome.debugger attached also flag your requests?
wonger_ · 53m ago
What is the benefit of porting all those tools to extensions? Have you run into any other extension-based challenges besides lifecycles and race conditions?
dataviz1000 · 30m ago
Some benefits (without using chrome.debugger or the Chrome DevTools Protocol):

1. There are 3,500,000,000 instances of Chrome desktop in use. [0]

2. A Chrome extension can be installed with a click from the Chrome Web Store.

3. It is closer to the metal, so it runs extremely fast.

4. It can run completely contained on the user's machine.

5. It's just one user automating their web-based workflows, which makes it harder for bot protections to stop, and with a human in the loop any hang-ups and snags can be resolved by the human.

6. Chrome extensions now have a side panel that stays put in the window during navigation and tab switching. It is exactly like using the Cursor or VSCode side-panel copilots.

Some limitations:

1. Can't automate the ChatGPT console, because they check for user-agent events by testing whether the `isTrusted` property on event objects is true. (The bypass is using chrome.debugger and the ChromeExtensionDriver I created.)

2. Can't take full-page screen captures; however, it is possible to very quickly take visible captures of the viewport. Currently I scroll and stitch the images together if a full-page screenshot is required (sketched below). There are other APIs available in a Chrome extension that can capture video and audio, but they require the user to click a button, so they aren't useful for computer-vision automation. (The bypass is once again chrome.debugger and the ChromeExtensionDriver I created.)

3. The Chrome DevTools Protocol allows intercepting and rewriting scripts and web pages before they are evaluated. With Manifest V2 this was possible, but the ability was removed in Manifest V3, which we still hear about today with the ad-block extensions.

I feel like, given the limitations, having a popup dialog that directs the user to do an action will work, as long as the automation handles 98% of the user's workflows. Moreover, a lot of this automation should require explicit user acknowledgment before proceeding.

[0] https://www.demandsage.com/chrome-statistics/
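To make limitation 2 concrete, a sketch of the scroll-and-stitch approach using chrome.scripting and chrome.tabs.captureVisibleTab from an MV3 service worker (the helper name is mine; permissions and the actual stitching are left out):

    // Capture the page viewport-by-viewport; the data: URLs can then be
    // drawn onto a single canvas to produce the full-page image.
    async function captureFullPage(tabId) {
      const [{ result: page }] = await chrome.scripting.executeScript({
        target: { tabId },
        func: () => ({
          total: document.documentElement.scrollHeight,
          step: window.innerHeight,
        }),
      });

      const shots = [];
      for (let y = 0; y < page.total; y += page.step) {
        await chrome.scripting.executeScript({
          target: { tabId },
          func: (top) => window.scrollTo(0, top),
          args: [y],
        });
        // Give the page a beat to repaint; captureVisibleTab is also rate-limited.
        await new Promise((r) => setTimeout(r, 150));
        shots.push(await chrome.tabs.captureVisibleTab());
      }
      return shots;
    }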

diggan · 46m ago
> What is the benefit of porting all those tools to extensions?

Personally, I have a browser extension running in my user/personal browser instance that my agent uses (with rate limits) in order to avoid all the captchas and blocks, basically. Everything else I've tried ultimately ends up getting blocked. But then I'm also doing some heavy caching, so most agent "browse" calls end up not even reaching out to the internet, since they find and use stuff already stored locally.

arm32 · 1h ago
Ah, yes, the classic "Playwright isn't fast enough so we're reinventing Puppeteer" trope. I'd be lying if I said I hadn't seen this done a few times already.

Now that I got my snarky remark out of the way:

Puppeteer uses CDP under the hood. Just use Puppeteer.
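It's about as minimal as it gets; a sketch (the URL is a placeholder), including dropping to the raw CDP session when the high-level API isn't enough:

    import puppeteer from 'puppeteer';

    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');

    // The high-level API rides on CDP; you can also talk to it directly.
    const cdp = await page.createCDPSession();
    const { data } = await cdp.send('Page.captureScreenshot'); // base64 PNG
    await browser.close();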

haolez · 1h ago
I've seen a team implement Go workers that would download the HTML from a target, then download some of the referenced JavaScript files, then run those JavaScript files in an embedded JavaScript engine, so that they could consume fewer resources to get the specific things they needed without using a full browser. It's like a browser homunculus! Of course, each new site would require custom code. This was for quant stuff. Quite cool!
odo1242 · 48m ago
This exact homunculus is actually supported in Node.js by the `jsdom` library: https://www.npmjs.com/package/jsdom

I don't know how well it would work for that use case, but I've used it before, for example, to write a web crawler that could handle client-side rendering.
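A minimal sketch of that crawler use, assuming the target's scripts actually run under jsdom (plenty don't):

    import { JSDOM } from 'jsdom';

    const url = 'https://example.com';
    const res = await fetch(url);

    // Let jsdom fetch and execute the page's scripts in-process,
    // so client-side rendering happens without a real browser.
    const dom = new JSDOM(await res.text(), {
      url,                       // base for relative URLs
      runScripts: 'dangerously', // execute the page's JavaScript
      resources: 'usable',       // download referenced script files
    });

    // Crude: give rendering a moment, then read the resulting DOM.
    setTimeout(() => {
      console.log(dom.window.document.querySelector('h1')?.textContent);
    }, 1000);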

nikisweeting · 13m ago
sir we are a python library, puppeteer-python was abandoned, how exactly do you propose we use puppeteer?
epolanski · 7m ago
Playwright has Python bindings.
boredtofears · 1h ago
Is the case for Playwright over Puppeteer just its cross-browser support?

We're currently using Cypress for some automated testing on a recent project and it's extremely brittle. Considering moving to Playwright or Puppeteer, but not sure if that will fix the brittleness.

rising-sky · 1h ago
In my experience Playwright provided a much more stable and reliable experience than Puppeteer, with multi-browser support and asynchronous operations (which is the entire point). ymmv
arm32 · 1h ago
Playwright also offers nice sugar like HTML test reports and trace viewing.
gregpr07 · 1h ago
From my experience with Playwright, rrweb recordings are MUCH better than Playwright's replay traces, so we usually just use those.
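A minimal sketch of the recording side (rrweb's record API; where the events get shipped is omitted):

    import * as rrweb from 'rrweb';

    const events = [];
    rrweb.record({
      emit(event) {
        // Ship these somewhere durable; they can be replayed later.
        events.push(event);
      },
    });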
epolanski · 7m ago
What's rrweb?
benmmurphy · 33m ago
direct CDP has been used by the scraping community for a long time in order to have a cleaner browser environment that is harder to fingerprint. for example, nodriver (https://github.com/ultrafunkamsterdam/nodriver) was started in Feb 2024, and I suspect the technique was popular before that project started.
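a minimal sketch of that approach using the chrome-remote-interface package, pointed at a Chrome started with --remote-debugging-port=9222 (the URL is a placeholder):

    import CDP from 'chrome-remote-interface';

    // No WebDriver layer in the stack; just the DevTools socket.
    const client = await CDP({ port: 9222 });
    const { Page, Runtime } = client;
    await Page.enable();
    await Page.navigate({ url: 'https://example.com' });
    await Page.loadEventFired();

    const { result } = await Runtime.evaluate({ expression: 'navigator.webdriver' });
    console.log(result.value); // false unless Chrome was launched with --enable-automation
    await client.close();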
spullara · 29m ago
this is exactly what I did when I wrote my first agent with scraping. later we switched to taking control of the user's browser through a browser extension.
Robdel12 · 34m ago
All of the approaches that drive the browser from outside the browser are going to be slow (WebDriver, Playwright, Puppeteer, etc.).

Karma-like approaches are where I'm at (execute inside the browser)

appcustodian2 · 26m ago
> All of the approaches that drive the browser from outside the browser are going to be slow

Why? I would think any cross-process communication through the CDP websocket would have imperceptible overhead compared to what already takes long in the browser: a ton of HTTP I/O.

What is Karma? What are you executing in the browser?

nikisweeting · 11m ago
CDP round-trip time on a local machine is ~100µs (0.1ms); it's not slow haha
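easy to sanity-check yourself; a quick sketch timing a cheap command over Puppeteer's raw CDP session:

    import puppeteer from 'puppeteer';

    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const cdp = await page.createCDPSession();

    // Time a near-no-op command to approximate pure round-trip overhead;
    // heavy calls (screenshots, DOM snapshots) cost far more than the hop.
    const t0 = performance.now();
    for (let i = 0; i < 1000; i++) {
      await cdp.send('Runtime.evaluate', { expression: '1' });
    }
    console.log(`${(performance.now() - t0) / 1000} ms per round trip`);
    await browser.close();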
johnsmith1840 · 2m ago
"Thousands of cdp calls" from the link.

Cdp does add a good chunk of latency. Depends on what your threshold is.

An image grab is around 60ms and a snapshot can range from 40ms -> 500ms

The latency is pure data movement. It's like the difference of using ram vs ssd vs data from the internet.