
Web Bench: a new way to compare AI browser agents (blog.skyvern.com)

28 points by suchintan on 5/29/2025, 2:57:25 PM, 9 comments

Comments (9)

helsinki · 7h ago

Does anyone use Skyvern to build their websites? I'm wondering how I might benefit from using an agentic browser workflow instead of a Playwright MCP server when building a web UI.

neveroddoreven · 15h ago

I had no idea WebVoyager only spanned 15 websites, lol. The 452 figure you have still seems a little low, though. Do you have plans to expand it? It seems like you'd want as many sites as possible to improve the real-world accuracy of agents, given the long-tail nature of website traffic.

suchintan · 14h ago

We definitely plan to expand it. I want to get to ~10,000 sites for a reasonable benchmark.

The 15-site figure blew my mind; it's too easy to overfit that dataset.
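To make the long-tail argument in this exchange concrete, here is a minimal Python sketch. It assumes, purely for illustration, that traffic across a hypothetical universe of 1,000,000 sites follows Zipf's law with exponent s = 1; neither of those numbers comes from the thread or from Web Bench, and real traffic distributions differ. Under those assumptions, it estimates what share of all traffic the top-N sites would account for, using the site counts mentioned above (15, 452, and ~10,000).

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def generalized_harmonic(n: int, s: float) -> float:
    """Sum of 1 / k**s for k = 1..n (the normalizer of a Zipf distribution)."""
    return math.fsum(1.0 / k**s for k in range(1, n + 1))


def traffic_coverage(top_n: int, num_sites: int = 1_000_000, s: float = 1.0) -> float:
    """Share of total traffic captured by the top_n highest-traffic sites.

    Hypothetical model: site traffic follows Zipf's law with exponent s over a
    universe of num_sites sites. Both defaults are illustrative assumptions.
    """
    if not 0 < top_n <= num_sites:
        raise ValueError("top_n must be in 1..num_sites")
    return generalized_harmonic(top_n, s) / generalized_harmonic(num_sites, s)


if __name__ == "__main__":
    # Site counts from the thread: WebVoyager, Web Bench, and the stated goal.
    for n in (15, 452, 10_000):
        print(f"top {n:>6,} sites -> {traffic_coverage(n):.1%} of traffic")
```

Even under these assumptions, the top 10,000 sites leave a sizable share of traffic uncovered, which is the intuition behind wanting far more than 15 sites. A real benchmark samples sites rather than taking the strict top N, so treat the output as a rough illustration only.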

vasusen · 8h ago

Thank you so much for creating this, folks! A browser navigation agent is a key part of our AI QA setup at Donobu (https://donobu.com/). We found the WebVoyager benchmarks severely lacking for complex e2e test cases like logged-in dashboards, onboarding forms, etc.

While the extraction/2FA flows aren't super relevant to us, this saves us from building our own set of benchmarks. Really appreciate it, and I hope we can contribute to help make this a really large set.

suchintan · 5h ago

That would be amazing!!

pants2 · 7h ago

Great work! Big fan of Skyvern.

Looking forward to the benchmarks on Claude 4 (and o3 CUA when that's released)

gitmagic · 9h ago

Would love to see how Nelly [0] performs on this benchmark.

[0] https://nelly.is

suchintan · 9h ago

Very cool. The benchmark can be found here if you want to take a look at it: https://github.com/Halluminate/WebBench

wm2 · 15h ago

super cool!