Ask HN: Experience automating E2E manual testing with AI

rudderdev | 9/8/2025, 6:22:19 AM
I see lots of discussions around using AI in testing. Let's make this discussion more objective and useful by sharing concrete experiences. Here's mine: using AI to automate e2e manual testing (especially where user interaction is required).

What I’m testing: the RudderStack iOS SDK, which tracks customer event data and sends it to various product, marketing, and business tools.

The problem in my current testing workflow: Manual testing is important for quality assurance, but testing the RudderStack SDK manually requires several time-consuming and error-prone steps: planning the specific test steps, performing the interactions, reviewing long stretches of log text, and then verifying the logs, which includes comparing long IDs by eye.

The solution I experimented with: I used an LLM to plan the test steps, mobile-mcp to simulate the user interactions (clicking buttons such as track, reset, track, etc.), the LLM again to review the logs (verifying the event ID changes sent to the server), and finally to prepare a comprehensive report. All of this is packaged as an MCP server that works in my IDE (Cursor), with test cases written as plain-English prompts.
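
For context, wiring this into Cursor is just MCP server registration in .cursor/mcp.json, roughly like this (a sketch; the mobile-mcp package name and the path to my own server are assumptions to adapt, not exact values):

    {
      "mcpServers": {
        "mobile-mcp": {
          "command": "npx",
          "args": ["-y", "@mobilenext/mobile-mcp@latest"]
        },
        "e2e-test-runner": {
          "command": "node",
          "args": ["./mcp/e2e-test-runner.js"]
        }
      }
    }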

Result: the agent did click through track → reset → track and caught the anonymous ID change (which confirms the SDK's tracking behaved properly).
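
The check itself is simple to express; what the agent saves me is fishing the IDs out of the logs. A minimal sketch of the verification in Python, assuming each logged event is a JSON payload carrying RudderStack's anonymousId field:

    import json

    def anonymous_id(event_json: str) -> str:
        # Pull the anonymousId out of one logged event payload.
        return json.loads(event_json)["anonymousId"]

    # Two (fabricated) payloads as they'd appear in the SDK's verbose logs:
    before = '{"event": "track", "anonymousId": "6d5e0f2a-aaaa"}'
    after  = '{"event": "track", "anonymousId": "9f2c44b1-bbbb"}'

    # After reset(), the SDK must mint a fresh anonymous ID.
    assert anonymous_id(before) != anonymous_id(after), "anonymousId did not rotate"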

What actually worked:

- Once set up, it caught the regression correctly

- Consistent results vs. my manual testing, where I sometimes miss things

Issues I ran into:

- Had to write extremely detailed step-by-step instructions and extensive context; if I missed anything, it just failed (see the example prompt after this list)

- WebDriver setup on port 4723 was finicky (a pre-flight port check, sketched after this list, is one way to fail fast)

- It is slow: it took 2 minutes for what should be a 30-second manual test
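
To give a sense of the detail level required, a prompt for the anonymous ID test has to spell out something like this (illustrative; exact button names depend on the sample app):

    Test: anonymous ID rotation on reset
    1. Launch the sample app on the iOS simulator.
    2. Tap "Track". Capture the logged event payload.
    3. Tap "Reset".
    4. Tap "Track" again. Capture the new payload.
    5. Compare the anonymousId fields of the two payloads.
       Report PASS if they differ, FAIL otherwise, quoting both IDs.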
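
And for the port issue, a minimal pre-flight sketch in Python (4723 is Appium's default WebDriver port):

    import socket
    import sys

    def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
        # True if something is already listening on host:port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            return s.connect_ex((host, port)) == 0

    if port_in_use(4723):
        sys.exit("Port 4723 is busy -- is a stale Appium/WebDriver session still running?")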

Biggest problem: The amount of upfront work to get it running properly. I spent more time writing instructions than I would have just testing manually.

The real value might be in consistency for regression testing, not speed. But the initial investment is rough.

What would make this useful:

I need a workflow where, based on the feature or fixes, agents automatically generate test cases (including all edge cases) targeting the code impacted by the changes, and then perform a thorough end-to-end QA.
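
The rough shape of that pipeline, as a sketch (call_llm() is a hypothetical stand-in for whatever model API is used, and the diff scope path is illustrative):

    import subprocess

    def branch_diff(base: str = "main") -> str:
        # Diff of the working branch against base; the path filter is illustrative.
        result = subprocess.run(
            ["git", "diff", base, "--", "Sources/"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for the model API of your choice.
        raise NotImplementedError

    PROMPT = (
        "You are QA for a mobile event-tracking SDK. Given this diff, "
        "write end-to-end test cases (including edge cases) as numbered "
        "plain-English steps an agent can execute:\n\n{diff}"
    )

    test_cases = call_llm(PROMPT.format(diff=branch_diff()))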

Has anyone else tried automating QA with AI? How was your experience, and how did you resolve the challenges you faced? (I want to find practices I can incorporate into my workflow.)
