Hey HN! I built an open-source Python library that lets AI agents control mobile apps like humans do: tap buttons, scroll feeds, fill forms, you name it. It's heavily inspired by browser-use.
In my experience, testing mobile workflows or automating repetitive app tasks has always been a pain. You either write brittle UI automation scripts or do everything manually. After trying browser-use and seeing all the cool things people were building for the web without the pains of traditional automation, I decided I wanted to build something like it, but for mobile.
So I built App Use around Appium with a dead-simple Python interface. You just describe the task in plain English, point it at your app, and the agent uses computer vision to navigate the UI:
from app_use import Agent  # module name assumed to match the pip package
from langchain_openai import ChatOpenAI

app = ...  # your iOS/Android app config

agent = Agent(
    task="Order some tacos from the nearest restaurant",
    llm=ChatOpenAI(model="gpt-4o"),
    app=app,
)
await agent.run()
The agent takes screenshots, analyzes the UI, and performs actions step-by-step while explaining what it's doing. We've tested it on everything from shopping apps to social media—it can handle complex multi-step flows like creating accounts, making purchases, or posting content.
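Conceptually, the loop looks something like this (an illustrative sketch, not App Use's actual internals; take_screenshot, decide_next_action, and perform are hypothetical helpers):

async def run_agent(task, llm, app, max_steps=25):
    # Illustrative perception -> reasoning -> action loop.
    # take_screenshot, decide_next_action, and perform are hypothetical
    # helpers, not App Use's real API.
    history = []
    for _ in range(max_steps):
        screenshot = take_screenshot(app)  # capture the current screen
        action = await decide_next_action(llm, task, screenshot, history)
        if action.is_done:                 # the LLM decided the task is complete
            return action.result
        perform(app, action)               # tap / scroll / type via Appium
        history.append(action)             # remember what we did
    raise TimeoutError("Task not completed within max_steps")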
What makes it different from existing mobile automation tools:
Natural language tasks instead of writing XPath selectors
Works with any LLM (OpenAI, Anthropic, Gemini, etc.; see the snippet after this list)
Vision-based navigation so it adapts when UIs change
Memory system to remember app states and user preferences
Cross-platform (iOS + Android) with the same API
All the things you love about browser-use but for mobile
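For example, swapping providers should just mean changing the llm argument (a sketch, assuming the LangChain-style chat models from the example above):

from langchain_anthropic import ChatAnthropic

# Same Agent, same task; only the llm argument changes.
agent = Agent(
    task="Order some tacos from the nearest restaurant",
    llm=ChatAnthropic(model="claude-3-5-sonnet-20241022"),
    app=app,
)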
Current limitations: it's only as smart as the LLM you give it, it can be slow on complex tasks, and it occasionally gets confused by unusual UI patterns. But for most common mobile workflows, it works surprisingly well.
Install with: pip install app-use
You'll also need Appium running and your device/emulator connected. The setup takes about 5 minutes if you're familiar with mobile development.
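If you want to sanity-check the Appium side first, the standard appium-python-client (independent of App Use) can confirm the server and device are talking:

# Smoke test: can Appium reach the device? Uses plain appium-python-client,
# not App Use. Assumes an Appium 2.x server on the default port.
from appium import webdriver
from appium.options.android import UiAutomator2Options

options = UiAutomator2Options()
options.device_name = "emulator-5554"         # from `adb devices`
options.app_package = "com.android.settings"  # any installed app works here
options.app_activity = ".Settings"

driver = webdriver.Remote("http://127.0.0.1:4723", options=options)
print(driver.current_package)                 # should print com.android.settings
driver.quit()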
I'm curious what use cases this unlocks for you! Some I've been thinking about:
QA teams automating regression tests across app updates (see the pytest sketch after this list)
Vibe Debugging tools
Data collection from apps that don't have APIs
Automating tasks on mobile-first apps (stuff browser-use wouldn't be able to do)
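On the QA front, a regression test can be as thin as wrapping a task in pytest (a sketch; I'm assuming agent.run() returns something inspectable, as it does in browser-use, and that your own fixture builds the app config):

# Sketch of an agent-driven regression test (pytest + pytest-asyncio).
# Assumes `from app_use import Agent` and that agent.run() returns a
# result you can assert on; adapt to the library's actual return type.
import pytest
from app_use import Agent
from langchain_openai import ChatOpenAI

@pytest.mark.asyncio
async def test_checkout_flow(app):  # `app` comes from your own fixture
    agent = Agent(
        task="Add any item to the cart and reach the payment screen",
        llm=ChatOpenAI(model="gpt-4o"),
        app=app,
    )
    result = await agent.run()
    assert result is not None  # replace with real checks on the run history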
The whole thing is open source at https://github.com/itsericktorres/app-use. I'd love feedback, especially if you've struggled with mobile app automation before or have ideas for making this more robust.
Let me know what you think, and feel free to ask questions!
Here's a quick demo https://x.com/itsericktorres/status/1932996729458110482