Hi HN, We’re Mahmoud and Alan, building Cyberdesk (https://www.cyberdesk.io/), a deterministic computer use agent for automating Windows desktop applications. Developers use us to automate repetitive tasks in legacy software in healthcare, accounting, construction, and more, by executing clicks and keystrokes directly into the desktop.

Here are a few demos of Cyberdesk’s computer use agent:

A fast file import automation into a legacy desktop app: https://youtu.be/H_lRzrCCN0E

Working in OpenDental, a monster of a Windows monolith (this one also showcases the agent's learning process): https://youtu.be/nXiJDebOJD0

Filing a W-2 tax form: https://youtu.be/6VNEzHdc8mc

Many industries are stuck with legacy Windows desktop applications, with staff plagued by repetitive tasks that are incredibly time consuming. Vendors offering automations for these end up writing brittle Robotic Process Automation (RPA) scripts or hiring off-shore teams for manual task execution. RPA often breaks due to inevitable UI changes or unexpected popups like a Windows update or a random in-app notification. Off-shore teams are often unreliable and costlier than software, plus they’re not always an option for regulated industries.

I previously built RPA scripts impacting 20K+ employees at a Fortune 100 company, where I experienced RPA’s brittleness and inflexibility firsthand. It was obvious to me that this was a band-aid solution to an unsolved problem. Alan was building a computer use agent for his previous startup and realized its huge potential to automate a ton of manual computer tasks across many industries, so we started working on Cyberdesk.

Computer use models can struggle with abstract, long-horizon tasks, but they excel at making context-aware decisions on a screen-by-screen basis, so they’re a good fit for automating these desktop apps.

The key to reliability is crafting prompts that are highly specific and well thought out. Much like with ChatGPT, vague or ambiguous prompts won’t get you the results you want. This is especially true in computer use because the model is processing nearly an entire desktop screen’s worth of extra visual information; without precise instructions, it doesn’t know which details to focus on or how to act.

Unlike RPA, Cyberdesk’s agents don’t blindly replay clicks. They read the screen state before every action and self-correct when flows drift (pop-ups, latency, UI changes). Unlike off-the-shelf computer use AIs, Cyberdesk runs deterministically in production: the agent primarily follows the steps it has learned and only falls back to reasoning when anomalies occur. Cyberdesk learns workflows from natural-language instructions, capturing nuance and handling dynamic tasks, far beyond what a simple screen recording of a few runs can encode.

This approach is good for both reliability and cost: reliability, because we fall back to a computer use model in unexpected situations; and cost, because computer use models are expensive and we only use them when we need to. Otherwise, we leverage faster, more affordable visual LLMs to check the screen state step by step during deterministic runs. Our agents are also equipped with tools such as failsafes, data extraction, and screen evaluation to handle dynamic and sensitive situations.
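The cost argument can be made concrete with a back-of-the-envelope formula. The symbols here are hypothetical, not figures from Cyberdesk: a run has N steps, a visual-LLM screen check costs c_v per step, a computer use model call costs c_u per step, and a fraction p of steps hit an anomaly that needs one extra computer use call. Replaying memorized steps beats calling the computer use model on every step whenever the anomaly rate stays below the threshold below.

```latex
C_{\text{replay}} = N c_v + p N c_u, \qquad C_{\text{CUA}} = N c_u
C_{\text{replay}} < C_{\text{CUA}} \iff c_v + p\, c_u < c_u \iff p < 1 - \frac{c_v}{c_u}
```

Because c_v is much smaller than c_u under this assumption, the break-even anomaly rate sits close to 1, so even fairly frequent pop-ups leave replay cheaper.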

How it works: you install our open source driver on any Windows machine (https://github.com/cyberdesk-hq/cyberdriver). It communicates with our backend to receive commands (click, type, scroll, screenshot) and sends back data (screenshots, API responses, etc.). You give our computer use agent a detailed natural language description of the process for a given task, just like an SOP for an employee learning a new task for the first time. The agent then leverages computer use AI models to learn the steps and memorizes them by saving each screenshot alongside its action (click on these coordinates, type XYZ, wait for the page to load, etc.).
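The command flow and memorization step described above can be pictured with a small data model. This is a minimal sketch under stated assumptions: the `Command`, `MemorizedStep`, and `Workflow` names and the JSON field layout are hypothetical illustrations, not Cyberdriver's actual wire protocol or storage format.

```python
"""Hypothetical sketch of driver commands and memorized steps (not Cyberdriver's real protocol)."""
import json
from dataclasses import dataclass, asdict, field
from typing import Literal, Optional

# Hypothetical action vocabulary, mirroring the commands listed in the post.
Action = Literal["click", "type", "scroll", "screenshot", "wait"]


@dataclass
class Command:
    # One instruction sent from the backend to the driver on the Windows machine.
    action: Action
    x: Optional[int] = None            # screen coordinates for click/scroll
    y: Optional[int] = None
    text: Optional[str] = None         # keystrokes for "type"
    dy: Optional[int] = None           # scroll amount
    seconds: Optional[float] = None    # duration for "wait"

    def to_json(self) -> str:
        # Drop unset fields so the message only carries what the action needs.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})


@dataclass
class MemorizedStep:
    # What the agent saves while learning: the screen it saw and the action it took there.
    screenshot: bytes
    command: Command


@dataclass
class Workflow:
    name: str
    steps: list[MemorizedStep] = field(default_factory=list)

    def record(self, screenshot: bytes, command: Command) -> None:
        # Append one (screenshot, action) pair in execution order.
        self.steps.append(MemorizedStep(screenshot, command))


if __name__ == "__main__":
    wf = Workflow("import-file")
    wf.record(b"<png bytes>", Command("click", x=120, y=340))
    wf.record(b"<png bytes>", Command("type", text="report.csv"))
    print(wf.steps[0].command.to_json())  # {"action": "click", "x": 120, "y": 340}
```

Pairing each action with the screenshot it was taken on is what makes the later safety check possible: at replay time there is a reference screen to compare the live screen against.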

The agent deterministically runs through these steps to run fast and predictably. In order to account for popups and UI changes, our agent checks the live screen state against the memorized state to determine whether it’s safe to proceed with the memorized step. If no major changes prevent safe execution of the memorized step, it proceeds; otherwise, it falls back to a computer use model with context on past actions and the remaining task.
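The replay-with-fallback loop described above might look roughly like the sketch below. It rests on assumptions: screenshots are modeled as flat lists of grayscale pixels, and a simple pixel-match ratio with a hypothetical `threshold` stands in for the visual-LLM check the post describes. The `fallback` callable represents the computer use model and, in this sketch only, takes over for the rest of the run rather than handing control back.

```python
"""Hypothetical sketch of memorized-step replay with a fallback (not Cyberdesk's implementation)."""
from dataclasses import dataclass
from typing import Callable, Sequence

Screen = Sequence[int]  # stand-in for a screenshot: flattened grayscale pixels


@dataclass
class Step:
    expected_screen: Screen  # screenshot saved when the step was learned
    action: str              # e.g. "click 120,340"


def similarity(a: Screen, b: Screen) -> float:
    # Fraction of identical pixels; a crude stand-in for the visual-LLM check.
    if len(a) != len(b) or not a:
        return 0.0
    return sum(p == q for p, q in zip(a, b)) / len(a)


def run_workflow(
    steps: list[Step],
    capture: Callable[[], Screen],
    execute: Callable[[str], None],
    fallback: Callable[[list[str], list[Step]], list[str]],
    threshold: float = 0.95,
) -> list[str]:
    done: list[str] = []
    for i, step in enumerate(steps):
        live = capture()
        if similarity(live, step.expected_screen) >= threshold:
            # Screen matches what was memorized: take the fast, deterministic path.
            execute(step.action)
            done.append(step.action)
        else:
            # Anomaly (pop-up, UI change): hand control to the computer use model
            # with context on past actions and the remaining steps. It returns
            # the actions it took; this sketch lets it finish the run.
            done.extend(fallback(list(done), steps[i:]))
            break
    return done
```

Keeping the comparison behind a single `similarity` function means the cheap check can be swapped for a model-based one without touching the control flow.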

Customers are currently using us for manual tasks like importing and exporting files from legacy desktop applications, booking appointments for patients in a desktop PMS, and data entry, such as filling out patient-profile forms in an EMR.

We don't have a self-serve option yet but we'd love to onboard you manually. Book a demo here to learn more! (https://www.cyberdesk.io/) If you’d rather wait for the self-serve option a little later down the line, please do submit your email here (https://forms.gle/HfQLxMXKcv9Eh8Gs8) so you can be notified as soon as that’s ready. You can also check out our docs here: https://docs.cyberdesk.io/.

We’d absolutely love to hear your thoughts on our approach and on desktop automation for legacy industries!

Comments (6)

rkagerer · 34m ago

Personally I think this approach is flawed because it runs in the cloud. If it were an agent I could run locally I'd be much more interested.

mahmoud-almadi · 29m ago

Are you referring to the LLM being used or to where the actions (click, type, etc.) are being executed? The actions can be executed on any Windows machine, so the execution itself can take place locally on your device. The LLMs we're using right now are cloud LLMs; we don't offer a self-hosted LLM option yet. Can I ask what reservations you have about running in the cloud? We have zero-data retention signed with our LLM vendors, so none of the data sent to them is ever retained.

throw03172019 · 1h ago

Looks great. For the EMR use cases, do you sign BAAs? Which CUA models are being used? No data retention?

mahmoud-almadi · 1h ago

We sign BAAs with all our healthcare customers and all our vendors. We're currently using Claude computer use. We have zero-data retention signed with both Anthropic and OpenAI, so none of the information sent to their LLMs ever gets retained.

hermitcrab · 6m ago

>none of the information getting sent to their LLMs ever get retained

Is it possible to verify that?

sgtwompwomp · 5m ago

Yup! It's a policy we explicitly sign with the LLM providers.

Launch HN: Cyberdesk (YC S25) – Automate Windows legacy desktop apps
