Computer use agent with RL training for day trading?

1 iiTsEddy 1 5/8/2025, 3:23:45 PM

We now know that RL can make models more capable on measurable tasks and is a new dimension of the scaling laws, but is anyone putting these capabilities to more meaningful use than Olympiad math problems or 2D game playing?

So far, pretty much all of the computer use agent demos I've seen revolve around some kind of instruction following (book this flight, clean my desktop, etc.). I wonder, is anyone working on putting them into active trading in financial markets and using P&L as the reward / loss function? Or maybe title-agnostic video game playing that is optimized for Elo, rank, or win rate?
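To make the "P&L as reward" idea concrete, here is a minimal sketch of what a per-step reward could look like. Everything in it is an assumption for illustration: the `Account` class, the `step` function, and the flat `fee_rate` are hypothetical names, not any broker's or RL library's API, and it ignores slippage, margin, and risk adjustment.

```python
from dataclasses import dataclass


@dataclass
class Account:
    cash: float      # cash balance
    position: float  # units of the asset held; negative means short


def equity(acct: Account, price: float) -> float:
    """Mark-to-market value of the account at the given price."""
    return acct.cash + acct.position * price


def step(acct: Account, trade_qty: float, price: float,
         next_price: float, fee_rate: float = 0.0005):
    """Execute `trade_qty` at `price`, then mark to market at `next_price`.

    Returns (new_account, reward), where reward is the change in equity
    over the step, i.e. realized plus unrealized P&L net of fees.
    `fee_rate` is an assumed proportional transaction cost.
    """
    before = equity(acct, price)
    fee = abs(trade_qty) * price * fee_rate
    new_acct = Account(
        cash=acct.cash - trade_qty * price - fee,
        position=acct.position + trade_qty,
    )
    reward = equity(new_acct, next_price) - before
    return new_acct, reward


# Usage: buy 10 units at 100, price moves to 101.
acct = Account(cash=1000.0, position=0.0)
acct, reward = step(acct, trade_qty=10, price=100.0, next_price=101.0)
print(reward)
```

Using the change in equity rather than only realized profit gives the agent a signal on every step, though a raw P&L reward like this is noisy and says nothing about risk.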

It feels like context length will eventually go from millions of tokens to days or months of an agent's "life span", and inference cost will eventually come down to the time cost of a GPU server, since hybrid models (Mamba + attention) with linear time complexity can perform like regular transformers (whose inference is quadratic in sequence length). What are the other major technical challenges here?
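A back-of-the-envelope sketch of the quadratic vs. linear gap, under heavy simplifying assumptions: it only counts how many past positions get touched while generating `n` tokens, and ignores hidden size, constants, and the fact that a hybrid model still has some attention layers. The function names are hypothetical.

```python
def attention_token_reads(n: int) -> int:
    """Total cached positions read when generating n tokens with full
    causal attention: token i reads i positions, so the sum is n(n+1)/2."""
    return n * (n + 1) // 2


def recurrent_state_updates(n: int) -> int:
    """Total state updates for a fixed-size recurrent state (the
    Mamba-style idealization): one update per token."""
    return n


for n in (1_000, 1_000_000):
    ratio = attention_token_reads(n) / recurrent_state_updates(n)
    print(f"n={n:>9,}: attention/recurrent work ratio ~ {ratio:,.1f}")
```

The ratio grows as roughly `n / 2`, which is why the gap matters much more at "months of life span" context lengths than at today's typical prompts.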

I think a meaningful metric is crucial, and I took a lot of inspiration from a startup, Chai.ai, a competitor to Character.ai. I went to one of their events and got the sense that they are essentially optimizing chat LLMs for monthly user retention and subscriptions. Their small team hit 30M ARR with 1.8M DAU, and it happened over the last year or so. Combined with my own experience working at a startup, it seems like the right metric is the money shot.

Am I missing anything fundamental? Is anybody working on this, or interested in it?

Comments (1)

falcor84 · 8h ago

I'm pretty certain that the likes of Jane Street and SIG aren't leaving any low-hanging-fruit signals for us mortals to benefit from.
