OSUniverse: Building a Better OSWorld

2 mountainriver 0 5/8/2025, 4:16:45 PM
Hey all,

We are happy to release a new benchmark for computer use. We didn’t set out to build a benchmark, but we found OSWorld in its current state very challenging to work with, and numerous tests were faulty.

OSUniverse aims to be dead simple to use: it requires only Docker and runs with a single command. It offers test levels of increasing complexity that are easy to extend.

We have benchmarked all the top agents. As new GUI agents are released, we will keep the results updated with their performance.

Enjoy!
