VideoGameBench: Can Vision-Language Models complete popular video games?

Comments (1)

msgodel · 20h ago

That's pretty interesting. Last year I messed with hooking language models up to Inform 6/Z-machines (RL training, inference, and even generating games) and noticed most are awful at navigating and reasoning about graphs. Maybe this has changed with the latest ones.
