CEO Bench – Can AI Replace the C-Suite?

Comments (1)

dave1010uk · 3h ago

Thanks for submitting this!

Author here (if you can call me that; GPT-4 and Gemini did the bulk of the work).

This is a (slightly tongue-in-cheek) benchmark to test some LLMs. It's all open source, and all the data is in the repo.

It makes use of the excellent `llm` Python package from Simon Willison.

I've only benchmarked a couple of local models so far, but I want to find the smallest LLM that scores above the estimated "human CEO" performance. How long before a sub-1B parameter model performs better than a tech giant CEO?
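
To make the "smallest model above the human CEO baseline" question concrete, here is a rough sketch of how that selection could work. It is not code from the repo: the `Result` fields, the model names, the scores, and the baseline value are all made-up placeholders for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str       # hypothetical model label
    params_b: float  # parameter count, in billions
    score: float     # benchmark score, higher is better


def smallest_above_baseline(results, baseline):
    """Return the smallest model whose score beats the baseline, or None."""
    winners = [r for r in results if r.score > baseline]
    return min(winners, key=lambda r: r.params_b, default=None)


# Placeholder numbers for illustration only, not real benchmark results.
results = [
    Result("model-a", 0.5, 41.0),
    Result("model-b", 3.0, 63.0),
    Result("model-c", 8.0, 71.0),
]
human_ceo_baseline = 60.0  # assumed value, not the benchmark's estimate

best = smallest_above_baseline(results, human_ceo_baseline)
print(best.model if best else "no model beats the baseline yet")
```

With real scores in place of the placeholders, the sub-1B question becomes a check on whether the returned model has `params_b` below 1.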
