Free evals API for AI startups (ship 10x faster with evals you can trust)

2 sfox100 0 7/31/2025, 11:28:25 AM

Hey HN,

We built Composo because AI apps fail unpredictably and teams have no idea if their changes helped.

LLM-as-judge doesn't work - it gives random scores, doesn't work well for agents, and doesn't tell you what to fix.

We've built purpose-built evaluation models that give you: - Deterministic scores (same input = same score, always) - Instant identification of where prompts, retrievals, agents & tool calls fail - Exact failure analysis ("tool calls are looping due to poorly specified schema")

We're 92% accurate vs 72% for SOTA LLM-as-judge.

Giving 10 startups free access: - 10k eval credits - Just launched our evals API for agents & tool calling - 5 min setup

Already helping teams at Palantir, Accenture, and Tesla ship reliable AI.

Apply: composo.short.gy/startups

Happy to answer questions about evaluation, reward models, or why LLMs are bad at judging themselves. startups@composo.ai

The Big Oops in Type Systems: This Problem Extends to FP as Well (danieltan.weblog.lol)

Show HN: Patio Ask, a Perplexity-style DIY home improvement assistant (patio.so)

Money Stuff: You Can Insider Trade NFTs Now (newsletterhunt.com)

EPA eliminates research and development office, begins layoffs (apnews.com)

Inside TSMC, the $1 Trillion Ghost Foundry Behind Nvidia's Crown (howardyu.substack.com)

TechnologyRanked: The Most Popular Programming Languages (2014-2024) (visualcapitalist.com)

IRS chief says agency plans to end free filing program (cnbc.com)

Lethal Cambodia-Thailand border clash linked to cyber-scam slave camps (theregister.com)

Show HN: Flowcus – Kanban Board for OmniFocus and Apple Reminders (getflowcus.app)

Show HN: I built an AI brand generator to stop wasting days naming side projects (brandkiit.com)

Community Contributions (philippemnoel.posthaven.com)

Figma's S-1/A from their IPO today (sec.gov)

Charvaka (en.wikipedia.org)

The upstart company that wants to build the largest aircraft (bbc.com)

Why Is Everyone Missing This with AI Agents? (Memory and Tools That Scale) [video] (youtube.com)

How to Dig for Music Without Spotify (pitchfork.com)

Show HN: Moots AI (YC W22) helps you turn meetup contacts into deals

How to Build a Custom Mechanical Keyboard (alexotos.com)

Show HN: SafeRate – AI chat-native mortgage lender (saferate.com)

Figma stock soars 230% in first day of trading, valuing company north of $40B (finance.yahoo.com)

Chasing Riskless Profits with Triangular Arbitrage Bot on Binance (shufflingbytes.com)

My Heart of Hearts (astralcodexten.com)

What Can a Cell Remember? (quantamagazine.org)

Sweeney confirms Epic Games Store launch on Play Store after Google loses appeal (neowin.net)

Sendria is a test SMTP server (github.com)

New ultrasound imaging to map drug delivery into the brain (medicalxpress.com)

From Elvis to "Men in Black," the Story of the Hamilton Ventura (wornandwound.com)

Andrew Tate's "The Real World" is stealing my software (2022) (insrt.uk)

Show HN: DJ-style audio waveform thumbnailer in Python (github.com)

The missing trust model in AI tools (docs.freestyle.sh)

Orch.space – A visual builder for AI reasoning workflows (BYOK) (orch.space)

Changes in Inflation by City (wallethub.com)

Model Context Protocol, Product Demos, and the New App Store (wjgilmore.com)

Robotaxis are powered by human drivers as it launches ride-hailing in Bay Area (electrek.co)

Show HN: Microsoft research on impact of GenAI on jobs (youtube.com)

Intuitive explanation of LLM Transformers without math (youtube.com)

Playwright users–wants design-partner to shape an AI tool that tames flaky tests

Our first outage from LLM-written code (sketch.dev)

KDE Linux (kde.org)

Parsing Without ASTs and Optimizing with Sea of Nodes [video] (youtube.com)

The Practicals of Writing: Paper and Pens (brianschrader.com)

What Is Space-Time? Einstein's Theory of Time and Gravity Explained (discovermagazine.com)

Released a lsp server for game maker language (github.com)

Tesla hits a speed bump in its California 'Robotaxi' rollout: Permits (politico.com)

Augment Code Brings Its Coding Agent to the Terminal (thenewstack.io)

In-Network Leaderless Replication for Distributed Data Stores [pdf] (vldb.org)

Understanding Blockchains Through Java Code (blog.blockingqueue.com)

UK Government Warns Promoting the Use of VPNs Could Attract Fines (ispreview.co.uk)

Show HN: Tight Studio – impressive screen recordings made easy (tight.studio)

Google Drive (docs and sheets) as a simple, headless CMS (jedwal.co)

Free evals API for AI startups (ship 10x faster with evals you can trust)

Comments (0)