How much attention do you need, really? Experiments in O(1) latent reasoning

Comments (1)

orderone_ai · 4h ago

Hello, fellow kids!

I want to share what I've been working on for the last few weeks: O(1) inference across whole tasks through direct vector transformation. A few facts up front to give you an idea of how it performs:

1. Implemented as part of a PoC of what I call the Promptable General Classifier (a classifier that can be prompted for general tasks, including some limited reasoning tasks, and whose vocabulary/classes can be hot-swapped at inference time). The 1.09B implementation:

    1. Runs 93x faster than Zephyr 7B (and this is being generous to Zephyr: I had to add post-processing to extract labels from malformed LLM output, and I didn't count the time that post-processing takes in Zephyr's benchmarks).

    2. Matches Zephyr 7B's batched accuracy across 13 tasks at 77.7% (the unbatched Zephyr run gets one more answer correct, reaching 80%; the DSRU is much more deterministic and gets no accuracy boost from running unbatched). Note that I did prompt engineering on 2-3 of these tasks to help the DSRU. The prompt engineering seemed to have no impact on Zephyr’s performance, which I’m assuming is due to its robustness as a professionally built LLM rather than a PoC of a new architecture made by a lone amateur researcher.

    3. ~19x lower latency than Zephyr 7B

2. Trained separately on entailment tasks, it scored 80% (~2.4x the 33.3% chance rate) on a 3-label text entailment task (entails, contradicts, neutral), and 50% on a 3-label multiple-choice entailment task ('1', '2', '3'). The white paper has notes on why the two results differ.

3. At 1.09B parameters, the core model's inference time is around 1 ms per batch, but that figure covers only the computation in post-attention latent space. The model can generalize, but it lacks the full flexibility of an LLM. In exchange for giving that up, it gains extremely fast inference, determinism, and very straightforward training with smooth loss landscapes.

I was a bit hesitant to put this out so early: I kept thinking about edge cases, ways I could add just a bit more rigor, etc. But I decided the perfect was the enemy of the good, and put together this white paper over the course of a couple of weekends, with some midweek refinements.
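The post doesn't describe the DSRU's internals, so what follows is only a generic, hypothetical sketch of the interface described in item 1: map a prompt and an input to one vector in a single fixed-size pass, then pick the label whose embedding is closest, so the label set can be swapped at inference time without retraining. The hash-seeded embed function and the random matrix W are placeholders I'm assuming in place of trained components; every name here is made up for illustration, and the predictions it makes are meaningless.

```python
import hashlib
import math
import random

DIM = 64  # hypothetical latent width, chosen only so the sketch runs quickly


def embed(text: str) -> list[float]:
    """Placeholder encoder: a deterministic pseudo-random unit vector seeded by the text."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def make_transform(seed: int = 0) -> list[list[float]]:
    """Placeholder for a trained network: a fixed random DIM x (2 * DIM) matrix."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(2 * DIM)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(2 * DIM)] for _ in range(DIM)]


def transform(W: list[list[float]], prompt_vec: list[float], input_vec: list[float]) -> list[float]:
    """One fixed-size vector-to-vector step: concatenate prompt and input, multiply by W."""
    x = prompt_vec + input_vec  # list concatenation, length 2 * DIM
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def classify(W: list[list[float]], prompt: str, text: str, labels: list[str]) -> str:
    """Run the transform once (no token-by-token decoding), then return the closest label."""
    out = transform(W, embed(prompt), embed(text))
    scores = {label: cosine(out, embed(label)) for label in labels}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    W = make_transform()
    pair = "A cat is sleeping. / An animal is resting."
    # The same weights serve two different label sets; nothing is retrained.
    print(classify(W, "Does the first sentence entail the second?", pair,
                   ["entails", "contradicts", "neutral"]))
    print(classify(W, "Which option is correct?", pair, ["1", "2", "3"]))
```

Swapping the text labels for '1', '2', '3' is just a different argument to classify; the transform runs once per input either way, and identical inputs always produce identical outputs, which is the kind of determinism the post describes.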

I'll be releasing a full reference implementation of the training pipeline on GitHub, one that can run on midrange consumer hardware with default settings, in... I'm thinking about 4 weeks, depending on how busy I end up being - doing this alongside a day job has been... a lot, to say the least.

I’d release it now, but frankly, it’s an embarrassing ball of mud that I hacked together haphazardly while chasing positive signal. Now that I’ve gotten this far, I can implement it more thoughtfully - and try a specific new model architecture that I think will work much better for many comparative reasoning tasks.

It is patent pending, but I'm permitting personal experimentation and thesis work without restriction. This includes grad students using it for their degrees! You can share results and discuss your work, but distributing trained models or derivatives is not permitted. Funded research, institutional use, and commercial use are not permitted for now.

I hope you all find it interesting!
