Worth noting that division (integer, FP, and SIMD) has gotten much cheaper in the last decade. Division is partially pipelined on common microarchitectures now (capable of delivering a result every 2-4 cycles), and latency has dropped from ~30-80 cycles to ~10-20 cycles.
This improvement is sufficient to tip the balance toward favoring division in some algorithms where historically programmers went out of their way to avoid it.
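As a concrete illustration of the trade-off being tipped: the classic division-avoidance trick was to hoist a divide out of a loop and multiply by a precomputed reciprocal. A minimal sketch (function names are hypothetical, and note the reciprocal version is not bit-exact for arbitrary floats):

```c
#include <stddef.h>

/* Straightforward version: one divide per element. With old ~30-80
   cycle non-pipelined dividers this was the loop to avoid. */
void scale_div(float *out, const float *in, size_t n, float d) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] / d;
}

/* Traditional workaround: one divide total, then multiplies, which
   pipeline at one or more per cycle. With modern ~10-20 cycle,
   partially pipelined dividers, the rewrite buys much less. */
void scale_recip(float *out, const float *in, size_t n, float d) {
    float r = 1.0f / d;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * r;
}
```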
net01 · 6h ago
you can find a better table here with most operations and timings: https://uops.info/table.html - it supports most modern and old architectures
I think it is a totally different type of table. Yours is real data. Theirs is more like a ballpark. Maybe there could be some use for the latter? Just to help folks reason about performance.
Although, reasoning about performance can be hard anyway.
owlbite · 2h ago
Trying to reduce high end processor performance to "operation X takes Y cycles" likely confuses the uninitiated more than it helps once you get beyond "cache miss bad".
For the uninitiated, most high-performance CPUs of recent years:
- They are massively out-of-order: the CPU will run any operation whose inputs are all ready in the next available slot of the right type.
- They have multiple functional units. A recent Apple CPU can and will run 5+ different integer ops, 3+ loads/stores, and 3+ floating-point ops per cycle if it can feed them all. And it may well do zero-cost register renames on the fly for "free".
- Functional units are pipelined: you can throw one op into the front end of a pipe each cycle, but the result sometimes isn't available for consumption until 3-20 cycles later (latency depends on the type of the op and whether it can bypass into the next op executed).
- They speculate on branch outcomes, and when they guess wrong they have to flush the pipeline and redo the work down the right path.
- Assorted hazards may add or shave cycles relative to the timing you would see in a different situation.
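The latency-vs-throughput point above is why "op X takes Y cycles" is ambiguous: the same adds run at the adder's latency or its throughput depending on whether each op depends on the previous one. A hedged sketch, not tied to any particular CPU:

```c
#include <stddef.h>

/* One accumulator: every add waits on the previous add's result, so
   the loop runs at roughly one FP-add LATENCY per iteration. */
float sum1(const float *v, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four accumulators: four independent dependency chains keep the
   pipelined FP adder busy, so the loop approaches THROUGHPUT. */
float sum4(const float *v, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```

Both compute the same sum; only the shape of the dependency graph differs, and that shape, not the per-op cycle count, decides the runtime.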
Liftyee · 4h ago
I agree with this. As someone who's not an expert in assembly and CPU architecture, the "simplified" estimates in a condensed log-chart format were much more insightful. The exact data for specific architectures would be useful for more advanced users than me, but it doesn't offer the same quick "big picture" overview.
bee_rider · 4h ago
Did you get a chance to use it? I’ve only just come across this table now, so I haven’t had a chance to actually try and use it for anything, so I wouldn’t be able to evaluate the usefulness.
I have a sneaking suspicion that this table is satisfying for our brains as a vaguely technical and interesting thing, but I’m not sure how useful it really is. In general the compiler will be really creative in reordering instructions, and the CPU will also be creative about which ones it runs parallel (since it is good at discovering instruction level parallelism). So, I wonder if the level of study necessary to use this information also requires the level of data that is available in the detailed table.
I have not done much caring about instructions, it seems very hard. FWIW I have had some success caring about reducing the number of trips to memory and making sure the dependencies are obvious to the computer, so I’m not totally naive… but I think that caring about instruction timing is mostly for the real hardcore optimization badasses.
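The "reducing trips to memory" point can be made concrete. A sketch, with hypothetical function names: if a loop accumulates through a pointer, the compiler may have to assume the output aliases the input and re-read/re-write memory every iteration; a local accumulator stays in a register.

```c
#include <stddef.h>

/* May load and store *out on every pass, since the compiler can't
   always prove *out doesn't alias v[]. */
void total_mem(long *out, const long *v, size_t n) {
    *out = 0;
    for (size_t i = 0; i < n; i++)
        *out += v[i];
}

/* Local accumulator lives in a register; one store at the end. */
void total_reg(long *out, const long *v, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    *out = acc;
}
```

Both return the same result; the second makes the dependency structure obvious to the compiler, which is often worth more than reasoning about individual instruction timings.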
torium · 2h ago
The x-axis is in CPU cycles (10^0 means 1 cycle).
If your CPU runs at 1000 MHz, that's 10^9 cycles per second. On that CPU the right-hand side of the picture corresponds to 1 ms. You can do 1 million register-register operations in 1 ms, or 1 billion in 1 sec.
Computers are fast.