This is a good set of slides. Dan is a good guy. There are a few nits I would pick. Sqrt(N) convergence comes from independence, not normality -- independence gives linearity of variance. { So, the sum of N IID samples of any distribution has N times the variance, but dividing by N you get the sqrt(N) shrinkage. } There is, of course, a higher-order relationship between the variance / "scale^2" of the distro and its tails, which statisticians refer to as "shape". He later goes on to mention the dependence problem, though, and the minimum-dt solution that is relied upon by, e.g., https://github.com/c-blake/bu/blob/main/doc/tim.md. So, it's all good. He may have covered it in voice, even.
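In symbols (just restating the independence argument, for N IID samples X_i with variance sigma^2):

```latex
% sqrt(N) from independence alone: variances of independent samples add.
\operatorname{Var}\Big(\sum_{i=1}^{N} X_i\Big) = N\sigma^2
\quad\Longrightarrow\quad
\operatorname{SD}\Big(\frac{1}{N}\sum_{i=1}^{N} X_i\Big)
  = \frac{\sqrt{N\sigma^2}}{N}
  = \frac{\sigma}{\sqrt{N}}
```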
He also mentions the Sattolo shuffle used by https://github.com/c-blake/bu/blob/main/doc/memlat.md for his memory latency measurements. One weird thing was how he said that because 1 byte/cycle is 4 GB/s, things are "easily CPU bound", while I feel like I've been "fighting The Memory Wall for at least 3 decades now..." even just from super-scalar CPUs, but he later does some vectorization stuff. That relates more to what calcs you are doing, of course, but high-bandwidth memory is a big part of what nVidia is selling.
https://x.com/lemire/status/1947615932702200138
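For the curious, the pointer-chasing trick behind that memlat approach looks roughly like this (my own sketch in C, not the actual memlat.md code): Sattolo's algorithm yields a single cycle, so every load depends on the previous one and you measure latency rather than bandwidth.

```c
/* Sketch (not the actual memlat.md code): build one random cycle with
   Sattolo's algorithm, then chase pointers through it. Each load depends
   on the previous one, so timing the chase measures latency, not bandwidth. */
#include <stdlib.h>
#include <stddef.h>

void sattolo(size_t *next, size_t n) {
    if (n < 2) return;
    for (size_t i = 0; i < n; ++i)
        next[i] = i;
    for (size_t i = n - 1; i > 0; --i) {
        size_t j = (size_t)rand() % i;   /* j < i: guarantees a single cycle */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
}

size_t chase(const size_t *next, size_t n_hops) {
    size_t p = 0;
    for (size_t i = 0; i < n_hops; ++i)
        p = next[p];                     /* serially dependent loads */
    return p;                            /* returning p defeats dead-code elimination */
}
```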
appreciatorBus · 11h ago
Looks like this was delivered earlier today at SEA 2025. I hope video will be available soon!
I don't think talks are being recorded, unfortunately.
benob · 2h ago
Are there efforts to include the necessary context in compilers to autovectorize?
yvdriess · 2h ago
What do you mean with necessary context?
Modern compilers all autovectorize really well. Usually writing plain canonical loops with plain C arrays is a good way to write portable optimal SIMD code.
The usual workflow I use is to translate the vector notation (RIP Cilk+ array syntax) in my paper notes to plain C loops. The compiler's optimization report (-qopt-report for icx, gcc has -fopt-info-vec and -fopt-info-vec-missed) gives feedback on what optimizations it considered and why it did not apply them. In more complex scenarios it can be helpful to add `#pragma omp simd` pragmas or similar to overrule the C semantics.
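For instance, what I mean by a plain canonical loop is something like this (illustrative sketch; `restrict` removes the aliasing obstacle, and the pragma needs -fopenmp-simd or an OpenMP-enabled build):

```c
/* Illustrative sketch of a canonical, autovectorizable loop.
   Try: gcc -O3 -fopenmp-simd -fopt-info-vec saxpy.c */
#include <stddef.h>

void saxpy(float *restrict y, const float *restrict x, float a, size_t n) {
    #pragma omp simd   /* overrule C semantics: assert iterations are independent */
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```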
anonymousDan · 56m ago
Is there a talk to go with the slides by any chance?
DrNosferatu · 3h ago
I’m surprised so much branching isn’t more costly.
yvdriess · 1h ago
Branch predictors have gotten really good, and it now often makes more sense to rely on them than to work away the branches.
For example, modern compilers will very rarely introduce conditional moves (cmov) on x86, because they are nearly always slower than simply branching. It might be counterintuitive, but a predicted branch breaks the dependencies between the micro-ops of the conditional and those of the clause. So if your cmov's condition depends on a load, the cmov has to wait for that load to complete before it can execute.
Always benchmark with at-scale data and measure.
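A toy illustration of that dependency point (my sketch, not from the talk):

```c
/* Toy sketch: the same counting loop written branchy vs. branchless.
   The branchless form typically compiles to setcc/cmov and carries a
   data dependency through every iteration; the branchy form lets a
   well-predicted branch speculate ahead of slow operands. */
#include <stddef.h>

size_t count_less_branchy(const int *a, size_t n, int key) {
    size_t c = 0;
    for (size_t i = 0; i < n; ++i)
        if (a[i] < key)                 /* predictable branch: speculation hides latency */
            c++;
    return c;
}

size_t count_less_branchless(const int *a, size_t n, int key) {
    size_t c = 0;
    for (size_t i = 0; i < n; ++i)
        c += (size_t)(a[i] < key);      /* dependency chain through c every iteration */
    return c;
}
```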
mycatisblack · 2h ago
Depends on the branch predictor: correct branch, everything’s loaded and set. Wrong branch: flush it all and load again.
If you know the branch predictor algorithm you can optimise for it.
Edit: it’s on p27
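The textbook demonstration of optimising for the predictor (my sketch, not from the slides) is running the same loop over random vs. sorted data:

```c
/* Sketch of making a branch predictable: the classic sorted-array trick.
   Same work either way, very different branch-miss rates. */
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

long sum_big(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        if (a[i] >= 128)                /* mispredicts often on random bytes */
            s += a[i];
    return s;
}

/* Sorting first -- qsort(data, n, sizeof *data, cmp_int); -- turns the
   condition into one long not-taken run followed by one taken run,
   which the predictor handles almost perfectly. */
```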
NooneAtAll3 · 9h ago
Apple still uses UTF-16?
vanderZwan · 6h ago
JavaScript does, so the web does, so by extension Apple probably does care about utf16.
jiggawatts · 6h ago
Also: Java, DotNet, and Windows all use 2-byte char types.
looperhacks · 3h ago
Akchyually! These days, Java uses Latin-1 if no characters outside Latin-1 are used. Only if full Unicode is necessary does it use UTF-16.
josephg · 3h ago
Apple does something similar for strings in objc and Swift. They do lots of other optimisations besides - like small string optimisation for short strings to avoid heap allocations.
...it's trivial to get UTF-8 strings into and out of an NSString though so the internal representation doesn't matter all that much.
More importantly, all of the actual user-facing side of macOS is UTF-8 (e.g. you can simply copy-paste a UTF-8 encoded Unicode string literal into a C source file, printf() it, and it will show up correctly in the terminal without tinkering with text editor or locale settings).
markasoftware · 7h ago
Is this talk about Apple? Regardless, lots of language runtimes still use UTF-16 (e.g. Java, Qt, Haskell), and Windows certainly still uses UTF-16.
phkahler · 8h ago
Pentium 4 didn't hit 3.8GHz. It melted at 1.4 or so.
wtallis · 7h ago
The Pentium 3 is what eventually topped out at 1.4 GHz, for the 130nm Tualatin parts introduced in 2001. The Pentium 4 started at 1.4GHz and 1.5GHz with the 180nm Willamette parts introduced in 2000. Those were eventually released with speeds up to 2.0GHz. The 130nm Pentium 4 Northwood reached 3.4GHz in 2004, and the 90nm Pentium 4 Prescott hit 3.8GHz later in 2004.
Netburst lasted a long time as Intel was floundering, before Core Duo was released in 2006.
Tom's Hardware overclocked one of these Northwood Pentium 4's to 5 GHz with liquid nitrogen and a compressor [1].
Those were the days, honestly.
[0]: https://en.wikipedia.org/wiki/Pentium_4
[1]: https://www.youtube.com/watch?v=z0jQZxH7NgM
It hit 3.8 GHz, and for a while it surpassed multiple cores' performance because games were built to run on a single core rather than being multithreaded/multicore. The same happened with some emulators.
IgnaciusMonk · 11h ago
I do not want to be rude, but this is exactly why LLVM being in the hands of the same entity that controls access to / owns the platform is insane.
edit - #64 E ! Also, I always say, the human body is the most error-prone measuring device humans have at their disposal.
bayindirh · 5h ago
Both LLVM and GCC are supported by processor manufacturers directly. Yes, Apple and Intel have their own LLVM versions, but as long as they don't break compatibility with GCC and don't explicitly prevent porting, I don't see a problem.
I personally use the GCC suite exclusively though, and while LLVM is not my favorite compiler, we can thank them for spurring the GCC team into action to improve their game.
gleenn · 11h ago
Can you be more explicit? Is it because they are optimizing too much to a single platform that isn't generalizable to other compilers or architectures? What's your specific gripe?
almostgotcaught · 10h ago
Whose hands exactly is LLVM in?
IgnaciusMonk · 11h ago
Also, to be more controversial: Red Hat deprecated x86_64-v1 & x86_64-v2, and people were crying because of that...
volf_ · 7h ago
A commercial enterprise is dropping support for older CPU architectures in their newer OSes so they can improve the average performance of the deployed software?
Don't see how that's controversial. It's something that doesn't matter to their customers or their business.
bayindirh · 5h ago
The newest x86_64-v1 server is older than a decade now, and I'm not sure -v2 is deprecated. RockyLinux 9 is running happily on -v2 hardware downstairs.
Oh, -v2 is deprecated for RH10. Not a big deal, honestly.
From a fleet perspective, I prefer that more code use more advanced instructions on my processors. Efficiency possibly goes up on hot code paths. What's not to love?
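If you want to check which level a given box supports, something like this works (assuming GCC 12 or newer, which accepts the level names in __builtin_cpu_supports):

```c
/* Sketch: query which x86-64 microarchitecture level this CPU supports.
   Assumes GCC 12+, where __builtin_cpu_supports takes the level names. */
#include <stdio.h>

int main(void) {
    __builtin_cpu_init();  /* initialize the CPU feature-detection data */
    printf("x86-64-v2: %s\n", __builtin_cpu_supports("x86-64-v2") ? "yes" : "no");
    printf("x86-64-v3: %s\n", __builtin_cpu_supports("x86-64-v3") ? "yes" : "no");
    printf("x86-64-v4: %s\n", __builtin_cpu_supports("x86-64-v4") ? "yes" : "no");
    return 0;
}
```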
homebrewer · 1h ago
One more reason to switch to a better alternative:
https://lwn.net/Articles/1010868/
https://almalinux.org/blog/2025-05-27-welcoming-almalinux-10...
tl;dr: AlmaLinux will support v2 in EL10 as a separate rebuild in the near future.
https://repo.almalinux.org/almalinux/10/isos/x86_64_v2/
We even do a full EPEL rebuild for it as well.
The newest x86_64-v1 server is older than a decade now
Did you mean v3?
bayindirh · 4h ago
No, v1. I mean, you haven't been able to buy an x86_64-v1 server for a decade now, and if you have one and it's alive, there's a very slim chance it's still working unless it's new old stock.
If it has seen any decent amount of workload during its lifetime, it probably has a couple of ICs that have reached the end of their electronic life and are malfunctioning.
anthk · 2h ago
Gemini Lake runs pretty well. If that happens, bye Fedora Bazzite with Linux-Libre on top.