Reliable by Design: Building Guardrails for AI and Other Unpredictable Systems [video] (youtube.com)

Formal Methods in general are underrated in the industry. Pretty much no large companies except AWS (thank you Byron Cook!) use them at a large scale.

Edit: maybe there are large companies that use them behind the curtains, but AWS is the only place I know of where they publicly acknowledge how much they appreciate and use formal methods. If you know of any others, please comment; I'd be curious to learn how they're using them!

Twirrim · 22h ago

Microsoft, Amazon, Oracle, Google, all sorts of large companies use them, and talk about it, publicly. They've all published whitepapers and resources about them. Microsoft even employs Dr. Leslie Lamport who created and maintains TLA+ (among other things).

Just for some quick examples:

Microsoft: https://github.com/Azure/azure-cosmos-tla, https://www.youtube.com/watch?v=kYX6UrY_ooA

Google: https://research.google/pubs/specifying-bgp-using-tla/, https://www.researchgate.net/publication/267120559_Formal_Mo...

Oracle: https://blogs.oracle.com/cloud-infrastructure/post/sleeping-... (note the author is a "Formal Verification Engineer", it's literally his job at Oracle to do this stuff)

Intel: https://dl.acm.org/doi/10.1145/1391469.1391675, https://link.springer.com/chapter/10.1007/978-3-540-69850-0_...

Jetbrains: https://lp.jetbrains.com/research/hott-and-dependent-types/

Arm: https://ieeexplore.ieee.org/document/9974354

dfc · 19h ago

Lamport+Microsoft was the first thing that I thought of when I read the comment. FWIW he retired at the beginning of this year.

IshKebab · 1d ago

I don't think they are underrated. They are heavily used where they work really well and bugs have a very high cost (e.g. hardware design).

For the vast majority of software though they don't really make much sense because formally verifying the software is 10-100x more work than writing it and testing it with normal methods. And formal verification of software generally requires faaaaar more expertise than most people have. (The situation is much better for hardware design because it tends to be way simpler.)

It's a very powerful tool but also extremely difficult to use.

steamrolled · 1d ago

> Formal Methods in general are underrated in the industry. Pretty much no large companies except AWS (thank you Byron Cook!) use them at a large scale.

At least Microsoft and Google poured a lot of money into this by funding well-staffed multi-year research projects. There's plenty of public trail in terms of research papers. It's just that not a whole lot came out of it otherwise.

The problem isn't that the methods are underrated, it's that they aren't compatible with the approach to software engineering in these places (huge monolithic codebases, a variety of evolving languages and frameworks, no rigid constraints on design principles).

fakedang · 5h ago

Can you ELI5 what formal methods are and why they apparently aren't the industry standard? As a complete noob, from what I'm reading online, they're pretty much how you should approach software engineering, or really any sort of programming, right?

dwrodri · 5h ago

Formal methods = “this software cannot do things it shouldn’t do”, I have formally proven it ALWAYS EXECUTES THE WAY I CONSTRAINED IT TO.

Contrast with

Testing = “My tests prove these inputs definitely produce these test outputs”

IME formal methods struggle to make contact with reality because you only really get their promise, “it always does what it is constrained to do”, when every abstraction underneath provides the same guarantee. I wager most CPUs/GPUs aren’t verified down to the gate level these days.

It’s just faster to “trust” tests with most of the benefit, and developing software faster is very important to capturing a market and accruing revenue if you are building your software for business reasons.

EDIT: My gate-level verification remark is a bit extreme, but it applies to higher layers of the stack. The Linux kernel isn’t verified. Drivers are sometimes verified, but not often. There is an HN comment somewhere about building a filesystem in Coq, and while the operations at the filesystem layer are provably correct, the kernel interfaces still fail. The firmware still isn’t proven. The CPU it all runs on has undisclosed optimizations in its caches and load/store mechanisms which just aren’t proven, but which enabled it to beat the competition on benchmarks, driving sales.
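To make the "all inputs" versus "these inputs" contrast concrete, here is a minimal sketch in plain Python. It is not a formal proof tool: it simply enumerates a small finite domain, which is the crudest way to get an every-input guarantee. The function `clamp_add_u8`, its deliberate bug, and the sample inputs are all hypothetical and chosen for illustration only.

```python
from itertools import product

def clamp_add_u8(a: int, b: int) -> int:
    """Hypothetical function under test: add two bytes, saturating at 255.
    It contains a deliberate bug: it wraps around instead of saturating."""
    return (a + b) & 0xFF

def spec(a: int, b: int) -> int:
    """The property we want to hold for every pair of bytes."""
    return min(a + b, 255)

def passes_sample_tests() -> bool:
    """Testing: only these hand-picked inputs are checked."""
    samples = [(0, 0), (1, 2), (100, 100), (255, 0)]
    return all(clamp_add_u8(a, b) == spec(a, b) for a, b in samples)

def find_counterexample():
    """Exhaustive check: every one of the 256 * 256 inputs is checked."""
    for a, b in product(range(256), repeat=2):
        if clamp_add_u8(a, b) != spec(a, b):
            return (a, b)
    return None

if __name__ == "__main__":
    print("sample tests pass:", passes_sample_tests())
    print("counterexample:", find_counterexample())
```

The sample tests pass even though the function is wrong, while the exhaustive check finds a failing input. Real formal tools reason symbolically rather than enumerating, which is what lets them scale beyond toy domains like this one.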

tgma · 23h ago

> Pretty much no large companies except AWS (thank you Byron Cook!) use them at a large scale.

I don't think that's true at all. I suppose that depends on what you mean by formal methods and in what context you're concerned about those. Off the top of my head this comes to mind from Microsoft: https://learn.microsoft.com/en-us/windows-hardware/drivers/d...

nhatcher · 1d ago

SAT solvers and the algorithms surrounding them are so much fun. I agree they are very unappreciated.

Shameless plug: I wrote an (admittedly very derivative) introduction with some examples I thought at the time were cool.

https://www.nhatcher.com/post/on-hats-and-sats/

cubancigar11 · 15h ago

Thanks that was quite informative, perfect for me.

sevensor · 1d ago

SMT is so much fun. The Z3 Python api lets you write your problem very directly and then gives you fast answers, even for quite large problems.

Recursing · 9h ago

Note that CVC5 has basically the same Python API ( https://cvc5.github.io/docs/cvc5-1.1.2/api/python/python.htm... ) and is often much faster

ndr · 12h ago

I wrote a shockingly similar solution a few months ago:

https://gist.github.com/enigma/98ea0392471fa70211251daa16ce8...

doctorpangloss · 22h ago

This post is the programming joke about Python, "import solution; solution()".

sevensor · 18h ago

Barely a joke, this is literally what using the Python Z3 bindings feels like.

cpatuzzo · 1d ago

I tried to write a programming language that compiles to SAT many years ago: https://sentient-lang.org/

superlopuh · 11h ago

I love that language and frequently show it to people. I'm sad to see that my local install doesn't work any more. I actually used it to solve a puzzle in Evoland 2 that I'm relatively sure was added as a joke, and is not solvable in a reasonable time without a solver. I'm actually doing a PhD in compilers right now, and would love to chat about sentient if you have the time. My email is sasha@lopoukhine.com.

hwayne · 1d ago

I remember you showing me this! Wow that was a long time ago.

zero_k · 21h ago

Haha, Marijn Heule who is pushing a lot of limits of SAT solving would love this. If you manage to get him excited, he might spend a few years on this problem :) He's kinda famous for solving the Boolean Pythagorean Triples problem using SAT [1]. He loves puzzles. He also got Knuth excited about a bunch of fun puzzles.

BTW, these puzzles also tend to have a lot of symmetries, which SAT solvers are pretty bad at handling. You can break them, though, using a variety of techniques, e.g. static symmetry breaking [2], or symmetric learning.

[1] https://www.cs.utexas.edu/~marijn/ptn/ [2] https://github.com/markusa4/satsuma

jononor · 1d ago

How good are current LLMs at translating problems given as text into something SMT solvers can operate on? Be it MiniZinc, Z3, Smtlib, Python bindings, etc. Anyone tried it out?

hwayne · 1d ago

Apparently they're getting very good: https://emschwartz.me/new-life-hack-using-llms-to-generate-c...

I try not to use them too much because I want to build the skill of using SMTs directly for now.

Twirrim · 22h ago

I've found them to be bad, for the most part. There aren't enough blog posts and examples of code out there for them to leech from.

Besides which, I would argue the process of writing the proof in the language is integral to building the understanding you need to deal with the results. You'll spot bugs as you're creating the model.

Jaxan · 1d ago

I tried it many months ago and it was garbage. But this was trying smtlib directly. Maybe via the python bindings it works better?

stong1 · 1d ago

Reminds me of a small project I did back in undergrad: Minesweeper using an SMT solver. https://github.com/stong/smt-minesweeper

robinhouston · 1d ago

If you want a language for expressing constraint satisfaction problems that's higher-level than SAT, I think MiniZinc is pretty interesting. https://www.minizinc.org/

naet · 1d ago

I actually wrote a backtracking solution to the LinkedIn queens game a while ago (and the tango game).

I know nothing about SMT or SAT and I imagine they might be faster, but the backtracking appears to solve just as instantaneously when you push the button.

Might be cool to learn a bit about SMT or SAT. Not sure how broadly applicable they are but I've seen people use them to solve some difficult advent of code problems if nothing else.
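For readers who have not seen the backtracking approach, a minimal sketch follows. It is not naet's code. It assumes the usual Queens rules (one queen per row, per column, and per colored region, and no two queens touching, even diagonally), and the 4x4 region map in the usage note is a made-up example.

```python
def solve_queens(regions):
    """Backtracking search, one row at a time.

    regions: list of equal-length strings; regions[r][c] is the region label
    of cell (r, c). Returns a list where entry r is the queen's column in
    row r, or None if no placement satisfies the assumed rules.
    """
    n = len(regions)
    cols = []              # cols[r] = column chosen for row r so far
    used_cols = set()
    used_regions = set()

    def place(row):
        if row == n:
            return True
        for c in range(n):
            region = regions[row][c]
            if c in used_cols or region in used_regions:
                continue
            # With one queen per row, touching can only happen between
            # neighbouring rows, so comparing with the previous row suffices.
            if cols and abs(c - cols[-1]) <= 1:
                continue
            cols.append(c)
            used_cols.add(c)
            used_regions.add(region)
            if place(row + 1):
                return True
            # Undo the choice and try the next column.
            cols.pop()
            used_cols.remove(c)
            used_regions.remove(region)
        return False

    return cols if place(0) else None
```

Calling `solve_queens(["AABB", "AABB", "CCDD", "CCDD"])` returns `[1, 3, 0, 2]`. On boards of the size the game uses, the constraints prune the search so heavily that this kind of solver finishes effectively instantly, which matches the experience described above.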

gbacon · 21h ago

Definite broad applicability.

NP-complete problems are the hardest problems in NP. In 1971 Cook proved SAT to be NP-complete. For any other problem in NP, we can quickly (i.e., in polynomial time, even in the worst case) convert instances of that problem into instances of SAT. In other words, we can use SAT to solve any problem in NP.

It turns out there are many problems in NP-complete. The fast conversion applies among them too, so in some sense, problems in NP-complete are all the same because we can use them all to solve instances of each other. However, for some of those problem instances the best known algorithm is to try all possible inputs, which requires exponential time (very, very slow for even modestly large inputs).

Lots of research has been and continues to be poured into SAT because any gains automatically yield improvements to everything else in NP-complete and the rest of NP. Using a SAT solver allows you to hitch a ride more or less for free on the results of all that research. Each incremental improvement to SAT solvers benefits programs that use them.

As the author noted, forming SAT instances by hand can be a pain. SMT, or Satisfiability Modulo Theories, is sort of a high-level language that “compiles down” to SAT. Expressing problems with SMT is more natural and reduces the burden of converting your problem to SAT and SAT solutions back to your problem domain.
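As a rough illustration of converting another NP problem into SAT, here is a sketch that reduces graph coloring to CNF clauses and then solves them. The solver is a brute-force stand-in written for this example, not a real SAT solver, and the function names and the DIMACS-style variable numbering are assumptions of the sketch.

```python
from itertools import product

def coloring_to_cnf(num_vertices, edges, num_colors):
    """Reduce graph coloring to CNF. Variable v*num_colors + c + 1 means
    'vertex v has color c'. A positive integer is a variable, a negative
    integer is its negation."""
    def var(v, c):
        return v * num_colors + c + 1

    clauses = []
    for v in range(num_vertices):
        # Each vertex gets at least one color...
        clauses.append([var(v, c) for c in range(num_colors)])
        # ...and no vertex gets two colors.
        for c1 in range(num_colors):
            for c2 in range(c1 + 1, num_colors):
                clauses.append([-var(v, c1), -var(v, c2)])
    # Adjacent vertices never share a color.
    for u, v in edges:
        for c in range(num_colors):
            clauses.append([-var(u, c), -var(v, c)])
    return clauses, num_vertices * num_colors

def brute_force_sat(clauses, num_vars):
    """Stand-in for a real solver: try every assignment (exponential time)."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

def decode_coloring(bits, num_vertices, num_colors):
    """Map a satisfying assignment back to one color index per vertex."""
    return [next(c for c in range(num_colors) if bits[v * num_colors + c])
            for v in range(num_vertices)]

if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    clauses, num_vars = coloring_to_cnf(3, triangle, 3)
    model = brute_force_sat(clauses, num_vars)
    print(decode_coloring(model, 3, 3))
```

The conversion itself runs in polynomial time, and all of the exponential cost sits in the solver. Swapping `brute_force_sat` for a production solver is where the "hitch a ride on the research" benefit comes from.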

TheBozzCL · 1d ago

Hah, about a month ago I wrote a DLX solver for exact cover problems and LiQueens was one of my first implementations.

Next I want to try to solve the Tango and Zip games.
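For context on exact cover, here is a minimal sketch of Knuth's Algorithm X using ordinary Python sets. It is not DLX: the dancing-links data structure is left out for brevity, so this version shows only the search strategy. The function name is hypothetical, and the instance in the usage note is the small textbook example commonly used to illustrate the algorithm.

```python
def exact_covers(universe, subsets):
    """Yield every list of subset names whose sets cover each element of
    `universe` exactly once. `subsets` maps a name to a set of elements."""
    def search(remaining, available, chosen):
        if not remaining:
            yield list(chosen)
            return
        # Knuth's heuristic: branch on the element with the fewest candidates.
        element = min(
            remaining,
            key=lambda e: sum(1 for name in available if e in subsets[name]),
        )
        for name in available:
            if element not in subsets[name]:
                continue
            items = subsets[name]
            # Keep only the subsets that do not overlap the one just chosen.
            still_available = [
                other for other in available
                if other != name and not (subsets[other] & items)
            ]
            chosen.append(name)
            yield from search(remaining - items, still_available, chosen)
            chosen.pop()

    yield from search(frozenset(universe), list(subsets), [])
```

With `subsets = {"A": {1, 4, 7}, "B": {1, 4}, "C": {4, 5, 7}, "D": {3, 5, 6}, "E": {2, 3, 6, 7}, "F": {2, 7}}` and `universe = range(1, 8)`, the only cover is B, D, F. A Queens-style puzzle does not map directly onto this, since some of its constraints are "at most one" rather than "exactly one", and handling those needs optional items that this sketch does not support.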

b0a04gl · 1d ago

you mentioned SMT is slower than SAT and left it there, but that feels incomplete. in problems like this, solve time barely matters unless you’re generating at scale. the real weight is in how fast you can write, refactor, and trust the constraints. SAT might give faster outputs, but SMT usually gets you to correct models quicker with less friction. wondering if you actually benchmarked both and stuck with SAT on numbers, or if it was more of a default comfort pick. felt like a missed moment to shift the lens from solver speed to model dev loop speed

osmarks · 1d ago

I was briefly looking into using SMT for Minecraft autocrafting, but it turns out you can do integer linear programming and the mapping is easier.

spencerflem · 1d ago

Some additional context: Outside of Microsoft, this puzzle is often known as Star Battle.

Terrific little puzzle, highly recommend it!

https://www.puzzles.wiki/wiki/Star_Battle

https://www.puzzle-star-battle.com/?size=5

anArbitraryOne · 20h ago

What about a CP solver?

refulgentis · 22h ago

> Which is the correct solution to the queens puzzle. I didn't benchmark the solution times, but I imagine it's considerably slower than a raw SAT solver. Glucose is really, really fast.

I'm new to this area. Neither the original article nor the link to Glucose has enough info to tell me the order of magnitude here: milliseconds? hours?

OutOfHere · 1d ago

The article fails to even say what SMT is. It also fails to describe and explain it. This article should help:

https://en.wikipedia.org/wiki/Satisfiability_modulo_theories

Jtsummers · 1d ago

In the article:

> "Satisfiability Modulo Theories"

rook37 · 1d ago

Huh, where? I didn't see it going through, and opening it in reader mode and ctrl+f-ing any of those words still turns up nothing for me.

Jtsummers · 1d ago

He embeds the footnotes in the web version (it's a proper footnote in the email newsletter version). Find the "..." in this paragraph:

> Ryan solved this by writing Queens as a SAT problem, expressing properties like "there is exactly one queen in row 3" as a large number of boolean clauses. Go read his post, it's pretty cool. What leapt out to me was that he used CVC5, an SMT solver. (...) SMT solvers are "higher-level" than SAT, capable of handling more data types than just boolean variables. It's a lot easier to solve the problem at the SMT level than at the SAT level. To show this, I whipped up a short demo of solving the same problem in Z3 (via the Python API).
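To show why a property like "there is exactly one queen in row 3" turns into many boolean clauses, here is a small sketch of the common pairwise encoding. Whether Ryan's post uses this particular encoding is an assumption, and the variable numbering and function names are made up for illustration.

```python
from itertools import combinations

def row_variables(row, n):
    """Hypothetical numbering: cell (row, col) of an n x n board is boolean
    variable row * n + col + 1, true when a queen sits on that cell."""
    return [row * n + col + 1 for col in range(n)]

def exactly_one(variables):
    """Pairwise encoding: one 'at least one' clause plus one 'not both'
    clause for every pair of variables."""
    clauses = [list(variables)]
    clauses += [[-a, -b] for a, b in combinations(variables, 2)]
    return clauses

def holds(clauses, true_vars):
    """Check an assignment, given as the set of variables that are true."""
    return all(
        any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
        for clause in clauses
    )

if __name__ == "__main__":
    clauses = exactly_one(row_variables(3, 8))
    print(len(clauses))
```

For one row of an 8x8 board this yields 1 + 28 = 29 clauses, and the same treatment is needed for every row, column, and region. An SMT-level model can usually state the same property in a single line, which is the "higher-level" point made in the quoted paragraph.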


Solving LinkedIn Queens with SMT
