Lone coder cracks 50-year puzzle to find Boggle's top-scoring board
176 DavidSJ 39 5/24/2025, 6:24:05 PM ft.com ↗
FT article: https://archive.ph/siaAO
His blog: https://www.danvk.org/blog.html
> Driven “by the thrill of discovery”, Vanderkam has searched for this board, essentially alone, since 2004. He would scrape together computing time on Google’s hardware for heavy Boggle computation, all along documenting his efforts on his blog.
> “As far as I can tell, I’m the only person who is actually interested in this problem,” Vanderkam said.
For context, many people are interested in finding high-scoring Boggle boards, usually via simulated annealing, hillclimbing, or genetic algorithms. But so far as I can tell, I'm the only one interested in _proving_ that a particular board is best. Doing that was the new result here.
I don’t remember all the longest words but sesquicentennials was one of them.
Depending on wordlist and whether you want a 16 or 17 letter word, you get "charitablenesses", "supernaturalised" (British spelling), "quadricentennials" or "quartermistresses". These boards all score considerably lower than the REPLASTERING board. Full results here: https://github.com/danvk/hybrid-boggle/#highest-scoring-boar...
I hadn't realized until I did this "side quest" that most wordlists top out at 15 letter words. That makes sense for a Scrabble dictionary, but it's not great for Boggle.
Why not compute the max possible Boggle board for other languages: French, Spanish, German, Italian, Portuguese, Czech... they each have a different set of 16 dice x 6 faces [], and of course totally different wordlists:
[boardgames.SE] "What is the dice configuration for Boggle in various languages?" https://boardgames.stackexchange.com/questions/29264/boggle-...
[] some languages' Boggle dice sets have 25x6 faces
This analysis doesn't make use of the Boggle dice. It assumes that any cell can be any letter. In practice, all high-scoring boards can be rolled with the Boggle dice. My code does assume the letters are A-Z, though, so the Ñ die in Spanish Boggle would require some code changes.
- Spanish: ñ
- Czech: no extra characters (e.g. 'E' can be used in words as E, É and Ě)
- Danish/Norwegian: do Boggle dice use æ,å,ø ? I searched but couldn't find out.
- Swedish: do Boggle dice use å,ä,ö ?
- Turkish: do Boggle dice use ç,ı,ğ,ş,ö,ü ?
- Bosnian: see https://boardgames.stackexchange.com/a/51289/2358
- Filipino: ñ . And separately in some wordgames (e.g. Scrabble knockoffs), I see 'ng' treated as a single digraph. Apparently it's officially a separate digraph, so the Tagalog alphabet ('abakada') has 28 letters.
And b) if we have rare characters on a few dice (esp. the unofficial Bosnian one proposed), that imposes restrictions on your assumption that any cell can be any letter. So you'd have to postprocess to cull a few words that couldn't legally be made with that dice-set.
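The dice restriction discussed above (can a given board actually be rolled with a given dice set?) is a bipartite matching question: each cell needs its own die showing that letter. A minimal sketch, assuming a made-up three-die set purely for illustration (these are not real Boggle dice, and `can_roll` is a hypothetical helper, not part of the linked repository):

```python
def can_roll(board, dice):
    """Return True if every cell of `board` can be covered by a distinct die.

    board: string of letters, one per cell.
    dice:  list of strings, each string holding the faces of one die.
    Uses a simple augmenting-path bipartite matching (cells vs. dice).
    """
    if len(board) != len(dice):
        return False
    die_to_cell = [-1] * len(dice)  # which cell each die is currently assigned to

    def try_assign(cell, seen):
        for d, faces in enumerate(dice):
            if board[cell] in faces and d not in seen:
                seen.add(d)
                # Take a free die, or re-route the cell that currently holds it.
                if die_to_cell[d] == -1 or try_assign(die_to_cell[d], seen):
                    die_to_cell[d] = cell
                    return True
        return False

    return all(try_assign(cell, set()) for cell in range(len(board)))


# Hypothetical dice: only one die carries the rare letter "ñ".
toy_dice = ["abc", "ade", "ñxy"]
print(can_roll("aañ", toy_dice))  # two different dice can show "a"
print(can_roll("ñña", toy_dice))  # two "ñ" cells but only one "ñ" die
```

A filter like this could be run over candidate boards after the fact, which is the postprocessing step suggested above.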
After 20 years, the globally optimal Boggle board - https://news.ycombinator.com/item?id=43774702 - April 2025 (23 comments)
How did that spend only 6 hours on HN's frontpage? I'm gonna email danvk right now
Nice, I'm a big fan of this combo! Hits the right balance of prototype speed plus performance.
Solving Boggle would be a welcome addition to my Branch and Bound examples.
I teach an Algorithms course, and my Branch and Bound lecture usually involves solving integer linear optimization problems. Those are quite dry.
Now, for Branch and Bound to work you need what I informally call a "cheat code": a way of quickly solving a relaxed, easier version of the problem, where "quickly" means with lower time complexity.
So what is the approach / key insight to calculate upper and lower bound estimates in Boggle?
There are some examples in these old posts:
- https://www.danvk.org/wp/2009-08-08/breaking-3x3-boggle/inde...
- https://www.danvk.org/wp/2009-08-11/a-few-more-boggle-exampl...
These bounds are pretty effective at finding the global max for 3x3 Boggle, but 4x4 is a lot bigger.
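To make the idea of a relaxation concrete, here is a minimal sketch of one simple upper bound, not necessarily the one used in the linked posts. It assumes a "board class" where each cell holds a set of allowed letters, and it counts every word that can be spelled on any board in the class. A real board finds only a subset of those words, so the sum can never be too low. The function names, the toy word list, and the 2x2 grid are all illustrative assumptions, and Q is not treated as "Qu" here.

```python
from itertools import product


def word_score(word):
    """Standard Boggle scoring by word length."""
    n = len(word)
    if n < 3:
        return 0
    if n <= 4:
        return 1
    return {5: 2, 6: 3, 7: 5}.get(n, 11)


def neighbors(i, w, h):
    """Indices of the up-to-8 cells adjacent to cell i on a w x h grid."""
    x, y = i % w, i // w
    return [ny * w + nx
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))
            if (nx, ny) != (x, y)]


def can_spell(word, cells, w, h):
    """True if `word` fits some path where each cell allows the needed letter."""
    def dfs(i, k, used):
        if word[k] not in cells[i]:
            return False
        if k == len(word) - 1:
            return True
        used = used | {i}
        return any(dfs(j, k + 1, used) for j in neighbors(i, w, h) if j not in used)

    return any(dfs(i, 0, frozenset()) for i in range(w * h))


def upper_bound(words, cells, w, h):
    """Sum the scores of all words spellable anywhere in the board class."""
    return sum(word_score(wd) for wd in set(words)
               if len(wd) >= 3 and can_spell(wd, cells, w, h))


# Toy 2x2 class: the first cell may be "c" or "r", the others are fixed.
toy_words = ["cat", "rat", "cats", "rats", "tact"]
toy_class = ["cr", "a", "t", "s"]
bound = upper_bound(toy_words, toy_class, 2, 2)
# With single-letter cells the same function gives the exact score of a board.
best = max(upper_bound(toy_words, list(b), 2, 2) for b in product(*toy_class))
print(bound, best)
```

On this toy class the bound exceeds the best real score because it lets the first cell be "c" and "r" at once. Branch and bound splits the class whenever the bound is still above the best known score and discards it otherwise.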
There is a mapping from Boggle optimization to ILP, but I’ve seen no evidence that this is an efficient way to solve it. I’ve been told that branch and cut is usually better than branch and bound, but I don’t know whether it’s applicable to Boggle.
Scroll down to 'Sum/Choice trees' or Ctrl+F max_bound
https://mudcat.org/thread.cfm?threadid=45390#671943
Added: I think s/he got some of the words slightly wrong!
That said, both kinds of play have their place, in my life at least. Staying on the topic of shooters, when I play online in ranked, it's all out for me, and I enjoy that as well, in a different way. But when playing with my wife, it's never all out, always friendly.
You just have to find people to play with who are like you. Then you'll have fun. Our family rules didn't even allow 2 letter words, for example.
Our family compromise has always been, if it's valid and you know its definition (like "qi" and "za"), you can play it.
The storage i/o speeds were also crap (quoting myself here from the notes I made at the time) compared to what I got soon after when I bought a 70€ TLC SSD as upgrade for the laptop-server. They claim it runs on SSDs but, if it does, they're the world's cheapest bulk data SSDs or they have a bottleneck in ferrying data between the SAN and the VPS host, even for sequential r/w (that it has higher iops latency, I could understand)
At work, we sometimes use AWS GPUs for password cracking (we do security audits and sometimes find hashes). The GPUs that employees have just to play some games are faster than these and cost a lot less to operate, but we can't put customer data on private systems
For occasional problems that can be parallelised, sure, rent a hundred temporary systems at a cloud farm for two days; but in general I just can't recommend it to anyone. It costs an arm and a leg compared to cheaper providers or buying hardware yourself (if you're into being the sysadmin). The "we have a fleet of systems on stand-by for you" (so you can scale on demand) is a big premium that most seem unaware they're paying implicitly
I think this is a really interesting problem but I have to admit that if I'd heard it stated I would have guessed the answer was already known. I love the persistence on display here in spite of it being a "low status" problem. Reminds me of the recent discovery of a new largest (Mersenne?) prime, just someone getting nerd sniped and willing to spend their time and money.
> It took 23,000 CPU hours on a high-end 192-core machine in the cloud — time worth about $1,200, across five human days.
Pardon the pun but the sheer number of possible boards is mind boggling. Impressive how he managed to cut it down by orders of magnitude.
Then you find that 26^16 is less than 6^16*16!
https://www.danvk.org/wp/2007-08-02/how-many-boggle-boards-a...
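The comparison above is easy to check with exact integer arithmetic. This sketch assumes that 26^16 counts 4x4 boards where any cell can hold any letter, and that 6^16 * 16! counts the ways to place 16 six-sided dice and pick a face on each, ignoring board symmetries and repeated faces. The variable names are illustrative.

```python
from math import factorial

# Boards where each of the 16 cells can be any of 26 letters.
any_letter_boards = 26 ** 16

# Arrangements of 16 distinct dice (16!) times one of 6 faces per die (6^16).
dice_arrangements = 6 ** 16 * factorial(16)

print(any_letter_boards)
print(dice_arrangements)
print(any_letter_boards < dice_arrangements)
print(dice_arrangements / any_letter_boards)
```

The dice count comes out on the order of a thousand times larger, although many of those arrangements produce the same letters on the board.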