Just ran into your writeup a day ago. Truly amazing!
Disposal8433 · 23h ago
To you and OP and the others: you're crazy or geniuses, but thanks for posting because that's inspiring whether I understand it or not.
jonahx · 12h ago
Beautiful.
Unrelated, but what would be your advice/study plan for a human to reach near perfect game play in Connect 4, and how difficult would it be?
tromp · 10h ago
I can recommend this book [1] on connect-4 by James Allen, the first person to solve the game. It remains very difficult for humans to play near perfectly, as the harder puzzles in the book demonstrate.

[1] https://www.waterstones.com/book/the-complete-book-of-connec...
Also, once you get used to it, it makes writing large lambda terms faster, easier and more intuitive. The terms all just kind of magically interconnect in your mind after a while. At least that has been my experience after writing a lot of code in pure de Bruijn-indexed lambda calculus using bruijn [1]. Otherwise, thinking a few reduction steps ahead in complex terms requires alpha conversion, which often ends in confusion.
Similarly, it seems like languages with de Bruijn indices are immune to LLMs, since they require an internal stack/graph of sorts and don't reduce only on a textual basis.

[1]: https://bruijn.marvinborner.de/
> Similarly, it seems like languages with de Bruijn indices are immune to LLMs
I think large chain-of-thought LLMs (e.g. o3) might be able to manage.
kccqzy · 1d ago
While I have used de Bruijn indices successfully in the past, a part of me wonders: why can't this problem be solved at the parser level, before we even get a syntax tree? The parser knows when we are introducing a new variable in a lambda abstraction, and it can keep a stack of variables to give them fresh names, so that later passes won't have to worry about name conflicts. When writing a parser I basically already need to keep a stack of environments to resolve names. I sometimes keep variable names as simply the source code range where they are first introduced: so instead of dealing with the variable "f" I deal with (1,4)-(1,5), meaning the variable introduced on line 1 between columns 4 and 5; here I can still recover the string-based name, but I also get to differentiate between identically named variables defined in different locations. Is this a viable idea?
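A minimal sketch of that parse-time renaming pass (my own illustration, not kccqzy's actual code; the term encoding and names are invented for the example):

    from itertools import count

    fresh = count()  # global counter, kept simple for the sketch

    # Named terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)
    def freshen(term, env=None):
        # Rename every bound variable to a globally unique name,
        # tracking the enclosing scopes in env.
        env = env or {}
        tag = term[0]
        if tag == "var":
            return ("var", env.get(term[1], term[1]))  # free variables keep their name
        if tag == "lam":
            new = term[1] + "#" + str(next(fresh))
            return ("lam", new, freshen(term[2], {**env, term[1]: new}))
        return ("app", freshen(term[1], env), freshen(term[2], env))

    # (\ x . x (\ x . x))  becomes  (\ x#0 . x#0 (\ x#1 . x#1))
    freshen(("lam", "x", ("app", ("var", "x"), ("lam", "x", ("var", "x")))))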
What happens in the evaluator when you have

    (\ a . a a) (\ x . \ y . x y)

Variables are uniquely named at parse time, but after one beta step, you have two distinct instances each of x and y, albeit in different scopes, and after a second beta step, you have two distinct variables named y in the same scope.
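Spelling the steps out (my annotation of the example above; the prime marks the rename that a naive substitution would skip):

    (\ a . a a) (\ x . \ y . x y)
    => (\ x . \ y . x y) (\ x . \ y . x y)   // two copies of x and y, in different scopes
    => \ y . (\ x . \ y . x y) y             // two variables named y in the same scope
    => \ y . \ y' . y y'                     // one step further: without the rename, y would be captured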
kccqzy · 21h ago
Interesting example! I think at that time I was either working with simply typed lambda calculus or something slightly better (maybe a Hindley-Milner-based system) where the omega combinator isn't allowed. Do you know what sort of type system restriction would make the above idea sound?
cvoss · 17h ago
If you'll permit me to tweak the example, I'll now instead choose
(\ a . a a) (\ x . \y . y x)
=> \ y . y (\ x . \ y' . y' x)
which still exhibits the variable shadowing problem.
In a good language (where "good" means the type system lets you do as much as you want to do while still being sound) you should be able to find some type for omega. The question is whether you can find the right type for omega to let us do what we want for this example.
Try the simple thing first:
w := (\ a : * -> * . a a) : (* -> *) -> * // works because (* -> *) is a subtype of *
Here is a bad thing you shouldn't be able to do with omega:
w w : untypable // thankfully, because (* -> *) -> * is not a subtype of * -> *
And here is a perfectly safe thing to do, which is exactly our example:
f := (\ x : * . \ y : * -> T . y x) : * -> (* -> T) -> T
w f : *
Unhappily such a weak type makes it pretty unusable within larger terms... So alternatively, we can try harder to find a good type for omega
w := (\ a : * -> (* -> T) -> T . a a) : (* -> (* -> T) -> T) -> (* -> T) -> T
w f : (* -> T) -> T // better!
Something similar would happen with the Y combinator. In a good language, there ought to be many valid types you can find for `\ f . (\ x . f x x) (\ x . f x x)` but not any that would permit you to construct a diverging computation with it.
trealira · 22h ago
> This is "the capture problem", and it is quite annoying. Generally to avoid this, we need to rename everything1 in our "substituted" term to names that are free (do not occur) in the "subsitutee" so nothing is accidentally captured. What if we could introduce a notation that avoids this?
I've heard of this problem, but I've never really gotten why the solution in lambda calculus is to rename variables in lower scopes with overlapping names so that everything has a globally unique name. The way that programming languages usually solve this is by keeping a stack of symbol tables for scopes, right? Why is that never presented as a possible solution?
Of course, De Bruijn notation seems to sidestep that completely, which is convenient. I guess I'm just commenting on lambda calculus implementations in general.
enricozb · 22h ago
Lambda variables are also scoped, in the sense that a bound variable shadows one with the same name if it is in scope at bind-time. When looking at lambda calculus as purely a term rewriting system, the rule for substitution is just a bit nuanced, that's all. With DeBruijn notation, substitution is a much simpler operation.
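For concreteness, here is a minimal sketch of that "much simpler operation" in Python, following the usual shift/substitute presentation of de Bruijn terms (the encoding and names are my own, not from the article):

    # Terms: ("var", i) | ("lam", body) | ("app", fun, arg)

    def shift(term, d, cutoff=0):
        # Add d to every free variable, i.e. every index >= cutoff.
        tag = term[0]
        if tag == "var":
            return ("var", term[1] + d) if term[1] >= cutoff else term
        if tag == "lam":
            return ("lam", shift(term[1], d, cutoff + 1))
        return ("app", shift(term[1], d, cutoff), shift(term[2], d, cutoff))

    def subst(term, j, repl):
        # Replace variable j in term by repl, shifting repl under each binder.
        tag = term[0]
        if tag == "var":
            return repl if term[1] == j else term
        if tag == "lam":
            return ("lam", subst(term[1], j + 1, shift(repl, 1)))
        return ("app", subst(term[1], j, repl), subst(term[2], j, repl))

    def beta(body, arg):
        # One beta step: (lam body) applied to arg.
        return shift(subst(body, 0, shift(arg, 1)), -1)

    # (λ0) applied to (λ0) reduces to (λ0):
    beta(("var", 0), ("lam", ("var", 0)))  # => ("lam", ("var", 0))

No renaming or fresh-name generation appears anywhere, which is the point.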
chriswarbo · 18h ago
> I've heard of this problem, but I've never really gotten why the solution in lambda calculus is to rename variables in lower scopes with overlapping names so that everything has a globally unique name. The way that programming languages usually solve this is by keeping a stack of symbol tables for scopes, right? Why is that never presented as a possible solution?
That's a perfectly fine way to implement this-or-that language (and is hence why many/most programming languages do it). However, it's not a valid solution in lambda calculus, since lambda calculus only has variables, lambdas, and applications; there's literally no way to write down "a stack of symbol tables" or what have you. (Note that you can implement such a thing using lambda calculus, but that would be more like an interpreter for some other language; "running" that implementation would still require a way to beta-reduce the "host" language in a capture-avoiding way.)
torstenvl · 21h ago
The idea is that you want functions that can return functions, those inner functions being defined in part by the arguments to the outer function. This can't happen without intentionally leaky namespaces.
For example, in Python:

    def stepincrementgenerator(step):
        return lambda num: num + step

    addsix = stepincrementgenerator(6)
    addsix(5)  # => 11
hollerith · 18h ago
Stack of symbol tables is more elegant, but is only understandable by programmers whereas maybe lambda calc gets taught to a lot of non-programmers?
Note that lambda calc was invented ("defined") before the invention of the digital computer.
rtpg · 12h ago
A stack of symbol tables is more machinery. It might also be hard to prove that the stack of symbol tables strategy is actually correct!
A big part of the value in the lambda calculus is in its aggressive simplicity meaning that you can do a bunch of theoretical work with relatively few concepts.
The fewer concepts and moving parts, the less proving is necessary. And then you can layer things on and prove that they also fit into your calculus.
I think most LC work _isn't_ about implementing something based on LC so much as providing theoretical foundations for your work being sound. And in that model you are likely going to focus on theoretical ease.
hollerith · 45m ago
Yeah. It is coming back to me now: in lambda calc, there is no separate universe of values to which the universe of lambda expressions might be mapped. Instead, there is a process, beta reduction, that maps lambda expressions to lambda expressions in normal form (when a normal form exists).
The expression λf.λx.f(f(fx)) for example is in normal form. The number 3 and the string "3" do not exist in the formalism, but at least some of the time, λf.λx.x is used to represent 0, λf.λx.fx to represent 1, λf.λx.f(fx) to represent 2, λf.λx.f(f(fx)) to represent 3, etc.
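The same encodings can be written down directly in any language with first-class functions. A quick Python sketch of the numerals above (my own illustration, with a decoder that counts the applications of f):

    zero  = lambda f: lambda x: x            # λf.λx.x
    one   = lambda f: lambda x: f(x)         # λf.λx.f x
    two   = lambda f: lambda x: f(f(x))      # λf.λx.f (f x)
    three = lambda f: lambda x: f(f(f(x)))   # λf.λx.f (f (f x))

    to_int = lambda n: n(lambda k: k + 1)(0)
    to_int(three)  # => 3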
Again, lambda calc precedes the digital computer.
practal · 10h ago
De Bruijn indices are great; I use them to construct what I call the "De Bruijn Abstraction Algebra". It comes in two forms: one is used to give a characterisation of alpha equivalence for abstraction algebra, the other is used to prove completeness of abstraction logic. It is described in [1], where I have described and proven everything correct in excruciating detail. That makes it quite a tough read, although in the end it is elementary and simple.

[1] http://abstractionlogic.com
I used de Bruijn indexes in Plumb[0], which is an alternative syntax for Lambda Calculus that embeds nicely in common scripting languages like PHP and Python.
Whilst a classic lambda expression like `λx.y` has three parts: the argument name `x`, the body/result `y`, and the `λ_._` syntax to distinguish it from a variable or application; if we instead use de Bruijn indices, then we don't need a separate part for the argument: all we need is a body wrapped in some distinguishing syntax (like `λy` in the article).
In Plumb, that distinguishing syntax is square brackets, so the "lambda term" `λy` would be written `[y]`. This wouldn't be possible if we had to write argument names, but ends up having some nice consequences:
- We can write this as-is in many languages, and the result is a data structure (usually a list) which we can easily walk over with an interpreter function.
- Variables are numbers, which can also be written as-is in many languages, and result in data that an interpreter function can handle.
- The final piece of syntax is applications, which necessarily has two parts (the function and the input value). Classic Lambda Calculus uses juxtaposition, whilst Plumb uses an explicit binary operator: the comma `,`. Again, commas can be written as-is in many languages, and they interact nicely with square bracket notation, (host language) function application, and (in the case of Python and Haskell) tuple notation.
The nice thing about using de Bruijn indexen, rather than de Bruijn "levels" or some other approaches to anonymous bindings (like those mentioned at the end of TFA), is that functions defined this way have an "intrinsic meaning"; e.g. `λ0` (or `[0]` in Plumb) is always the identity function; `λλλ(2 0 1)` (or `[[[2 , 0 , 1]]]` in Plumb) is always boolean NOT; etc. This lets us mix and match snippets of Plumb code with snippets of host-language code, embed each inside the other, and sprinkle around calls to an interpreter function willy-nilly, without worrying that the meaning depends on where our code ends up (e.g. at which "level" it reaches an interpreter).

[0] http://www.chriswarbo.net/projects/plumb/using.html
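As a quick check of that "intrinsic meaning" claim, here is `λλλ(2 0 1)` with names restored, applied to Church booleans in Python (a sketch of my own, not Plumb itself):

    true  = lambda a: lambda b: a                  # λλ1
    false = lambda a: lambda b: b                  # λλ0
    NOT   = lambda p: lambda a: lambda b: p(b)(a)  # λλλ(2 0 1)

    NOT(true)(1)(0)   # => 0, i.e. it selects like false
    NOT(false)(1)(0)  # => 1, i.e. it selects like true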
> Now we're cool! Everything works as expected, and it takes much less work
Does it though? It's about the same amount of work as renaming potentially conflicting variables (and more work than renaming only the actually conflicting ones). AFAIK, the main argument for them (by B. C. Pierce in his TAPL?) is that if you botch up the implementation of de Bruijn indices/levels, it will blow up in your face almost immediately, while the conventional renaming schemes may have subtle undetected bugs for months.
hoping1 · 23h ago
This and the other comment under this seem to be talking about the work the computer is doing at runtime. I believe the point is about the developer's work in implementing this. (For example, renaming potentially conflicting variables is clearly _less_ work than renaming only the actually conflicting variables, in this framing.) This kind of work is important for (1) compilers whose bugs are potentially very financially expensive, and (2) compilers with very intentionally-portable implementations (including but not limited to educational implementations)
gfaster · 23h ago
From my perspective, not having to deal with strings is one of the big benefits, but I believe the main argument is actually that it trivializes alpha equivalence.
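For instance, with a de Bruijn encoding like the tuple one sketched earlier in the thread, alpha-equivalence really is plain structural equality (a small illustration of my own):

    identity_a = ("lam", ("var", 0))  # from λx.x
    identity_b = ("lam", ("var", 0))  # from λy.y
    identity_a == identity_b          # => True, no renaming needed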
As for renaming, if you're going to be looking at every binding anyway, changing most of them is no big deal (unless of course that entails doing many deep copies)
gfaster · 1d ago
I used De Bruijn indices when attempting to prove correctness of a toy language of mine. It's an elegant solution, but I found them exceedingly difficult to reason about both intuitively and in proofs. That being said, I've yet to find a better system (that isn't just augmenting De Bruijn indices) for implementing any sort of lambda calculus.
matheist · 22h ago
The Faust compiler uses de Bruijn indices internally to reuse computations. Anyone else know any other examples?

https://github.com/grame-cncm/faust
Idris[1] and most of the dependent typed languages that I've looked at use de Bruijn numbers. (As does my own[2].)
The Idris implementation also has a list of names in scope as an argument to the type of a Term, which makes the compiler (also Idris) check if you're making a mistake with your de Bruijn indices. (I did not do this in mine, but I should have.)
Edit: Oh, I see you mention reusing computations, that may be a bit more subtle. I'll have to look at your repo to see what you mean by that.

[1]: https://github.com/idris-lang/Idris2
[2]: https://github.com/dunhamsteve/newt
Code: http://t3x.org/clc/code.html

If you know Scheme, you almost don't need the book.
guerrilla · 23h ago
I've implemented this several times before (years ago), but someone tell me: why don't we just consider variables unique? Each instance of a name* should have an absolute (i.e. global) sequence number associated with it. That seems simpler than this relative sequence numbering and the complexity that comes with it.
Scrolling the comments, I see someone else mentions this as well. I feel like this is something that got stuck in academia hell but in practice could be solved in more practical ways. I remember various papers on fresh naming and name calculi from the 00s, but why? We have a parser and we have numbers.
The reason I bring this up is because I recall it being easy to introduce bugs into de Bruijn calculi and also adding symbolic debugging to real-world languages was annoying for reasons I have no memory of.
* Note I said name, not variable. (\ x . x (\x . x)) would be (\ x#0 . x#0 (\x#1 . x#1)), similar to de Bruijn indices but it could differ in other situations.
hoping1 · 23h ago
See cvoss's comment in another thread:
"
What happens in the evaluator when you have
(\ a . a a) (\ x . \ y . x y)
Variables are uniquely named at parse time, but after one beta step, you have two distinct instances each of x and y, albeit in different scopes, and after a second beta step, you have two distinct variables named y in the same scope.
"
This doesn't come up in most mainstream languages, but Zig and Jai are getting people excited about comptime evaluation, and dependent types need comptime evaluation as well. Uniqueness is fine until you're doing higher-order evaluation in your compilation pipeline like this, which will be more and more desirable in the future, at least if the current trends are any indication.
This implies that applying a function is no longer the same as a plain substitution. Moreover this can be pretty expensive computationally.
guerrilla · 11h ago
The semantics are the same for the user and debugging works, which is what matters.
If you're using an object oriented language for implementation then this is already the computational cost. If you're using a functional language then yes there is some space overhead because it breaks sharing but that's the price. Was there something else you were thinking of?
SkiFire13 · 5h ago
It's not just space overhead: it requires you to recursively walk the argument of the application and replace all of its bound variables with fresh ones, once for every occurrence of the parameter in the function body. This means that function application is no longer O(size of the function body) but rather O(size of the function body + occurrences of the parameter * size of the argument). If the argument is big, this makes function application much slower.