Just ran into your writeup a day ago. Truly amazing!
Disposal8433 · 23h ago
To you and OP and the others: you're crazy or geniuses, but thanks for posting because that's inspiring whether I understand it or not.
jonahx · 12h ago
Beautiful.
Unrelated, but what would be your advice/study plan for a human to reach near perfect game play in Connect 4, and how difficult would it be?
tromp · 10h ago
I can recommend this book [1] on connect-4 by James Allen, the first person to solve the game. It remains very difficult for humans to play near perfectly, as the harder puzzles in the book demonstrate.

[1] https://www.waterstones.com/book/the-complete-book-of-connec...
Also, once you get used to it, it makes writing large lambda terms faster, easier and more intuitive. The terms all just kind of magically interconnect in your mind after a while. At least that has been my experience after writing a lot of code in pure de Bruijn-indexed lambda calculus using bruijn [1]. Otherwise, thinking a few reduction steps ahead in complex terms requires alpha conversion, which often ends in confusion.
Similarly, it seems like languages with de Bruijn indices are immune to LLMs, since they require an internal stack/graph of sorts and don't reduce only on a textual basis.

[1]: https://bruijn.marvinborner.de/
> Similarly, it seems like languages with de Bruijn indices are immune to LLMs
I think large chain-of-thought LLMs (e.g. o3) might be able to manage.
kccqzy · 1d ago
While I have used de Bruijn indices successfully in the past, a part of me wonders: why can't this problem be solved at the parser level, before we even get a syntax tree? The parser knows when we are introducing a new variable in a lambda abstraction, and it can keep a stack of variables to give them fresh names, so that later passes won't have to worry about name conflicts. When writing a parser I basically already need to keep a stack of environments to resolve names. I sometimes keep variable names as simply the source code range where they are first introduced: so instead of dealing with the variable "f" I deal with (1,4)-(1,5), meaning the variable introduced on line 1 between columns 4 and 5; here I can still recover the string-based name, but I also get to differentiate between identically named variables defined in different locations. Is this a viable idea?
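A minimal sketch of that parse-time renaming pass (my own illustration, not kccqzy's actual code; the term encoding and names are invented for the example):

    from itertools import count

    fresh = count()  # global counter, kept simple for the sketch

    # Named terms: ("var", name) | ("lam", name, body) | ("app", fun, arg)
    def freshen(term, env=None):
        # Rename every bound variable to a globally unique name,
        # tracking the enclosing scopes in env.
        env = env or {}
        tag = term[0]
        if tag == "var":
            return ("var", env.get(term[1], term[1]))  # free variables keep their name
        if tag == "lam":
            new = term[1] + "#" + str(next(fresh))
            return ("lam", new, freshen(term[2], {**env, term[1]: new}))
        return ("app", freshen(term[1], env), freshen(term[2], env))

    # (\ x . x (\ x . x))  becomes  (\ x#0 . x#0 (\ x#1 . x#1))
    freshen(("lam", "x", ("app", ("var", "x"), ("lam", "x", ("var", "x")))))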
What happens in the evaluator when you have

    (\ a . a a) (\ x . \ y . x y)

Variables are uniquely named at parse time, but after one beta step, you have two distinct instances each of x and y, albeit in different scopes, and after a second beta step, you have two distinct variables named y in the same scope.
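Spelling the steps out (my annotation of the example above; the prime marks the rename that a naive substitution would skip):

    (\ a . a a) (\ x . \ y . x y)
    => (\ x . \ y . x y) (\ x . \ y . x y)   // two copies of x and y, in different scopes
    => \ y . (\ x . \ y . x y) y             // two variables named y in the same scope
    => \ y . \ y' . y y'                     // one step further: without the rename, y would be captured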
kccqzy · 21h ago
Interesting example! I think at that time I was either working with simply typed lambda calculus or something slightly better (maybe a Hindley-Milner-based system) where the omega combinator isn't allowed. Do you know what sort of type system restriction would make the above idea sound?
cvoss · 17h ago
If you'll permit me to tweak the example, I'll now instead choose
(\ a . a a) (\ x . \y . y x)
=> \ y . y (\ x . \ y' . y' x)
which still exhibits the variable shadowing problem.
In a good language (where "good" means the type system lets you do as much as you want to do while still being sound) you should be able to find some type for omega. The question is whether you can find the right type for omega to let us do what we want for this example.
Try the simple thing first:
w := (\ a : * -> * . a a) : (* -> *) -> * // works because (* -> *) is a subtype of *
Here is a bad thing you shouldn't be able to do with omega:
w w : untypable // thankfully, because (* -> *) -> * is not a subtype of * -> *
And here is a perfectly safe thing to do, which is exactly our example:
f := (\ x : * . \ y : * -> T . y x) : * -> (* -> T) -> T
w f : *
Unhappily such a weak type makes it pretty unusable within larger terms... So alternatively, we can try harder to find a good type for omega
w := (\ a : * -> (* -> T) -> T . a a) : (* -> (* -> T) -> T) -> (* -> T) -> T
w f : (* -> T) -> T // better!
Something similar would happen with the Y combinator. In a good language, there ought to be many valid types you can find for `\ f . (\ x . f x x) (\ x . f x x)` but not any that would permit you to construct a diverging computation with it.
trealira · 22h ago
> This is "the capture problem", and it is quite annoying. Generally to avoid this, we need to rename everything1 in our "substituted" term to names that are free (do not occur) in the "subsitutee" so nothing is accidentally captured. What if we could introduce a notation that avoids this?
I've heard of this problem, but I've never really gotten why the solution in lambda calculus is to rename variables in lower scopes with overlapping names so that everything has a globally unique name. The way that programming languages usually solve this is by keeping a stack of symbol tables for scopes, right? Why is that never presented as a possible solution?
Of course, De Bruijn notation seems to sidestep that completely, which is convenient. I guess I'm just commenting on lambda calculus implementations in general.
enricozb · 22h ago
Lambda variables are also scoped, in the sense that a bound variable shadows one with the same name if it is in scope at bind-time. When looking at lambda calculus as purely a term rewriting system, the rule for substitution is just a bit nuanced, that's all. With DeBruijn notation, substitution is a much simpler operation.
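For concreteness, here is a minimal sketch of that "much simpler operation" in Python, following the usual shift/substitute presentation of de Bruijn terms (the encoding and names are my own, not from the article):

    # Terms: ("var", i) | ("lam", body) | ("app", fun, arg)

    def shift(term, d, cutoff=0):
        # Add d to every free variable, i.e. every index >= cutoff.
        tag = term[0]
        if tag == "var":
            return ("var", term[1] + d) if term[1] >= cutoff else term
        if tag == "lam":
            return ("lam", shift(term[1], d, cutoff + 1))
        return ("app", shift(term[1], d, cutoff), shift(term[2], d, cutoff))

    def subst(term, j, repl):
        # Replace variable j in term by repl, shifting repl under each binder.
        tag = term[0]
        if tag == "var":
            return repl if term[1] == j else term
        if tag == "lam":
            return ("lam", subst(term[1], j + 1, shift(repl, 1)))
        return ("app", subst(term[1], j, repl), subst(term[2], j, repl))

    def beta(body, arg):
        # One beta step: (lam body) applied to arg.
        return shift(subst(body, 0, shift(arg, 1)), -1)

    # (λ0) applied to (λ0) reduces to (λ0):
    beta(("var", 0), ("lam", ("var", 0)))  # => ("lam", ("var", 0))

No renaming or fresh-name generation appears anywhere, which is the point.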
chriswarbo · 18h ago
> I've heard of this problem, but I've never really gotten why the solution in lambda calculus is to rename variables in lower scopes with overlapping names so that everything has a globally unique name. The way that programming languages usually solve this is by keeping a stack of symbol tables for scopes, right? Why is that never presented as a possible solution?
That's a perfectly fine way to implement this-or-that language (and is hence why many/most programming languages do it). However, it's not a valid solution in lambda calculus, since lambda calculus only has variables, lambdas, and applications; there's literally no way to write down "a stack of symbol tables" or what have you. (Note that you can implement such a thing using lambda calculus, but that would be more like an interpreter for some other language; "running" that implementation would still require a way to beta-reduce the "host" language in a capture-avoiding way.)
torstenvl · 21h ago
The idea is that you want functions that can return functions, those inner functions being defined in part by the arguments to the outer function. This can't happen without intentionally leaky namespaces.
For example, in Python:

    def stepincrementgenerator(step):
        return lambda num: num + step

    addsix = stepincrementgenerator(6)
    addsix(5)  # => 11
hollerith · 18h ago
Stack of symbol tables is more elegant, but is only understandable by programmers whereas maybe lambda calc gets taught to a lot of non-programmers?
Note that lambda calc was invented ("defined") before the invention of the digital computer.
rtpg · 12h ago
A stack of symbol tables is more machinery. It might also be hard to prove that the stack of symbol tables strategy is actually correct!
A big part of the value in the lambda calculus is in its aggressive simplicity meaning that you can do a bunch of theoretical work with relatively few concepts.
The fewer concepts and moving parts, the less proving is necessary. And then you can layer things on and prove that they also fit into your calculus.
I think most LC work _isn't_ about implementing something based on LC so much as providing theoretical foundations for your work being sound. And in that model you are likely going to focus on theoretical ease.
hollerith · 45m ago
Yeah. It is coming back to me now: in lambda calc, there is no separate universe of values to which the universe of lambda expressions might be mapped. Instead, there is a process, beta reduction, that maps lambda expressions to lambda expressions in normal form (when a normal form exists).
The expression λf.λx.f(f(fx)) for example is in normal form. The number 3 and the string "3" do not exist in the formalism, but at least some of the time, λf.λx.x is used to represent 0, λf.λx.fx to represent 1, λf.λx.f(fx) to represent 2, λf.λx.f(f(fx)) to represent 3, etc.
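The same encodings can be written down directly in any language with first-class functions. A quick Python sketch of the numerals above (my own illustration, with a decoder that counts the applications of f):

    zero  = lambda f: lambda x: x            # λf.λx.x
    one   = lambda f: lambda x: f(x)         # λf.λx.f x
    two   = lambda f: lambda x: f(f(x))      # λf.λx.f (f x)
    three = lambda f: lambda x: f(f(f(x)))   # λf.λx.f (f (f x))

    to_int = lambda n: n(lambda k: k + 1)(0)
    to_int(three)  # => 3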
Again, lambda calc precedes the digital computer.
practal · 10h ago
De Bruijn indices are great; I use them to construct what I call the "De Bruijn Abstraction Algebra". It comes in two forms: one is used to give a characterisation of alpha equivalence for abstraction algebra, the other is used to prove completeness of abstraction logic. It is described in [1], where I have described and proven everything correct in excruciating detail. That makes it quite a tough read, although in the end it is elementary and simple.

[1] http://abstractionlogic.com
I used de Bruijn indexes in Plumb[0], which is an alternative syntax for Lambda Calculus that embeds nicely in common scripting languages like PHP and Python.
Whilst a classic lambda expression like `λx.y` has three parts: the argument name `x`, the body/result `y`, and the `λ_._` syntax to distinguish it from a variable or application; if we instead use de Bruijn indices, then we don't need a separate part for the argument: all we need is a body wrapped in some distinguishing syntax (like `λy` in the article).
In Plumb, that distinguishing syntax is square brackets, so the "lambda term" `λy` would be written `[y]`. This wouldn't be possible if we had to write argument names, but ends up having some nice consequences:
- We can write this as-is in many languages, and the result is a data structure (usually a list) which we can easily walk over with an interpreter function.
- Variables are numbers, which can also be written as-is in many languages, and result in data that an interpreter function can handle.
- The final piece of syntax is applications, which necessarily has two parts (the function and the input value). Classic Lambda Calculus uses juxtaposition, whilst Plumb uses an explicit binary operator: the comma `,`. Again, commas can be written as-is in many languages, and they interact nicely with square bracket notation, (host language) function application, and (in the case of Python and Haskell) tuple notation.
The nice thing about using de Bruijn indexen, rather than de Bruijn "levels" or some other approaches to anonymous bindings (like those mentioned at the end of TFA), is that functions defined this way have an "intrinsic meaning"; e.g. `λ0` (or `[0]` in Plumb) is always the identity function; `λλλ(2 0 1)` (or `[[[2 , 0 , 1]]]` in Plumb) is always boolean NOT; etc. This lets us mix and match snippets of Plumb code with snippets of host-language code, embed each inside the other, and sprinkle around calls to an interpreter function willy-nilly, without worrying that the meaning depends on where our code ends up (e.g. at which "level" it reaches an interpreter).

[0] http://www.chriswarbo.net/projects/plumb/using.html
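As a quick check of that "intrinsic meaning" claim, here is `λλλ(2 0 1)` with names restored, applied to Church booleans in Python (a sketch of my own, not Plumb itself):

    true  = lambda a: lambda b: a                  # λλ1
    false = lambda a: lambda b: b                  # λλ0
    NOT   = lambda p: lambda a: lambda b: p(b)(a)  # λλλ(2 0 1)

    NOT(true)(1)(0)   # => 0, i.e. it selects like false
    NOT(false)(1)(0)  # => 1, i.e. it selects like true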
> Now we're cool! Everything works as expected, and it takes much less work
Does it though? It's about the same amount of work as renaming potentially conflicting variables (and more work than renaming only the actually conflicting ones). AFAIK, the main argument for them (by B. C. Pierce in his TAPL?) is that if you botch up the implementation of de Bruijn indices/levels, it will blow up in your face almost immediately, while the conventional renaming schemes may have subtle undetected bugs for months.
hoping1 · 23h ago
This and the other comment under this seem to be talking about the work the computer is doing at runtime. I believe the point is about the developer's work in implementing this. (For example, renaming potentially conflicting variables is clearly _less_ work than renaming only the actually conflicting variables, in this framing.) This kind of work is important for (1) compilers whose bugs are potentially very financially expensive, and (2) compilers with very intentionally-portable implementations (including but not limited to educational implementations)
gfaster · 23h ago
From my perspective, not having to deal with strings is one of the big benefits, but I believe the main argument is actually that it trivializes alpha equivalence.
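For instance, with a de Bruijn encoding like the tuple one sketched earlier in the thread, alpha-equivalence really is plain structural equality (a small illustration of my own):

    identity_a = ("lam", ("var", 0))  # from λx.x
    identity_b = ("lam", ("var", 0))  # from λy.y
    identity_a == identity_b          # => True, no renaming needed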
As for renaming, if you're going to be looking at every binding anyway, changing most of them is no big deal (unless of course that entails doing many deep copies)
gfaster · 1d ago
I used De Bruijn indices when attempting to prove correctness of a toy language of mine. It's an elegant solution, but I found them exceedingly difficult to reason about both intuitively and in proofs. That being said, I've yet to find a better system (that isn't just augmenting De Bruijn indices) for implementing any sort of lambda calculus.
matheist · 22h ago
The Faust compiler uses de Bruijn indices internally to reuse computations. Anyone else know any other examples?

https://github.com/grame-cncm/faust
Idris[1] and most of the dependent typed languages that I've looked at use de Bruijn numbers. (As does my own[2].)
The Idris implementation also has a list of names in scope as an argument to the type of a Term, which makes the compiler (also Idris) check if you're making a mistake with your de Bruijn indices. (I did not do this in mine, but I should have.)
Edit: Oh, I see you mention reusing computations, that may be a bit more subtle. I'll have to look at your repo to see what you mean by that.

[1]: https://github.com/idris-lang/Idris2
[2]: https://github.com/dunhamsteve/newt
Code: http://t3x.org/clc/code.html

If you know Scheme, you almost don't need the book.
guerrilla · 23h ago
I've implemented this several times before (years ago), but someone tell me: why don't we just consider variables unique? Each instance of a name* should have an absolute (i.e. global) sequence number associated with it. That seems simpler than this relative sequence numbering and the complexity that comes with it.
Scrolling the comments, I see someone else mentions this as well. I feel like this is something that got stuck in academia hell but in practice could be solved in more practical ways. I remember various papers on fresh naming and name calculi from the 00s, but why? We have a parser and we have numbers.
The reason I bring this up is because I recall it being easy to introduce bugs into de Bruijn calculi and also adding symbolic debugging to real-world languages was annoying for reasons I have no memory of.
* Note I said name, not variable. (\ x . x (\x . x)) would be (\ x#0 . x#0 (\x#1 . x#1)), similar to de Bruijn indices but it could differ in other situations.
hoping1 · 23h ago
See cvoss's comment in another thread:
"
What happens in the evaluator when you have
(\ a . a a) (\ x . \ y . x y)
Variables are uniquely named at parse time, but after one beta step, you have two distinct instances each of x and y, albeit in different scopes, and after a second beta step, you have two distinct variables named y in the same scope.
"
This doesn't come up in most mainstream languages, but Zig and Jai are getting people excited about comptime evaluation, and dependent types need comptime evaluation as well. Uniqueness is fine until you're doing higher-order evaluation in your compilation pipeline like this, which will be more and more desirable in the future, at least if the current trends are any indication.
This implies that applying a function is no longer the same as a plain substitution. Moreover this can be pretty expensive computationally.
guerrilla · 11h ago
The semantics are the same for the user and debugging works, which is what matters.
If you're using an object oriented language for implementation then this is already the computational cost. If you're using a functional language then yes there is some space overhead because it breaks sharing but that's the price. Was there something else you were thinking of?
SkiFire13 · 5h ago
It's not just space overhead: it requires you to recursively walk the argument of the application and replace all of its bound variables with fresh ones, once for every occurrence of the parameter in the function body. This means that function application is no longer O(size of the function body) but rather O(size of the function body + occurrences of the parameter * size of the argument). If the argument is big, this makes function application much slower.