I always thought APL was written in the wrong direction. It writes like a concatenative language that's backwards: you tack things onto the front. NumPy fixes it by making the verbs all dotted function calls, effectively mirroring the order. E.g. in APL you write 10 10⍴⍳100, but in NumPy you write np.arange(1, 101).reshape(10, 10).
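To illustrate the mirrored reading order, here is a runnable sketch of the NumPy side (APL's ⍳ is 1-based with the default index origin, hence arange starting at 1):

```python
import numpy as np

# APL: 10 10⍴⍳100 reads right to left (make 1..100, reshape to 10x10).
# NumPy expresses the same pipeline left to right via method chaining:
grid = np.arange(1, 101).reshape(10, 10)

print(grid.shape)  # (10, 10)
print(grid[0, 0], grid[-1, -1])  # first and last elements: 1 and 100
```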
trjordan · 1h ago
Seems like it could easily be training data set size as well.
I'd love to see some quantification of errors in q/kdb+ (or hebrew) vs. languages of similar size that are left-to-right.
fer · 31m ago
>Seems like it could easily be training data set size as well.
I'm convinced that's the case. On any major LLM I can carpet-bomb Java/Python boilerplate without issue. For Rust, at least last time I checked, it comes up with non-existent traits and more frequent hallucinations, and struggles to use the context effectively. In agent mode it turns into a fist fight with the compiler, often ending in credit-destroying loops.
And don't get me started when using it for Nix...
So not surprised about something with orders of magnitude less public corpus.
dotancohen · 13m ago
I realized this too, and it led me to the conclusion that LLMs really can't program. I did some experiments to find what a programming language would look like, instead of e.g. python, if it were designed to be written and edited by an LLM. It turns out that it's extremely verbose, especially in variable names, function names, class names, etc. Actually, it turned out that classes were very redundant. But the real insight was that LLMs are great at naming things, and performing small operations on the little things they named. They're really not good at any logic that they can't copy paste from something they found on the web.
weird-eye-issue · 8m ago
> I did some experiments to find what a programming language would look like, instead of e.g. python, if it were designed to be written and edited by an LLM.
Did your experiment consist of asking an LLM to design a programming language for itself?
dotancohen · 4m ago
Yes. ChatGPT 4 and Claude 3.7. They led me to similar conclusions, but they produced very different syntax, which led me to believe that they were not just regurgitating from a common source.
dlahoda · 26m ago
i tried gemini, openai, copilot, and claude on a reasonably big rust project.
claude worked well for fixing use statements, clippy lints, renames, refactorings, and ci. i used the highest-cost claude with custom context per crate.
i was never able to get it to write new code well.
for nix, it is a nice template engine for getting started or searching. i did not try big nix changes.
gizmo686 · 48m ago
Hebrew is still written sequentially in Unicode. The right-to-left aspect there is simply about how the characters get displayed. In mixed documents, there are U+200E and U+200F to change the text direction mid-stream.
From the perspective of an LLM learning from Unicode, this would appear as a delimiter that needs to be inserted on language-direction boundaries; everything else should work the same.
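A small Python sketch of the logical-order point (the Hebrew word שלום here is just an example):

```python
# Hebrew text is stored in logical (first-typed-first) order;
# right-to-left rendering is purely a display concern.
hebrew = "שלום"  # "shalom": shin, lamed, vav, final mem
assert hebrew[0] == "ש"  # the first code point is the first-typed letter

# U+200E (LRM) and U+200F (RLM) are invisible marks that nudge the
# bidi algorithm at direction boundaries in mixed-direction text:
LRM = "\u200e"
RLM = "\u200f"
mixed = "version 2" + LRM + " של " + RLM + "the app"
print(len(mixed))  # the marks occupy code points but render as nothing
```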
cubefox · 28m ago
> Hebrew is still written sequentially
Everything is written sequentially in the sense that the character that is written first can only be followed by the character that is written next. In this sense writing non-sequentially is logically impossible.
dotancohen · 19m ago
An older Hebrew encoding actually encoded the last character first, then the penultimate character, then the character preceding that, etc.
Exercise for the reader: guess how line breaks, text wrapping, and search algorithms worked.
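A sketch of the difference, assuming the old encoding simply stored characters in visual (display) order rather than typing order:

```python
logical = "שלום"        # stored as typed: shin, lamed, vav, final mem
visual = logical[::-1]  # visual-order storage: leftmost displayed character first

# Converting between the two is a plain per-line reversal, which is why
# line breaking and wrapping had to be decided *before* encoding, and why
# a substring search against visual-order text needs a reversed needle:
needle = "של"
print(needle in logical)          # found in logical-order text
print(needle[::-1] in visual)     # must be reversed to match visual-order text
```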
goatlover · 21m ago
Multiple characters can be written at once; they can also be written in reverse or out of order.
cubefox · 17m ago
No no, the second character you write must always be temporally preceded by the character you wrote first. Otherwise the second wouldn't have been the second, but the first, and moreover, the first would have been the second, which it wasn't.
dotancohen · 11m ago
I encourage you to find some place that still uses a Hebrew typewriter. When they have to type numbers, they'll type the number in backwards. And an old Hebrew encoding also encoded characters in reverse order.
vessenes · 1h ago
Interesting. Upshot: right-to-left evaluation means you generally must start at the end, or at least hold an expression in working memory, and LLMs are not so good at that.
I wonder if diffusion models would be better at this; most start out as sequential token generators and then get finetuned.
Humans can't either? I think if this convention had been a more usable form of programming, we'd know by now.
anonzzzies · 26m ago
Once you get used to it, the traditional ways look tedious and annoying to me. I think the power is in 'once you get used to it'. That will keep out most people. Compare Python LLM implementations with k ones as a novice and you will see verbose unreadable stuff vs line noise. Once you learn the math, you see verbose code where the verbosity adds nothing at all vs exactly what you would write if you could.
maest · 55m ago
I think there is a reason for this, but maybe not a good one.
1. Function application should be left to right, e.g. `sqrt 4`
2. Precedence order should be very simple. In k, everything has the same precedence (with the exception of brackets)
Points 1 and 2 together force you into this right-to-left convention, annoyingly.
Fwiw, I think 2 is great and I would rather give up 1 than 2. However, writing function application as `my_fun arg` is a very strong convention.
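A toy Python sketch of rule 2, uniform precedence with right-to-left grouping; this is an illustration, not an actual k parser (no parentheses, and `%` is division as in k):

```python
import re

def eval_rtl(expr: str) -> float:
    """Evaluate + - * % with no precedence, grouping right to left (k-style)."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[+\-*%]", expr)
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "%": lambda a, b: a / b}
    # Fold from the right: a OP (everything to the right).
    result = float(tokens[-1])
    for i in range(len(tokens) - 2, 0, -2):
        op, left = tokens[i], float(tokens[i - 1])
        result = ops[op](left, result)
    return result

# 2*3+4 groups as 2*(3+4) = 14, unlike conventional precedence (10):
print(eval_rtl("2*3+4"))   # 14.0
print(eval_rtl("10-2-3"))  # 10-(2-3) = 11.0
```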
cess11 · 1h ago
"Claude is aware of that, but it struggled to write correct code based on those rules"
It's actually not, and unless they somehow run a rule engine on top of their LLM SaaS stuff, it seems far-fetched to believe it adheres to rule sets in any way.
Local models confuse Python, Elixir, PHP and Bash when I've tried to use them for coding. They seem more stable for JS, but sometimes they slip out of that too.
Seems pretty contrived and desperate to invent transpilers from quasi-Python to other languages to try and find a software development use for LLM SaaS. Warnings about Lisp macros and other code rewrite tools ought to apply here as well. Plus, of course, the loss of 'notation as a tool of thought'.
strangescript · 49m ago
If your model is getting confused by Python, it's a bad model. Python is routinely the best language for all major models.
cess11 · 10m ago
I don't know what counts as a major model. Relevant to this, I've dabbled with Gemma, Qwen, Mistral, Llama, Granite and Phi models, mostly 3-14b varieties but also some larger ones on CPU on a machine that has 64 GB RAM.
rob_c · 1h ago
Same reason the same models don't fundamentally understand all languages: they're not trained to. Frankly, the design changes needed to get this to work in training are minimal, but this isn't the way English works, so expect most of the corporate LLMs to struggle, because that's where the interest and money are.
Give it time until we have true globally multi lingual models for superior context awareness.
strangescript · 48m ago
A byte-tokenized model is naturally 100% multilingual across all languages in its data set. There just isn't a lot of reason for teams to spend the extra training time to build that sort of model.
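A sketch of why byte-level tokenization is language-agnostic: the vocabulary is just the 256 possible UTF-8 byte values, so any script round-trips identically.

```python
def byte_tokenize(text: str) -> list[int]:
    """Tokenize any Unicode text into a fixed vocabulary of 256 byte values."""
    return list(text.encode("utf-8"))

def byte_detokenize(tokens: list[int]) -> str:
    """Reassemble the original text from its byte tokens."""
    return bytes(tokens).decode("utf-8")

# English, Hebrew, and APL glyphs all round-trip through the same vocabulary:
for sample in ["hello", "שלום", "⍳100"]:
    tokens = byte_tokenize(sample)
    assert all(0 <= t < 256 for t in tokens)
    assert byte_detokenize(tokens) == sample
```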
benjaminwootton · 59m ago
I just submitted a similar article about using LLMs (Gemini and Claude) to write SQL which I found to be very successful.
As ClickHouse (which I used in that test) is sometimes compared with Kdb+ I thought it was worth dropping a link here.
As I mention in the article, I tried this stuff a year or two ago and it was complex and flaky. With MCP servers and better reasoning models, being able to ask questions in natural language is just about crossing over to being viable, IMO.
https://news.ycombinator.com/item?id=44509510