Cool stuff! I can see some GPT comments that can be removed
// Increased for better learning
this doesn't tell me anything
// Use the constants from lib.rs
const MAX_SEQ_LEN: usize = 80;
const EMBEDDING_DIM: usize = 128;
const HIDDEN_DIM: usize = 256;
these are already defined in lib.rs, so why not use them (as the comment suggests)?
sloppytoppy · 1h ago
Oh yea I'm totally running this on my hardware. Extra credit for "from scratch" in the title. The future sucks.
untrimmed · 3h ago
As someone who has spent days wrestling with Python dependency hell just to get a model running, a simple cargo run feels like a dream. But I'm wondering, what was the most painful part of NOT having a framework? I'm betting my coffee money it was debugging the backpropagation logic.
ricardobeat · 2h ago
Have you tried uv [1]? It has removed 90% of the pain of running python projects for me.
uv is great, but I think the real fix is just abandoning Python.
The culture that language maintains is rather hostile to maintainable development; it's easier to switch to Rust and write better code by default.
trklausss · 1h ago
Every tool for the right job. If you are doing tons of scripting (e.g. tests on platforms other than Rust), Python can be a solid alternative.
Also, tons of CAE platforms have Python bindings, so you are "forced" to work on Python. Sometimes the solution is not just "abandoning a language".
If it fits your purpose, knock yourself out. For others who may be reading: uv is great for Python dependency management during development; I still have to test it for deployment :)
aeve890 · 1h ago
>Every tool for the right job. If you are doing tons of scripting (for e.g. tests on platforms different than Rust), Python can be a solid valid alternative.
I'd say Go is a better alternative if you want to replace python scripting. Less friction and much faster compilation times than Rust.
DiabloD3 · 52m ago
I am not a huge fan of Go, but if all the world's "serious" Python became Go, the average code quality would skyrocket, so I think I can agree to this proposal.
physicsguy · 35m ago
Go performance is terrible for numeric stuff though, no SIMD support.
DiabloD3 · 4m ago
(given the context of LLMs) Unless you're doing CPU-side inference for corner cases where GPU inference is worse, lack of SIMD isn't a huge issue.
There are libraries to write SIMD in Go now, but I think the better fix is being able to autovectorize during the LLVM IR optimization stage, so it's available to multiple languages.
I think LLVM has it now, it's just not super great yet.
pclmulqdq · 9m ago
There are Go SIMD libraries now, and there's also easy use of C libraries via Cgo.
pjmlp · 18m ago
I know Python since version 1.6.
It is great for learning how to program (as a BASIC replacement), for OS scripting tasks (as a Perl replacement), and for embedded scripting in GUI applications.
Additionally: understand PYTHONPATH, and don't mess with anything else.
All the other stuff that is supposed to fix Python's issues, I never bothered with.
Thankfully, other languages are starting to also have bindings to the same C and C++ compute libraries.
airza · 1h ago
There's not really another game in town if you want to do fast ML development :/
DiabloD3 · 1h ago
Dunno, almost all of the people I know anywhere in the ML space are on the C and Rust end of the spectrum.
Lack of types, lack of static analysis, lack of... well, everything Python doesn't provide (and fights its users over) costs too much developer time. It is a net negative to continue pouring time and money into anything Python-based.
The sole exclusion I've seen to my social circle is those working at companies that don't directly do ML, but provide drivers/hardware/supporting software to ML people in academia, and have to try to fix their cursed shit for them.
Also, fwiw, there is no reason why Triton is Python. I dislike Triton for a lot of reasons, but it's just a matmul kernel DSL; there is nothing inherent in it that has to be, or benefits from, being Python. It takes DSL in, outputs shader text, then has the vendor's API run it (i.e., CUDA, ROCm, etc.). It, too, would benefit from becoming Rust.
nkozyra · 8m ago
> Dunno, almost all of the people I know anywhere in the ML space are on the C and Rust end of the spectrum.
I wish this were broadly true.
But there's too much legacy Python sunk cost for most people. There's just so much inertia behind Python for people to abandon it and rebuild such an extensive history of ML tooling.
I think ML will fade away from Python eventually but right now it's still everywhere.
Exuma · 1h ago
i hate python, but the idea of replacing python with rust is absurd
TheAceOfHearts · 1h ago
Switching to uv made my python experience drastically better.
If something doesn't work or I'm still encountering any kind of error with uv, LLMs have gotten good enough that I can just copy / paste the error and I'm very likely to zero-in on a working solution after a few iterations.
Sometimes it's a bit confusing figuring out how to run open source AI-related python projects, but the combination of uv and iterating on any errors with an LLM has so far been able to resolve all the issues I've experienced.
codetiger · 2h ago
I guess resource utilization, like the GPU, etc.
Galanwe · 1h ago
> spent days wrestling with Python dependency hell
I mean I would understand that comment in 2010, but in 2025 it's grossly ridiculous.
taminka · 2h ago
lowkey ppl who praise cargo seem to have no idea of the tradeoffs involved in dependency management
the difficulty of including a dependency should be proportional to the risk you're taking on, meaning it shouldn't be as difficult as it is in, say, C, where every other library is continually reinventing the same 5 utilities, but also not as easy as it is with npm or cargo, because you get insane dependency clutter and all the related issues like security, build times, etc
how good a build system is isn't equivalent to how easy it is to include a dependency. modern languages should have a consistent build system, but having a centralised package repository that anyone can freely pull to/from, and having those dependencies freely take on any number of other dependencies, is a bad way to handle dependencies
dev_l1x_be · 1h ago
> lowkey ppl who praise cargo seem to have no idea
Way to go on insulting people on HN. Cargo is literally the reason people come to Rust from languages like C++, where the lack of standardized tooling is a giant glaring bomb crater that burdens people every single time they need to do some basic thing (like, for example, version upgrades).
i'm saying that ease of dependency inclusion should not be a main criterion for evaluating how good a build system is, not that it isn't the main criterion for many people...
like the entire point of my comment is that people have misguided criteria for evaluating build systems, and your comment seems to just affirm this?
Sl1mb0 · 16m ago
> dependency inclusion _should not_ be a main criterion for evaluating how good a build system is
That's just like, your opinion, man.
taminka · 3m ago
i mean, unless you have some absolute divine truths, that's kind of the best i have :shrug
adwn · 16m ago
> like the entire point of my comment is that people have misguided criteria for evaluating build systems, and your comment seems to just affirm this?
I think dev_l1x_be's comment is meant to imply that your belief about people having misguided criteria [for evaluating build systems] is itself misguided, and that your favored approach [that the difficulty of including a dependency should be proportional to the risk you're taking on] is also misguided.
quantumspandex · 2h ago
Security is another problem, and should be tackled systematically. Artificially making dependency inclusion hard is not it and is detrimental to the more casual use cases.
itsibitzi · 2h ago
What tool or ecosystem does this well, in your opinion?
taminka · 37m ago
any language that has a standardised build system (virtually every language nowadays?) but doesn't have a centralised package repository, so that including a dependency works, but takes a bit of time and intent
i like how zig does this, and the creator of odin has a whole talk where he basically uses the same arguments as my original comment to reason why odin doesn't have a package manager
IshKebab · 1h ago
This is the weirdest excuse for Python's terrible tooling that I've ever heard.
"It's deliberately shit so that people won't use it unless they really have to."
taminka · 40m ago
i just realised that my comment sounds like it's praising python's package management, since it's often so inconvenient to use; i want to mention that that wasn't my intended point. python's package management combines the worst aspects of both worlds: being centralised AND horrible to use lol
my mistake :)
jokethrowaway · 1h ago
Is your argument that python's package management & ecosystem is bad by design - to increase security?
In my experience it's just bugs and poor decision making, either on the maintainers' side (e.g. pytorch dropping support for Intel Macs, left-pad in node) or on the language and package manager developers' side (py2->3, CommonJS vs ESM, Go not having a package manager, etc.).
Cargo has less friction than pypi and npm. npm has less friction than pypi.
And yet, you just need to compromise one lone, unpaid maintainer to wreck the security of the ecosystem.
taminka · 34m ago
nah python's package management is just straight up terrible by every metric, i just used it as a tangent to talk about how imo ppl incorrectly evaluate build systems
zoobab · 52m ago
"a simple cargo run feels like a dream"
A cargo build that warms up your CPU during winter while recompiling the whole internet is better?
linking both rand-core 0.9.0 and rand-core 0.9.3, which the project could maybe avoid by just specifying 0.9 for its own dep on it
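For context on the version-spec point, a sketch of how Cargo reads these requirements (not the project's actual manifest): a bare version is a caret requirement, so "0.9" and "0.9.0" mean the same range, and Cargo unifies semver-compatible requirements to a single copy. Two copies in the tree usually indicate an exact pin or an incompatible range somewhere in the dependency graph.

```toml
# Caret requirements (Cargo's default): "0.9" and "0.9.0" both mean
# ">=0.9.0, <0.10.0", so Cargo can resolve them to one rand-core.
[dependencies]
rand-core = "0.9"

# Duplicate copies typically appear only when some crate demands an
# exact pin, e.g.:
# rand-core = "=0.9.0"
```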
tonyhart7 · 3h ago
is this satire, or do I need to know the context behind this comment???
stevedonovan · 3h ago
These are a few well-chosen dependencies for a serious project.
Rust projects can really go bananas on dependencies, partly because it's so easy to include them
obsoleszenz · 3h ago
The project only has 3 dependencies, which I interpret as a sign of quality.
Charon77 · 3h ago
Absolutely love how readable the entire project is
emporas · 3h ago
It is very procedural/object-oriented, which is not considered good Rust practice. Iterators would make it more functional (which is better, i.e. more succinct), and enums more algebraic. But it's totally fine for a thought experiment.
koakuma-chan · 2h ago
It's AI generated
Revisional_Sin · 2h ago
How do you know? The over-commenting?
koakuma-chan · 2h ago
I know because this is how an AI-generated project looks. Clearly AI-generated README, "clean" code, the way files are named, etc.
magackame · 2h ago
Not sure myself. Commit messages look pretty human. But the emojis in readme and comments like "// Re-export key structs for easier access", "# Add any test-specific dependencies here if needed" are sus indeed.
cmrdporcupine · 2h ago
To me it looks like LLM generated README, but not necessarily the source (or at least not all of it).
Or there's been a cleaning pass done over it.
koakuma-chan · 2h ago
I think pretty clearly the source is also at least partially generated. Nonetheless, just a README like that already sends a strong signal to stop looking and not trust anything written there.
GardenLetter27 · 2h ago
The repeated Impls are strange.
magackame · 2h ago
Where? Don't see any on latest main (685467e).
yahoozoo · 1h ago
`llm.rs` has many `impl LLM` blocks
yieldcrv · 3h ago
Never knew Rust could be that readable. Makes me think other Rust engineers are stuck in a masochistic, ego-driven contest, which would explain everything else I've encountered about the Rust community and recruiting on that side.
jmaker · 3h ago
Not sure what you’re alluding to but that’s just ordinary Rust without performance or async IO concerns.
GardenLetter27 · 2h ago
Most Rust code looks like this - only generic library code goes crazy with all the generics and lifetimes, due to the need to avoid unnecessary mallocs and also provide a flexible API to users.
But most people aren't writing libraries.
Snuggly73 · 2h ago
Congrats! There is a very small problem with the LLM: it's reusing transformer blocks, and you want to use different instances of them.
It's a very cool exercise. I did the same with Zig and MLX a while back so I could get a nice foundation, but then, as I got hooked and kept adding stuff, I switched to PyTorch/Transformers.
icemanx · 2h ago
correction: It's a cool exercise if you write it yourself and not use GPT
Snuggly73 · 1h ago
well, hopefully the author did learn something or at least enjoyed the process :)
(the code looks like it was written by a very junior dev or a non-dev, tbh).
Goto80 · 3h ago
Nice. Mind to put a license on that?
thomask1995 · 1h ago
License added! Good catch
kachapopopow · 3h ago
This looks rather similar to what I got when I asked an AI to implement a basic XOR problem solver. I guess fundamentally there's really only a limited number of ways to implement this.
ndai · 3h ago
I’m curious where you got your training data? I will look myself, but saw this and thought I’d ask. I have a CPU-first, no-backprop architecture that works very well on classification datasets. It can do single‑example incremental updates which might be useful for continuous learning. I made a toy demo to train on tiny.txt and it can predict next characters, but I’ve never tried to make an LLM before. I think my architecture might work well as an on-device assistant or for on-premises needs, but I want to work with it more before I embarrass myself. Any open-source LLM training datasets you would recommend?
huggingface has plenty of OpenAI and Anthropic user-to-assistant chains; beware, there are dragons (hallucinations), but they're good enough for instruction training. I actually recommend distilling Kimi K2 instead for instruction-following capabilities.
abricq · 2h ago
This is great! Congratulations. I really like your project, especially how easy it is to peek at.
Do you plan on moving forward with this project? I understand that all the training is done on the CPU, and that you have next steps for optimizing that. Are you considering GPU acceleration?
Also, do you have any benchmarks on known hardware? E.g., how long would it take to train on a latest-gen MacBook, or on your own computer?
[1] https://github.com/astral-sh/uv

Example: https://github.com/facebook/folly/blob/main/build.sh

For just plain text, I really like this one - https://huggingface.co/datasets/roneneldan/TinyStories

[0]: https://github.com/enricozb/picogpt-rust [1]: https://jaykmody.com/blog/gpt-from-scratch/