Congrats - there is a very small problem with the LLM - it's reusing transformer blocks, and you want to use different instances of them.
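To make that concrete, here is a minimal sketch of the shape the fix could take; TransformerBlock, Model, and their fields are illustrative names, not the repo's actual API:

    // Hypothetical sketch: each layer owns its own, independently initialized
    // parameters. Pushing one instance (or one initialization) N times means
    // every "layer" shares the same weights.
    struct TransformerBlock {
        attn_weights: Vec<f32>,
        ffn_weights: Vec<f32>,
    }

    impl TransformerBlock {
        fn new(embedding_dim: usize, hidden_dim: usize) -> Self {
            TransformerBlock {
                // In real code these would be randomly initialized per block.
                attn_weights: vec![0.0; embedding_dim * embedding_dim],
                ffn_weights: vec![0.0; embedding_dim * hidden_dim],
            }
        }
    }

    struct Model {
        blocks: Vec<TransformerBlock>,
    }

    impl Model {
        fn new(num_layers: usize, embedding_dim: usize, hidden_dim: usize) -> Self {
            // One distinct TransformerBlock per layer, not one block reused N times.
            let blocks = (0..num_layers)
                .map(|_| TransformerBlock::new(embedding_dim, hidden_dim))
                .collect();
            Model { blocks }
        }
    }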
It's a very cool exercise. I did the same with Zig and MLX a while back so I could get a nice foundation, but since then, as I got hooked and kept adding stuff to it, I switched to PyTorch/Transformers.
Cool stuff! I can see some GPT comments that can be removed
// Increased for better learning
this doesn't tell me anything
// Use the constants from lib.rs
const MAX_SEQ_LEN: usize = 80;
const EMBEDDING_DIM: usize = 128;
const HIDDEN_DIM: usize = 256;
these are already defined in lib.rs, so why not use them (as the comment suggests)?
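A sketch of what reusing them could look like, assuming the constants are declared pub in lib.rs and using "llm" as a stand-in for the actual crate name:

    // In main.rs: import the shared constants from the library crate instead of
    // redefining them ("llm" is a placeholder for the real crate name).
    use llm::{EMBEDDING_DIM, HIDDEN_DIM, MAX_SEQ_LEN};

    fn main() {
        println!(
            "seq = {}, emb = {}, hidden = {}",
            MAX_SEQ_LEN, EMBEDDING_DIM, HIDDEN_DIM
        );
    }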
untrimmed · 56m ago
As someone who has spent days wrestling with Python dependency hell just to get a model running, a simple cargo run feels like a dream. But I'm wondering, what was the most painful part of NOT having a framework? I'm betting my coffee money it was debugging the backpropagation logic.
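For what it's worth, the classic way to debug hand-written backpropagation is a finite-difference gradient check. A self-contained sketch with a toy loss function (nothing from the repo):

    // Compare an analytic gradient against a central-difference estimate.
    fn numerical_grad(loss: impl Fn(&[f32]) -> f32, params: &[f32], i: usize) -> f32 {
        let eps = 1e-3;
        let mut plus = params.to_vec();
        let mut minus = params.to_vec();
        plus[i] += eps;
        minus[i] -= eps;
        (loss(&plus) - loss(&minus)) / (2.0 * eps)
    }

    fn main() {
        // Toy loss L(w) = (2*w0 - 1)^2, with analytic gradient dL/dw0 = 4*(2*w0 - 1).
        let params = vec![0.3f32];
        let loss = |w: &[f32]| (2.0 * w[0] - 1.0).powi(2);
        let analytic = 4.0 * (2.0 * params[0] - 1.0);
        let numeric = numerical_grad(loss, &params, 0);
        println!("analytic = {analytic:.4}, numeric = {numeric:.4}");
        assert!((analytic - numeric).abs() < 1e-2);
    }

Running the same kind of check against each parameter of a hand-rolled layer catches most sign and transpose mistakes in the backward pass quickly.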
ricardobeat · 3m ago
Have you tried uv [1]? It has removed 90% of the pain of running Python projects for me.

[1] https://github.com/astral-sh/uv
Lowkey, people who praise cargo seem to have no idea of the tradeoffs involved in dependency management.

The difficulty of including a dependency should be proportional to the risk you're taking on. It shouldn't be as difficult as it is in, say, C, where every other library is continually reinventing the same 5 utilities, but it also shouldn't be as easy as it is with npm or cargo, because you get insane dependency clutter and all the related issues like security, build times, etc.

How good a build system is isn't equivalent to how easy it is to include a dependency. Modern languages should have a consistent build system, but having a centralised package repository that anyone can freely pull from and publish to, with those dependencies freely taking on any number of other dependencies of their own, is a bad way to handle dependencies.
quantumspandex · 5m ago
Security is another problem, and should be tackled systematically. Artificially making dependency inclusion hard is not it and is detrimental to the more casual use cases.
itsibitzi · 6m ago
What tool or ecosystem does this well, in your opinion?
abricq · 27m ago
This is great! Congratulations. I really like your project, especially how easy it is to peek at.

Do you plan on moving forward with this project? I gather that all the training is done on the CPU and that you have next steps regarding optimizing that. Are you considering GPU acceleration?

Also, do you have any benchmarks on known hardware? E.g., how long would it take to train on a latest-gen MacBook or your own computer?
kachapopopow · 1h ago
This looks rather similar to when I asked an AI to implement a basic XOR problem solver. I guess fundamentally there's really only a very limited number of ways to implement this.

It links both rand-core 0.9.0 and rand-core 0.9.3, which the project could maybe avoid by just specifying 0.9 for its own dependency on it.
tonyhart7 · 1h ago
Is this satire, or is there some context behind this comment that I need to know?
stevedonovan · 1h ago
These are a few well-chosen dependencies for a serious project.
Rust projects can really go bananas on dependencies, partly because it's so easy to include them
obsoleszenz · 56m ago
The project only has 3 dependencies, which I interpret as a sign of quality.
Goto80 · 58m ago
Nice. Mind putting a license on that?
Charon77 · 1h ago
Absolutely love how readable the entire project is
emporas · 57m ago
It is very procedural/object-oriented, which is not considered good Rust practice. Iterators would make it more functional, which is better (more succinct, that is), and enums would make it more algebraic. But it's totally fine for a thought experiment.
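A tiny illustration of the kind of rewrite that suggestion points at (generic example, not code from the project):

    // Procedural style: index-based loop with manual accumulation.
    fn dot_procedural(a: &[f32], b: &[f32]) -> f32 {
        let mut sum = 0.0;
        for i in 0..a.len() {
            sum += a[i] * b[i];
        }
        sum
    }

    // Iterator style: the same computation, expressed functionally.
    fn dot_iter(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
    }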
koakuma-chan · 25m ago
It's AI generated
Revisional_Sin · 17m ago
How do you know? The over-commenting?
GardenLetter27 · 2m ago
The repeated Impls are strange.
koakuma-chan · 8m ago
I know because this is how an AI-generated project looks: the clearly AI-generated README, the "clean" code, the way files are named, etc.
magackame · 15s ago
Not sure myself. Commit messages look pretty human. But the emojis in the README, and comments like "// Re-export key structs for easier access" and "# Add any test-specific dependencies here if needed", are sus indeed.
cmrdporcupine · 5m ago
To me it looks like LLM generated README, but not necessarily the source (or at least not all of it).
Or there's been a cleaning pass done over it.
koakuma-chan · 25s ago
I think it's pretty clear the source is also at least partially generated. Nonetheless, a README like that already sends a strong signal to stop looking and not trust anything written there.
yieldcrv · 1h ago
Never knew Rust could be that readable. Makes me think other Rust engineers are stuck in a masochistic, ego-driven contest, which would explain everything else I've encountered about the Rust community and recruiting on that side.
GardenLetter27 · 12s ago
Most Rust code looks like this - only generic library code goes crazy with all the generics and lifetimes, due to the need to avoid unnecessary mallocs and also provide a flexible API to users.
But most people aren't writing libraries.
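A toy example of that split (nothing to do with this repo): application code can use plain concrete types, while library code reaching for flexibility and zero-copy picks up the generics and lifetimes.

    // Application-style: concrete types, nothing fancy.
    fn count_words(text: &str) -> usize {
        text.split_whitespace().count()
    }

    // Library-style: generic over anything string-like, returning a borrowed slice
    // tied to the input's lifetime so no new String is allocated.
    fn first_word<'a, S: AsRef<str> + ?Sized>(text: &'a S) -> Option<&'a str> {
        text.as_ref().split_whitespace().next()
    }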
jmaker · 1h ago
Not sure what you’re alluding to but that’s just ordinary Rust without performance or async IO concerns.
enricozb · 50m ago
I did this [0] (GPT in Rust) with picogpt, following the great blog post by jaykmody [1].

[0]: https://github.com/enricozb/picogpt-rust
[1]: https://jaykmody.com/blog/gpt-from-scratch/
I'm curious where you got your training data? I will look myself, but I saw this and thought I'd ask. I have a CPU-first, no-backprop architecture that works very well on classification datasets. It can do single-example incremental updates, which might be useful for continuous learning. I made a toy demo to train on tiny.txt, and it can predict next characters, but I've never tried to make an LLM before. I think my architecture might work well as an on-device assistant or for on-premises needs, but I want to work with it more before I embarrass myself. Any open-source LLM training datasets you would recommend?
Hugging Face has plenty of OpenAI and Anthropic user-to-assistant chains; beware, there are dragons (hallucinations), but they're good enough for instruction training. I actually recommend distilling Kimi K2 instead for instruction-following capabilities.