RustGPT: A pure-Rust transformer LLM built from scratch

95 points by amazonhut | 9/15/2025, 9:47:18 AM | 35 comments | github.com

Comments (35)

Snuggly73 · 1m ago
Congrats - there is one small problem with the LLM: it's reusing the same transformer block, when you want a separate instance for each layer.

It's a very cool exercise. I did the same with Zig and MLX a while back to get a nice foundation, but I got hooked, kept adding stuff to it, and have since switched to PyTorch/Transformers.
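To illustrate the block-reuse point above, here is a minimal sketch with hypothetical names (TransformerBlock, NUM_LAYERS, forward); the repo's actual identifiers may differ:

    // Problematic: the same block instance is applied at every layer,
    // so every "layer" ends up sharing one set of weights.
    let block = TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM);
    let mut hidden = embeddings.clone();
    for _ in 0..NUM_LAYERS {
        hidden = block.forward(&hidden); // same weights every pass
    }

    // Intended: each layer is its own, independently initialized instance.
    let layers: Vec<TransformerBlock> = (0..NUM_LAYERS)
        .map(|_| TransformerBlock::new(EMBEDDING_DIM, HIDDEN_DIM))
        .collect();
    let mut hidden = embeddings.clone();
    for layer in &layers {
        hidden = layer.forward(&hidden); // distinct weights per layer
    }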

ramon156 · 10m ago
Cool stuff! I can see some GPT-style comments that could be removed:

// Increased for better learning

this doesn't tell me anything

// Use the constants from lib.rs

const MAX_SEQ_LEN: usize = 80;

const EMBEDDING_DIM: usize = 128;

const HIDDEN_DIM: usize = 256;

These are already defined in lib.rs, so why not use them (as the comment suggests)?
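A sketch of that fix, assuming the constants are declared pub in lib.rs and the crate is named llm (as the cargo tree output further down suggests):

    // main.rs: reuse the canonical definitions instead of redefining them
    use llm::{EMBEDDING_DIM, HIDDEN_DIM, MAX_SEQ_LEN};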

untrimmed · 56m ago
As someone who has spent days wrestling with Python dependency hell just to get a model running, a simple cargo run feels like a dream. But I'm wondering, what was the most painful part of NOT having a framework? I'm betting my coffee money it was debugging the backpropagation logic.
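If the pain point really is backpropagation, the standard sanity check is comparing analytic gradients against central finite differences. A minimal, generic sketch using ndarray (not taken from the repo; gradient_check and its signature are made up for illustration):

    use ndarray::Array1;

    /// Compare an analytic gradient against central finite differences.
    /// Returns the worst relative error; it should be tiny (e.g. < 1e-5)
    /// if the hand-written backprop is correct.
    fn gradient_check<F>(loss: F, params: &Array1<f64>, analytic: &Array1<f64>, eps: f64) -> f64
    where
        F: Fn(&Array1<f64>) -> f64,
    {
        let mut worst = 0.0_f64;
        for i in 0..params.len() {
            let mut plus = params.clone();
            let mut minus = params.clone();
            plus[i] += eps;
            minus[i] -= eps;
            let numeric = (loss(&plus) - loss(&minus)) / (2.0 * eps);
            let denom = numeric.abs().max(analytic[i].abs()).max(1e-12);
            worst = worst.max((numeric - analytic[i]).abs() / denom);
        }
        worst
    }

Since this perturbs one parameter at a time it is far too slow to run everywhere; it is typically applied as a one-off check on a handful of randomly chosen weights per layer.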
ricardobeat · 3m ago
Have you tried uv [1]? It has removed 90% of the pain of running Python projects for me.

[1] https://github.com/astral-sh/uv

codetiger · 21m ago
I'd guess resource utilization, like GPU usage, etc.
taminka · 38m ago
lowkey ppl who praise cargo seem to have no idea of the tradeoffs involved in dependency management

the difficulty of including a dependency should be proportional to the risk you're taking on: it shouldn't be as difficult as it is in, say, C, where every other library keeps reinventing the same 5 utilities, but it also shouldn't be as easy as it is with npm or cargo, because you get insane dependency clutter and all the related issues (security, build times, etc.)

how good a build system is isn't the same as how easy it is to include a dependency. modern languages should have a consistent build system, but having a centralised package repository that anyone can freely pull to/from, with those dependencies freely taking on any number of other dependencies, is a bad way to handle dependencies

quantumspandex · 5m ago
Security is another problem, and it should be tackled systematically. Artificially making dependency inclusion hard isn't the solution, and it's detrimental to more casual use cases.
itsibitzi · 6m ago
What tool or ecosystem does this well, in your opinion?
abricq · 27m ago
This is great! Congratulations. I really like your project, especially how easy it is to peek into.

Do you plan on taking this project further? I understand that all the training is done on the CPU and that you have next steps planned for optimizing that. Are you considering GPU acceleration?

Also, do you have any benchmarks on known hardware? E.g., how long would it take to train on a latest-gen MacBook, or on your own machine?

kachapopopow · 1h ago
This looks rather similar to what I got when I asked an AI to implement a basic XOR problem solver. I guess fundamentally there's only a limited number of ways to implement this.
techsystems · 1h ago
> ndarray = "0.16.1"
> rand = "0.9.0"
> rand_distr = "0.5.0"

Looking good!

kachapopopow · 1h ago
I was slightly curious:

cargo tree
llm v0.1.0 (RustGPT)
├── ndarray v0.16.1
│   ├── matrixmultiply v0.3.9
│   │   └── rawpointer v0.2.1
│   │   [build-dependencies]
│   │   └── autocfg v1.4.0
│   ├── num-complex v0.4.6
│   │   └── num-traits v0.2.19
│   │       └── libm v0.2.15
│   │       [build-dependencies]
│   │       └── autocfg v1.4.0
│   ├── num-integer v0.1.46
│   │   └── num-traits v0.2.19 (*)
│   ├── num-traits v0.2.19 (*)
│   └── rawpointer v0.2.1
├── rand v0.9.0
│   ├── rand_chacha v0.9.0
│   │   ├── ppv-lite86 v0.2.20
│   │   │   └── zerocopy v0.7.35
│   │   │       ├── byteorder v1.5.0
│   │   │       └── zerocopy-derive v0.7.35 (proc-macro)
│   │   │           ├── proc-macro2 v1.0.94
│   │   │           │   └── unicode-ident v1.0.18
│   │   │           ├── quote v1.0.39
│   │   │           │   └── proc-macro2 v1.0.94 (*)
│   │   │           └── syn v2.0.99
│   │   │               ├── proc-macro2 v1.0.94 (*)
│   │   │               ├── quote v1.0.39 (*)
│   │   │               └── unicode-ident v1.0.18
│   │   └── rand_core v0.9.3
│   │       └── getrandom v0.3.1
│   │           ├── cfg-if v1.0.0
│   │           └── libc v0.2.170
│   ├── rand_core v0.9.3 (*)
│   └── zerocopy v0.8.23
└── rand_distr v0.5.1
    ├── num-traits v0.2.19 (*)
    └── rand v0.9.0 (*)

yep, still looks relatively good.

cmrdporcupine · 23m ago
Linking both rand-core 0.9.0 and rand-core 0.9.3, which the project could maybe avoid by just specifying 0.9 for its own dependency on it.
tonyhart7 · 1h ago
Is this satire, or is there some context behind this comment that I'm missing?
stevedonovan · 1h ago
These are a few well-chosen dependencies for a serious project.

Rust projects can really go bananas on dependencies, partly because it's so easy to include them

obsoleszenz · 56m ago
The project only has 3 dependencies, which I interpret as a sign of quality.
Goto80 · 58m ago
Nice. Mind putting a license on it?
Charon77 · 1h ago
Absolutely love how readable the entire project is
emporas · 57m ago
It is very procedural/object-oriented, which isn't considered good Rust practice. Iterators would make it more functional, which is better (more succinct, that is), and enums would make it more algebraic. But it's totally fine for a thought experiment.
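A generic illustration of that style difference (nothing from this repo):

    // Procedural, index-based style
    let grads = vec![0.3_f64, -1.2, 0.7];
    let mut sum_sq = 0.0;
    for i in 0..grads.len() {
        sum_sq += grads[i] * grads[i];
    }

    // Iterator style: same result, more succinct and harder to get wrong
    let sum_sq_iter: f64 = grads.iter().map(|g| g * g).sum();
    assert_eq!(sum_sq, sum_sq_iter);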
koakuma-chan · 25m ago
It's AI generated
Revisional_Sin · 17m ago
How do you know? The over-commenting?
GardenLetter27 · 2m ago
The repeated Impls are strange.
koakuma-chan · 8m ago
I know because this is how an AI-generated project looks: the clearly AI-generated README, the "clean" code, the way files are named, etc.
magackame · 15s ago
Not sure myself. Commit messages look pretty human. But the emojis in the README and comments like "// Re-export key structs for easier access" and "# Add any test-specific dependencies here if needed" are sus indeed.
cmrdporcupine · 5m ago
To me it looks like an LLM-generated README, but not necessarily the source (or at least not all of it).

Or there's been a cleaning pass done over it.

koakuma-chan · 25s ago
I think it's pretty clear the source is also at least partially generated. Nonetheless, a README like that alone already sends a strong signal to stop looking and not trust anything written there.
yieldcrv · 1h ago
Never knew Rust could be that readable. Makes me think other Rust engineers are stuck in a masochistic, ego-driven contest, which would explain everything else I've encountered about the Rust community and recruiting on that side.
GardenLetter27 · 12s ago
Most Rust code looks like this - only generic library code goes crazy with all the generics and lifetimes, due to the need to avoid unnecessary mallocs and also provide a flexible API to users.

But most people aren't writing libraries.
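A rough, hypothetical illustration of that contrast (placeholder code, not from this repo):

    // Typical application code: concrete types, no visible lifetimes
    fn first_word(s: &str) -> &str {
        s.split_whitespace().next().unwrap_or("")
    }

    // Typical library-facing API: generic over anything string-like, with an
    // explicit lifetime tying the output to the input to avoid a copy
    fn first_word_generic<'a, S: AsRef<str> + ?Sized>(s: &'a S) -> &'a str {
        s.as_ref().split_whitespace().next().unwrap_or("")
    }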

jmaker · 1h ago
Not sure what you’re alluding to but that’s just ordinary Rust without performance or async IO concerns.
enricozb · 50m ago
I did this [0] (GPT in Rust) with picoGPT, following the great blog post by jaykmody [1].

[0]: https://github.com/enricozb/picogpt-rust
[1]: https://jaykmody.com/blog/gpt-from-scratch/

ndai · 1h ago
I’m curious where you got your training data? I will look myself, but saw this and thought I’d ask. I have a CPU-first, no-backprop architecture that works very well on classification datasets. It can do single‑example incremental updates which might be useful for continuous learning. I made a toy demo to train on tiny.txt and it can predict next characters, but I’ve never tried to make an LLM before. I think my architecture might work well as an on-device assistant or for on-premises needs, but I want to work with it more before I embarrass myself. Any open-source LLM training datasets you would recommend?
kachapopopow · 1h ago
Hugging Face has plenty of OpenAI and Anthropic user-to-assistant chains; beware, there are dragons (hallucinations), but they're good enough for instruction training. I'd actually recommend distilling Kimi K2 instead for instruction-following capabilities.
bigmuzzy · 46m ago
nice