Show HN: I implemented an RNN from scratch by reading a dense neural network book

dangmanhtruong | 8/12/2025, 5:58:12 PM | github.com
Hi everyone. I have been learning about deep learning for some time, and I've tried implementing CNNs, plain neural networks, U-Net, transformers, etc. to understand them better and to get my hands dirty outside of the frameworks. However, I've noticed that many online tutorials are not very detailed: concepts are not explained clearly, so readers come away with only a shallow understanding. On the other hand, many sources such as books show equation after equation without highlighting the main points, so readers get lost in mathematical details, which hampers learning. When I tried reading about RNNs and LSTMs, I noticed that many tutorials do not fully explain them. Some show pictures to make visualization easier, and some show the forward equations, but the backward equations are not discussed. One thing I don't think is talked about much: many tutorials, even when they do show the backpropagation, limit it to a single RNN layer (the same is true for LSTM/GRU).

Some time ago I read the book "Neural Network Design" by M. Hagan, and I found its explanations quite good (even though the book is not new). It explains things clearly enough for you to build everything yourself and does not hand-wave anything. When I got to the part about RNNs, I noticed that the book explains how to do backprop for RNNs with arbitrary recurrent connections, not just a single RNN layer, which I think is something few online sources show. The book also derives the case of different delays, which I think is skipped entirely in other sources.

So I decided to go ahead and implement it. The link above points to my implementation, which includes:

- Full BPTT for RNN networks with arbitrary recurrent connections and delays (see the short sketch after this list for what a delayed recurrent connection looks like)

- Comprehensive gradient checking using finite differences

- Bayesian regularization and multiple optimization algorithms

- Extensive numerical validation throughout
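
To make "recurrent connections and delays" a bit more concrete, here is a minimal sketch (not the repo's code, just an illustration with made-up names) of a single layer whose output is fed back to itself through a tap delay of d time steps; the actual implementation handles arbitrary connections between layers.

    import numpy as np

    def rnn_forward(x_seq, W_in, W_rec, b, delay=1):
        """One recurrent layer whose own output is fed back through a tap
        delay of `delay` steps: a[t] = tanh(W_in x[t] + W_rec a[t-delay] + b).
        x_seq has shape (T, n_in); returns the (T, n_hidden) output sequence."""
        T = x_seq.shape[0]
        n_hidden = b.shape[0]
        # The first `delay` rows hold the zero initial conditions a[-delay..-1].
        a = np.zeros((T + delay, n_hidden))
        for t in range(T):
            net = W_in @ x_seq[t] + W_rec @ a[t] + b   # a[t] is the output from step t - delay
            a[t + delay] = np.tanh(net)
        return a[delay:]

    # Hypothetical shapes: a 10-step sequence with 3 inputs, 5 hidden units, delay of 2.
    rng = np.random.default_rng(0)
    out = rnn_forward(rng.normal(size=(10, 3)),
                      W_in=rng.normal(size=(5, 3)),
                      W_rec=rng.normal(size=(5, 5)),
                      b=np.zeros(5), delay=2)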

I think I learned a lot during the implementation, both about how to implement a neural network and about how to structure my program. I tried to be systematic and included tests for the correctness of backprop using central differences (you know, the [f(x+delta)-f(x-delta)]/(2*delta) thing). This also pushed me to learn about Einstein summation (via NumPy's einsum), which really helped. Along the way I also found that equation (14.39) in the book has a slight error, which is fixed in the later equations (this was confirmed in private emails with the authors). The gradient checking was essential for debugging these subtle mathematical issues.
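
In case it's useful, this is roughly what that check looks like (a minimal sketch with made-up names, not the code from the repo): perturb each parameter in both directions and compare the central-difference estimate against the analytic gradient from BPTT.

    import numpy as np

    def numerical_gradient(loss_fn, params, delta=1e-5):
        # Central differences: [f(x+delta) - f(x-delta)] / (2*delta), one entry at a time.
        grad = np.zeros_like(params)
        it = np.nditer(params, flags=["multi_index"])
        while not it.finished:
            idx = it.multi_index
            saved = params[idx]
            params[idx] = saved + delta
            loss_plus = loss_fn(params)
            params[idx] = saved - delta
            loss_minus = loss_fn(params)
            params[idx] = saved               # restore the original value
            grad[idx] = (loss_plus - loss_minus) / (2.0 * delta)
            it.iternext()
        return grad

    # Then compare with the analytic gradient using a relative error,
    # e.g. |g_bptt - g_num| / (|g_bptt| + |g_num| + 1e-12), and flag anything large.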

Key lessons:

- Systematic software development techniques, coupled with mathematical rigour, help catch ML bugs more effectively.

- Implementing from first principles helps solidify your understanding and reveals the inner workings that frameworks hide.

- Einstein summation makes the maths much cleaner (see the small example after this list).

Even though the RNN is not the latest architecture, I think there is value in grounding yourself in the fundamentals first before jumping to more complex models.
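
As a small illustration of the einsum point (made-up variable names, not from the repo): the usual loop of outer products that accumulates a weight gradient over the time steps of BPTT collapses into a single call, and the index string doubles as documentation.

    import numpy as np

    T, n_out, n_in = 10, 5, 3
    rng = np.random.default_rng(0)
    delta = rng.normal(size=(T, n_out))    # backpropagated sensitivities at each time step
    a_prev = rng.normal(size=(T, n_in))    # (delayed) inputs to this weight at each time step

    # Loop of outer products: dW = sum_t delta[t] a_prev[t]^T
    dW_loop = sum(np.outer(delta[t], a_prev[t]) for t in range(T))

    # The same contraction as one einsum: sum over the shared time index t
    dW_einsum = np.einsum("ti,tj->ij", delta, a_prev)

    assert np.allclose(dW_loop, dW_einsum)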
