Show HN: I implemented an RNN from scratch by reading a dense neural network book
Some time ago, I read a book called "Neural Network Design" by M. Hagan, and I found its explanations to be quite good (even though the book is not new). It explains things clearly enough for you to build everything yourself and does not hand-wave anything. When I got to the chapter on RNNs, I noticed that the book covers backprop for RNNs with arbitrary connections, not just a single RNN layer, which I think is something not many sources online show. It also derives the equations for different delay conditions, which I think is skipped entirely in other sources.
So I decided to go ahead and implement it. The link goes to my implementation, which includes:
- Full BPTT for RNN networks with arbitrary recurrent connections and delays (a simplified sketch follows this list)
- Comprehensive gradient checking using finite differences
- Bayesian regularization and multiple optimization algorithms
- Extensive numerical validation throughout
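To give a flavour of the core recursion, here is a minimal sketch I wrote for this post (not the repo's actual code): a single tanh recurrent layer with a one-step delay, trained with a squared-error loss. The repo generalizes this to arbitrary connections and delay lengths.

    import numpy as np

    def bptt_single_layer(x_seq, y_seq, W_in, W_rec, b):
        # Minimal BPTT sketch: one tanh recurrent layer, one-step delay,
        # squared-error loss summed over the sequence.
        T, n = len(x_seq), b.shape[0]
        a = np.zeros((T + 1, n))        # a[t] = output at step t; a[0] is the initial state
        for t in range(1, T + 1):       # forward: a[t] = tanh(W_in x[t] + W_rec a[t-1] + b)
            a[t] = np.tanh(W_in @ x_seq[t - 1] + W_rec @ a[t - 1] + b)
        dW_in = np.zeros_like(W_in)
        dW_rec = np.zeros_like(W_rec)
        db = np.zeros_like(b)
        d_next = np.zeros(n)            # gradient arriving from step t+1 through the delay
        for t in range(T, 0, -1):       # backward through time
            e = a[t] - y_seq[t - 1]     # dLoss/da[t] from the error at step t
            d = (e + d_next) * (1 - a[t] ** 2)   # chain through tanh
            dW_in += np.outer(d, x_seq[t - 1])
            dW_rec += np.outer(d, a[t - 1])
            db += d
            d_next = W_rec.T @ d        # send gradient to step t-1 via the recurrent weight
        return dW_in, dW_rec, db

The d_next term is what makes this "through time": the gradient at step t picks up a contribution from step t+1 through the transpose of the recurrent weight.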
I learned a lot during the implementation, both about how to implement a neural network and about how to structure my program. I tried to be systematic and included tests for the correctness of backprop using symmetric finite differences (you know, the [f(x+delta)-f(x-delta)]/(2*delta) thing). This also pushed me to learn Einstein summation (via NumPy's einsum), which really helped. Along the way, I found that equation (14.39) in the book has a slight error, which is fixed in later equations (confirmed in private emails with the authors). The gradient checking was essential for debugging subtle mathematical issues like that.
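For concreteness, the symmetric-difference check looks roughly like this (my own sketch, not the repo's actual test code; loss_fn is assumed to be any function from a flat parameter array to a scalar loss):

    import numpy as np

    def check_gradient(loss_fn, params, analytic_grad, delta=1e-6):
        # Compare an analytic gradient against the central difference
        # [f(x+delta) - f(x-delta)] / (2*delta), one parameter at a time.
        numeric = np.zeros_like(params)
        for i in range(params.size):
            p_plus, p_minus = params.copy(), params.copy()
            p_plus.flat[i] += delta
            p_minus.flat[i] -= delta
            numeric.flat[i] = (loss_fn(p_plus) - loss_fn(p_minus)) / (2 * delta)
        # relative error; around 1e-7 or smaller in float64 usually means backprop is right
        return np.abs(numeric - analytic_grad).max() / (np.abs(numeric).max() + 1e-12)

A delta around 1e-6 in float64 is a reasonable middle ground: much smaller and round-off dominates, much larger and truncation error dominates.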
Key lessons:
- Systematic software development techniques, coupled with mathematical rigour, help catch ML bugs more effectively.
- Implementing from first principles helps solidify your understanding and reveals the inner workings that frameworks hide.
- Einstein summation makes the maths much cleaner (see the one-liner after this list).
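As an illustration of the einsum point (my own example, not code from the repo): the per-timestep outer products accumulated in the BPTT sketch above collapse into one line whose index notation reads exactly like the maths.

    import numpy as np

    # stack of per-timestep pre-activation gradients D (T, n) and inputs X (T, m);
    # the weight gradient sum_t outer(D[t], X[t]) in one readable line:
    T, n, m = 50, 5, 3
    D = np.random.randn(T, n)
    X = np.random.randn(T, m)
    dW_in = np.einsum('ti,tj->ij', D, X)   # same as D.T @ X, but the index
                                           # notation scales to higher-rank tensors

The 'ti,tj->ij' string is a direct transcription of the summation indices, which is why the backprop equations from the book map onto the code so cleanly.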
Even though RNNs are not the latest architecture, I think there is value in grounding yourself in the fundamentals before jumping to more complex models.