The classic reference on the subject is "Numerical Linear Algebra" by Lloyd Trefethen. Skip to the final part on iterative methods for the computational aspects. You'll learn a lot more, and faster, with Matlab.
https://davidtabora.wordpress.com/wp-content/uploads/2015/01...
AD comes from a different tradition, dating back to FORTRAN 77 programmers' attempts to differentiate non-elementary functions (for loops, procedural code, subroutines, etc.). Note the hardware specs for some nostalgia:
https://www.mcs.anl.gov/research/projects/adifor/
rdyro · 10h ago
A really cool post and a great set of visualizations!
Computing sparse Jacobians can save a lot of compute if there's a real lack of dependency between parts of the input and the output. Discovering this automatically through coloring is very appealing.
Another alternative is to implement sparse rules for each operation yourself, but that often requires custom autodiff implementations, which aren't easy to get right. I wrote a small toy version of a sparse rules-based autodiff here: https://github.com/rdyro/SpAutoDiff.jl
Another example (a much more serious one) is https://github.com/microsoft/folx
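To make the compression idea concrete, here is a minimal sketch in Python/JAX (not taken from the blog post or from SpAutoDiff.jl); the toy function and the hand-picked two-coloring are invented for illustration, but they show how structurally orthogonal columns of a sparse Jacobian can be recovered from just a couple of JVPs:

```python
import jax
import jax.numpy as jnp

def f(x):
    # Each output depends on only two neighbouring inputs,
    # so the Jacobian is bidiagonal (very sparse for large n).
    return x[:-1] * x[1:]

n = 6
x = jnp.arange(1.0, n + 1.0)

# Hypothetical 2-coloring: columns 0, 2, 4 never share a nonzero row,
# and neither do columns 1, 3, 5, so two JVPs replace n = 6 of them.
colors = jnp.array([0, 1, 0, 1, 0, 1])

compressed = []
for c in range(2):
    seed = (colors == c).astype(x.dtype)      # sums the columns with color c
    _, jvp_out = jax.jvp(f, (x,), (seed,))    # one forward-mode product
    compressed.append(jvp_out)

dense = jax.jacfwd(f)(x)                      # reference: the full (n-1) x n Jacobian
print(dense)
print(compressed)                             # every nonzero of `dense` appears exactly once
```

With two colors the cost stays at two JVPs no matter how large n gets, which is where the savings come from.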
Blog post author here, happy to answer any questions you may have!
The prerequisites for understanding the blog post are an undergrad course in calculus and linear algebra, and some graph theory. I can look up some accessible resources if you're interested :)
JohnKemeny · 8h ago
Does this article exist as a (LaTeX) pdf for printing too?
gdalle · 5h ago
Our Arxiv preprint is a slightly longer read, available in PDF form with more precise descriptions: https://arxiv.org/abs/2501.17737
carterschonwald · 2h ago
Thx!
It’s always fun to see new flavors of AD work. My attempts in that direction haven’t been the most successful.
> Is this type of analysis a part of a particular mathematical heritage?
It is a mixture of two closely related areas of mathematics: analysis (called calculus in the US) and numerics.
The ideas behind automatic differentiation arise from the question of how to compute the derivative of a function on a computer. The "derivative" part is the analysis, and the "on a computer" part is the numerics.
As it turns out, approximating the formal limit definition of the derivative on a computer (finite differences) has many undesirable properties, so alternative approaches like AD were developed. And AD is much older than the recent neural network trend.
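A tiny numerical illustration of that point (sketched in Python/JAX, though any AD tool would do): the forward-difference quotient's error is limited by the competition between truncation and floating-point round-off, while AD evaluates the derivative exactly.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)    # double precision, to see the effect clearly

f = jnp.sin
x = 1.0
exact = jnp.cos(x)                           # derivative of sin, known in closed form

print(abs(jax.grad(f)(x) - exact))           # AD: error at the level of machine epsilon

for h in [1e-2, 1e-5, 1e-8, 1e-11]:
    fd = (f(x + h) - f(x)) / h               # limit-definition approximation
    print(h, abs(fd - exact))                # error shrinks, then grows again as h -> 0
```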
yorwba · 10h ago
The blog post mentions in an aside that "The authors of this blog post are all developers of the ASD ecosystem in Julia." which might be the closest thing to an intellectual school that this kind of work is associated with.
molticrystal · 11h ago
Maybe someone else can summarize more accurately or do a better job but I'll take a shot:
The Jacobian often appears in the final segment of a three-part calculus sequence, when exploring chain rules and variable transformations. Look up the Jacobian used in converting between x,y,z and spherical coordinates ρ,φ,θ and note its matrix structure. Skimming the Medium article by Lobosi that you linked, it seems to emphasize this aspect.
The Jacobian also serves another purpose. As stated in the OP's article "The Jacobian operator Df:x⟼Df(x) is a linear map which provides the best linear approximation of f around a given point x."
We like approximations because they let us trade speed against accuracy and memory: you only have so much space in a register or memory cell, and pushing for more accuracy past a certain point costs more memory, computation, and time.
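As a concrete illustration of both roles (a sketch in Python/JAX; the point values are arbitrary): the Jacobian of the spherical-to-Cartesian map is the matrix from the calculus course, its determinant reproduces the familiar ρ² sin φ factor, and the same matrix gives the best linear approximation of the map near the chosen point.

```python
import jax
import jax.numpy as jnp

def spherical_to_cartesian(u):
    rho, phi, theta = u[0], u[1], u[2]       # radius, polar angle, azimuthal angle
    return jnp.array([rho * jnp.sin(phi) * jnp.cos(theta),
                      rho * jnp.sin(phi) * jnp.sin(theta),
                      rho * jnp.cos(phi)])

u = jnp.array([2.0, 0.7, 1.2])               # an arbitrary point (rho, phi, theta)
J = jax.jacfwd(spherical_to_cartesian)(u)    # the 3x3 Jacobian matrix at u

# Change-of-variables factor from the calculus course: det J = rho^2 * sin(phi).
print(jnp.linalg.det(J), u[0] ** 2 * jnp.sin(u[1]))

# Best linear approximation: f(u + d) ≈ f(u) + J @ d for a small step d.
d = 1e-3 * jnp.array([0.3, -0.5, 0.2])
print(spherical_to_cartesian(u + d) - (spherical_to_cartesian(u) + J @ d))
```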
The article then notes that many computations involve sparse Jacobians, meaning many matrix entries are known to be zero, so with clever handling we don't have to waste time computing them.
Subsequent sections cover methods to detect and label sparsity patterns. The article explains how applying its proposed coloring techniques to the large matrices common in machine learning yields significant efficiency gains.
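For intuition about what the coloring step does, here is a toy greedy version (plain Python/NumPy, not the algorithm from the article): given a Jacobian sparsity pattern, it groups columns that never share a nonzero row, so each group can be evaluated with a single JVP.

```python
import numpy as np

def greedy_column_coloring(sparsity):
    """sparsity: boolean (m, n) array, True where the Jacobian may be nonzero."""
    m, n = sparsity.shape
    colors = -np.ones(n, dtype=int)
    rows_used_by_color = []                      # rows already covered by each color
    for j in range(n):
        rows_j = set(np.nonzero(sparsity[:, j])[0])
        for c, used in enumerate(rows_used_by_color):
            if not (rows_j & used):              # no shared row: reuse this color
                colors[j] = c
                used |= rows_j
                break
        else:                                    # conflicts with every color: open a new one
            colors[j] = len(rows_used_by_color)
            rows_used_by_color.append(rows_j)
    return colors

# Tridiagonal pattern: three colors suffice no matter how large n gets.
n = 8
pattern = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1
print(greedy_column_coloring(pattern))           # e.g. [0 1 2 0 1 2 0 1]
```

The number of colors, not the number of columns, then dictates how many Jacobian-vector products are needed, which is the source of the efficiency gains the article reports.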
As far as the mathematical heritage goes, I don't know the family tree, but I suspect it stems from courses blending matrix theory, linear algebra, and algorithms, so you'd want the computer science version of such math. Function approximation ties to numerical methods, though I am uncertain whether introductory texts cover Jacobians. Check out Newton's method to grasp its mechanics, then explore its Jacobian-based extension to systems of equations. For the coloring aspect, graph theory is where to turn; you can learn its basics with minimal prerequisites by studying the seven bridges problem or the map-coloring problem (do the five-color version). Many of these concepts can be turned into small programming projects. They will not rival Matlab, but they will solidify your understanding.
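To connect the Newton's-method pointer to the Jacobian, here is a compact sketch (Python/JAX, with a made-up two-equation system): each Newton step linearizes F with its Jacobian and solves the resulting small linear system.

```python
import jax
import jax.numpy as jnp

def F(x):
    # Two equations in two unknowns: a circle and an exponential curve.
    return jnp.array([x[0] ** 2 + x[1] ** 2 - 4.0,
                      jnp.exp(x[0]) - x[1]])

x = jnp.array([1.0, 1.0])                   # arbitrary starting guess
for _ in range(8):
    J = jax.jacfwd(F)(x)                    # 2x2 Jacobian at the current iterate
    x = x - jnp.linalg.solve(J, F(x))       # Newton step: solve J dx = F(x)
print(x, F(x))                              # F(x) should now be close to zero
```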
ghurtado · 12h ago
I quickly realized it was approximately 20,000 ft over my head, but I still power through these sorts of things to see if anything "sticks".
Yes, this blog post indeed inspired us to submit ours!
nathan_douglas · 1d ago
Picking my way through this slowly... I'm familiar with autodiff but some of these ideas are very new to me. This seems really, really exciting though.
shihabkhanbd · 7h ago
Very informative blogpost!
oulipo · 10h ago
Sparsely-related question: is the blog style/css open-source?
molticrystal · 10h ago
It seems to be based off of al-folio, which is MIT-licensed: https://github.com/alshedivat/al-folio
A short overview is chapter 11 in Gilbert Strang's Introduction to Linear Algebra: https://math.mit.edu/~gs/linearalgebra/ila5/linearalgebra5_1...
Is this type of analysis a part of a particular mathematical heritage?
What would it be called?
Is this article relevant? https://medium.com/@lobosi/calculus-for-machine-learning-jac...
So far, nothing, but I'll keep trying...