Derivatives, Gradients, Jacobians and Hessians

29 ibobev 11 8/17/2025, 2:08:18 PM blog.demofox.org

Comments (11)

ziofill · 8m ago
Mmh, this is a bit sloppy. The derivative of a function f :: a -> b is a function Df :: a -> (a -o b), where the funny second arrow (-o) denotes a linear function. That is, the derivative Df takes a point in the domain and returns a linear approximation of f (the Jacobian) at that point. And it’s always the Jacobian; it’s just that when f is R -> R we conflate the Jacobian (a 1x1 matrix in this case) with the number inside it.
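To make the "point in, linear map out" reading concrete, here is a minimal sketch in plain Python. It approximates the Jacobian by central differences rather than computing it exactly, and the names `jacobian`, `apply_linear`, `f` and `g` are made up for illustration; none of them come from the post.

```python
def jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at the point x by central differences.

    f maps a list of n floats to a list of m floats.
    The result is a list of m rows, each with n columns.
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def apply_linear(J, v):
    """Apply the linear map represented by the matrix J to the vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in J]


def f(p):
    """Example f : R^2 -> R^2 (hypothetical, chosen for easy hand-checking)."""
    x, y = p
    return [x * x * y, x + y * y]


def g(p):
    """Example g : R -> R, written with 1-element lists."""
    (x,) = p
    return [x ** 3]


point = [1.0, 2.0]
J_f = jacobian(f, point)   # exact Jacobian at (1, 2) is [[4, 1], [1, 4]]
J_g = jacobian(g, [2.0])   # a 1x1 matrix; exact entry is g'(2) = 12

# The Jacobian is the linear part of f near the point:
# f(point + v) is approximately f(point) + J_f v for small v.
v = [0.01, -0.02]
exact = f([point[0] + v[0], point[1] + v[1]])
linear = [a + b for a, b in zip(f(point), apply_linear(J_f, v))]
```

In the R -> R case, `J_g` is still a matrix; the usual "derivative" is just the single number `J_g[0][0]`.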
flufluflufluffy · 5m ago
Fantastic post! As short as it needs to be while still communicating its points effectively. I love walking up the generalization levels in math.
sestep · 11m ago
A bit more advanced than this post, but for calculating Hessians, the Julia folks have done some cool work recently building on classical automatic differentiation research: https://iclr-blogposts.github.io/2025/blog/sparse-autodiff/
whatever1 · 35m ago
I can look around me and find the minimum of anything without tracing its surface and following the gradient. I can also immediately identify global minima instead of local ones.

We all can do it in 2-3D. But our algorithms don’t do it. Even in 2D.

Sure, if I were blindfolded, feeling the surface and looking for the direction of descent would be the way to go. But when I can see, I don’t have to.

What are we missing?

ks2048 · 27m ago
When you look at a 2D surface, you directly observe all the values on that surface.

For a loss function, the value at each point must be computed.

You can compute them all and "look at" the surface and just directly choose the lowest - that is called a grid search.

For high dimensions, there are just way too many "points" to compute.
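A rough sketch of what "compute them all and choose the lowest" looks like, in plain Python. The `grid_search` helper and the `two_wells` loss are hypothetical examples, not anything from the post; the loss is picked so that it has a local minimum near x = 1 and a lower, global one near x = -1.

```python
import itertools
import math


def grid_search(loss, bounds, steps):
    """Evaluate loss on a regular grid and keep the lowest value seen.

    bounds is a list of (low, high) pairs, one per dimension.
    steps is the number of grid points along each dimension.
    Returns (best_point, best_value, number_of_evaluations).
    """
    axes = [
        [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
        for lo, hi in bounds
    ]
    best_point, best_value, n_evals = None, math.inf, 0
    for point in itertools.product(*axes):
        value = loss(point)
        n_evals += 1
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value, n_evals


def two_wells(p):
    """Hypothetical 2D loss with two wells in x; the tilt 0.3 * x makes the left one lower."""
    x, y = p
    return (x * x - 1.0) ** 2 + 0.3 * x + y * y


best_point, best_value, n_evals = grid_search(
    two_wells, bounds=[(-2.0, 2.0), (-2.0, 2.0)], steps=41
)

# The cost is steps ** dimensions: cheap in 2D, hopeless in high dimensions.
evals_2d = 41 ** 2
evals_10d = 41 ** 10
```

With 41 points per axis, the 2D search needs 1681 evaluations; the same resolution in 10 dimensions would need 41 ** 10, which is more than 10 ** 16.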

Chinjut · 29m ago
You're thinking of situations where you are able to see a whole object at once. If you were dealing with an object too large to see all of, you'd have to start making decisions about how to explore it.
i_am_proteus · 19m ago
Without looking up the answer (because someone has already computed this for you), how would you find the highest geographic point (highest elevation) in your country?
hackinthebochs · 23m ago
You're ignoring all the calculations that go on unconsciously to produce your conscious experience of "immediately" apprehending the global minimum.
fancyfredbot · 23m ago
Your visual cortex is a massively parallel processor.
adrianN · 31m ago
The inputs you can process visually are of trivial size even for naive algorithms, and they are probably also simple instances. I certainly can’t find global minima in 2D for any even slightly adversarial function.
pestatije · 27m ago
Touch and sight sense essentially the same thing... the difference is in the magnitudes involved.