Mmh, this is a bit sloppy. The derivative of a function f :: a -> b is a function Df :: a -> (a -o b), where the funny second arrow -o denotes a linear function. That is, the derivative Df takes a point in the domain and returns the linear approximation of f at that point (the Jacobian). And it’s always the Jacobian; it’s just that when f is R -> R we conflate the Jacobian (a 1x1 matrix in this case) with the single number inside it.
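A minimal sketch of that type signature in plain Python, assuming a central finite-difference approximation of the Jacobian. The names `D`, `jacobian_at`, the step `h`, and the two example functions are illustrative and not from the post. `D(f)` takes a point and returns a linear map, here a closure that multiplies by the Jacobian at that point:

```python
def jacobian_at(f, x, h=1e-6):
    """Approximate the Jacobian of f at x by central differences.

    f maps a list of n floats to a list of m floats; the result is an
    m x n matrix stored as a list of rows.
    """
    n = len(x)
    columns = []
    for j in range(n):
        step_up = list(x)
        step_down = list(x)
        step_up[j] += h
        step_down[j] -= h
        f_up, f_down = f(step_up), f(step_down)
        columns.append([(u - d) / (2 * h) for u, d in zip(f_up, f_down)])
    m = len(columns[0])
    # columns[j][i] is d f_i / d x_j, so transpose into rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def D(f):
    """Df :: a -> (a -o b): a point goes in, a linear map comes out."""
    def at_point(x):
        J = jacobian_at(f, x)

        def linear_map(v):
            # Matrix-vector product J @ v, linear in v by construction.
            return [sum(J_ij * v_j for J_ij, v_j in zip(row, v)) for row in J]

        return linear_map

    return at_point


# R^2 -> R^2 example: f(x, y) = (x * y, x + y), Jacobian [[y, x], [1, 1]].
def f(p):
    x, y = p
    return [x * y, x + y]


# R -> R example: the Jacobian is 1x1, and we usually just quote the number in it.
def square(p):
    return [p[0] ** 2]


df_at_point = D(f)([2.0, 3.0])
dsquare_at_3 = D(square)([3.0])
```

Applying `df_at_point` to the basis vectors recovers the columns of the Jacobian at (2, 3), and `dsquare_at_3([1.0])` returns a one-element list holding (approximately) the familiar number 2x = 6.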
flufluflufluffy · 5m ago
Fantastic post! As short as it needs to be while still communicating its points effectively. I love walking up the generalization levels in math.
I can look around me and find the minimum of anything without tracing its surface and following the gradient. I can also immediately identify global minima rather than local ones.
We all can do it in 2-3D. But our algorithms don’t do it. Even in 2D.
Sure, if I were blindfolded, feeling the surface and looking for the direction of descent would be the way to go. But when I can see, I don’t have to.
What are we missing?
ks2048 · 27m ago
When you look at a 2D surface, you directly observe all the values on that surface.
For a loss function, the value at each point must be computed.
You can compute them all and "look at" the surface and just directly choose the lowest - that is called a grid search.
In high dimensions, there are just way too many "points" to compute.
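To make the grid-search point concrete, here is a small sketch in plain Python. The loss `bumpy`, the bounds, and the resolution of 101 points per axis are all made up for illustration. It evaluates every grid point and takes the lowest, then tabulates how the number of evaluations grows with dimension:

```python
import itertools
import math


def grid_search(loss, bounds, points_per_axis):
    """Evaluate loss on a regular grid and return (best_value, best_point).

    bounds is a list of (low, high) pairs, one per dimension.
    """
    axes = []
    for low, high in bounds:
        step = (high - low) / (points_per_axis - 1)
        axes.append([low + i * step for i in range(points_per_axis)])
    # itertools.product enumerates every combination of axis values.
    return min((loss(p), p) for p in itertools.product(*axes))


def bumpy(p):
    """Hypothetical 2D loss with several local minima; global minimum 0 at (0, 0)."""
    x, y = p
    return x * x + y * y + 2.0 * (2.0 - math.cos(3.0 * x) - math.cos(3.0 * y))


best_value, best_point = grid_search(bumpy, [(-4.0, 4.0), (-4.0, 4.0)], 101)

# Cost of "looking at" the whole surface: points_per_axis ** dimensions.
evaluations = {d: 101 ** d for d in (2, 3, 10, 100)}
```

At 101 points per axis that is 10,201 evaluations in 2D, but 101^100 (on the order of 10^200) in 100 dimensions, which is why nobody "just looks" at a high-dimensional loss surface.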
Chinjut · 29m ago
You're thinking of situations where you are able to see a whole object at once. If you were dealing with an object too large to see all of, you'd have to start making decisions about how to explore it.
i_am_proteus · 19m ago
Without looking up the answer (because someone has already computed this for you), how would you find the highest geographic point (highest elevation) in your country?
hackinthebochs · 23m ago
You're ignoring all the calculations that go on unconsciously to produce your conscious experience of "immediately" apprehending the global minimum.
fancyfredbot · 23m ago
Your visual cortex is a massively parallel processor.
adrianN · 31m ago
The inputs you can process visually are of trivial size even for naive algorithms, and are probably also easy instances. I certainly can’t find global minima in 2D for any even slightly adversarial function.
pestatije · 27m ago
Touch and sight sense essentially the same thing... the difference is in the magnitudes involved.