Can you suggest a good reference for understanding which algorithms map well onto the regular grid systolic arrays used by TPUs? The fine article says dese matmul and convolution are good, but is there anything else? Eigendecomposition? SVD? matrix exponential? Solving Ax = b or AX = B? Cholesky?
musebox35 · 36m ago
I think https://jax-ml.github.io/scaling-book/ is one of the best references to go through. It details how single device and distributed computations map to TPU hardware features. The emphasis is on mapping the transformer computations, both forwards and backwards, so requires some familiarity with how transformer networks are structured.
WithinReason · 58m ago
Anything that you can express as 128x128 (but ideally much larger) dense matrix multiplication and nothing else
serf · 53m ago
does that cooling channel have a NEMA stepper on it as a pump or metering valve?[0]
> In essence, caches allow hardware to be flexible and adapt to a wide range of applications. This is a large reason why GPUs are very flexible hardware (note: compared to TPUs).
this is correct but mis-stated - it's not the caches themselves that cost energy but MMUs that automatically load/fetch/store to cache on "page faults". TPUs don't have MMUs and furthermore are a push architecture (as opposed to pull).
jan_Sate · 3h ago
I thought that it would be about 3D printer filament.
If so, wild. That seems like overkill.
[0]: https://henryhmko.github.io/posts/tpu/images/tpu_tray.png
this is correct but mis-stated - it's not the caches themselves that cost energy but MMUs that automatically load/fetch/store to cache on "page faults". TPUs don't have MMUs and furthermore are a push architecture (as opposed to pull).