Can you suggest a good reference for understanding which algorithms map well onto the regular grid systolic arrays used by TPUs? The fine article says dese matmul and convolution are good, but is there anything else? Eigendecomposition? SVD? matrix exponential? Solving Ax = b or AX = B? Cholesky?
WithinReason · 6m ago
Anything that you can express as 128x128 (but ideally much larger) dense matrix multiplication and nothing else
almostgotcaught · 2h ago
> In essence, caches allow hardware to be flexible and adapt to a wide range of applications. This is a large reason why GPUs are very flexible hardware (note: compared to TPUs).
this is correct but mis-stated - it's not the caches themselves that cost energy but MMUs that automatically load/fetch/store to cache on "page faults". TPUs don't have MMUs and furthermore are a push architecture (as opposed to pull).
jan_Sate · 2h ago
I thought that it would be about 3D printer filament.
If so, wild. That seems like overkill.
[0]: https://henryhmko.github.io/posts/tpu/images/tpu_tray.png
this is correct but mis-stated - it's not the caches themselves that cost energy but MMUs that automatically load/fetch/store to cache on "page faults". TPUs don't have MMUs and furthermore are a push architecture (as opposed to pull).