Show HN: Efficient `torch.cdist` Using Triton

Posted by codeinassembly, 6/17/2025, 8:18:34 AM (github.com)
For my research, I needed to use `torch.cdist` https://docs.pytorch.org/docs/stable/generated/torch.cdist.h... with p != 2.
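
For context: `torch.cdist(x1, x2, p)` returns, for every row i of x1 and row j of x2, the p-norm distance d_ij = (sum_k |x1[i][k] - x2[j][k]|^p)^(1/p). The plain-Python sketch below (a hypothetical `cdist_reference`, not code from the linked repo) spells that definition out for finite p > 0 and is handy as a slow ground truth when checking a faster kernel. For p = 2, the `torch.cdist` docs describe a `compute_mode` that can use the matrix-multiplication identity ||x||^2 + ||y||^2 - 2 x.y; that identity only covers p = 2, which is one plausible reason the p != 2 path is so much slower.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cdist_reference(x: Sequence[Sequence[float]],
                    y: Sequence[Sequence[float]],
                    p: float) -> Matrix:
    """Pairwise p-norm distances between rows of x (P x M) and rows of y (R x M).

    out[i][j] = (sum_k |x[i][k] - y[j][k]| ** p) ** (1 / p)
    Hypothetical helper: only finite p > 0 is handled in this sketch.
    """
    if not (0 < p < math.inf):
        raise ValueError("this sketch only handles finite p > 0")
    out: Matrix = []
    for xi in x:
        row = []
        for yj in y:
            if len(xi) != len(yj):
                raise ValueError("rows must have the same number of features")
            # Sum of |difference|^p over features, then take the 1/p root.
            s = sum(abs(a - b) ** p for a, b in zip(xi, yj))
            row.append(s ** (1.0 / p))
        out.append(row)
    return out


if __name__ == "__main__":
    x = [[0.0, 0.0], [1.0, 1.0]]
    y = [[3.0, 4.0]]
    print(cdist_reference(x, y, p=1))  # [[7.0], [5.0]]
    print(cdist_reference(x, y, p=3))
```

The triple loop makes the O(P * R * M) cost explicit; a GPU kernel does the same arithmetic but tiles it across threads.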

It turns out that torch's `cdist` implementation is prohibitively slow for p != 2. I came up with an alternative implementation using Triton (https://triton-lang.org/) that supports backprop, so it can be used for training too.
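
Supporting backprop means supplying the gradient of each distance. Writing u = x_ik - y_jk, differentiating d_ij = (sum_k |u|^p)^(1/p) gives dd_ij/dx_ik = sign(u) |u|^(p-1) / d_ij^(p-1), and dd_ij/dy_jk is its negative; an upstream gradient G (P x R) is then summed over j for x and over i for y. The sketch below is plain Python, not the Triton kernel from the post: the names `cdist_backward` and `numeric_grad_x` are hypothetical, it only handles finite p >= 1, and treating the gradient as zero where d_ij = 0 is an assumption (the formula is undefined there). It checks the analytic gradient against central finite differences.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dist(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """p-norm distance between two equal-length vectors."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def cdist_backward(x: Sequence[Sequence[float]],
                   y: Sequence[Sequence[float]],
                   grad_out: Sequence[Sequence[float]],
                   p: float) -> Tuple[Matrix, Matrix]:
    """Gradients of L = sum_ij grad_out[i][j] * d_ij with respect to x and y."""
    if not (1 <= p < math.inf):
        raise ValueError("this sketch only handles finite p >= 1")
    grad_x = [[0.0] * len(x[0]) for _ in x]
    grad_y = [[0.0] * len(y[0]) for _ in y]
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            d = _dist(xi, yj, p)
            if d == 0.0:
                continue  # assumption: treat the gradient at d == 0 as zero
            scale = grad_out[i][j] / d ** (p - 1)
            for k, (a, b) in enumerate(zip(xi, yj)):
                u = a - b
                if u == 0.0:
                    continue  # sign(0) = 0, so this term contributes nothing
                # sign(u) * |u|^(p-1), scaled by G_ij / d_ij^(p-1)
                g = scale * math.copysign(abs(u) ** (p - 1), u)
                grad_x[i][k] += g
                grad_y[j][k] -= g
    return grad_x, grad_y


def numeric_grad_x(x: Matrix, y: Matrix, grad_out: Matrix,
                   p: float, eps: float = 1e-6) -> Matrix:
    """Central finite differences of L with respect to x, for checking."""
    def loss(xs: Matrix) -> float:
        return sum(grad_out[i][j] * _dist(xi, yj, p)
                   for i, xi in enumerate(xs) for j, yj in enumerate(y))

    grad = [[0.0] * len(x[0]) for _ in x]
    for i in range(len(x)):
        for k in range(len(x[0])):
            plus = [row[:] for row in x]
            minus = [row[:] for row in x]
            plus[i][k] += eps
            minus[i][k] -= eps
            grad[i][k] = (loss(plus) - loss(minus)) / (2 * eps)
    return grad


if __name__ == "__main__":
    x = [[0.0, 0.5], [1.0, -1.0]]
    y = [[3.0, 4.0], [-2.0, 0.25]]
    g = [[1.0, 2.0], [0.5, -1.0]]
    gx, gy = cdist_backward(x, y, g, p=3)
    print(gx)
    print(numeric_grad_x(x, y, g, p=3))  # should match gx closely
```

Because d(x_i, y_j) = d(y_j, x_i), the y-gradient can be checked the same way by swapping x and y and transposing G.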

It should be pretty much plug-and-play. If you're using `torch.cdist` too, it'd be awesome if you could try it with your code and share feedback on anything that breaks.

Huge shoutout to both the PyTorch and Triton teams for making it easy to write and integrate custom ops. I'm new to this, and looking back, the whole process was very streamlined.
