Show HN: PyTorch K-Means GPU-friendly, single-file, hierarchical and resampling
I was working on dataset sampling and approximate nearest neighbor search, and tried several existing libraries for large-scale K-Means. I couldn't find something that was fast, simple, and would run comfortably on my own workstation without hitting memory limits. Maybe I missed an existing solution, but I ended up writing one that fit my needs.
The core insight: Keep your data on CPU (where you have more RAM) and intelligently move only the necessary chunks to GPU for computation during the iterative steps. Results always come back to CPU for easy post-processing. (Note: For K-Means++ initialization when computing on GPU, the full dataset still needs to fit on the GPU.)
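To make the chunking idea concrete, here is a minimal sketch of the assignment step under that pattern. This is an illustration of the general technique, not `pt_kmeans`'s actual code; the function name, chunk size, and signature are hypothetical.

```python
import torch

def assign_chunked(x_cpu, centers, device="cuda", chunk_size=65536):
    # Centers are small, so they can live on the accelerator for the whole pass.
    centers_dev = centers.to(device)
    labels = torch.empty(x_cpu.shape[0], dtype=torch.long)
    for start in range(0, x_cpu.shape[0], chunk_size):
        # Move only the current slice to the device...
        chunk = x_cpu[start:start + chunk_size].to(device)
        # ...compute pairwise L2 distances there (chunk_size x k matrix)...
        d = torch.cdist(chunk, centers_dev)
        # ...and bring only the per-point labels back to CPU.
        labels[start:start + chunk_size] = d.argmin(dim=1).cpu()
    return labels
```

Peak device memory is bounded by the chunk and the distance matrix rather than the full dataset, which is what keeps large inputs from triggering OOM.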
It offers a few practical features:
- Chunked Computations: Memory-efficient processing of large datasets by only moving necessary data chunks to the GPU, preventing Out-Of-Memory errors
- Cluster splitting: Refine existing clusters by splitting a single cluster into multiple sub-clusters
- Zero Dependencies: Single file, only requires PyTorch. Copy-paste into any project
- Advanced Clustering: Hierarchical K-Means with optional resampling (following recent research)
- Device Flexibility: Explicit device control - data can live anywhere, computation happens where you specify (any accelerator PyTorch supports)
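To illustrate the cluster-splitting idea from the list above: take the members of one cluster and run K-Means on just those points. A self-contained sketch (again, not the library's implementation; it uses deterministic farthest-point initialization for simplicity and assumes the cluster is non-degenerate):

```python
import torch

def split_cluster(x, labels, cluster_id, k=2, iters=20):
    # Refine one existing cluster by running plain Lloyd's K-Means
    # on its member points only.
    pts = x[labels == cluster_id]
    # Farthest-point initialization keeps this sketch deterministic.
    centers = pts[0:1]
    for _ in range(k - 1):
        d = torch.cdist(pts, centers).min(dim=1).values
        centers = torch.cat([centers, pts[d.argmax()].unsqueeze(0)])
    for _ in range(iters):
        assign = torch.cdist(pts, centers).argmin(dim=1)
        # Assumes every sub-cluster stays non-empty (true for the simple init above
        # on well-separated data); production code would guard this.
        centers = torch.stack([pts[assign == j].mean(dim=0) for j in range(k)])
    return centers, assign
```

The returned sub-cluster labels can then be remapped into the global label space to replace the original cluster.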
Future plans:
- Add support for memory-mapped files to handle even bigger datasets
- Explore PyTorch distributed for multi-node K-Means
The implementation handles both L2 and cosine distances and includes K-Means++ initialization. It's available on PyPI (`pip install pt_kmeans`), and the full implementation is at: https://gitlab.com/hassonofer/pt_kmeans
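On the cosine-distance support: a common way to implement it (not necessarily how `pt_kmeans` does) is to L2-normalize both points and centers, after which nearest-by-cosine reduces to a single matrix product:

```python
import torch
import torch.nn.functional as F

def cosine_assign(x, centers):
    # After row-wise L2 normalization, cosine similarity is a plain dot product.
    xn = F.normalize(x, dim=1)
    cn = F.normalize(centers, dim=1)
    sim = xn @ cn.T            # (n, k) cosine similarities in [-1, 1]
    return sim.argmax(dim=1)   # nearest center = highest similarity
```

This also composes naturally with the chunked CPU-to-GPU pattern, since the normalization can be done per chunk.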
Would love feedback on the approach and any use cases I might have missed!