(ROCm) AMD-GPU-Boost: Unlock Full Performance on Consumer AMD GPUs
I've been frustrated with AMD GPU performance in AI/ML applications: my RX 6800 XT was delivering only about 25% of its potential in PyTorch. The issue? ROCm was designed around AMD's MI-series enterprise GPUs and severely underdetects consumer GPU capabilities.
On the RX 6800 XT, ROCm reports only 36 compute units instead of 72, and uses a warp (wavefront) size of 32 instead of the optimal 64 for RDNA2/3. The same underdetection affects the entire RX 6000/7000 series.
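You can see the underdetection with plain PyTorch (this is a generic inspection snippet, not part of AMD-GPU-Boost; the warp_size attribute only exists in newer PyTorch builds, hence the getattr fallback):

    import torch

    # Print what PyTorch's ROCm backend detects for the first GPU
    # (ROCm builds expose the GPU through the torch.cuda namespace).
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name)
    print("compute units:", props.multi_processor_count)     # 36 instead of 72
    print("warp size:", getattr(props, "warp_size", "n/a"))  # added in recent PyTorch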
AMD-GPU-Boost fixes this at runtime by monkey-patching PyTorch's device detection (a minimal sketch of the idea follows below). Results:

- 4x performance improvement in inference
- "NVIDIA-only" apps now run perfectly on AMD
- Works with ComfyUI, Stable Diffusion, WAN 2.1, etc.
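To give a feel for the approach, here's a minimal sketch of the monkey-patching idea. It's illustrative only, not the tool's actual code: the 72/64 values are hardwired for an RX 6800 XT, whereas the tool itself maps values per GPU model.

    import torch

    # Illustrative sketch only, not AMD-GPU-Boost's actual implementation.
    # Values hardwired for an RX 6800 XT; the real tool keeps a per-model table.
    CORRECTED = {"multi_processor_count": 72, "warp_size": 64}

    _real_get_device_properties = torch.cuda.get_device_properties

    class _PatchedProps:
        """Proxy over the real (read-only) properties object,
        overriding the underdetected fields."""
        def __init__(self, props):
            self._props = props
        def __getattr__(self, name):
            if name in CORRECTED:
                return CORRECTED[name]
            return getattr(self._props, name)

    def _patched_get_device_properties(*args, **kwargs):
        return _PatchedProps(_real_get_device_properties(*args, **kwargs))

    # Swap in the patched function so later callers see corrected values.
    torch.cuda.get_device_properties = _patched_get_device_properties

The main caveat with this pattern is ordering: the patch has to land before frameworks query and cache the device properties, which is why a runtime fix like this needs to hook in as early as possible.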
The tool includes a GUI installer for easy Pinokio integration and supports 18+ GPU models from RX 6400 to RX 7900 XTX.
This has been a major pain point for the AMD AI community. Curious what the HN crowd thinks about runtime hardware detection fixes vs. proper driver/framework solutions.
Demo: my RX 6800 XT went from 1152 threads (36×32) to 4608 threads (72×64), exactly what the hardware specs promise.
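If you want to reproduce those numbers, the thread count is just reported compute units × warp size (again assuming a PyTorch build that exposes warp_size):

    import torch

    props = torch.cuda.get_device_properties(0)
    # 36 * 32 = 1152 unpatched; 72 * 64 = 4608 with the patch applied
    print(props.multi_processor_count * props.warp_size)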