Nvidia trains 10T model in 4 bit precision (NVFP4)
6 points by opcode84 8/26/2025, 4:54:51 PM | developer.nvidia.com | 3 comments
This is a 12B parameter model trained on 10T tokens.
It's also editorialized, which is against HN guidelines.
Title is: "NVFP4 Trains with Precision of 16-Bit and Speed and Efficiency of 4-Bit"
A version of the 12B Hybrid Mamba-Transformer model was initially trained with 8-bit precision—FP8, which has been shown in previous studies to closely match 16-bit precision, and hence served as our baseline for comparison. We then successfully trained this same 12B model from scratch using NVFP4, demonstrating that this new low-precision format can support full pretraining at trillion-token scale. The NVFP4 run exhibited stable convergence without the training instabilities or divergence issues that typically plague ultra-low precision training.
Figure 3 below shows that NVFP4’s validation loss curve closely matches the loss curves from the higher-precision baseline (i.e., FP8) throughout the entire duration of training. The quantization techniques outlined above ensure that even with aggressive bit-width reduction, the 4-bit pretraining dynamics closely resemble those of higher-precision runs.
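For intuition about what block-scaled 4-bit quantization looks like numerically, here is a minimal sketch in the spirit of NVFP4: 4-bit E2M1 values with a shared scale per block of 16 elements. This is illustrative only, not NVIDIA's implementation; in particular the real format stores compact FP8-style block scales, while the sketch below just uses a plain float scale and "fake-quantizes" back to float so you can measure the error.

```python
import numpy as np

# Illustrative sketch of block-scaled FP4 (E2M1) quantization, loosely modeled
# on the NVFP4 description (4-bit values, 16-element blocks). Assumptions:
# round-to-nearest, a plain float per-block scale, and fake-quantization
# (dequantize back to float) so the error can be inspected directly.

# Magnitudes representable in E2M1 (2 exponent bits, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_fp4(block: np.ndarray) -> np.ndarray:
    """Quantize one block to FP4 with a shared scale, then dequantize."""
    amax = np.abs(block).max()
    if amax == 0:
        return np.zeros_like(block)
    scale = amax / E2M1_GRID[-1]          # map the block max onto the largest FP4 magnitude
    scaled = block / scale
    # Pick the nearest representable E2M1 value for each element, preserving sign.
    idx = np.abs(scaled[:, None] - np.sign(scaled)[:, None] * E2M1_GRID).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx] * scale

def fake_quantize_fp4(x: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Fake-quantize a flat tensor block by block."""
    blocks = x.reshape(-1, block_size)
    return np.stack([quantize_block_fp4(b) for b in blocks]).reshape(x.shape)

x = np.random.randn(1024).astype(np.float32)
xq = fake_quantize_fp4(x)
print("mean abs quantization error:", np.abs(x - xq).mean())
```

The per-block scale is what keeps the error tolerable despite only eight representable magnitudes: each block of 16 values is renormalized to its own local maximum before snapping to the FP4 grid.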