This is a 12B parameter model trained on 10T tokens.
It's also editorialized, which is against HN guidelines.
Title is: "NVFP4 Trains with Precision of 16-Bit and Speed and Efficiency of 4-Bit"
patrickhogan1 · 1h ago
12B model
opcode84 · 4h ago
For narrow-precision formats to be practical in large-scale pretraining, they must deliver both model accuracy and stable convergence. To assess the viability of 4-bit precision at this scale, experiments were conducted with FP8 and NVFP4 on a 12-billion-parameter model built on a hybrid Mamba-Transformer architecture, similar to NVIDIA Nemotron Nano 2. This model was pretrained on 10 trillion tokens using a phased data-blending approach: the dataset mix was switched once at 70% of training (second phase) and again at 90% (third phase).
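To make the phased data-blending idea concrete, here is a minimal sketch of a schedule keyed to training progress. The 70% and 90% switch points come from the post; the phase names, data sources, and blend weights below are hypothetical placeholders, not the actual Nemotron data mixes.

```python
# Hypothetical sketch of a phased data-blending schedule keyed to training progress.
# The 70% / 90% phase boundaries are from the post; everything else is illustrative.

TOTAL_TOKENS = 10_000_000_000_000  # 10T-token training budget

# (start_fraction, blend) pairs; each blend maps a data source to a sampling weight.
PHASES = [
    (0.0, {"web": 0.7, "code": 0.2, "math": 0.1}),   # phase 1 blend (placeholder)
    (0.7, {"web": 0.5, "code": 0.3, "math": 0.2}),   # phase 2 blend (placeholder)
    (0.9, {"web": 0.3, "code": 0.3, "math": 0.4}),   # phase 3 blend (placeholder)
]

def current_blend(tokens_seen: int) -> dict:
    """Return the sampling weights of the phase covering this point in training."""
    progress = tokens_seen / TOTAL_TOKENS
    blend = PHASES[0][1]
    for start, phase_blend in PHASES:
        if progress >= start:
            blend = phase_blend
    return blend

# e.g. at 75% of the token budget the second-phase mix would be active:
print(current_blend(int(0.75 * TOTAL_TOKENS)))
```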
A version of the 12B Hybrid Mamba-Transformer model was initially trained with 8-bit precision—FP8, which has been shown in previous studies to closely match 16-bit precision, and hence served as our baseline for comparison. We then successfully trained this same 12B model from scratch using NVFP4, demonstrating that this new low-precision format can support full pretraining at trillion-token scale. The NVFP4 run exhibited stable convergence without the training instabilities or divergence issues that typically plague ultra-low precision training.
Figure 3 below shows that NVFP4’s validation loss curve closely matches the loss curves from the higher-precision baseline (i.e., FP8) throughout the entire duration of training. The quantization techniques outlined above ensure that even with aggressive bit-width reduction, the 4-bit pretraining dynamics closely resemble those of higher-precision runs.
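The post does not spell out the quantization math here, but a quantize-dequantize ("fake quantization") sketch helps show what aggressive bit-width reduction does to tensor values. The sketch below assumes a 4-bit E2M1 value grid with a shared scale per 16-element block, which is the general structure NVFP4 is described as using elsewhere; the block size, the use of a plain float scale, and the function name are assumptions, not NVIDIA's implementation.

```python
# Illustrative sketch only (not NVIDIA's kernel): simulate the rounding error of a
# 4-bit E2M1-style format with per-block scaling by quantizing and dequantizing.
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float (sign stored separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
BLOCK_SIZE = 16  # elements sharing one scale factor (assumed)

def fake_quantize_fp4(x: np.ndarray) -> np.ndarray:
    """Quantize-dequantize x block by block to mimic 4-bit storage."""
    n = x.size
    flat = x.astype(np.float32).ravel()
    pad = (-n) % BLOCK_SIZE
    blocks = np.pad(flat, (0, pad)).reshape(-1, BLOCK_SIZE)

    # Per-block scale maps the largest magnitude onto E2M1's max value (6.0).
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)

    # Snap each scaled magnitude to the nearest representable E2M1 value.
    mag = np.abs(blocks) / scale
    idx = np.abs(mag[..., None] - E2M1_GRID).argmin(axis=-1)
    deq = np.sign(blocks) * E2M1_GRID[idx] * scale

    return deq.ravel()[:n].reshape(x.shape)

# Usage: measure the error the 4-bit representation introduces on a random tensor.
w = np.random.randn(4, 32).astype(np.float32)
w_q = fake_quantize_fp4(w)
print("max abs quantization error:", np.abs(w - w_q).max())
```

Techniques like block-wise scaling are what keep this rounding error small enough for the 4-bit loss curve to track the FP8 baseline in practice.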