Ask HN: Is GPU nondeterminism bad for AI?

2 points by ramity | 6/5/2025, 9:47:52 PM | 0 comments
Argument:

- GPUs use parallelism

- Floating point math is not associative

- Rounding error therefore accumulates differently depending on the order of operations (quick numeric illustration below)

- As a result, GPU computations carry run-to-run noise: the same training step can produce slightly different numbers each time

- There is a known tradeoff between noise in the data and achievable accuracy

- Noise requires overparameterization (a larger network) to generalize

- Overparameterization prevents the network from fully generalizing to the problem space

Therefore, GPU nondeterminism seems bad for AI. Where did I go wrong?
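
To make the floating point premise concrete, here's a quick numpy sketch I put together (the array sizes and reduction orders are arbitrary choices of mine): the same float32 values summed in different orders, the way different parallel reduction schedules would sum them, give slightly different totals.

    # Float addition is not associative, so the order a reduction runs in
    # changes the result slightly.
    import numpy as np

    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False: classic non-associativity

    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000).astype(np.float32)

    total_a = x.sum(dtype=np.float32)            # one reduction order
    total_b = np.sort(x).sum(dtype=np.float32)   # a different order
    total_c = x.reshape(100, 1000).sum(axis=0, dtype=np.float32).sum(dtype=np.float32)  # a "tiled" order

    # The totals typically agree to only ~6-7 significant digits in float32.
    print(total_a, total_b, total_c)

On a GPU the order isn't just different, it can change from run to run (e.g. atomic adds completing in whatever order threads happen to finish), which is where the nondeterminism comes from.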

Questions:

- Has this been quantified? As I understand it, the answer would be situational and tied to details like network depth, width, architecture, learning rate, etc. At the end of the day, entropy means some sort of noise/accuracy tradeoff, but are we talking magnitudes like 10%, 1%, 0.1%? (A sketch of how I'd try to measure it follows after this list.)

- Because of the noise/accuracy tradeoff, it seems to follow that one could train a smaller network deterministically and achieve the same performance as a bigger network trained non-deterministically. Is this true, even if the difference is only a single neuron?

- If something like the problem space of driving a car is too large to be fully represented in a dataset (consider needing every atom in the universe as a hard drive), how can we be sure a dataset is a perfect sampling of the problem space?

- Wouldn't overparameterization guarantee the model learns the dataset rather than the problem space? Is it incorrect to conceptualize this as using a higher-degree polynomial to represent a lower-degree one? (Toy version below.)

- Even with perfect sampling, noisy computation still seems problematic when a small amount of noise can cause an avalanche. If this noise were somehow quantified at 1%, couldn't you say the dataset's "impression" left in the network would be 1% larger than it should be, maybe spilling over in a sense? Eval data points "very close to" but not included in the training data would then be more likely to incorrectly evaluate as the "nearby" training datapoint. Maybe I'm reinventing edge cases and overfitting here, but I don't think overfitting just spontaneously starts happening towards the end of training.
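
For the first question, here's a rough sketch of the experiment I have in mind for measuring it, assuming PyTorch; the toy MLP, the synthetic data, and names like run_once / N_RUNS are all made up for illustration. Train the identical setup several times with deterministic algorithms forced, then several times without, and compare the spread of the final metric:

    # Measure run-to-run spread attributable to kernel nondeterminism.
    # Note: on a tiny dense model like this the spread may well be ~0, since
    # the usual culprits are atomics in ops like scatter/index_add and some
    # cuDNN backward kernels -- the measurement procedure is the point.
    import os
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")  # needed for deterministic cuBLAS

    import torch
    import torch.nn as nn

    N_RUNS = 5

    def run_once(deterministic: bool, seed: int = 0) -> float:
        torch.manual_seed(seed)                            # identical init every run
        torch.use_deterministic_algorithms(deterministic)  # force (or allow) nondeterministic kernels
        device = "cuda" if torch.cuda.is_available() else "cpu"

        g = torch.Generator().manual_seed(123)             # identical data every run
        x = torch.randn(1024, 32, generator=g).to(device)
        y = torch.randn(1024, 1, generator=g).to(device)

        model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1)).to(device)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.MSELoss()

        for _ in range(200):
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        return loss.item()

    for det in (True, False):
        losses = [run_once(det) for _ in range(N_RUNS)]
        print(f"deterministic={det}: spread={max(losses) - min(losses):.3e}, losses={losses}")

Fixing the seed in both arms is deliberate: any remaining spread should then be attributable to kernel-level nondeterminism rather than to initialization or data order.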
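
And here's the toy version of the higher-degree-polynomial analogy from the fourth question (the cubic "true" function, the degrees, and the sample count are arbitrary choices of mine): the model with more free parameters drives training error down by chasing the noise in the sampled points, at the cost of error everywhere in between.

    # Fit noisy samples of a cubic with a degree-3 and a degree-10 polynomial,
    # then compare error on the training points vs. on a dense grid standing
    # in for the full problem space.
    import numpy as np

    rng = np.random.default_rng(0)

    def true_f(t):
        # Underlying function, standing in for the problem space.
        return t**3 - t

    x_train = np.linspace(-1, 1, 15)
    y_train = true_f(x_train) + 0.1 * rng.standard_normal(x_train.size)  # the noisy "dataset"
    x_dense = np.linspace(-1, 1, 500)

    for degree in (3, 10):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        space_mse = np.mean((np.polyval(coeffs, x_dense) - true_f(x_dense)) ** 2)
        print(f"degree={degree}: train MSE={train_mse:.4f}, problem-space MSE={space_mse:.4f}")

    # Typically: degree 10 gets the lower train MSE (it learned the dataset) but
    # the higher problem-space MSE (it did not learn the underlying function).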
