Show HN: LoopMix128 – Fast C PRNG (.46ns), 2^128 Period, BigCrush/PractRand Pass
38 the_othernet 22 5/10/2025, 9:25:21 PM github.com ↗
LoopMix128 is a fast C PRNG I wrote for non-cryptographic tasks.
GitHub (MIT): https://github.com/danielcota/LoopMix128
Highlights:
* ~0.37 ns/value (GCC 11.4, -O3 -march=native), 98% faster than xoroshiro128++ and PCG64.
* Passes TestU01 BigCrush & PractRand (32TB).
* Guaranteed 2^128 period.
* Proven injective (192-bit state) via Z3 SMT solver; allows parallel streams.
* Core requires only stdint.h.
Seeking feedback on design, use cases, or further testing.
Basically any generator with a 192 bit state can pass BigCrush/PranctRand- even known terrible ones like middle square!
https://www.pcg-random.org/posts/too-big-to-fail.html
Here are corresponding results for LCGs, interestingly there's a clear affine relationship between state size and log(bytes to practrand failure).
https://www.pcg-random.org/posts/does-it-beat-the-minimal-st...
I did have to alter the output to be "(GR * mix) + fast_loop" and change the rotation constants to be 12 and 5.
The state update function is effectively "a = rotate(a, constant) + b; b = rotate(b, constant) + constant;" and the output derivation is "output = (a + b) * constant".
That update function is _barely_ nonlinear, and the output derivation is linear. The output would probably be slightly better as "(a ^ b) * constant".
The slow_loop thing to guarantee 2^128 period is probably not needed - anyone with an application that cares about a period that high is probably going to choose a more robust generator (a few rounds of hardware-accelerated AES in counter mode is your best bet there)
The use of the Z3 prover is neat and I should read up on that more.
For example, if f(0) = 1 and f(1) = 0, even if the rest of f's domain is injective the period of f is still only 2 when the initial value is 0 or 1.
When iterating I first tried to make fast_loop as random as possible by trying all possible rotational values and then having each option tested in PractRand 1000 times from 256M to 8GB. There was a wide range of performance by rotation. 47 was the best (for the GR constant) and resulted in the most tests being passed. The goal was a stronger than normal core for the PRNG that could feed actively into mix.
I found the multiplication to be necessary for passing PractRand and BigCrush with the state mixing as posted.
I had a variant which assigned mix as rotate(mix,constant) + (mix^fast_mix). That would pass cleanly with mix directly outputted (with no multiplication) - but I couldn’t get Z3 prover to show injectivity so I decided to go with the posted route.
Wondering how this PRNG compares to PCG PRNGs that also claim to be "simple fast space-efficient statistically good algorithms" and "hard to predict" [0].
In any case, it's good to see the excellent work being accomplished in this space.
[0] https://www.pcg-random.org/
Curious to learn more about the circumstance :)