Show HN: LoopMix128 – Fast C PRNG (.46ns), 2^128 Period, BigCrush/PractRand Pass
76 the_othernet 34 5/10/2025, 9:25:21 PM github.com ↗
LoopMix128 is a fast C PRNG I wrote for non-cryptographic tasks.
GitHub (MIT): https://github.com/danielcota/LoopMix128
Highlights:
* ~0.37 ns/value (GCC 11.4, -O3 -march=native), 98% faster than xoroshiro128++ and PCG64.
* Passes TestU01 BigCrush & PractRand (32TB).
* Guaranteed 2^128 period.
* Proven injective (192-bit state) via Z3 SMT solver; allows parallel streams.
* Core requires only stdint.h.
Seeking feedback on design, use cases, or further testing.
The state update function is effectively "a = rotate(a, constant) + b; b = rotate(b, constant) + constant;" and the output derivation is "output = (a + b) * constant".
That update function is _barely_ nonlinear, and the output derivation is linear. The output would probably be slightly better as "(a ^ b) * constant".
The slow_loop thing to guarantee 2^128 period is probably not needed - anyone with an application that cares about a period that high is probably going to choose a more robust generator (a few rounds of hardware-accelerated AES in counter mode is your best bet there)
The use of the Z3 prover is neat and I should read up on that more.
When iterating I first tried to make fast_loop as random as possible by trying all possible rotational values and then having each option tested in PractRand 1000 times from 256M to 8GB. There was a wide range of performance by rotation. 47 was the best (for the GR constant) and resulted in the most tests being passed. The goal was a stronger than normal core for the PRNG that could feed actively into mix.
I found the multiplication to be necessary for passing PractRand and BigCrush with the state mixing as posted.
I had a variant which assigned mix as rotate(mix,constant) + (mix^fast_mix). That would pass cleanly with mix directly outputted (with no multiplication) - but I couldn’t get Z3 prover to show injectivity so I decided to go with the posted route.
For example, if f(0) = 1 and f(1) = 0, even if the rest of f's domain is injective the period of f is still only 2 when the initial value is 0 or 1.
https://dms.umontreal.ca/~andrew/PDF/CycleLengths1.pdf
Basically any generator with a 192 bit state can pass BigCrush/PranctRand- even known terrible ones like middle square!
https://www.pcg-random.org/posts/too-big-to-fail.html
Here are corresponding results for LCGs, interestingly there's a clear affine relationship between state size and log(bytes to practrand failure).
https://www.pcg-random.org/posts/does-it-beat-the-minimal-st...
The main thing this knowledge is worth when designing RNGs is that it tells you how reasonable it is to 'lose' N bits. Eg. if you're 2 bits worse than PCG, you're about twice as much worse than ideal.
I did have to alter the output to be "(GR * mix) + fast_loop" and change the rotation constants to be 12 and 5.
Wondering how this PRNG compares to PCG PRNGs that also claim to be "simple fast space-efficient statistically good algorithms" and "hard to predict" [0].
In any case, it's good to see the excellent work being accomplished in this space.
[0] https://www.pcg-random.org/
See https://www.pcg-random.org/posts/does-it-beat-the-minimal-st... for examples and constants
- your test codes are not reproducible: running twice generates different sets of numbers because they have unknown seeds. As a result, if you change (a) compilers or compiler switches, (b) operating system versions, (c) host processors or (d) architectures, the question arises: what is wrong? What is different? This is known as a regression test.
- try shishua as another speedy PRNG. See https://espadrine.github.io/blog/posts/shishua-the-fastest-p...
- You only test a few times? I think one hundred BigCrush tests, using a set of 100 seeds, would be suitable. Takes a few days on an RPI 4 (with cooler). Run the same 100 tests on a Ryzen and Xeon, just to be sure. They should be bit-for-bit identical.
- 100 BigCrush tests should show only a handful (4 or fewer) duplicate test failures.
- your seeds are almost great: too many people think "42" is random in a space of 0 through 2^64. But 0xdeadbeef is so 1990s...
- you don't need different seeds per PRNG; you can generate reproducible ones (2x to 4x 64bit) from a single good 64-bit seed and your favourite PRNG. Your test code should read a seed or set from the command line (see first item).
- warmups? Really?
- Remember that BigCrush and other tests are created by mathematical people not practical people. Do they test for equal numbers of odd and even results? Hmmmm....
- try Collatz, see https://news.ycombinator.com/item?id=39733685
- the tests are very cache-friendly; nobody does this. It's true that everybody compares against this unrealistic scenario, however.
- if you've spent a month in twisty little PRNG passages, all alike, you are on the first steps of your journey. Take a towel.
J
And you are so right about being just one month in. Every time I think I'm starting to understand what's going on here, I realize the maze just keeps getting deeper.
LoopMix128 ns/call: 0.380 ns
xoroshiro128++ ns/call: 0.757 ns
wyrand ns/call: 0.379 ns
PCG64 ns/call: 0.757 ns
RomuQuad ns/call: 0.575 ns
RomuDuoJr ns/call: 0.416 ns
Curious to learn more about the circumstance :)
A purely random function can lead to clumping and aliasing.