Show HN: LoopMix128 – Fast C PRNG (.46ns), 2^128 Period, BigCrush/PractRand Pass

24 the_othernet 14 5/10/2025, 9:25:21 PM github.com ↗
LoopMix128 is a fast C PRNG I wrote for non-cryptographic tasks.

GitHub (MIT): https://github.com/danielcota/LoopMix128

Highlights:

* ~0.37 ns/value (GCC 11.4, -O3 -march=native), 98% faster than xoroshiro128++ and PCG64.

* Passes TestU01 BigCrush & PractRand (32TB).

* Guaranteed 2^128 period.

* Proven injective (192-bit state) via Z3 SMT solver; allows parallel streams.

* Core requires only stdint.h.

Seeking feedback on design, use cases, or further testing.

Comments (14)

aappleby · 58m ago
MurmurHash/SMHasher author here. While I don't doubt this passes BigCrush etc, I do find it very surprising that it does.

The state update function is effectively "a = rotate(a, constant) + b; b = rotate(b, constant) + constant;" and the output derivation is "output = (a + b) * constant".

That update function is _barely_ nonlinear, and the output derivation is linear. The output would probably be slightly better as "(a ^ b) * constant".

The slow_loop thing to guarantee 2^128 period is probably not needed - anyone with an application that cares about a period that high is probably going to choose a more robust generator (a few rounds of hardware-accelerated AES in counter mode is your best bet there)

The use of the Z3 prover is neat and I should read up on that more.

aappleby · 45m ago
I'm not sure that the claim "the mix function is injective" is sufficient to support the claim "The period is at least 2^128". If the mix is reversible then it forms a permutation on 2^192, but that does not imply that it forms a single cyclic permutation.

For example, if f(0) = 1 and f(1) = 0, even if the rest of f's domain is injective the period of f is still only 2 when the initial value is 0 or 1.

the_othernet · 38m ago
I wasn't able to analyze the cyclic behavior of the mix directly, but for the purpose of minimal period only fast_loop and slow_loop are used (as a 128bit counter).
the_othernet · 44m ago
Hello! Awesome work on your hashing by the way!

When iterating I first tried to make fast_loop as random as possible by trying all possible rotational values and then having each option tested in PractRand 1000 times from 256M to 8GB. There was a wide range of performance by rotation. 47 was the best (for the GR constant) and resulted in the most tests being passed. The goal was a stronger than normal core for the PRNG that could feed actively into mix.

I found the multiplication to be necessary for passing PractRand and BigCrush with the state mixing as posted.

I had a variant which assigned mix as rotate(mix,constant) + (mix^fast_mix). That would pass cleanly with mix directly outputted (with no multiplication) - but I couldn’t get Z3 prover to show injectivity so I decided to go with the posted route.

jrapdx3 · 1h ago
From the Github repo: "Created by Daniel Cota after he fell down the PRNG rabbit-hole by circumstance." I understand how that can happen.

Wondering how this PRNG compares to PCG PRNGs that also claim to be "simple fast space-efficient statistically good algorithms" and "hard to predict" [0].

In any case, it's good to see the excellent work being accomplished in this space.

[0] https://www.pcg-random.org/

the_othernet · 1h ago
I just added PGC64 to benchmark.c in the Github. PCG64 speed looks to be about the same as xoroshiro128++.
zX41ZdbW · 1h ago
Also interesting to include PCG for comparison.
the_othernet · 1h ago
Just added to benchmark.c in the Github. Performance is comparable to xoroshiro128++.
RainyDayTmrw · 1h ago
When do people both want (PRNG performance is a measurable fraction of total program performance) and can use (no security constraints) an extremely fast PRNG?
pierrec · 47m ago
It's common in graphics and audio programming. In audio, maybe you're synthesizing noise or any of the myriad synthesis techniques that require noise. In graphics you have lighting, textures, etc that can use this. And when you're doing something every audio sample or every pixel, "extremely fast" is desirable. The question of whether to use a pre-rendered lookup table or a fast algorithm often comes up (and has no universal answer... though I always go for the latter)
986aignan · 1h ago
Monte Carlo simulations would be the obvious example.
zX41ZdbW · 1h ago
> after he fell down the PRNG rabbit-hole by circumstance

Curious to learn more about the circumstance :)

the_othernet · 1h ago
I have an offline poker app, and a user asked me what the algorithm was to generate the random numbers. That was a month ago. :)
kstrauser · 23m ago
I feel this in my bones. I’m not qualified to comment on the quality of the result, but I certainly appreciate and identify with the circumstances.