Show HN: Fast Random Library for C++17
Random number generation is a somewhat underrepresented topic in the C++ realm. There is a lot of questionable info about it online, and even the standard library is quite behind the times in terms of its algorithms. It suffers from trying to accommodate sometimes impractical standard requirements and has several ways of getting significantly bad statistical results. This leaves a lot of easily achievable performance & quality on the table.
So, being a mathematician who mostly works with stochastic models and wants these models to run fast and well, I embarked on a journey of trying to summarize "what is good and what is bad" and implement the "best stuff there currently is".
Thankfully, the design of C++ <random> is quite flexible and easy to extend. With some cleanup, generalization and compile-time logic all the different algorithms can be wrapped in a generic standard-compatible API.
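To illustrate that flexibility, here is a minimal sketch (my own, not the library's code) of what "standard-compatible" means: any type with result_type, min(), max() and operator() satisfies the UniformRandomBitGenerator requirements and plugs straight into every std:: distribution. Here using Marsaglia's xorshift64 as a stand-in algorithm:

    #include <cstdint>
    #include <limits>
    #include <random>

    // Minimal UniformRandomBitGenerator wrapping a xorshift64 state machine.
    struct Xorshift64 {
        using result_type = std::uint64_t;
        std::uint64_t state = 0x9e3779b97f4a7c15ULL; // any nonzero seed

        static constexpr result_type min() { return 0; }
        static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

        result_type operator()() {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            return state;
        }
    };

    // Usage: works with any standard distribution.
    // Xorshift64 gen;
    // std::uniform_int_distribution<int> d(1, 6);
    // int roll = d(gen);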
A result of this work is a single-header RNG library which has:
- <random>-compatible generators (PRNGs) with 3x-6x better performance
- Cryptographically secure generators (CSPRNGs)
- Faster uniform / normal distributions that produce same sequences on every platform
- Quick approximations of some non-linear distributions
- More reliable entropy sources than std::random_device()
- rand()-like API for when we just want random numbers without the boilerplate of a proper <random> setup
Effectively all of this gets us 2x-8x speedups on many workloads while producing even better statistical quality. I don't think there is anything else like it, so I would like to showcase the result here and hear some opinions on how to improve it:
https://github.com/DmitriBogdanov/UTL/blob/master/docs/modul...
For those interested, there is a more detailed rundown of all the quirks of this topic at the end of the docs; it might prove an interesting read.
One note on API design: I think it's a FANTASTIC idea to have default `rand()` functions available, since very often you don't really care about the generator and you're just like "just give me a random number, I don't care how". But if you do, you shouldn't have a `seed()` function, because that means you can then never upgrade the generator without breaking your API contract. It should always be seeded with entropy. This is why glibc's `rand()` is still using an LCG from the late neolithic: they can't ever update it without breaking a gazillion applications and tests. This is the classic example of "Hyrum's law". Doing this also helps with thread-safety: you can just make the seed/state thread-local and then it just works.
Basically, if you want the ability to seed your PRNG, you also need to specify the generator explicitly. The global one is for convenience only, and there should be no ability in the API to seed it.
EDIT: btw, this is a really nice thing about C++, doing this is dead easy:
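Something along these lines (a sketch, with std::mt19937_64 standing in for whatever generator you actually prefer):

    #include <cstdint>
    #include <random>

    // Global rand()-style helper: always entropy-seeded, thread-safe
    // because the state is thread_local, and deliberately no seed().
    inline std::uint64_t rand_u64() {
        thread_local std::mt19937_64 gen{std::random_device{}()};
        return gen();
    }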
I don't think it's a problem in practice, because every online tutorial or Stack Overflow post contains that multi-line snippet for real entropy. C++ programmers are somewhat used to verbosity anyway.
No, it was more like: it's easy to write code that appears to fully seed an RNG with entropy, but in fact only seeds a tiny part of its state. Making it seed the whole lot, especially in a portable way, is surprisingly hard to get right. But I now can't find the article that explains this.
I seem to remember I had to use Boost.Random instead of the C++ standard library to do this in a real program, but this was a few years ago and the latest standard that was available in most implementations was C++11. Perhaps things have improved since.
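For reference, this is the trap being described (a sketch; std::mt19937 holds ~2.5 KB of state, but the common one-argument construction feeds it a single 32-bit value):

    #include <array>
    #include <cstdint>
    #include <random>

    // Looks fully seeded, but pours 32 bits into ~19937 bits of state:
    std::mt19937 weak{std::random_device{}()};

    // Fills the entire state through a seed sequence:
    std::mt19937 make_fully_seeded() {
        std::random_device rd;
        std::array<std::uint32_t, std::mt19937::state_size> raw;
        for (auto& word : raw) word = rd();
        std::seed_seq seq(raw.begin(), raw.end());
        return std::mt19937{seq}; // (std::seed_seq has quirks of its own,
                                  //  but it does touch the whole state)
    }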
One caveat:
> Note 2: If no hardware randomness is available, std::random_device falls back onto an internal PRNG ....
> ...
> std::random_device has a critical deficiency in its design — in case its implementation doesn't provide a proper source of entropy, it is free to fall back onto a regular PRNG that doesn't change from run to run. The method std::random_device::entropy(), which should be able to detect that, is notoriously unreliable and returns different things on every platform.
entropy() samples several sources of entropy (including std::random_device itself) and is (almost) guaranteed to change from run to run even if it can't provide proper hardware-sourced entropy that would be suitable for cryptography.
Personally, I think it would be best if there were a way to communicate to the system (or here, to the library specifically) what the use case is. For cryptographic applications, I don't want the library to fall back gracefully to something insecure; I would want a dark red critical error message and immediate termination with an "insufficient entropy" error code.
However, for a game graceful degradation might be quite okay, because nobody is going to die in the real world if a monster behaves a little less randomly.
I learned a lot about recent advances in pseudo-random number generators by reading your code and associated documentation, including some stuff that DEK has yet to incorporate into volume 2 of TAOCP. ;)
- Faster uniform / normal distributions that produce same sequences on every platform
This is very useful if you want to reliably repeat simulations across platforms!
One question:
Isn't this returning a reference to a local variable? In my understanding, 'objects' is destroyed when rand_choice() returns.
> How is it faster than std?
It uses the fact that the popcount of a uniformly distributed integer follows a binomial distribution. By rescaling that binomial distribution and adding some linear fill we can achieve a curve very similar to a proper normal distribution in just a few instructions. While that level of precision is not suitable for general use, in instances where quality is not particularly important (gamedev, fuzzing) this is perhaps the fastest possible way of generating normally distributed floats.
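As a rough reconstruction of that trick (my own sketch for C++17, not the library's actual code; constants assume one 64-bit draw per float):

    #include <cstdint>

    // Portable popcount for C++17 (std::popcount is C++20).
    inline int popcount64(std::uint64_t x) {
    #if defined(__GNUC__) || defined(__clang__)
        return __builtin_popcountll(x);
    #else
        int n = 0;
        for (; x; x &= x - 1) ++n;
        return n;
    #endif
    }

    // popcount of a uniform 64-bit draw is Binomial(64, 1/2):
    // mean 32, stddev 4, which is already close to a normal curve.
    template <class URBG>
    float approx_normal(URBG& gen) {
        const std::uint64_t u   = gen();
        const int           pop = popcount64(u);
        // Reusing low bits of u as the "linear fill" smooths the discrete
        // steps; a separate draw would decorrelate it from the popcount.
        const float fill = (u & 0xFFFF) * (1.0f / 65536.0f);
        return (pop - 32 + fill - 0.5f) * 0.25f; // rescale to mean 0, stddev ~1
    }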
https://github.com/nessan/xoshiro
Documented at: https://nessan.github.io/xoshiro/
Handles the full family of these generators featuring arbitrary jump sizes, stream partitioning for parallel applications, and, like this library, a number of convenience sampling methods to shield the casual user from the complexities of using <random>.
Can be used with a companion `bit` library (https://github.com/nessan/bit) that performs efficient linear algebra and polynomial reduction over GF(2) for those that want to explore some of the mathematics behind these linear generators.
The algorithm is provably secure, so long as AES is secure. It is also backtracking resistant: an adversary with the current RNG state cannot step backwards.
On hardware with AES primitives, it's faster than MT, though slower than pcg64.
Science, neural networks, simulation, gaming, rendering, weather, nuclear, robotics, signal processing, engineering, finance, and more industries require fast RNGs to generate billions to trillions of random numbers quickly.
Very few things actually need secure RNGs - only the things that need a secure endpoint, and most of those simply use the secure RNG to do a private key transfer algo, after which no more RNGs are needed.
Use the right tool for the job. Widen your view of what things are used for. Etc.
On hardware with AES primitives this is simply false. Yes, embedded cases are different. There's some neat work on probabilistic computing that uses a xorshift RNG in hardware. These are specialized use cases that probably aren't your use case. Use the right tool for the job and try to be less condescending.
????
Do you even code these things?
I'm unaware of any CPU with AES hardware that lets you compute SIMD under AES, and AES instructions on every CPU I've used (quite a few) are again slower even for single data paths. Once you switch to PRNGs using SIMD, that alone is an order of magnitude faster than any AES method. It's not hard to check (say via Agner Fog's CPU instruction tables if you want Intel/AMD) that no AES method is going to beat the fastest non-crypto RNGs. Or simply code and profile a few.
AES primitives are generally used for AES, and even for RNG they are slow. No one is using crypto rand, even with AES, for all the uses I mentioned, which involve orders of magnitude more rands than all the crypto in the world. Between physical simulation, video game uses, and now data centers and AI running PRNGs up the wazoo, all the crypto rand in the world is a tiny, tiny fraction of the uses - and even there they're used a handful of times to set up a private key system with no more PRNGs needed (and this is why AES instructions were built into CPUs in the first place - for symmetric key encryption - no PRNGs there...)
And once you stop with the simple single rng call mentality, the world runs on GPUs and parallel HW where you get 1000 to ~20,000+ cores, all of which are doing non-crypto rand a lot for the uses above.
There's a reason places like CERN publish RNG algorithms suitable for their needs, which are often adopted in many industries that need actual high speed, high quality PRNGs, and none of them are crypto RNGs. Not one.
So go ahead and show me this fast AES PRNG that is as fast as the fastest non-crypto RNGs. I'll wait......
I've worked in this space a long time, done DARPA HPC projects, written articles on PRNGs, designed specialized PRNGs, have a math PhD (my advisor did crypto, so I've done quite a bit of that), and have written numerical software for just about every application mentioned above. Read my posting history. I know what I'm talking about.
> Use the right tool for the job ....
Yes, indeed.
> ... and try to be less condescending.
Then don't make false claims, then double down with more. I'm not the only one calling you out on your claims in this thread; there's probably a reason.
There are quite a few applications:
- audio applications
- computer games
- simulations
- etc.
It can also be used to dither[1] the result when doing a final rendering, to improve the noise floor.
[1]: https://en.wikipedia.org/wiki/Noise_shaping
My guess is that yes, Randen is fast, but it is not quite as fast as PCG or other pseudo-random number generators.
Whether or not it is relevant, that's another story. I cannot agree with your claim that "there are vanishingly few use cases" for insecure random number generators.
You can check via Google Scholar: search for "randen random number", note there is no version published in a journal, then click citations and look through the few articles mentioning it. Of those that are published, not one has a journal reference to it.
Why would one claim they're fast, then simply fiddle with benchmarks so they find a place where they look fast?
When someone wants a fast RNG, they're not calling it 10 times per sec. They're hammering it for actual needs.
The naive collision resolution ideas start like this: chained items (slow, memory costs), linear probing (results in clustering), quadratic probing (better, still has problems), double hashing (good; a quick sketch follows below), PRNG sequences with good properties (so any item hitting the same spot checks the same sequence of overflow slots). One can alternate linear and PRNG probing or other combos.
Start with the basic wiki article https://en.wikipedia.org/wiki/Hash_table, then check through many of the modern hash table types linked from there.
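For a concrete flavor, a minimal sketch of the double hashing variant (my illustration, not from the article): the second hash sets the probe stride, so keys that collide on the first hash still follow different probe sequences.

    #include <cstddef>
    #include <cstdint>

    // Slot for the k-th probe attempt, assuming a power-of-two capacity.
    std::size_t probe(std::uint64_t h1, std::uint64_t h2,
                      std::size_t attempt, std::size_t capacity) {
        const std::uint64_t stride = h2 | 1; // odd => coprime with capacity,
                                             // so the sequence visits every slot
        return (h1 + attempt * stride) & (capacity - 1);
    }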
That's not what a PRNG does. A PRNG produces a sequence of pseudo-random values (with a particular distribution) based on an initial seed value.
A hash function is stateless, a PRNG is not.
In a hash table, the hash function is not used to produce a series of random numbers, that's why it's a bit silly to call it a PRNG. Instead, the purpose is to reduce a (possibly larger) input value to a fixed size output value, often with the requirement that similar input values should result in very different output values to avoid clustering.
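To make the distinction concrete (my own illustration): a hash maps input to output with no memory between calls, while a PRNG carries state, even when that PRNG is literally just a hash applied to an advancing counter.

    #include <cstdint>

    // Stateless: same input always yields the same output
    // (a murmur-style 64-bit finalizer).
    std::uint64_t hash64(std::uint64_t x) {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return x;
    }

    // Stateful: each call advances 'state' (a splitmix-style counter PRNG).
    struct CounterRng {
        std::uint64_t state = 0;
        std::uint64_t operator()() { return hash64(state += 0x9e3779b97f4a7c15ULL); }
    };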
However, your hash table example is still useful as an analogy.
Did you guys get a chance to compare with [1]? These seem to be the standard for high-performance non-crypto RNGs.
[1]: https://github.com/DEShawResearch/random123
arc4random has been provided by glibc 2.36 (2022), and is available on all the above-mentioned systems as well. If you don't want to make a syscall per request (outside Linux), just use arc4random; it'll be the fastest method available. musl libc lacks arc4random, unfortunately, but you can always ship a small wrapper.
Systems that support arc4random also support arc4random_uniform, which is a way to get an unbiased unsigned integer in the range [0, N) for N up to 2^32-1. That's probably the most important reason to use the arc4random family.
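For example (a sketch; on Linux this needs glibc >= 2.36, and on the BSDs and macOS it's in plain <stdlib.h>):

    #include <cstdint>
    #include <cstdlib> // arc4random_uniform

    std::uint32_t roll_d6() {
        // Uniform in [0, 6): rejection sampling under the hood,
        // so no modulo bias; always entropy-seeded, zero setup.
        return arc4random_uniform(6) + 1;
    }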
I think neither are unbelievably slow. I dunno, take some measurements and see, maybe it suits you.
Not quite believable...