How randomness improves algorithms (2023)
32 points by kehiy on 8/14/2025, 9:43:53 AM | 16 comments | quantamagazine.org
Randomness seems to help more with intentionally belligerent users than anything else: there is no worst-case pattern to aim for, because the same question asked twice yields a different result.

For internal use it can help a little with batch processing. The last time I did that seriously I found a big problem with clustering of outlier workloads: one team of users had 8-10x the cost of any other team. Since the work involved populating several caches, I sorted the jobs by user instead of by team, and that fixed it. It also meant that one request for team A was more likely to arrive after another request had already populated the team's data in the cache, instead of two of them concurrently asking the same questions. So it not only smoothed out server load, it also reduced the overall read intensity a bit. But simply shuffling the data worked almost as well and took hours instead of days.
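A minimal sketch of the two orderings, with made-up job fields; the point is just that a random permutation is a one-line way to break up clustered work, while the sort-based fix takes more knowledge of the data:

```python
import random

# Illustrative only: the job records and field names ("user", "team") are hypothetical.
def order_jobs(jobs, strategy="shuffle"):
    """jobs: list of dicts like {"user": ..., "team": ...}."""
    if strategy == "by_user":
        # The deliberate fix described above: order by user rather than by team,
        # so one team's jobs are less likely to run concurrently and more likely
        # to find the team-level cache already populated.
        return sorted(jobs, key=lambda j: j["user"])
    # The cheap alternative: shuffling spreads clustered work out almost as well.
    shuffled = list(jobs)
    random.shuffle(shuffled)
    return shuffled
```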
Here's another crossover analogy. In game theory, there are games where you can't play a deterministic (pure) strategy and hope to get the best outcome; you need a randomized (mixed) one. Otherwise the adversary could anticipate your strategy and generate a pathological response.
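A toy illustration (rock-paper-scissors, not from the article): any pure strategy can be countered by an adversary who knows it, while the uniform mixed strategy breaks even in expectation no matter what the adversary does.

```python
import random

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def adversary(pure_move):
    # An adversary who anticipates a deterministic player always counters it.
    return {"scissors": "rock", "rock": "paper", "paper": "scissors"}[pure_move]

# Pure strategy: the adversary wins every round.
print(payoff("rock", adversary("rock")))           # -1

# Mixed strategy: expected payoff is 0 against any fixed opponent move.
trials = 100_000
avg = sum(payoff(random.choice(MOVES), "paper") for _ in range(trials)) / trials
print(round(avg, 2))                               # ~0.0
```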
This thesis presents and analyzes a simple principle for building systems: that there should be a random component in all arbitrary decisions. If no randomness is used, system performance can vary widely and unpredictably due to small changes in the system workload or configuration. This makes measurements hard to reproduce and less meaningful as predictors of performance that could be expected in similar situations.
[1] https://tlb.org/docs/thesis.pdf
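A tiny sketch of how I read that principle, with hypothetical names: when several options look equally good, break the tie with a coin flip rather than always taking the first one, so performance doesn't hinge on incidental ordering.

```python
import random

# Hypothetical example of the thesis's principle: put a random component in
# decisions that are otherwise arbitrary.
def pick_server(servers, load):
    least = min(load[s] for s in servers)
    candidates = [s for s in servers if load[s] == least]
    # Deterministic tie-breaking (candidates[0]) makes behavior depend on list
    # order, so small workload or config changes can shift every decision the
    # same way; a random choice removes that sensitivity.
    return random.choice(candidates)

print(pick_server(["a", "b", "c"], {"a": 2, "b": 1, "c": 1}))  # "b" or "c"
```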
But all else is seldom equal and Random 2 works as well or better.
In case anyone is curious, the way to phrase it as a question would be, "How does randomness improve algorithms?"
It's that there's a population of values (integers for factoring, nodes-to-delete for the graph) where we know a way to get a lot of information cheaply from most values, but we don't know which values, so we sample them.
Which isn't to say the PRNG isn't doing any work. Maybe it is: maybe any straightforward iteration through the sample space has problems, such as failure values being clumped together or similar values providing overlapping information.

If so, that suggests to me that you can do better sampling than a PRNG, although maybe the benefit is small. When the article talks about 'derandomizing' an algorithm, is it referring to removing the concept of sampling from this space entirely, or to doing a better job of sampling than 'random'?
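One small, standard instance of this pattern (not the article's example): in the Fermat primality test, a large fraction of bases reveal a composite number cheaply, but we don't know which bases in advance, so we sample them at random. (Miller-Rabin fixes the Carmichael-number blind spot; this is just the shape of the idea.)

```python
import random

def probably_prime(n, rounds=20):
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        # A cheap modular exponentiation: if this fails, a is a "witness"
        # that n is composite.
        if pow(a, n - 1, n) != 1:
            return False
    return True

print(probably_prime(221))   # False: 221 = 13 * 17
print(probably_prime(1009))  # True: 1009 is prime
```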
A pseudorandom sequence of choices is still sufficiently detached from the input. Random here means "I'm making a decision in a way that is sufficiently independent of the input that structuring the input adversarially won't cause worst-case performance," coupled with "the cost of this algorithm is analyzed assuming truly random numbers."

That's the work the randomness is doing.

INB4 worst case: you can do worst-case analysis on randomized algorithms, but it's either the worst case over every possible random choice or the worst case in expectation, not the worst case given a poor implementation of the RNG. Effectively, randomization sometimes serves to shake you out of an increasingly niche and unlikely series of bad decisions, which is the crux of an adversarial input.
To wit
> In the rare cases where the algorithm makes an unlucky choice and gets bogged down at the last step, they could just stop and run it again.
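The textbook example of that "detached from the input" framing (not from the article): quicksort with a fixed pivot rule has inputs that force quadratic behavior, while a random pivot makes the cost depend on the coin flips rather than on how the input was arranged.

```python
import random

def quicksort(xs, randomized=True):
    if len(xs) <= 1:
        return xs
    # Fixed rule (first element) is exploitable; a random pivot is not tied
    # to any particular input ordering.
    pivot = random.choice(xs) if randomized else xs[0]
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less, randomized) + equal + quicksort(greater, randomized)

data = list(range(3000))               # adversarial for the first-element pivot
# quicksort(data, randomized=False)    # recursion depth ~3000: blows the stack
print(quicksort(data, randomized=True) == sorted(data))  # True
```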
Instead we could say "we know shortcuts that work for most values in a population, but we don't know how to figure out which values without expensive computation" and that's not very mystical or paradoxical.
And using random sampling is now the obvious way to deal with that situation. But it's not the randomness doing the work; that's an implementation detail of how you do the sampling.

It may well be that there isn't another obvious way to iterate through the sample space that isn't a disaster. Maybe bad values are clumped together, so when you retry on failure you retry a lot. But if that's the case, there might also be a better way to sample than randomly: if bad values are clumped, then there may be an intelligent way to sample such that you hit two bad values in a row less often than random sampling does.

My question is whether that's what's being referred to as 'derandomizing': taking one of these algorithms that uses random sampling and sampling more intelligently, or instead using what was learned from the (so-called) probabilistic algorithm to go back to a more traditional form.
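A toy simulation of that clumping worry, with made-up numbers: if all the failures sit in one block, a sequential scan from an unlucky starting point retries over and over, while uniform random sampling retries about 1/(1 - failure fraction) times in expectation no matter where the failures live.

```python
import random

N = 1000
BAD = set(range(0, 100))            # one contiguous clump of bad values (10%)

def tries_sequential(start):
    i, tries = start, 1
    while i in BAD:                 # scanning walks through the whole clump
        i = (i + 1) % N
        tries += 1
    return tries

def tries_random():
    tries = 1
    while random.randrange(N) in BAD:
        tries += 1
    return tries

print(tries_sequential(0))                                   # 101
print(sum(tries_random() for _ in range(10_000)) / 10_000)   # ~1.11
```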
How randomization helps is by making it much easier to design algorithms. E.g. verifying a solution is cheap, so proving that your random choice lands in some class of good choices, and building an algorithm that turns such a choice into a solution, is still an interesting approach and opens up solutions to things we can't yet solve.

Derandomizing, as presented, apparently means proving that for a few random but careful choices you're in the funnel towards the right solution, or (hopefully) can detect otherwise, and that a few transformations or reasonably performant steps then produce your solution.
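A bare-bones sketch of that guess-then-verify shape (my example, not the article's): if most random choices are good and goodness is cheap to check, the expected number of retries is a small constant. Here the "good choice" is a number coprime to n.

```python
import random
from math import gcd

def random_coprime(n):
    while True:
        guess = random.randrange(2, n)   # cheap random choice
        if gcd(guess, n) == 1:           # cheap verification
            return guess

# For n = 101 * 103, roughly 98% of candidates are coprime,
# so this loop almost always succeeds on the first or second try.
print(random_coprime(101 * 103))
```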
Oh, that's cool, do you have a reference for that?