Simpler Backoff

123 points | todsacerdoti | 45 comments | 5/31/2025, 4:43:14 AM | commaok.xyz ↗

Comments (45)

efitz · 1d ago
Why a lookup table? Why not just have the client remember the last backoff, and multiply it by a constant and apply a ceiling?

    def getNewBackoff(oldBackoff: int): int {
        newBackoff = oldBackoff * BACKOFF_CONSTANT
        if (newBackoff > BACKOFF_CEILING) {
            newBackoff = BACKOFF_CEILING
        }
        return newBackoff
    }

If you bit-align the ceiling value you can replace the conditional with a bitwise AND mask. If you use 2 as your multiplier you can replace the multiplication with a left shift.

It’s also a good idea to introduce jitter: don’t set the new backoff to exactly the old backoff times a constant, but add a random value so the result is randomized around the desired new value. This helps prevent pile-ons when a bunch of clients notice at about the same time that the server became unresponsive; if they all use the same backoff algorithm, they’re all going to hit the server again at about the same time. It’s good to spread that out to avoid overloading the server during restarts, cold starts, and so on.

The best results I ever got were when I used server side hinting to avoid pile-ons when the retries from the client were all clustered around the same time.

When my server got into an overloaded state, the auth path for clients short-circuited and delivered an overloaded message with a randomized suggested retry value. The server can’t make the clients honor it (we could in my case, because we wrote the clients), but if the clients do honor it, you can spread reconnects over a period instead of making them all point-in-time, and dampen or prevent oscillating overloads.
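A minimal sketch of that server-side hint in Go, using the standard Retry-After header to carry the randomized suggestion; the overload threshold and retry window are made-up values:

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"sync/atomic"
)

var inFlight atomic.Int64

const maxInFlight = 100 // illustrative overload threshold

// retryAfterHint returns a randomized suggested retry delay in
// seconds, so clients that honor it reconnect spread over a window
// instead of all at once. The 5-34s window is illustrative.
func retryAfterHint() int {
	return 5 + rand.Intn(30)
}

func handler(w http.ResponseWriter, r *http.Request) {
	if inFlight.Add(1) > maxInFlight {
		inFlight.Add(-1)
		// Overloaded: short-circuit and hint a randomized retry delay.
		w.Header().Set("Retry-After", fmt.Sprint(retryAfterHint()))
		http.Error(w, "overloaded", http.StatusServiceUnavailable)
		return
	}
	defer inFlight.Add(-1)
	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/", handler)
	// http.ListenAndServe(":8080", nil) // start the server
}
```

Clients that ignore the header behave no worse than before; clients that honor it naturally desynchronize.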

akshayshah · 22h ago
Full agreement from me that jitter is essential. Here’s AWS on good retry policies: https://aws.amazon.com/builders-library/timeouts-retries-and...
mariusor · 1d ago
Similar to your suggestion, I implemented a wrapper around retries in Go before, and despite liking the lookup table version quite a lot, I think it's more important to be able to parametrize the initial duration of the back-off and the rate at which it grows, so that's what I focused on.

I'm not sure how my example fares on a simplicity scale, but it feels a little more readable to me:

    Retry(5, BackOff(time.Second, Linear(1.4), func(_ context.Context) { /* do stuff */ }))
Example cribbed from the documentation: https://pkg.go.dev/git.sr.ht/~mariusor/ssm#StrategyFn
0x696C6961 · 1d ago
Backoff APIs often take a retry count as the input. Your suggestion would require changing the API which isn't always practical.
gdiamos · 1d ago
explicit is better than implicit
efitz · 20h ago
A server cannot make a client behave a certain way. Explicit doesn’t matter in this case, and as many of the comments said, most of the time the best thing to do for the server is to spread retry load randomly over an interval. Explicit is bad for the server.

The only thing (IMO) explicit retries are good for is to make client state more deterministic on the client, but there are many ways of doing that other than hard coding retry intervals. For example, use asynchronous patterns for networking APIs and allow querying state of API-calls-in-progress.

unsnap_biceps · 1d ago
While I like the simplicity of the lookup table, in larger projects I find myself needing different backoffs for different upstream services. I'll have shorter ones for services that can degrade without a large impact on results and longer ones for services that are critical for results. I would prefer one flexible, algorithmically generated call over multiple individual functions with different lookup tables, or generating tables and passing them in each time.
kiitos · 19h ago
Backoff is a client-side behavior, so if you want to give different backoff behaviors to different classes of upstream services, you just define those backoff tables somewhere where the client can pick from them based on the target service it's trying to talk to.
harpiaharpyja · 23h ago
Generating tables is the way to go in that case. The flexibility you need, with the advantages of being data driven.
fmbb · 1d ago
You could have two lookup tables.
VWWHFSfQ · 1d ago
what if you have 100 lookup tables
mceachen · 1d ago
If you don’t have lookup tables for your lookup tables you aren’t doing it right
avinassh · 1d ago
you could use SQLite, store all of them
kstrauser · 23h ago
MongoDB is web scale.
barbazoo · 1d ago
Careful, that’s awfully close to N. /s
handsclean · 1d ago
Here’s a version that’s IMO much simpler still, about half as long as both (after fixing v2’s line count cheating), and fixes a bug: both article versions wait an extra 60 seconds between the last attempt and returning an error.

    func do(ctx context.Context) error {
        for attempt := 1; ; attempt++ {
            if request(ctx) == nil {
                return nil
            }

            if attempt == 10 {
                return fmt.Errorf("failed after %d attempts", attempt)
            }

            // Jittered exponential delay in seconds, capped at 60.
            delay := min(60, math.Pow(2, float64(attempt-1))*(0.75+rand.Float64()*0.5))

            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(time.Duration(delay * float64(time.Second))):
            }
        }
    }
I think the problem with the original isn’t using a formula, it’s over-parameterization. Declaring constants right next to where they’re used once isn’t giving you anything but the opportunity for naming issues and rigidity in how you combine them. If they aren’t passed in or repeated, then I wouldn’t pull them out; if they are, then I’d try to look for minimal and natural boundary points, like just min and max, or maybe an attempt → delay function.
Waterluvian · 1d ago
I sometimes agree, sometimes disagree. I don’t like magic numbers like retry(3, 1_000, 2_500)

(Though sometimes your text editor and docs are good enough to help, but I don’t like making those feel required).

I like how Python (and probably many other languages) allows for using the name of a positional arg. So you could call the exact same function but write it as retry(count=3, min_ms=1_000, max_ms=2_500) if you feel like it needs more clarity. Now you’ve got documented values that are tied to the function signature, their names have been decided for you, and they’re inline!

Mawr · 20h ago
> Declaring constants right next to where they’re used once isn’t giving you anything but the opportunity for naming issues and rigidity in how you combine them. If they aren’t passed in or repeated, then I wouldn’t pull them out;

Using magic numbers all over the place is the mark of a novice who doesn't understand that the meaning behind the numbers, which he knows in his mind, will not magically transfer to other people reading the code. Advising this approach is an interesting call.

Specifically here, we've got these at the beginning of the function:

    const (
        maxAttempts = 10
        baseDelay   = 1 * time.Second
        maxDelay    = 60 * time.Second
    )
If I went out of my way to look at this function, these are probably the values I'm interested in changing, and look, they're right here, at the top. Neat. It's obvious how to adjust the values to get what I want. I can stop reading the function right there.

Without these declarations, how would I know that "10" midway through the function is the number of attempts? I'd have to read and understand the entire logic first. Great. Now multiply this effort by the amount of programmers on the team and the amount of magic numbers.

blensor · 1d ago
Is this meant as a joke or a serious argument?

I personally find the original code just as readable as the static lookup table, if not more so, and I don't need to count the elements if I want to know how often it will retry.

But more importantly, changing the max retries is trivial in the original code and tedious in the static lookup table, especially for bigger changes.

Also, this is something you most likely want to make configurable to adapt to certain scenarios which is not possible with a static table

There are more reasons against it, but those are the main ones that jump out at me right away.

masklinn · 1d ago
Not only that, but now that Go has iterators if you need this in several locations the logic of the original can easily be encapsulated in an iterator with a few relevant tunable knobs, and then you just write something like:

  for range ExponentialBackoff(ctx) {
      err := request(ctx)
      if err == nil {
          return nil
      }
  }
and if one of the callsites needs to configure the backoff then you've got something like:

  for range ExponentialBackoff(ctx, MaxAttempts(20), BaseDelay(5)) {
      err := request(ctx)
      if err == nil {
          return nil
      }
  }
ctxc · 1d ago
If the author is here - please turn word wrap off for code. It's unreadable on mobile.
jitl · 1d ago
Code wraps for me in Safari on iOS 18.5
mukiblejlok · 8h ago
For Python, I would recommend the stamina package. It provides all the mentioned best practices about when and how to retry, in the form of an easy-to-use function decorator. Plus, there is a great video about the problem by the author: https://youtu.be/BxikFuvaT1Y
rixed · 1d ago
Am I the only one who wonders how such trivial posts end up on the HN front page?
Retr0id · 1d ago
Something doesn't have to be long or complex to be interesting.
qoez · 1d ago
It's so easy to make bot accounts that I wouldn't be surprised if a decent number of them are manipulated.
jpillora · 1d ago
ricardobeat · 1d ago
That’s the exact same logic, just written more verbosely and excluding the actual retry mechanism.
yawaramin · 22h ago
On a slightly different note, I am using Fibonacci backoff for a couple of things here and there and (subjectively) finding it a bit nicer than the standard 2x exponential backoff. I have it on a system that waits for a few days to raise an alert on missing data and I find the alerts on a cadence of 1, 2, 3, 5, 8... days are a bit tighter but still not overwhelming.
axeljohnsson · 1d ago
I recently listened to Marc Brooker's talk [1] on retries. My key takeaway: You don't want to put more load on your system during stress. At AWS, they do "adaptive retries", which I found interesting.

[1] https://www.youtube.com/watch?v=rvHd4Y76-fs

zoover2020 · 1d ago
I second this; retry mechanisms can cause retry storms in vast enough distributed systems. In Amazon code bases I found the same adaptive retry strategy, adopted after a Christmas where we played whack-a-mole trying to get a service back up as its clients kept retrying.
grg0 · 23h ago
Or a function backoff(attempt) that abstracts away the details and can be reused in places, perhaps a higher-order function that takes as input the function that does the work.

"This code, or something like it, probably looks really familiar:"

Better leave that company, to be honest.

jelder · 1d ago
This is really just an argument for using iterator generators. Does Go offer those yet?
jitl · 1d ago
Yes
ChrisMarshallNY · 1d ago
I do this, in a couple of my Watch apps.

Might be worth looking at. The current implementation is slightly awkward, but it also uses a random delay.

Watch/iPhone communication isn't very robust. It's much better, in more recent Watches.

charcircuit · 1d ago
Switching away from an implementation that everyone is familiar with is the opposite of making something more readable.
larodi · 1d ago
indeed, back in the day I loved to write Perl code such as

this() and that() or otherr() and fail();

so this all seems very reasonable and clear, and the people who read the code hated it. I've tried many times to do it in JS; sadly it does not work:

a && b || do { console.log(err) }

but all this does not make the code easier to read; it only makes it very likely that you outsmart yourself.

some code is algorithmically more beautiful than other code, more declarative perhaps. but going away from the common understanding of trivial stuff does not always benefit the group or the project.

spauldo · 1d ago
It's down to what's idiomatic for the language. I'd argue that the Perl example is idiomatic Perl (just as the Lisp equivalent would be idiomatic Lisp), but that's certainly not idiomatic JavaScript.
anonzzzies · 1d ago
But how many people need to read/write this code? There are tons of open source libraries, gateways, proxies etc etc.
ayhanfuat · 1d ago
Aren’t both implementations fairly common? Precomputed / hardcoded set of values vs adjusting it inside the loop?
harpiaharpyja · 1d ago
So, this is just about declarative programming style?

I thought based on the title it would have been about different approaches to backoff.

hsbauauvhabzb · 1d ago
Change for the sake of change?
OutOfHere · 1d ago
Using a lookup table doesn't allow parameter values to easily be changed. Also, some of us do not get paid by the line. Fwiw, I do often note the values in a comment, and this is generally sufficient.
kassner · 1d ago
Offtopic, but huge respect for this[1]:

> Lots of people don’t want to read about AI.

> I respect that.

> But I’m currently steeped in the world of AI, for better or for worse, and I want to blog about it. So I’ve split this blog in half.

> The normal blog, which you are reading, is now AI-free.

> There’s another, at /ai that contains only AI posts. It has its own RSS feed.

Thank you. It (non-AI) definitely goes to my RSS reader.

1: https://commaok.xyz/post/blog-schism/


bravesoul2 · 1d ago
Another approach is to make this a cross-cutting concern using a sidecar like Envoy.