Push Ifs Up and Fors Down

149 points by goranmoomin | 66 comments | 5/17/2025, 9:31:55 AM | matklad.github.io

Comments (66)

gnabgib · 38s ago
(2023) Discussion at the time (662 points, 295 comments) https://news.ycombinator.com/item?id=38282950
andyg_blog · 1h ago
A more general rule is to push ifs close to the source of input: https://gieseanw.wordpress.com/2024/06/24/dont-push-ifs-up-p...

It's really about finding the entry points into your program from the outside (including data you fetch from another service), and then massaging the data in such a way that you make as many guarantees as possible (preferably encoded into your types) by the time it reaches any core logic, especially the resource-heavy parts.

dataflow · 46m ago
Doesn't this obfuscate what assumptions you can make when trying to understand the core logic? Do you prefer to examine all the call chains everywhere?
furyofantares · 1m ago
The idea and examples are that the type system takes care of it. The rule of thumb is worded overly generally; it's really just about stuff like null checks when you have non-nullable types available.
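
A minimal sketch of the null-check case in Rust (Walrus and frobnicate are the article's names; the caller shape here is illustrative, not from the article):

  struct Walrus;

  // The signature guarantees a walrus exists; core logic stays check-free.
  fn frobnicate(walrus: Walrus) {
      // ... actual work, no null handling ...
      let _ = walrus; // placeholder so the sketch compiles cleanly
  }

  fn handle_input(maybe_walrus: Option<Walrus>) {
      // The if is pushed up to the boundary where the Option originates.
      if let Some(walrus) = maybe_walrus {
          frobnicate(walrus);
      }
  }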
fmbb · 8m ago
The “core logic” of a program is what output it yields for a given input.

If you find a bug, you find it because you discover that a given input does not lead to the expected output.

You have to find all those ifs in your code because one of them is wrong (probably in combination with a couple of others).

If you push all your conditionals up as close to the input as possible, your hunt will be shorter, and fixing will be easier.

setr · 11m ago
If you’ve massaged and normalized the data at entry, then the assumptions at core logic should be well defined — it’s whatever the rules of the normalized output are.

You don’t need to know all of the call chains because you’ve established a “narrow waist” where ideally all things have been made clear, and errors have been handled or scoped. So you only need to know the call chain from entry point to narrow waist, and separately narrow waist till end.

avianlyric · 31m ago
This is why we invented type systems. No need to examine call chains, just examine input types. The types will not only tell you what assumptions you can make, but the compiler will even tell you if you make an invalid assumption!
dataflow · 21m ago
You can't shove every single assumption into the type system...
knome · 58s ago
You can and should put as many as you can there

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

Instead of validating in one spot that someone has sent you a phone number and then passing along a string, you can just as easily have a function construct an UncheckedPhoneNumber. You can choose to construct a VerifiedPhoneNumber only if the user has gone through a code check. Both would allow you to pluck a PhoneNumber out of them where you need generic calling code.

You can use this sort of pattern to encode anything into the type system. It takes a little more upfront typing than passing strings everywhere, but your program will be sure of what it actually has at every point. It's pretty nice.
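
A minimal Rust sketch of that pattern, assuming a hypothetical digit-count check and code-verification flow (all names here are illustrative):

  struct UncheckedPhoneNumber(String);
  struct VerifiedPhoneNumber(String);

  impl UncheckedPhoneNumber {
      // Parse once at the boundary; core logic never sees raw strings.
      fn parse(raw: &str) -> Result<Self, String> {
          let digits: String = raw.chars().filter(|c| c.is_ascii_digit()).collect();
          if digits.len() >= 10 {
              Ok(UncheckedPhoneNumber(digits))
          } else {
              Err(format!("not a phone number: {raw}"))
          }
      }

      // Only a successful code check can yield a VerifiedPhoneNumber.
      fn verify(self, code_check_passed: bool) -> Option<VerifiedPhoneNumber> {
          code_check_passed.then(|| VerifiedPhoneNumber(self.0))
      }
  }

  // Generic calling code can accept either through a common accessor.
  trait PhoneNumber {
      fn digits(&self) -> &str;
  }
  impl PhoneNumber for UncheckedPhoneNumber {
      fn digits(&self) -> &str { &self.0 }
  }
  impl PhoneNumber for VerifiedPhoneNumber {
      fn digits(&self) -> &str { &self.0 }
  }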

layer8 · 4m ago
True, but there are still documented interface contracts you can program against. The compiler won’t catch violations of the non-type parts, but the requirements are still well-defined with a proper interface contract. It is a trade-off, but so is repeating the same case distinction in multiple parts of the program, or having to pass around the context needed to make the case distinction.
treyd · 1m ago
You can express a lot of concepts just through types in languages with richer type systems.
Nevermark · 11m ago
Assumptions are like di**os. You can shove them anywhere if you are brave enough. [0]

[0] Source: Abraham Lincoln. (At least I have heard he said that.)

xg15 · 7m ago
Doesn't the second rule already imply some counterexamples for the first?

When I work with batches of data, I often end up with functions like this:

  function process_batch(batch) {
    const stuff = setUpNeededHelpers(batch);
    const results = [];
    for (const item of batch) {
      const result = process_item(item, stuff);
      results.push(result);
    }
    return results;
  }
Where "stuff" might be various objects, such as counters, lists or dictionaries to track aggregated state, opened IO connections, etc etc.

So the setUpNeededHelpers() section, while not extremely expensive, can have nontrivial cost.

I usually add a clause like

  if (batch.length === 0) {
    return [];
  }
at the beginning of the function to avoid this initialization cost if the batch is empty anyway.

Also, sometimes the initialization requires accessing one element from the batch, e.g. to read metadata. The check therefore also ensures there is at least one element available.

Wouldn't this violate the rule?

layer8 · 1h ago
The example listed as “dissolving enum refactor” is essentially polymorphism, i.e. you could replace the match with a polymorphic method invocation on the enum. Its purpose is to decouple the point where a case distinction is established (the initial if) from the point where it is acted upon (the invocation of foo/bar). The case distinction is carried by the object (enum value in this case) or closure, and need not be reiterated at the point of invocation (if the match were replaced by polymorphic dispatch). That means that if the case distinction changes, only the point where it is established needs to change, not the points where the distinct actions based on it are triggered.

This is a trade-off: It can be beneficial to see the individual cases to be considered at the points where the actions are triggered, at the cost of having an additional code-level dependency on the list of individual cases.
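
A sketch in Rust of moving the match into a method on the enum, reusing the article's E/foo/bar names; callers then trigger the action without restating the cases:

  enum E { Foo(u32), Bar(String) }

  fn foo(x: u32) { println!("foo({x})") }
  fn bar(y: String) { println!("bar({y})") }

  impl E {
      // The case distinction is established where E is constructed;
      // invocation sites call run() and never repeat the match.
      fn run(self) {
          match self {
              E::Foo(x) => foo(x),
              E::Bar(y) => bar(y),
          }
      }
  }

  fn main() {
      let condition = true;
      let e = if condition { E::Foo(1) } else { E::Bar("y".into()) };
      e.run(); // the point of invocation no longer lists the cases
  }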

password4321 · 2h ago
Code complexity scanners⁰ eventually force pushing ifs down. The article recommends the opposite:

> By pushing ifs up, you often end up centralizing control flow in a single function, which has a complex branching logic, but all the actual work is delegated to straight line subroutines.

https://docs.sonarsource.com/sonarqube-server/latest/user-gu...

jt2190 · 1h ago
Code scanner reports should be treated with suspicion, not accepted as gospel. Sonar in particular will report “code smells” that aren't actually bugs. Addressing these “not a bug” issues actually increases the risk of introducing a new error from zero to greater than zero, and can waste developer time that would be better spent addressing actual production issues.
xp84 · 1h ago
I agree with you. Cyclomatic complexity checks may be my least favorite of these rules. I think any senior developer almost always “knows better” than the tool does what counts as a function of perfectly fine complexity vs. too much. But I have to grudgingly grant that they have some use: if the devs in question routinely churn out 100-line functions that do 1,000 things, the check will basically coincidentally trigger and force a refactor, which may help to fix that problem.
jerf · 1h ago
Cyclomatic complexity may be a helpful warning to detect really big functions, but the people who worry about cyclomatic complexity also seem to be the sort of people who want to set the limit really low and get feisty if a function has much more than a for loop with a single if clause in it. These settings produce those code bases where no function anywhere actually does anything; each one just dispatches to three other functions that also hardly do anything, making it very hard to figure out what is going on. That is not a good design.
BurningFrog · 19m ago
I call this "poltergeist code". Dozens of tiny functions that together clearly do something complex correctly, but it's very hard to find where and how it's actually done.
mnahkies · 39m ago
I wonder if there's any value in these kind of rules for detecting AI slop / "vibe coding" and preempting the need for reviewers to call it out.
hinkley · 2h ago
The way to solve this is to split decisions from execution and that’s a notion I got from our old pal Bertrand Meyer.

    if (weShouldDoThis()) {
        doThis();
    }
It complements, or is part of, “functional core, imperative shell.” All those checks being separate makes them easy to test, and if you care about complexity you can break out a function per clause in the check.
0cf8612b2e1e · 1h ago
Functions should decide or act, not both.
d_burfoot · 1h ago
This is an excellent rule.
daxfohl · 2h ago
IME this is frequently a local optimum though. "Local" meaning, until some requirement changes or edge case is discovered, where some branching needs to happen outside of the loop. Then, if you've got branching both inside and outside the loop, it gets harder to reason about.

It can be case-dependent. Are you reasonably sure that the condition will only ever affect stuff inside the loop? Then sure, go ahead and put it there. If it's not hard to imagine requirements that would also branch outside of the loop, then it may be better to preemptively design it that way. The code may be more verbose, but it's frequently easier to follow, and hopefully less likely to turn into spaghetti later on.

(This is why I quit writing Haskell; it tends to make you feel like you want to write the most concise, "locally optimum" logic. But that doesn't express the intent behind the logic so much as the logic itself, and can lead to horrible unrolling of stuff when minor requirements change. At least, that was my experience.)

ummonk · 1h ago
I've always hated code complexity scanners ever since I noticed them complaining about perfectly readable large functions. It's a lot more readable when you have the logic in one place, and you should only be trying to break it up when the details cause you to lose track of the big picture.
marcosdumay · 1h ago
There was a thread yesterday about LLMs where somebody asked "what other unreliable tool people accept for coding?"

Well, now I have an answer...

Waterluvian · 24m ago
My weird mental model: You have a tree of possible states/program flow. Conditions prune the tree. Prune the tree as early as possible so that you have to do work on fewer branches.

Don’t meticulously evaluate and potentially prune every single branch, only to find you have to prune the whole limb anyways.

Or even weirder: conditionals are about figuring out what work doesn’t need to be done. Loops are the “work.”

Ultimately I want my functions to be about one thing: walking the program tree or doing work.

shawnz · 50m ago
Sometimes I like to put the conditional logic in the callee because it prevents the caller from doing things in the wrong order by accident.

Like for example, if you want to make an idempotent operation, you might first check if the thing has been done already and if not, then do it.

If you push that conditional out to the caller, now every caller of your function has to individually make sure they call it in the right way to get a guarantee of idempotency and you can't abstract that guarantee for them. How do you deal with that kind of thing when applying this philosophy?

Another example might be if you want to execute a sequence of checks before doing an operation within a database transaction. How do you apply this philosophy while keeping the checks within the transaction boundary?

avianlyric · 19m ago
You’ve kind of answered your own question here.

> If you push that conditional out to the caller, now every caller of your function has to individually make sure they call it in the right way to get a guarantee of idempotency

In this situation your function is no longer idempotent, so you obviously can’t provide the guarantee. But quite frankly, if you’re having to resort to making individual functions implement state management to provide idempotency, then I suspect you’re doing something very odd, and have way too much logic happening inside a single function.

Idempotent code tends to fall into two distinct camps:

1. Code that’s inherently idempotent because the data model and operations being performed are inherently idempotent. I.e. you’re either performing stateless operations, or you’re performing “PUT” style operations where the input data contains all the state that needs to be written.

2. Code that’s performing more complex business operations where you’re creating an idempotent abstraction by either performing rollbacks, or providing some kind of atomic apply abstraction that ensures partial failures don’t result in corrupted state.

For point 1, you shouldn’t be checking for order of operations, because it doesn’t matter. Everything is inherently idempotent, just perform the operations again.

For point 2, there is no simple abstraction you can apply. You need to have something record the desired operation, then ensure it either completes or fails. And once that happens, ensures that completion or failure is persistent permanently. But that kind of logic is not the kind of thing you put into a function and compose with other operations.

bee_rider · 19m ago
Maybe write the functions without the checks, then have wrapper functions that just do the checks and then call the internal function?
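
A minimal sketch of that wrapper shape in Rust, with hypothetical names; the wrapper owns the check, so callers can't get the ordering wrong:

  use std::collections::HashSet;

  // Internal function: does the work, assumes the check already happened.
  fn send_welcome_email_unchecked(user_id: u64) {
      println!("sending welcome email to {user_id}");
  }

  // Public wrapper: performs the idempotency check, then delegates.
  // insert() returns false if the id was already present.
  fn send_welcome_email(already_sent: &mut HashSet<u64>, user_id: u64) {
      if already_sent.insert(user_id) {
          send_welcome_email_unchecked(user_id);
      }
  }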
manmal · 1h ago
It’s a bit niche for HN, but SwiftUI rendering works way better when following this. In a ForEach, you really shouldn’t have any branching, or you’ll pay quite catastrophic performance penalties. I found out the hard way when rendering a massive chart with Swift Charts. All branching must be pushed upwards.
ComputerGuru · 21m ago
Why? Does it interpret the code?
dcre · 2h ago
I took a version of this away from Sandi Metz’s 99 Bottles of OOP. It’s not really my style overall, but the point about moving logic forks up the call stack was very well taken when I was working on a codebase where we had added a ton of flags that got passed down through many layers.

https://sandimetz.com/99bottles

daxfohl · 2h ago
Yeah, I immediately thought of "The Wrong Abstraction" by the same author. Putting the branch inside the for loop is an abstraction, saying "the for loop is the rule, and the branch is the behavior". But very often, some new requirement will break that abstraction, so you have to work around it, and the resulting code has an abstraction that only applies in some cases and doesn't in others, or you force a bunch of extra parameters into the abstraction so that it applies everywhere but is hard to follow. Whereas if you hadn't made the abstraction in the first place, the resulting code can be easier to modify and understand.

https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction

CodesInChaos · 1h ago
I can recommend this article about the "midlayer mistake" https://lwn.net/Articles/336262/
drob518 · 29m ago
Nice reference.
neRok · 1h ago
I agree that the first example in the article is "bad"...

  fn frobnicate(walrus: Option<Walrus>)
but the rest makes no sense to me!

  // GOOD
  frobnicate_batch(walruses)
  // BAD
  for walrus in walruses {
    frobnicate(walrus)
  }
It doesn't follow through with the "GOOD" example though...

  fn frobnicate_batch(walruses) {
    for walrus in walruses { frobnicate(walrus) }
  }
What did that achieve?

And the next example...

  // GOOD
  if condition {
    for walrus in walruses { walrus.frobnicate() }
  } else {
    for walrus in walruses { walrus.transmogrify() }
  }
  // BAD
  for walrus in walruses {
    if condition { walrus.frobnicate() }
    else { walrus.transmogrify() }
  }
What good is that when...

  walruses = get_5_closest_walruses()
  // "GOOD"
  if walruses.has_hungry() { feed_them_all() }
  else { dont_feed_any() }
  // "BAD"
  for walrus in walruses {
    if walrus.is_hungry() { feed() }
    else { dont_feed() }
  }
magicalhippo · 53m ago
> What did that achieve?

An interface where the implementation can later be changed to do something more clever.

At work we have a lot of legacy code written the BAD way, i.e. the caller loops, which means we have to change dozens of call sites if we want to improve performance, rather than just one implementation.

This makes it significantly more difficult than it could have been.

jerf · 1h ago
I think the "push for loops down" is missing a bit of detail about the why. The author alludes to "superior performance" but I don't think makes it clear how that can happen.

Vectorization is a bit obscure and a lot of coders aren't worried about whether their code vectorizes, but there's a much more common example that I have seen shred the performance of a lot of real-world code bases and HTTP APIs, which is functions (including APIs) that take only a single thing when they should take the full list.

Suppose we have posts in a database, like for a forum or something. Consider the difference between:

    posts = {}
    for id in postIDs:
        posts[id] = fetchPost(id)
versus

    posts = fetchPosts(postIDs)
fetchPost and fetchPosts both involve hitting the database. The singular version means that the resulting SQL will, by necessity, only have the one ID in it, and as a result, a full query will be made per post. This is a problem because it's pretty likely here that fetching a post is a very fast (indexed) operation, so the per-query overhead is going to hit you hard.

The plural "fetchPosts", on the other hand, has all the information necessary to query the DB in one shot for all the posts, which is going to be much faster. An architecture based on fetching one post at a time is intrinsically less performant in this case.

This opens up even more in the HTTP API world, where a single query is generally of even higher overhead than a DB query. I think the most frequent mistake I see in HTTP API design (at least, ignoring quibbling about which method and error code scheme to use) is providing APIs that operate on one thing at a time when the problem domain naturally lends itself to operating on arrays (or map/objects/dicts) at a time. It's probably a non-trivial part of the reason why so many web sites and apps are so much slower than they need to be.

I find it is often easy to surprise other devs with how fast your system works. This is one of my "secrets" (please steal it!); you make sure you avoid as many "per-thing" penalties as possible by keeping sets of things together as long as possible. The "per-thing" penalties can really sneak up on you. Like nested for loops, they can easily start stacking up on you if you're not careful, as the inability to fetch all the posts at once further cascades in to you then, say, fetching user avatars one-by-one in some other loop, and then a series of other individual queries. Best part is, profiling may make it look like the problem is the DB because "the DB is taking a long time to serve this" because profiles are not always that good at turning up that your problem is per-item overhead rather than the amount of real work being done.
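
A sketch of the difference in Rust, with a hypothetical Db handle standing in for whatever real client you use (real code would use bound parameters rather than format strings):

  struct Post { id: u64 }

  // Hypothetical database handle; query() stands in for your real client.
  struct Db;
  impl Db {
      fn query(&self, sql: &str) -> Vec<Post> {
          println!("executing: {sql}");
          Vec::new()
      }
  }

  // Singular: one round trip per post, so per-query overhead dominates.
  fn fetch_post(db: &Db, id: u64) -> Option<Post> {
      db.query(&format!("SELECT * FROM posts WHERE id = {id}")).pop()
  }

  // Plural: all the IDs travel together, so one query fetches everything.
  fn fetch_posts(db: &Db, ids: &[u64]) -> Vec<Post> {
      let list = ids.iter().map(|id| id.to_string()).collect::<Vec<_>>().join(", ");
      db.query(&format!("SELECT * FROM posts WHERE id IN ({list})"))
  }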

mnahkies · 26m ago
The worst / most amusing example of this I've seen in the wild was a third party line of business application that was sequentially triaging "pending tasks" to assign priority/to workers.

Our cloud provider had an aircon/overheating incident in the region we were using, and after it was resolved, network latency between the database and the application increased by a few milliseconds. Turns out that if you multiply that by a few million tasks arriving at a fast rate, you get a significant amount of time, and the pending-task queue backs up, causing the high-priority tasks to be delayed.

Based on the traces we had it looked like a classic case of "ORM made it easy to do it this way, and it works fine until it doesn't" but was unfortunately out of our control being a third party product.

If they'd fetched/processed batches of tasks from the database instead I'm confident it wouldn't have been an issue.

jmull · 2h ago
I really don't think there is any general rule of thumb here.

You've really got to have certain contexts before thinking you ought to be pushing ifs up.

I mean generally, you should consider pushing an if up. But you should also consider pushing it down, and leaving it where it is. That is, you're thinking about whether you have a good structure for your code as you write it... aka programming.

I suppose you might say, push common/general/high-level things up, and push implementation details and low-level details down. It seems almost too obvious to say, but I guess it doesn't hurt to back up a little once in a while and think more broadly about your general approach. I guess the author is feeling that ifs are usually about a higher-level concern and loops about a lower-level concern? Maybe that's true? I just don't think it matters, though, because why wouldn't you think about any given if in terms of whether it specifically ought to move up or down?

Tade0 · 13m ago
I also don't think there's a general rule.

I use `if`s as markers for special/edge cases and typically return as the last statement in the `if` block.

If I have an `else` block and it's large, then it's a clear indicator that it's actually two methods dressed as one.

Kuyawa · 2h ago
Push everything down for better code readability

  printInvoice(invoice, options) // is much better than

  if(printerReady){
    if(printerHasInk){
      if(printerHasPaper){
        if(invoiceFormatIsPortrait){
  :
The same can be said of loops

  printInvoices(invoices) // much better than

  for(invoice of invoices){
    printInvoice(invoice)
  }
In the end, while code readability is extremely important, encapsulation is much more important, so mix both accordingly.
nfw2 · 23m ago
The performance gap of running a for loop inside or outside a function call is negligible in most real usage.

The premise that you can define best patterns like this, removed from context with toy words like frobnicate, is flawed. You should abstract your code in such a way that the operations contained are clearly intuited from the names and parameters of the abstraction boundaries. Managing cognitive load >>> nickel-and-diming performance in most cases.

imcritic · 2h ago
This article doesn't explain the benefits of the suggested approach well enough.

And the last example looks like poor advice and contradicts the previous advice: there's rarely a global condition that is enough to check once at the top; the condition usually lives inside the walrus. And why do for walrus in pack { walrus.frobnicate() } instead of making frobnicate a function accepting the whole pack?


daxfohl · 2h ago
I like this a lot. At first, putting ifs inside the fors makes things more concise. But it seems like there's always an edge case or requirement change that eventually requires an if outside the for too. Now you've got ifs on both sides of the for, and you've got to look in multiple places to see what's happening. Or worse, subsequent changes will require updating both places.

So yeah, I agree, pulling conditions up can often be better for long-term maintenance, even if initially it seems like it creates redundancy.

hinkley · 2h ago
There’s a lot of variable hoisting involved in moving conditional logic out of for loops, and it generally tends to improve legibility. If a variable is loop-invariant, it makes debugging easier if you can prove it is and hoist it.
variadix · 1h ago
This is a great way of putting it.

The more things you can prove are invariant, the easier it is to reason about a piece of code, and doing the hoisting in the code itself rather than expecting the compiler to do it will make future human analysis easier when it needs to be updated or debugged.
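
A tiny Rust sketch of that hoisting, with illustrative names; the invariant is computed once, and the loop body visibly cannot change it:

  fn total_price(quantities: &[u32], unit_price: u32, tax_rate: f64) -> f64 {
      // Hoisted: provably loop-invariant, computed exactly once.
      let price_with_tax = unit_price as f64 * (1.0 + tax_rate);

      let mut total = 0.0;
      for &qty in quantities {
          // The body only reads the invariant; nothing new to reason about.
          total += qty as f64 * price_with_tax;
      }
      total
  }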

throwawaymaths · 39m ago
I would agree with "push ifs up", except if you're doing options parsing. Having a clean line of flow that effectively updates a struct with a bunch of "maybe" functions is much better if you're consistent with it.

Anywhere else, push ifs up.

esafak · 2h ago
I agree, except for this example, where the author effectively (after a substitution) prefers the former:

    fn f() -> E {
      if condition {
        E::Foo(x)
      } else {
        E::Bar(y)
      }
    }
    
    fn g(e: E) {
      match e {
        E::Foo(x) => foo(x),
        E::Bar(y) => bar(y)
      }
    }
The latter is not only more readable, but it is safer, because a match statement can ensure all possibilities are covered.
CraigJPerry · 1h ago
> Prefers the former after a substitution...

That's not quite right: it's a substitution AND the ablation of two functions and an enum from the code base.

There's quite a reduction in complexity he's advocating for.

Further, the enum and the additional boilerplate are not adding type safety in this example. Presumably the parameters to foo and bar are enforced in all cases, so the only difference between the two examples is the additional boilerplate of a 2-armed enum.

I strongly suspect in this case (but I haven't godbolted it to be sure) that both examples compile to the same machine code. If my hunch is correct, then the remaining question is: does introducing double-entry bookkeeping on the if condition add safety for future changes?

Maybe. But at what cost? This is one of those scenarios where you bank the easy win of reduced complexity.

josephg · 1h ago
> The latter is not only more readable, but it is safer, because a match statement can ensure all possibilities are covered.

Whether or not this matters depends on what, exactly, is in those match arms. Sometimes there's some symmetry to the arms of an if statement. And in that case, being exhaustive is important. But there's plenty of times where I really just have a bit of bookkeeping to do, or an early return or something. And I only want to do it in certain cases. Eg if condition { break; } else { stuff(); }

Also, if-else is exhaustive already. It's still exhaustive even if you add more "else if" clauses, like if {} else if {} else {}.

Match makes sense when the arms of the conditional are more symmetrical. Or when you're dealing with an enum. Or when you want to avoid repeating conditions. (Eg match a.cmp(b) { Greater / Equal / Less } ).

The best way to structure your code in general really comes down to what you're trying to do. Sometimes if statements are cleaner. Sometimes match expressions. It just depends on the situation.

esafak · 1h ago
It's only exhaustive in this toy case. Add another variant and the burden of checking for exhaustiveness with ifs falls on your shoulders.
renewiltord · 16m ago
Yep, as a general heuristic it's pretty good. It avoids problems like N+1 queries and missed SIMD, and the if rule often makes it easier to reason about code. There are exceptions, but I have had this same rule and it's served me well.
stevage · 2h ago
The author's main concern seems to be optimising performance critical code.
drob518 · 24m ago
Hmmm. Seems like he’s optimizing clarity of thought, first. The performance gain just comes along for the ride. If I were to summarize the article, I’d say that it’s advocating a pattern where you write code where higher layers decide what needs to be done and then lower layers do it, using a combination of straight line code and simple loops, with little to no further conditionality. Obviously, that represents an ideal.
greesil · 1h ago
That's not clear to me. It first reads like "don't branch in a for loop (because parallelization?)" but I think it's more for keeping the code from becoming a mess over time with multiple developers.
hinkley · 2h ago
If your compiler is having trouble juggling a number of variables, just think how much difficulty the human brain will have with the same code.
hamandcheese · 1h ago
Isn't this just inversion of control? AKA the D (dependency inversion) in SOLID?
hello_computer · 28m ago
This thread is a microcosm of Babel.
quantadev · 54m ago
There's always a trade-off between performance and clarity in the code.

If a certain function has many preconditions it needs to check before running, but potentially needs to run from various places in the code, then moving the precondition checks outside the method results in faster code but destroys readability.

In cases where this kind of tension exists, I've sometimes named methods like 'maybeDoThing' (emphasis on 'maybe'), indicating that I'm calling the method but that all the precondition checks are 'inside' the function, rather than duplicating the logic everywhere the method 'maybe' needs to run.
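
A sketch of that naming convention in Rust, with hypothetical preconditions; the 'maybe' name signals that the function owns its own guards:

  struct Printer { ready: bool, has_ink: bool }

  // All precondition checks live inside, so call sites stay duplicate-free,
  // and the name admits the work might not happen.
  fn maybe_print_invoice(printer: &Printer, invoice: &str) -> bool {
      if !printer.ready || !printer.has_ink {
          return false; // preconditions not met; nothing printed
      }
      println!("printing: {invoice}");
      true
  }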

Mystery-Machine · 2h ago
Terrible advice. It's the exact opposite of "Tell, don't ask".

The performance cost of an if-statement or a for-loop is negligible. That's not the bottleneck of your app. If you're building something that needs to be highly performant, sure. But that's not the majority.

https://martinfowler.com/bliki/TellDontAsk.html

eapriv · 1h ago
Performance of any given CPU instruction is negligible, yet somehow they accumulate to noticeable values.
drob518 · 22m ago
Amen.
achernik · 1h ago
Consider another of his posts, in a somewhat similar spirit: https://tigerbeetle.com/blog/2024-12-19-enum-of-arrays/

The author is indeed working in a performance-oriented niche.
hello_computer · 24m ago
He’s giving object oriented design the medal for cache locality’s victory.