> When querying the database, query the database. It’s almost always more efficient to get the database to do the work than to do it yourself. For instance, if you need data from multiple tables, JOIN them instead of making separate queries and stitching them together in-memory.
Oh yes! Never do a join in the application code! But also: use views! (and stored procedures if you can). A view is an abstraction over the underlying data; it's functional by nature, unlikely to break for random reasons in the future, and if done well the underlying SQL code is surprisingly readable and easy to reason about.
quietbritishjim · 1h ago
I think it's ok to have this rule as a first approximation, but like all design rules you should understand it well enough to know when to break it.
I worked on an application which joined across lots of tables, which made a few dozen records balloon to many thousands of result rows, with huge redundancy in the results. Think of something like a single conceptual result having details A, B, C from one table, X, Y from another table, and 1, 2, 3 from another table. Instead of having 8 result rows (or 9 if you include the top level one from the main table) you have 18 (AX1, AX2, AX3, AY1, ...). It gets exponentially worse with more tables.
We moved to separate queries for the different tables. Importantly, we were able to filter them all on the same condition, so we were not making multiple queries to child tables when there were lots of top-level results.
The result was much faster because the extra network overhead was overshadowed by the saving in query processing and quantity of data returned. And the application code was actually simpler, because it was a pain to pick out unique child results from the big JOIN. It was literally a win in every respect with no downsides.
(Later, we just stuffed all the data into a single JSONB in a single table, which was even better. But even that is an example of breaking the old normalisation rule.)
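To make the blow-up concrete, here's a toy sketch with Python's sqlite3 (schema and names invented for illustration) showing the same 3 × 2 × 3 multiplication from a three-way join versus three separate per-table queries:

```python
import sqlite3

# Toy schema: one parent row with 3 details, 2 attrs, 3 items.
# A three-way join multiplies the child rows (3 * 2 * 3 = 18),
# while separate per-table queries return 3 + 2 + 3 = 8 rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent  (id INTEGER PRIMARY KEY);
    CREATE TABLE details (parent_id INT, d TEXT);
    CREATE TABLE attrs   (parent_id INT, a TEXT);
    CREATE TABLE items   (parent_id INT, i TEXT);
    INSERT INTO parent  VALUES (1);
    INSERT INTO details VALUES (1,'A'),(1,'B'),(1,'C');
    INSERT INTO attrs   VALUES (1,'X'),(1,'Y');
    INSERT INTO items   VALUES (1,'1'),(1,'2'),(1,'3');
""")

joined = conn.execute("""
    SELECT d, a, i FROM parent
    JOIN details ON details.parent_id = parent.id
    JOIN attrs   ON attrs.parent_id   = parent.id
    JOIN items   ON items.parent_id   = parent.id
""").fetchall()
print(len(joined))  # 18 rows, mostly redundant

separate = sum(
    len(conn.execute(f"SELECT * FROM {t} WHERE parent_id = 1").fetchall())
    for t in ("details", "attrs", "items"))
print(separate)  # 8 rows total
```

Each additional child table multiplies the join's row count, which is why the redundancy gets so bad so quickly.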
wongarsu · 1m ago
That reminds me of the many cases of adhering to database normalisation rules even in views and queries, even in cases where you should break them. Aggregation functions like Postgres's array_agg and jsonb_agg are incredibly powerful at preventing the number of rows from ballooning in situations like those
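A rough sketch of that idea, using sqlite's json_group_array as a stand-in for Postgres's array_agg/jsonb_agg (schema invented; assumes a sqlite build with JSON support, which modern Python bundles):

```python
import sqlite3

# Aggregate children into one array per parent: one result row
# per parent instead of one per (parent, child) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child  (parent_id INT, val TEXT);
    INSERT INTO parent VALUES (1, 'p');
    INSERT INTO child  VALUES (1,'A'),(1,'B'),(1,'C');
""".replace("INSERT INTO", "INSERT INTO"))
rows = conn.execute("""
    SELECT p.name, json_group_array(c.val)
    FROM parent p JOIN child c ON c.parent_id = p.id
    GROUP BY p.id
""").fetchall()
print(rows)  # one row: ('p', '["A","B","C"]')
```

In Postgres the same shape with jsonb_agg also keeps the payload typed and nestable.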
9rx · 1h ago
> which made a few dozen records balloon to many thousands of result rows
That doesn't really sound like a place where data is actually conceptually joined. I expect, as it is something commonly attempted, that you were abusing joins to try and work around the n+1 problem. As a corollary to the above, you also shouldn't de-join in application code.
bob1029 · 1h ago
This is a big part of what makes ORMs a problem.
Writing raw SQL views/queries per MVC view in SSR arrangements is one of the most elegant and performant ways to build complex web products. Let the RDBMS do the heavy lifting with the data. There are optimizations in play you can't even recall (because there's so many) if you're using something old and enterprisey like MSSQL or Oracle. The web server should be able to directly interpolate sql result sets into corresponding <table>s, etc. without having to round trip for each row or perform additional in memory join operations.
The typical ORM implementation is the exact opposite of this - one strict object model that must be used everywhere. It's about as inflexible as you can get.
hk1337 · 3m ago
Even in the article the solution wasn’t to abandon the ORM in favor of raw SQL but knowing how to write the code so it doesn’t have to run 100 extra queries when it doesn’t need to.
> Particularly if you’re using an ORM, beware accidentally making queries in an inner loop. That’s an easy way to turn a select id, name from table to a select id from table and a hundred select name from table where id = ?.
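The quoted pitfall is easy to reproduce; a sketch with sqlite3 (table name and data invented) shows the N+1 shape versus the single query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"n{i}") for i in range(100)])

# N+1: one query for the ids, then one query per row for the name.
ids = [r[0] for r in conn.execute("SELECT id FROM t")]
names = {i: conn.execute("SELECT name FROM t WHERE id = ?", (i,)).fetchone()[0]
         for i in ids}  # 100 extra round trips

# The same data in a single query.
names2 = dict(conn.execute("SELECT id, name FROM t"))
assert names == names2
```

With an ORM the inner-loop queries are usually hidden behind lazy-loaded attributes, which is what makes this easy to do by accident.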
tossandthrow · 33m ago
Have you ever built a complex app like this?
In particular, have you had to handle testing, security (e.g. row-level security), migrations, change management (e.g. for SOC2 or other security frameworks), cache offloads (Redis and friends), support for microservices, etc.?
Comments like this give me a vibe of young developers trying out Supabase for the first time feeling like that approach can scale indefinitely.
tossandthrow · 1h ago
Views make good sense when you can check them in - and DB migrations are a poor way of doing it due to their immutable nature.
Depending on the ecosystem the code base adopts, a good ORM might be a better choice for doing joins.
CafeRacer · 1h ago
I came here to say exactly the opposite. There were a few instances where a relatively heavy join would not perform well, no matter what I tried, and it was faster to load and stitch the data together with goroutines. So I just opted to do it that way.
Also SQL is easy, but figuring out what's up with indexes and planner is not.
nurettin · 39m ago
for me, it is ORM -> schema-bound views -> views -> table functions -> stored procedures as a last resort (hopefully it doesn't come to that)
ZYbCRq22HbJ2y7 · 1h ago
> You’re supposed to store timestamps instead, and treat the presence of a timestamp as true. I do this sometimes but not always - in my view there’s some value in keeping a database schema immediately-readable.
Seems overly negative of broad advice on a good pattern?
is_on => true
on_at => 1023030
Sure, that makes sense.
is_a_bear => true
a_bear_at => 12312231231
Not so much, as most bears do not become bears at some point after not being a bear.
grey-area · 59m ago
I’d see the booleans as a bad thing in almost all cases, instead of a boolean you can have a timestamp or an integer field (which can expand later).
In the is_a case, a type or kind column is almost always better: you'll rarely have just bears even if you only start with bears, just as you rarely have just two states for a status field (say, on or off); in use these often expand to include things like suspended, deleted and asleep.
So generally I'd avoid booleans, as they tend to multiply and increase complexity, particularly when they cover mutually exclusive states like live, deleted and suspended. I have seen is_visible, is_deleted and is_suspended all on the same table (without a status), and the resulting code and queries are not pretty.
I’d use an integer rather than a timestamp to replace them though.
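A sketch of the single-status-column idea (names and values invented; the CHECK constraint stands in for a proper enum type):

```python
import sqlite3
from enum import Enum

# One status column instead of is_visible/is_deleted/is_suspended:
# mutually exclusive states can't contradict each other.
class Status(Enum):
    LIVE = 1
    SUSPENDED = 2
    DELETED = 3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        status INTEGER NOT NULL DEFAULT 1
            CHECK (status IN (1, 2, 3))
    )
""")
conn.execute("INSERT INTO account (id, status) VALUES (1, ?)",
             (Status.SUSPENDED.value,))
row = conn.execute("SELECT status FROM account WHERE id = 1").fetchone()
print(Status(row[0]))  # Status.SUSPENDED
```

Adding a fourth state later is a new enum member and a widened CHECK, not a new column plus code to keep three booleans consistent.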
ZYbCRq22HbJ2y7 · 40m ago
Yeah, I mean, an integer can definitely hold more data than a boolean.
If your data was simple enough, you could have an integer hold the entire meaning of a table's row, if every client understood how it was interpreted. You could do bitwise manipulations, encodings, and so on.
Sometimes it is nice to understand what the data means in the schema alone. You can do that with enums, etc.
ate_an_apple_in_may_2024
saw_an_eclipse_before_30
These are more of the sort of things I don't see needing enums, timestamps, integers...
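For what it's worth, the bitwise manipulation mentioned above might look like this (flag names invented):

```python
# Packing several independent booleans into one integer column.
IS_ACTIVE, IS_ADMIN, IS_VERIFIED = 1 << 0, 1 << 1, 1 << 2

flags = IS_ACTIVE | IS_VERIFIED   # store this single int in the row
print(bool(flags & IS_ADMIN))     # False
print(bool(flags & IS_VERIFIED))  # True
flags &= ~IS_ADMIN                # clear a flag (no-op here)
flags |= IS_ADMIN                 # set a flag
```

The cost, as noted, is that every client has to agree on the interpretation, which the schema alone no longer documents.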
setr · 1h ago
If you take the statement at face value — essentially storing booleans in the db ever is a bad smell - then he’s correct.
Although I’m not even sure it’s broadly a good principle, even in the on_at case; if you actually care about this kind of thing, you should be storing it properly in some kind of audit table. Switching bool to timestamp is more of a weird lazy hack that probably won’t be all that useful in practice because only a random subset of data is being tracked like that (Boolean data type definitely isn’t the deciding factor on whether it’s important enough to track update time on).
The main reason it’s even suggested is probably just that it’s “free” — you can smuggle the timestamp into your bool without an extra column — and it probably saved some effort accidentally; but not because it’s a broadly complete solution to the set of problems it tries to solve for
I’ve got the same suspicion with soft-deletes — I’m fairly positive it’s useless in practice, and is just a mentally lazy solution to avoid proper auditing. Like you definitely can’t just undelete it, and it doesn’t solve for update history, so all you’re really protecting against is accidental bulk delete caught immediately? Which is half the point of your backup
maxbond · 1h ago
Audit tables are a big ask, both in terms of the programming effort to design and support them, and in terms of the performance hit due to write amplification (all inserts and updates cause an additional write to an audit table). Whereas making a bool into a timestamp is free. Including timestamps on rows (such as created_at and updated_at) is a real bacon saver when you've deployed a bug and corrupted some rows and need to e.g. refund orders created in a certain window.
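A minimal sketch of that window query (timestamps and the bad-deploy window are invented):

```python
import sqlite3

# created_at lets you scope a cleanup to the window a bug was live.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2024-05-01T09:00:00"),
    (2, "2024-05-01T12:30:00"),  # inside the hypothetical bad-deploy window
    (3, "2024-05-02T08:00:00"),
])
bad = conn.execute("""
    SELECT id FROM orders
    WHERE created_at BETWEEN '2024-05-01T12:00:00' AND '2024-05-01T18:00:00'
""").fetchall()
print(bad)  # [(2,)] -- only the order created during the window
```

ISO-8601 strings compare correctly lexicographically, which is what makes the BETWEEN work here; in Postgres you'd use a proper timestamptz column.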
moebrowne · 1h ago
It's well documented that soft delete is more of a headache than it's worth: https://brandur.org/soft-deletion
though why treat booleans as a special case and keep timestamps for them when you don't for integers with this pattern:
isDarkTheme: {timestamped}
paginationItems: 50
I can see when dark theme was activated but not when pagination was set to 50.
also, i can’t see when dark theme is being deactivated either.
seems like a poor man's changelog. there may be use cases for it but i can't think of anything tbh.
seafoamteal · 1h ago
I think in that situation, you could have an enum value that contains Bear and whatever other categories you are looking at.
ZYbCRq22HbJ2y7 · 1h ago
Sure, but this was for demonstration purposes showing that some data has other meaning that doesn't have an instantiation state dependent on time.
Lionga · 59m ago
All this general advice is quite useless and needs millions of asterisks.
Good system design is designing a system that works best for the problem at hand.
FrankChalmers · 4m ago
That's even more general and requires another million asterisks.
KronisLV · 1h ago
> Schema design should be flexible, because once you have thousands or millions of records, it can be an enormous pain to change the schema. However, if you make it too flexible (e.g. by sticking everything in a “value” JSON column, or using “keys” and “values” tables to track arbitrary data) you load a ton of complexity into the application code (and likely buy some very awkward performance constraints). Drawing the line here is a judgment call and depends on specifics, but in general I aim to have my tables be human-readable: you should be able to go through the database schema and get a rough idea of what the application is storing and why.
I’m surprised that the drawbacks of EAV or just using JSON in your relational database don’t get called out more.
I’d very much rather have like 20 tables with clear purpose than seeing that colleagues have once more created a “classifier” mechanism and are using polymorphic links (without actual foreign keys, columns like “section” and “entity_id”) and are treating it as a grab bag of stuff. One that you also need to read the application code a bunch to even hope to understand.
Whenever I see that, I want to change careers. I get that EAV has its use cases, but in most other cases fuck EAV.
It’s right up there with N+1 issues, complex dynamically generated SQL when views would suffice and also storing audit data in the same DB and it inevitably having functionality written against it, your audit data becoming a part of the business logic. Oh and also shared database instances and not having the ability to easily bootstrap your own, oh and also working with Oracle in general. And also putting things that’d be better off in the app inside of the DB and vice versa.
There are so many ways to decrease your quality of life when it comes to storing and accessing data.
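To illustrate why the grab-bag hurts, here's a toy comparison (schema invented): reassembling one entity from an EAV table needs a self-join per attribute, and everything comes back untyped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- EAV: a grab bag of strings, no types, no constraints.
    CREATE TABLE attrs (entity_id INT, key TEXT, value TEXT);
    INSERT INTO attrs VALUES (1,'name','Ada'),(1,'age','36');
    -- Explicit schema: readable, typed, constrainable.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INT);
    INSERT INTO person VALUES (1, 'Ada', 36);
""")

# Reassembling one entity from EAV needs a self-join per attribute...
eav = conn.execute("""
    SELECT n.value, a.value FROM attrs n
    JOIN attrs a ON a.entity_id = n.entity_id AND a.key = 'age'
    WHERE n.entity_id = 1 AND n.key = 'name'
""").fetchone()
# ...versus a plain select against a real schema.
plain = conn.execute("SELECT name, age FROM person WHERE id = 1").fetchone()
print(eav, plain)  # ('Ada', '36') ('Ada', 36) -- EAV loses the integer type
```

Every attribute you add to the EAV query means another self-join, and nothing stops someone inserting `age = 'bear'`.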
dondraper36 · 58m ago
There's a great book SQL Antipatterns, by Bill Karwin where this specific antipattern is discussed and criticized.
That said, sometimes when I realize there's no way for me to come up even with a rough schema (say, some settings object that is returned to the frontend), I use JSONB columns in Postgres. As a rule of thumb, however, if something can be normalized, it should be, since, after all, that's still a relational database despite all the JSON(B) conveniences and optimizations in Postgres.
pelagicAustral · 20m ago
I can definitely feel the "underwhelming" factor. I've been working for 10+ years on government software and I really know what an underwhelming codebase looks like: first off, it has my fucking name on it.
tetha · 1h ago
The distinction of stateful and stateless is one of the main criteria by which we divide responsibilities between platform-infra and development.
I know it's a bit untrue, but you can't do that many things wrong with a stateless application running in a container. And often the answer is "kill it and deploy it again". As long as you don't shred your dataset with a bad migration or some bad database code, most bad things at this level can be fixed in a few minutes with a few redeployments.
I'm fine having a larger amount of people with a varying degree of experience, time for this, care and diligence working here.
With a persistence layer like a database or a file store, you need some degree of experience with what you have to do around the system so it doesn't become a business risk. Put plainly, a database can be a massive business risk even if it is working perfectly... because no one set up backups.
That's why our storages are run by dedicated people who have been doing this for years and years. A bad database loss easily sinks ships.
com · 2h ago
The advice about logging and metrics was good.
I had been nodding away about state and push/pull, but this section grabbed my attention, since I've never seen it so clearly articulated before.
dondraper36 · 2h ago
The logging part is spot on. It has happened so many times: I think, "Oh, I wish I had logged this", and then an issue or even an incident comes up and I end up introducing those logs anyway.
bravesoul2 · 2h ago
It is a balance. Too many logs cost money and slow down log searches both for the search and the human seeing 100 things on the same trace.
jillesvangurp · 1h ago
The trick here is to log aggressively and then filter aggressively. Logs only get costly if you keep them endlessly. Receiving them isn't that expensive. And keeping them for a short while won't break the bank either. But having logs pile up by the tens of GB every day gets costly pretty quickly. Having aggressive filtering means you don't have that problem. And when you need the logs, temporarily changing the filters is a lot easier than adding a lot of ad hoc logging back into the system and deploying that.
Same with metrics. Mostly they don't matter. But when they do, it's nice if it's there.
Basically, logging is the easy and cheap part of observability, it's the ability to filter and search that makes it useful. A lot of systems get that wrong.
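In Python's stdlib logging, "log aggressively, filter aggressively" roughly maps to a permissive logger behind a strict handler; loosening the handler is the "temporarily change the filters" step (logger name and messages invented):

```python
import logging

# Capture everything at the logger, filter at the handler:
# DEBUG records exist, but only WARNING+ is emitted by default.
logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)        # record everything...
handler = logging.StreamHandler()
handler.setLevel(logging.WARNING)     # ...emit only what passes the filter
logger.addHandler(handler)

logger.debug("cache miss for user=%s", 42)  # captured, filtered out
logger.warning("payment retry #%d", 3)      # emitted

# When investigating, loosen the filter instead of shipping new log lines:
handler.setLevel(logging.DEBUG)
```

Centralized log pipelines (Loki, Elasticsearch, etc.) follow the same shape, just with the filter living in the collector instead of the process.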
bravesoul2 · 1h ago
Nice. I'm going to read up more about filtering.
dondraper36 · 2h ago
Yeah, absolutely. But the author's idea of logging all major business logic decisions (that users might question later) sounds reasonable.
bravesoul2 · 1h ago
Yes. I like the idea of assertions too. Log when an assertion fails. Then get notified to investigate.
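A tiny sketch of that pattern: an assertion that logs instead of crashing, with alerting hooked to the logger (function name and example invariant invented):

```python
import logging

logger = logging.getLogger("invariants")

def soft_assert(cond: bool, msg: str) -> bool:
    """Log (rather than crash) when an invariant is violated."""
    if not cond:
        logger.error("assertion failed: %s", msg)  # hook alerting/paging here
    return cond

order_total = -5
soft_assert(order_total >= 0, f"negative order total: {order_total}")
```

Unlike a bare `assert`, this survives `python -O` and keeps serving the request while still leaving a trail to investigate.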
bravesoul2 · 2h ago
Yes. Everyone should spend the small amount of time getting some logging/metrics going. It's like tests: getting from 0 to 1 test is psychologically hard in an org, but 1 to 1000 then becomes "how did I live without this". Grafana has a decent free tier, or you can self-host.
dennisy · 51m ago
This post has some good concepts, but I do not feel it helps you design good systems. It enumerates options and primitives, but good design is about when and how you apply them, which the post does not cover.
dondraper36 · 26m ago
But isn't that type of advice the best we can have? Having read Designing Data Intensive Applications (DDIA) and some system design interview-focused books (like those from Alex Xu), I have noticed two types of resources:
* Fundamental books/courses on distributed systems that will help you understand the internals of most distributed systems and algorithms (DDIA is here, even though it's not even the most theoretical treatment)
* Hand-wavy cookbooks that tend to oversimplify things, and (I am intentionally exaggerating here) teach to reason like "I have assumed a billion users, let's use Cassandra"
I liked the article for its focus on real systems and the sensible rules of thumb instead of another reformulation of the gossip protocol that very few engineers will ever need to apply in practice themselves.
bravesoul2 · 2h ago
He doesn't seem to mention Conway's law or team topology, which are an important part of system design too.
dondraper36 · 2h ago
Well, as sad as it is, such advice is often applicable to new projects when you still have runway for your own decisions.
For mostly political reasons, if you are onboarded to a team with a billion microservices and a lot of fanciness, it's unlikely that you will ever get approval or time to introduce simplicity. Or maybe I just got corrupted myself by the reality where I have to work now.
bravesoul2 · 1h ago
There is definitely a wood for the trees issue at bigger companies. I doubt there is an architect who understands the full system to see how to simplify it. Hard to even know what "simpler" looks like.
jillesvangurp · 1h ago
You should adapt your team to the architecture, not the other way around.
My former Ph.D. supervisor who moonlights as a consultant on this topic uses a nice acronym to capture this: BAPO. Business, Architecture, Process, and Organization. The idea is to end up with optimal business, an optimal architecture & design for that business, the minimum of manual processes that are necessitated by that architecture, and an organization that is efficiently executing those processes. So, you should design and engineer in that order.
Most companies do this in reverse and then end up limiting their business with an architecture that matches whatever processes their org chart necessitated years ago, in a way that doesn't make any logical sense whatsoever except in the historical context of the org chart. If you come in as a consultant to fix such a situation, it helps to understand that whatever you are going to find is probably wrong for this reason. I've been in the situation where I come in to fix a technical issue and immediately see that the only reason the problem exists is that the org chart is bullshit. That can be a bit awkward, but lucrative if you deal with it correctly. It helps to ask the right questions before you get started.
Turning that around means you start from the business end (where's the money coming from?, what value can we create?, etc.), finding a solution that delivers that and then figure out processes and organizational needs. Many companies start out fairly optimal and then stuff around them changes and they forget to adapt to that.
Having micro services because you have a certain team structure is a classic mistake here. You just codified your organizational inefficiency. Before you even delivered any business value. And now your organizational latency has network latency to match that. Probably for no good reason other than that team A can't be trusted to work with team B. And even if it's optimal now, is it going to stay optimal?
If you are going to break stuff into (micro) services, do so for valid business/technical reasons. E.g. processing close to your data is cheaper, caching for efficiency means stuff is faster and cheaper, physically locating chunks of your system close to the customer means less latency, etc. But introducing network latency just because team A can't work with team B, is fundamentally stupid. Why do you even have those teams? What are those people doing? Why?
ChrisMarshallNY · 28m ago
A lot of what I have done is design subsystems: components meant to be integrated into larger structures. I tend to take a modular approach (not microservices, but modules).
The acronym I use is “S.Q.U.I.D”[0] (Simplicity, Quality, Unambiguity, Integrity, Documentation).
But most of the stuff I've done is different from what he's written about, so it's probably not relevant here.
[0] https://littlegreenviper.com/itcb-04/#more-4074
I think it's a very good article. Even if you disagree with some of the individual points in it, the advice given are very concrete, pragmatic, and IMO tunable to the specifics of each project.
On state, in my current project it is not statefulness that causes trouble, but the need to synchronize two stateful systems. Every time there's bidirectional information flow, it's gonna be a headache. The solution is of course to maintain a single source of truth, but with UI applications this is sometimes quite tricky.
usernamed7 · 46m ago
One thing I would add is that a well designed system is often one that is optimized for change. It is rare that a service remains static and unchanging; browsers and libraries are regularly updated, after all. Thus if/when a developer takes on a feature ticket to add or change XYZ, it should be easy to reason about how that change will impact the system, with predictable side effects, and ideally be easy to change as well.