> I’m often alone on this. Engineers look at complex systems with many interesting parts and think “wow, a lot of system design is happening here!” In fact, a complex system usually reflects an absence of good design.
For any job-hunters, it's important you forget this during interviews.
In the past I've made the mistake of trying to convey this in system design interviews.
Some hypothetical startup app
> Interviewer: "Well what about backpressure?"
>"That's not really worth considering for this amount of QPS"
> Interviewer: "Why wouldn't you use a queue here instead of a cron job?"
> "I don't think it's necessary for what this app is, but here's the tradeoffs."
> Interviewer: "How would you choose between sql and nosql db?"
> "Doesn't matter much. Whatever the team has most expertise in"
These are not the answers they're looking for. You want to fill the whiteboard with boxes and arrows until it looks like you've got Kubernetes managing your Kubernetes.
mkozlows · 1h ago
(For context, I've conducted hundreds of system design interviews and trained a dozen other people on how to do them at my company. Other interviewers may do things differently or care about other things, but I think what I'm saying here isn't too far off normal.)
I think three things about what you're saying:
1. The answers you're giving don't provide a lot of signal (the queue one being the exception). The question that's implicitly being asked is not just what you would choose, but why you would choose it. What factors would drive you to a particular decision? What are you thinking about when you provide an answer? You're not really verbalizing your considerations here.
A good interviewer will pry at you to get the signal they need to make a decision. So if you say that back-pressure isn't worth worrying about here, they'll ask you when it would be, and what you'd do in that situation. But not all interviewers are good interviewers, and sometimes they'll just say "I wasn't able to get much information out of the candidate" and the absence of a yes is a no. As an interviewee, you want to make the interviewer's job easy, not hard.
2. Even if the interviewer is good and does pry the information out of you, they're probably going to write down something like "the candidate was able to explain sensibly why they'd choose a particular technology, but it took a lot of prodding and prying to get the information out of them -- communications are a negative." As an interviewee, you want to communicate all the information your interviewer is looking for proactively, not grudgingly and reluctantly. (This is also true when you're not interviewing.)
3. I pretty much just disagree on that SQL/NoSQL answer. Team expertise is one factor, but those technologies have significant differences; depending on what you need to do, one of them might be way better than the other for a particular scenario. Your answer there is just going to get dinged for indicating that you don't have experience in enough scenarios to recognize this.
_fat_santa · 3h ago
This goes back to "interviews go both ways". All those answers you gave are very reasonable and if I was your interviewer I'd pass you with flying colors. On the other hand if you're interviewing at a place that doesn't pass you with flying colors for those responses, that really says more about them than it does about you and may not be a great place to work.
But to your point, many times one interviews for a job they don't really have the luxury of getting rejections and need to land somewhere fast so they can keep paying the mortgage. So while yes interviewing is a two way street, there's still quite a bit of calibration to make sure you land on the other person's side of the street so to speak.
atomicnumber3 · 3h ago
If I was your interviewer, I would: respect your answers a lot, not be able to check off anything on my rubric, try to explain this in the debrief, get told we have to stick to the rubric to counter bias, and then watch while they pass on you for someone who decided to play architecture jenga instead. I would potentially even consider emailing you to apologize later, then not do it because I'd probably get in trouble for exposing us to liability or something because apologizing can be construed as admission of guilt.
yojo · 1h ago
If a candidate doesn’t ask clarifying questions that lead them to an understanding of QPS, storage requirements, and throughput considerations, that’s a mark against.
At that point, if you want to see them design a distributed system with all the bells and whistles, you should stop them, tell them the kind of traffic they need to handle, then let them go again.
If they persist in designing a system that cannot handle the specified load, they have probably failed the interview.
willio58 · 2h ago
I’ve interviewed dozens of people and while I rarely do system design questions and our process isn’t nearly as check-all-the-boxes, it’s funny how accurate your comment still is. Near the later stages especially, politics starts coming in.
belinder · 2h ago
Exactly, it would only work if you have enough sway with your boss and the willingness to take responsibility for the hire
nostrademons · 2h ago
If I were the interviewer, I'd try to adjust the problem statement with some hypotheticals to tease out their depth of knowledge:
> "That's not really worth considering for this amount of QPS"
"What if Michael Jackson dies and your (search|news|celebrity gossip) service gets a spike in traffic way beyond the design parameters? How would you anticipate and mitigate such an event?"
(Extra points if the answer is not necessarily backpressure but they start talking about DDoS mitigation, outlier detection, caching or serving static results from extremely-common queries, spinning up new capacity to adjust to traffic spikes, blackholing traffic to protect the overall service, etc.)
> Interviewer: "Why wouldn't you use a queue here instead of a cron job?" "I don't think it's necessary for what this app is, but here's the tradeoffs."
"What if you have a subset of customers that demand faster responses than a cron job can provide?"
(And then that can become a discussion about splitting off traffic based on requirements, whether it's even worth adding the logic to split traffic vs. just using a queue for everyone, perhaps making direct API requests without either a queue or cron job for requests from just those customers, relying on the fact that they are not numerous or these requests are infrequent to trade capacity for latency, etc.)
> How would you choose between sql and nosql db?"
I would've expected the candidate to at least be able to talk about indexing, tradeoffs of joining in the DB vs. in the application, schema migrations and upgrades, creating separation between data-at-rest vs. data-in-flight, etc. If they can't do that and just handwave away as "whatever the team is most comfortable with", that's a legit hole in their knowledge. Usually you ask system design interviews of senior candidates that will be deciding on architecture and, if not hiring out the team directly, providing input to senior managers who will be hiring, so you can swap out the team nearly as easily as swapping out the architecture.
tacitusarc · 1h ago
Exactly this. I don’t want someone who will design complex, bloated systems, but I DO want them to be able to articulate tradeoffs and reasons why various components might be useful.
corytheboyd · 3h ago
As well as the “two-way street” point made in a sibling comment, I feel like a good interviewer would say “this is great, I would keep it simple too, but I am testing your knowledge of $thing right now.” If the person won’t stop talking about the wrong thing, that’s a bad sign of course.
uberduper · 2h ago
This is awful advice. Simple and elegant design does not start with dismissing potential problems.
Those questions are all prompts to have a discussion in lieu of tech trivia hour. Those responses do not demonstrate wisdom, they reveal a lack of maturity. It's not the interviewer's fault you refuse to be interviewed.
reactordev · 22m ago
Louder for the back.
It’s like people crave complexity because it makes them, indispensable? Like if you’re the only one who knows how the billing reconciliation service works, they couldn’t possibly fire you?
They will.
Being pragmatic is something I look for in engineers, so long as they understand where to draw the line (and when to use a queue instead of cron). That line is usually several years away at this point, and someone being able to say “You don’t need that, all you need is…” is welcome. Then again, that’s probably why I got fired. :shrug:
ramraj07 · 4h ago
Do you _want_ to work in these places? In my experience, if they expect you to run kube using kube in the interview, that's exactly what they do in their systems as well.
UK-AL · 4h ago
These are the places that actually pay well.
dondraper36 · 3h ago
There's another reason for that. Deep in my heart, I would love to be part of a team that works on truly data-intensive applications (as Martin Kleppmann would call them) where all the complexity is justified.
For example, I am more of the "All you need is Postgres" kind of software engineer. But reading all those fancy blog posts on how some team at Discord works with 1 trillion messages with Cassandra and ScyllaDB makes me envious.
Also, it seems that to be hired by such employers you need to prove that you already have such experience, which is a bit of a catch-22 situation.
stavros · 2h ago
I feel like the phrase "all you need is Postgres" has the (often unspoken) continuation of "until you actually get to a trillion messages".
In other words, the developers you're envious of didn't start with Cassandra and ScyllaDB, they started with the problem of too many messages. That's not an architectural choice, that's product success.
dondraper36 · 2h ago
Absolutely. To put it differently, unfortunately not everyone has a chance to be part of a product's organic evolution from "all we need is Postgres" to "holy crap, we're a success, what is Cassandra by the way?"
DanielHB · 2h ago
Only places that are making good money can afford to have overengineering.
Overengineering is more prevalent the more money a company makes, and companies that overengineer will pay good money to keep the overengineering working.
no_wizard · 32m ago
Something about my old CTO and VP of Eng I respected is they were still technical enough to call out this kind of thing. For as big as that company was they really held down complexity and overengineering to a real minimum.
Unfortunately the rest of the executive team has leaned on them so hard about AI boosting productivity that they aren’t able to avoid that becoming a mess.
didibus · 2h ago
You're equating simplicity of the design with simplicity of the problem.
It's good not to over engineer, over engineering can be a cause of unneeded complexity, but when complexity is warranted the ability to solve for it simply is also needed.
More importantly though, you haven't explained or rationalized why.
It's not needed for this QPS? Oh ya? Why not? What's your magic threshold? When would it be needed? How do you plan for the team to know that time is approaching? If it's needed later how would you retrofit it? Is that going to be a simple addition? How do you know the max QPS won't be too high and that traffic won't be spiky? What if a surprise incident occurred that caused the system to overload, how would your design, without backpressure, handle that, how would you mitigate and recover?
In system design there's no real right answer, as an interviewer you're looking for the candidate to demonstrate their ability to identify the point of concerns, reason through the possibilities, explain their decisions and trade offs, and so on.
dondraper36 · 5h ago
Yes, and this is exactly why LinkedIn-driven development exists in the first place. Listing a million technologies looks much more impressive on paper to recruiters than describing how you managed to only use a modular monolith and a single Postgres instance to make everything work.
Swizec · 4h ago
> These are not the answers they're looking for.
These ARE the answers we are looking for. As the system design interviewer (I’ve done hundreds) I want you to start with these answers, then we can layer on complexity if you’ve solved the problem and there’s time left to go into navel-gazing mode.
Seeing the panic slowly build in mid-level engineers’ eyes as it dawns on them that not every problem can be solved by caching is pretty fun too. “Ok cool you’ve cached it there, now how do you fill the cache without running into the same performance issue?”
nlawalker · 3h ago
> I want you to start with these answers then we can layer on complexity if you’ve solved the problem and there’s time left to go into navel gazing mode.
Do you tell people this explicitly? If so, good on you; if not, please start! I think one of the biggest problems with interviews these days is misaligned expectations, particularly interviewees coming in assuming that what's desired is immediate evidence that they're so experienced in solving FAANG-scale problems that it's their default mode.
dondraper36 · 3h ago
I believe even at FAANG-like companies, only a lucky minority is involved at that level of scale. Most developers just use the available infrastructure and tools without working on the creation of S3 or BigTable.
dmurray · 17m ago
This famous blog post [0] suggests that the default behaviour at Google at least is for everything to deal with massive scale. Doesn't mean everyone is involved in creating massive-scale infrastructure like S3 or BigTable, but it does mean using that kind of infrastructure from the start
Yes and no. I give them rough scale numbers to design for. Part of the interview is knowing why I’m telling you this.
no_wizard · 31m ago
Or asking to get there, I find that to also be acceptable
renewiltord · 1h ago
At the level where this matters, the skill to figure it out from context is important. You aren’t the guy converting spec to code. You’re the spec maker.
nlawalker · 32m ago
I agree, but I think my point is that the interview context and expectations can differ radically from the role context, depending on the interviewer. If the expectation of the interviewer is that the interviewee should be asking questions to determine scale needs, then they should be explicit about that. For all the interviewee knows, you're going to ding them and ultimately fail them for asking too many questions and not exhibiting knowledge and experience.
Swizec · 9m ago
> For all the interviewee knows, you're going to ding them and ultimately fail them for asking too many questions and not exhibiting knowledge and experience.
I start the interview with “I am here in the role of PM and co-engineer so you can bounce ideas off of me and ask any questions”
Stakeholders won’t start their asks with “Please ask me questions to make sure you’re building the right thing”. Asking clarifying questions is a baseline expectation of the role
Aurornis · 3h ago
> I want you to start with these answers then we can layer on complexity if you’ve solved the problem and there’s time left to go into navel gazing mode
Exactly. Part of the interview is explaining when and why these techniques are necessary as part of demonstrating your understanding.
If the candidate gives non-answers like “I don’t think it matters because you’re a startup” or “I’d just use whatever database I’m comfortable with” that’s not demonstrating knowledge at all. That’s dismissing the question in a way that leaves the interviewer thinking you don’t have that knowledge, or you don’t take their problems seriously enough to put thought into them. There is a type of candidate who applies to startups because they think nothing matters and they can YOLO anything together for a few years before moving on to the next job, and those are just as bad as the super over-engineering candidates.
The interview is your chance to show you know the topics and when to apply them, not the time to argue that the startup shouldn’t care about such matters.
Swizec · 3h ago
> The interview is your chance to show you know the topics and when to apply them, not the time to argue that the startup shouldn’t care about such matters.
A good way to answer these, I think, is some version of ”We probably won’t run into these issues at the scale we’re talking about, but when we run into A, B, C problems, we can try X, Y, Z solutions.”
This shows that you’re making a conscious tradeoff and know when the more complex solutions apply. Extra points if you can explain specifically how you’ll put measures in place to know when A, B, C happened and how you would engineer the system such that adding X, Y, Z is easy.
Also it looks amazing if you’re aware that vertical scaling can buy you a lot of time for comparably little money these days. Servers get up to 128 CPUs with 64TB of RAM on one machine :)
dondraper36 · 3h ago
This also happens because plenty of candidates learn the buzzwords and patterns without understanding the trade-offs and nuances. With a competent enough interviewer, the shallowness of knowledge can be revealed immediately.
Aurornis · 3h ago
Identifying candidates who repeat buzzwords without understanding tradeoffs is easy. It’s part of the questioning process to understand the tradeoffs.
The problem with the comment above is that it’s not discussing tradeoffs at all. It’s just jumping to conclusions and dodging any discussion of tradeoffs.
If you answer questions like that, it’s impossible to tell if the candidate is being wise or if they’re simply BSing their way around the topic and pretending to be smart about it, because both types of candidates sound the same.
It’s easy to avoid this problem by answering questions as asked and mentioning tradeoffs. Trying to dismiss questions never works in your favor.
dondraper36 · 3h ago
Yes, I would probably phrase it like this. "Under the current load, I would go super simple and use X, which can work fine long enough until it doesn't. And then we can think about horizontal scaling and use Y and Z". Then proceed with a deeper discussion of Y and Z, probably.
After all, interviewing and understanding what your interviewer expects to hear is also a valuable skill (same as with your boss or client).
no_wizard · 29m ago
Even better would be to clarify: "Under the current load, and if the reasonably expected future load is similar, I would use X for Y reasons."
Sometimes the “trick” is that today's load is not tomorrow's.
Aurornis · 3h ago
> > Interviewer: "Well what about backpressure?"
> > "That's not really worth considering for this amount of QPS"
There is a good way and a bad way to communicate this in interviews.
If an interviewer is asking about back pressure, they’re prompting you to demonstrate your knowledge of back pressure and how and when it would be applied. Treating it as an opening to debate the validity of the question feels like dodging the question or attempting to be contrarian. Explaining when and where you would choose to add back pressure would be good, but then you should go on to answer the question.
This question hits close to home for me because I was once working at a small startup that was dealing with a unique problem where back pressure really was the correct way to manage one of our problems, but we had a number of candidates do exactly what you did: Scoff at the idea that such a topic would be relevant at a startup.
If we’ve been dealing with a problem for months and a candidate comes in and confidently tells us that problem isn’t something we would experience and dismisses our question, that’s not a positive signal.
> > Interviewer: "How would you choose between sql and nosql db?"
> > "Doesn't matter much. Whatever the team has most expertise in"
This is basically a softball question. Again, if you provide a non-answer or try to dismiss the question it feels like you’re either dodging the topic or trying to be contrarian. It’s also a warning sign to the interviewer that you might gravitate toward what’s easy for you instead of right for the project.
This one also resonates with me because I spent years of my life making MongoDB do things that would have been trivial if earlier developers had used something like SQLite instead. The reason they chose MongoDB? Because the team was familiar with it. It was hell to be locked into years of legacy code built around the wrong tool for the job because some early employees thought it didn’t matter “because startup”
As an interviewer, let me give some advice: If an interviewer asks a question, you should answer the question. Anything that feels like changing the subject, dodging the question, or arguing the merits of the question feels like the candidate either doesn’t understand the topic or wants to waste time by debating the question.
It can be very valuable to explain when and why a topic would become necessary, right before you explain it. Instead of “this application has low QPS and therefore I will not answer your question” (not literally what you said, but how it comes across) you could instead explain how the need for back pressure could be avoided first by scaling servers appropriately and then go on to answer the question that was asked.
santiagobasulto · 3h ago
I’ve been in software for 20 years and it’s the first time I hear “back pressure”. Am I too old already?
no_wizard · 26m ago
Somewhat surprising, but if you never dealt with scaling issues of a certain nature it may never have come up.
Though you might be familiar with other terms that effectively mean the same thing, like counter pressure
Aurornis · 3h ago
Backpressure occurs at many levels, even down to a single machine doing something. If you ever have a producer and a consumer interacting and the consumer can’t consume as fast as the producer can produce, you need some way to have the producer pause or slow down until the consumer catches up. That’s back pressure.
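A minimal sketch of that idea in plain Python, with a bounded in-process queue standing in for whatever buffer or broker you actually use (the sizes, timeouts, and workload here are invented for illustration):

```python
import queue
import threading
import time

# Bounded buffer: at most 100 items may be in flight at once.
jobs = queue.Queue(maxsize=100)

def producer():
    for i in range(1_000):
        try:
            # put() on a full queue blocks: the producer is slowed to the
            # consumer's pace. The timeout is the "shed load" escape hatch.
            jobs.put(i, timeout=1.0)
        except queue.Full:
            print(f"dropping item {i}: consumer can't keep up")

def consumer():
    while True:
        item = jobs.get()
        time.sleep(0.01)  # simulate slow downstream work
        jobs.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
jobs.join()  # wait for the consumer to drain what was accepted
```

The same shape shows up at every scale: TCP flow control, HTTP 429 responses, bounded thread pools.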
> it’s the first time I hear “back pressure”. Am I too old already?
It's the opposite, as you get older you will feel this more and more.
bcrosby95 · 2h ago
Services, systems, and/or databases eventually provide back pressure when they fail or get overloaded. The idea is to design in back pressure to let the system degrade gracefully rather than fail chaotically.
marcosdumay · 3h ago
It's a sign that you didn't get into the "let's distribute every problem" rabbit hole. I don't think it correlates with age.
But keep the concept in your mind in case you have to distribute some problem. It's a central one.
stavros · 2h ago
You've just never played Factorio.
santiagobasulto · 2h ago
I have never played Factorio nor knew about it. It seems to be a very good game, thanks for the recommendation!
stavros · 2h ago
Unfortunately, it's too good. At least you'll learn all about backpressure in the days you spend lost to the world!
torginus · 2h ago
Who does this? Why make something 10x as complicated as it needs to be, when you could just use the simple thing and get 10x as far? It's not like there's not enough work to do.
abound · 3h ago
You don't need to entirely forget this. I've made a habit of regularly seeking out job opportunities and interviewing even when I'm entirely happy with my job, which is to say I've done a ton of these kinds of interviews (on both sides of the table).
Unless the initial question requirements are insane (build Twitter at Twitter scale), I start with the smallest, dumbest thing that will work. Usually that's a single machine/VM talking to a database (or even just SQLite!). Compute and storage are so fast these days that you could comfortably run your fledgling service on a Raspberry Pi, even serving three or four-digit QPS depending on the workload.
Of course, you still have to "play the game" in the interview, so make sure to be clear about how you'd change this as requirements changed (higher QPS, more features, etc)
ajmurmann · 3h ago
In some ways it's worse. There are also project review interviews. "We had a Rails/Django/whatever monolith that was backed by Postgres and we didn't need a SPA" makes for a less impressive session with many companies. This creates a lot of incentive to overcomplicate/"future proof" things for resume building.
zellyn · 4h ago
I recently had an interview like this. Felt like half the answers I gave were of the form, “You can do scaling/sharding/partitioning thing X here, but once again, for an internal app I’d try really hard to avoid doing any of that”. If you’re interviewing with capable, experienced developers, they’ll appreciate that answer (at least, I got the offer on this one!)
LoganDark · 21m ago
I don't think they're completely not looking for that entire type of answer, but those examples are pretty dry and don't really go into the reasoning for your opinion, which is probably what they're worried about. Whenever you say something isn't worth considering or doesn't seem necessary, you should be explaining exactly why you think that, and exactly where it would be worth considering or seem necessary, because otherwise you just look like someone who simply doesn't care about whatever kind of scalability they're asking about.
jstummbillig · 3h ago
If you know that those are not the answers they are looking for, you can reasonably pass by modifying the answer only slightly, while still getting your point across.
If you can't, you might be getting interviewed by people you do not want to work with, and you should want to know that.
flashgordon · 3h ago
Except these are the people in your way of getting that job that could be potentially life/career changing for you financially or otherwise. In this market or depending on your situation that would be hard to ignore.
jstummbillig · 2h ago
I think that's a red herring. You are a knowledge worker. You are paid to disagree when necessary. Yes, people will probably take offense when you say "that's just a dumb question" but if they can't at least be approached when you offer your opinion in a palatable(sic) way, that's simply not going to work.
Understand what is being asked. Your insight on a topic is being tested. Offer an answer that does not read like a dodge or a coin flip.
ozgrakkurt · 3h ago
You don’t want to work at a company like this anyway.
__turbobrew__ · 52m ago
If you want to get top dollar at a FAANG you will need to go through these type of system design interviews. You could say you shouldn’t work for a FAANG which is fair, but FAANG pays top dollar.
0manrho · 3h ago
True, but people generally don't want to get evicted or have their utilities turned off either. If you need a job you need a job, and the numbers out of Cali's job market put a lot of tech people in a position where they might not have the luxury of waiting for the "right fit". As always, YMMV and the world is a big place, everyone's different, yadda yadda.
ajuc · 1h ago
Answer what they want and finish with "but in practice it doesn't matter for this much traffic and would be wasted effort".
People ask for fizzbuzz in parallel not because it's practical.
renewiltord · 1h ago
You can always say “Since we’ve got only x QPS, I’m going to do A. If we had say y QPS, I’d do B but that would impact the rest of the design. Let me know if you anticipate growth to y and I can show you how I’d do it”
The point of an interview is to lay bare one’s thought process entirely so that the interviewer has full awareness of the person you are. And to likewise extract that from the interviewer. Getting or transmitting less information is just underutilizing the time. Interviewers are also flawed and may not be good enough at extracting the information from you.
If you’re an ideal decision maker, you will likely out-skill the majority of interviewers. You’re being hired to make their org succeed. So just do that.
I think people who describe system designs frequently fail to demarcate the space they’re operating in, so subsequent engineers cannot determine whether the original designer failed to consider something or whether the original designer considered and dismissed something. The point is to be able to express this concisely.
IMHO, doing it well means that not only do you get it right but you send the information down through time so that subsequent observers understand why and also get it right consequently.
mupuff1234 · 2h ago
I think you might be missing the point.
Your answers are completely valid but you have to communicate to the interviewer that you considered the possibilities and the tradeoffs.
If the interviewer needs to "forcefully" extract from you the logic behind your design choices, then a lot of times that's enough to fail you.
motorest · 6h ago
What a great article. It's always a treat to read this sort of take.
I have some remarks though. Taken from the article:
> Avoid having five different services all write to the same table. Instead, have four of them send API requests (or emit events) to the first service, and keep the writing logic in that one service.
This is not so cut-and-dry. The trade offs are far from obvious or acceptable.
If the five services access the database, then you are designing a distributed system where the interface being consumed is the database, which you do not need to design or implement, and which already supports authorization, access controls, transactions, and custom queries out of the box. On the other hand, if you design one service as a high-level interface over a database, then you need to implement and manage your own custom interface with your own access controls and constraints, and you need to design and implement yourself how to handle transactions and compensation strategies.
And what exactly do you buy yourself? More failure modes and a higher micro services tax?
Additionally, having five services accessing the same database is a code smell. Odds are that database fused together two or three separate databases. This happens a lot, as most services grow by accretion and adding one more table to a database gets far less resistance than proposing creating an entire new persistence service. And is it possible that those five separate services are actually just one or two services?
dkarl · 3h ago
> And what exactly do you buy yourself?
APIs can be evolved much more easily than shared database schemas. Having worked with many instances of each kind of system, I think this outweighs all of the other considerations, and I don't think I'll ever again design a system with multiple services accessing the same database schema.
It was maybe a good idea if you were a small company in the early 2000s, when databases were well-understood and services weren't. After that era, I haven't seen a single example of a system where it wasn't a mistake for multiple services to access the same database schema (not counting systems where the read and write path were architecturally distinct components of the same service.)
CuriouslyC · 2h ago
Service specific views, my guy.
paffdragon · 6h ago
> the interface being consumed is the database, which you do not need to design or implement
You absolutely should design and implement it, exactly because it is now your interface. In fact, it will add more constraints to your design, because now you have different consumers and potentially writers all competing for the same resource with potentially different access patterns. Plus the maintenance overhead that migrations of such shared tables come with. And eventually you might have data in this table that are only needed for some of the services, so you now need to implement views and access controls at the DB level.
Ideally, if you have a chance to implement it, an API is cleaner and more flexible. The problem in most cases is simply business pushing for faster features which often leads to quick hacks including just giving direct access to some DB table from another service, because the alternative would take more time, and we don't have time, we want features, now.
But I agree with your thoughts in the last paragraph. It happens very often that people don't want to undertake the effort of a whole new design or redesign to match the evolving requirements and just patch it by adding a new table to an existing DB, then another,...
marcosdumay · 3h ago
> Plus the maintenance overhead that migrations of such shared tables come with.
Moving your data types from SQL into another language solves exactly 0 migration problems.
Every migration you can hide with that abstraction language you can also hide in SQL. Databases can express exactly the same behaviors as your application code.
sethammons · 6h ago
The goal is to minimize what needs changing when things need changing.
When you need to alter the datastore, usually for product or scalability, you have to orchestrate all access to that datastore.
Ergo: only one thing using the datastore means less orchestration.
At work, we just updated a datastore. We had to move some tables to their own db. 3 years later, 40+ teams have updated their access. This was a product need. If this was a scale issue, the product would just have died sans some as of yet imagined solution.
wahnfrieden · 5h ago
A reused code library for DB use is an alternative there
paffdragon · 4h ago
That moves your API layer to the client library you need to distribute and build for your customers in programming languages they support. There are some cases where a thick client makes sense, but usually easier to do it server side and let customers consume the API from their env, it is easier to patch the server than to ship library updates to all users.
sgarland · 4h ago
> Additionally, having five services accessing the same database is a code smell.
Counterpoint (assuming by database you mean database cluster, not a schema): having a separate physical DB for each service means that for most places, your reliability has now gone from N to N^M (e.g., five services that each depend on their own 99.9%-available cluster are all up only 0.999^5 ≈ 99.5% of the time).
lnenad · 1h ago
From which perspective? If a service is up, but is unable to do anything since another service is down, what good does it do other than increase some metrics on some dashboard. (Note that we are specifically talking about coupled services since the implication is writing to a single db being split up into multiple dbs - a distributed monolith).
bubblebeard · 6h ago
I think the author meant, in a general way, it’s better to avoid simultaneous writes from different services, because this is an easy way to introduce race conditions.
Muromec · 5h ago
>And what exactly do you buy yourself? More failure modes and a higher micro services tax?
Nice boxes in the architectural diagram. Each box is handed to a different team and then, when engineers from those teams don't talk to each other, the system doesn't suddenly fail in an unexpected way.
PartiallyTyped · 5h ago
At amzn a decision from atop was made that nobody would ever write in shared dynamo db tables. A team would own and provide APIs. That massively improved reliability and velocity.
paffdragon · 4h ago
The team boundary is very important. You can get away with shared DB for a long time if the same team handles all services that access it and have absolute tight control over them. If there are different teams in picture, however, the tight coupling is a source of problems and a bottleneck, beyond prototyping / idea validation, etc.
foobarian · 4h ago
I don't need a decision from atop amazon to remind me how painful it would be to migrate a widely shared dynamo instance or god forbid change dax settings
bambax · 10h ago
> When querying the database, query the database. It’s almost always more efficient to get the database to do the work than to do it yourself. For instance, if you need data from multiple tables, JOIN them instead of making separate queries and stitching them together in-memory.
Oh yes! Never do a join in the application code! But also: use views! (and stored procedures if you can). A view is an abstraction about the underlying data, it's functional by nature, unlikely to break for random reasons in the future, and if done well the underlying SQL code is surprisingly readable and easy to reason about.
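A small sqlite3 sketch of that idea (schema and names invented for illustration): the view owns the join, and application code treats it like a plain table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), total REAL);

    INSERT INTO users  VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO orders VALUES (1, 1, 9.99), (2, 1, 5.00), (3, 2, 42.00);

    -- The view is the abstraction: callers never see the join.
    CREATE VIEW user_order_totals AS
    SELECT u.id, u.name,
           COUNT(o.id) AS order_count,
           COALESCE(SUM(o.total), 0) AS total_spent
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id, u.name;
""")

# Application code queries the view like a table.
for row in con.execute("SELECT * FROM user_order_totals ORDER BY total_spent DESC"):
    print(row)
```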
bob1029 · 9h ago
This is a big part of what makes ORMs a problem.
Writing raw SQL views/queries per MVC view in SSR arrangements is one of the most elegant and performant ways to build complex web products. Let the RDBMS do the heavy lifting with the data. There are optimizations in play you can't even recall (because there's so many) if you're using something old and enterprisey like MSSQL or Oracle. The web server should be able to directly interpolate sql result sets into corresponding <table>s, etc. without having to round trip for each row or perform additional in memory join operations.
The typical ORM implementation is the exact opposite of this - one strict object model that must be used everywhere. It's about as inflexible as you can get.
Too · 5h ago
With an ORM your application code is your views.
You can write reusable plain functions as abstractions, returning QuerySets that allow further filters to be chained onto the query before the actual SQL is materialized and sent to the database.
The result of this doesn’t have to match the original object models you defined, it’s still possible to be flexible with group bys resulting in dictionaries.
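Roughly the shape of that, as a toy sketch in plain Python rather than any real ORM's API: filters accumulate without touching the database, and the SQL is only rendered when the query is finally executed.

```python
class Query:
    """Toy query builder: chainable filters, SQL materialized lazily."""

    def __init__(self, table, where=()):
        self.table = table
        self.where = tuple(where)

    def filter(self, condition):
        # Each filter returns a new Query; nothing hits the database yet.
        return Query(self.table, self.where + (condition,))

    def to_sql(self):
        sql = f"SELECT * FROM {self.table}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        return sql

# Reusable abstraction: a plain function returning an unevaluated query.
def active_users():
    return Query("users").filter("active = 1")

# Call sites keep chaining; SQL is rendered only at the end.
print(active_users().filter("created_at > '2024-01-01'").to_sql())
```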
tremon · 3h ago
But converting a SQL relation to a set of dictionaries already carries a lot of overhead: every cell in the resultset must be converted to a key-value pair. And the normal mechanics of vertical "slicing" a set of dictionaries is much more expensive than doing the same in a 2d relation array. So while you might want to offer a dictionary-like interface for the result set, please don't use a dictionary-like data structure.
henry2023 · 4h ago
Unpopular opinion. An ORM, by definition, is the GCD of "supported databases" features. It exists only because people don't like the aesthetics of SQL, but the cost of using them is immense.
CuriouslyC · 2h ago
Not unpopular. ORM hate is real. I like SQL Alchemy and Drizzle in projects for the features they give you for free (such as Alembic migrations and instant GraphQL server), but I still write SQL for most stuff.
richardlblair · 7h ago
If your ORM is going to the DB per row you're using it wrong. N+1 queries are a performance killer. They are easy to spot in any modern APM.
Rails makes this easy to avoid. Using `find_each` batches the queries (by 1,000 records at a time by default).
Reading through the comment section on this has been interesting. Either lots of people using half baked ORMs, people who have little experience with an ORM, or both.
wild_egg · 4h ago
I mean Rails also makes it easy to accidentally nest further queries inside your `find_each` block and end up with the same problem.
Your team can have rules and patterns in place to mitigate it but I'd never say "Rails makes this easy to avoid".
richardlblair · 2h ago
This is true with any interaction with the DB, ORM or otherwise. Regardless of the layer of abstraction you choose to operate at, you still need to understand the underlying complexity.
What Rails gives you is easy to use (and understand) abstractions that enable you to directly address performance issues.
Easy is highly contextual here, because none of this is trivial.
mathiaspoint · 2h ago
I think the real value in frameworks like rails and Django is that it makes it easier to collaborate. When you do it from scratch people inevitably write their own abstractions and then you can't share code so easily.
hk1337 · 8h ago
Even in the article the solution wasn’t to abandon the ORM in favor of raw SQL but knowing how to write the code so it doesn’t have to run 100 extra queries when it doesn’t need to.
> Particularly if you’re using an ORM, beware accidentally making queries in an inner loop. That’s an easy way to turn a select id, name from table to a select id from table and a hundred select name from table where id = ?.
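Spelled out with sqlite3 (made-up schema), the two shapes look like this; an ORM tends to hide the first one behind innocent-looking attribute access inside a loop.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Le Guin'), (2, 'Banks');
    INSERT INTO books   VALUES (1, 1, 'The Dispossessed'), (2, 2, 'Excession');
""")

# N+1: one query for the books, then another query per row for the author.
for book_id, author_id, title in con.execute("SELECT id, author_id, title FROM books"):
    (author,) = con.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()
    print(title, "-", author)

# Single query: let the database do the join in one round trip.
rows = con.execute("""
    SELECT b.title, a.name
    FROM books b JOIN authors a ON a.id = b.author_id
""").fetchall()
print(rows)
```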
scarface_74 · 22m ago
C#’s LINQ-based ORMs have always worked like this: a type-safe feature built into the language -> run-time generation of an agnostic expression tree -> the database provider converts it into SQL. It does database joins (unless you do something stupid like drop out of IQueryable land).
tossandthrow · 9h ago
Have you ever built a complex app like this?
In particular, have you had to handle testing, security (e.g. row-level security), migrations, change management (e.g. for SOC2 or other security frameworks), cache offloads (Redis and friends), support for microservices, etc.?
Comments like this give me a vibe of young developers trying out Supabase for the first time feeling like that approach can scale indefinitely.
rbees · 8h ago
> Comments like this give me a vibe of young developers
I don’t think so. The context is about avoiding joining in memory, which is fairly awful to do in an application, and should be avoided, along with uninformed use of ORMs, which often just add a layer of unwarranted complexity leading to things like the dreaded N+1 problem that most inexperienced Rails developers had when dealing with ActiveRecord.
If anything, what you’re talking about sounds like development hell. I can understand a database developer having to bake in support for that level of security, but developing an app that actually uses it gets you so far in the weeds that you can barely make progress trying to do normal development.
A developer with several years of experience or equivalent will have pride in developing complexity and using cool features that make them feel important.
After a developer has maybe twice that many years experience or equivalent, they may develop frameworks with the intent to make code easier to develop and manage.
And beyond that level of experience, developers just want code that’s easy to maintain and doesn’t make stupid decisions like excessive complexity. But, they know they have to let the younger devs make mistakes, because they don’t listen, so there is no choice but to watch hell burn.
Then you retire or get a different job.
tossandthrow · 6h ago
I don't know what part of what I'm talking about sounds like hell?
I am merely talking about properties of developing complex web applications that have traditionally not been easy to work with in SQL.
I am in particular not proposing any frameworks.
How can that sound like hell?
lurking_swe · 8h ago
Not the person you replied to, but I have! A java project I worked on a couple years ago used a thin persistence layer called JOOQ (java library). It basically helps you safely write sql in java, without ORM abstractions. Worked just fine for our complex enterprise app.
What about micro services? You write some terraform to provision a sql database (e.g. aws aurora) just like you would with dynamo db or similar. What does that have to do with ORMs?
What about redis? Suddenly we need an ORM to query redis, to check if a key exists in the cache before hitting our DB? That’s difficult code to write?
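Indeed, that cache-aside check really is a few lines; a sketch with redis-py, where the key scheme, TTL, and the `db.fetch_user` call are all invented for illustration:

```python
import json
import redis  # assumes a Redis server on localhost and `pip install redis`

r = redis.Redis()

def get_user(user_id, db):
    key = f"user:{user_id}"               # hypothetical key scheme
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)         # cache hit: skip the database
    user = db.fetch_user(user_id)         # hypothetical DB call on cache miss
    r.set(key, json.dumps(user), ex=300)  # cache for 5 minutes
    return user
```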
I’m confused reading your comment. It has “you don’t do things my way so you must be dumb and playing with toy projects” vibes.
__MatrixMan__ · 6h ago
As a previous user of alembic I was surprised that flyway's migrations only go forward by default and that reversing them is a premium feature. That's like having the luxury trim being the one with seatbelts.
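For anyone who hasn't used it: an Alembic revision carries its own `downgrade()` alongside `upgrade()`. A minimal sketch (revision ids and table are invented):

```python
"""create users table (example revision)"""
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic (values here are made up).
revision = "a1b2c3d4e5f6"
down_revision = None

def upgrade():
    op.create_table(
        "users",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("email", sa.Text, nullable=False, unique=True),
    )

def downgrade():
    # The reverse migration ships with the forward one by default.
    op.drop_table("users")
```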
lurking_swe · 1m ago
it’s been a while since I used flyway. is there a better option in 2025? Just curious.
tossandthrow · 6h ago
From what I can see, JOOQ is only really type-safe with POJO mappings, at which point it is an ORM with an expressive query DSL.
Alternatively you use record-style outputs, but that is prone to errors if positions are changed.
Regardless, even with JOOQ you still accept that there is a sizable application layer taking responsibility for the requirements I listed.
lurking_swe · 4m ago
i guess it’s semantics, but i agree with you actually. After all ORM = object relational mapping. However it’s certainly the most lightweight ORM i’ve used in the java and c# world. With JOOQ you are in complete control of what the SQL statements look like and when those queries happen (avoids the common N + 1 risk). _Most_ ORMs i’ve seen attempt to abstract the query from the library user.
In our project we generated pojo’s in a CI pipeline, corresponding to a new flyway migration script. The pojos were pushed to a dedicated maven library. This ensured our object mappings were always up to date. And then we wrote sql almost like the old fashioned way…but with a typesafe java DSL.
Yokohiii · 6h ago
I don't understand why all these problems should be easier to handle with an ORM than with raw SQL.
tossandthrow · 6h ago
It is a granularity tradeoff.
With SQL you need to explicitly test all queries where the shape granularity is down to field level.
When you map data onto an object model (in the dto sense, not oop sense) you have bigger building blocks.
This gives a simpler application that is more reliable.
Obviously you need to pick a performant orm - and it seems a lot of people in these threads have been traumatized.
Personally, I run a complex application where developers freely use a graphql schema and requests are below 50ms p99 - gql is translated into joins by the orm, so we do not have any n+1 issues, etc.
johnmaguire · 6h ago
The issue with GraphQL tends to be unoptimized joins instead. Is your GraphQL API available for public consumers? How do you manage them issuing inefficient queries?
I've most often seen this countered through data loaders (batched queries that are merged in code) instead of joins, or query whitelists.
tossandthrow · 5h ago
While this api in particular is not publicly exposed, that would not be a concern.
The key is to hold the same schema on the database as on the graphql and use tooling that can translate a gql query into a single query.
Yokohiii · 6h ago
To my ears that's just neglect. You assume your ORM does the basic data mapping right and don't verify it?
marcosdumay · 2h ago
> You assume your ORM does the basic data mapping right
You know, it should. There's no good reason for an ORM to ever fail at runtime due to mapping problems instead of compile time or start time. (Except, of course if you change it during the software's execution.)
Yokohiii · 2h ago
Why should a raw query fail?
tossandthrow · 5h ago
No? The difference is verifying it once for the ORM vs. once for every single place you query.
Yokohiii · 3h ago
I have to respond here as the depth limit has seemingly been reached.
As you've mentioned GraphQL, you are probably comparing an ORM in that sense to a traditional custom API backed by raw SQL. In a fair comparison both versions would do exactly the same thing and require the same essential tests. Assuming more variations for the raw SQL version is just assuming it does more, or somehow does it badly in terms of architecture. Which is not a fair comparison.
tossandthrow · 56m ago
The orm represents deferred organization. Ie someone else is testing mapping and query generation for you.
An example is Prisma. Prisma has a team of engineers that work on optimizing query generation and provide a simple and intuitive API.
Not using an ORM forces you to take over that organization and test the extra complexity that goes into your code base.
It might be merited if you get substantial performance boosts - but I have not seen any reasonably modern ORM where performance is the issue.
Yokohiii · 4h ago
A raw query doesn't have to be repeated in every place it's required. Not sure what your point is.
tossandthrow · 3h ago
You will have a bigger variety of queries when you don't use an ORM - this puts a higher load on software testing to get the same level of reliability.
sgarland · 2h ago
> 50 ms p99
You realize that’s abysmally bad performance for any reasonable OLTP query, right? Sub-msec (as measured by the DB, not including RTT etc.) is very achievable, even at scale. 2-3 msec for complex queries.
LinXitoW · 5h ago
Why is it so hard to believe that well tested, typed code is better than manual string concatenation?
Before you tell me about how you just use a Query Builder/DSL and an object mapper for convenience: That's a freaking ORM!
cpursley · 8h ago
Guessing you are a Rails dev?
mattmanser · 8h ago
Most ORMs will happily let you map stored procedures and views to a class, you can have as many models as you want. So your point doesn't really make sense.
The author's said nothing about ORMs. It feels like you're trying to post a personal beef about ORMs that's entirely against the "pragmatic" software design the author is advocating. Using ORMs to massively reduce your boilerplate CRUD code, then using raw SQL (or raw SQL + the ORM doing the column mapping) for everything else, is a pragmatic design choice.
You might not like them, but using ORMs for CRUD saves a ton of boilerplate, error-prone, code. Yes, you can footgun yourself. But that's what being a senior developer is all about, using the tools you have pragmatically and not foot gunning yourself.
And it's just looking for the patterns, if you see a massive ORM query, you're probably seeing a code smell. A query that should be in raw SQL.
dondraper36 · 7h ago
In Go, for example, there is a mixed approach of pgx + sqlc, which is basically a combo of the best Postgres driver + type-safe code generator (based on raw SQL).
Even though I often use pgx only, for a new project, I would use the approach above.
Yokohiii · 6h ago
The way you describe it, it would be ideal if ORMs would only handle very basic CRUD and force you to use raw sql for complex queries. But that's not reality and not how they are used, not always. In my opinion some devs take pride to do everything with their favorite ORM.
I think if an app is 90% ORM code with the remainder as raw queries, a junior is inclined to favor ORM code and is also less exposed to actually writing SQL. He is unlikely to become an SQL expert, whereas by writing SQL behind a code facade, he would become one.
mexicocitinluez · 6h ago
>The typical ORM implementation is the exact opposite of this - one strict object model that must be used everywhere. It's about as inflexible as you can get.
I can't respond to the "typical" part as most of my experience is using EF Core, but it's far from inflexible.
Most of my read-heavy, search queries are views I've hand written that integrate with EF core. This allows me to get the benefit of raw SQL, but also be able to use LINQ to do sorting/paging/filtering.
tialaramex · 6h ago
Stored procedures seem like a win, but the big problem is that while I could write the rest of the software in a very nice modern language like Rust, or more practically in C# since my team all know C#, if I write a stored procedure it will be in Transact-SQL, because that's the only choice.
T-SQL was not a good programming language last century when it was vaguely current, and so no, I do not want to write any significant amount of code in T-SQL. For my sins I maintain a piece of software with huge T-SQL procedures (multi-page elaborations by somebody who really, really liked this stuff) and they're a nightmare. The tooling doesn't really believe in version control, and the diagnostics when you make a mistake are either non-existent or C++-style useless spew.
We hire a lot of very junior developers. People who still need to be told not to comment out code in release, that variable names are for humans to read, not machines, that sort of thing. We're not quite hiring physicists to write software (I have done that at a startup) but it's close. However, none of the poor "My first program" code I see in a merge request by a new hire is anywhere close to as unreadable as the T-SQL we already own and maintain.
Yokohiii · 6h ago
I've only once tried to use stored procedures in mysql and it was almost impossible to debug back then. Very painful. Average devs already have issues being smart with their databases and stored procedures would add to that.
Stored procedures also add another risk. You have to keep them in sync with code, making releases more error prone. So you have to add extra layers of complexity to manage versioning.
I can see the advantage of extreme performance/efficiency gains, but it should be really big to be justified.
CuriouslyC · 2h ago
I'm a big postgres guy and in theory I love stored procedures (so many language options!) but you're 100% right that the downsides in terms of DX make them pretty much the last thing I reach for unless they're a big performance/simplicity win and I expect them to be pretty static over time.
loglog · 5h ago
> Stored procedures also add another risk. You have to keep them in sync with code, making releases more error prone.
This one is easily solved: never change a stored procedure. Every version should get a new name.
Yokohiii · 4h ago
That's what I meant when I've mentioned versioning.
doitLP · 5h ago
I worked at a place with just such a system. Half the application code was baked into sprocs, no version control and hidden knock on effects everywhere.
There was _one guy_ who maintained it and understood how it worked. He was very smart but central to the company’s operations. So having messy stuff makes it brittle/hard to change in more ways than one.
mdavid626 · 8h ago
I disagree. In modern highly scalable architectures I’d prefer doing joins in the layer in front of the database (the backend).
The “backend” scales much more easily than the database. Loading data by simple indexes, e.g. user_id, and joining it on the backend keeps the db fast. Spinning up another backend instance is easy - unlike another db instance.
If you think your joins must happen in the db because the data is too big to be loaded into memory on the backend, restructure it so that it's possible.
Bonus points for moving joins to the frontend. This makes data highly cacheable - fast to load, as you need to load less data and frees up resources on server side.
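For concreteness, the application-side join being described is essentially a hash join in app code; a toy Python sketch with invented data (whether this beats letting the database do it is exactly the tradeoff under debate here):

```python
# Two narrow, index-friendly lookups (e.g. by user_id)...
users  = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
orders = [{"user_id": 1, "total": 9.99}, {"user_id": 1, "total": 5.00}, {"user_id": 2, "total": 42.00}]

# ...then a hash join in application code.
users_by_id = {u["id"]: u for u in users}
joined = [
    {"name": users_by_id[o["user_id"]]["name"], "total": o["total"]}
    for o in orders
]
print(joined)
```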
riv991 · 7h ago
High Scale is so subjective here, I'd hazard a guess that 99% of businesses are not at the scale where they need to worry about scaling larger than a single Postgres or MySQL instance can handle.
Tade0 · 7h ago
In the case of one project I've been in, the issue was the ORM creating queries, which Postgres deemed too large to do in-memory, so it fell back to performing them on-disk.
Interestingly it didn't even use JOIN everywhere it could because, according to the documentation, not all databases had the necessary features.
A hard lesson in the caveats of outsourcing work to ORMs.
richardlblair · 7h ago
I've worked both with ORMs and without. As a general rule, if the ORM is telling you there is something wrong with your query / tables it is probably right.
The only time I've seen this in my career was a project that was an absolute pile of waste. The "CTO" was self taught, all the tables were far too wide with a ton of null values. The company did very well financially, but the tech was so damn terrible. It was such a liability.
mdavid626 · 7h ago
Scalability is not the keyword here.
The same principle applies to small applications too.
If you apply it correctly, the application is never going to be slow due to slow db queries and you won’t have to optimize complex queries at all.
Plus if you want to split out part of an app to its own service, it’ll be easily possible.
nicoburns · 6h ago
One of the last companies I worked at had very fast queries and response times doing all the joins in-memory in the database. And that was only on a database on a small machine with 8GB RAM. That leaves a vast amount of room for vertical scaling before we started hitting limits.
dondraper36 · 7h ago
Vertical scaling is criminally underrated, unfortunately. Maybe, it's because horizontal scaling looks so much better on Linkedin.
mdavid626 · 7h ago
Sooner or later even small apps reach hardware limits.
My proposed design doesn’t bring many hard disadvantages.
But it allows you to avoid vertical hardware scaling.
Saves money and development time.
dondraper36 · 7h ago
Not really disagreeing with you here, but that "later" never comes for most companies.
AdrianB1 · 7h ago
My manufacturing data is hundreds of GB to a few TB in size per instance and I am talking about hot data, that is actively queried. It is not possible to restructure and it is a terrible idea to do joins in the front end. Not every app is tiny.
mdavid626 · 7h ago
In some cases, it’s true.
But your thinking is rather limited. Even such data can be organized in a way that joins don't necessarily have to happen in the db.
This kind of design always “starts” on the frontend - by choosing how and what data will be visible eg. on a table view.
Many people think, showing all data, all the time is the only way.
AdrianB1 · 7h ago
The SQL database has more than a dozen semi-independent applications that handle different aspects of the manufacturing process, from recipes and batches to maintenance, scrap management and raw material inventory. The data is interlocked, but the apps are independent, as different people in very different roles are using them. No, it never starts on the front end; it started as a system and evolved by adding more data and more apps. Think of SAP as another such example.
mdavid626 · 7h ago
This is an “old-school” design. Nowadays I wouldn’t let apps meet in the database.
A simple service-oriented architecture is much preferred: each app with its own data.
Then such problems can be easily avoided.
dakiol · 6h ago
It’s not old school, it’s actually solid design. I, too, have worked with people who think the frontend or even the services should guide the design/architecture of the whole thing. It seems tempting and gives the initial impression that it works, but long term it’s just bad design. Keeping data structures (and mainly that means database structures) stable is key to long-term maintenance.
johnmaguire · 6h ago
> It seems tempting and gives the initial impression that it works, but long term it’s just bad design.
This appears as an opinion rather than an argument. Could you explain what you find bad about the design?
In any case, I believe a DB per backend service isn't a decision driven by the frontend - rather, it's driven by data migration and data access requirements.
RaftPeople · 2h ago
> In any case, I believe a DB per backend service isn't a decision driven by the frontend - rather, it's driven by data migration and data access requirements.
I think the idea of breaking up a shared enterprise DB into many distinct but communicating and dependent DB's was driven by a desire to reduce team+system dependencies to increase ability to change.
While the pro is valid and we make use of the idea sometimes when we design things, the cons are significant. Splitting up a DB that has data that is naturally shared by many departments in the business and by many modules/functional areas of the system increases complexity substantially.
In the shared model, when some critical attribute of an item (SKU) is updated, all of the different modules and functional areas of the enterprise immediately use that current and correct master value.
In the distributed model, there is significant complexity and effort to share this state across all areas. I've worked on systems designed this way and this issue frequently causes problems related to timing.
As with everything, no single solution is best for all situations. We only split this kind of shared state when the pros outweigh the cons, which is sometimes but not that often.
mdavid626 · 1h ago
I disagree. I do generally understand the problems a "split-up" database brings to the table. The shared design is how people designed things over the last many decades.
What I propose is to leave that design behind.
The split-up design fits modern use cases much better. People want all kinds of data. They want to change what data they want rather often.
"One" database for all of this doesn't really work -- you can't change the schema since it's used by many applications. So you'll be stuck with a design coming from a time when requirements were probably quite different. Of course, you can make some modifications, but not many and not fundamental ones.
In the split-up design, since you're not sharing the database, you can do whatever you want. Change the schema as you see fit. Store data in multiple different forms (duplicates), so it can be queried quickly. The only thing you have to keep stable is the interface to the outside world (departments etc.). Here you can use e.g. versioning of your API. Handy.
The 90's are over. We don't have to stick to the limitations people had back then.
Yes, of course, data not being up-to-date in every system can be a problem. BUT business people nowadays tend to accept that more readily than the inability to change data structures ("we can't add a new field", "we can't change this field", etc.).
mdavid626 · 7h ago
A good, simple solution could be data duplication, e.g. storing some props from the joined tables directly in the main table.
I know, for many, this is one of the deadly sins, but I think it can work out very well.
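For example (a sketch with hypothetical orders/customers tables), instead of joining on every read you might copy the one or two customer props you actually display:
-- denormalized copy of the customer's name, written at order time
alter table orders add column customer_name text;
-- reads no longer need the join
select id, total, customer_name from orders where id = :order_id;
The price is keeping the copy in sync when the source value changes, which is exactly the part people call a sin.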
torginus · 7h ago
Are you sure about this?
Let's say you run a webshop and have two tables, one for orders with 5 fields, one for customers, with 20 fields.
Let's say you have 10k customers, and 1m orders.
A query performing a full join on this and returning all the data would result in 25 million fields transmitted (1m rows × 25 fields), while two separate queries and a client-side manual join would be just 5m fields for the orders and 200k for the customers.
bambax · 31m ago
One way to think about 1-to-many relationships is to think in the other way, "many-to-one". You don't join the orders to the customers, you join the customers to the orders (enrich the orders with customer information).
It's very natural to want customer information when querying an order, and if you have a view like orders_with_customer_info, you get that with zero effort when querying that view by order id.
You also get consolidated data (orders by customer) by doing
select customer_id, count(*), sum(amount) from orders_with_customer_info group by customer_id
which I think is pretty straightforward.
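The view itself would be something like this (a sketch; it assumes orders.customer_id references customers.id, and the column names are hypothetical):
create view orders_with_customer_info as
select o.*, c.name as customer_name, c.country as customer_country
from orders o
join customers c on c.id = o.customer_id;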
jameshart · 7h ago
If you need all the orders and all the customers sure.
But usually you need some of the orders and you need the customer info associated with them. Often the set of orders you’re interested in might even be filtered by attributes of the customers they belong to.
The decision of whether to normalize the results of a database query into separate sets of orders and customers, or to return a single joined dataset of orders with customer data attached, is completely orthogonal to the decision of whether to join data in the database.
digitalPhonix · 7h ago
What sort of application is regularly doing a query for “all data”?
aembleton · 4h ago
Client report generation.
tremon · 3h ago
1) As soon as reporting requirements get serious, you build a data warehouse. Because odds are, the client will want to combine data from multiple systems in their reports anyway. If not today, then they will tomorrow.
2) Such reports never need all the data; it's mostly about top-N volume queries or month-over-month performance data. When a reporting application does query all the data, it's because it's building its own data warehouse, so the query usually happens only once per day, at a specific time, which means the load is entirely predictable.
nicoburns · 6h ago
These days you can use JSON aggregation in the database to avoid returning duplicate data in what would otherwise be large joins.
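For instance (a Postgres sketch with hypothetical orders/order_items tables), each parent row comes back once, with its children folded into a JSON array instead of multiplying the rows:
select o.id, o.total,
       jsonb_agg(to_jsonb(i) order by i.id) as items
from orders o
join order_items i on i.order_id = o.id
group by o.id, o.total;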
dondraper36 · 7h ago
What I particularly like about the comments in this thread is how it proves that everything is a trade-off :)
valiant55 · 6h ago
My rule of thumb is if it's a 1:1 relationship, use a join. If it's 1:M, separate queries.
quietbritishjim · 9h ago
I think it's ok to have this rule as a first approximation, but like all design rules you should understand it well enough to know when to break it.
I worked on an application which joined across lots of tables, which made a few dozen records balloon to many thousands of result rows, with huge redundancy in the results. Think of something like a single conceptual result having details A, B, C from one table, X, Y from another table, and 1, 2, 3 from another table. Instead of having 8 result rows (or 9 if you include the top level one from the main table) you have 18 (AX1, AX2, AX3, AY1, ...). It gets exponentially worse with more tables.
We moved to separate queries for the different tables. Importantly, we were able to filter them all on the same condition, so we were not making multiple queries to child tables when there were lots of top-level results.
The result was much faster because the extra network overhead was overshadowed by the saving in query processing and quantity of data returned. And the application code was actually simpler, because it was a pain to pick out unique child results from the big JOIN. It was literally a win in every respect with no downsides.
(Later, we just stuffed all the data into a single JSONB in a single table, which was even better. But even that is an example of breaking the old normalisation rule.)
nicoburns · 6h ago
If you use CTEs and json_agg then you can combine your separate queries into one query without redundant data.
wongarsu · 8h ago
That reminds me of many cases of adhering to database normalisation rules even in views and queries, even in cases where you should break them. Aggregation functions like Postgres's array_agg and jsonb_agg are incredibly powerful at preventing the number of rows from ballooning in situations like those.
9rx · 9h ago
> which made a few dozen records balloon to many thousands of result rows
That doesn't really sound like a place where data is actually conceptually joined. I expect, as it is something commonly attempted, that you were abusing joins to try and work around the n+1 problem. As a corollary to the above, you also shouldn't de-join in application code.
kccqzy · 8h ago
It's a join. A join without any ON or USING clause or any filtering is a Cartesian product which is what's happening here.
magicalhippo · 7h ago
I think it's more like: avoid doing a "limiting" join in the application, i.e. where the join is used to limit the output to a subset or similar.
As a somewhat contrived example, since I just got out of bed: if your software has a function that needs all the invoice items from this year's invoices whose invoice address country is a given value, use a join rather than loading all invoices, invoice addresses and invoice items and performing the filtering on the client side.
Though as you point out, if you just need to load a given record along with details, prefer fetching detail rows independently instead of making a Cartesian behemoth.
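In SQL that's roughly (hypothetical invoice tables and columns, just a sketch):
select ii.*
from invoice_items ii
join invoices inv on inv.id = ii.invoice_id
join invoice_addresses addr on addr.id = inv.address_id
where addr.country = :country
  and inv.issued_at >= date_trunc('year', now());
Pulling all three tables to the client and filtering there would move far more data than the handful of rows you actually want.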
victorbjorklund · 6h ago
Not sure I agree. First of all, it can be more performant. Say you fetch 1000 records, and you need to join on a table where those 1000 records reference only 2 distinct foreign keys. Instead of joining in the db and fetching a lot more data, we can do two queries and join in the app instead. Secondly, it makes it easier to cache data. Let's say the thing we join with almost never changes (like some country info); we can cache that and just join it with the data from the db.
Not saying this should always be the case, but sometimes it is the right call.
teraflop · 5h ago
But as a counterpoint to that, (a) the database has its own caching built in, which you don't have to implement, and (b) the database knows when to invalidate its cache.
To quote Douglas Adams: "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair."
Likewise, if you cache a piece of data in your application because you assume that it won't change, that just makes it likely that if and when it does change, you'll have bugs. Moving the cache to the database layer so that it can be properly invalidated fixes this.
It's true that an application-side join can still be more performant if the DB cache isn't good enough, but IMO you should only take that step after actually profiling your queries.
Ozzie_osman · 5h ago
There are definitely examples of when you want to do joins in the application.
For example, you may want to (or have the option to) vertically partition your database, or use different data stores. The app layer is usually stateless and can scale perpetually, but the database might be a bottleneck.
Joining in the database over the application is a great default. But I wouldn't say "never join in the application code".
bencornia · 2h ago
Views are great. Stored procedures are cursed.
hk1337 · 8h ago
You should be careful with how much you lean into "doing it in the database", as well as with how you implement it. Otherwise, you end up in the situation where your application inserts one value and it gets saved as something completely different.
mattmanser · 7h ago
I'm not sure if this is what you mean, but I think a big thing missing from the article is how you should isolate your business logic.
A great software design will separate all business logic into its own layer. That might be a distinct project, module, or namespace, depending on what your language supports. Keep business logic out of SQL and out of web server code (controllers, web helpers, middleware, etc.).
Then you're treating SQL as the data store it is designed to be. When you embed application logic in SQL, you're hiding core functionality in a place where most developers won't expect to find it. This approach also creates tight coupling between your application and your database provider, making it hard to switch as needs change/the application grows.
dondraper36 · 7h ago
That also depends on what you would consider "business logic in the database".
What would you say about CHECK constraints, though? I don't think they're something that would surprise many developers, and having these checks is very convenient.
I know that there are even opponents of foreign keys (which makes sense sometimes), but in general, I don't understand why I would ever throw away the nice features of Postgres that can enforce correctness.
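For example (a sketch, hypothetical table and column names), this kind of thing is hard to get wrong and documents itself:
create table orders (
  id bigint generated always as identity primary key,
  customer_id bigint not null references customers (id),
  status text not null check (status in ('pending', 'paid', 'cancelled')),
  total numeric not null check (total >= 0)
);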
hk1337 · 2h ago
Pretty much yeah. I think you need to stay somewhat vigilant because it can creep in under the guise of validating data.
Triggers are something I was thinking of that are grossly misused.
tossandthrow · 9h ago
Views make good sense when you can check them in - and DB migrations are a poor way of doing it due to their immutable nature.
Depending on the ecosystem the code base adopts, a good ORM might be a better choice for doing joins.
scarface_74 · 6h ago
I have a very strict rule against using stored procedures in any system I design or am responsible for.
When I used to interview to be a developer at a company, it was always an automatic no for me if the company kept business logic in stored procedures and had a separate team of “database developers”.
As far as not doing joins in code goes, I agree for the most part, though GitHub itself has a rule against using SQL to join tables that belong to different domains.
I feel like the biggest question to ask is: how expensive is it exactly, how often do you need to do it, and how important is the speed of it?
If you have some complex queries on every page load with a huge number of users, put it in the DB as much as possible.
If you need to iterate over a bunch of records and do something based on some combination of values, and it's for a weekly reporting thing, I'd much rather see 3 nested foreach loops with a lot of early exits to skip the things you don't care about than a multi-kB SQL statement that took two days to develop and that nobody ever dares to touch again because it's hard to handle.
nurettin · 9h ago
For me, it is ORM -> schema-bound views -> views -> table functions -> stored procedures as a last resort (hopefully it doesn't come to that).
CafeRacer · 9h ago
I came here to say exactly the opposite thing. There were a few instances where a relatively heavy join would not perform well, no matter what I tried, and it was faster to load and stitch the data together with goroutines. So I just opted to do it that way.
Also, SQL is easy, but figuring out what's up with indexes and the planner is not.
nvarsj · 7h ago
> Paradoxically, good design is self-effacing: bad design is often more impressive than good.
Rings very true. Engineers are rated based on the "complexity" of the work they do. This system seems to encourage over-engineered solutions to all problems.
I don't think there is enough appreciation for KISS - which I first learned about as an undergrad 20 years ago.
anal_reactor · 6h ago
This is unfortunately true. People love complex solutions, and suggesting a simple one usually comes across as incompetent, while the reality is, simple solutions are easy to manage, which ensures the success of the project as a whole.
Sure, there are problems that are inherently complex and require complex solutions. But most likely yours isn't one of them, most likely you have a basic web app.
chrisweekly · 3h ago
One of the smartest engineers I've encountered in my 27 year career advised me to strive to do "the simplest thing that could possibly work" - not just to get unblocked on something new, but as a guiding principle. It resonated (and goes beyond "KISS", for me), and IME is real wisdom.
jdlshore · 1h ago
That’s a slogan from Extreme Programming! Coined by Ron Jeffries, I think, along with YAGNI (You Aren’t Gonna Need It), as a way of reminding people not to overengineer for features that are only in the plan.
msiyer · 5h ago
> Avoid having five different services all write to the same table. Instead, have four of them send API requests (or emit events) to the first service, and keep the writing logic in that one service.
The ideal solution: Avoid having five different services all write to the same table.
If five different services have to write to the same table, there is a major overlap of logic too. Are the five services really different, or would one suffice?
Taking practical realities into consideration, we can do what the author says. However, we risk implementing a lot of orchestration logic and introducing a whole new layer of problems. Is that time not better spent refactoring the services: either give them their own DB tables or merge them into one service?
KronisLV · 9h ago
> Schema design should be flexible, because once you have thousands or millions of records, it can be an enormous pain to change the schema. However, if you make it too flexible (e.g. by sticking everything in a “value” JSON column, or using “keys” and “values” tables to track arbitrary data) you load a ton of complexity into the application code (and likely buy some very awkward performance constraints). Drawing the line here is a judgment call and depends on specifics, but in general I aim to have my tables be human-readable: you should be able to go through the database schema and get a rough idea of what the application is storing and why.
I’m surprised that the drawbacks of EAV or just using JSON in your relational database don’t get called out more.
I’d very much rather have like 20 tables with clear purpose than seeing that colleagues have once more created a “classifier” mechanism and are using polymorphic links (without actual foreign keys, columns like “section” and “entity_id”) and are treating it as a grab bag of stuff. One that you also need to read the application code a bunch to even hope to understand.
Whenever I see that, I want to change careers. I get that EAV has its use cases, but in most other cases fuck EAV.
It’s right up there with N+1 issues, complex dynamically generated SQL when views would suffice and also storing audit data in the same DB and it inevitably having functionality written against it, your audit data becoming a part of the business logic. Oh and also shared database instances and not having the ability to easily bootstrap your own, oh and also working with Oracle in general. And also putting things that’d be better off in the app inside of the DB and vice versa.
There are so many ways to decrease your quality of life when it comes to storing and accessing data.
dondraper36 · 9h ago
There's a great book, SQL Antipatterns by Bill Karwin, where this specific antipattern is discussed and criticized.
That said, sometimes when I realize there's no way for me to come up even with a rough schema (say, some settings object that is returned to the frontend), I use JSONB columns in Postgres. As a rule of thumb, however, if something can be normalized, it should be, since, after all, that's still a relational database despite all the JSON(B) conveniences and optimizations in Postgres.
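A sketch of that kind of hybrid (hypothetical names): keep the stable, queryable parts as real columns and push the amorphous settings blob into JSONB.
create table user_settings (
  user_id bigint primary key references users (id),
  updated_at timestamptz not null default now(),
  settings jsonb not null default '{}'::jsonb  -- free-form, frontend-shaped object
);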
quibono · 7h ago
> storing audit data in the same DB and it inevitably having functionality written against it, your audit data becoming a part of the business logic
What's the "proper" way to do this? Separate DB? Separate data store?
KronisLV · 7h ago
Typically you want your audit/log data to be immutable and kept in an append only data store.
Whether that's a typical relational DB or something more specialized (like a log shipping solution) that's up to you, but usually it would be separate from the main DB.
If you need some functionality that depends on events that have taken place, you probably want to store information about those events in the main data store (but only what's needed for that functionality, not a list of all mutations done to a table like audit data might include).
In general, it's nice to have such a clear boundary of where the business domain ends and where the aux. stuff to help you keep it running goes - your logs and audit data, analytics and metrics, tracing spans and so on.
Edit: as a critique of my own arguments here, I will admit that doing the above can introduce some complexity and that in simpler systems it might be overkill. But I've seen what happens when everything is just in one huge DB instance, where about 90% of the overall schema size is literally due to records in those audit tables and everyone is surprised why opening the "History" tab for a record takes a while (and anything else that references said history, e.g. visibility of additional records), and it's not great either.
tremon · 3h ago
A separate schema with no read permissions for the application identity is sufficient. It's not like a "separate db" makes it magically unqueryable.
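In Postgres that would be roughly (a sketch; the audit schema and app_rw role are hypothetical names):
create schema audit;
-- the application role may append audit rows but not read or modify them
grant usage on schema audit to app_rw;
grant insert on all tables in schema audit to app_rw;
revoke select, update, delete on all tables in schema audit from app_rw;
(New tables would need the same grants, or ALTER DEFAULT PRIVILEGES to apply them automatically.)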
voidhorse · 4m ago
Good system design is boring, obvious, and completely uninteresting. This is why a lot of flashy or trendy techniques end up leading to bad systems—they rope people in because of their cleverness or intellectual content, which often is interesting, but stuff that's new/intriguing/intellectually stimulating is often not what you want in a system.
A good system needs to be as easy to understand and interpret as possible. A good system design is so mind-numbingly simple that a nincompoop can understand it. The only deviations from this policy should stem from other requirements like storage, performance, etc.
maest · 1h ago
> But in most cases replication lag can be worked around with simple tricks: for instance, when you update a record but need to use it right after, you can fill in the updated details in-memory instead of immediately re-reading after a write.
I get it, but that sounds like very finicky code to get right, and a good source of hard-to-debug bugs.
hn8726 · 1h ago
What's the best resource to learn those things in practice, other than _just trying_ yourself? I'm a senior dev in another area who wants to get into backend development, but unsurprisingly has limited time to spend on learning a completely new thing.
__turbobrew__ · 40m ago
There are some books out there like “Distributed Systems” by Tanenbaum. Then there are the various company publications like “The SRE Book” by google and eng blogs.
You could also contribute to an open source project like kubernetes or postgres to get your feet wet.
Like all things, the best way to get better is to do it.
ZYbCRq22HbJ2y7 · 10h ago
> You’re supposed to store timestamps instead, and treat the presence of a timestamp as true. I do this sometimes but not always - in my view there’s some value in keeping a database schema immediately-readable.
This seems like an overly negative take on broad advice about a good pattern?
is_on => true
on_at => 1023030
Sure, that makes sense.
is_a_bear => true
a_bear_at => 12312231231
Not so much, as most bears do not become bears at some point after not being a bear.
grey-area · 9h ago
I’d see the booleans as a bad thing in almost all cases; instead of a boolean you can have a timestamp or an integer field (which can expand later).
In the is_a case, a type or kind is almost always better, as you’ll rarely have just bears even if you only start with bears, just as you rarely have just two states for a status field (say, on or off); these often expand in use to include things like suspended, deleted and asleep.
So generally I’d avoid booleans, as they tend to multiply and increase complexity, particularly when they cover mutually exclusive states like live, deleted and suspended. I have seen is_visible, is_deleted and is_suspended all on the same table (without a status) and the resulting code and queries are not pretty.
I’d use an integer rather than a timestamp to replace them though.
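I.e. instead of is_visible/is_deleted/is_suspended, something like this sketch (the numeric mapping is just an example):
-- 0 = live, 1 = suspended, 2 = deleted; room to grow later
alter table users add column status integer not null default 0;
Or, if you want the meaning visible in the schema itself, a text column with a CHECK constraint (or an enum type) gives you the same single source of truth for mutually exclusive states.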
ZYbCRq22HbJ2y7 · 9h ago
Yeah, I mean, an integer can definitely hold more data than a boolean.
If your data was simple enough, you could have an integer hold the entire meaning of a table's row, if every client understood how it was interpreted. You could do bitwise manipulations, encodings, and so on.
Sometimes it is nice to understand what the data means in the schema alone. You can do that with enums, etc.
ate_an_apple_in_may_2024
saw_an_eclipse_before_30
These are more of the sort of things I don't see needing enums, timestamps, integers...
grey-area · 42m ago
I’ve never seen anything remotely like that in a schema and it seems inappropriate anyway - those should IMO be timestamps like saw_eclipse_at not booleans. You should not encode business rules in the schema (like certain magic dates) because those business rules always change over time.
setr · 9h ago
If you take the statement at face value — essentially storing booleans in the db ever is a bad smell - then he’s correct.
Although I’m not even sure it’s broadly a good principle, even in the on_at case; if you actually care about this kind of thing, you should be storing it properly in some kind of audit table. Switching bool to timestamp is more of a weird lazy hack that probably won’t be all that useful in practice because only a random subset of data is being tracked like that (Boolean data type definitely isn’t the deciding factor on whether it’s important enough to track update time on).
The main reason it’s even suggested is probably just that it’s “free” — you can smuggle the timestamp into your bool without an extra column — and it probably saved some effort accidentally; but not because it’s a broadly complete solution to the set of problems it tries to solve for
I’ve got the same suspicion with soft-deletes — I’m fairly positive it’s useless in practice, and is just a mentally lazy solution to avoid proper auditing. Like you definitely can’t just undelete it, and it doesn’t solve for update history, so all you’re really protecting against is accidental bulk delete caught immediately? Which is half the point of your backup
maxbond · 9h ago
Audit tables are a big ask, both in terms of the programming effort to design and support them and in terms of the performance hit due to write amplification (all inserts and updates cause an additional write to an audit table). Whereas making a bool into a timestamp is free. Having timestamps on rows (including created_at and updated_at) is a real bacon saver when you've deployed a bug and corrupted some rows and need to e.g. refund orders created in a certain window.
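The sort of thing that pays off later looks like this sketch (hypothetical table, and the time window is made up):
alter table orders
  add column created_at timestamptz not null default now(),
  add column updated_at timestamptz not null default now();
-- updated_at still needs a trigger or app-side code to stay fresh on UPDATEs
-- after a bad deploy: find everything written while the bug was live
select id from orders
where created_at between '2024-06-01 10:00+00' and '2024-06-01 11:30+00';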
setr · 42m ago
> Having timestamps on rows (including created_at and updated_at) is a real bacon saver when you've deployed a bug and corrupted some rows and need to e.g. refund orders created in a certain window.
But that’s my point. You’re making an active decision to record timestamps on important events (and no bool was being converted here); bool —> timestamp everywhere is not the same thing — the bool data type is not a useful signal for whether this change needs to be timestamp-tracked.
Either think and choose to track these particular changes or not, and drop in the appropriate tracking. The mindless Bool—>timestamp change is only ever suggested because it’s a “why not”, not because it’s a good practice that often leads to good things.
The audit table is of course just deciding “every change ever is important”
mrkeen · 7h ago
Audit tables are a dumb concept because they imply bolting on an actual source of truth in addition to the regular not-so-source-of-truth tables, and only if the programmer gets around to it (like documentation or logging or whatever else falls by the wayside).
tremon · 3h ago
This doesn't make sense to me-- if the regular tables don't capture the true state, then the audit tables based on them will not magically become a source of truth either.
RaftPeople · 2h ago
> Audit tables are a dumb concept because they imply bolting on an actual source of truth in addition to the regular not so source of truth tables,
The regular table is the source of truth, the audit table is just a historical record of what changed and when.
valenterry · 8h ago
This. The mere fact that it's much easier to find deleted/impacted entities is worth it.
moebrowne · 9h ago
It's well documented that soft delete is more of a headache than it's worth
Though why treat booleans as a special case and keep timestamps for them when you don’t for integers, with this pattern:
isDarkTheme: {timestamped}
paginationItems: 50
I can see when dark theme was activated, but not when pagination was set to 50.
Also, I can’t see when dark theme was deactivated either.
Seems like a poor man’s changelog. There may be use cases for it, but I can’t think of any tbh.
oftenwrong · 7h ago
A boolean is smaller, which is a relevant consideration for some workloads. For example, you may be pre-aggregating a large amount of data to serve a set of analytical queries which do not care about the associated timestamp. The smaller data type is more efficient both in storage and in query execution.
Additionally, there are situations where it is logical to store a boolean. For example, if the boolean denotes an outcome:
process_executed_at timestamp not null
process_succeeded boolean not null
maxbond · 4h ago
It's unlikely the boolean will result in better utilization; the savings will probably be consumed by padding. Most people don't know how to use structure packing to create a row which is actually smaller after it's been padded (though it's not very hard; anyone could learn). Columns are generally ordered by which features were shipped first, not by alignment (as is necessary to minimize padding).
I do try my best to pack my columns, but it's a fragile and likely premature optimization. Better to opt for something defensive at a cost of like, 7 bytes per row (in Postgres).
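The packing idea, as a Postgres sketch (hypothetical tables): put the 8-byte columns first, then 4-byte, then the 1-byte booleans, so less space is lost to alignment padding.
-- boolean first forces ~7 bytes of padding before the 8-byte-aligned columns
create table events_padded (
  is_processed boolean,
  occurred_at  timestamptz,
  account_id   bigint,
  retry_count  integer
);
-- same columns ordered wide-to-narrow pack with no padding between them
create table events_packed (
  occurred_at  timestamptz,
  account_id   bigint,
  retry_count  integer,
  is_processed boolean
);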
seafoamteal · 9h ago
I think in that situation, you could have an enum value that contains Bear and whatever other categories you are looking at.
ZYbCRq22HbJ2y7 · 9h ago
Sure, but this was for demonstration purposes showing that some data has other meaning that doesn't have an instantiation state dependent on time.
Lionga · 9h ago
All this general advice is quite useless and needs millions of asterisks.
Good system design is designing a system that works best for the problem at hand.
oftenwrong · 6h ago
Good system design is designing a system that works good.
FrankChalmers · 8h ago
That's even more general and requires another million asterisks.
urquhartfe · 8h ago
This is an utterly fatuous statement
gmm1990 · 2h ago
I seem to gravitate towards NoSQL-type databases; defining tables in a DDL and then again in the code seems repetitive and slows down changes. The idea would be that the code is what defines the table. It'd be nice, though, to hear some of the drawbacks of this. Maybe for very relational things it makes sense to be able to write join queries so data isn't completely repeated, but my understanding is that most database engines would already compress that repeated info pretty well.
com · 10h ago
The advice about logging and metrics was good.
I had been nodding away about state and push/pull, but this section grabbed my attention, since I’ve never seen it so clearly articulated before.
dondraper36 · 10h ago
The logging part is spot on. It has happened so many times: I think, "Oh, I wish I had logged this," and then, facing an issue or even an incident, I introduce those logs anyway.
bravesoul2 · 10h ago
It is a balance. Too many logs cost money and slow down log searches both for the search and the human seeing 100 things on the same trace.
jillesvangurp · 10h ago
The trick here is to log aggressively and then filter aggressively. Logs only get costly if you keep them endlessly. Receiving them isn't that expensive. And keeping them for a short while won't break the bank either. But having logs pile up by the tens of GB every day gets costly pretty quickly. Having aggressive filtering means you don't have that problem. And when you need the logs, temporarily changing the filters is a lot easier than adding a lot of ad hoc logging back into the system and deploying that.
Same with metrics. Mostly they don't matter. But when they do, it's nice if it's there.
Basically, logging is the easy and cheap part of observability, it's the ability to filter and search that makes it useful. A lot of systems get that wrong.
bravesoul2 · 10h ago
Nice. I'm going to read up more about filtering.
dondraper36 · 10h ago
Yeah, absolutely. But the author's idea of logging all major business logic decisions (that users might question later) sounds reasonable.
bravesoul2 · 10h ago
Yes. I like the idea of assertions too. Log when an assertion fails. Then get notified to investigate.
bravesoul2 · 10h ago
Yes. Everyone should spend the small amount of time it takes to get some logging/metrics going. It's like tests: getting from 0 to 1 test is psychologically hard in an org, but from 1 to 1000 it becomes "how did I live without this". Grafana has a decent free tier, or you can self-host.
bravesoul2 · 10h ago
He doesn't seem to mention Conway's law or team topology, which is an important part of system design too.
dondraper36 · 10h ago
Well, as sad as it is, such advice is often applicable to new projects when you still have runway for your own decisions.
For mostly political reasons, if you are onboarded to a team with a billion microservices and a lot of fanciness, it's unlikely that you will ever get approval or time to introduce simplicity. Or maybe I just got corrupted myself by the reality where I have to work now.
bravesoul2 · 10h ago
There is definitely a wood for the trees issue at bigger companies. I doubt there is an architect who understands the full system to see how to simplify it. Hard to even know what "simpler" looks like.
jillesvangurp · 9h ago
You should adapt your team to the architecture, not the other way around.
My former Ph.D. supervisor who moonlights as a consultant on this topic uses a nice acronym to capture this: BAPO. Business, Architecture, Process, and Organization. The idea is to end up with optimal business, an optimal architecture & design for that business, the minimum of manual processes that are necessitated by that architecture, and an organization that is efficiently executing those processes. So, you should design and engineer in that order.
Most companies do this in reverse and then end up limiting their business with an architecture that matches whatever processes their org chart necessitated years ago, in a way that doesn't make any logical sense whatsoever except in the historical context of the org chart. If you come in as a consultant to fix such a situation, it helps to understand that whatever you are going to find is probably wrong for this reason. I've been in the situation where I come in to fix a technical issue and immediately see that the only reason the problem exists is that the org chart is bullshit. That can be a bit awkward, but lucrative if you deal with it correctly. It helps to ask the right questions before you get started.
Turning that around means you start from the business end (where's the money coming from?, what value can we create?, etc.), finding a solution that delivers that and then figure out processes and organizational needs. Many companies start out fairly optimal and then stuff around them changes and they forget to adapt to that.
Having micro services because you have a certain team structure is a classic mistake here. You just codified your organizational inefficiency. Before you even delivered any business value. And now your organizational latency has network latency to match that. Probably for no good reason other than that team A can't be trusted to work with team B. And even if it's optimal now, is it going to stay optimal?
If you are going to break stuff into (micro) services, do so for valid business/technical reasons. E.g. processing close to your data is cheaper, caching for efficiency means stuff is faster and cheaper, physically locating chunks of your system close to the customer means less latency, etc. But introducing network latency just because team A can't work with team B, is fundamentally stupid. Why do you even have those teams? What are those people doing? Why?
ChrisMarshallNY · 9h ago
A lot of what I have done is design subsystems - components meant to be integrated into larger structures. I tend to take a modular approach (not microservices - modules).
The acronym I use is “S.Q.U.I.D”[0] (Simplicity, Quality, Unambiguity, Integrity, Documentation).
But most of the stuff I’ve done, is different from what he’s written about, so it’s probably not relevant, here.
> My former Ph.D. supervisor who moonlights as a consultant on this topic uses a nice acronym to capture this: BAPO. Business, Architecture, Process, and Organization.
It's funny and not surprising that this came from an academic/consultant combo.
Business/mission/product should come first, absolutely.
But prematurely designing your system architecture and automations before your processes emerge is a sure-fire way to overengineer and waste everyone's time, except the consultant you are paying by the hour.
tra3 · 59m ago
Lots of good advice that I could’ve used 20 years ago. And like all good advice I would’ve ignored it back then.
hks0 · 5h ago
The article starts by criticizing generic rules that come without any context:
> Even good system design advice can be kind of bad. I love Designing Data-Intensive Applications, but I don’t think it’s particularly useful for most system design problems engineers will run into.
But it continues to do the same throughout the rest of its advice. It also says:
> ... Drawing the line here is a judgment call and depends on specifics,
And immediately mentions:
> but in general I aim to have my tables be human-readable ...
Which to me reads as "I'm going to ignore the differences in context everywhere and instead apply mine to everyone, and I'm going to assume most of the world faces the same problems as me". It's even worse than the book being criticized in the beginning, as the book at least has "Data-Intensive" in its title.
This is quite easily fixable. The author could describe the typical scenario they work with on a day-to-day basis. Do they work with 10 users a day? 100? 10,000,000? What is the traffic? How many engineers? What's the situation of the team/company; do FIXMEs turn into fixes, or do they become "it's a feature"? And so on.
In the end, without setting a baseline, a lot of engineers will start pointing fingers at each other, dismissing opposing ideas because they don't fit their situation. The reasoning might be true, but before that it is "irrelevant", and hence so is any opposition to or defense of it.
lutzh · 5h ago
The only thing I know about “good system design” is that it doesn’t exist in the abstract. Asking whether an architecture is good or bad is the wrong question. The real question is: Is it fit for purpose? Does it help you achieve what you actually need to achieve?
I could nitpick individual points in the article, but that misses the bigger issue: the premise is off.
Don’t chase generic advice about good or bad design. First understand your requirements, then design a system that meets them.
msiyer · 5h ago
... that is how you achieve a good design (for the time being).
tetha · 9h ago
The distinction between stateful and stateless is one of the main criteria for how we divide responsibilities between platform-infra and development.
I know it's a bit untrue, but you can't do that many things wrong with a stateless application running in a container. And often the answer is "kill it and deploy it again". As long as you don't shred your dataset with a bad migration or some bad database code, most bad things at this level can be fixed in a few minutes with a few redeployments.
I'm fine having a larger amount of people with a varying degree of experience, time for this, care and diligence working here.
With a persistence layer like a database or a file store, you need some degree of experience with what you have to do around the system so it doesn't become a business risk. Put plainly, a database could be a massive business risk even if it is working perfectly... because no one set up backups.
That's why our storages are run by dedicated people who have been doing this for years and years. A bad database loss easily sinks ships.
mrkeen · 7h ago
> but you can't do that many things wrong with a stateless application running in a container
> As long as you don't shred your dataset with a bad migration or some bad database code, most bad things at this level can be fixed in a few minutes with a few redeployments.
At some point between these statements you switched from stateless to stateful and I can't follow the rest of the argument.
tetha · 2h ago
If you mess up your application code in a stateless container, that's boring. Roll code back, and you're back where you want to be. This is stateless and easy.
If you introduce a migration like "UPDATE billing SET prices = 0 WHERE something < 5", that's an entirely valid migration, but you mess up your state and then everyone is in a world of pain. This could, however, still be caught by various code review strategies, incremental rollouts and a large number of good development practices.
This is still easy, you can catch it before it hits prod so you don't have to fix prod.
And prod could still be fixed if your database layer manages backups, just with a day or two of downtime. If you don't have backups, you may have permanently lost information, which could kill the company.
feyman_r · 4h ago
If you want to learn more about good system design at an abstract level (not just online), I cannot recommend Systemantics[1] by John Gall enough. I wish all engineers got an opportunity to read it.
I enjoyed reading this book (it's a short one), even though the prose is very, well, special :)
0wis · 2h ago
It is exactly what makes the difference between good and bad experience, both for users and engineers. A well designed system is both easy to use and to maintain or improve. It looks simple, but it is not. It’s both leadership and craftsmanship at its peak.
pelagicAustral · 8h ago
I can definitely feel the "underwhelming" factor. I've been working for 10+ years on government software, and I really know what an underwhelming codebase looks like: first off, it has my fucking name on it.
agentultra · 4h ago
Great article. A lot of very standard practices. Or at least… should be.
One thing that I often add is the people interacting with the system. They’re a part of it too. Most people don’t operate in an atomically consistent world; a lot of business processes are eventually consistent. But you do need to know where you have to have atomic operations! It depends on where the user expects it.
Systems thinking is very useful. From how your software is deployed to how the people using it in their work. Always be thinking about these things.
jagged-chisel · 2h ago
Lots of comments here decrying unnecessary complexity and the depressing reality of job interviews around the subject.
And I’m wondering why investors tolerate the expense: it’s surprising how much you can get done with simplicity and a small focused team.
It must be something about perception when they’re ready to sell the company.
ramon156 · 3h ago
> I’m often alone on this. Engineers look at complex systems with many interesting parts and think “wow, a lot of system design is happening here!”
Whenever I read something like this I feel so confused. Who actually calls themselves an engineer when they have no idea what they're talking about. Ignorant confidence is such a useless personality trait.
mrkeen · 1h ago
Hackers can add weight to an aircraft as fast as - if not faster than - formally trained engineers.
As long as we keep measuring LOCs or features added, there will always be jobs for them.
dondraper36 · 3h ago
Unless it's encouraged by the modern technical interviewing culture, which it partly is.
codr7 · 5h ago
Replacing booleans with timestamps might be a good idea sometimes, but presenting it as The Solution isn't very constructive imo.
Adding a separate table where the presence of a record means 'true' allows recording related state without complicating the main table.
And sometimes a boolean is exactly what you want.
wavemode · 5h ago
the article presents that as an example of bad advice
firesteelrain · 2h ago
As a system architect, software engineer and systems engineer, I see these posts, and what is called system design here seems to intermix systems design with software design (the software described being a lower-level component of the overall system).
StevenWaterman · 2m ago
I'm glad I'm not the only one thinking that. These are such minutiae. Where's the discussion about humans? They're probably the most important part of your system, and the most chaotic, and the part that needs the most careful design.
It's hinted at a little bit in the OP, with:
> What does good system design look like? I’ve written before that it looks underwhelming
This is because there are humans in your system! Other developers! You in the future! You have to resort to heuristics like "simple == good" because you're only looking at a small part of the whole system.
And zoom out even more, you get to the actual users. How do they interact with the system? If you implement a rate limiter, how do the users respond when they hit it? Do they just spam-refresh the page? Do they develop weird superstitions about it? Do they spam-call your phone support lines? Does your response to a thundering herd anticipate the second-order impact of your phone support lines being DDOSed?
magnio · 10h ago
I think it's a very good article. Even if you disagree with some of the individual points in it, the advice given is very concrete, pragmatic, and IMO tunable to the specifics of each project.
On state: in my current project, it is not statefulness that causes trouble, but needing to synchronize two stateful systems. Every time there's bidirectional information flow, it's going to be a headache. The solution is of course to maintain a single source of truth, but with UI applications this is sometimes quite tricky.
klabb3 · 7h ago
Yes it’s total madness to synchronize and replicate complex state that logically belongs together. This is why microservices are such hot garbage. Well, how people tend to use them anyway.
Monotonic state is also better than mutable state. If you must distribute state, think ownership. Who owns it? E.g. there's nothing necessarily wrong with having state owned by, say, a mobile client, where it can be adjusted by the user. Then you can sync it to the backend if you want, but the backend is only a reader/listener and should never try to control it directly.
vishnugupta · 5h ago
I highly recommend Boring Technology[1]. It is an enjoyable read and most of the advice is actionable.
I agree with most of the stuff written in the article (quite a rare thing, I must admit :)). But one thing I'd say is a bit outdated: in general, whether or not to read from a replica is the same kind of decision as whether or not to use caching: it's a (pretty significant) tradeoff. Previously you didn't have much of a choice, since hardware was quite limited. Now, however, you can have literally hundreds of CPU cores, so all those CPUs can very much be kept busy doing reads. Writes obviously do have an overhead, _but_ note that all writes are eventually serialised, _and_ the replica needs to handle them as well anyway.
thisbeensaid · 7h ago
Since the author praises proper use of databases and talks about event bus, background jobs and caching, I highly recommend to check out https://dbos.dev if you have Python or TypeScript backends. DBOS nicely solves common challenges in simple and complex systems and can eliminate the need for running separate services such as Kafka, Redis or Celery. The best: DBOS can be used as a dependency and doesn't require deploying a separate service.
>You have two options: fail open and let the request through, or fail closed and block the request with a 429.
If the metaphor of a software circuit breaker is meant to emulate an electrical circuit breaker, then it seems to me that these two are inverted. Whenever a physical circuit breaker is open, it is not dangerous and not passing current.
necessary · 3h ago
Excellent article. In this vein, are there any books, articles, or other media that we can learn more of these sorts of principles from?
gethly · 7h ago
Actually, event sourcing solves most of the pains - events, schema, push/pull, caching, distribution... whatever. The downside is that it is definitely not suitable for small projects, and the overhead is substantial (especially during the development stage, when you want to ship the product as soon as possible). On the other hand, once you get it going, it's an unstoppable beast.
mexicocitinluez · 6h ago
There are some tools that try to solve this (MartenDb, for instance), but I wish there was an easier way to integrate a system where some parts use ES and some parts don't.
Almost all the tools I've seen are either fully event-sourced or have nothing to do with event-sourcing. There aren't a ton of in-betweens.
gethly · 4h ago
ES is at the core of the system where it is being used, so there really is no in-between option here. I've built two production ES systems, and I was toying with a few ideas to make it more DX-friendly by using JSON as the core of any entity/object and using JSON Patch for events, but in the end it made no sense, because ES must be absolutely strict on schema (which evolves over time) and data types. You must be able to process events that might be a decade old, for objects that no longer exist. There is no wiggle room. Hence the aforementioned overhead. I do not know MartenDb, but in essence every db today uses ES internally, as that is how transactions work, except the event log is discarded after the commit. But either way, ES at the db level is meaningless, except maybe for using it as an actual log for auditing purposes, and even then you won't be able to process it by any means, as the schema changes over time and the db schema has little to do with the application itself anyway.
bubblebeard · 7h ago
Very good article, right on point!
I do wonder why the author left out testing, documentation and QA tool design, though. To my mind, writing a proper phpcs config or whatever, to ensure everyone on the team writes code in a consistent way, is crucial. Without documentation we end up forgetting why we did certain things. And without tests, refactors are a nightmare.
dondraper36 · 7h ago
Especially given that generating documentation and tests (of course, with manual revision) is so much faster with, say, Claude Code.
lysecret · 4h ago
The Designing Data-Intensive Applications we actually needed (nothing against the original, but this one is definitely more practical).
rekabis · 48m ago
> But in most cases replication lag can be worked around with simple tricks: for instance, when you update a record but need to use it right after, you can fill in the updated details in-memory instead of immediately re-reading after a write.
I found myself truly confused by this one - does this actually need stating? Do people actually re-read immediately after a write? Provided you got confirmation that a write was successful and the data doesn’t have anything that an SQL trigger would change, what would be the point of an immediate read instead of just using the DB “successfully written” response as a go-ahead to just update the in-memory data?
StevenWaterman · 7h ago
What do you call system design, when it's referring to the design of systems in general, and not just computer services?
As in:
- writing a constitution
- designing API for good DX
- improving corporate culture
I intuitively want to call all of those system design, because they're all systems in the literal sense. But it seems like everyone else uses "system design" to mean distributed computer service design.
Any ideas what word or phrase I could use to mean "applying systems thinking to systems that include humans"?
This post has some good concepts, but I don't feel it helps you design good systems. It enumerates options and primitives, but good design is about when and how you apply them, which the post does not cover.
dondraper36 · 9h ago
But isn't that type of advice the best we can have? Having read Designing Data Intensive Applications (DDIA) and some system design interview-focused books (like those from Alex Xu), I have noticed two types of resources:
* Fundamental books/courses on distributed systems that will help you understand the internals of most distributed systems and algorithms (DDIA is here, even though it's not even the most theoretical treatment)
* Hand-wavy cookbooks that tend to oversimplify things, and (I am intentionally exaggerating here) teach to reason like "I have assumed a billion users, let's use Cassandra"
I liked the article for its focus on real systems and the sensible rules of thumb instead of another reformulation of the gossip protocol that very few engineers will ever need to apply in practice themselves.
rekabis · 1h ago
> you’re a terrible engineer if you ever store booleans in a database
Anything like this is trivially dismissible as absolute hogwash. It’s a shame that titles like this actually get the clicks needed to encourage more bullshite in the same vein.
mgaunard · 6h ago
Seems biased towards websites, which are mostly easy CRUD.
mattlondon · 7h ago
There was an article here recently about how to write good design docs: the TL;DR for that was basically your design doc should make your design seem obvious. I think that is the same conclusion here - good design is simple, straightforward design with no real surprises.
Can anyone recommend a good book about systems design?
hungryhobbit · 3h ago
This is nonsense masquerading as advice. "Add indexes ... but don't add too many" is a perfect example. It's 100% correct ... and also 100% something no one can actually change their actions based on ... which means it's also 100% worthless advice.
ninetyninenine · 3h ago
A lot of backend engineers are obsessed with infrastructure.
I've seen engineers have servers spin up lambdas to do async jobs that are just database calls.
So the server essentially waits for lambda which waits for a database. Why? Why can't you just have the server wait for the database?
It's like I'm going to pay a person to wait in line for me while I wait for him. Why? You're waiting anyway!? And you just paid to involve an additional person to unnecessarily wait with you for what?
When I told the engineer that you can just spin up a coroutine, or maybe allocate some cores before spinning up a new server, he looked at me like I was crazy. He said I was doing things so low level it was like assembly language programming, that I was going too low level, and that lambdas were so cheap it was inconsequential.
If you're reading this and you're thinking, wow that other engineer is right, well this quote from the article refers to you:
"I’m often alone on this. Engineers look at complex systems with many interesting parts and think “wow, a lot of system design is happening here!” In fact, a complex system usually reflects an absence of good design."
usernamed7 · 9h ago
One thing I would add is that a well-designed system is often one that is optimized for change. It is rare that a service remains static and unchanging; browsers and libraries are regularly updated, after all. Thus, if/when a developer takes on a feature ticket to add or change XYZ, it should be easy to reason about how that change will impact the system, the side effects should be predictable, and ideally the change itself should be easy to make as well.
bbkane · 6h ago
"optimized for change" really only works well if you can predict the incoming changes.
Common tools used for this "optimization" often raise the complexity and lower the performance of the system.
For example, a db with a single table with just a key and a value is very flexible and "optimized for change" but it offers lower performance (in most cases) and is harder to reason about.
I also frequently see people (me too) prematurely make abstractions (interfaces, extra tables, etc) because they're "optimizing for change". Then that part never changes OR it changes in a way that their abstraction doesn't abstract over OR they figure out a better abstraction later on when the app has matured a bit. Then that part of the code is at best wasted space (usually it needs to be rewritten yet no one gets time to do that).
Of course, it's also foolish to say "never abstract". I almost always find it worth it to abstract over I/O, just so I can easily add logging, dual writes, or mock it. And when a change is obviously coming down the line it makes sense to plan for it.
But usually I'm served best by trying to keep most of my computation pure functions (easy to test), doing as little as possible in the I/O path (it should just persist or print stuff so I can mock it) and otherwise write obvious "deletable" code that does one thing so I can debug it and, only if necessary, replace with a better abstraction if I need to.
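A small sketch of that shape (all names invented): the computation is a pure function that's trivial to test, and the I/O step is dumb enough to swap for a mock or a print.

```python
from typing import Callable, Iterable

def total_owed(amounts_cents: Iterable[int], tax_rate: float) -> int:
    # Pure: same inputs, same output, nothing to mock.
    return round(sum(amounts_cents) * (1 + tax_rate))

def persist(value: int, write: Callable[[str], None]) -> None:
    # The I/O path does as little as possible: format and hand off.
    write(f"total_owed={value}")

assert total_owed([1000, 250], 0.1) == 1375   # easy to test
persist(total_owed([1000, 250], 0.1), print)  # `write` could be a DB call instead
```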
ninetyninenine · 3h ago
>"optimized for change" really only works well if you can predict the incoming changes.
functional programming is the paradigm most optimized for modularity and therefore change. It's the best we have but it's limited in scope.
mrkeen · 7h ago
And you get this part somewhat for free if you're actually testing as you're going.
When my service wants to store and retrieve as part of its behaviour, of course I'm going to back it with a hashmap first.
Once I know it fulfills its business logic I'll start fiddling with hard-to-change stuff like DB schemas and migrations.
And having finished and tested the logic, I'll have a much better idea of the actual access patterns so I can design good tables & indexes.
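One possible shape of "hashmap first, schema later" (the protocol and names here are illustrative, not prescriptive): the business logic depends on a tiny storage interface that an in-memory dict satisfies until the real tables exist.

```python
from typing import Optional, Protocol

class OrderStore(Protocol):
    def save(self, order_id: str, payload: dict) -> None: ...
    def load(self, order_id: str) -> Optional[dict]: ...

class InMemoryOrderStore:
    """Backs the service with a plain dict; swapped for a real DB later."""
    def __init__(self) -> None:
        self._data: dict = {}
    def save(self, order_id: str, payload: dict) -> None:
        self._data[order_id] = payload
    def load(self, order_id: str) -> Optional[dict]:
        return self._data.get(order_id)

def place_order(store: OrderStore, order_id: str, items: list) -> dict:
    order = {"id": order_id, "items": items, "status": "placed"}
    store.save(order_id, order)
    return order

store = InMemoryOrderStore()
place_order(store, "o-1", ["widget"])
assert store.load("o-1")["status"] == "placed"
```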
paintistjksgf · 8h ago
While I think one shouldn't paint themselves into a complete corner, optimizing for changeability signals a lot of abstract complexity to me, which takes time, which in turn costs money.
We can usually get more good out of many cheaper, dedicated services that each do one thing than out of a single service that grows to become more and more omnipotent. It also means you're much more likely to win contracts because you can price yourself competitively.
setnone · 7h ago
Some systems are designed to last vs. designed to adapt
whodidntante · 7h ago
Never write an article about good system design.
In all seriousness, this is an extraordinarily subtle and complex area, and there are few rules.
For example, "if you need data from multiple tables, JOIN them instead of making separate queries and stitching them together in-memory" may be useful in certain circumstances. For highly scalable consumer systems, the rule of "avoid joins as much as possible" can work a lot better.
There is also no mention of how important it is to understand the business - usage patterns, the customers, the data, the scale of data, the scale of usage, security, uptime and reliability requirements, reporting requirements, etc.
graphviz · 5h ago
And by "system" we mainly meant "transactional website."
whodidntante · 3h ago
And this is the point. You need to narrow the scope to make something like this useful. Writing a paper on "Good transportation design" is kind of meaningless. Do you mean cars, trucks, boats, planes, spacecraft, scooters, fighters, tanks ? Do you mean roadways that can accommodate some subset ?
If you mean "transactional websites", and assuming you mean something like product catalogs and being able to purchase, that narrows it down quite a lot.
Or does it ?
For the majority of use cases, Craigslist, eBay, or Amazon are the best fit.
Next in number of use cases are Wix/Square/etc., where you design your own UI.
Then come all-in-one systems with UI/ORM based on Python/Ruby/etc., where you need to design your own DB schema and UI, but the "design" is already done for you.
The next step is custom-designed systems like the one the article talks about, where complete off-the-shelf is not suitable.
And then there are the highly scalable systems.
The article is perfectly fine if we are discussing custom-designed systems that don't need the very highest levels of scalability.
At that point, if you want to see them design a distributed system with all the bells and whistles, you should stop them, tell them the kind of traffic they need to handle, then let them go again.
If they persist in designing a system that cannot handle the specified load, they have probably failed the interview.
> "That's not really worth considering for this amount of QPS"
"What if Michael Jackson dies and your (search|news|celebrity gossip) service gets a spike in traffic way beyond the design parameters? How would you anticipate and mitigate such an event?"
(Extra points if the answer is not necessarily backpressure but they start talking about DDoS mitigation, outlier detection, caching or serving static results from extremely-common queries, spinning up new capacity to adjust to traffic spikes, blackholing traffic to protect the overall service, etc.)
> Interviewer: "Why wouldn't you use a queue here instead of a cron job?" "I don't think it's necessary for what this app is, but here's the tradeoffs."
"What if you have a subset of customers that demand faster responses than a cron job can provide?"
(And then that can become a discussion about splitting off traffic based on requirements, whether it's even worth adding the logic to split traffic vs. just using a queue for everyone, perhaps making direct API requests without either a queue or cron job for requests from just those customers, relying on the fact that they are not numerous or these requests are infrequent to trade capacity for latency, etc.)
> "How would you choose between sql and nosql db?"
I would've expected the candidate to at least be able to talk about indexing, tradeoffs of joining in the DB vs. in the application, schema migrations and upgrades, creating separation between data-at-rest vs. data-in-flight, etc. If they can't do that and just handwave away as "whatever the team is most comfortable with", that's a legit hole in their knowledge. Usually you ask system design interviews of senior candidates that will be deciding on architecture and, if not hiring out the team directly, providing input to senior managers who will be hiring, so you can swap out the team nearly as easily as swapping out the architecture.
Those questions are all prompts to have a discussion in lieu of tech trivia hour. Those responses do not demonstrate wisdom; they reveal a lack of maturity. It's not the interviewer's fault you refuse to be interviewed.
It’s like people crave complexity because it makes them, indispensable? Like if you’re the only one who knows how the billing reconciliation service works, they couldn’t possibly fire you?
They will.
Being pragmatic is something I look for in engineers, so long as they understand where to draw the line (and when to use a queue instead of cron). That need is usually several years away at this point, though, so being able to say "You don't need that, all you need is…" is welcome. Then again, that's probably why I got fired. :shrug:
For example, I am more of the "All you need is Postgres" kind of software engineer. But reading all those fancy blog posts on how some team at Discord works with 1 trillion messages with Cassandra and ScyllaDB makes me envious.
Also, it seems that to be hired by such employers you need to prove that you already have such experience, which is a bit of a catch-22 situation.
In other words, the developers you're envious of didn't start with Cassandra and ScyllaDB, they started with the problem of too many messages. That's not an architectural choice, that's product success.
Overengineering is more prevalent the more money a company makes, and companies that overengineer will pay good money to keep the overengineering working.
Unfortunately the rest of the executive has leaned on them so hard about AI boosting productivity that they aren't able to avoid that becoming a mess.
It's good not to over engineer, over engineering can be a cause of unneeded complexity, but when complexity is warranted the ability to solve for it simply is also needed.
More importantly though, you haven't explained or rationalized why.
It's not needed for this QPS? Oh ya? Why not? What's your magic threshold? When would it be needed? How do you plan for the team to know that time is approaching? If it's needed later how would you retrofit it? Is that going to be a simple addition? How do you know the max QPS won't be too high and that traffic won't be spiky? What if a surprise incident occurred that caused the system to overload, how would your design, without backpressure, handle that, how would you mitigate and recover?
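For readers who haven't had to build it, a bare-bones sketch of what back-pressure can look like, assuming a bounded in-process queue (sizes and timings invented): when the consumer falls behind, producers wait instead of letting work pile up without limit.

```python
import asyncio

async def producer(q: asyncio.Queue, n: int) -> None:
    for i in range(n):
        await q.put(i)    # blocks once the queue already holds `maxsize` items
    await q.put(None)     # sentinel: no more work

async def consumer(q: asyncio.Queue) -> None:
    while (item := await q.get()) is not None:
        await asyncio.sleep(0.01)  # pretend this is the slow downstream call

async def main() -> None:
    q: asyncio.Queue = asyncio.Queue(maxsize=8)  # the bound is the back-pressure
    await asyncio.gather(producer(q, 100), consumer(q))

asyncio.run(main())
```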
In system design there's no real right answer; as an interviewer you're looking for the candidate to demonstrate their ability to identify the points of concern, reason through the possibilities, explain their decisions and tradeoffs, and so on.
These ARE the answers we are looking for. As the system design interviewer (I've done hundreds of these) I want you to start with these answers; then we can layer on complexity if you've solved the problem and there's time left to go into navel-gazing mode.
Seeing the panic slowly build in mid-level engineers’ eyes as it dawns on them that not every problem can be solved by caching is pretty fun too. “Ok cool you’ve cached it there, now how do you fill the cache without running into the same performance issue?”
Do you tell people this explicitly? If so, good on you; if not, please start! I think one of the biggest problems with interviews these days is misaligned expectations, particularly interviewees coming in assuming that what's desired is immediate evidence that they're so experienced in solving FAANG-scale problems that it's their default mode.
[0] https://www.lesswrong.com/posts/koGbEwgbfst2wCbzG/i-don-t-kn...
Yes and no. I give them rough scale numbers to design for. Part of the interview is knowing why I’m telling you this.
I start the interview with “I am here in the role of PM and co-engineer so you can bounce ideas off of me and ask any questions”
Stakeholders won’t start their asks with “Please ask me questions to make sure you’re building the right thing”. Asking clarifying questions is a baseline expectation of the role
Exactly. Part of the interview is explaining when and why these techniques are necessary as part of demonstrating your understanding.
If the candidate gives non-answers like “I don’t think it matters because you’re a startup” or “I’d just use whatever database I’m comfortable with” that’s not demonstrating knowledge at all. That’s dismissing the question in a way that leaves the interviewer thinking you don’t have that knowledge, or you don’t take their problems seriously enough to put thought into them. There is a type of candidate who applies to startups because they think nothing matters and they can YOLO anything together for a few years before moving on to the next job, and those are just as bad as the super over-engineering candidates.
The interview is your chance to show you know the topics and when to apply them, not the time to argue that the startup shouldn’t care about such matters.
A good way to answer these, I think, is some version of ”We probably won’t run into these issues at the scale we’re talking about, but when we run into A, B, C problems, we can try X, Y, Z solutions.”
This shows that you’re making a conscious tradeoff and know when the more complex solutions apply. Extra points if you can explain specifically how you’ll put measures in place to know when A, B, C happened and how you would engineer the system such that adding X, Y, Z is easy.
Also it looks amazing if you’re aware that vertical scaling can buy you a lot of time for comparably little money these days. Servers get up to 128 CPUs with 64TB of RAM on one machine :)
The problem with the comment above is that it’s not discussing tradeoffs at all. It’s just jumping to conclusions and dodging any discussion of tradeoffs.
If you answer questions like that, it’s impossible to tell if the candidate is being wise or if they’re simply BSing their way around the topic and pretending to be smart about it, because both types of candidates sound the same.
It’s easy to avoid this problem by answering questions as asked and mentioning tradeoffs. Trying to dismiss questions never works in your favor.
After all, interviewing and understanding what your interviewer expects to hear is also a valuable skill (same as with your boss or client).
Sometimes the "trick" is that today's load is not tomorrow's.
> > "That's not really worth considering for this amount of QPS"
There is a good way and a bad way to communicate this in interviews.
If an interviewer is asking about back pressure, they’re prompting you to demonstrate your knowledge of back pressure and how and when it would be applied. Treating it as an opening to debate the validity of the question feels like dodging the question or attempting to be contrarian. Explaining when and where you would choose to add back pressure would be good, but then you should go on to answer the question.
This question hits close to home for me because I was once working at a small startup that was dealing with a unique problem where back pressure really was the correct way to manage one of our problems, but we had a number of candidates do exactly what you did: Scoff at the idea that such a topic would be relevant at a startup.
If we’ve been dealing with a problem for months and a candidate comes in and confidently tells us that problem isn’t something we would experience and dismisses our question, that’s not a positive signal.
> > Interviewer: "How would you choose between sql and nosql db?"
> > "Doesn't matter much. Whatever the team has most expertise in"
This is basically a softball question. Again, if you provide a non-answer or try to dismiss the question it feels like you’re either dodging the topic or trying to be contrarian. It’s also a warning sign to the interviewer that you might gravitate toward what’s easy for you instead of right for the project.
This one also resonates with me because I spent years of my life making MongoDB do things that would have been trivial if earlier developers had used something like SQLite instead. The reason they chose MongoDB? Because the team was familiar with it. It was hell to be locked into years of legacy code built around the wrong tool for the job because some early employees thought it didn’t matter “because startup”
As an interviewer, let me give some advice: If an interviewer asks a question, you should answer the question. Anything that feels like changing the subject, dodging the question, or arguing the merits of the question feels like the candidate either doesn’t understand the topic or wants to waste time by debating the question.
It can be very valuable to explain when and why a topic would become necessary, right before you explain it. Instead of “this application has low QPS and therefore I will not answer your question” (not literally what you said, but how it comes across) you could instead explain how the need for back pressure could be avoided first by scaling servers appropriately and then go on to answer the question that was asked.
Though you might be familiar with other terms that effectively mean the same thing, like counter pressure
It's the opposite, as you get older you will feel this more and more.
But keep the concept in your mind in case you have to distribute some problem. It's a central one.
Unless the initial question requirements are insane (build Twitter at Twitter scale), I start with the smallest, dumbest thing that will work. Usually that's a single machine/VM talking to a database (or even just SQLite!). Compute and storage are so fast these days that you could comfortably run your fledgling service on a Raspberry Pi, even serving three or four-digit QPS depending on the workload.
Of course, you still have to "play the game" in the interview, so make sure to be clear about how you'd change this as requirements changed (higher QPS, more features, etc)
If you can't, you might be getting interviewed by people you do not want to work with, and you should want to know that.
Understand what is being asked. Your insight on a topic is being tested. Offer an answer that does not read like a dodge or a coin flip.
People ask for fizzbuzz in parallel not because it's practical.
The point of an interview is to lay bare one’s thought process entirely so that the interviewer has full awareness of the person you are. And to likewise extract that from the interviewer. Getting or transmitting less information is just underutilizing the time. Interviewers are also flawed and may not be good enough at extracting the information from you.
If you’re an ideal decision maker, you will likely out-skill the majority of interviewers. You’re being hired to make their org succeed. So just do that.
I think people who describe system designs frequently fail to demarcate the space they’re operating in, so subsequent engineers cannot determine whether the original designer failed to consider something or whether the original designer considered and dismissed something. The point is to be able to express this concisely.
IMHO, doing it well means that not only do you get it right, but you send the information down through time so that subsequent observers understand why and, as a result, also get it right.
Your answers are completely valid but you have to communicate to the interviewer that you considered the possibilities and the tradeoffs.
If the interviewer needs to "forcefully" extract from you the logic behind your design choices, then a lot of the time that's enough to fail you.
I have some remarks though. Taken from the article:
> Avoid having five different services all write to the same table. Instead, have four of them send API requests (or emit events) to the first service, and keep the writing logic in that one service.
This is not so cut-and-dried; the tradeoffs are far from obvious or clearly acceptable.
If the five services access the database, then you are designing a distributed system where the interface being consumed is the database, which you do not need to design or implement, which already supports authorization and access controls out of the box, and which gives you out-of-the-box support for transactions and custom queries. On the other hand, if you design one service as a high-level interface over a database, then you need to implement and manage your own custom interface with your own custom access controls and constraints, and you need to design and implement yourself how to handle transactions and compensation strategies.
And what exactly do you buy yourself? More failure modes and a higher micro services tax?
Additionally, having five services accessing the same database is a code smell. Odds are that database fused together two or three separate databases. This happens a lot, as most services grow by accretion and adding one more table to a database gets far less resistance than proposing creating an entire new persistence service. And is it possible that those five separate services are actually just one or two services?
APIs can be evolved much more easily than shared database schemas. Having worked with many instances of each kind of system, I think this outweighs all of the other considerations, and I don't think I'll ever again design a system with multiple services accessing the same database schema.
It was maybe a good idea if you were a small company in the early 2000s, when databases were well-understood and services weren't. After that era, I haven't seen a single example of a system where it wasn't a mistake for multiple services to access the same database schema (not counting systems where the read and write path were architecturally distinct components of the same service.)
You absolutely should design and implement it, exactly because it is now your interface. In fact, it will add more constraints to your design, because now you have different consumers and potentially writers all competing for the same resource with potentially different access patterns. Plus the maintenance overhead that migrations of such shared tables come with. And eventually you might have data in this table that are only needed for some of the services, so you now need to implement views and access controls at the DB level.
Ideally, if you have a chance to implement it, an API is cleaner and more flexible. The problem in most cases is simply business pushing for faster features which often leads to quick hacks including just giving direct access to some DB table from another service, because the alternative would take more time, and we don't have time, we want features, now.
But I agree with your thoughts in the last paragraph. It happens very often that people don't want to undertake the effort of a whole new design or redesign to match the evolving requirements and just patch it by adding a new table to an existing DB, then another,...
Moving your data types from SQL into another language solves exactly 0 migration problems.
Every migration you can hide with that abstraction language you can also hide in SQL. Databases can express exactly the same behaviors as your application code.
When you need to alter the datastore, usually for product or scalability, you have to orchestrate all access to that datastore.
Ergo: one only one thing using the datastore means less orchestration.
At work, we just updated a datastore. We had to move some tables to their own db. Three years later, 40+ teams have updated their access. This was a product need. If it had been a scale issue, the product would just have died, absent some as-yet-unimagined solution.
Counterpoint (assuming by database you mean database cluster, not a schema): having a separate physical DB for each service means that for most places, your reliability has now gone from N to N^M (e.g., three databases that are each up 99.9% of the time give roughly 99.7% availability for a request that needs all three).
Nice boxes in the architectural diagram. Each box is handed to a different team and then, when engineers from those teams don't talk to each other, the system doesn't suddenly fail in an unexpected way.
Oh yes! Never do a join in the application code! But also: use views! (And stored procedures if you can.) A view is an abstraction over the underlying data: it's functional by nature, unlikely to break for random reasons in the future, and if done well the underlying SQL is surprisingly readable and easy to reason about.
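A tiny example of that (SQLite, invented tables): the view gives application code one readable name for a join it would otherwise repeat everywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     total_cents INTEGER);
INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO orders VALUES (10, 1, 4200);

-- The abstraction lives in the database, next to the data it describes.
CREATE VIEW orders_with_customer AS
SELECT o.id AS order_id, c.name AS customer_name, o.total_cents
FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

print(conn.execute(
    "SELECT * FROM orders_with_customer WHERE order_id = 10").fetchone())
```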
Writing raw SQL views/queries per MVC view in SSR arrangements is one of the most elegant and performant ways to build complex web products. Let the RDBMS do the heavy lifting with the data. There are optimizations in play you can't even recall (because there's so many) if you're using something old and enterprisey like MSSQL or Oracle. The web server should be able to directly interpolate sql result sets into corresponding <table>s, etc. without having to round trip for each row or perform additional in memory join operations.
The typical ORM implementation is the exact opposite of this - one strict object model that must be used everywhere. It's about as inflexible as you can get.
You can write reusable plain functions as abstractions, returning QuerySets that allow further filters to be chained onto the query before the actual SQL is materialized and sent to the database.
The result of this doesn't have to match the original object models you defined; it's still possible to be flexible, with group-bys resulting in dictionaries.
Rails makes this easy to avoid. Using `find_each` batches the queries (by 1,000 records at a time by default).
Reading through the comment section on this has been interesting. Either lots of people using half baked ORMs, people who have little experience with an ORM, or both.
Your team can have rules and patterns in place to mitigate it but I'd never say "Rails makes this easy to avoid".
What Rails gives you is easy to use (and understand) abstractions that enable you to directly address performance issues.
Easy is highly contextual here, because none of this is trivial.
> Particularly if you’re using an ORM, beware accidentally making queries in an inner loop. That’s an easy way to turn a select id, name from table to a select id from table and a hundred select name from table where id = ?.
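A minimal illustration of that trap and its fix, assuming SQLite and a throwaway table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)",
                 [(i, f"name-{i}") for i in range(100)])

# The accidental version: one query for the ids, then one query per row.
ids = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
names_slow = [conn.execute("SELECT name FROM t WHERE id = ?", (i,)).fetchone()[0]
              for i in ids]

# What you wanted all along: a single round trip.
names_fast = [r[1] for r in conn.execute("SELECT id, name FROM t ORDER BY id")]

assert names_slow == names_fast  # same data, 101 queries vs 1
```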
In particular, you still have to do testing, security (e.g. row-level security), manage migrations, change management (e.g. for SOC2 or other security frameworks), cache offloads (Redis and friends), support for microservices, etc.
Comments like this give me a vibe of young developers trying out Supabase for the first time feeling like that approach can scale indefinitely.
I don't think so. The context is about avoiding joining in memory, which is fairly awful to do in an application and should be avoided, along with uninformed use of ORMs, which often just add a layer of unwarranted complexity leading to things like the dreaded N+1 problem that most inexperienced Rails developers had when dealing with ActiveRecord.
If anything, what you’re talking about sounds like development hell. I can understand a database developer having to bake in support for that level of security, but developing an app that actually uses it gets you so far in the weeds that you can barely make progress trying to do normal development.
A developer with several years of experience or equivalent will have pride in developing complexity and using cool features that make them feel important.
After a developer has maybe twice that many years experience or equivalent, they may develop frameworks with the intent to make code easier to develop and manage.
And beyond that level of experience, developers just want code that’s easy to maintain and doesn’t make stupid decisions like excessive complexity. But, they know they have to let the younger devs make mistakes, because they don’t listen, so there is no choice but to watch hell burn.
Then you retire or get a different job.
I am merely talking about properties of developing complex web applications that have traditionally not been easy to work with in SQL.
I am in particular not proposing any frameworks.
How can that sound like hell?
Sql migrations? This is a solved problem: https://github.com/flyway/flyway
What about micro services? You write some terraform to provision a sql database (e.g. aws aurora) just like you would with dynamo db or similar. What does that have to do with ORMs?
What about redis? Suddenly we need an ORM to query redis, to check if a key exists in the cache before hitting our DB? That’s difficult code to write?
I’m confused reading your comment. It has “you don’t do things my way so you must be dumb and playing with toy projects” vibes.
Alternatively you use record style outputs, but that is prone to errors if positions are changed.
Regardless, even with jOOQ you still accept that there is a sizable application layer taking responsibility for the requirements I listed.
In our project we generated POJOs in a CI pipeline, corresponding to a new Flyway migration script. The POJOs were pushed to a dedicated Maven library. This ensured our object mappings were always up to date. And then we wrote SQL almost like the old-fashioned way…but with a typesafe Java DSL.
With SQL you need to explicitly test all queries where the shape granularity is down to field level.
When you map data onto an object model (in the dto sense, not oop sense) you have bigger building blocks.
This gives a simpler application that is more reliable.
Obviously you need to pick a performant orm - and it seems a lot of people in these threads have been traumatized.
Personally, I run a complex application where developers freely use a GraphQL schema and requests are below 50ms p99 - GQL is translated into joins by the ORM, so we do not have any N+1 issues, etc.
I've most often seen this countered through data loaders (batched queries that are merged in code) instead of joins, or query whitelists.
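A bare-bones sketch of that idea, with no framework and invented functions: collect the keys, fetch them in one batched query, and merge in code rather than issuing one query per parent row.

```python
def fetch_users_batch(user_ids: list) -> dict:
    # Stand-in for a single `SELECT ... WHERE id IN (...)` against the real DB.
    return {uid: {"id": uid, "name": f"user-{uid}"} for uid in set(user_ids)}

def resolve_authors(posts: list) -> list:
    users = fetch_users_batch([p["author_id"] for p in posts])  # one round trip
    return [{**p, "author": users[p["author_id"]]} for p in posts]

posts = [{"id": 1, "author_id": 7}, {"id": 2, "author_id": 7}, {"id": 3, "author_id": 9}]
print(resolve_authors(posts))
```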
The key is to hold the same schema on the database as on the graphql and use tooling that can translate a gql query into a single query.
You know, it should. There's no good reason for an ORM to ever fail at runtime due to mapping problems instead of compile time or start time. (Except, of course if you change it during the software's execution.)
As you've mentioned GraphQL, you're probably comparing an ORM in that sense to a traditional custom API backed by raw SQL. In a fair comparison both versions would do exactly the same thing and require the same essential tests. Assuming more variations for the raw SQL version is just assuming it does more, or somehow does it badly in terms of architecture. Which is not a fair comparison.
An example is Prisma. Prisma has a team of engineers that work on optimizing query generation and provide a simple and intuitive API.
Not using an ORM forces you to take over that work yourself and test the extra complexity that goes into your code base.
It might be merited if you get substantial performance boosts - but I have not seen any reasonably modern ORM where performance is the issue.
You realize that’s abysmally bad performance for any reasonable OLTP query, right? Sub-msec (as measured by the DB, not including RTT etc.) is very achievable, even at scale. 2-3 msec for complex queries.
Before you tell me about how you just use a query builder/DSL and an object mapper for convenience: that's a freaking ORM!
The author's said nothing about ORMs. It feels like you're trying to post a personal beef about ORMs that runs entirely against the "pragmatic" software design engineering the author's espousing. Using an ORM to massively reduce your boilerplate CRUD code, then using raw SQL (or raw SQL plus the ORM doing the column mapping) for everything else, is a pragmatic design choice.
You might not like them, but using ORMs for CRUD saves a ton of boilerplate, error-prone code. Yes, you can footgun yourself. But that's what being a senior developer is all about: using the tools you have pragmatically and not footgunning yourself.
And it's just about looking for the patterns: if you see a massive ORM query, you're probably looking at a code smell - a query that should be in raw SQL.
https://brandur.org/sqlc
Even though I often use pgx only, for a new project, I would use the approach above.
I think if an app uses 90% ORM code with the remainder as raw queries, a junior is inclined to favor ORM code and is also less exposed to actually writing SQL. He is unlikely to become an SQL expert that way; but writing real SQL behind a code facade, he should become one.
I can't respond to the "typical" part as most of my experience is using EF Core, but it's far from inflexible.
Most of my read-heavy, search queries are views I've hand written that integrate with EF core. This allows me to get the benefit of raw SQL, but also be able to use LINQ to do sorting/paging/filtering.
T-SQL was not a good programming language last century when it was vaguely current, so no, I do not want to write any significant amount of code in T-SQL. For my sins I maintain a piece of software with huge T-SQL procedures (multi-page elaborations by somebody who really, really liked this stuff) and they're a nightmare. The tooling doesn't really believe in version control, and the diagnostics when you make a mistake are either non-existent or C++-style useless spew.
We hire a lot of very junior developers. People who still need to be told not to comment out code in release, that variable names are for humans to read, not machines, that sort of thing. We're not quite hiring physicists to write software (I have done that at a startup) but it's close. However, none of the poor "my first program" code I see in a merge request by a new hire is anywhere close to as unreadable as the T-SQL we already own and maintain.
Stored procedures also add another risk. You have to keep them in sync with code, making releases more error prone. So you have to add extra layers of complexity to manage versioning.
I can see the advantage of extreme performance/efficiency gains, but it should be really big to be justified.
This one is easily solved: never change a stored procedure. Every version should get a new name.
There was _one guy_ who maintained it and understood how it worked. He was very smart but central to the company's operations. So having messy stuff makes it brittle and hard to change in more ways than one.
The "backend" scales much more easily than the database. Loading data by simple indexes, e.g. user_id, and joining it on the backend keeps the db fast. Spinning up another backend instance is easy - unlike another db instance.
If you think your joins must happen in the db because the data is too big to be loaded into memory on the backend, restructure it so that it becomes possible.
Bonus points for moving joins to the frontend. This makes data highly cacheable and fast to load, as you need to load less data, and it frees up resources on the server side.
Interestingly it didn't even use JOIN everywhere it could because, according to the documentation, not all databases had the necessary features.
A hard lesson in the caveats of outsourcing work to ORMs.
The only time I've seen this in my career was a project that was an absolute pile of waste. The "CTO" was self-taught, and all the tables were far too wide with a ton of null values. The company did very well financially, but the tech was so damn terrible. It was such a liability.
The same principle applies to small applications too.
If you apply it correctly, the application is never going to be slow due to slow db queries and you won't have to optimize complex queries at all.
Plus if you want to split out part of an app to its own service, it’ll be easily possible.
My proposed design doesn’t bring many hard disadvantages.
But it allows you to avoid vertical hardware scaling.
Saves money and development time.
But your thinking is rather limited. Even such data can be organized in a way that joins in the db are not necessary.
This kind of design always "starts" on the frontend - by choosing how and what data will be visible, e.g. on a table view.
Many people think showing all data, all the time, is the only way.
Simple service oriented architecture is much preferred. Each app with its own data.
Then such problems can be easily avoided.
This appears as an opinion rather than an argument. Could you explain what you find bad about the design?
In any case, I believe a DB per backend service isn't a decision driven by the frontend - rather, it's driven by data migration and data access requirements.
I think the idea of breaking up a shared enterprise DB into many distinct but communicating and dependent DB's was driven by a desire to reduce team+system dependencies to increase ability to change.
While the pro is valid and we make use of the idea sometimes when we design things, the cons are significant. Splitting up a DB that has data that is naturally shared by many departments in the business and by many modules/functional areas of the system increases complexity substantially.
In the shared model, when some critical attribute of an item (sku) is updated, then all of the different modules+functional areas of enterprise are immediately using that current and correct master value.
In the distributed model, there is significant complexity and effort to share this state across all areas. I've worked on systems designed this way and this issue frequently causes problems related to timing.
As with everything, no single solution is best for all situations. We only split this kind of shared state when the pros outweigh the cons, which is sometimes but not that often.
What I propose is to leave this design behind.
The split up design fits modern use cases much better. People want all kind of data. They want to change what data they want rather often.
"One" database for all of this doesn't really work -- you can't change the schema since it's used by many applications. So, you'll stuck with a design coming from a time when requirements were probably quite different. Of course, you can make some modifications, but not many and not fundamental ones.
In the split-up design, since you're not sharing the database, you can do whatever you want. Change schema as you see fit. Store data in multiple different forms (duplicates), so it can be queried quickly. The only thing you have to keep is the interface to the outside world (department etc.). Here you can use eg. versioning of your API. Handy.
The 90's are over. We don't have to stick to the limitations people had back then.
Yes of course, data not being up-to-date in every system can be a problem. BUT business people nowadays tend to accept that more, than the inability to change data structures ("we can't add a new field", "we can't change this field" etc.).
I know, for many, this is one of the deadly sins, but I think it can work out very well.
Let's say you run a webshop and have two tables, one for orders with 5 fields, one for customers, with 20 fields.
Let's say you have 10k customers, and 1m orders.
A query performing a full join on this and getting all the data would result in 25 million fields transmitted, while 2 separate queries and a client side manual join would be just 5m for orders, and 200k for customers.
It's very natural to want customer information when querying an order, and if you have a view like orders_with_customer_info, you get that with zero effort when querying that view by order id.
You also get consolidated data (orders by customer) by doing a simple aggregate over that view (group by customer, then count or sum the orders), which I think is pretty straightforward.
But usually you need some of the orders and you need the customer info associated with them. Often the set of orders you're interested in might even be filtered by attributes of the customers they belong to.
The decision of whether to normalize our results of a database query into separate sets of orders and customers, or to return a single joined dataset of orders with customer data attached, is completely orthogonal to the decision of whether to join data in the database.
2) such reports never need all the data, it's mostly about top N volume queries or month-over-month performance data. When a reporting application does query all the data, it's because it's building its own data warehouse so the query usually happens only once per day, at a specific time, which means the load is entirely predictable.
I worked on an application which joined across lots of tables, which made a few dozen records balloon to many thousands of result rows, with huge redundancy in the results. Think of something like a single conceptual result having details A, B, C from one table, X, Y from another table, and 1, 2, 3 from another table. Instead of having 8 result rows (or 9 if you include the top level one from the main table) you have 18 (AX1, AX2, AX3, AY1, ...). It gets exponentially worse with more tables.
We moved to separate queries for the different tables. Importantly, we were able to filter them all on the same condition, so we were not making multiple queries to child tables when there were lots of top-level results.
The result was much faster because the extra network overhead was overshadowed by the saving in query processing and quantity of data returned. And the application code was actually simpler, because it was a pain to pick out unique child results from the big JOIN. It was literally a win in every respect with no downsides.
(Later, we just stuffed all the data into a single JSONB in a single table, which was even better. But even that is an example of breaking the old normalisation rule.)
That doesn't really sound like a place where data is actually conceptually joined. I expect, as it is something commonly attempted, that you were abusing joins to try and work around the n+1 problem. As a corollary to the above, you also shouldn't de-join in application code.
As a somewhat contrived example since I just got out of bed: if your software has a function that needs all the invoice items from this year's invoices whose invoice address country is a given value, use a join rather than loading all invoices, invoice addresses and invoice items and performing the filtering on the client side.
Though as you point out, if you just need to load a given record along with details, prefer fetching detail rows independently instead of making a Cartesian behemoth.
Not saying this should always be the case, but sometimes it is the right call.
To quote Douglas Adams: "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair."
Likewise, if you cache a piece of data in your application because you assume that it won't change, that just makes it likely that if and when it does change, you'll have bugs. Moving the cache to the database layer so that it can be properly invalidated fixes this.
It's true that an application-side join can still be more performant if the DB cache isn't good enough, but IMO you should only take that step after actually profiling your queries.
For example, you may want to (or have the option to) vertically partition your database, or use different data stores. The app layer is usually stateless and can scale perpetually, but the database might be a bottleneck.
Joining in the database over the application is a great default. But I wouldn't say "never join in the application code".
A great software design will separate all business logic into its own layer. That might be a distinct project, module, or namespace, depending on what your language supports. Keep business logic out of SQL and out of web server code (controllers, web helpers, middleware, etc.).
Then you're treating SQL as the data store it is designed to be. When you embed application logic in SQL, you're hiding core functionality in a place where most developers won't expect to find it. This approach also creates tight coupling between your application and your database provider, making it hard to switch as needs change/the application grows.
What would you say about CHECK constraints, though? I don't think it's something few developers expect to see, and having these checks is very convenient.
I know that there are even opponents of foreign keys (which makes sense sometimes), but in general, I don't understand why I would ever throw away the nice features of Postgres that can enforce correctness.
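A small illustration of the CHECK-constraint point (SQLite, invented table): the database refuses obviously bad rows even when some code path forgets to validate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
    status TEXT NOT NULL CHECK (status IN ('pending', 'settled', 'failed'))
)""")

conn.execute("INSERT INTO payments (amount_cents, status) VALUES (999, 'pending')")
try:
    conn.execute("INSERT INTO payments (amount_cents, status) VALUES (-5, 'pending')")
except sqlite3.IntegrityError as exc:
    print("rejected by the database:", exc)  # CHECK constraint failed
```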
Triggers are something I was thinking about that are grossly misused.
Depending on the ecosystem the code base adopts, a good ORM might be a better choice for doing joins.
When I used to interview to be a developer at a company, it was always an automatic no for me if the company kept business logic in stored procedures and had a separate team of "database developers".
As far as not doing joins in code goes: while I agree for the most part, GitHub itself has a rule against using SQL to join tables that belong to different domains.
https://github.blog/engineering/infrastructure/partitioning-...
If you have some complex queries on every page load with a huge number of users, put it in the DB as much as possible.
If you need to iterate over a bunch of records and do something based on some combination of values, and it's for a weekly reporting thing, I'd much rather see 3 nested foreach loops with a lot of early exits to skip the things you don't care about than a multi-kb SQL statement that took two days to develop and nobody ever dares to touch again because it's hard to handle.
Also, SQL is easy, but figuring out what's up with indexes and the planner is not.
Rings very true. Engineers are rated based on the "complexity" of the work they do. This system seems to encourage over-engineered solutions to all problems.
I don't think there is enough appreciation for KISS - which I first learned about as an undergrad 20 years ago.
Sure, there are problems that are inherently complex and require complex solutions. But most likely yours isn't one of them, most likely you have a basic web app.
The ideal solution: Avoid having five different services all write to the same table.
If five different services have to write to the same table, there is a major overlap of logic too. Are the five services really different or one would suffice?
Taking practical realities into consideration, we can do what the author says. However, we risk implementing a lot of orchestration logic and introducing a whole new layer of problems. Is that time not better spent refactoring the services: either give them their own DB tables or merge them into one service?
I’m surprised that the drawbacks of EAV or just using JSON in your relational database don’t get called out more.
I’d very much rather have like 20 tables with clear purpose than seeing that colleagues have once more created a “classifier” mechanism and are using polymorphic links (without actual foreign keys, columns like “section” and “entity_id”) and are treating it as a grab bag of stuff. One that you also need to read the application code a bunch to even hope to understand.
Whenever I see that, I want to change careers. I get that EAV has its use cases, but in most other cases fuck EAV.
It’s right up there with N+1 issues, complex dynamically generated SQL when views would suffice and also storing audit data in the same DB and it inevitably having functionality written against it, your audit data becoming a part of the business logic. Oh and also shared database instances and not having the ability to easily bootstrap your own, oh and also working with Oracle in general. And also putting things that’d be better off in the app inside of the DB and vice versa.
There are so many ways to decrease your quality of life when it comes to storing and accessing data.
That said, sometimes when I realize there's no way for me to come up even with a rough schema (say, some settings object that is returned to the frontend), I use JSONB columns in Postgres. As a rule of thumb, however, if something can be normalized, it should be, since, after all, that's still a relational database despite all the JSON(B) conveniences and optimizations in Postgres.
What's the "proper" way to do this? Separate DB? Separate data store?
Whether that's a typical relational DB or something more specialized (like a log shipping solution) that's up to you, but usually it would be separate from the main DB.
If you need some functionality that depends on events that have taken place, you probably want to store information about those events in the main data store (but only what's needed for that functionality, not a list of all mutations done to a table like audit data might include).
In general, it's nice to have such a clear boundary of where the business domain ends and where the aux. stuff to help you keep it running goes - your logs and audit data, analytics and metrics, tracing spans and so on.
Edit: as a critique of my own arguments here, I will admit that doing the above can introduce some complexity and that in simpler systems it might be overkill. But I've seen what happens when everything is just in one huge DB instance, where about 90% of the overall schema size is literally due to records in those audit tables and everyone is surprised why opening the "History" tab for a record takes a while (and anything else that references said history, e.g. visibility of additional records), and it's not great either.
A good system needs to be as easy to understand and interpret as possible. A good system design is so mind-numbingly simple that a nincompoop can understand it. The only deviations from this policy should stem from other requirements like storage, performance, etc.
I get it, but that sounds very finicky code to get right and a good source of hard-to-debug bugs.
You could also contribute to an open source project like kubernetes or postgres to get your feet wet.
Like all things, the best way to get better is to do it.
Seems overly negative of broad advice on a good pattern?
Sure, that makes sense. Not so much, as most bears do not become bears at some point after not being a bear. In the is_a case a type or kind is almost always better, as you'll rarely just have bears even if you only start with bears, just as you rarely have just two states for a status field (say on or off); often these expand in use to include things like suspended, deleted and asleep.
So generally I'd avoid booleans as they tend to multiply and increase complexity, particularly when they cover mutually exclusive states like live, deleted and suspended. I have seen is_visible, is_deleted and is_suspended all on the same table (without a status) and the resulting code and queries are not pretty.
I’d use an integer rather than a timestamp to replace them though.
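A rough sketch of the mutually-exclusive-booleans point (SQLite, invented columns; a text status is used here, though an integer or enum works the same way): nothing stops the three flags from contradicting each other, whereas the single column enumerates the legal states in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The shape being criticised: is_visible, is_deleted and is_suspended can all be 1 at once.
conn.execute("""CREATE TABLE accounts_flags (
    id INTEGER PRIMARY KEY,
    is_visible INTEGER, is_deleted INTEGER, is_suspended INTEGER)""")

# One status column: impossible combinations are simply unrepresentable.
conn.execute("""CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('live', 'suspended', 'deleted')))""")

conn.execute("INSERT INTO accounts (status) VALUES ('live')")
print(conn.execute("SELECT id, status FROM accounts").fetchall())
```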
If your data was simple enough, you could have an integer hold the entire meaning of a table's row, if every client understood how it was interpreted. You could do bitwise manipulations, encodings, and so on.
Sometimes it is nice to understand what the data means in the schema alone. You can do that with enums, etc.
These are more of the sort of things I don't see needing enums, timestamps, integers...
Although I'm not even sure it's broadly a good principle, even in the on_at case; if you actually care about this kind of thing, you should be storing it properly in some kind of audit table. Switching bool to timestamp is more of a weird lazy hack that probably won't be all that useful in practice because only a random subset of data is being tracked like that (Boolean data type definitely isn't the deciding factor on whether it's important enough to track update time on).
The main reason it’s even suggested is probably just that it’s “free” — you can smuggle the timestamp into your bool without an extra column — and it probably saved some effort accidentally; but not because it’s a broadly complete solution to the set of problems it tries to solve for
I’ve got the same suspicion with soft-deletes — I’m fairly positive it’s useless in practice, and is just a mentally lazy solution to avoid proper auditing. Like you definitely can’t just undelete it, and it doesn’t solve for update history, so all you’re really protecting against is accidental bulk delete caught immediately? Which is half the point of your backup
But that’s my point. You’re making an active decision to record timestamps on important events (and no bool was being converted here); bool —> timestamp everywhere is not the same thing — the bool data type is not a useful signal for whether this change needs to be timestamp-tracked.
Either think and choose to track these particular changes or not, and drop in the appropriate tracking. The mindless Bool—>timestamp change is only ever suggested because it’s a “why not”, not because it’s a good practice that often leads to good things.
The audit table is of course just deciding “every change ever is important”
The regular table is the source of truth, the audit table is just a historical record of what changed and when.
https://brandur.org/soft-deletion
isDarkTheme: {timestamped}
paginationItems: 50
I can see when dark theme was activated but not when pagination was set to 50.
also, i can’t see when dark theme is being deactivated either.
seems like a poor man's changelog. there may be use cases for it but i can't think of anything tbh.
Additionally, there are situations where it is logical to store a boolean. For example, if the boolean denotes an outcome:
I do try my best to pack my columns, but it's a fragile and likely premature optimization. Better to opt for something defensive at a cost of like, 7 bytes per row (in Postgres).
Good system design is designing a system that works best for the problem at hand.
I had been nodding away about state and push/pull, but this section grabbed my attention, since I've never seen it so clearly articulated before.
Same with metrics. Mostly they don't matter. But when they do, it's nice if it's there.
Basically, logging is the easy and cheap part of observability, it's the ability to filter and search that makes it useful. A lot of systems get that wrong.
For mostly political reasons, if you are onboarded to a team with a billion microservices and a lot of fanciness, it's unlikely that you will ever get approval or time to introduce simplicity. Or maybe I just got corrupted myself by the reality where I have to work now.
My former Ph.D. supervisor who moonlights as a consultant on this topic uses a nice acronym to capture this: BAPO. Business, Architecture, Process, and Organization. The idea is to end up with optimal business, an optimal architecture & design for that business, the minimum of manual processes that are necessitated by that architecture, and an organization that is efficiently executing those processes. So, you should design and engineer in that order.
Most companies do this in reverse and then end up limiting their business with an architecture that matches whatever processes their org chart necessitated years ago, in a way that doesn't make any logical sense whatsoever except in the historical context of the org chart. If you come in as a consultant to fix such a situation, it helps to understand that whatever you are going to find is probably wrong because of this reason. I've been in the situation where I come in to fix a technical issue and immediately see that the only reason the problem exists is that the org chart is bullshit. That can be a bit awkward but lucrative if you deal with it correctly. It helps to ask the right questions before you get started.
Turning that around means you start from the business end (where's the money coming from?, what value can we create?, etc.), finding a solution that delivers that and then figure out processes and organizational needs. Many companies start out fairly optimal and then stuff around them changes and they forget to adapt to that.
Having micro services because you have a certain team structure is a classic mistake here. You just codified your organizational inefficiency. Before you even delivered any business value. And now your organizational latency has network latency to match that. Probably for no good reason other than that team A can't be trusted to work with team B. And even if it's optimal now, is it going to stay optimal?
If you are going to break stuff into (micro) services, do so for valid business/technical reasons. E.g. processing close to your data is cheaper, caching for efficiency means stuff is faster and cheaper, physically locating chunks of your system close to the customer means less latency, etc. But introducing network latency just because team A can't work with team B is fundamentally stupid. Why do you even have those teams? What are those people doing? Why?
The acronym I use is “S.Q.U.I.D”[0] (Simplicity, Quality, Unambiguity, Integrity, Documentation).
But most of the stuff I’ve done is different from what he’s written about, so it’s probably not relevant here.
[0] https://littlegreenviper.com/itcb-04/#more-4074
It's funny and not surprising that this came from an academic/consultant combo.
Business/mission/product should come first, absolutely.
But prematurely designing your system architecture and automations before your processes emerge is a sure-fire way to overengineer and waste everyone's time, except the consultant you are paying by the hour.
> Even good system design advice can be kind of bad. I love Designing Data-Intensive Applications, but I don’t think it’s particularly useful for most system design problems engineers will run into.
But it continues to do the same throughout the rest of its advice. It also says:
> ... Drawing the line here is a judgment call and depends on specifics,
And immediately mentions:
> but in general I aim to have my tables be human-readable ...
Which to me reads as "I'm going to ignore the differences in context and apply mine to everyone, and I'm going to assume most of the world faces the same problems I do." It's even worse than the book being criticized at the beginning, since that book at least has "Data-Intensive" in its title.
This is quite easily fixable. The author could describe the typical scenario they work with on a day-to-day basis. Do they serve 10 users a day? 100? 10,000,000? What is the traffic? How many engineers? What's the situation of the team/company; do FIXMEs turn into fixes, or do they become "it's a feature"? And so on.
In the end, without setting a baseline, a lot of engineers will start pointing fingers at each other and dismissing the opposing ideas because they don't fit their situation. The reasoning might be true, but without that baseline it is "irrelevant", and so is any opposition to it or defense of it.
I could nitpick individual points in the article, but that misses the bigger issue: the premise is off.
Don’t chase generic advice about good or bad design. First understand your requirements, then design a system that meets them.
I know it's a bit untrue, but you can't do that many things wrong with a stateless application running in a container. And often the answer is "kill it and deploy it again". As long as you don't shred your dataset with a bad migration or some bad database code, most bad things at this level can be fixed in a few minutes with a few redeployments.
I'm fine having a larger number of people, with varying degrees of experience, time for this, care, and diligence, working here.
With a persistence layer like a database or a file store, you need some degree of experience with what you have to do around the system so it doesn't become a business risk. Put plainly, a database can be a massive business risk even if it is working perfectly... because no one set up backups.
That's why our storage systems are run by dedicated people who have been doing this for years and years. A bad database loss easily sinks ships.
> As long as you don't shred your dataset with a bad migration or some bad database code, most bad things at this level can be fixed in a few minutes with a few redeployments.
At some point between these statements you switched from stateless to stateful and I can't follow the rest of the argument.
If you introduce a migration like "UPDATE billing SET prices = 0 WHERE something < 5", that's an entirely valid migration, but you mess up your state and then everyone is in a world of pain. This could, however, still be caught by various code review strategies, incremental rollouts, and a number of good development practices.
This is still the easy case: you can catch it before it hits prod, so you don't have to fix prod.
And prod could still be fixed if your database layer manages backups, just with a day or two of downtime. If you don't have backups, you may have permanently lost information, which could kill the company.
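One cheap guard along those lines, as a sketch (toy sqlite data so it runs end to end): run the risky UPDATE in a transaction, compare the affected row count with what you expected, and roll back if it looks wrong.

    import sqlite3

    # Toy stand-in for the real database so the sketch runs as-is.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE billing (id INTEGER PRIMARY KEY, prices REAL, something INTEGER)")
    conn.executemany("INSERT INTO billing (prices, something) VALUES (?, ?)",
                     [(9.99, i) for i in range(10)])
    conn.commit()

    EXPECTED_MAX_ROWS = 3  # sanity bound, e.g. taken from a prior SELECT COUNT(*)

    cur = conn.cursor()
    try:
        # sqlite3 opens an implicit transaction before this UPDATE.
        cur.execute("UPDATE billing SET prices = 0 WHERE something < 5")
        if cur.rowcount > EXPECTED_MAX_ROWS:
            # Far more rows touched than expected: refuse to ship the damage.
            raise RuntimeError(f"touched {cur.rowcount} rows, expected at most {EXPECTED_MAX_ROWS}")
        conn.commit()
    except Exception as exc:
        conn.rollback()
        print(f"migration aborted, nothing committed: {exc}")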
[1] https://en.m.wikipedia.org/wiki/Systemantics
One thing that I often add is the people interacting with the system. They’re a part of it too. Most people don’t operate in an atomically consistent world; a lot of business processes are eventually consistent. But you do need to know where you have to have atomic operations! It depends on where the user expects it.
Systems thinking is very useful, from how your software is deployed to how people use it in their work. Always be thinking about these things.
And I’m wondering why investors tolerate the expense: it’s surprising how much you can get done with simplicity and a small focused team.
It must be something about perception when they’re ready to sell the company.
Whenever I read something like this I feel so confused. Who actually calls themselves an engineer when they have no idea what they're talking about? Ignorant confidence is such a useless personality trait.
As long as we keep measuring LOCs or features added, there will always be jobs for them.
Adding a separate table where the presence of a record means 'true' allows recording related state without complicating the main table.
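For example (invented table names): a row's existence is the "true", and the related state rides along in the same row instead of widening the main table.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

        -- A row here means "true"; the extra columns carry the related state
        -- without complicating the users table.
        CREATE TABLE newsletter_optins (
            user_id INTEGER PRIMARY KEY REFERENCES users(id),
            opted_in_at TEXT NOT NULL,
            source TEXT
        );
    """)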
And sometimes a boolean is exactly what you want.
It's hinted at a little bit in the OP, with:
> What does good system design look like? I’ve written before that it looks underwhelming
This is because there are humans in your system! Other developers! You in the future! You have to resort to heuristics like "simple == good" because you're only looking at a small part of the whole system.
And zoom out even more, you get to the actual users. How do they interact with the system? If you implement a rate limiter, how do the users respond when they hit it? Do they just spam-refresh the page? Do they develop weird superstitions about it? Do they spam-call your phone support lines? Does your response to a thundering herd anticipate the second-order impact of your phone support lines being DDOSed?
On state: in my current project, it is not statefulness that causes trouble, but the need to synchronize two stateful systems. Every time there's bidirectional information flow, it's gonna be a headache. The solution is of course to maintain a single source of truth, but with UI applications this is sometimes quite tricky.
Monotonic state is also better than mutable state. If you must distribute state, think about ownership. Who owns it? E.g. there's nothing necessarily wrong with having state owned by, say, a mobile client that the user can adjust. Then you can sync it to the backend if you want, but the backend is only a reader/listener and should never try to control it directly.
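A toy sketch of that ownership split, with invented names: the client owns the preferences and pushes snapshots; the backend side just records the latest reported value and never edits it.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ClientPrefs:
        # Owned by the client; the user edits this directly.
        dark_theme: bool = False
        pagination_items: int = 50

    class PrefsMirror:
        """Backend-side read model: a listener, never an owner."""
        def __init__(self):
            self._latest = {}

        def record(self, user_id: str, prefs: ClientPrefs) -> None:
            # Accept whatever the owner says; no server-side edits, no merging.
            self._latest[user_id] = prefs

        def get(self, user_id: str) -> Optional[ClientPrefs]:
            return self._latest.get(user_id)

    mirror = PrefsMirror()
    mirror.record("user-1", ClientPrefs(dark_theme=True))
    print(mirror.get("user-1"))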
[1] https://boringtechnology.club
Discussed here a week ago: https://news.ycombinator.com/item?id=44840693
If the metaphor of a software circuit breaker is meant to emulate an electrical circuit breaker, then it seems to me that these two are inverted. Whenever a physical circuit breaker is open, it is not dangerous and not passing current.
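For reference, a toy sketch of the usual software convention (not any particular library's API): "closed" lets calls through, and the breaker trips to "open" after repeated failures, at which point calls are rejected instead of hitting the struggling dependency.

    import time

    class CircuitBreaker:
        """Toy breaker: closed = calls pass through, open = calls are rejected."""

        def __init__(self, failure_threshold=3, reset_timeout=30.0):
            self.failure_threshold = failure_threshold
            self.reset_timeout = reset_timeout
            self.failures = 0
            self.opened_at = None  # None means the breaker is closed

        def call(self, fn, *args, **kwargs):
            if self.opened_at is not None:
                if time.monotonic() - self.opened_at < self.reset_timeout:
                    raise RuntimeError("circuit open: refusing to call downstream")
                # Half-open: allow one trial call through.
                self.opened_at = None
            try:
                result = fn(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()  # trip the breaker
                raise
            self.failures = 0
            return result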
Almost all the tools I've seen are either fully event-sourced or have nothing to do with event-sourcing. There aren't a ton of in-betweens.
I do wonder why the author left out testing, documentation, and QA tool design, though. To my mind, writing a proper phpcs config or whatever to ensure everyone on the team writes code in a consistent way is crucial. Without documentation we end up forgetting why we did certain things. And without tests, refactors are a nightmare.
I found myself truly confused by this one - does this actually need stating? Do people actually re-read immediately after a write? Provided you got confirmation that a write was successful and the data doesn’t have anything that an SQL trigger would change, what would be the point of an immediate read instead of just using the DB “successfully written” response as a go-ahead to just update the in-memory data?
As in:
- writing a constitution
- designing API for good DX
- improving corporate culture
I intuitively want to call all of those system design, because they're all systems in the literal sense. But it seems like everyone else uses "system design" to mean distributed computer service design.
Any ideas what word or phrase I could use to mean "applying systems thinking to systems that include humans"?
* Fundamental books/courses on distributed systems that will help you understand the internals of most distributed systems and algorithms (DDIA is here, even though it's not even the most theoretical treatment)
* Hand-wavy cookbooks that tend to oversimplify things and (I am intentionally exaggerating here) teach you to reason like "I have assumed a billion users, let's use Cassandra"
I liked the article for its focus on real systems and the sensible rules of thumb instead of another reformulation of the gossip protocol that very few engineers will ever need to apply in practice themselves.
Anything like this is trivially dismissible as absolute hogwash. It’s a shame that titles like this actually get the clicks needed to encourage more bullshite in the same vein.
Wholly agree.
I've seen engineers have servers spin up lambdas to do async jobs that are just database calls.
So the server essentially waits for the lambda, which waits for the database. Why? Why can't the server just wait for the database?
It's like paying someone to wait in line for me while I wait for them. Why? You're waiting anyway! You just paid to involve an additional person who does nothing but wait with you, and for what?
When I told the engineer that you can just spin up a coroutine, or maybe allocate some cores before spinning up a new server, he looked at me like I was crazy. He said I was doing things so low-level it was like assembly language programming, that I was going too low-level, and that lambdas were so cheap the cost was inconsequential.
If you're reading this and you're thinking, wow that other engineer is right, well this quote from the article refers to you:
"I’m often alone on this. Engineers look at complex systems with many interesting parts and think “wow, a lot of system design is happening here!” In fact, a complex system usually reflects an absence of good design."
Common tools used for this "optimization" often raise the complexity and lower the performance of the system.
For example, a db with a single table with just a key and a value is very flexible and "optimized for change" but it offers lower performance (in most cases) and is harder to reason about.
I also frequently see people (me too) prematurely make abstractions (interfaces, extra tables, etc) because they're "optimizing for change". Then that part never changes OR it changes in a way that their abstraction doesn't abstract over OR they figure out a better abstraction later on when the app has matured a bit. Then that part of the code is at best wasted space (usually it needs to be rewritten yet no one gets time to do that).
Of course, it's also foolish to say "never abstract". I almost always find it worth it to abstract over I/O, just so I can easily add logging, dual writes, or mock it. And when a change is obviously coming down the line it makes sense to plan for it.
But usually I'm served best by trying to keep most of my computation pure functions (easy to test), doing as little as possible in the I/O path (it should just persist or print stuff so I can mock it) and otherwise write obvious "deletable" code that does one thing so I can debug it and, only if necessary, replace with a better abstraction if I need to.
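A small sketch of that shape (the domain is invented): the pricing rule is a pure function you can test directly, and the I/O edge just loads and persists around it.

    def apply_discount(order_total: float, loyalty_years: int) -> float:
        # Pure: same inputs, same output, trivially testable with no mocks.
        rate = min(0.05 * loyalty_years, 0.25)
        return round(order_total * (1 - rate), 2)

    def checkout(order_id: str, load_order, save_total) -> float:
        # Thin I/O shell: fetch, call the pure core, persist.
        # load_order and save_total are injected so tests can pass in fakes.
        order = load_order(order_id)
        total = apply_discount(order["total"], order["loyalty_years"])
        save_total(order_id, total)
        return total

    # In a test, the fakes are a dict lookup and a no-op:
    orders = {"o-1": {"total": 100.0, "loyalty_years": 10}}
    assert checkout("o-1", orders.get, lambda *_: None) == 75.0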
Functional programming is the paradigm most optimized for modularity and therefore change. It's the best we have, but it's limited in scope.
When my service wants to store and retrieve as part of its behaviour, of course I'm going to back it with a hashmap first.
Once I know it fulfills its business logic I'll start fiddling with hard-to-change stuff like DB schemas and migrations.
And having finished and tested the logic, I'll have a much better idea of the actual access patterns so I can design good tables & indexes.
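Something like this, as a sketch (the interface is invented): business logic talks to a tiny store interface backed by a dict, and a database-backed implementation with real schemas and indexes slots in later.

    from typing import Optional, Protocol

    class OrderStore(Protocol):
        def get(self, order_id: str) -> Optional[dict]: ...
        def put(self, order_id: str, order: dict) -> None: ...

    class InMemoryOrderStore:
        """Hashmap-backed store: enough to develop and test the business logic."""
        def __init__(self):
            self._orders = {}  # order_id -> order dict

        def get(self, order_id):
            return self._orders.get(order_id)

        def put(self, order_id, order):
            self._orders[order_id] = order

    # Later, a database-backed store implements the same two methods, with a
    # schema and indexes designed around the access patterns observed here.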
We can usually have many more, cheaper, dedicated services, each doing one thing, and that does more good than a single service that grows to become more and more omnipotent. It also means you're much more likely to win contracts, because you can price yourself competitively.
In all seriousness, this is an extraordinarily subtle and complex area, and there are few rules.
For example, "if you need data from multiple tables, JOIN them instead of making separate queries and stitching them together in-memory" may be useful in certain circumstances. For highly scalable consumer systems, the rule of "avoid joins as much as possible" can work a lot better.
There is also no mention of how important it is to understand the business - usage patterns, the customers, the data, the scale of data, the scale of usage, security, uptime and reliability requirements, reporting requirements, etc.
If you mean "transactional websites", and assuming you mean something like product catalogs and being able to purchase, that narrows it down quite a lot.
Or does it?
For the majority of use cases, Craigslist, eBay, or Amazon are the best fit.
Next in number of use cases are Wix/Square/etc., where you design your UI.
Then come all-in-one systems with a UI/ORM based on Python/Ruby/etc., where you need to design your own DB schema and UI, but the "design" is already done for you.
The next step is custom-designed systems like the one the article talks about, where complete off-the-shelf is not suitable.
And then there are the highly scalable systems.
The article is perfectly fine if we are discussing custom-designed systems that don't need the highest levels of scalability.