The database is not your domain model, it is the storage representation of your domain model on disk. Your REST/grpc/whatever API also isn’t your domain model, but the representation of your domain model on the wire.
These tools (database, protocols) are not the place to enforce making invalid states un-representable for reasons the article mentions. You translate your domain model into and out of these tools, so code your domain model as separately and as purely as you can and reject invalid states during translation.
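A minimal sketch of that translation boundary, in illustrative TypeScript (the names and shapes are invented, not from the article):

    // Domain model: invalid states are unrepresentable here.
    const STATUSES = ["draft", "in_review", "published"] as const;
    type AppStatus = (typeof STATUSES)[number];

    interface App {
      id: number;
      name: string;
      status: AppStatus;
    }

    // Storage/wire representation: deliberately looser.
    interface AppRow {
      id: number;
      name: string | null;
      status: string;
    }

    // Translation layer: reject invalid states on the way in...
    function fromRow(row: AppRow): App {
      if (row.name === null || row.name.trim() === "") {
        throw new Error(`app ${row.id}: missing name`);
      }
      const status = STATUSES.find((s) => s === row.status);
      if (status === undefined) {
        throw new Error(`app ${row.id}: unknown status "${row.status}"`);
      }
      return { id: row.id, name: row.name, status };
    }

    // ...and flatten back out on the way to disk or the wire.
    function toRow(app: App): AppRow {
      return { id: app.id, name: app.name, status: app.status };
    }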
lock1 · 5h ago
+1
In cases like electronics & protocols, it's very often a good idea to add an extra "reserved & unused" section for compatibility reasons.
These representations need not be 1:1 with the domain model. Different versions of the model might reject previously accepted representations in the case of breaking changes. It's up to the dev to decide which conflict reconciliation strategy they should take (fallback values, reject, compute value, etc).
Working with a precise domain model (as in, no representable invalid states) is way more pleasant than stringly-typed/primitives mess. We can just focus on domain logic without continuously second-guessing whether a string contains valid user.role values or it contains "cat".
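A rough sketch of such a translation step with an explicit reconciliation strategy (illustrative TypeScript; the role names are made up):

    type Role = "admin" | "member";

    // One possible reconciliation strategy when the stored value doesn't
    // fit the current model: reject it, or fall back to a safe default.
    function parseRole(raw: string, strategy: "reject" | "fallback"): Role {
      if (raw === "admin" || raw === "member") return raw;
      if (strategy === "fallback") return "member"; // least-privilege default
      throw new Error(`invalid role: "${raw}"`);    // e.g. "cat"
    }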
Zanfa · 4h ago
I disagree completely.
You’re already paying the cost of abstraction for using a certain database or protocol, so get the most bang for your buck. If you can encode rules in a schema or a type, it’s basically free, compared to having to enforce it in code, hoping that future developers (yourself or others) will remember to do the same. It just eliminates an entire universe of problems you have to deal with otherwise.
Also, while relaxing constraints is usually easy or at least doable, enforcing new constraints on existing data is impossible in practice. Never seen it done successfully.
The only exception to this rule I typically make is around data state transitions. Meaning that even when business rules dictate a unidirectional transition, it should be bidirectional in code, just because people will click the wrong button and will need a way to undo “irreversible” actions.
rich_sasha · 4h ago
I'm not sure I agree.
My perfunctory reading is thus: first you couple your state representation to the business logic, and make some states unrepresentable (say every client is category A, B or C). Maybe you allow yourself some flexibility, like you can add more client types.
Then the business people come and tell you some new client is both A and B, and maybe Z as well, quickly now, type type type.
And so there's a tradeoff between:
- eliminating invalid states, leading to fewer errors, and
- inflexibility in changes to the business logic down the line.
Maybe I misunderstood, but if this is right, then it's a good point. And I would add: when modelling some business logic, ask yourself how likely it is to change. If it's something pretty concrete and immovable, feel free to make the representation rigid. But if not, and even if the business people insist otherwise, look for ways to retain flexibility down the line, even if it means some invalid states are indeed representable in code.
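For concreteness, the A/B/C scenario might look roughly like this before and after the change (illustrative TypeScript, invented names):

    // Yesterday's rigid model: exactly one category per client.
    type Category = "A" | "B" | "C";
    interface ClientV1 { id: number; category: Category; }

    // Today the business says a client can be both A and B, plus a new Z.
    // The representation has to widen from "one of" to "a set of".
    type CategoryV2 = "A" | "B" | "C" | "Z";
    interface ClientV2 { id: number; categories: ReadonlySet<CategoryV2>; }

    // Every place that assumed "exactly one category" now has to be revisited.
    function discount(client: ClientV2): number {
      return client.categories.has("A") ? 0.1 : 0;
    }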
lock1 · 4h ago
IMO, rather than focusing on flexibility vs inflexibility when deciding "tight domain model" or not, it's much better to think about whether your program's requirements can tolerate some bugs or not.
Say we have a perfect "make illegal states unrepresentable" model. Like you said, it's kind of inflexible when requirements change. We need to change the affected code before we can even proceed to compile & run.
On the other hand, an untyped system is flexible. Just look at the Javascript & Python ecosystems: a function might contain a somewhat insane, gibberish statement, yet your program might still run and only throw some error at runtime.
Some bugs in programs like games or the average webapp don't matter that much. We can fix them later when users report the bug.
But it's probably better to catch at compile time whether a user can withdraw a negative balance, as we don't want to introduce an "infinite money glitch" bug.
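As a sketch of the balance example (illustrative TypeScript): the arithmetic itself can't be checked at compile time here, but a branded type confines the validity check to one constructor, so the rest of the code never sees a negative balance.

    // The only way to obtain a Balance is via the smart constructor.
    type Balance = number & { readonly __brand: "Balance" };

    function toBalance(n: number): Balance {
      if (!Number.isFinite(n) || n < 0) throw new Error(`invalid balance: ${n}`);
      return n as Balance;
    }

    // withdraw only accepts and returns validated balances, so an
    // "infinite money glitch" state can't leak past this point.
    function withdraw(balance: Balance, amount: Balance): Balance {
      return toBalance(balance - amount); // throws if it would go negative
    }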
rich_sasha · 3h ago
Types are part of the picture, sure. But there's more. Essentially, I'd say, if your whole business logic representation needs a major refactor every time the underlying business changes, then you'd better enjoy refactoring.
lmm · 5h ago
Yep. "Make invalid states unrepresentable" pairs well with "parse, don't validate"; the states that are valid for your business domain are (maybe) not the same as the states that are valid for your storage or wire format, so have different representations for these things.
motorest · 5h ago
> To me, this article misses the mark.
Yes, I agree. The blogger shows a fundamental misunderstanding of what it means to "make invalid states unrepresentable". I'll add that the state machine example is also pretty awful. The blogger lists examples of use cases that the hypothetical implementation does not support, and the rationale for not implementing them was that "this can dramatically complicate the design". Which is baffling as the scenario was based on complicating the design with "edge cases", but it's even more baffling when the blogger has an epiphany of "you need to remain flexible enough to allow some arbitrary transitions". As if the whole point was not to allow some transitions and reject all others that would be invalid.
The foreign key example takes the cake, though. Allowing non-normalized data to be stored in databases has absolutely no relation with the domain model.
I stopped reading after that point. The blog post is a waste of bandwidth.
lock1 · 4h ago
To be fair to Sean (the post author), it does kind of make sense if you view "make invalid states unrepresentable" from a distributed-systems perspective (Sean's blog tends to cover this topic), as it's way more painful to enforce there.
kqr · 4h ago
The thing about making invalid states unrepresentable is that we are often overconfident in what counts as invalid. What counts as valid and invalid behaviour is given by the requirements specification, but if there's anything that changes frequently throughout development, it's the requirements. What's invalid today might be desirable tomorrow.
Thus, we have to be careful at which level we make invalid states unrepresentable.
If we follow where Parnas and Dijkstra suggested we go[1], we'll build software with an onion architecture. There, the innermost layers are still quite primitive and will certainly be capable of representing states considered invalid by the requirements specification! Then as we stack more layers on top, we impose more constraints on the functionality that can be expressed, until the outermost layer almost by accident solves the problems that needed to be solved. The outermost layers are where invalid states should be unrepresentable.
What often happens in practice when people try to make invalid states unrepresentable is they encode assumptions from the requirements specification into the innermost layers of the onion, into the fundamental building blocks of the software. That results in a lot of rework when the requirements inevitably change. The goal of good abstraction should be to write the innermost layers such that they're usable across all possible variations of the requirements – even those we haven't learned yet. Overspecifying the innermost layers runs counter to that.
In the example of the app market place, any state transitions that are well-defined should be allowed at the inner layer of the abstraction that manages state transitions, but the high-level API in which normal transitions are commanded should only allow the ones currently considered valid by the requirements.
[1]: https://entropicthoughts.com/deliberate-abstraction
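A small sketch of that layering (illustrative TypeScript; the states and rules are invented):

    // Inner layer: permissive. It knows what a transition is,
    // not which transitions today's requirements consider valid.
    type State = "draft" | "in_review" | "published" | "removed";
    interface Machine { state: State; history: State[]; }

    function transition(m: Machine, to: State): Machine {
      return { state: to, history: [...m.history, m.state] };
    }

    // Outer layer: the current requirements, as the only transitions
    // the high-level API will accept.
    const allowed: Record<State, State[]> = {
      draft: ["in_review"],
      in_review: ["published", "draft"],
      published: ["removed"],
      removed: [],
    };

    function requestTransition(m: Machine, to: State): Machine {
      if (!allowed[m.state].includes(to)) {
        throw new Error(`transition ${m.state} -> ${to} not currently allowed`);
      }
      return transition(m, to);
    }

    // An internal/support tool can still call transition() directly for the
    // odd correction, without loosening the public API.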
Good insight, and it goes along with keeping abstraction layers from leaking in any direction as best you can.
slowking2 · 6h ago
Making invalid states unrepresentable may be a great idea or terrible idea depending on what you are doing. My experience is all in scientific simulation, data analysis, and embedded software in medical devices.
For scientific simulations, I almost always want invalid state to immediately result in a program crash. Invalid state is usually due to a bug. And invalid state is often the type of bug which may invalidate any conclusions you'd want to draw from the simulation.
For data analysis, things are looser. I'll split data up into data which is successfully cleaned to where invalid state is unrepresentable and dirty data which I then inspect manually to see if I am wrong about what is "invalid" or if I'm missing a cleaning step.
I don't write embedded software (although I've written control algorithms to be deployed on it and have been involved in testing that the design and implementation are equivalent), but while you can't exactly make every invalid state unrepresentable, you definitely don't punch giant holes in your state machine. A good design has clean state machines, never has an uncovered case, and should pretty much only reach a failure state due to outside physical events or hardware failure. Even then, if possible the software should be able to provide information to intervene to fix certain physical issues. I've seen device RMAs where the root cause was that the FPU failed; when your software detects the sort of error that might be hardware failure, sometimes the best you can do is bail out very carefully. But you want these unknown failures to be a once-per-thousands-or-millions-of-device-years event.
Sean is writing mostly about distributed systems where it sounds like it's not a big deal if certain things are wrong or there's not a single well defined problem being solved. That's very different than the domains I'm used to, so the correct engineering in that situation may more often be to allow invalid state. (EDIT: and it also seems very relevant that there may be multiple live systems updated independently so you can't just force upgrade everything at once. You have to handle more software incompatibilities gracefully.)
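Real firmware obviously isn't TypeScript, but as a sketch of the "never has an uncovered case" idea with compiler help (the device states are invented):

    type DeviceState =
      | { kind: "idle" }
      | { kind: "delivering"; rateMlPerHour: number }
      | { kind: "fault"; code: number };

    // If a new state is ever added, this stops compiling until it's handled.
    function describe(s: DeviceState): string {
      switch (s.kind) {
        case "idle":
          return "waiting";
        case "delivering":
          return `delivering at ${s.rateMlPerHour} ml/h`;
        case "fault":
          return `fault ${s.code}: bail out carefully`;
        default: {
          const uncovered: never = s;
          return uncovered;
        }
      }
    }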
shepherdjerred · 3h ago
> For scientific simulations, I almost always want invalid state to immediately result in a program crash.
If you have actually made invalid states unrepresentable, then it is _impossible_ for your program to transition into an invalid state at runtime.
Otherwise, you're just talking about failing fast.
cherryteastain · 2h ago
> _impossible_ for your program to transition into an invalid state at runtime
Not the case for scientific computing/HPC. Often HPC codebases will use numerical schemes which are mathematically proven to 'blow up' (produce infs/nans) under certain conditions even with a perfect implementation - see for instance the CFL condition [1].
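(For reference, the usual 1D form of the CFL condition bounds the Courant number:

    C = u * Δt / Δx ≤ C_max

where u is the velocity, Δt the time step, Δx the grid spacing, and C_max depends on the scheme - often 1 for explicit time integration.)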
The solution to that is typically changing to a numerical scheme more suited for your problem or tweaking the current scheme's parameters (temporal step size, mesh, formula coefficients...). It is not trivial to find what the correct settings will be before starting. Encountering situations like a job which runs fine for 2 days and then suddenly blows up is not particularly rare.
[1] https://en.m.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E...
I don't think the article is referring to that sort of issue, which sounds fundamental to the task at hand (calculations etc). To me it's about making the code flexible with regards to future changes/requirements/adaptations/etc. I guess you could consider Y2K as an example of this issue, because the problem with 6-digit date codes wasn't with their practicality at handling dates in the 80's/90's, but about dates that "spanned" beyond 991231, i.e. 000101.
danpalmer · 6h ago
One of the main pushbacks in this article is on the difficulty of later edits once the domain changes. The "make invalid states unrepresentable" mantra really came out of the strongly typed functional programming crowd – Elm, F#, Haskell, and now adopted by Rust. These all have exceptionally strong compilers, a main advantage of which is _easier refactoring_.
Which side of the argument one falls on is likely to be heavily influenced by which language they're writing. The mantra is likely worth sticking to heavily in, say, Haskell or Rust, and I've had plenty of success with it in Swift. Go or Java on the other hand? You'd probably want to err on the side of flexibility because that suits the language more and you can rely on the compiler less during development.
b_e_n_t_o_n · 6h ago
Perhaps it's not really language but the type of programs developers use these languages for? Open vs closed systems, heavy/light interactions with the outside world, how long they're maintained, how much they change, etc.
frizlab · 5h ago
I can tell you the language is a huge part of it. I used to code in ObjC and now I’m using Swift; refactoring is easy in Swift and was a pain in ObjC.
I know I can trust my Swift code. Usually when it compiles it works because I try to, and often can, make invalid states unrepresentable. My ObjC code was always full of holes because doing so was not so easy (or not possible at all)…
danpalmer · 5h ago
That might be another factor. You could say that a program in a stable ecosystem doesn't need changing, so it should prioritise strictness over flexibility. However, even in a changing ecosystem, rather than building in flexibility that allows for incorrect states, you can raise the level of abstraction and build in extension points that retain nearly the same strictness while still giving you the flexibility to change in the future in the ways that you will need.
NooneAtAll3 · 4h ago
easier refactoring - at the cost of having much more of it
flexibility saves the effort and allows doing more of the Actual Things
taberiand · 4h ago
The Actual Things being mostly fixing technical debt introduced over the years by not making invalid states unrepresentable
Mikhail_Edoshin · 2h ago
The accounting practice in Russia (and, I guess, in other countries) has a concept of a correcting transaction ("storno"): it is an entry in the books made specifically to undo a mistake. To stand out, these entries are made in red ink. (Obviously, it is an old rule.) So yes, a data model needs a way to make an arbitrary change to correct a mistake, that is right. But that is about it. To expand it to "let's place no restrictions" is too far.
A type by definition is a set of restrictions, and normally we go by making these restrictions more and more elaborate. The "Parsing Techniques" book has a nice analogy about the expressive power of different grammars: more powerful ones do not describe larger sets of sequences, but carve out more and more specific subsets of valid sequences out of the pool of all possible sequences (which itself is trivial to define).
A type by definition is a set of possible states; if we "allow invalid states", then we've just defined a different, wider type. Whether it is what you need depends on the situation. E.g. we add entries to a database and place restrictions on them. But the user may take a long time to compose an entry. Fine; let's add a new type, "draft", that is similar to the primary entry type but does not have its restrictions. These drafts are stored separately so the user may keep editing them until they are ready. But they do not go into the part of the system that needs proper entries.
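A tiny sketch of that draft/entry split (illustrative TypeScript; the fields are invented):

    // The strict type the rest of the system relies on.
    interface Entry { title: string; amount: number; }

    // The wider type: same shape, (almost) no restrictions.
    interface Draft { title?: string; amount?: number; }

    // Drafts live in their own store and only become Entries
    // once they satisfy the restrictions.
    function finalize(d: Draft): Entry {
      if (!d.title || d.title.trim() === "") throw new Error("title required");
      if (d.amount === undefined || Number.isNaN(d.amount)) throw new Error("amount required");
      return { title: d.title, amount: d.amount };
    }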
jeschiefer · 4h ago
"some invalid states" - what does this mean, please? How do I constrain the "some" part? If you can't, you might as well say "mostly invalid states", which is what tends to happen in practice.
The whole point of state machines/type constraints/foreign key constraints/protocol definitions is that there is a clear boundary between what is allowed and what is not. And I would argue that this is what makes life easier, because you can tell one from the other. And with the right tooling, the compiler or some conformance tool will tell you that the change you just introduced breaks your implementation in 412 places. This allows me to stop and think whether I can make a different change that only creates 117 problems. Or estimate the cost of the changes and compare it with the business benefit, and have appropriate conversations with stakeholders. Or find out that my domain model was completely wrong to begin with and start refactoring my way out of this. And all of this before the first successful compile.
For me, this gives me maximum flexibility, because I have a good idea of the impact of any changes on the first day of development. This does require appropriate tooling though, like you would find in OCaml, F#, Rust, State Machine Compilers, ... <insert your favourite tool here>.
pyrale · 3h ago
Aside from the flaws of the article, one provided example annoys me:
> What happens when you need to account for “official” apps, which are developed internally and shouldn’t go through the normal review process?
There is a reason devs are advised to eat their own dogfood. Building a process-bypass for the users that are also the ones responsible for fixing the process is the easiest way to get a broken process.
000ooo000 · 3h ago
Yep. It's an example contrived to the point of silliness anyway. Like most problems, you would decompose it - not hamfist a bunch of new shit in, throw your hands in the air because it's complex, then write a confused blog.
rapnie · 1h ago
I can recommend "Domain Modeling Made Functional" by Scott Wlaschin presented at KanDDDinsky 2019, that offers a very appealing example for making making invalid states unrepresentable. The example code is in F# but that is a perfect language to demonstrate the general idea. Best-practice I guess boils down to finding the right balance throughout the application design.
I mostly agree with the author, but I think there's a very important distinction to make. It's important to think about this in two layers - the domain and the UI (whether that be a CLI, an API, a full-blown GUI, etc). The author is very clearly talking about the domain layer, but the area where I see "make invalid states unrepresentable" having the greatest benefit is in the UI layer.
That is where you want fine-grained control over which capabilities of your domain layer you expose, and where you have a lot to gain by architecting your types/system/etc to be strict. It makes for easier-to-reason-about code and reduces the amount of edge cases you need to QA. The constraints you put in the UI layer are product constraints, not technical ones. In my opinion, when you think about it that way, the adage isn't harmful but rather a very useful guide for technical folks when trying to build a mental model of what product is asking you to build.
tialaramex · 3h ago
This article pulls a sleight of hand in which it tries to have states which are "invalid" and yet which are in fact properly handled - and so there's no reasonable sense in which they were in fact invalid. So it's just an appeal to think harder when designing a system.
A state enumeration becomes just free text, a count becomes just a number, and too bad now it's impossible to reason about the state of the system because everything is unknowable. Why is this app at the top of every chart? Oh, it has recorded negative one billion users and its state is just the URL of the app store, which causes mayhem in the scoring sub-routine.
danpalmer · 6h ago
The devil is, as always, in the details. I think "make invalid states unrepresentable" is a good thing to teach in order to push back against spaghetti code. State management is hard, and untangling bad state management is near impossible.
But of course, some flexibility is often (not always) needed in the long run, and knowing where to keep the flexibility and where to push for strictness is an important part of skills development as an engineer.
b_e_n_t_o_n · 6h ago
I've been thinking about this for a while. I don't use Clojure, but I do follow Rich Hickey and have watched a few of his talks, where he makes similar points. Folks often talk about types as abstractions, but they're actually concretions: they are concrete choices about what information you want to keep out of all possible choices. And like concrete in real life, it's costly to change the shape of something once it's set. This may or may not be an issue depending on where your program is running. In closed systems like a compiler it can be a great benefit; for open-world systems, which must use non-elegant models, must deal with state and time, must have effects and be affected, and must change in ways you can't predict, the cost is much higher.
I talk about types because most of the "make invalid states unrepresentable" dogma comes from proponents of languages with extensive static type systems, and the idea is to use the type system to define all forms of legal states.
ngruhn · 5h ago
> it's costly to change the shape of something once it's set
But in a dynamically typed language you also have that problem. Except there the shape is a hidden assumption and the type checker won't tell you all the places you have to change.
b_e_n_t_o_n · 1h ago
I think you generally make fewer assumptions about the shape of data in dynamic languages.
I find writing code in dynamic/statically typed languages very different from each other, and when I hear proponents of either complain about the other (types slow me down! / too many runtime errors!) I sort of assume they're writing code like they would in their preferred paradigm and running into walls.
yrand · 5h ago
The problem with making invalid states representable is that the app logic must be able to handle it, everywhere. Which means you have to always take it into account when reasoning about your app and implementing it, otherwise it will blow up in your face when the invalid state actually occurs. Which means your state machine is inherently larger and harder to reason about.
To illustrate what I mean, take the null pointer. It's essentially an Optional<T> and in a lot of languages it's involuntary, meaning any value is implicitly optional. What happens when something returns null, but in some place in your logic you don't take it into account? Null pointer dereference.
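A tiny TypeScript sketch of the difference (with strictNullChecks on, so absence is explicit rather than implicit):

    interface User { id: number; name: string; }

    // Absence is part of the type, not an implicit property of every value.
    function findUser(users: User[], id: number): User | undefined {
      return users.find((u) => u.id === id);
    }

    const u = findUser([], 42);
    // u.name;                              // compile error: possibly undefined
    console.log(u?.name ?? "no such user"); // the check can't be forgotten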
jraph · 5h ago
Yep. That's what I was going to comment.
I'm not convinced, because you have to deal with invalid state in some way or another in any case.
If you have to delete some record referenced by other records using foreign keys, you'll have to handle the complexity in any case; except if this is enforced by the database, you'll have no choice but to think about this upfront, which is good. It might lead you to handle this differently, for instance by not deleting the row but marking it as deleted, or by changing the id to some "ghost" user.
If you don't do this, the complexity you avoided at write time, when creating the inconsistencies, will have to be encoded in the rest of your code, with null or exception checks everywhere.
Constraints encode the assumptions your code can safely make in order to be simpler. If an assumption needs to be relaxed, it's going to be hard to change things as you'll have to update all the code, but the alternative is to make no assumption at all anywhere, which is hard too and leads to code that's more complicated than necessary.
And then what does an invalid reviewer_id mean? Was this caused by a user deletion? Or a bug somewhere? Or some corruption?
Bonus: some commenters here write that it can depend on which programming language is used, I don't think it matters at all. You'll have to handle the null or dangling values in any case, whether the language guides you for this or not.
alkonaut · 3h ago
Let me rephrase it: "invalid states unrepresentable" should hold true, but the hard part is that the set of invalid states can change. Your model should be flexible enough to allow changing what constitutes an invalid state.
Any invalid state that cannot ever be a valid state (Not just that it's not valid today - you know it can't be valid tomorrow either) should be unrepresentable. A hard requirement.
Anything that is invalid today but could be valid tomorrow (One-off change, requirement change) should not be.
This makes sense to me and forces you to think up front about what "invalid" means. There are definitely hard errors you should make unrepresentable, even in domain models.
elevation · 6h ago
You also see on-the-wire protocols making invalid states unrepresentable through clever tricks. Consider RFC 3550 p.19:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| defined by profile | length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| header extension |
| .... |
[...] The header extension contains a 16-bit length field that
counts the number of 32-bit words in the extension, excluding the
four-octet extension header (therefore zero is a valid length).
So for RTP extension headers, actual_num_bytes = (`length` + 1) * 4. A naive `length` field might indicate the number of bytes. But this would allow a header to indicate "zero" bytes (not possible). So the '0' value is defined as 0 words more than we already have, and since packets are supposed to only contain a multiple of 4 bytes, the units of the length field are defined as 32-bit words.
While it isn't strictly harmful, one drawback of this approach is that if you perfectly bit-pack every field such that random noise can be interpreted as a well-formed packet in your protocol, it will be difficult to heuristically identify the protocol.
duskwuff · 5h ago
Another corollary is that, if you're designing a data compression format, you try to avoid having sequences in your compressed data which are invalid, or which a compressor would never emit. Either one is probably a waste of bits which you could use to make your representation more concise.
(Intentional redundancies like checksums are fine, of course.)
nine_k · 5h ago
> difficult to heuristically identify the protocol
This can as well be a desirable feature, depending on your design goals.
aaron_m04 · 6h ago
> With foreign key constraints, you have to delete all related records when a parent record is deleted.
I've also seen good advice that you should never delete anything from your DB, but rather put rows in a different soft-deleted state...
danpalmer · 6h ago
This was clearly not legal advice. Soft-deletes come with a lot of complexity at the application layer, more maintenance, more security risk, and require building out user data deletion processes.
Having a deleted data table is a slightly easier approach I've seen, but you still need to be aware about user and legal requirements around deleting data.
thayne · 5h ago
> Soft-deletes come with a lot of complexity at the application layer, more maintenance, more security risk, and require building out user data deletion processes.
That depends on your application and requirements. I've worked in situations where a soft delete, where any fields with sensitive customer data are overwritten with a placeholder or random data (for legal compliance reasons), was a lot simpler than doing a hard delete and leaving a bunch of dangling records with ids pointing to records that no longer exist.
And unless your data model is pretty simple, and you are ok with not having any kind of grace period before fully deleting user data, you'll probably need to build a user data deletion process either way.
danpalmer · 5h ago
That's fair, this can be an easier approach, however you do need to make sure that all fields get scrubbed, which can be hard to do as the codebase evolves over time and more fields or field semantics change. It may also leave a bunch of metadata – timestamps can be identifying, entity graphs can be identifying.
hobs · 6h ago
Exactly - there's a tension between keeping everything and storage, performance, and cost.
Keeping everything can actually make some things faster (though mostly slower) and give you a depth of information you didn't have, but it has a big impact on storage and cost.
Most people pick some mix of tradeoffs - for the important things to audit keep every state change, for other ones get the backups out.
aabhay · 5h ago
I wish there were some actual stories here about tricky migrations and how they were handled, rather than defaulting to the natural turf-war framing here.
In my experience, writing code that is “future migration friendly” is one of the hallmarks of senior developer knowledge. It’s just the kind of thing that burrows into your brain, because most senior devs have seen, built, or maintained code with randomly cordoned off “WARNING NONSENSICAL THING HERE” style sections of massive codebases that are artifacts of various migration things.
For context about how we do it, we use a giant NoSQL single-table design, with zero joins. All joining is handled in the application, which ends up forcing everyone to consider how to parallelize and look ahead rather than join. Additionally, any complex data is stored as row-level binary blobs from a versioned schema, which also helps localize issues to a particular subtype rather than leaking them into all schemas (for example, if you change what qualifies as an “update”, your “updatedAt” column doesn’t suddenly change meaning for all things).
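A rough sketch of the versioned-blob idea (illustrative TypeScript; the shapes are invented):

    // Every blob carries the schema version it was written under.
    type StoredBlob =
      | { version: 1; payload: { updated: number } }    // epoch seconds
      | { version: 2; payload: { updatedAt: string } }; // ISO timestamp

    interface Current { updatedAt: string; }

    // Migration happens on read and stays local to this one subtype;
    // nothing else in the table changes meaning.
    function upgrade(blob: StoredBlob): Current {
      if (blob.version === 1) {
        return { updatedAt: new Date(blob.payload.updated * 1000).toISOString() };
      }
      return { updatedAt: blob.payload.updatedAt };
    }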
DaiPlusPlus · 3h ago
The article is conflating tight-coupling with static-guarantees enforced by the type-system.
nine_k · 5h ago
The key part: "For most software, domain models are not real". Indeed, if your reality is ill-defined, you need miracles, aka invalid states, to handle certain practical cases. So it's more about admitting a design failure. One type of such failure may be putting wrong constraints on the model. You have to think hard about what can and cannot be, and often it's not a luxury your boss can afford. To allow representation of invalid states is to admit that you're going to be proven wrong eventually, which is not a wild exaggeration in a wide range of circumstances, alas. It's to plan for a mess, because a mess is inevitable.
(This whole approach reminds me of "I think therefore anything, due to the false antecedent".)
stared · 2h ago
I think the article confuses two things - constraints on states and on transitions.
If states are well-defined, it actually makes it EASIER to create an ad hoc transition that does not scramble data.
pluto_modadic · 5h ago
This reminds me that in protocol buffers you can deprecate and extend field names in a backwards compatible way and it's handled semi-sensibly. At the very least, it makes you think about schema migration (of HTTP fields in this case. DB fields and class fields are different...). If you aren't using practices that make e.g. adding another enum option to state sensible (e.g. in a language or linter that requires passing interface checks, class X must provide method Y)
throwaway81523 · 4h ago
The protobuf example seems best handled with version numbers in the messages. Designing protocols without them is almost an antipattern, especially when the designer says "there's no version number because I'm so smart, I'm going to get it right the first time". The examples that I remembered now seem to have been scrubbed from the web ;).
cadamsdotcom · 6h ago
There is another way that's a compromise: give corner cases a home.
In accountancy there is the "General Journal" (https://en.wikipedia.org/wiki/General_journal) - a place to correct accounting errors, enter adjustments, etc.
The General Journal really only works because accounting entries are immutable: corrections are always new entries. So it's not clear if "give corner cases a home" works everywhere.
But if the principle does work for your use case, it lets the rest of the system be as strict as you like!
flanked-evergl · 2h ago
Why would you use such a smug title for your article if you want people to read it?
lykahb · 6h ago
It matters where the constraints live. Inside of a codebase they are easier to change. Updating the database schema would be harder. On the protocol level it may be impossible if not all parties can be updated. However, if the protocol is too loosely specified, it could create other problems.
000ooo000 · 3h ago
I have a feeling this article is just suggesting some invalid states should be valid states, but it doesn't know it.
zmgsabst · 5h ago
I’ve seen more errors from “pubilshed” strings than difficulties updating an enum.
The rest of it is arguing about what constitutes valid states, eg, commit messages when the user deletes or wrong-schema datagrams being partially utilized. Here’s the rub: if you don’t allow invalid states, you’re forced to actually have that discussion (“actually, messages don’t strictly require a user”; “only these fields are required, others are a maybe”) and those are then encoded in your software rules. They’re not things that happen informally, in undocumented ways.
Further, when you do allow invalid states, you have a typo disappear 1% of your messages into the “pubilshed” category.
pcthrowaway · 5h ago
Hmm this is interesting food for thought, but I mainly disagree with the "Foreign Key Constraints" section:
> A foreign key constraint forces user_id to correspond to an actual row in the users table. If you try to create or update a post with user_id 999, and there is no user with that id, the foreign key constraint will cause the SQL query to fail.
> This sounds great, right? A record pointing at a non-existent user is in an invalid state. Shouldn’t we want it to be impossible to represent invalid states? However, many large tech companies - including the two I’ve worked for, GitHub and Zendesk - deliberately choose not to use foreign key constraints. Why not?
> The main reason is flexibility. In practice, it’s much easier to deal with some illegal states in application logic (like posts with no user attached) than it is to deal with the constraint. With foreign key constraints, you have to delete all related records when a parent record is deleted. That might be okay for users and posts - though it could become a very expensive operation - but what about relationships that are less solid? If a post has a reviewer_id, what happens when that reviewer’s account is deleted? It doesn’t seem right to delete the post, surely. And so on.
So if you need to keep reviews by a user when the user can be deleted, you have a few options:
- Use a foreign key constraint but also allow nullable foreign key values (I believe DBMSes such as postgres and mysql support this)
- Create a record representing an orphaned user and move reviews to it when the user is deleted (generally, the reviews will then show as posted by user "DELETED", though you can also enforce different ways to display deleted user associations)
- Soft delete the users (in this case you also have to exercise caution around what ex-user data may be displayed where)
There are tradeoffs with each one; soft deletion of users would likely run afoul of Right to be Forgotten laws in some places. I'd tend to favour option 1 for anything that references something like a user account where the user may be deleted, but option 2 might be useful in some situations as well.
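At the application level, options 1 and 2 might surface roughly like this (illustrative TypeScript; the sentinel id is an invented convention):

    type UserId = number;

    // Option 1: the relationship is explicitly optional
    // (nullable FK column, e.g. with ON DELETE SET NULL).
    interface Post {
      id: number;
      authorId: UserId;
      reviewerId: UserId | null;
    }

    // Option 2: reviews are reattached to a reserved "deleted user" row,
    // so reviewerId always points at a real row.
    const DELETED_USER_ID: UserId = 0;

    function displayReviewer(p: Post, lookupName: (id: UserId) => string): string {
      if (p.reviewerId === null || p.reviewerId === DELETED_USER_ID) return "[deleted]";
      return lookupName(p.reviewerId);
    }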
I'd generally recommend against omitting foreign key constraints when it can be assumed databases backing your application will support them.
Sure, losing the FK constraint gives you more "flexibility" (I guess?) but introduces way too many footguns. The database and tooling are a massive help in avoiding so many other bugs.
Sure, maybe one day you decide reviews can have authorship associated with entities which are not users of your system (say you've acquired republication data rights from another review service whose users are not your service's users); in that case you'd need an application-level refactor and migration. You might need to add an author table, where the author can be a user of your service or a reference to a user of another service. Then users of your service become possible authors, and all reviews need to be migrated to have an FK relation to the author table rather than the user table.
In any case, the FK doesn't prevent you from changing how your data is structured, and I'd argue it greatly helps you avoid mistakes as you move through the migration.
LoganDark · 5h ago
The first example in the article, of an application or whatever, doesn't seem like a good example. I don't ever have some strict state machine in my data model that only allows certain state transitions. I have a state, and certain transitions are available through API endpoints, and maybe internal apps have their own endpoints that can do things that the public cannot.