'Make invalid states unrepresentable' considered harmful
56 points by zdw on 9/8/2025, 3:40:02 AM | 61 comments | seangoedecke.com
The database is not your domain model, it is the storage representation of your domain model on disk. Your REST/grpc/whatever API also isn’t your domain model, but the representation of your domain model on the wire.
These tools (databases, protocols) are not the place to enforce making invalid states unrepresentable, for the reasons the article mentions. You translate your domain model into and out of these tools, so keep your domain model as separate and as pure as you can, and reject invalid states during translation.
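A minimal TypeScript sketch of that translation step (the OrderRow/Order shapes and names are invented for illustration):

```typescript
// Storage/wire representation: loosely typed, as it comes off the disk or the wire.
type OrderRow = { id: number; status: string; quantity: number };

// Domain model: only the states we consider valid.
type Order = { id: number; status: "open" | "shipped" | "cancelled"; quantity: number };

// Translate at the boundary and reject invalid states during translation.
function toDomain(row: OrderRow): Order {
  if (
    (row.status === "open" || row.status === "shipped" || row.status === "cancelled") &&
    row.quantity >= 0
  ) {
    return { id: row.id, status: row.status, quantity: row.quantity };
  }
  throw new Error(`invalid storage representation for order ${row.id}`);
}
```

The wider OrderRow type is what the storage layer deals in; only toDomain is allowed to produce an Order, so the rest of the code never has to second-guess it.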
In cases like electronics & protocols, it's very often a good idea to add an extra "reserved & unused" section for compatibility reasons.
These representations need not be 1:1 with the domain model. Different versions of the model might reject previously accepted representations in the case of breaking changes. It's up to the dev to decide which conflict-reconciliation strategy to take (fallback values, reject, compute the value, etc.).
Working with a precise domain model (as in, no representable invalid states) is way more pleasant than a stringly-typed/primitives mess. We can just focus on domain logic without continuously second-guessing whether a string contains a valid user.role value or whether it contains "cat".
You’re already paying the cost of abstraction for using a certain database or protocol, so get the most bang for your buck. If you can encode rules in a schema or a type, it’s basically free compared to having to enforce them in code and hoping that future developers (yourself or others) will remember to do the same. It just eliminates an entire universe of problems you would otherwise have to deal with.
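As a rough TypeScript illustration of "encoding the rule in a type" (the Role values are hypothetical):

```typescript
// Stringly-typed: nothing stops "cat" from sneaking in.
type LooseUser = { role: string };
const sneaky: LooseUser = { role: "cat" }; // compiles fine, nothing caught

// Rule encoded in the type: the compiler rejects anything outside the set.
type Role = "admin" | "member" | "viewer";
type User = { role: Role };

const fine: User = { role: "admin" };
// const oops: User = { role: "cat" }; // compile error: '"cat"' is not assignable to 'Role'
```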
Also, while relaxing constraints is usually easy or at least doable, enforcing new constraints on existing data is impossible in practice. Never seen it done successfully.
The only exception to this rule I typically make is around data state transitions. Meaning that even when business rules dictate a unidirectional transition, it should be bidirectional in code, just because people will click the wrong button and will need a way to undo “irreversible” actions.
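A rough sketch of that exception, keeping an explicit undo edge next to the forward transitions (all names hypothetical):

```typescript
type Status = "draft" | "published" | "archived";

// Business rules say archiving is one-way, but keep an undo path in code
// so an operator can recover from a mis-click.
const forward: Record<Status, Status[]> = {
  draft: ["published"],
  published: ["archived"],
  archived: [],
};

const undo: Partial<Record<Status, Status>> = {
  archived: "published", // the "irreversible" action, reversible by an operator
};

function transition(from: Status, to: Status): Status {
  if (forward[from].includes(to) || undo[from] === to) return to;
  throw new Error(`transition ${from} -> ${to} not allowed`);
}
```

The undo map documents exactly which "irreversible" actions can still be reversed, instead of that knowledge living only in an ad hoc script.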
My perfunctory reading is thus: first you couple your state representation to the business logic, and make some states unrepresentable (say every client is category A, B or C). Maybe you allow yourself some flexibility, like you can add more client types.
Then the business people come and tell you some new client is both A and B, and maybe Z as well, quickly now, type type type.
And that there's a tradeoff between:
- eliminating invalid states, leading to fewer errors, and
- inflexibility in changes to the business logic down the road.
Maybe I misunderstood, but if this is right, then it's a good point. And I would add: when modelling some business logic, ask yourself how likely it is to change. If it's something pretty concrete and immovable, feel free to make the representation rigid. But if not, and even if the business people insist otherwise, look for ways to retain flexibility down the line, even if it means some invalid states are indeed representable in code.
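For instance (purely illustrative, reusing the A/B/C/Z categories from above), keeping the category as a set rather than a single value trades a little strictness for that flexibility:

```typescript
type Category = "A" | "B" | "C" | "Z";

// Rigid: a client is exactly one category. Cheap to reason about, hard to change.
type RigidClient = { id: number; category: Category };

// Flexible: a client holds a set of categories, so "both A and B, and maybe Z"
// doesn't require a schema rewrite -- at the cost of representable oddities
// like an empty set.
type FlexibleClient = { id: number; categories: Set<Category> };

const acme: FlexibleClient = { id: 1, categories: new Set<Category>(["A", "B", "Z"]) };
```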
Say we have a perfect "make illegal states unrepresentable" model. Like you said, it's kind of inflexible when requirements change: we need to change the affected code before we can even compile and run.
On the other hand, an untyped system is flexible. Just look at the JavaScript and Python ecosystems: a function might even contain a somewhat insane, gibberish statement, yet your program might still run and only throw an error at runtime.
Some bugs, in programs like games or the average webapp, don't matter that much. We can fix them later when users report them.
But it's probably better to catch at compile time whether a user can withdraw a negative balance, as we don't want to introduce an "infinite money glitch" bug.
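A possible sketch of pushing that particular check toward compile time with a branded type (Amount and the function names are made up):

```typescript
// A branded type: the only way to get an Amount is through the validating constructor.
type Amount = number & { readonly __brand: "NonNegativeAmount" };

function amount(n: number): Amount {
  if (!Number.isFinite(n) || n < 0) throw new Error(`invalid amount: ${n}`);
  return n as Amount;
}

function withdraw(balance: Amount, requested: Amount): Amount {
  // Both inputs are already non-negative; only the ordering remains to check.
  if (requested > balance) throw new Error("insufficient funds");
  return amount(balance - requested);
}

// withdraw(amount(100), -5); // compile error: a plain number is not an Amount
```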
Yes, I agree. The blogger shows a fundamental misunderstanding of what it means to "make invalid states unrepresentable". I'll add that the state machine example is also pretty awful. The blogger lists examples of use cases that the hypothetical implementation does not support, and the rationale for not implementing them was that "this can dramatically complicate the design". Which is baffling, as the scenario was based on complicating the design with "edge cases", but it's even more baffling when the blogger has an epiphany of "you need to remain flexible enough to allow some arbitrary transitions". As if the whole point was not to allow some transitions and reject all others that would be invalid.
The foreign key example takes the cake, though. Allowing non-normalized data to be stored in databases has absolutely no relation with the domain model.
I stopped reading after that point. The blog post is a waste of bandwidth.
Thus, we have to be careful at which level we make invalid states unrepresentable.
If we follow where Parnas and Dijkstra suggested we go[1], we'll build software with an onion architecture. There, the innermost layers are still quite primitive and will certainly be capable of representing states considered invalid by the requirements specification! Then as we stack more layers on top, we impose more constraints on the functionality that can be expressed, until the outermost layer almost by accident solves the problems that needed to be solved. The outermost layers are where invalid states should be unrepresentable.
What often happens in practice when people try to make invalid states unrepresentable is they encode assumptions from the requirements specification into the innermost layers of the onion, into the fundamental building blocks of the software. That results in a lot of rework when the requirements inevitably change. The goal of good abstraction should be to write the innermost layers such that they're usable across all possible variations of the requirements – even those we haven't learned yet. Overspecifying the innermost layers runs counter to that.
In the example of the app marketplace, any state transitions that are well-defined should be allowed at the inner layer of the abstraction that manages state transitions, but the high-level API through which normal transitions are commanded should only allow the ones currently considered valid by the requirements.
[1]: https://entropicthoughts.com/deliberate-abstraction
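A rough TypeScript sketch of that layering, using an invented set of marketplace states:

```typescript
type AppState = "draft" | "in_review" | "approved" | "rejected";

// Inner layer: any well-defined transition between known states is mechanically possible.
function rawTransition(app: { state: AppState }, to: AppState): { state: AppState } {
  return { ...app, state: to };
}

// Outer layer: only the transitions the current requirements consider valid.
const allowed: Record<AppState, AppState[]> = {
  draft: ["in_review"],
  in_review: ["approved", "rejected"],
  approved: [],
  rejected: ["in_review"],
};

function submitTransition(app: { state: AppState }, to: AppState) {
  if (!allowed[app.state].includes(to)) {
    throw new Error(`transition ${app.state} -> ${to} not allowed by current requirements`);
  }
  return rawTransition(app, to);
}

// A deliberate, audited admin path could call rawTransition directly (e.g. for
// "official" apps) without punching a hole in the public submitTransition API.
```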
For scientific simulations, I almost always want invalid state to immediately result in a program crash. Invalid state is usually due to a bug. And invalid state is often the type of bug which may invalidate any conclusions you'd want to draw from the simulation.
For data analysis, things are looser. I'll split data up into data which is successfully cleaned to where invalid state is unrepresentable and dirty data which I then inspect manually to see if I am wrong about what is "invalid" or if I'm missing a cleaning step.
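Roughly, in TypeScript terms (field names invented):

```typescript
// Simulation: invalid state is a bug, so fail loudly and immediately.
function assertFinite(value: number, label: string): void {
  if (!Number.isFinite(value)) {
    throw new Error(`invalid state: ${label} = ${value}`); // abort the run, don't limp on
  }
}

// Analysis: partition into cleaned records and dirty ones to inspect by hand.
type RawRecord = { temperature: number | null };
type CleanRecord = { temperature: number };

function partition(rows: RawRecord[]): { clean: CleanRecord[]; dirty: RawRecord[] } {
  const clean: CleanRecord[] = [];
  const dirty: RawRecord[] = [];
  for (const row of rows) {
    if (row.temperature !== null && Number.isFinite(row.temperature)) {
      clean.push({ temperature: row.temperature });
    } else {
      dirty.push(row);
    }
  }
  return { clean, dirty };
}
```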
I don't write embedded software (although I've written control algorithms to be deployed on it and have been involved in testing that the design and implementation are equivalent), but while you can't exactly make every invalid state unrepresentable, you definitely don't punch giant holes in your state machine. A good design has clean state machines, never has an uncovered case, and should pretty much only reach a failure state due to outside physical events or hardware failure. Even then, if possible, the software should provide information to help intervene and fix certain physical issues. I've seen devices RMA'd where the root cause was a failed FPU; when your software detects the sort of error that might be hardware failure, sometimes the best you can do is bail out very carefully. But you want to make these unknown failures a once-per-thousands-or-millions-of-device-years event.
Sean is writing mostly about distributed systems where it sounds like it's not a big deal if certain things are wrong or there's not a single well defined problem being solved. That's very different than the domains I'm used to, so the correct engineering in that situation may more often be to allow invalid state. (EDIT: and it also seems very relevant that there may be multiple live systems updated independently so you can't just force upgrade everything at once. You have to handle more software incompatibilities gracefully.)
If you have actually made invalid states unrepresentable, then it is _impossible_ for your program to transition into an invalid state at runtime.
Otherwise, you're just talking about failing fast.
Not the case for scientific computing/HPC. Often HPC codebases will use numerical schemes which are mathematically proven to 'blow up' (produce infs/nans) under certain conditions even with a perfect implementation - see for instance the CFL condition [1].
The solution to that is typically changing to a numerical scheme more suited for your problem or tweaking the current scheme's parameters (temporal step size, mesh, formula coefficients...). It is not trivial to find what the correct settings will be before starting. Encountering situations like a job which runs fine for 2 days and then suddenly blows up is not particularly rare.
[1] https://en.m.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E...
I don't think the article is referring to that sort of issue, which sounds fundamental to the task at hand (calculations etc). To me it's about making the code flexible with regards to future changes/requirements/adaptations/etc. I guess you could consider Y2K an example of this issue: the problem with 6-digit date codes wasn't their practicality at handling dates in the '80s and '90s, but dates that "spanned" beyond 991231, i.e. 000101.
A type by definition is a set of restrictions, and normally we go by making these restrictions more and more elaborate. The "Parsing Techniques" book has a nice analogy about the expressive power of different grammars: more powerful ones do not describe larger sets of sequences, but carve out more and more specific subsets of valid sequences out of the pool of all possible sequences (which itself is trivial to define).
A type by definition is a set of possible states, if we "allow invalid states", then we've just defined a different type, a wider one. Whether it is what you need depends on the situation. E.g. we add entries to a database and place restrictions on them. But the user may take a long time to compose an entry. Fine; let's add a new type, "draft", that is similar to the primary entry type but does not have their restrictions. These drafts are stored separately so the user may keep editing them until they are ready. But they do not go into the part of the system that needs proper entries.
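A small sketch of that draft/entry split (types and the promotion step are hypothetical):

```typescript
// Proper entries carry the full restrictions.
type Entry = { title: string; body: string; publishedAt: Date };

// A draft is deliberately a wider type: everything optional, stored separately.
type Draft = Partial<Entry> & { lastEditedAt: Date };

// Drafts only become entries through a validating promotion step.
function promote(draft: Draft): Entry {
  if (!draft.title || !draft.body) {
    throw new Error("draft is not ready to publish");
  }
  return { title: draft.title, body: draft.body, publishedAt: new Date() };
}
```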
Which side of the argument one falls on is likely to be heavily influenced by which language they're writing. The mantra is likely worth sticking to heavily in, say, Haskell or Rust, and I've had plenty of success with it in Swift. Go or Java on the other hand? You'd probably want to err on the side of flexibility because that suits the language more and you can rely on the compiler less during development.
I know I can trust my Swift code. Usually when it compiles it works because I try to, and often can, make invalid states unrepresentable. My ObjC code was always full of holes because doing so was not so easy (or not possible at all)…
flexibility saves the effort and allows doing more of the Actual Things
The whole point of state machines/type constraints/foreign key constraints/protocol definitions is that there is a clear boundary between what is allowed and what is not. And I would argue that this is what makes life easier, because you can tell one from the other. And with the right tooling, the compiler or some conformance tool will tell you that the change you just introduced breaks your implementation in 412 places. This allows me to stop and think whether I can make a different change that only creates 117 problems. Or estimate the cost of the changes, compare it with the business benefit, and have the appropriate conversations with stakeholders. Or find out that my domain model was completely wrong to begin with and start refactoring my way out of it. And all of this before the first successful compile.
This gives me maximum flexibility, because I have a good idea of the impact of any change on the first day of development. It does require appropriate tooling though, like you would find in OCaml, F#, Rust, state machine compilers, ... <insert your favourite tool here>.
> What happens when you need to account for “official” apps, which are developed internally and shouldn’t go through the normal review process?
There is a reason devs are advised to eat their own dogfood. Building a process-bypass for the users that are also the ones responsible for fixing the process is the easiest way to get a broken process.
https://www.youtube.com/watch?v=2JB1_e5wZmU
A state enumeration becomes just free text, a count becomes just a number, and too bad now it's impossible to reason about the state of the system because everything is unknowable. Why is this app at the top of every chart? Oh, it has recorded negative one billion users and its state is just the URL of the app store, which causes mayhem in the scoring sub-routine.
But of course, some flexibility is often (not always) needed in the long run, and knowing where to keep the flexibility and where to push for strictness is an important part of skills development as an engineer.
I talk about types because most of the "make invalid states unrepresentable" dogma comes from proponents of languages with extensive static type systems, and the idea is to use the type system to define all forms of legal states.
But in a dynamically typed language you also have that problem. Except there the shape is a hidden assumption and the type checker won't tell you all the places you have to change.
I find writing code in dynamic/statically typed languages very different from each other, and when I hear proponents of either complain about the other (types slow me down! / too many runtime errors!) I sort of assume they're writing code like they would in their preferred paradigm and running into walls.
To illustrate what I mean, take the null pointer. It's essentially an Optional<T> and in a lot of languages it's involuntary, meaning any value is implicitly optional. What happens when something returns null, but in some place in your logic you don't take it into account? Null pointer dereference.
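For example, in TypeScript with strictNullChecks (findUser and the User shape are made up):

```typescript
type User = { id: number; name: string };
const users = new Map<number, User>([[1, { id: 1, name: "Ada" }]]);

// With strictNullChecks, the "no user" case is part of the return type...
function findUser(id: number): User | null {
  return users.get(id) ?? null;
}

function greet(id: number): string {
  const user = findUser(id);
  // ...so the compiler rejects `user.name` until the null case is handled.
  if (user === null) return "hello, stranger";
  return `hello, ${user.name}`;
}
```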
I'm not convinced, because you have to deal with invalid state in some way or another in any case.
If you have to delete some record referenced by other records via foreign keys, you'll have to handle the complexity in any case; it's just that if this is enforced by the database, you'll have no choice but to think about it upfront, which is good. It might lead you to handle it differently, for instance by not deleting the row but marking it as deleted, or by changing the id to some "ghost" user.
If you don't do this, all the complexity you avoided by not dealing with the inconsistencies as you create them will have to be encoded in the code that reads the data, with null or exception checks everywhere.
Constraints encode the assumptions your code can safely make in order to be simpler. If an assumption needs to be relaxed, it's going to be hard to change things, as you'll have to update all the code; but the alternative is to make no assumptions at all anywhere, which is hard too and leads to code that's more complicated than necessary.
And then what does an invalid reviewer_id mean? Was this caused by a user deletion? Or a bug somewhere? Or some corruption?
Bonus: some commenters here write that it can depend on which programming language is used; I don't think it matters at all. You'll have to handle the null or dangling values in any case, whether the language guides you there or not.
Any invalid state that cannot ever be a valid state (Not just that it's not valid today - you know it can't be valid tomorrow either) should be unrepresentable. A hard requirement.
Anything that is invalid today but could be valid tomorrow (One-off change, requirement change) should not be.
This makes sense to me and forces you to think up front about what "invalid" means. There are definitely hard errors you should make unrepresentable, even in domain models.
While it isn't strictly harmful, one drawback of this approach is that if you perfectly bit-pack every field such that random noise can be interpreted as a well-formed packet in your protocol, it will be difficult to heuristically identify the protocol.
(Intentional redundancies like checksums are fine, of course.)
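As a toy illustration of that kind of intentional redundancy, a made-up framing with a magic number, a length and a checksum, so random noise almost never decodes as a valid packet:

```typescript
const MAGIC = 0xc0de; // fixed marker: random bytes rarely start with it

// Simple additive checksum; real protocols would typically use a CRC.
function checksum(payload: Uint8Array): number {
  return payload.reduce((sum, b) => (sum + b) & 0xff, 0);
}

function encode(payload: Uint8Array): Uint8Array {
  const out = new Uint8Array(4 + payload.length);
  const view = new DataView(out.buffer);
  view.setUint16(0, MAGIC);           // deliberate redundancy
  view.setUint8(2, payload.length);   // assumes payloads under 256 bytes
  view.setUint8(3, checksum(payload));
  out.set(payload, 4);
  return out;
}

function decode(packet: Uint8Array): Uint8Array | null {
  if (packet.length < 4) return null;
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  if (view.getUint16(0) !== MAGIC) return null;
  const length = view.getUint8(2);
  const payload = packet.subarray(4, 4 + length);
  if (payload.length !== length || checksum(payload) !== view.getUint8(3)) return null;
  return payload;
}
```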
This can as well be a desirable feature, depending on your design goals.
I've also seen good advice that you should never delete anything from your DB, but rather put rows in a different soft-deleted state...
Having a deleted-data table is a slightly easier approach I've seen, but you still need to be aware of user and legal requirements around deleting data.
That depends on your application and requirements. I've worked in situations where a soft delete, where any fields with sensitive customer data are overwritten with a placeholder or random data (for legal compliance reasons), was a lot simpler than doing a hard delete and leaving a bunch of dangling records with ids pointing to records that no longer exist.
And unless your data model is pretty simple, and you are ok with not having any kind of grace period before fully deleting user data, you'll probably need to build a user data deletion process either way.
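A small sketch of that redact-on-delete variant (fields are hypothetical):

```typescript
type Customer = {
  id: number;
  email: string;
  name: string;
  deletedAt: Date | null; // soft-delete marker
};

// Soft delete: keep the row (so reviews, orders, etc. keep a valid reference)
// but overwrite the personally identifiable fields for compliance.
function redactAndSoftDelete(customer: Customer): Customer {
  return {
    ...customer,
    email: `deleted-${customer.id}@example.invalid`,
    name: "Deleted customer",
    deletedAt: new Date(),
  };
}
```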
Keeping everything can actually make some things faster (though mostly slower) and give you a depth of information you didn't have, but it has a big impact on storage and cost.
Most people pick some mix of tradeoffs: for the important things to audit, keep every state change; for the other ones, get the backups out.
In my experience, writing code that is “future migration friendly” is one of the hallmarks of senior developer knowledge. It’s just the kind of thing that burrows into your brain, because most senior devs have seen, built, or maintained code with randomly cordoned-off “WARNING NONSENSICAL THING HERE” style sections of massive codebases that are artifacts of various migration things.
For context about how we do it, we use a giant NoSQL single-table design, with zero joins. All joining is handled in the application, which ends up forcing everyone to consider how to parallelize and look ahead rather than join. Additionally, any complex data is stored as row-level binary blobs from a versioned schema, which also helps localize issues to a particular subtype rather than leaking them into all schemas (for example, if you change what qualifies as an “update”, your “updatedAt” column doesn’t suddenly change meaning for all things).
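Roughly the idea, with JSON standing in for the binary blob encoding (types and field names invented):

```typescript
// Each row stores its payload with an explicit schema version.
type ItemV1 = { version: 1; name: string };
type ItemV2 = { version: 2; name: string; updatedAt: string }; // field added in a later version
type StoredItem = ItemV1 | ItemV2;

// Reads migrate old versions forward, so a schema change stays local to this subtype.
function upgrade(item: StoredItem): ItemV2 {
  if (item.version === 1) {
    return { version: 2, name: item.name, updatedAt: "1970-01-01T00:00:00Z" };
  }
  return item;
}

const row = JSON.parse('{"version":1,"name":"widget"}') as StoredItem;
const current = upgrade(row);
```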
(This whole approach reminds me of "I think therefore anything, due to the false antecedent".)
If states are well-defined, it actually makes it EASIER to create an ad hoc transition that does not scramble data.
In accountancy there is the "General Journal" (https://en.wikipedia.org/wiki/General_journal) - a place to correct accounting errors, enter adjustments, etc.
The General Journal really only works because accounting entries are immutable: corrections are always new entries. So it's not clear if "give corner cases a home" works everywhere.
But if the principle does work for your use case, it lets the rest of the system be as strict as you like!
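A tiny sketch of that append-only, corrections-as-new-entries idea (fields simplified and hypothetical):

```typescript
type JournalEntry = {
  id: number;
  amountCents: number;      // positive = debit, negative = credit (simplified)
  memo: string;
  correctsEntry?: number;   // a correction points at the entry it reverses
};

// Entries are never edited; a mistake is fixed by appending a reversing entry.
const journal: readonly JournalEntry[] = [
  { id: 1, amountCents: 5000, memo: "office chair" },
  { id: 2, amountCents: -5000, memo: "reversal: duplicate charge", correctsEntry: 1 },
];

const balance = journal.reduce((sum, e) => sum + e.amountCents, 0); // 0
```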
The rest of it is arguing about what constitutes valid states, e.g., commit messages when the user is deleted, or wrong-schema datagrams being partially utilized. Here’s the rub: if you don’t allow invalid states, you’re forced to actually have that discussion (“actually, messages don’t strictly require a user”; “only these fields are required, others are a maybe”), and those decisions are then encoded in your software rules. They’re not things that happen informally, in undocumented ways.
Further, when you do allow invalid states, you have a typo disappear 1% of your messages into the “pubilshed” category.
> A foreign key constraint forces user_id to correspond to an actual row in the users table. If you try to create or update a post with user_id 999, and there is no user with that id, the foreign key constraint will cause the SQL query to fail.
> This sounds great, right? A record pointing at a non-existent user is in an invalid state. Shouldn’t we want it to be impossible to represent invalid states? However, many large tech companies - including the two I’ve worked for, GitHub and Zendesk - deliberately choose not to use foreign key constraints. Why not?
> The main reason is flexibility. In practice, it’s much easier to deal with some illegal states in application logic (like posts with no user attached) than it is to deal with the constraint. With foreign key constraints, you have to delete all related records when a parent record is deleted. That might be okay for users and posts - though it could become a very expensive operation - but what about relationships that are less solid? If a post has a reviewer_id, what happens when that reviewer’s account is deleted? It doesn’t seem right to delete the post, surely. And so on.
So if you need to keep reviews by a user when the user can be deleted, you have a few options:
- Use a foreign key constraint but also allow nullable foreign key values (I believe DBMSes such as postgres and mysql support this)
- Create a record representing an orphaned user and move reviews to it when the user is deleted (generally, the reviews will then show as posted by user "DELETED", though you can also enforce different ways to display deleted user associations)
- Soft delete the users (in this case you also have to exercise caution around what ex-user data may be displayed where)
There are tradeoffs with each one; soft deletion of users would likely run afoul of Right to be Forgotten laws in some places. I'd tend to favour option 1 for anything that references something like a user account where the user may be deleted, but option 2 might be useful in some situations as well.
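As a sketch of how option 1 might look on the application side, with the missing reviewer made an explicit case rather than a dangling id (names hypothetical; the comment assumes standard ON DELETE SET NULL support):

```typescript
// The column is a nullable foreign key (e.g. reviewer_id INTEGER NULL REFERENCES users(id)
// ON DELETE SET NULL), so the domain model makes "no reviewer anymore" an explicit case.
type Reviewer =
  | { kind: "user"; userId: number }
  | { kind: "deleted" };           // reviewer account no longer exists

type Post = { id: number; reviewer: Reviewer };

function reviewerLabel(post: Post): string {
  switch (post.reviewer.kind) {
    case "user":
      return `reviewed by user ${post.reviewer.userId}`;
    case "deleted":
      return "reviewed by a deleted user";
  }
}
```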
I'd generally recommend against omitting foreign key constraints when it can be assumed databases backing your application will support them.
Sure, losing the FK constraint gives you more "flexibility" (I guess?) but introduces way too many footguns. The database and tooling are a massive help in avoiding so many other bugs.
Sure, maybe one day you decide reviews can have authorship associated with entities which are not users of your system (say you've acquired republication data rights from another review service whose users are not your service's users); in that case you'd need an application-level refactor and migration. You might need to add an author table, where the author can be a user of your service, or a reference to a user of another service. Then users of your service become possible authors, and all reviews need to be migrated to have an FK relation to the author table now rather than the user table.
In any case, the FK doesn't prevent you from changing how your data is structured, and I'd argue it greatly helps you avoid mistakes as you move through the migration.