This is interesting! A field being nullable because it's legitimately optional in the domain model is one thing, but for new fields which shouldn't be nullable in the domain model, unless you can pick a reasonable identity value, you need a concept of absence that's different from null. Luckily the intersection of "non-nullable fields" and "fields with no reasonable identity value" and "fields which didn't exist in v1 of the domain model" is normally pretty small, but it's painful when it happens.
This reminds me of frozen/nonfrozen enums in Swift. You can do exhaustive case analysis on frozen enums, but case analysis on nonfrozen enums requires adding an `@unknown default` case.
https://docs.swift.org/swift-book/documentation/the-swift-pr...
How would the database know whether the other app layers depend on that value or not? You could absolutely have an app that does not require data in a specific field to function, yet all records happen to have data. This is actually fairly common in single-tenant apps, where some tenants populate a field and others do not. You need to look at how the data is used across the entire stack to know whether or not it should be nullable, not whatever the current data happens to be.
cerved · 10h ago
It doesn't. That's why it's the responsibility of the application layer to correctly implement the data model of the database and not the other way around
deepsun · 10h ago
I've seen worse. Some teams use JSON for their data. Not only can each field be missing (aka NULL), it can also be "null". Or a different type.
I envy your team, whose only mistake is forgetting to set NULLABLE. Rainbows and unicorns ;)
toledocavani · 1h ago
I worked on an enterprise project where they used a BPMN engine for all business logic (designed by non-technical people).
Each process can either have predefined fields (which take time to implement) or a string key-value store (yes, they chose this).
Either the BPMN logic, or some JS snippets (some, as not all of those people can code) in BPMN nodes, or some Java hooks (a lot, written at those other people's request) edit those values.
So when something goes wrong, we swim in a sea of key-values, searching all the code for who updates this field and why its value is neither JSON nor null nor boolean but the string "Not used, waiting for policy 1234".
After that project, I'm happy every time I can use a DB to store data, more so if it can run locally, and much more so if it's relational.
thomas-st · 10h ago
That's just the JSON equivalent of "we have data, and it's null" vs "data is missing", and consistency could be enforced using a constraint or by making it non-NULL in the first place.
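If that JSON lives in something like a Postgres jsonb column (an assumption on my part; the table and key names are made up), that kind of consistency can be spelled out directly as a constraint:

    -- Require the key to be present and to hold a JSON string,
    -- ruling out "missing", an explicit JSON null, and wrong types.
    ALTER TABLE events ADD CONSTRAINT payload_user_id_is_string
      CHECK (payload ? 'user_id' AND jsonb_typeof(payload->'user_id') = 'string');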
It's more common in string fields, which in many cases just get rendered on a web form that doesn't differentiate blank (empty string) from the null state, so in the database we should in most cases set it up as follows (see the sketch after the list):
- Value required: field should be non-NULL + at least length 1 (via check constraint)
- Value optional: either the field is non-NULL (and empty string means absent), or the field is nullable + at least length 1 (via check constraint)
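A minimal sketch of those two setups in Postgres (table and column names are made up):

    -- Value required: NOT NULL plus a "not blank" check
    ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
    ALTER TABLE users ADD CONSTRAINT display_name_not_blank
      CHECK (length(display_name) >= 1);

    -- Value optional, nullable variant: NULL means absent, '' is never stored
    ALTER TABLE users ADD CONSTRAINT nickname_not_blank
      CHECK (nickname IS NULL OR length(nickname) >= 1);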
I'm curious if you prefer to store optional strings as non-NULL and rely on the length, or as nullable and have a length constraint.
lock1 · 10h ago
What if almost everything is NULLABLE, including the supposed primary key and foreign keys of the table?
I've had firsthand experience building a consumer for that kind of DB and it's hell to get anything running correctly without literally writing a dedicated layer to sanity-check everything.
gopher_space · 6h ago
> without literally writing a dedicated layer to sanity check everything
It's not attractive to developers for a variety of reasons, but encapsulation is generally cheaper (in every sense) than the alternatives by orders of magnitude. If a system is hard to alter and other people rely on it to do their jobs, thinking about it as set in stone during my planning process will save me grief and heartache.
Starting with an independent client in mind makes it easier to plan around things like reconstructing tables from composed exports or dealing with authoritatively incorrect data. It's a good collection point for hard-coded edge case handling, and a great location for all of your discovery notes.
phplovesong · 9h ago
Sounds like a bad domain, and/or poor design.
antif · 4h ago
No shame on JSON... XML can do this too!
mdavid626 · 11h ago
I completely agree. Unfortunately, this is one of those things that’s hard to convince people of. You often hear: ‘It doesn’t matter,’ ‘It works fine without it,’ or ‘Why overcomplicate things?’—but that mindset can be limiting.
danbruc · 11h ago
A column that is nullable but never null might indicate that it should be non-nullable but does not necessarily imply so. Say you have an optional comment field, it might just happen by accident that a comment was recorded for each row, but that of course becomes increasingly unlikely with each additional row in the table. There is probably no harm in checking your database for such columns, especially in tables with many rows, but in the end it is up to the semantics of the data model whether a column should be nullable or not. The absence of NULLs is an indicator but not a determiner.
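A quick way to eyeball this for a single column (hypothetical table and column names; count(comment) only counts non-NULL values):

    SELECT count(*) AS total_rows,
           count(comment) AS with_comment,
           count(*) - count(comment) AS null_comments
    FROM tickets;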
Szpadel · 11h ago
There is also another scenario: a field might obviously look like it should have a value, and a check might even show that it always has one, but it might be a "lazy" value.
E.g. you might have some big CSV uploaded and a column for the number of rows in it; your app could insert the record without this number and an async process would fill it in later.
There might even be some corner case where a null value is possible.
I believe the solution here isn't to check which fields never use null, but to improve the process of creating such migrations: either create a second ticket for the next release to update the db, or commit the new migration to some kind of "next" branch.
marcosdumay · 10h ago
> There is probably no harm in checking your database for such columns
The harm is the same as any other unreliable linter rule.
Each such rule on its own is almost harmless. But in most places that use that kind of rule, taken together they are extremely harmful.
danbruc · 9h ago
With that I agree: once you start to treat the output of any linter as the truth, you are actively lowering the quality of your codebase. At least if it already has reasonably good quality; if it is in bad shape, blindly following linters might still increase code quality, you will just not reach the peak.
tudorg · 8h ago
For a tooling solution to this problem, and many others: pgroll (https://github.com/xataio/pgroll) automates the steps from the blog post in a single higher-level operation. It can do things like add a hidden column, backfill it with data, add the constraint, and only then expose it in the new schema.
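For context, the manual Postgres sequence that kind of tool wraps up looks roughly like this (my own sketch, not pgroll's actual migration format; table and column names are made up):

    ALTER TABLE users ADD COLUMN tier text;                    -- nullable first, no rewrite
    UPDATE users SET tier = 'free' WHERE tier IS NULL;         -- backfill (in batches on big tables)
    ALTER TABLE users ADD CONSTRAINT tier_not_null
      CHECK (tier IS NOT NULL) NOT VALID;                      -- cheap, only checked for new writes
    ALTER TABLE users VALIDATE CONSTRAINT tier_not_null;       -- full scan, but doesn't block writes
    ALTER TABLE users ALTER COLUMN tier SET NOT NULL;          -- PG 12+ can use the validated check to skip the scan
    ALTER TABLE users DROP CONSTRAINT tier_not_null;           -- the check is now redundant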
minkeymaniac · 7h ago
If you don't care about old data having null, you could add a check constraint WITH NOCHECK (this is SQL Server, fwiw). For example:

    create table foo (id int)
    insert foo values (1), (2), (3)
    insert foo values (null)
    select * from foo
    -- id: 1, 2, 3, NULL

    ALTER TABLE foo WITH NOCHECK
    ADD CONSTRAINT CheckNotnull CHECK (id IS NOT NULL)

    insert foo values (null)

    Msg 547, Level 16, State 0, Line 13
    The INSERT statement conflicted with the CHECK constraint "CheckNotnull". The conflict occurred in database "tempdb", table "dbo.foo", column 'id'.
    The statement has been terminated.

However, be aware that if you update an existing value to NULL, you will still get the error:

    update foo set id = null where id = 2

    Msg 547, Level 16, State 0, Line 20
    The UPDATE statement conflicted with the CHECK constraint "CheckNotnull". The conflict occurred in database "tempdb", table "dbo.foo", column 'id'.
PeterZaitsev · 10h ago
Shouldn't we look for the database to be able to do online, efficient, non-locking addition of a column with any default value, not just NULL, rather than having the application implement complicated and fragile logic?
avg_dev · 9h ago
I believe PostgreSQL has done this since v11, which was released in 2018 (current is v17):
> Many other useful performance improvements, including the ability to avoid a table rewrite for ALTER TABLE ... ADD COLUMN with a non-null column default
https://www.postgresql.org/docs/release/11.0/
I think there is some restriction there, like the default can't be "volatile" - I can't remember the precise definition here, but I think current_timestamp would be volatile, while any static value would not.
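For instance (hypothetical table, and assuming a reasonably recent Postgres), adding a NOT NULL column with a constant default is a metadata-only change with no table rewrite:

    ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';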
tudorg · 8h ago
That is correct, for non-volatile default values Postgres is quick, which means that it is generally a safe operation.
Also interesting, `now()` is non-volatile because it's defined as "start of the transaction". So if you add a column with `DEFAULT now()` all rows will get the same value. But `timeofday()` is not volatile, so `DEFAULT timeofday()` is going to lock the table for a long time. A bit of a subtle gotcha.
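If you're unsure which bucket a function falls into, the standard catalog will tell you ('i' = immutable, 's' = stable, 'v' = volatile):

    SELECT proname, provolatile
    FROM pg_proc
    WHERE proname IN ('now', 'timeofday');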
avg_dev · 7h ago
Thanks for the info. One minor point:
> But `timeofday()` is not volatile, so `DEFAULT timeofday()` is going to lock the table for a long time.
Perhaps the “not” was a typo?
lblume · 11h ago
> But a field that is nullable in the schema and never null in practice is a silent lie.
This seems to be the central claim. But just as lies require intent, so does database design to some degree.
A column that is nullable but never null does not conclusively say anything, really. That's like saying a birthday column that is never before 1970 in the current data should be restricted to years after that date.
A nullable column signals that the data may be left empty, which is entirely different from that column actually being empty in the current data. Is-ought distinction: the potentiality of null ("nullability") is not to be confused with the actuality of current vacancy ("null values"). The programmer extracting data from the database must check for null if the column is nullable, not just if it is null now, as a principle of robust, fault-tolerant code.
jerf · 11h ago
I don't think the author is talking generally about fields that could be NULL but just happen to never be so in the production DB. The piece is specifically in the context of a new database field that is fully intended by its designer to be NOT NULL, which was NULL only for migration purposes, and which was never updated to be NOT NULL once the migration is complete. The point was not meant to be extended beyond that.
One could write a separate piece about maybe using that as a clue that the field could be NOT NULL'd in the future but that's not what this post is.
lblume · 11h ago
In that case we read it entirely differently — the last paragraph explicitly mentions inferring "Nullable fields that never hold nulls are often the result of incomplete migrations or forgotten cleanup steps". The author also proposes a script "to identify cases where the field could safely be changed to non-nullable". But alas, we cannot do that with full generality, due to there being a big difference in intent that cannot be captured purely by counting nulls, and surely not by only calculating percentages.
dataflow · 11h ago
> That's like saying a birthday column that is never before 1970 in the current data should be restricted to years after that date.
No it's not, because they specifically started with the premise that the field was initially intentionally non-null and was only temporarily made nullable for migration purposes. That is obviously not the situation you are describing, right?
darkwater · 11h ago
I think their point is that, for all intents and purposes, the column IS not nullable. It's nullable as an artifact of making live schema migration easier, with no blocking/downtime. But as far as the data model is concerned, it should not be nullable.
lblume · 11h ago
Sure, if one just leaves a column nullable due to negligence, one should check the actual requirements and enforce them to make invalid states unrepresentable. The author still makes too strong a claim that becomes detached from the migration aspect, insinuating that one can just empirically test the database to check whether this is the case, with which I disagree.
matsemann · 11h ago
The problem is you end up in other places with the assumption that it's never null. So in the future, when you actually set it to null somewhere, it will blow up.
tuyiown · 11h ago
People that assume that a (nullable) value is never null because the sample at hand does not contain a null value _will_ learn their lesson the hard way, the one you describe.
But even non-nullable does not always resist time. I'd argue that use cases where the field _has_ to be null eventually emerge and somehow have to be mitigated. There is no easy solution to safely work around that without either tons of work that duplicates lots of things or taking risks by adapting the code base.
Trigg3r · 10h ago
Am I missing something here in my (MS) SQL world? If a new field is added as null, I do that to the (now) 20-year-old system so we don't break 100s of stored procs - any (new) code that needs that field has to check for it being null...
minkeymaniac · 7h ago
In SQL Server, you can simply add a NOT NULL check constraint WITH NOCHECK (see my other comment for the full code):

    ALTER TABLE foo WITH NOCHECK ADD CONSTRAINT CheckNotnull CHECK (id IS NOT NULL)

Any new values coming in cannot be null, but the values already in the table with null are fine... then you can update them to non-null over time.
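Once the old rows have been backfilled, the constraint can be traded for a real NOT NULL (a sketch; the backfill value is a placeholder):

    UPDATE foo SET id = 0 WHERE id IS NULL;          -- replace 0 with whatever backfill makes sense
    ALTER TABLE foo ALTER COLUMN id int NOT NULL;    -- now the real NOT NULL is possible
    ALTER TABLE foo DROP CONSTRAINT CheckNotnull;    -- the NOCHECK constraint is now redundant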
jonny_eh · 8h ago
I recommend using checklists for schema changes like this.
comrade1234 · 8h ago
I've never seen step 5 happen...
HocusLocus · 9h ago
I'm glad to see people discussing [zero,infinitesimals,false,emptystring,isnull,unpopulated,missing] as if each one is a thing. They've always been things! We've just been buried in compromises and shortcuts all these years.
There should also be a [the-asteroid-has-hit-y'all-are-so-stupid] and global data systems should just pass that around after impact until the power goes out for good.