Poor man's bitemporal data system in SQLite and Clojure

108 points by adityaathalye | 23 comments | 9/3/2025, 5:47:50 PM | evalapply.org ↗

Comments (23)

moomin · 5h ago
It's a pity that Clojure is kind of a hermetic space these days, because the concept of bitemporality really deserves much more attention. It's amazing how often you want to know "What was the P&L for March using the data available on Apr 4?" and how uncommon it is to find a database design that supports that kind of query.
0_gravitas · 1h ago
'Hermetic' is an interesting word-choice, considering Clojure has a habit/raison d'être of attaching to other host languages/ecosystems, e.g. Jank (C++), ClojureScript (JS), Basilisp (Python), Phel (PHP), Babashka (general scripting), and of course, Clojure itself on the JVM.
necubi · 1h ago
It's not particularly rare in analytical databases/query engines, nearly all of which support AsOf joins these days (see the sketch after this list), including

  * Clickhouse
  * DuckDB
  * BigQuery
  * Snowflake
  * QuestDB
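
For example, a minimal sketch of DuckDB's flavor (table and column names are hypothetical):

  -- For each trade, join the latest price at or before the
  -- trade's timestamp (DuckDB ASOF JOIN):
  SELECT t.symbol, t.ts, t.qty, p.price
  FROM trades t
  ASOF JOIN prices p
    ON t.symbol = p.symbol
   AND t.ts >= p.ts;
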
lemonberry · 2h ago
"It's a pity that Clojure is kind of a hermetic space these days"

If you don't mind sharing, I'm curious why you feel this way.

galkk · 3h ago
Spanner does that with ease (I worked there, so I’m clearly biased).

https://cloud.google.com/spanner/docs/timestamp-bounds#exact...

refset · 2h ago
That only covers the 'transaction time' axis though? And the page says retention is limited to 1 week. No doubt useful for some things, but probably not for end-user reporting requirements.
nickpeterson · 4h ago
I don’t understand footnote 16? It sounds like he’s saying he’s CEO of htmx and datastar?
adityaathalye · 4h ago
Yes... This is a running joke in both communities. IYKYK :)
nickpeterson · 4h ago
Oh ok, I’m out of the loop on it ;) thought it was some sort of ai hallucination.
adityaathalye · 4h ago
All eleven thousand words were hand-typed. No AIs are used or abused in the making of any of my blog posts. (Because I write to think, because writing is nature's way of showing me how sloppy my thinking is. So it goes...)
lemonberry · 3h ago
"Because I write to think"

Hallelujah!!

andersmurphy · 4h ago
Some of us keep htmx around as a memento. Just for old times' sake.
whalesalad · 4h ago
I've been absolutely waist-deep in a bitemporal system on top of PostgreSQL using tstzrange fields. We manage an enormous portfolio of hundreds of thousands of domain names. Every time our core db is modified, before/after states are emitted to a change table. We've been doing this since 2022. Those changes get lightly transformed via trigger into a time-travel record, with the valid from/to range and a GiST index to make asking questions about the state of the world at a particular point in time easy. For perspective, our change table has 90M rows.

All of it works quite well and is decently performant. We can ask questions like: how many domains did we own on March 13th, 2024? Or look at the entire lifecycle of a domain's ownership (owned, released, re-acquired, transferred, etc).
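
For illustration, a minimal sketch of the shape described (PostgreSQL; table and column names are hypothetical, not our real schema):

  -- One row per version of a domain's tracked state:
  CREATE TABLE domain_history (
    domain_id   bigint    NOT NULL,
    state       jsonb     NOT NULL,  -- snapshot of the tracked columns
    valid_range tstzrange NOT NULL   -- [valid_from, valid_to)
  );

  -- GiST index so range-containment lookups stay fast:
  CREATE INDEX domain_history_valid_idx
    ON domain_history USING gist (valid_range);

  -- "How many domains did we own on March 13th, 2024?"
  SELECT count(DISTINCT domain_id)
  FROM domain_history
  WHERE valid_range @> timestamptz '2024-03-13';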

The big challenge and core issue we discovered, though, is that our data sucks. QAing this new capability has been a moving target. Tons of mistakes over time that were partially undone or manually undone without a proper audit trail. Ghost records. Rapid changes by our bulk editor tool (a->b->a->b) that need to get squashed into just a->b. The schema of our database has evolved over time, too, which has made it tough to view a consistent representation of things even when the fields storing that data were renamed. When the system was first introduced, we had ~5 columns to track. Now we have over 30.

Suffice it to say, if I were to do things over again, I would implement a much better change-tracking system that bakes in tools to clean/erase/undo/soft-delete/hard-delete mistakes, so that future me (now) wouldn't have so many edge cases to deal with in this time-traveling system. I'd also like to make the change tracking capable of time travel itself, versus building that as a bolt-on side table that tracks and works from the change table. Transitioning to an EAV (entity-attr-value) approach is on my spike list, too. It makes it easier to reduce (key, val) tuples down into an up-to-date representation, versus looking at diffs of before/after.

Really interesting stuff. I learned a lot about this from Clojure/Datomic and think it's quite neat that so many Clojurists are interested in and tackling this problem. As the author notes in this post, XTDB is another one.

adityaathalye · 4h ago
The XTDB people did a bunch of interviews with people doing stuff in the temporal data systems universe.

Several are out on YouTube. One with Richard Snodgrass is yet to be published... waiting for that eagerly!

ref. topics and links to videos: https://github.com/orgs/xtdb/discussions/4419

  "Decades in the making: Bitemporality in SQL, with Rob Squire"

  "As-Of joins and database architecture, with Arjun Narayan (co-founded Materialize)"

  "Building an identity service on XTDB, with Andrew Maddock (Foundry OS)"  

  "Solving FinTech pains with Suresh Nageswaran (Senior Director, Grid Dynamics)"

  "A meeting with the one and only Richard T. Snodgrass."

  "Building a payments integration system on XTDB, with Edward Hughes"
refset · 4h ago
> waiting for that eagerly!

Wait no longer! I just updated the page with the unlisted video link: https://youtu.be/6Q_pAI20QPA

We are hoping to record another (more polished) session with Professor Snodgrass soon :)

e1g · 3h ago
> clean/erase/undo/soft-delete/hard-delete mistakes[...] make the change tracking capable of time travel itself [...] Transitioning to an EAV

I just finished building out all of that + more (e.g., data lineage, multi-verse, local overrides, etc), also on PG. Reach out if you want to chat and get nerd sniped!

adityaathalye · 4h ago
For my little system (blog author here) I've decided that all tables will be append-only logs of facts. In the post, I tried keeping a traditional schema alongside a "main" facts table. Of course, the problem of audit logs comes up for any change made to traditional "current-database-view" tables. And then who audit-logs the audit log?

I feel like "two systems" is the problem.

Writes should either be "traditional schema -> standard log of all changes", OR, should be "everything is a log", where the system keeps the "current" view updated: just a special case of the "as of" query, where "as of" is always "now".
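
Roughly, that "as of" query shape (SQLite sketch; column names follow the world_facts view refset quotes downthread, :as_of is a bound parameter):

  -- Latest fact per (entity, attribute) visible at :as_of;
  -- bind :as_of to "now" and you get the "current" view.
  SELECT e, a, v
  FROM (
    SELECT *,
      ROW_NUMBER() OVER (
        PARTITION BY e, a
        ORDER BY valid_time DESC, txn_time DESC
      ) AS row_num
    FROM world_facts
    WHERE valid_time <= :as_of
  )
  WHERE row_num = 1
    AND assert = 1;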

How badly my schema will behave (in my SQLite-based architecture) remains to be discovered. I will hopefully be in a position to run a reasonable benchmark next week.

A follow-up blog post is likely :sweat-smile:

whalesalad · 3h ago
> OR, should be "everything is a log",

100%. This is a regret that I have in our lifecycle tracking. We are effectively updating a giant spreadsheet (table) and tracking side effects of those updates. I would much rather work in a log/event-based system, kinda flipping that on its head, where we track events like domain_renewed or domain_transferred.

As it stands, we track that the renewed_at attribute changed, and have to infer (along with other fields) things like "were you a renewal event?", which has changed meaning over time.
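
Something like this hypothetical event log (names invented for illustration) would make the intent explicit instead of inferred:

  -- Append-only domain lifecycle events (PostgreSQL):
  CREATE TABLE domain_events (
    domain_id   text        NOT NULL,
    event_type  text        NOT NULL,  -- e.g. 'domain_renewed', 'domain_transferred'
    occurred_at timestamptz NOT NULL,
    payload     jsonb                  -- event-specific details
  );

  -- "Were you a renewal event?" becomes a direct question:
  SELECT count(*)
  FROM domain_events
  WHERE event_type = 'domain_renewed';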

Alas, change tracking wasn't even part of the spec of the original feature... so I am glad I had the foresight to institute change tracking from 2022, as imperfect as it currently is.

adityaathalye · 3h ago
So much this!

Straight-up copy-paste from my development notes...

* what pivots in current db vs temporal db?

What is the difference between these pictures?

#+BEGIN_SRC text

  ("Current" DB)

   CRUD records
         ^
         |
         v
  [ current view ] -- update --> [ facts log ]

  ---------------------------------------------------------

  [ current view ] <-- update -- [ facts log ]
         |                             ^
         v                             |
    READ records             WRITE records (as facts)

  ("Temporal" DB)
#+END_SRC

- Hint: It is /not/ the schema. It is /what/ "pivots".

- In both cases the current view can be understood to be a pivot table of the facts log.

- BUT in the current DB, we must "pivot" the /process/, i.e. take a CRUD op and transform it into an audit log. This /must/ be done synchronously, in real time. Whereas in the Temporal DB's case, we must "pivot" the stored data, which we can do at any point in query time, as of any point in time of the log.

- The complexity of Current DBs derives from /live process management/ challenges. Whereas the complexity of Temporal DBs derives from /retroactive process management/ challenges.

/now/ is /never/ cheap. It is the most expensive non-fungible thing. Once it's gone, it's gone. Fail to transact an audit trail for the CRUD operation? Too bad. Better luck next time. Whereas disk space is cheap and practically infinite, which affords Temporal DBs greater opportunity to find a better trade-off between essential complexity and DB capabilities. At least as long as disk space remains plentiful and cheap.

This is why, if we are modeling a Temporal DB over a Current DB, it is preferable to write all tables as /fact/ tables and query their auto-generated 'current view' versions for normal query needs. For audit / analysis needs, we can snapshot the facts tables and operate on those out-of-band (detached from the live app). Impedance mismatch occurs when trying to run /both/ current CRUD tables (writing to audit logs) for some parts of the schema, in parallel with a "main" facts table for all fact-records. In a given data system, it is better to do one or the other, not both at the same time.
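
For concreteness, a minimal sketch of such a facts table (SQLite; the column list mirrors the world_facts view quoted below, but the exact types and constraints are my assumption):

  -- Append-only log of facts; every write is an INSERT:
  CREATE TABLE IF NOT EXISTS world_facts (
    txn_id          INTEGER NOT NULL,  -- monotonic transaction id
    txn_time        TEXT    NOT NULL,  -- transaction time: when we recorded it
    valid_time      TEXT    NOT NULL,  -- valid time: when it is true in the world
    valid_preferred TEXT    NOT NULL,  -- tie-breaker among competing valid times
    e               TEXT    NOT NULL,  -- entity
    a               TEXT    NOT NULL,  -- attribute
    v,                                 -- value (any type affinity)
    assert          INTEGER NOT NULL,  -- 1 = assertion, 0 = retraction
    ns_user_ref     TEXT,              -- who/what asserted the fact
    fact_meta       TEXT               -- provenance / metadata
  );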

mamcx · 3h ago
The problem with `everything is a log` is that it is very undisciplined, whereas a trigger from the actual main table has these very serious advantages:

* Your main table has the correct shape, and the derived log does too!
* MOST of the queries are to that table, and for "now"

refset · 5h ago
tl;dr

  CREATE VIEW IF NOT EXISTS world_facts_as_of_now AS
  SELECT
    rowid, txn_time, valid_time,
    e, a, v, ns_user_ref, fact_meta
  FROM (
    SELECT *,
      ROW_NUMBER() OVER (
        PARTITION BY e, a                           -- one winner per entity-attribute pair
        ORDER BY valid_preferred DESC, txn_id DESC  -- latest valid time, then latest txn
      ) AS row_num
    FROM world_facts
  ) sub
  WHERE row_num = 1
    AND assert = 1                                  -- keep assertions, drop retractions
  ORDER BY rowid ASC;
...cool approach, but poor query optimizer!

It would be interesting to see what Turso's (SQLite fork) recent DBSP-based Incremental View Maintenance capability [0] would make of a view like this.

[0] https://github.com/tursodatabase/turso/tree/main/core/increm...

adityaathalye · 5h ago
It is a poor man's database, after all :)

I really need to complete this thing and run some data through it... like, how poor is poor really? Can it be just enough for me to make a getaway with smol SaaS apps?

thom · 3h ago
Perhaps because of the domains I work in I continue to find bitemporality completely uninteresting. Stop trying to make fetch-as-of happen.