Graphical Linear Algebra (graphicallinearalgebra.net)
248 points by hyperbrainer 18h ago 19 comments
eBPF: Connecting with Container Runtimes (h0x0er.github.io)
58 points by forxtrot 15h ago 7 comments
Postgres LISTEN/NOTIFY does not scale
447 davidgu 189 7/7/2025, 2:05:06 PM recall.ai ↗
I've landed on Postgres/ClickHouse/NATS since together they handle nearly any conceivable workload managing relational, columnar, messaging/streaming very well. It is also not painful at all to use as it is lightweight and fast/easy to spin up in a simple docker compose. Postgres is of course the core and you don't always need all three but compliment each other very well imo. This has been my "go to" for a while.
However, I've been in several situations where scaling the queue brings down the database, and therefore the app, and am thus of the opinion you probably shouldn't couple these systems too tightly.
There are pros and cons, of course.
This seems like another case where Postgres gets free marketing due to companies hitting its technical limits. I get why they choose to make lemonade in these cases with an eng blog post, but this is a way too common pattern on HN. Some startup builds on Postgres then spends half their eng budget at the most critical growth time firefighting around its limits instead of scaling their business. OpenAI had a similar blog post a couple of months ago where they revealed they were probably spending more than quarter of a million a month on an Azure managed Postgres, and it had stopped scaling so they were having to slowly abandon it, where I made the same comment [1].
Postgres is a great DB for what you pay, but IMHO well capitalized blitzscaling startups shouldn't be using it. If you buy a database - and realistically most Postgres users do anyway as they're paying for a cloud managed db - then you might as well just buy a commercial DB with an integrated queue engine. I have a financial COI because I have a part time job there in the research division (on non-DB stuff), so keep that in mind, but they should just migrate to an Oracle Database. It has a queue engine called TxEQ which is implemented on top of database tables with some C code for efficient blocking polls. It scales horizontally by just adding database nodes whilst retaining ACID transactions, and you can get hosted versions of them in all the major clouds. I'm using it in a project at the moment and it's been working well. In particular the ability to dequeue a message into the same transaction that does other database writes is very useful, as is the exposed lock manager.
Beyond scaling horizontally the nice thing about TxEQ/AQ is that it's a full message queue broker with all the normal features you'd expect. Delayed messages, exception queues, queue browsing, multi-consumer etc. LISTEN/NOTIFY is barely a queue at all, really.
For startups like this, the amount of time, money and morale they are losing with all these constant stories of firefights just doesn't make sense to me. It doesn't have to be Oracle, there are other DBs that can do this too. But "We discovered X about Postgres" is a eng blog cliché by this point. You're paying $$$ to a cloud and GPU vendor anyway, just buy a database and get back to work!
[1] https://news.ycombinator.com/item?id=44074506
None of this means you have to or even should use stored procedures, triggers, or listen/notify. I'm just making the point that there is no clean separation between "data" and "business logic".
I'm unlikely to get it myself today, and by tomorrow I've probably already forgotten it :-(
You're self-aware and are writing about it, why not maintain and add it to your todo list if this is a recurring issue?
In databases where your domain is also your physical data model, coupling business logic to the database can work quite well, if the DBMS supports that.
https://medium.com/@paul_42036/entity-workflows-for-event-dr...
Then why bother with a relational database? Relations and schemas are business logic, and I'll take all the data integrity I can get.
I guess some will argue that their business logic is special and really is so tightly coupled to the data definition that it belongs in the database, and I’m not going to claim those use cases don’t exist, but I’ve seen over-coupling far more often than under-coupling.
This is why I say: Applications come and go, but data is forever.
I'm personally Code is King, and I have my reasons (like everyone else)
Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.
You can either execute SQL in your migration or use add_check_constraint.
I.e. use Kafka unless you have a explicit reason not to?
So why Nats?
It was particularly ironic because Elixir has a fantastic distribution and pubsub story thanks to distributed Erlang. That’s much more commonly used in apps now compared to 5 or so years ago when 40-50% of apps didn’t weren’t clustered. Thanks to the rise of platforms like Fly that made it easier, and the decline of Heroku that made it nearly impossible.
What did you replace them with instead?
Source: Dev at one of the companies that hit this issue with Oban
I wonder if that is fixable, or just inherent to its design.
I'm not sure if it should be salvaged?
Wasn't aware of this AccessExclusiveLock behaviour - a reminder (and shameless plug 2) of how Postgres locks interact: https://leontrolski.github.io/pglockpy.html
(Shameless plug [1]) I'm working on DBOS, where we implemented durable workflows and queues on top of Postgres. For queues, we use FOR UPDATE SKIP LOCKED for task dispatch, combined with exponential backoff and jitter to reduce contention under high load when many workers are polling the same table.
Would love to hear feedback from you and others building similar systems.
[1] https://github.com/dbos-inc/dbos-transact-py
Holding transactions open is an anti-pattern for sure, but it's occasionally useful. E.g. pg_repack keeps a transaction open while it runs, and I believe vacuum also holds an open transaction part of the time too. It's also nice if your database doesn't melt whenever this happens on accident.
I found this out the hard way when I had a simple query that suddenly got very, very slow on a table where the application would constantly do a `SELECT ... FOR UPDATE SKIP LOCKED` and then immediately delete the rows after a tiny bit of processing.
It turned out that with a nearly empty table of about 10-20k dead tuples, the planner switched to using a different index scan, and would overfetch tons of pages just to discard them, as they only contained dead tuples. What I didn't realize is that the planner statistics doesn't care about dead tuples, and ANALYZE doesn't take them into account. So the planner started to think the table was much bigger than it actually was.
It's really important for these uses cases to tweak the autovacuum settings (which can be set on a per-table basis) to be much more aggressive, so that under high load, the vacuum runs pretty much continuously.
Another option is to avoid deleting rows, but instead use a column to mark rows as complete, which together with a partial index can avoid dead tuples. There are both pros and cons; it requires doing the cleanup (and VACUUM) as a separate job.
I also found LISTEN/NOTIFY to not work well at this scale and used a polling based approach with a back off when no work was found.
Quite an interesting problem and a bit challenging to get right at scale.
Additional challenge if jobs comes in funny sizes
In my linked example, on getting the item from the queue, you immediately set the status to something that you're not polling for - does Postgres still have to skip past these tuples (even in an index) until they're vacuumed up?
- The batch size needs to be adaptative for performance, latency, and recovering smoothly after downtime.
- The polling timeouts, frequency etc the same.
- You need to avoid hysteresis.
- You want to be super careful about not disturbing the main application by placing heavy load on the database or accidentally locking tables/rows
- You likely want multiple distributed workers in case of a network partition to keep handling events
It’s hard to get right especially when the databases at the time did not support SKIP LOCKED.
In retrospect I wish I had listened to the WAL. Much easier.
https://www.pgflow.dev/concepts/how-pgflow-works
It both polls (configurable per queue) and supports listen/notify simply to inform workers that it can wake up early to trigger polling, and this can be turned off globally with a notifications=false flag.
[1]: https://github.com/tktech/chancy
Not to mention that pubsub allows multiple consumers for a single message, whereas FOR UPDATE is single consumer by design.
https://github.com/cpursley/walex?tab=readme-ov-file#walex (there's a few useful links in here)
Can’t find the function that does that, and I’ve not seen it used in the wild yet, idk if there’s gotchas
Edit: found it, it’s pg_logical_emit_message
[1] https://speakerdeck.com/gunnarmorling/ins-and-outs-of-the-ou...
[2] https://www.infoq.com/articles/wonders-of-postgres-logical-d...
[3] https://www.morling.dev/blog/mastering-postgres-replication-...
It'd be nice to have a method that would block for N seconds waiting for a new entry.
You can also use a streaming replication connection, but it often is not enabled by default.
Might be a bit tricky to get debezium to decode the logical event, not sure
pg_logical_slot_get_binary_changes returns the same entries as the replication connection. It just has no support for long-polling.
* It gives an indication of how much you need to grow before this Postgres functionality starts being a blocker.
* Folks encountering this issue—and its confusing log line—in the future will be able to find this post and quickly understand the issue.
Of course, you have the people that correctly use em-dashes, too.
For startups, Postgres is a fantastic first choice. But plan ahead: as your workload grows, you’ll likely need to migrate or augment your stack.
It’s unsurprising to me that an AI company appears to have chosen exactly the wrong tool for the job.
I like that. Sounds like a synonym for "Platform Engineering". :-)
I remember being amazed that lambda architecture was considered a kind of reference, when it looked to me more like a workaround.
We like to build IT cathedrals, until we have to run them.
There is work happening currently to make Kafka behave more like a queue: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...
SQS may have been a good "boring" choice for this?
I think it’s a reasonable assumption. Based on the second half of your comment, you clearly don’t think highly of “AI companies,” but I think that’s a separate issue.
…of course, you need dedup/support for duplicate messages on the notify stream if you do this, but that’s table stakes in a lot of messaging scenarios anyway.
This does sacrifice ordering and increases the risk of duplicates in the message stream, though.
In my experience, this means you make sure the polling solution is complete and correct, and the notifier gets reduced to a wake-up signal. This signal doesn't even need to carry the actionable change content, if the poller can already pose efficient queries for whatever "new stuff" it needs.
This approach also allows the poller to keep its own persistent cursor state if there is some stateful sequence to how it consumes the DB content. It automatically resynchronizes and the notification channel does not need to be kept in lock-step with the consumption.
That is tricky due to transactions and visibility. How do you write the poller to not miss events that were written by a long/blocked transaction? You'd have to set the poller scan to a long time (e.g. "process events that were written since now minus 5minutes") and then make sure transactions are cancelled hard before those 5minutes.
If you're not handling that, then whatever you're doing is unreliable either way.
You should split your system into specialized components: - Kafka for event transport (you're likely already doing this). - An LSM-tree DB for write-heavy structured data (eg: Cassandra) - Keep Postgres for queries that benefit from relational features in certain parts of your architecture
Recordings can and should be streamed to an object store. Parallel processes can do transcription on those objects; bonus: when they inevitably have a bug in transcription, retranscribing meetings is easy.
The output of transcription can be a single file also stored in the object store with a single completion message notification, or if they really insist on “near real-time”, a message on a queue for every N seconds. Much easier to scale your queue than your DB, eg Kafka partitions.
A handful of consumers can read those messages and insert into the DB. Benefit is you have a fixed and controllable write load into the database, and your client workload never overloads the DB because you’re buffering that with the much more distributed object store (which is way simpler than running another database engine).
Becomes a problem if you are inserting 40 items to order_items table.
Do you expect it to be faster to do the trigger logic in the application? Wouldn't be slower to execute two statements from the application (even if they are in a transaction) than to rely on triggers?
Opaque to who? If there's a piece of business logic that says "After this table's record is updated, you MUST update this other table", what advantages are there to putting that logic in the application?
When (not if) some other application updates that record you are going to have a broken database.
Some things are business constraints, and as such they should be moved into the database if at all possible. The application should never enforce constraints such as "either this column or that column is NULL, but at least one must be NULL and both must never be NULL at the same time".
Your database enforces constraints; what advantages are there to code the enforcement into every application that touches the database over simply coding the constraints into the database?
If each tenant gets an instance I would call that a “shard” but in that pattern there’s no need for cross-shard references.
Maybe in the analytics stack but that can be async and eventually consistent.
What I already know
- Unique indexes slow inserts since db has to acquire a full table lock
- Case statements in Where break query planner/optimizer and require full table scans
- Read only postgres functions should be marked as `STABLE PARALLEL SAFE`
An INSERT never results in a full table lock (as in "the lock would prevent other inserts or selects on the table)
Any expression used in the WHERE clause that isn't indexed will probably result in a Seq Scan. CASE expressions are no different than e.g. a function call regarding this.
A stable function marked as "STABLE" (or even immutable) can be optimized differently (e.g. can be "inlined"), so yes that's a good recommendation.
My other reference for a slightly different problem is https://www.thatguyfromdelhi.com/2020/12/what-postgres-sql-c...
Features that seem harmless at small scale can break everything at large scale.
However, in 2025 I'd pick Redis or MQTT for this kind of role. I'm typically in multi-lamg environments. Is there something better?
That's where we use it at my work. We have host/networking deployment pipelines that used to have up to one minute latency on each step because each was ran on a one-minute cron. A short python script/service that handled the LISTENing + adding NOTIFYs when the next step was ready removed the latency and we'll never do enough for the load on the db to matter
1) the Postgres documentation does not mention that Notify causes a global lock or lock of any sort (I checked). That’s crazy to me; if something causes a lock, the documentation should tell you it does and what kind. Performance notes also belong in documentation for dbs.
2) why the hell does notify require a lock in the first place? Reading the comment this design seems insane; there’s no good reason to queue up notifications for transactions that aren’t committed. Just add the notifications in commit order with no lock, you’re building a db with concurrency, get used to it.
The post author is too focused on using NOTIFY in only one way.
This post fails to explain WHY they are sending a NOTIFY. Not much use telling us what doesn’t work without telling us the actual business goal.
It’s crazy to send a notify for every transaction, they should be debounced/grouped.
The point of a NOTIFY is to let some other system know something has changed. Don’t do it every transaction.
Like if it needs to be very consistent I would use an unlogged table (since we're worried about "scale" here) and then `FOR UPDATE SKIP LOCKED` like others have mentioned. Otherwise what exactly is notify doing that can't be done after the first transaction?
Edit: in-fact, how can they send an HTTP call for something and not be able to do a `NOTIFY` after as well?
One possible way I could understand what they wrote is that somewhere in their code, within the same transaction, there are notifies which conditionally trigger and it would be difficult to know which ones to notify again in another transaction after the fact. But they must know enough to make the HTTP call, so why not NOTIFY?
They’re using it wrong and blaming Postgres.
Instead they should use Postgres properly and architect their system to match how Postgres works.
There’s correct ways to notify external systems of events via NOTIFY, they should use them.
What were the TPS numbers? What was the workload like? How big is the difference in %?
''' When a NOTIFY query is issued during a transaction, it acquires a global lock on the entire database (ref) during the commit phase of the transaction, effectively serializing all commits. '''
Am I missing something - this seems like something the original authors of the system should have done due diligence on before implementing a write heavy work load.
The documentation doesn’t mention any caveats in this direction, and they had 3 periods of downtime in 4 days, so I don’t think it’s a given that testing would have hit this problem.
cool writeup!
Use LISTEN/NOTIFY. You will get a lot of utility out of it before you’re anywhere close to these problems.
I feel like somebody needs to write a book on system architecture for Gen Z that's just filled with memes. A funny cat pic telling people not to use the wrong tool will probably make more of an impact than an old fogey in a comment section wagging his finger.
Databases can do a lot of stuff, and if you're not hurting for DB performance it can be a good idea to just... do it in the database. The advantage is that, if the DB does it, you're much less likely to break things. Putting data constraints in application code can be done, but then you're just waiting for the day those constraints are broken.
The people who design it walk away after a few years, so they don't give a crap what happens. The rest of us have to struggle to support or try to replace whatever the lumbering monstrosity is.
You'd have to at least accompany your memes with empirics. What is write-heavy? A number you might hit if your startup succeeds with thousands of concurrent users on your v1 naive implementation?
Else you just get another repeat of everyone cargo-culting Mongo because they heard that Postgres wasn't web scale for their app with 0 users.
Maybe I missed it in some folded up embedded content, or some graph (or maybe I'm probably just blind...), but is it mentioned at which point they started running into issues? The quoted bit about "10s of thousands of simultaneous writers" is all I can find.
What is the qualitative and quantitative nature of relevant workloads? Depending on the answers, some people may not care.
I asked ChatGPT to research it and this is the executive summary:
Maybe you also don't know what ChatGPT Research is (the Enterprise version, if you really need to know), or what Executive Summary implies, but here's a snippet of the 28 sources used:
https://imgur.com/a/eMdkjAh