Matrix.org – Database Incident

7 yabones 9 9/2/2025, 7:13:51 PM status.matrix.org

Comments (9)

yabones · 3h ago
> So: the matrix.org database secondary lost its FS due to a RAID failure earlier today (11:17 UTC). Then, we lost the primary at 17:26. We're trying to restore the primary DB FS (which could be fastish), while also doing a point-in-time backup restore from last night (which takes >10h). We believe the incremental DB traffic since last night is intact however. Apologies for the downtime; folks on their own homeserver are of course not impacted.

The stuff of absolute nightmares...

https://mastodon.matrix.org/@matrix/115136245785561439

Bender · 30m ago
10 hours seems like a long time for a db restore of a chat server. Matrix is still just a chat server, right? I have so many questions that maybe I should keep my nose out of.

[Edit] From another comment, 55TB?!? Holy wat-man...

q3k · 4h ago
> We are in the process of restoring the matrix.org database from a backup. The matrix.org homeserver will be offline until this has been completed

whoops

jMyles · 56m ago
Love y'all and love Matrix. Thanks for the free matrix.org server. But maybe now is the time to research setting up a homeserver for mission-critical stuff?

I don't immediately see an official doc on this; is it right under my nose?

Is this doc good? https://www.redpill-linpro.com/techblog/2025/04/08/matrix-ba...

Arathorn · 44m ago
Matrix.org itself doesn't publish an 'official' way to run a server, given there are multiple implementations and distros out there.

If you're happy using Kubernetes, https://element.io/server-suite/community should be a good bet (or https://element.io/server-suite/pro if you are actually doing mission-critical stuff and want a version professionally supported by Element).

If you're happy using docker-compose, then https://github.com/element-hq/element-docker-demo is a very simple template for getting going.

Alternatively, https://github.com/spantaleev/matrix-docker-ansible-deploy is quite popular as a 3rd-party distro using Ansible-managed Docker containers.

Sorry all for the downtime on matrix.org - we're having to do a full 55TB db restore from backup which will take ~17 hours to run. :|

yabones · 4h ago
Best of luck to the team at Matrix/Element for restoring from a nasty outage.

mostlyk · 4h ago
Hope this gets done sooner, horrible outage

q3k · 4h ago
Now would be a good time to migrate to a different homeserver :).

bakugo · 42m ago
Looks like it's going to take a while to come back up.

> Sorry, but it's bad news: we haven't been able to restore the DB primary filesystem to a state we're confident in running as a primary (especially given our experiences with slow-burning postgres db corruption). So we're having to do a full 55TB DB snapshot restore from last night, which will take >10h to recover the data, and then >4h to actually restore, and then >3h to catch up on missing traffic.

https://mastodon.matrix.org/@matrix/115136866878237078
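
For anyone wondering how the quoted figures fit together, here is a rough back-of-the-envelope sketch in Python. It only uses numbers from the status updates above (55TB, >10h, >4h, >3h). The phase labels and function names are made up for illustration, and it assumes the whole 55TB is moved during the >10h recovery phase, which the updates don't actually confirm.

```python
# Rough arithmetic on the figures quoted in the status updates.
# All durations are lower bounds (">10h", ">4h", ">3h"), so the total is
# a lower bound too. Phase labels and function names are illustrative only.

DB_SIZE_TB = 55  # snapshot size quoted in the update

PHASE_HOURS = {
    "recover data from backup": 10,
    "restore": 4,
    "catch up on missing traffic": 3,
}


def total_hours(phases):
    """Sum the per-phase lower bounds to get a lower bound on the total."""
    return sum(phases.values())


def implied_throughput_gb_per_s(size_tb, hours):
    """Average transfer rate in GB/s, using decimal units (1 TB = 1000 GB).

    ASSUMPTION: the full snapshot is transferred within `hours`.
    """
    return size_tb * 1000 / (hours * 3600)


if __name__ == "__main__":
    print(f"total: at least {total_hours(PHASE_HOURS)} hours")
    rate = implied_throughput_gb_per_s(DB_SIZE_TB, PHASE_HOURS["recover data from backup"])
    print(f"recovery phase: at most about {rate:.2f} GB/s on average")
```

The sum comes to 17 hours, which lines up with the "~17 hours" mentioned elsewhere in the thread. Since the recovery phase is quoted as more than 10 hours, the computed rate is a ceiling on the average under the assumption above, not a measured number.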