The Sad, Sad World of Tech Blogging During an Era of Technological Stagnation (freddiedeboer.substack.com)

4 points by PaulHoule 35m ago 1 comments

Orange Pi RV2 $40 RISC-V SBC: Friendly Gateway to IoT and AI Projects (riscv.org)

98 points by warrenm 3d ago 102 comments

Hypervisor 101 in Rust (tandasat.github.io)

159 points by pykello 18h ago 16 comments

M7.8 Earthquake – Kamchatka, Russia (earthquake.usgs.gov)

5 points by david927 19m ago 1 comments

Stepping Down as Libxml2 Maintainer (discourse.gnome.org)

161 points by zdw 19h ago 87 comments

Collaborative Text Editing Without CRDTs or OT

284 samwillis 79 5/21/2025, 5:05:45 PM mattweidner.com ↗

Comments (79)

Animats · 119d ago

That is very neat. The algorithm:

- Label each text character with a globally unique ID (e.g., a UUID), so that we can refer to it in a consistent way across time - instead of using an array index that changes constantly.

- Clients send the server “insert after” operations that reference an existing ID. The server looks up the target ID and inserts the new characters immediately after it.

- Deletion hides a character for display purposes, but it is still kept for "insert after" position purposes.

This might have potential outside text editing. Game world synchronization, maybe.

jahewson · 119d ago

This is literally a degenerate CRDT. Central server for tie-breaking goes back to Google Wave.

bikeshaving · 119d ago

Any algorithm that uses a central server as a tie-breaker could easily be replaced by one where client ids are used for the tie-breaker.

jonstokes · 119d ago

If you used UUIDv7 you get time-ordered UUID and could use that for a tie breaker.

yanowitz · 119d ago

Don’t you have to be confident the clocks are sufficiently synced across the system?

boxed · 119d ago

And you know.. simultaneity is not a thing in reality...

ChadNauseam · 119d ago

It's even less of a thing in distributed systems, but you have to pick something to show the user and clocks are often good enough :D

eiginn · 119d ago

I miss Wave a lot, very quirky in a good way imo. We ran a few D&D games over it. RIP

yubblegum · 119d ago

Is this really that novel? I mean using a central process for serializing a distributed system is like a no brainer -- didn't we start off from here originally? -- until you have to worry about network partitions, and CAP and all that jazz. You also now have a single point of failure. Also I skimmed the thing but was performance discussed?

motorest · 118d ago

> Is this really that novel? I mean using a central process for serializing a distributed system is like a no brainer -- didn't we start off from here originally?

Yes. This article reads like the adage "a month in a lab saves you an hour in a library".

bongodongobob · 119d ago

Yeah I have the same question. I'm not familiar with the problem space but this seems like my naive first idea so I'm wondering what the catch is.

mweidner · 119d ago

As the author, same.

My best guess is:

- Central-server collaborative editing work focuses on Operational Transformation (OT), likely due to inertia (studied since 1989) and the perception that storing an ID per character is inefficient. In fairness, it is, absent the optimizations introduced by RGASplit and Yjs (~2015).

- For decentralized editing, OT is very complicated, and CRDTs took over as the solution of interest (studied since 2005). Storing every operation permanently in a log - needed to use the linked approach without a server - feels inefficient, as does server reconciliation's undo/re-apply process. So CRDT research has focused on avoiding those inefficiencies, sacrificing simplicity along the way, instead of just embracing them as the easy way out.

To me, the "inefficiencies" seem quite manageable. Storage is cheap, text is small, and you probably want a complete op log anyway for auditing and document histories (cf. git). Server reconciliation's undo/re-apply process can be batched aggressively, e.g., only do it a few times per second; that just makes remote ops take a little longer to show up.

Granted, I have not built a complete app around server reconciliation or the linked approach, so perhaps there is a hidden catch. But I am encouraged by the success of Replicache (https://doc.replicache.dev/concepts/how-it-works), which is where I learned of server reconciliation.

kamranjon · 119d ago

I wonder if you could sidestep the inefficiencies of storing ID's for every character by using redis streams to store the characters with id's "represented as delta-compressed macro nodes that are linked together by a radix tree" where the ids are "monotonically incrementing and has two parts: <time>-<counter>. The time is in milliseconds and the counter increases for entries generated in the same milliseconds"

This would seem to avoid clashes and also compress the identifier storage in an efficient way.

https://antirez.com/news/128

mweidner · 118d ago

This sounds similar to the idea behind articulated (though with ids UUID-counter instead of time-counter): https://github.com/mweidner037/articulated

I will check out Antirez.

thebeardisred · 119d ago

I really want to believe you were trying to make a reference to cap'n jazz - https://en.m.wikipedia.org/wiki/Cap'n_Jazz

yubblegum · 119d ago

Never heard of them

sdeframond · 119d ago

What you describe is a CRDT, isn't it ?

ori_b · 119d ago

No; there is no single consistent final state that the system must converge to if the parts go offline. If you have this document:

   a{uuid=1}

and two clients send the following operations:

   b{uuid=2} insert-after{uuid=1}
   c{uuid=3} insert-after{uuid=1}

then the following two documents are both valid final states:

   abc
   acb

That's fine as long as you have an authoritative server that observes all events in a single order and a way to unwind misordered local state, but it means that it's not a CRDT.

YoumuChan · 119d ago

Like Raft is a "special case" of Paxos, this feels like a "special case" of CRDT.

It has all the flavor of CRDT, but adds a leader and a different way for the total ordering (basically using leader's local lamport clock to break tie).

Throw in leader reelection and some ledger syncing and then give everything some other names, I bet you can have "collaborative text editing on one page".

josephg · 119d ago

Yeah. It’s also quite inefficient by default - because you’re storing deleted items and you have a uuid per character. And you usually want the other parts from a crdt which this discards. Like, a consistent ordering rule for siblings. Doing so barely adds any code (like 10 lines or so) and it makes the system way more useful.

I don’t really understand the benefit of doing this when text CRDTs are small, simple and fast.

nextaccountic · 117d ago

> Like Raft is a "special case" of Paxos

Hey, can you elaborate? Googling this I can find this other HN comment https://news.ycombinator.com/item?id=32472189 that states the same thing, but nothing more

motorest · 119d ago

> No; there is no single consistent final state that the system must converge to if the parts go offline.

Sounds like a naive implementation of delta state CRDT. I mean, what if the author has this major epiphany that it's nice to sync states to get convergence?

sdeframond · 119d ago

What about ordering concurrent operations by id? Then "abc" is the only consistent final state.

I get what you mean though, having a central authority greatly relaxes the requirement.

3cats-in-a-coat · 119d ago

We could generalize having a tree of known authorities upstream with what a CRDT does, resulting in both being two special cases of a more general consistent event processing model. CRDT makes the order of events commutative, hence the "authority" becomes a property of the math itself, rather than a physical service.

rendaw · 119d ago

How do you determine which operations are concurrent?

tangjurine · 118d ago

if there are multiple valid final states?

anon291 · 119d ago

Yeah, so this is an operational transform where the transform operation is just identity.

Yoric · 119d ago

If you have the additional semantics that says "operation with lowest uuid gets applied first", don't you essentially get a CRDT?

I mean, a uuid is kind of a poor man's Lamport clock, isn't it?

ori_b · 119d ago

Yes, you can extend the algorithm that was described with additional semantics, and you can turn it into a CRDT.

pcthrowaway · 119d ago

ctrl+a

ctrl+x

ctrl+v

Good luck

apgwoz · 119d ago

Isn’t this a NOP?

I think you could batch these things though. You just need a slightly more advanced protocol. If B is the first id, and L is the last, create range(B, L) as R and insert R after L (assuming Ctrl-v a second time).

motorest · 118d ago

> Isn’t this a NOP?

I think that's the point: conceptually it's a co-op, but the implementation forces each character to be individually removed following a ctrl-x, and then individually inserted back (after issuing a new UUID) following a ctrl-v.

Now, the outcome of neither a Ctrl+x nor a Ctrl+v is atomic, which means communicating each character operation through the wire takes time. In turn this means that this is a scenario where conflicts will take place (someone is editing text midway through another user does ctrl-a,ctrl-x) and where the choice of operation will result in important user impacts.

This is a basic test scenario of CRDTs, and why research focuses on higher-level operations instead of atomic character-level operations.

It should be noted that ctrl-a,ctrl-x,ctrl-v is an extreme case of a very frequent operation: copying and pasting text. Editor-level undo/redo can manifest the same way, too.

hem777 · 120d ago

Cool to see a write up on this! Discovered the same method years ago and also wondered why it doesn’t show up in academic literature.

I implemented this in a decentralized context and as a CRDT though, so that the properties hold (commutative, idempotent and associative).

k__ · 120d ago

If the idea is to have an alternative to CRDT, what did you gain from building it as CRDT?

hem777 · 120d ago

It wasn’t really that we wanted to have a CRDT per se, but as it was implemented on top of an op-based append-only log CRDT, it turned out to hold those properties, which makes it a CRDT. We wanted to have the edits to be able to arrive in any order or after delays due to network partitions (this was for a p2p network).

Footkerchief · 120d ago

Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.

The motivating examples (update validation, partial loading, higher-level operations) are interesting, but I don't see a strong case that the reason Yjs etc. lack these features is the underlying CRDT implementation, as opposed to these features being intrinsically difficult to build.

filleokus · 119d ago

> Surprised to see no discussion of other data structures like dicts/maps, or arrays of arbitrary type. Hopefully they'd be a straightforward extension. IME, apps need collaborative data structures more often than they need pure collaborative text editing.

Totally agree. I guess an array of "atomic" objects, where the properties of the objects can't be changed can be done just by replacing the string with your own type. Changes inside of the object is probably trickier, but maybe it's just about efficiently storing/traversing the tree?

I've also always thoguth it should be possible to create something where the consumer of the helper library (per OP terminology) can hook in their own lightweight "semantic model" logic, to prevent/manage invalid states. A todo item can't both have isDone: true and state: inProgress at the same time. Similar to rich text formatting semantics mentioned in the linked article.

SkiFire13 · 119d ago

CRDTs essentially work by deterministically picking one side when a conflict arises. The issue is that in general this does not guarantee the lack of data loss nor data being valid (you can resolve the conflict between two pieces of valid data and get invalid data as a result).

Imagine if every git merge conflict you got was resolved automatically by picking one side. Most of the time it would do the wrong thing, sometimes even leading to code that fails to compile. Imagine then you were not there ready to fix the issue, it would lead to even more chaotic results!

This is why CRDTs are not more widespread, because they only fix the problem you think you have, not the problem you actually have, which is to fix conflicts in a way that preserves data, its validity and meaning.

And arguably they make this issue even worse because they restrict the ways you can solve these conflicts to only those that can be replicated deterministically.

josephg · 119d ago

> This is why CRDTs are not more widespread, because they only fix the problem you think you have, not the problem you actually have, which is to fix conflicts in a way that preserves data, its validity and meaning.

I’ve been saying this for years, but there’s no reason you couldn’t make a crdt which emitted conflict ranges like git does. CRDTs have strictly more information than git when merging branches. It should be pretty easy to make a crdt which has a “merge and emit conflicts” mode for merging branches. It’s just nobody has implemented it yet.

(At this point I’ve been saying this for about 5 years. Maybe I need to finally code this up if only to demonstrate it)

hansworst · 119d ago

Automerge has this: https://automerge.org/automerge/api-docs/js/functions/getCon...

josephg · 119d ago

Reading the docs, it looks like that only works for object keys that have been concurrently set to different values. Not text documents.

motorest · 118d ago

> I’ve been saying this for years, but there’s no reason you couldn’t make a crdt which emitted conflict ranges like git does.

I don't get your point. The C in CRDT stands for "conflict-free". Why would a CRDT have conflict ranges?

josephg · 118d ago

Because automatic merging isn’t always perfect. Especially when merging long lived changes to code in git, sometimes you need manual intervention to figure out the result. And we need to manually intervene sometimes.

SkiFire13 · 119d ago

That might be an improvement over git, but not automatically fixing the conflicts will be a dealbreaker for most people.

Fundamentally people want something that automatically fixes conflicts, and do so the way they expect it to, but this just doesn't exist yet.

practal · 119d ago

Ok, so the main point that makes it different from CRDTs seems to be: if you have a central server, let the server do the synchronization (fixing an order among concurrent events), and not the data structure itself via an a-priori order.

Because all communication is between client and server, and never between clients, when the client connects to the server, the server can make sure that it first processes all of the client's local operations before sending it new remote updates.

math_dandy · 120d ago

Is the take-home message of the post that the full complexity of CRDTs/OT is necessary only in the absence of a central server?

mweidner · 119d ago

Even in the absence of a central server, you can still avoid CRDT/OT complexity if you have a decentralized way to eventually total order operations & apply them in that order: https://mattweidner.com/2025/05/21/text-without-crdts.html#d...

As others in the comments argue, this is technically a CRDT (though a fully general one); also, undoing/replaying ops is itself non-trivial to implement. However, I hope this is still simpler than using a traditional CRDT/OT for each data type.

math_dandy · 119d ago

Did you perhaps mean to write, “though not a fully general one”?

mweidner · 119d ago

A decentralized, eventually consistent total order on operations is a fully general CRDT, in the sense that you can put whatever (deterministic) operations you want in the total order and clients will end up in eventually consistent states.

Whether the converged result is at all reasonable is a different question.

math_dandy · 119d ago

Ok, now I understand your reply. In retrospect, you were being perfectly clear all along and my confusion was due to me not appreciating the precise definition of a CRDT. Thanks!

motorest · 118d ago

> Is the take-home message of the post that the full complexity of CRDTs/OT is necessary only in the absence of a central server?

That's the whole point of CRDTs: multiple replicas of the same data structure are managed throughout many nodes, each replica is updated independently, and they all eventually converge.

sampton · 119d ago

OT requires centralized server.

n_u · 119d ago

Some OTs do, some don't. OTs with the TP2 property do not require a central authority to order edits I believe.

In my experience if you are persisting your edits or document state, you have something that creates an ordering anyways. That thing is commonly an OLTP database. OLTPs are optimized for this kind of write-heavy workload and there's a lot of existing work on how to optimize them further.

But now even S3 has PUT-IF, so you could use that to create an ordering. https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...

hencq · 120d ago

I'm not an expert on this, but the main difference with a CRDT like Automerge seems to be the server reconciliation. See for example this article [1]. Automerge handles concurrent insertions by using a sequence number and relying on an agreed ordering of agent ids when insertions are concurrent, while this scheme relies on the server to handle them in the order they come in.

The article mentions this:

> This contrasts with text-editing CRDTs, in which IDs are ordered for you by a fancy algorithm. That ordering algorithm is what differs between the numerous text-editing CRDTs, and it’s the complicated part of any CRDT paper; we get to avoid it entirely.

I can buy the idea that many apps have a central server anyway, so you can avoid the "fancy algorithm", though the server reconciliation requires undoing and replaying of local edits, so it's not 100% clear to me if that's much simpler.

[1] https://josephg.com/blog/crdts-go-brrr/

hem777 · 120d ago

Agreed, the undoing and replying isn’t exactly non-fancy. Peristent B+Tree is also not very non-fancy.

mweidner · 120d ago

I believe that Automerge internally stores all operations in an eventually consistent total order, which you can use as a substitute for the server in server reconciliation (cf. https://mattweidner.com/2025/05/21/text-without-crdts.html#d...). However, Automerge does not actually do that - instead it processes text operations using a traditional CRDT, RGA, probably because (as you point out) undoing and replaying ops is not trivial to implement.

asdffdasy · 120d ago

so, an unoptimized crdt? just set max set size to 1 and yolo?

th0ma5 · 120d ago

It's very appealing as a kind of irreducible complexity. It feels close to what is actually happening and is simple hahah if like you say not optimized.

pshc · 119d ago

Use of server reconciliation makes me think client-side reconciliation would be tricky… how do you preserve smooth editor UX while applying server updates as they arrive?

For example, if your client-sent request to insert a character fails, do you just retry the request? What if an update arrived in the intervening time? (Edit: they acknowledge this case in the “Client-Side” section, the proposal is to rewind and replay, and a simpler proposal to block until the pending queue is flushed)

From a frontend vantage I feel like there may be a long tail of underspecified UI/UX edge cases, such that CRDT would be simpler overall. And how does the editor feel to use while riding the NYC subway where coverage is spotty?

straws · 119d ago

Both ProseMirror and the newer version of CodeMirror have a pretty elegant solution to this: each modification to the document is modeled as a step that keeps track of indices instead of node/text identities, and uses a data structure called a "position map" that the buffered steps can be mapped through to create steps with updated positions, which are then applied to your document.

In practice, it works quite well. Here's more info:

https://marijnhaverbeke.nl/blog/collaborative-editing.html https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...

hbbio · 119d ago

Someone should try to use a local LLM (maybe a 4b) to merge the diffs in case of conflict beyond straightforward cases...

Not energy efficient but should work surprisingly well without CRDT, OT, or anything else.

gregoriol · 119d ago

How is an LLM supposed to merge the example from the article "My name is", insert "Charlie" after "is", insert "Dave" after "is"?

coolcase · 119d ago

Is collaborative text editing with offline sync a nerd snipe [0]? I work for a big tech and write a lot and usually worse case someone else edits at the same time and the server can figure it out. yes it needs some kind of algo but most concurrent edits are on different parts of a huge doc.

Compare this to Git workflows. Git already handles merging most changes seamlessly.

[0] https://xkcd.com/356/

uh2010 · 120d ago

wonder if there would be a perf gain with UUIDv7

pshc · 119d ago

They do discuss optimization of the IDs at the bottom of the article.

poly2it · 120d ago

Why not just use an ever incrementing u64?

kortex · 120d ago

Then you need central coordination, either a single central server containing the counter, or something like Snowflake where you have multiple counters, each assigned orthogonal blocks ahead of time (that need to coordinate with a central server).

UUIDs/ULIDs/etc are fully distributed, you can have two clients assign an ID without coordinating with ~0% of collision.

poly2it · 120d ago

You could also split the u64 to have the first 24 bits be unique to the client, and the last 40 be unique to the produced character. This would still allow 1 TiB of data produced per user and session. The single mutex would be the user ID counter.

zamadatix · 120d ago

An incrementing u64 requires either atomic increments between concurrent clients or recalculation logic to consistently find the newly incremented ID after conflicting information syncs. UUIDs just spit out a unique ID without any complexity or associations with other clients.

ChadNauseam · 120d ago

There's closely related idea that might work, though. Each device editing text could be assigned a 32-bit ID by the server (perhaps auto-incrementing). Devices then maintain a separate 32-bit ID that they increment for each operation they perform. The ID used for each character is (device_id, edit_id), which should fit nicely in 8 bytes.

mweidner · 120d ago

Indeed, this is close to what Yjs (popular CRDT library) does: each client instance (~ browser tab) chooses a random 32-bit clientId, and character IDs combine this clientId with local counters. https://github.com/yjs/yjs/blob/987c9ebb5ad0a2a89a0230f3a0c6...

Any given collaborative document will probably only see ~1k clientIds in its lifetime, so the odds of a collision are fairly low, though I'd be more comfortable with a 64-bit ID.

apt-apt-apt-apt · 120d ago

Does this finally solve collaborative text editing and its friends?

Such an awesome approach.

criddell · 119d ago

Could it? If you and I are simultaneously editing a list of people and you are trying to order them by age and I'm ordering them alphabetically, we are still going to have to reconcile the result.

shermantanktop · 119d ago

By that definition, it's not a solvable problem, is it?

criddell · 119d ago

I don’t think so.

AFAIK, CRDTs don’t solve semantic conflict issues.

lewisjoe · 119d ago

This is technically a CRDT. It's just that the "order of operations" to apply over the doc is now resolved using a central server. For context, this is exactly how Google Docs and Zoho Writer works currently. Except that, they use OT with central-server based reconciliation and the proposal uses CRDT-istic approach.

I agree this is more practical if your service anyway run over centralised servers (aka cloud)

motorest · 118d ago

> This is technically a CRDT. It's just that the "order of operations" to apply over the doc is now resolved using a central server.

This is not true. One of the most basic traits of CRDTs is that updates are performed without coordination. Relying on a central server to coordinate conflict resolution rejects two of the main traits of CRDTs.

There is a name-drop, but there is no substance.

Learn Your Way: Reimagining Textbooks with Generative AI (research.google)

Nvidia buys $5B in Intel (tomshardware.com)

This map is not upside down (maps.com)

Configuration files are user interfaces (ochagavia.nl)

When Knowing Someone at Meta Is the Only Way to Break Out of "Content Jail" (eff.org)

TernFS – An exabyte scale, multi-region distributed filesystem (xtxmarkets.com)

Launch HN: Cactus (YC S25) – AI inference on smartphones (github.com)

Flipper Zero Geiger Counter (kasiin.top)

tldraw SDK 4.0 (tldraw.dev)

KDE is now my favorite desktop (kokada.dev)

Luau – Fast, small, safe, gradually typed scripting language derived from Lua (luau.org)

American Prairie unlocks another 70k acres in Montana (earthhope.substack.com)

The quality of AI-assisted software depends on unit of work management (blog.nilenso.com)

OpenTelemetry Collector: What It Is, When You Need It, and When You Don't (oneuptime.com)

Aaron Levie: Startups win in the AI era [video] (youtube.com)

PostgreSQL Maintenance Without Superuser (boringsql.com)

OneDev – Self-hosted Git server with CI/CD, Kanban, and packages (onedev.io)

I Built an Event-Sourcing Database Engine: Meet Genesis DB (genesisdb.io)

Slack has raised our charges by $195k per year (skyfall.dev)

Chrome's New AI Features (blog.google)

Midcentury North American Restaurant Placemats (casualarchivist.substack.com)

Grief gets an expiration date, just like us (bessstillman.substack.com)

Show HN: One prompt generates an app with its own database (manyminiapps.com)

CERN Animal Shelter for Computer Mice (2011) (computer-animal-shelter.web.cern.ch)

Automatic differentiation can be incorrect (stochasticlifestyle.com)

The Day the Linter Broke My Code (blog.fillmore-labs.com)

CircuitHub (YC W12) Is Hiring Operations Research Engineers (UK/Remote) (ycombinator.com)

Rereading books (maxgirkins.com)

Pnpm has a new setting to stave off supply chain attacks (pnpm.io)

Cluely Thesis on Virality and Hype (cluely.com)

This website has no class (aaadaaam.com)

Fast Fourier Transforms Part 1: Cooley-Tukey (connorboyle.io)

Ask HN: Dark Mode for HN?

Linking to text fragments with a bookmarklet (alexwlchan.net)

Hard Disk Unexpectedly Not Dead (blog.dshr.org)

Mirror Life Worries (science.org)

Google confirms fraudulent account created in law enforcement portal (bleepingcomputer.com)

WASM 3.0 Completed (webassembly.org)

Keeping SSH sessions alive with systemd-inhibit (kd8bny.com)

Show HN: The text disappears when you screenshot it (unscreenshottable.vercel.app)

The Newest Face of Long-Term Unemployment? The College Educated (nytimes.com)

Employees Share Why They Quit on the First Day (buzzfeed.com)

Apple iPhone 17 Review (tomsguide.com)

Samsung forces ads onto fridges; is a bad sign for other appliances (arstechnica.com)

Meta Ray-Ban Display (meta.com)

The Sad, Sad World of Tech Blogging During an Era of Technological Stagnation (freddiedeboer.substack.com)

Orange Pi RV2 $40 RISC-V SBC: Friendly Gateway to IoT and AI Projects (riscv.org)

Hypervisor 101 in Rust (tandasat.github.io)

M7.8 Earthquake – Kamchatka, Russia (earthquake.usgs.gov)

Stepping Down as Libxml2 Maintainer (discourse.gnome.org)

Collaborative Text Editing Without CRDTs or OT

Comments (79)