Preserving Order in Concurrent Go Apps: Three Approaches Compared

69 points by destel | 21 comments | 9/1/2025, 6:14:31 AM | destel.dev ↗

Comments (21)

destel · 4h ago
UPD.

I've just made a small but important clarification to the article. While in many cases it's easier, and even preferred, to calculate all results, accumulate them somewhere, and then sort, this article focuses on memory-bound algorithms that support infinite streams and backpressure.

latchkey · 4h ago
Thanks, but I'd still use a queue over this solution.

Real-time Log Enrichment: a perfect fit for my example [0]; you're firing off endless tasks, and RT logs have a timestamp.

Finding the First Match in a File List: Files tend to be static. I'd use a queue to first build an index and then a queue to process the index.

Time Series Data Processing: Break the data into chunks; you mention 600MB, which isn't that big at all given that Cloud Run memory maxes out at 32GB.

[0] https://news.ycombinator.com/item?id=45094387

tetraodonpuffer · 8h ago
Thanks for the write-up! In my current application I have a few different scenarios that are a bit different from yours but still require processing aggregated data in order.

1. Reading from various files where each file has lines with a unique identifier I can use to process in order: I open all the files and build a min-heap from the first line of each, then process by repeatedly grabbing the lowest entry from the min-heap. After consuming a line from a file, I read its next line and push it back into the min-heap (each min-heap cell holds the open file descriptor for its file).

2. Aggregating across goroutines that service data generators with different latencies and throughputs. I have one goroutine interfacing with each generator and consider these “producers”. Using a global atomic integer I can quickly assign a unique, increasing index to incoming messages, and these can then be serviced with a min-heap the same as above. There are some considerations about dropping messages that are too old, so an alternative approach for some cases is to key the min-heap on received time and process only up to time.Now() minus some buffering time, allowing more time for things to settle before dropping anything (trading total latency for this).

3. Similar to the above, I have another scenario where ingestion throughput is more important and repeated processing happens in order, but there is no requirement that all messages have been processed every time, just that they are processed in order (this is the backing for a log viewer). In this case I just slab-allocate and dump what I receive without ordering concerns, but I also keep a btree of the indexes that I iterate over when it’s time to process. I originally had this buffering like (2) to guarantee mostly ordered insertions in the slabs themselves (which I simply iterated over), but if a stall happened in a goroutine, shifting over the items in the slab when the old items finally came in became very expensive and could spiral badly.

destel · 5h ago
Wow, that’s some seriously sophisticated stuff - it’s not that often you see a heap used in typical production code (outside of libraries)!

Your first example definitely gives me merge-sort vibes - a really clean way to keep things ordered across multiple sources. The second and third scenarios are a bit beyond what I’ve tackled so far, but super interesting to read about.

This also reminded me of a WIP PR I drafted for rill (probably too niche, so I’m not sure I’ll ever merge it). It implements a channel buffer that behaves like a heap - basically a fixed-size priority queue where re-prioritization only happens for items that pile up due to backpressure. Maybe some of that code could be useful for your future use cases: https://github.com/destel/rill/pull/50

tetraodonpuffer · 5h ago
Hah, not sure about “production”; I am currently between jobs and am taking advantage of that to work on a docker/k8s/file TUI log viewer.

I am using those techniques respectively for loading backups (I store each container log in a separate file inside a big zip file, which allows concurrent reading without unpacking) and for servicing the various log-producing goroutines (which use the Docker/k8s APIs as well as fsnotify for files), since I allow creating “views” of containers that consequently need to aggregate in order. The TUI itself, using tview, runs in a separate goroutine at a configurable FPS, reading from these buffers.

I have things mostly working; the latest significant refactoring was introducing the btree-based reading after noticing the “fix the order” stalls were too bad, and I am planning to do a Show HN when I’m finished. It has been a lot of fun going back to solo-dev greenfield stuff after many years of architecture-focused work.

I definitely love Go, but despite being careful and having access to great tools like rr and dlv in GoLand, it can sometimes get difficult to debug deadlocks, especially when mixing channels and locks. I have found this library quite useful for chasing down deadlocks in some scenarios: https://github.com/sasha-s/go-deadlock

destel · 8h ago
Hi everyone, I’m the author of the article. Happy to answer any questions or discuss concurrency patterns in Go. Curious how others tackle such problems.
kunley · 2h ago
chan chan Foo seems like a cool trick; looking forward to using it in code. Thanks for the idea.

PS. I realize you present an even better solution; still, the first version seems like a nice enough thing to have in a toolbox.

destel · 2h ago
Thanks. This replyTo pattern is very similar to promises in other languages.
Traubenfuchs · 7h ago
> Curious how others tackle such problems.

What do you think about the order preserving simplicity of Java?

  List<Input> inputs = ...;

  List<Output> results = inputs.parallelStream()
      .map(this::processTask)
      .collect(toList());

If you want more control or have more complex use cases, you can use an ExecutorService of your choice, handle the futures yourself, or get creative with Java's new structured concurrency.
Groxx · 6h ago
Their planned semantics don't allow for that - there's no backpressure in that system, so it might race ahead and process up to e.g. item 100 while still working on item 1.

If everything fits in memory, that's completely fine. And then yeah, this is wildly overcomplicated; just use a waitgroup and a slice, write each result into its slice index, and wait for everything to finish - that matches your Java example.

But when it doesn't fit in memory, that means you have unbounded buffer growth that might OOM.

destel · 7h ago
I haven’t used Java for about a decade, so I’m not very familiar with the Streams API.

Your snippet looks good and concise.

One thing I haven’t emphasized enough in the article is that all the algorithms there are designed to work with potentially infinite streams.

kamranjon · 7h ago
Often in Go I’ll create some data structure like a map to hold the new value keyed by the original index (basically a for loop with goroutines inside that close over the index value) - then I just reorder them after waiting for all of them to complete.

Is this basically what Java is doing?

I think that maybe the techniques in this article are a little more complex, allowing you to optimize further (basically continue working as soon as possible instead of just waiting for everything to complete and reordering after the fact) but I’d be curious to know if I’ve missed something.

gleenn · 7h ago
It's a reasonable solution. The problem with it is mentioned in the article: you necessarily get worst-case memory usage, because you have to store everything in the map first. If you don't have too much to store, it will work.
abtinf · 8h ago
Another scenario where order matters is in Temporal workflows. Temporal’s replay capability requires deterministic execution.
Groxx · 5h ago
That's a rather special case: they and Cadence control when calls into their code unblock, and they use that to run your code as if it were a single-threaded event loop. That way, the stuff they do can be deterministic while simulating parallel execution (but only concurrency).
latchkey · 5h ago
For something like this, I would instinctively reach for an external queue mechanism instead of trying to work through the complexity of Go's concurrency.

Create a bunch of sequentially numbered jobs that write their output into a Postgres database. Then have N workers process the jobs. Something like GCP's Cloud Tasks is perfect for this because the "workers" are just GCP Cloud Functions, so you can have a near-infinite number of them (limited by concurrent DB connections).

This approach also buys you durability of the queue for free (i.e., what happens when you need to stop your Go process mid-queue?).

Then it is just a query:

  select * from finished_jobs order by job_num;
destel · 4h ago
I've just made a small but important clarification to the article. While in many cases it's easier, and even preferred, to calculate all results, accumulate them somewhere, and then sort, this article focuses on memory-bound algorithms that support infinite streams and backpressure.
candiddevmike · 6h ago
Personally, I've come to really hate channels in Go. They are a source of some seriously heinous deadlock bugs that are really hard to debug, and closing channels in the wrong spot can crash your entire app. I try using plain locks until it hurts before I reach for channels these days.
Groxx · 6h ago
Well over half of the code I've ever seen that uses three or more channels (i.e. two semantic ones plus a cancellation or shutdown) has had serious flaws in it.

Granted, that generally means they're doing something non-trivial with concurrency, and that correlates strongly with "has concurrency bugs". But I see issues FAR more frequently when they reach for channels rather than mutexes. It's bad enough that I just check absolutely every three-chan chunk of code proactively now.

I lay part of the blame on Go's "♥ safe and easy concurrency with channels! ♥" messaging, and another large chunk on the lack of generics (until recently), which made abstracting these kinds of things extremely painful. Combined, you get "just do it by hand lol, it's easy / get good" programming, which is always a source of "fun".

__turbobrew__ · 4h ago
Agreed. I especially think it was common to overuse channels when Go was younger, as that was “the Go way”. I think people have started to realize that channels are a complex and sharp abstraction, and they should not be used frivolously.

I can’t think of the last time I actually wrote code that directly created channels. Of course, things like contexts, tickers, etc. are implemented with channels, and I think that is ideally how they should be used: in well-defined and self-contained library code.

kunley · 2h ago
Totally different perspective here. Never disappointed with channels; can't stand async.