> Without any back-pressure, the program kept stuffing memory with pending results.
The key phrase in the whole article.
This is the reason I am not fully confident in the "pervasive async" mindset overtaking web development. In JavaScript, pervasive async makes sense, because all functions eventually return control to a central event loop anyway. In environments with real threading, the move from OS threads to "green threads and async everywhere" implicitly gives up the tried-and-tested scheduler of your OS and its mechanisms to handle back-pressure. Developers must now take full responsibility for avoiding buffer bloat.
> they fundamentally change how we think about concurrency limits and resource management.
Virtual threads make concurrency cheaper, but nothing about them (or any other concurrency abstraction) eliminates the need for flow control. By switching to virtual threads, you traded off the safeguards once provided by your (bounded!) thread pool and OS scheduler. It's now your responsibility to put safeguards back in.
As an aside: in Clojure's core.async, channel limits (like 1024 pending puts) are intentionally there as a warning sign that your async pipeline isn't handling back-pressure correctly. This is exactly the "hyperactive downloader with no brakes" problem described in the article. With core.async, you would have eventually hit an exception related to excessive puts on a channel, rather than silently expanding buffers until resource exhaustion.
pron · 17h ago
> By switching to virtual threads, you traded off the safeguards once provided by your (bounded!) thread pool and OS scheduler.
The OS scheduler provides no safeguards here (you can use the thread-per-task executor with OS threads - you'll just get an OOME much sooner), and as the virtual thread adoption guide says, you should replace bounded thread pools with a semaphore when needed because it's just a better construct. It allows you to limit concurrency for different resources in an easy and efficient way even when different threads make use of different resources.
In this case, by switching to virtual threads you get to use more appropriate concurrency constructs without needing to manage threads directly, and without sharing them among tasks.
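A minimal sketch of the semaphore approach pron describes (the limit of 10 and the service call are placeholders):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    public class SemaphoreLimit {
        // Assumed limit: the downstream service tolerates 10 concurrent requests.
        private static final Semaphore PERMITS = new Semaphore(10);

        public static void main(String[] args) {
            try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10_000; i++) {
                    int id = i;
                    exec.submit(() -> {
                        PERMITS.acquireUninterruptibly(); // blocking a virtual thread is cheap
                        try {
                            callExternalService(id);      // at most 10 of these run at once
                        } finally {
                            PERMITS.release();
                        }
                    });
                }
            } // close() waits for all submitted tasks (Java 19+)
        }

        private static void callExternalService(int id) {
            // stand-in for a real blocking I/O call
        }
    }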
never_inline · 11h ago
Stupid question: Why not provide a threadpool-like construct which may not necessarily keep threads around but limits their number?
pron · 6h ago
What you usually want to limit isn't the number of threads but the number of threads doing a particular operation, such as accessing some service, so it's more flexible (and efficient) to just guard that particular operation with a semaphore.
The only place where you may want to limit the number of threads is at the entry to the system, but since the semaphore is the construct for limiting the number of anything, you might as well just use that there, too. For example, if the number of requests currently being processed is too high (above the semaphore's number of leases), you don't accept new connections.
andersmurphy · 3h ago
Good question. The simple answer is that such a construct already exists on the JVM: the semaphore. People just forget it exists. I wrote an article about it a while back[1].
1. Managing throughput with virtual threads (it's in Clojure, but it's using the underlying Java APIs): https://andersmurphy.com/2024/05/06/clojure-managing-through...
The number of actual threads is limited, by default, to the number of CPU cores the system has. The problem isn't the number of threads itself, but the fact that the thread pool will keep taking on more work, and that it's efficient enough to blow past the CPU+I/O bottleneck. You can hit the same problem with a thread pool if it's efficient enough.
Standard concurrency tools like semaphores are capable of reducing the amount of work being done at the same time. You could also simulate classic thread pools by using concurrent lists/sets/etc to replicate traditional work queues, but you'd probably lose any advantage you'd gained from switching to green threads in the first place.
708145_ · 7h ago
Of course it's possible to limit the number of virtual threads. A web server can have a limit on the number of virtual threads too, and queue incoming requests before dispatching them to workers (virtual threads).
As others have said, this can be achieved with a semaphore and the bulkhead pattern. You can also limit the number of connections.
I would expect any "production ready" web server using virtual threads to have some sort of limiting. That must be the case, right?
ndriscoll · 15h ago
Wouldn't there be significant overhead in waking tasks from IO just to put them back to sleep on a semaphore vs. there being fewer threads servicing the IO ring for a given type of task?
mdavidn · 14h ago
Presumably the semaphore would be used to limit the number of concurrent virtual threads.
Incoming tasks can be rejected proactively if a system would overflow its limit on concurrent tasks. This approach uses a semaphore’s non-waiting acquire and release operations. See the “bulkhead pattern.”
Alternatively, if a system has a fixed number of threads ingesting tasks, these would respond to backpressure by waiting for available permits before creating more virtual threads.
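A rough sketch of that rejection path (the permit count and the class shape are my own illustration, not a library API):

    import java.util.concurrent.Semaphore;

    // Bulkhead sketch: reject work up front instead of queueing it unboundedly.
    class Bulkhead {
        private final Semaphore permits = new Semaphore(200); // illustrative limit

        /** Returns false immediately if the system is at its concurrency limit. */
        boolean trySubmit(Runnable task) {
            if (!permits.tryAcquire()) {
                return false; // caller sheds load, e.g. responds 429 or drops the connection
            }
            Thread.startVirtualThread(() -> {
                try {
                    task.run();
                } finally {
                    permits.release();
                }
            });
            return true;
        }
    }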
pron · 6h ago
It should be the other way around. The concurrency-limited resource is usually accessed by IO (some service), so a thread would wait and then acquire the semaphore before doing the IO.
ndriscoll · 3h ago
I might just be taking the specific use-cases you're talking about for granted? I tend to think the reason you'd use task-specific thread pools in the first place is because the contended resource is CPU and you want different priorities for different types of tasks. E.g. if you have different types of long-lived connections (e.g. websockets) and want to make sure that even if you have 100 type A connections and 50,000 type B connections, you want ~even CPU to be split between A and B. You could use semaphores, but then you're usually waking a B (in response to some socket IO you were waiting on) just to put it to sleep. It seems like it'd make more sense to use two platform thread pools (with the different thread groups waiting on different sockets)?
Is your advice more about things like object pools or connection pools?
pron · 2h ago
The CPU is only contended once it's at 100%. I am not aware of any interactive server (as opposed to batch processing) that behaves well at 100% CPU (beyond short bursts), so I think that the hope you can actually balance the CPU using the OS scheduler (which isn't all that good) to produce good outcomes at 100% CPU is more myth than reality. So yes, I'm talking about resources accessed via IO, because there isn't all that much you can do about a CPU that's at 100% for long durations other than getting more CPU (horizontally or vertically).
However, when you have some background batch computational tasks of some limited concurrency (otherwise you'd be in trouble) then you should have some low-priority platform threads servicing that. Virtual threads work due to Little's law by allowing a high number of threads. For such low-priority background tasks you probably want a very low number of threads (often one), so the ability to have tens of thousands, or millions, of virtual threads won't help you for those background tasks anyway.
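A sketch of what that might look like (the thread name and the single-thread choice are illustrative):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BackgroundBatch {
        public static void main(String[] args) {
            // One low-priority platform thread for background batch work, so it
            // yields the CPU to latency-sensitive request handling.
            ExecutorService background = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "batch-worker");
                t.setPriority(Thread.MIN_PRIORITY);
                return t;
            });
            background.submit(() -> { /* long-running computation */ });
            background.shutdown();
        }
    }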
tombert · 17h ago
At my last job I made a thing that took in tens of millions of records from Kafka, did some processing, plopped stuff into Postgres, and moved along. It wasn't hard, but it was difficult to achieve my throughput goals with regular blocking JDBC.
Eventually I moved to an async-non-blocking model, and for my initial tests it went great, stuff returned almost immediately and I could grab the next one. The problem is that, sort of by definition, non-blocking stuff doesn't really know when it's going to complete right away, so it just kept enqueuing stuff until it ate all my memory.
I figured out pretty early on what was happening, but the other people on my team really seemed to struggle with the concept of backpressure, I think because they weren't really used to a non-blocking model. You kind of get backpressure for free with blocking code.
Eventually I was able to solve my problem using a BlockingQueue.
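The comment doesn't show the code, but the usual shape of the BlockingQueue fix looks something like this (the queue size and worker count are made up):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BoundedPipeline {
        public static void main(String[] args) throws InterruptedException {
            // Bounded queue: put() blocks the producer whenever consumers fall behind.
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(1_000);

            Thread producer = Thread.startVirtualThread(() -> {
                try {
                    for (int i = 0; i < 100_000; i++) {
                        queue.put("record-" + i); // blocks when full: backpressure for free
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            for (int c = 0; c < 4; c++) {
                Thread.startVirtualThread(() -> {
                    try {
                        while (true) {
                            process(queue.take()); // e.g. a blocking JDBC write
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            producer.join(); // a real program would also drain the queue or use poison pills
        }

        private static void process(String record) { /* stand-in for real work */ }
    }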
PaulKeeble · 17h ago
It's not really any different from the memory limitations we have always dealt with; virtual threads just consume memory faster than we might expect. The techniques we use for ensuring we don't run out of memory also apply to anything that can create threads: we have to put limits on how many can exist, especially since threads consume quite a bit of memory even in their virtual variant.
wyago · 4h ago
This feels analogous to saying that static memory management might be superior, because if dynamic memory is available developers can trivially produce an OOM.
It's possible we'll start borrowing more patterns from Erlang/OTP since it's been living in a post-greenthreads world for as long as many languages have existed, and has developed patterns around it.
jandrewrogers · 12h ago
Proper async architecture requires designing a scheduler appropriate for your workload. Many (most?) async applications ignore or skip this part and try to delegate it to a runtime or the OS, usually with mediocre results. Designing a robust scheduler usually requires excellent visibility into the real-time distribution of available resources so that you can dynamically schedule load away from the resources under the most pressure at every instant in time. If you aren’t going to do that work, async may not be the best architecture choice.
wahern · 14h ago
The OOM error is the back pressure. This is why some engineers working on highly concurrent frameworks insist on pervasive OOM recovery.
I haven't done any Java in years, but I always thought OOM errors were recoverable in the JVM. Or is this just a case where the author never thought to catch OOM exceptions? My instinct would be to catch OOM early (i.e. in entry function) in a virtual thread and re-queue the task. In theory re-queueing might fail, too, I guess, but in practice probably not.
This is why I like to code in C (and Lua) for this kind of thing, as in C my core data structures like lists and trees don't require additional allocations for insert. My normal pattern here would be to have, say, three job lists--todo, pending, complete; a job would always exist in one of those lists (presuming it could be allocated and initialized, of course), and no allocations would be required to move a job back from pending to todo on an OOM error during processing.
koito17 · 13h ago
In Java, there are two kinds of Throwable instances[1]: Error and Exception. As the name suggests, OutOfMemoryError is a subclass of Error. In contrast to Exception, an Error "indicates serious problems that a reasonable application should not try to catch"[2]. For this reason, it's considered bad practice in Java to catch all Throwable instances (or catch Error instances explicitly).
> My instinct would be to catch OOM early
OutOfMemoryError subclasses VirtualMachineError, and when a VirtualMachineError is thrown, your program seems to be firmly in undefined behavior territory. Quoting the JVM specification [3]:
A Java Virtual Machine implementation throws an object that is an instance of a subclass of the class VirtualMachineError when an internal error or resource limitation prevents it from implementing the semantics described in this chapter. This specification cannot predict where internal errors or resource limitations may be encountered and does not mandate precisely when they can be reported.
For context: "this chapter" refers to the whole chapter describing the behavior of each JVM instruction. The specification seems to suggest that all bets are off on any guarantees the JVM makes by the time a VirtualMachineError is thrown.
[1] https://docs.oracle.com/en/java/javase/21/docs/api/java.base...
[2] https://docs.oracle.com/en/java/javase/21/docs/api/java.base...
[3] https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-6.h...
Safety can mean many things. I'm fairly sure the JVM will keep the C-level memory safety guarantee after an OOM, but you can still end up with Java objects in inconsistent states that will start throwing exceptions when you poke them.
The reason why these exceptions are so "unsafe" is that they can occur at any moment. Maybe you were in the middle of modifying some data structure which will enter an infinite loop the next time you touch it. It's a bit like signals in C where you have to be extremely careful about the APIs you use, except worse because exceptions unwind the stack. At least in C you can still safely call kernel syscalls regardless of userspace state.
fulafel · 4h ago
Yep, it sounds ambiguous. If the semantics potentially allow unsafety, that alone would be interesting. It would be worse still if the specification's "all bets are off" stance on OOM can create conditions where an adversary gains control of execution in some way, like can happen with synchronisation bugs on some runtimes (e.g. Go).
unscaled · 13h ago
You can catch OutOfMemoryErrors in Java, but it wouldn't be as simple as you describe. In a real-world program, you may be doing other things besides downloading URLs and feeding the URLs to the download scheduler. Even though the URL downloader is what's causing the memory pressure, an OutOfMemoryError may very well be thrown by an allocation anywhere else. _You only have one shared heap_, so unless there is only one single thing being allocated on the heap, you cannot use it as a backpressure signal.
Coming from C, you might think you can just avoid heap allocations for anything that you don't manage with backpressure, but that doesn't work in Java, since Java can only store primitives and references on the stack (at least until project Valhalla is delivered[1]). Almost everything you do triggers a heap allocation.
Virtual threads make this even worse, since their "stack" is also stored on the heap, and it can grow dynamically and require more heap allocations even if you just push a primitive value onto the stack!
tl;dr: No, you cannot rely on catching OutOfMemoryError in Java in anything but the simplest scenarios, and things break down when you have multiple threads, and especially virtual threads.
[1] https://openjdk.org/projects/valhalla/
Everything, always, returns to a main event loop, even the actual hardware threads on your CPU. So JavaScript developers don't get a pass on this.
The problem is you keep creating async requests, they need to be stored somewhere, and that's in memory. Thread or no thread, it's in memory.
And backpressure is essential, yes. Except for basic tasks.
But there is another solution, which is coalescing requests. E.g., in my current project, I combine thousands of operations, each of which could've been its own async request, into a single "unit of work" that results in a single async request for all operations combined.
If we treat IO seriously, we won't just fire off requests into the dark by the hundreds. We'd think about it, and batch things together. This can be done manually, but it's best done at a lower level, so you don't have to think about it at the business logic level.
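A sketch of what request coalescing can look like (the names and batch size are hypothetical):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Coalescing sketch: drain whatever operations have accumulated and send
    // them as one request instead of one request per operation.
    class Batcher {
        private final BlockingQueue<String> pending = new ArrayBlockingQueue<>(10_000);

        void submit(String op) throws InterruptedException {
            pending.put(op); // bounded, so submitters feel backpressure too
        }

        void runLoop() throws InterruptedException {
            List<String> batch = new ArrayList<>();
            while (true) {
                batch.add(pending.take());   // wait for at least one operation
                pending.drainTo(batch, 999); // then grab up to 999 more that piled up
                sendAsOneRequest(batch);     // one async request for the whole unit of work
                batch.clear();
            }
        }

        private void sendAsOneRequest(List<String> batch) { /* hypothetical I/O call */ }
    }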
jsd1982 · 2h ago
The obvious solution to me is to implement streaming/buffered processing of the content while downloading it instead of downloading the entire content into memory to be processed in a single contiguous byte[].
Buffered IO would have the CPU processing acting as a natural backpressure mechanism to prevent downloading too much content and also prevent unbounded memory allocation. Each CPU worker only needs to allocate a single small buffer for what it processes and it can refill that same buffer with the next IO request. Your memory usage becomes entirely predictable and will only scale with how many concurrent threads you can actually execute at once.
Also, no matter how you artificially rate limit the virtual thread scheduling (e.g. via semaphore), if you still insist on downloading the entire content into memory before starting processing then obviously you cannot process any single piece of content larger than what can fit into available memory at any given time.
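For illustration, a streaming download along those lines with java.net.http (the URL and buffer size are placeholders):

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class StreamingDownload {
        public static void main(String[] args) throws IOException, InterruptedException {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/")).build();

            // ofInputStream() hands the body over as it arrives instead of buffering it all.
            HttpResponse<InputStream> response =
                    client.send(request, HttpResponse.BodyHandlers.ofInputStream());

            byte[] buffer = new byte[8192]; // the only per-worker allocation
            try (InputStream in = response.body()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    process(buffer, n); // CPU work paces the reads: natural backpressure
                }
            }
        }

        private static void process(byte[] buffer, int length) { /* stand-in */ }
    }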
Copenjin · 11h ago
Async, green threads, etc. can be useful tools, but they are never an out-of-the-box solution for not having to think about concurrency and program flow, even if people keep thinking they are. Learn how to reason about concurrency; there is no way to avoid it.
hocuspocus · 5h ago
Jox provides buffered channels with a bounded capacity by default, if you want a more high-level API to implement this kind of thing using virtual threads: https://github.com/softwaremill/jox
From what I see, it's a developer trade-off.
If you want to save on memory then you need to drop down to callbacks, promise chains, or async-await sugar that is a compiler transform to a state machine.
But if you do that, then you will write a blog complaining about the boilerplate!
I think Zig has the ability to give you something like the best of both worlds. A Zig expert would need to correct me on this.
written-beyond · 7h ago
I thought Rust's async compiles into a state machine.
marginalia_nu · 6h ago
I think it's a mistake to think of virtual threads as "threads, but magically better". They have real drawbacks, and aren't suitable in every scenario.
A crawler is a textbook case where you get more out of a bounded (but large) threadpool than virtual threads, because the boundedness is a feature. Even with infinite RAM, crawling from as many connections as possible leads to a ton of network congestion, and ironically, will slow down your crawling speed.
kburman · 10h ago
It's a design problem, not a virtual threads problem.
CactusRocket · 8h ago
Yes, I am a bit surprised too. The author changed their solution from a bounded resource limit to an unbounded one. Then of course you're going to run into problems sooner or later, regardless of how lightweight the (virtual) thread implementation is.
kburman · 7h ago
Exactly. It's a Virtual Thread, not a Magic Thread. Just because they're lightweight doesn't mean you can throw out all notions of flow control or resource management.
geodel · 17h ago
I used VThreads recently with good results, I should say. In my case I limited concurrency with a blocking queue of concurrent tasks, so the number of virtual threads did not keep rising with the millions of input records I was processing.
I found that keeping the number of VThreads at about ~100x the number of concurrent calls to external resources was the sweet spot for overall throughput.
unscaled · 12h ago
This article focuses on memory, but even if memory wasn't an immediate issue (e.g. if you had smaller pages and didn't restrict the heap size to 1GB), you would still want backpressure in this case.
What really happens here is that we've got two operations:
1. An I/O-bound operation (downloading URLs).
2. A CPU-bound operation (processing downloaded URLs).
Memory is a limited resource that requires you to introduce backpressure into the system, but so is the CPU. If downloads were fast and memory usage was relatively low (compared to the heap size), this program would have run into another issue: Thousands of lightweight threads all competing for the same small number of CPU cores when processing their downloaded data.
In practice, in most cases, CPU-bound code in virtual threads will just introduce a new bottleneck without proper back-pressure. If the processing doesn't involve any further I/O operations, the virtual threads will keep queuing up until the physical threads finish processing everything (Java virtual threads, unlike goroutines in Go, cannot be preempted by the scheduler). This just creates more opportunities for an OOM, with virtual threads that keep piling up and waiting to be processed.
But even if you introduce a backpressure mechanism like a semaphore or BlockingQueue when queuing up the URLs, things can go bad for you here if your app has to do other things besides processing URLs.
For instance, assume you've got a batch of 3000 URLs to process and each of these URLs takes 10 seconds to process. Let's say you're running everything on a CPU with 8 logical cores. Even fully utilizing all of your CPU cores, you'll need about an hour to process everything. During that time any other I/O-bound task you run may be starved, because each time a URL processing task gets hold of a carrier thread (the actual OS thread running virtual threads), it blocks it until processing is finished, and no other virtual threads can be scheduled on it! So in the scenario described here, you might quickly run out of carrier threads, and all other I/O-bound operations would starve waiting for available carrier threads.
If you've got many long-running CPU-bound tasks, it's often better to keep a traditional dedicated thread pool (ExecutorService.newFixedThreadPool()) just for these tasks.
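A sketch of that split (the task counts and method names are made up):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SplitExecutors {
        public static void main(String[] args) {
            // One virtual thread per I/O task; a small fixed pool for CPU-heavy work.
            ExecutorService io = Executors.newVirtualThreadPerTaskExecutor();
            ExecutorService cpu = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());

            for (int i = 0; i < 3_000; i++) {
                int id = i;
                io.submit(() -> {
                    byte[] body = download(id);    // blocking I/O: cheap on a virtual thread
                    cpu.submit(() -> parse(body)); // long CPU work: keep it off carrier threads
                });
            }
            io.close();  // waits for submitted tasks (Java 19+)
            cpu.close();
        }

        private static byte[] download(int id) { return new byte[0]; } // stand-in
        private static void parse(byte[] body) { /* stand-in */ }
    }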
ivolimmen · 4h ago
Nice that people can now switch to this threading model so that programs can easily process things in parallel, but if you do a lot of memory-intensive processing it is not that useful. Focus on less memory usage first, and then use more threads.
swsieber · 17h ago
Good timing. I was actually reading the official docs the other day on virtual threads, and it has a big section dedicated to rate limiting virtual threads:
"The hardest thing to internalize about virtual threads is that, while they have the same behavior as platform threads they should not represent the same program concept."
...
"Sometimes there is a need to limit the concurrency of a certain operation. For example, some external service may not be able to handle more than ten concurrent requests. Because platform threads are a precious resource that is usually managed in a pool, thread pools have become so ubiquitious that they're used for this purpose of restricting concurrency, "
...
"But restricting concurrency is only a side-effect of thread pools' operation. Pools are designed to share scarce resources, and virtual threads aren’t scarce and therefore should never be pooled!
"When using virtual threads, if you want to limit the concurrency of accessing some service, you should use a construct designed specifically for that purpose: the Semaphore class."
...
"Simply blocking some virtual threads with a semaphore may appear to be substantially different from submitting tasks to a fixed thread pool, but it isn't. Submitting tasks to a thread pool queues them up for later execution, but the semaphore internally (or any other blocking synchronization construct for that matter) creates a queue of threads that are blocked on it that mirrors the queue of tasks waiting for a pooled thread to execute them. Because virtual threads are tasks, the resulting structure is equivalent"
"Even though you can think of a pool of platform threads as workers processing tasks that they pull from a queue and of virtual threads as the tasks themselves, blocked until they may continue, the underlying representation in the computer is virtually identical. _Recognizing the equivalence between queued tasks and blocked threads will help you make the most of virtual threads._" (emphasis mine)
So it was cool seeing him arrive at the same conclusion (or he read the docs). Either way it was a nice, timely article. Link to the docs: https://docs.oracle.com/en/java/javase/21/core/virtual-threa...
Well 6 hours of debugging can save 5 min of reading the docs.
palmfacehn · 6h ago
There's also the comfort that comes exclusively from knowing with certainty how your tools work, how they are intended to be used, and the conditions under which they will fail. This sentiment is lost on the AI coding proponents.