> I’ve been interested in speeding up allocations for quite some time. We know that calling a C function from Ruby incurs some overhead, and that the overhead depends on the type of parameters we pass.

> it seemed quite natural to use the triple-dot forwarding syntax (...).

> Unfortunately I found that using ... was quite expensive

> This lead me to implement an optimization for ... .

That’s some excellent yak shaving. And speeding up … in any language is good news even if allocation is not faster.
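
For readers who haven't used it, `...` forwards every positional, keyword, and block argument to another call. A minimal sketch, assuming a hypothetical `Widget` class and `build` helper (neither is from the article, and this is not the actual `Class#new` code):

```ruby
# Hypothetical class, used only to have something to construct.
class Widget
  attr_reader :name, :size

  def initialize(name, size: 1)
    @name = name
    @size = size
  end
end

# `...` accepts any arguments and passes them along unchanged.
# A leading named parameter before `...` needs Ruby 3.0 or newer.
def build(klass, ...)
  klass.new(...)
end

widget = build(Widget, "gear", size: 3)
puts "#{widget.name} #{widget.size}"
```

The caller never has to spell out which arguments `initialize` takes, which is presumably why the syntax looked like a natural fit for a Ruby-level `new`.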

tenderlove · 117d ago

Thank you! Wish it had panned out for Class#new, but I don't feel bad about doing it. :)

hinkley · 117d ago

Giving new its own instruction makes sense.

N_A_T_E · 116d ago

Ruby keeps on getting better. I wouldn't hesitate to start new projects in ruby.

Alifatisk · 116d ago

Don't hesitate, I reach out for Ruby whenever I want to glue things together

alberth · 118d ago

Can someone explain, is YJIT being abandoned over the new ZJIT? [0]

And if so, will these YJIT features like Fast Allocations be brought to ZJIT?

[0]: https://railsatscale.com/2025-05-14-merge-zjit/

tenderlove · 117d ago

It's not being abandoned, we're just shifting focus to evaluate a new style of compiler. YJIT will still get bug fixes and performance improvements.

ZJIT is a method-based JIT (the type of compiler traditionally taught in schools) where YJIT is a lazy basic block versioning (LBBV) compiler. We're using what we learned developing and deploying YJIT to build an even better JIT compiler. IOW we're going to fold some of YJIT's techniques into ZJIT.

> And if so, will these YJIT features like Fast Allocations be brought to ZJIT?

It may not have been clear from the post, but this fast allocation strategy is actually implemented in the byte code interpreter. You will get a speedup without using any JIT compiler. We've already ported this fast-path to YJIT and are in the midst of implementing it in ZJIT.

ysavir · 117d ago

Thanks for all the work you all are putting into Ruby! The improvements in the past few years have been incredible and I'm excited to see the continuous efforts in this area.

FooBarWidget · 117d ago

Why is a traditional method based JIT better than an LBBV JIT? I thought YJIT is LBBV because it's a better fit for Ruby, whereas traditional method based JIT is more suitable for static languages like Java.

tenderlove · 117d ago

One reason is that we think we can make better use of registers. Since LBBV doesn't "see" all blocks in a particular method all at once, it's much more challenging to optimize register use across basic blocks. We've added type profiling, so ZJIT can "learn" types from the runtime.

whizzter · 117d ago

Are you piggybacking on LLVM for register allocation and things like that or building from scratch?

I can imagine combining BBV type branching info with register tracing would add a lot of complicated structures, but I thought the BBs of BBV were more or less analogous to SSA blocks so no middle ground to be found? (considering if there are many megamorphic sites in the standard library?)

pjmlp · 117d ago

Usual caveat that while Java is mostly static, it has dynamic runtime semantics inherited from Smalltalk and Objective-C, with dynamic class loading, bytecode generation, proxy classes, reflection, hence why the research work on Smalltalk and Strongtalk ended up being so useful for Hotspot.

pusewicz · 117d ago

But Aaron, what do you actually do here?!

I’m so glad to see your work, and it’s always such a treat to read any of your new posts. Hope to see upcoming ones more often!

strzibny · 117d ago

Awesome, thanks for all the good work on Ruby!

whizzter · 117d ago

YJIT is based on Maxime's basic block versioning PhD work (a JS JIT). The approach both have taken is very dynamic-type-focused; it's a clever way to basically propagate type info as the code is built. The main benefit is that you get a sane JIT fairly quickly and it should behave well in most dynamic typing scenarios.

They're pivoting (successfully?) to a more traditional way, letting the interpreter first profile the code (to figure out the types) and THEN produce entire methods with heavier optimizations that can do better register allocation.

The BBV approach is sane out of the box but kinda unfamiliar for many compiler writers (problems hiring?) and probably has some performance ceilings without much complexity.

The major question as to what method will win out depends on this question, how "monomorphic" or "polymorphic" is Ruby code in real life?

Monomorphic basically means that only one "real type" (from the compiler's point of view) will ever pass a codepath (and thus extra machinery to allow multiple types won't bring much benefit).
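
To make that concrete, here is a rough sketch that records which receiver classes reach one call site. The `Circle` and `Square` classes and the `total_area` helper are made up for illustration; a real JIT gathers this through its own profiling, not a `Set`.

```ruby
require "set"

# Hypothetical shape classes, for illustration only.
Circle = Struct.new(:radius) do
  def area
    3 * radius * radius # crude integer approximation, precision is not the point
  end
end

Square = Struct.new(:side) do
  def area
    side * side
  end
end

# The `shape.area` call below is a single call site. `seen` records
# which receiver classes actually reach it.
def total_area(shapes, seen)
  shapes.sum do |shape|
    seen << shape.class
    shape.area
  end
end

mono_seen = Set.new
total_area([Square.new(1), Square.new(2)], mono_seen)

poly_seen = Set.new
total_area([Square.new(1), Circle.new(2)], poly_seen)

puts "monomorphic run saw #{mono_seen.size} class(es)"
puts "polymorphic run saw #{poly_seen.size} class(es)"
```

In the first run the call site only ever sees `Square`, so a compiler could specialize for it; in the second it has to handle both classes.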

ksec · 118d ago

>For this reason, we will continue maintaining YJIT for now and Ruby 3.5 will ship with both YJIT and ZJIT. In parallel, we will improve ZJIT until it is on par (features and performance) with YJIT.

I guess YJIT will always warm up faster and add less memory overhead. ZJIT, being more traditional, should bring more speedup than YJIT.

But most of the speedup right now is still coming from rewriting C into Ruby.

uticus · 117d ago

> But most of the speedup right now is still coming from rewriting C into Ruby.

Quick glance, this statement seems backwards - shouldn't C always be faster? or maybe i'm misunderstanding how the JIT truly works

molf · 117d ago

C itself is fast; it's calls to C from Ruby that are slow. [1]

Crossing the Ruby -> C boundary means that a JIT compiler cannot optimize the code as much; because it cannot alter or inline the C code methods. Counterintuitively this means that rewriting (certain?) built-in methods in Ruby leads to performance gains when using YJIT. [2]

[1]: https://railsatscale.com/2023-08-29-ruby-outperforms-c/ [2]: https://jpcamara.com/2024/12/01/speeding-up-ruby.html

vidarh · 117d ago

Unless your JIT can analyse the full code, a transition between byte code and native code is often costly because the JIT won't be able to optimize the full path. Once your JIT generates good enough code, it then becomes faster to avoid that transition even in cases when in isolation native code might still be faster.

EDIT: Note that this isn't an inherent limit. You could write a JIT that could analyze the compiled C code too. It's just that it's much harder to do.

ksec · 117d ago

And that is what TruffleRuby did. I had wished there is a subset of Ruby that could be compiled to C. And then all gems should be written in that instead. I remember a few people tried but failed though. Have to dig up the old HN threads again.

vidarh · 117d ago

Compiling a subset of Ruby to C wouldn't be that hard, but making it compile to C that is fast enough to be worth it is. Not because the Ruby VM is particularly fast, but because the "naive" way of compiling Ruby to C still incurs almost all of the overhead.

E.g. TruffleRuby is fast in part because it will do things like try to avoid method calls for built in types where the standard operations haven't been overridden, but that requires a lot of extra machinery...

So I'm not sure how much compiling to C would help for gems that use C to speed things up.

I think maybe an easier target would be to compile C to a slightly augmented Ruby bytecode. If you control the C compiler you could do things like make C code follow the Ruby calling convention until/unless calling external C code, and avoid a lot of stack overhead.

pjmlp · 117d ago

Not everyone failed, see RubyMotion.

However they decided it was more useful as a commercial product.

nightpool · 117d ago

The sibling comments mention that C is used in a lot of places in Ruby that incur cross-language overheads, which is true, but it's also just true that in general, even ignoring this overhead, JIT'd functions are going to be faster than their comparable C functions, because 1) they have more profiling information to be able to work from, 2) they have more type information, and (as a consequence of 1&2) 3) they're more likely to be monomorphized, and the compiler is more able to inline specialized variants of them into different chunks of the code. Among other optimizations!

Jweb_Guru · 117d ago

If you give the JIT compiler unlimited time with the code, then maybe. For real large applications, optimized JIT compiled code tends to lag behind AOT optimized C or Rust code, though I guess you could argue that these differences are language / runtime related rather than compiler related.

uticus · 117d ago

> ...they have more profiling information to be able to work from... more type information... more likely to be monomorphized, and the compiler is more able to inline specialized variants of them into different chunks of the code.

this is fascinating to me. i always assumed C had everything in the language that was needed for the compiler to use. in other words, the compiler may have a lot to work through, but the pieces are all available. but this makes it sound like JIT'd functions provide more info to the compiler (more pieces to work with). is there another language besides C that does have language features to indicate to the compiler how to make things as performant as possible?

dhruvrajvanshi · 117d ago

A very simple way to think about it is that if an intrinsic is written in C, the JIT can't easily inline it, whereas all Ruby code can be inlined. Inlining is the most important optimization that enables other optimizations.

It's not necessarily the fact that C doesn't have enough information, it's just that the JIT can reason about Ruby code better than it can about C code. To the JIT, C code is just some function which does things and the only thing it can do with it is to call it.

On the other hand, a Ruby function's bytecode is available to the JIT, so if it sees fit, it can copy-paste the function body into the call site and eliminate the function call overhead. Further, after the inlining, it can apply a lot of further optimizations across what was previously a function boundary.

In theory, you could have a way to "compile" the C intrinsics into the JIT's IR directly and that would also give you similar results.
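
A hand-written illustration of the idea, assuming integer inputs (the method names are hypothetical, and a real JIT does this on its IR rather than on source):

```ruby
def double(x)
  x * 2
end

# Before inlining: one call to `double` per element.
def sum_doubled(values)
  total = 0
  values.each { |v| total += double(v) }
  total
end

# After (conceptual) inlining: once the body of `double` is visible at the
# call site, the multiplication can be hoisted out of the loop entirely.
# This rewrite is only equivalent for integers.
def sum_doubled_inlined(values)
  total = 0
  values.each { |v| total += v }
  total * 2
end

puts sum_doubled([1, 2, 3])
puts sum_doubled_inlined([1, 2, 3])
```

If `double` were an opaque C function, the second form would not be reachable, because the compiler could not see that it is just a multiplication.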

foobazgt · 117d ago

JITs have runtime information that static compilers do not. Sometimes that's not a huge benefit, but it can often have massive performance implications. For example, a JIT can inline dynamically loaded code into your own code. That sounds unusual, but it's actually ultra-common in practice. For example, this shows up in something as mundane and simple as configurable logging.

MobiusHorizons · 117d ago

The C code in question is most likely interpreter code that is incredibly generic, meaning it is very branchy based on data that is only known at runtime, and therefore can’t be optimized at compile time. A JIT has the benefit of running the compiler at runtime when the data is known.

adgjlsfhk1 · 117d ago

C is actually a pretty hard language to compile well. C is a very weakly typed language (e.g. malloc returns a void* that the user manually casts to the type they intended), and exposes raw pointers to the user, which makes analysis for compilers really annoying.

Jweb_Guru · 116d ago

C also has lots of undefined behavior that lets compilers make assumptions they have a very hard time proving in safe languages. C++ takes this even further with stuff like TBAA. Sure it doesn't give the compiler as much to work with as something like Rust's pervasive restrict or Haskell's pervasive immutability, but on the other hand the compiler assuming things like "every array index is in bounds and infallible" exposes tons of opportunities for autovectorization etc. I think people overexaggerate how hard C is to optimize, at least compared to languages like Java and especially compared to languages like Ruby which let users do things like iterate through all the GC roots.

adgjlsfhk1 · 116d ago

UB is very much a double edged sword for compilers. On the one hand, it makes lots of simple optimizations much easier, but on the other, it makes lots of inter-procedural optimizations much harder (since the compiler must be incredibly careful not to introduce UB that the user didn't introduce themself).

There is no compiler that actually uses all of the things that the standard allows them to do (especially wrt atomics), because if they did, everyone's code would break, and figuring out which code transforms were legal would be ridiculously difficult.

> at least compared to languages like Java and especially compared to languages like Ruby

I hope you didn't take from my previous comment that I think Java is a good language from this perspective. The fact that Java even gets half-decent performance is a miracle given how bad the JVM model is. Ruby is a language I'm really interested to try out since IMO it was the language that first managed a modicum of optimization with Python-like expressiveness.

steveklabnik · 116d ago

C also has TBAA, by the way. Lots of people disable it though.

nightpool · 117d ago

It doesn't sound like YJIT is being abandoned at all. Reading between the lines, it sounds like they want to invest most of their new development right now into a less-experimental architecture that's closer to other JITs and is easier to develop, but that they consider this a somewhat risky endeavor and aren't sure whether this investment is going to pan out in the long run. So they're going to try ZJIT out, but YJIT and the ideas behind it are by no means abandoned. They're just taking a pause to see if a rewrite will make it easier to maintain or produce better results in the long term.

firemelt · 118d ago

After reading your source, I'd say YJIT is still there until ZJIT is ready and on par with YJIT,

and the features will be there when they're there.

90s_dev · 118d ago

It seems to me like all languages are converging towards something like WASM. I wonder if in 20 years we will see WASM become the de facto platform that all apps can compile to and all operating systems can run near-natively with only a thin layer like WASI but more convenient.

berkes · 117d ago

Wasn't this the idea of the JVM?

hueho · 117d ago

Java bytecode was originally never intended to be used with anything other than Java - unlike WASM it's very much designed to describe programs using virtual dispatch and automatic memory management. Sun eventually added stuff like invokedynamic to make it easier to implement dynamic languages (at the time, stuff like Ruby and Python), but it was always a bit of round peg in square hole.

By comparison, WASM is really more like traditional assembly, only running inside a sandbox.

pjmlp · 117d ago

Just like CLR bytecode, IBM i TIMI bytecode and many others since 1958.

For some reason when people advocate for WASM outside of the browser, they only remember the JVM.

90s_dev · 117d ago

I think so, but that was the 90s where we needed a lot more hindsight to get it right. Plus that was mostly just Sun, right? WASM is backed by all browsers and it looks like MS might be looking at bridging it with its own kernel or something?

bgwalter · 117d ago

I don't know. The integration of Java applets was way smoother than WASM.

Security wise, perhaps a different story, though let's wait until WASM is in wide use with filesystem access and bugs start to appear.

berkes · 115d ago

I understand why, but still lament that Java applets were dropped like a hot potato, rather than solving the (fundamental) issues.

Back then, I learned Java, just to have fancy menus, quirky gimmicks and such. Until Flash came along, nothing could do this. Where Java was rather open, free/libre, Flash was proprietary and even patented. A big step back. And it took decades before JavaScript reached parity in possibilities to create such interactive, multimedia experiences in a cross-browserish way.

I can only imagine how much further along something like videoconferencing, realtime collaboration or gaming on the web would've been if this Java applet tech had been ever improving since inception.

(edit: I'm all for semantic, accessible, clean HTML/CSS/JS in my web apps. But there's lots of use cases for gimmicks, fancy visuals, immersive experiences etc. and no, that's not the hotel-reservation-form or the hackers-forum. But art. Or fun. Or ?)

lloeki · 117d ago

> that was the 90s

In the meantime the CLR happened too.

And - to an extent - LLVM IR.

foldr · 117d ago

And of course the ill-fated Parrot VM associated with the Perl 6 project.

rhdjsjebshjffn · 117d ago

I think that was more of a language-oriented effort rather than runtime/abi oriented effort.

foldr · 117d ago

Parrot was intended to be a universal VM. It wasn’t just for Perl.

https://www.slideshare.net/slideshow/the-parrot-vm/2126925

rhdjsjebshjffn · 117d ago

Sure, I just think that's a very odd way to characterize the project. Basically anything can be universal vm if you put enough effort to reimplementing the languages. Much of what sets Parrot aside is its support for frontend tooling.

foldr · 117d ago

“The Parrot VM aims to be a universal virtual machine for dynamic languages…”

That’s how the people working on the project characterized it.

rhdjsjebshjffn · 117d ago

I certainly think the humor in parrot/rakudo (and why they come up today still) is how little of their own self image the proponents could perceive. The absolute irony of thinking that perl's strength was due to familiarity with text-manipulation rather than the cultural mass....

taf2 · 117d ago

It’s not a bad idea. Lot of the same people who worked on JVM were around while the asm - wasm ideas emerged

pjmlp · 117d ago

This has been an idea since UNCOL was discussed as an idea back in 1958.

https://en.wikipedia.org/wiki/UNCOL

There have been countless bytecode-based platforms since 1958, including all the famous Xerox PARC systems (the CPUs were microcoded and loaded the related translation code on boot), yet "WASM is doing it first" keeps being brought up.

Do you know what I call WASI containers running on a Kubernetes cluster, or serverless cloud vendors?

Application Server, https://en.wikipedia.org/wiki/Application_server

writebetterc · 117d ago

Multi-tenancy on the JVM makes me shudder, though. The general point you're making, thumbs up to that.

zerd · 117d ago

Like predicted in 2014 here: https://www.destroyallsoftware.com/talks/the-birth-and-death...

hinkley · 117d ago

> It’s very rare for code to allocate exactly the same type of object many times in a row, so the class of the instance local variable will change quite frequently.

That’s dangerous thinking because constructors will be a bimodal distribution.

Either a graph of calls or objects will contain a large number of unique objects, layers of alternating objects, or a lot of one type of object. Any map function for instance will tend to return a bunch of the same object. When the median and the mean diverge like this your thinking about perf gets muddy. An inline cache will make bulk allocations in list comprehensions faster. It won’t make creating DAGs faster. One is better than none.

munificent · 117d ago

> Any map function for instance will tend to return a bunch of the same object.

Yes, but if it ends up creating any ephemeral objects in the process of determining those returned objects, then the allocation sequence is still not homogeneous. In Ruby, according to the article, even calling a constructor with named arguments allocates, so it's very easy to still end up cycling through allocating different types.

At the same time, the callsite for any given `.new()` invocation will almost always be creating an instance of the exact same class. The target expression is nearly always just a constant name. That makes it a prime candidate for good inline caching at those callsites.

tenderlove · 117d ago

> Yes, but if it ends up creating any ephemeral objects in the process of determining those returned objects, then the allocation sequence is still not homogeneous.

Yes! People might do `map` transformations, but it's very common to do other stuff at the same time. Any other allocations during that transformation would ruin cache hit rate.

> At the same time, the callsite for any given `.new()` invocation will almost always be creating an instance of the exact same class. The target expression is nearly always just a constant name. That makes it a prime candidate for good inline caching at those callsites.

Yes again!

titzer · 117d ago

This is why it's imperative that inline caches learn and adapt to the observed behavior. As long as learning is cheap, identifies profitable cases effectively, and backs off for polymorphic and megamorphic scenarios, it's a win.

VM implementer intuition only goes so far, and as the internet is the greatest fuzzer invented, you're definitely going to encounter programs that break your best laid plans.

munificent · 116d ago

> This is why it's imperative that inline caches learn and adapt to the observed behavior.

True, but if you only have a single bottleneck cache site for all constructor invocations across the program, the only reasonable thing that callsite can learn is "wow, every single constructed class goes through here".

That's why it makes sense to have a separate cache at every `.new()` location.
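
A toy simulation of the difference, assuming a one-entry cache that simply remembers the last class it saw. `OneEntryCache`, `Point`, `Label`, and the `:a`/`:b` site labels are all invented for the sketch and say nothing about how CRuby's real inline caches are laid out.

```ruby
# A toy one-entry cache: remembers the last class seen and counts hits.
class OneEntryCache
  attr_reader :hits, :misses

  def initialize
    @klass = nil
    @hits = 0
    @misses = 0
  end

  def observe(klass)
    if @klass.equal?(klass)
      @hits += 1
    else
      @misses += 1
      @klass = klass
    end
  end
end

# Hypothetical classes being allocated.
class Point; end
class Label; end

# One cache shared by every allocation vs. one cache per call site.
global = OneEntryCache.new
per_site = Hash.new { |hash, site| hash[site] = OneEntryCache.new }

100.times do
  # Site :a always allocates Point, site :b always allocates Label,
  # and the two sites alternate, as they would inside a loop body.
  [[:a, Point], [:b, Label]].each do |site, klass|
    global.observe(klass)
    per_site[site].observe(klass)
  end
end

puts "shared cache: #{global.hits} hits, #{global.misses} misses"
puts "site :a:      #{per_site[:a].hits} hits, #{per_site[:a].misses} misses"
puts "site :b:      #{per_site[:b].hits} hits, #{per_site[:b].misses} misses"
```

With alternating allocations the shared cache never hits, while each per-site cache misses once and then hits every time.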

titzer · 116d ago

Yeah, then you want context-sensitive ICs which are indexed by the callsite. JSC gets some of this by profiling in higher tiers, where inlining might have occurred.

masklinn · 117d ago

> One is better than none.

Not necessarily. An inline cache is cheap but it's not free, even less so when it also comes with the expense of moving Class#new from C to Ruby. It's probably not worth speeding up the 1% at the expense of the 99%.

> An inline cache will make bulk allocations in list comprehensions faster.

Only if such comprehensions create exactly one type of object, if they create two it's going to slow them down, and if they create zero (just do data extraction) it won't do anything.

hinkley · 117d ago

> Only if such comprehensions create exactly one type of object,

We just had this conversation maybe a month ago. If it’s 50-50 then you are correct. However if it’s skewed then it depends. I can’t recall what ratio was discovered to be workable, it was more than 50% and less than or equal to 90%.

firemelt · 118d ago

Does this mean more speed for all Rails/Active Record collections?

ksec · 118d ago

I know I may be jumping the gun a little here but I wonder what percentage speedup could we expect on typical rails applications. Especially with Active Record.

GGO · 117d ago

so far no diff here (https://speed.yjit.org/). But the build is from May 14 so maybe it will show up in new build?

tempest_ · 117d ago

From the outside looking in, Ruby is Rails at this point.

Lio · 117d ago

A well known counter example would be Stripe, who use Ruby but not Rails.

GGO · 117d ago

dup of https://news.ycombinator.com/item?id=44057476

bdcravens · 117d ago

There's no discussion there, so not much value other than imaginary internet points

Fast Allocations in Ruby 3.5

Comments (67)