Show HN: Brainfuck to RISC-V JIT compiler written in Zig

5 0x000xca0xfe 3 5/25/2025, 12:34:09 PM github.com ↗

Hi everybody,

this was my project to learn Zig and RISC-V+x86_64 assembly.

Not sure if anybody is actually interested in yet another Brainfuck compiler, so I'll just write up some random things I learned while building it!

- A primitive assembly stitching compiler is 10x faster than the interpreter. Did not expect that.

- The generated x86 code is really bad (e.g. it always uses 6 or 7 byte sized instructions with 32-bit immediates when there are much smaller ones) but it doesn't really matter. Good code generated by GCC and clang for transpiled Brainfuck->C is not much faster as it's bottlenecked by memory accesses anyways.

- Zig is pretty far along actually. You can make serious projects with it!

- But the community seems to like self-punishment. Unused parameters and variables are hard errors and there is no way to disable that even for debug builds. Makes quickly commenting out part of the code a real PITA.

- I've had a miscompilation due to std.mem.span being broken and two source code breaks going from Zig 0.13 to 0.15 (std.mem.page_size got removed and ArrayList.popOrNull as well).

- But arbitrary size integers are fantastic! And well-defined two's complement behaviour!

Here is for example the code that encodes the c.beqz instruction:

  /// Branch if Equal to Zero (compressed): c.beqz rs1', offset -> beq rs1, x0, offset
  pub fn c_beqz(text: *std.ArrayList(u8), rs1: RV_X, offset: i9) !void {
      std.debug.assert(is3BitReg(rs1));
      std.debug.assert(@mod(offset, 2) == 0);
      const imm: u9 = @bitCast(offset);
      const RV_CB = packed struct(u16) {
          op: u2,
          offset5: u1,
          offset1_2: u2,
          offset6_7: u2,
          rsd_rs1_: u3,
          offset3_4: u2,
          offset8: u1,
          funct3: u3,
      };
      const ins = RV_CB {
          .op = 0x1,
          .offset5 = @truncate(imm >> 5),
          .offset1_2 = @truncate(imm >> 1),
          .offset6_7 = @truncate(imm >> 6),
          .rsd_rs1_ = @truncate(@intFromEnum(rs1) - 8),
          .offset3_4 = @truncate(imm >> 3),
          .offset8 = @truncate(imm >> 8),
          .funct3 = 0x6,
      };
      try appendInstruction(text, u16, @bitCast(ins));
  }

This is really nice as all the exotic integer sizes are actually checked, too.

- Zig support for Windows is good. Porting the project to Windows was very easy.

- When the RISC-V registers are carefully chosen, almost all instructions could be compressed in this projects.

- Compressed instructions and good branching code (using the branch instructions directly when the jump range is small enough instead of branching over a larger jump instruction) did not noticeably change performance on real hardware (OrangePi RV2).

- But somehow QEMU got a massive boost from that. Not sure why exactly.

So, that's about it!

I hope at least something was interesting...

Comments (3)

sylware · 9h ago

thumbs up for this project (everything RISC-V is usually).

I write rv64 assembly (nearly core only, without memory reservation instructions) and run it on x86_64 with a very small (x86_64 assembly written) interpreter.

And your are right, I have had thoughts about a "RISC-V" x86_64 compiler (but it will probably require some runtime unfortunately).

Hopefully, rv22+ hardware with ultra-performant µ-architecture and with the latest silicon process will happen sooner than we expect. One less PI toxic lock and cleaner, _really standard_ assembly (the end game of much software).

0x000xca0xfe · 9h ago

Yeah I can't wait for a performant RISC-V core. Runtime code generation is so easy for RISC-V. I have many ideas or projects where I'd like to use it but it feels kinda pointless when JITed RISC-V machine code on current hardware gets destroyed by any half-decent x86 PC or Mac running naive C code.

sylware · 7h ago

Well, here are the tricks: interpreted rv64 assembly will be "slow"... actually "slower" than x86_64 native code... but in many execution contexts, for many pieces of software, here the first trick: the "slow" interpreted rv64 assembly machine code will be... "fast" enough... The 2nd trick: I have control on my rv64 machine interpreter, and I can write native x86_64 acceleration assembly along side of a rv64 reference implementation (I planned to do just that for my CPU renderer in my wayland compositor... actually I have already AVX2 code for some of that, even though the sweet spot is AVX512, but don't have the hardware for this, yet).

And once we have this rv64 shiny hardware, certainly won't be a drop-in, but the distance to code will be minimal.

One important SDK thing: I am careful at using the smallest number of rv64 machine instructions (we tend to forget 'R' in "RISC-V" means 'R'educed...), and I use basic, really basic, C preprocessors instead of the assembler preprocessor in order to decouple the assembly code from a specific assembler preprocessor. I don't even use assembler pseudo-instructions, or ABI register names, neither compressed machine instructions.

On top of that: I don't use ELF, I use a super minimal executable/system interface dynamic shared library format of my own, omega idiotically simple, which I wrap in ELF binaries for transparent support. People have to come to realize, ELF complexity, for a executable/system interface dynamic shared library is utterly and completely obsolete, even a liability once you are looking for binary stability in time (cf games), proven over more than the last decade.