TPDE: A Fast Adaptable Compiler Back-End Framework

60 points · npalli · 15 comments · 6/2/2025, 1:17:54 AM · arxiv.org

Comments (15)

npalli · 1d ago
Source code for the framework

https://github.com/tpde2/tpde

MaskRay · 13h ago
Build instructions

In the llvm/llvm-project repository

    git switch --detach origin/release/19.x
    cmake -GNinja -S. -B/tmp/out/custom -DLLVM_TARGETS_TO_BUILD='X86;AArch64' -DLLVM_ENABLE_PROJECTS=clang -DLLVM_ENABLE_PLUGINS=off -DCMAKE_BUILD_TYPE=Release -DLLVM_LINK_LLVM_DYLIB=on
    # consider -DCLANG_ENABLE_OBJC_REWRITER=off -DCLANG_ENABLE_STATIC_ANALYZER=off -DCLANG_ENABLE_ARCMT=off -DCLANG_PLUGIN_SUPPORT=off
    ninja -C /tmp/out/custom clang LLVM FileCheck   # build clang and libLLVM.so and test utilities

In the tpde repository

    git submodule update --init
    cmake -GNinja -S. -Bout/debug -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=on -DCMAKE_PREFIX_PATH=/tmp/out/custom -DCMAKE_CXX_COMPILER=$HOME/Stable/bin/clang++ -DCMAKE_C_COMPILER=$HOME/Stable/bin/clang

There are some failures:

    % /tmp/out/custom/bin/llvm-lit out/debug/tpde/test/filetest
    ...
    Failed Tests (5):
      TPDE FileTests :: codegen/eh-frame-arm64.tir
      TPDE FileTests :: codegen/eh-frame-x64.tir
      TPDE FileTests :: codegen/simple_ret.tir
      TPDE FileTests :: codegen/tbz.tir
      TPDE FileTests :: tir/duplicate_funcs.tir

aengelke · 12h ago
These tests use a few more LLVM tools (llvm-objdump, llvm-dwarfdump, not). Could you try again after building those tools in addition to FileCheck? Do the TPDE-LLVM tests, which use the same tools, pass with this setup?
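
For anyone reproducing this, a small POSIX `sh` sketch that reports which of those tools (plus FileCheck) are still missing from the LLVM build directory. It assumes the tools end up in the build's `bin/` directory, as the `llvm-lit` path above suggests; `missing_tools` is a hypothetical helper name, not part of TPDE or LLVM:

```sh
# Hypothetical helper: print the LLVM test tools used by the TPDE file tests
# (FileCheck plus the ones named above) that are not yet executables in
# <build_dir>/bin. Prints an empty line when all of them are present.
missing_tools() {
  build_dir=$1
  missing=""
  for tool in FileCheck llvm-objdump llvm-dwarfdump not; do
    [ -x "$build_dir/bin/$tool" ] || missing="$missing $tool"
  done
  echo "${missing# }"   # drop the leading space
}

# Possible usage with the build directory from the instructions above:
#   m=$(missing_tools /tmp/out/custom)
#   [ -n "$m" ] && ninja -C /tmp/out/custom $m   # $m unquoted on purpose: one ninja target per word
#   /tmp/out/custom/bin/llvm-lit out/debug/tpde/test/filetest
```
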
BarakWidawsky · 1d ago
If this is a faster backend for LLVM, does it potentially obviate the niche Cranelift is optimizing for?
npalli · 1d ago
While they also used Cranelift IR itself (among other IRs, not just LLVM IR) to show performance improvements, which makes TPDE complementary rather than a replacement, you raise a good point. It's quite possible it isn't as full-featured yet, so perhaps in the future, if at all.

> The TPDE-based back-end compiles 4.27x faster than Cranelift and 2.68x faster than Cranelift with its fast register allocator, but is 1.74x slower than Winch.
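
Taking these compile-time ratios at face value, and assuming all three are aggregated over the same benchmarks in the same way, they also imply how Cranelift's fast-regalloc mode and Winch compare with default Cranelift (t denotes compile time):

```latex
\begin{gathered}
\frac{t_{\mathrm{CL}}}{t_{\mathrm{TPDE}}} = 4.27, \qquad
\frac{t_{\mathrm{CL,fast}}}{t_{\mathrm{TPDE}}} = 2.68, \qquad
\frac{t_{\mathrm{TPDE}}}{t_{\mathrm{Winch}}} = 1.74 \\[4pt]
\frac{t_{\mathrm{CL}}}{t_{\mathrm{CL,fast}}} = \frac{4.27}{2.68} \approx 1.59, \qquad
\frac{t_{\mathrm{CL}}}{t_{\mathrm{Winch}}} = 4.27 \times 1.74 \approx 7.43
\end{gathered}
```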

cfallin · 23h ago
They're hitting another design point on the compile-time vs. code-quality trade-off curve, which is interesting. They compile 4.27x faster than Cranelift with its default (higher-quality) regalloc, but Cranelift produces code that runs 1.64x faster (section 6.2.2).

This isn't too surprising to me, as the person who wrote Cranelift's current regalloc (hi!) -- regalloc is super important to run-time perf, so for Wasmtime's use-case at least, we've judged that it's worth the compile time.

TPDE is pretty cool and it's great to see more exploration in compiler architectures!

fooker · 1d ago
What makes this 'adaptable' and what makes this a 'framework'?

Seems like a pretty neat fast compiler backend for LLVM. Why the extra buzzwords?

t0b1 · 1d ago
TPDE is a framework for writing a back-end for various SSA IRs. TPDE-LLVM is an LLVM back-end written using TPDE, but TPDE itself is independent of LLVM. The paper also mentions back-ends written for Cranelift's IR and Umbra IR using TPDE.
xiphias2 · 1d ago
It's a great start, but what would be cooler is if they really went through the boring part: putting it into LLVM as the new default -O0 compiler.

Edit: LLM to LLVM

npalli · 1d ago
You mean LLVM, 'cause I was confused why you would put it into an LLM (which one?)
xiphias2 · 1d ago
Sure, I meant LLVM
vlovich123 · 1d ago
> Performance results on SPECint 2017 show that we can compile LLVM-IR 8–24x faster than LLVM -O0 while being on-par in terms of run-time performance

Wait - it’s 8-24x faster than O0 while producing code on par with O3???

ummonk · 1d ago
No, the generated code is on par with LLVM -O0. It's slower than LLVM -O1, never mind LLVM -O3.
wiz21c · 1d ago
I guess it doesn't include linking? (Which takes quite some time.)
andyferris · 1d ago
One thing I never understood in this context here (fast JIT/debug builds/hot reloads/-O0) is why you would need much static linking. Generally your modules are going to have a DAG relationship. Even code inside a large compilation unit could potentially be factored out (automatically) into smaller modules. Could you not just generate a bunch of small dynamically linked libraries? Would the system dynamic loader become the speed bottleneck? Even if so, wouldn't reloading just a portion of the DAG in a hot-reload context be much faster than linking everything beforehand?