These are tests that use some additional LLVM tools (llvm-objdump, llvm-dwarfdump, not). Could you try again after building these tools in addition to FileCheck? Do the TPDE-LLVM tests, which use the same tools, pass with this setup?
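Assuming /tmp/out/custom is a standard ninja build tree of LLVM (as the llvm-lit path in your output suggests), something like this should get you the missing tools:

```
% ninja -C /tmp/out/custom FileCheck llvm-objdump llvm-dwarfdump not
% /tmp/out/custom/bin/llvm-lit out/debug/tpde/test/filetest
```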
BarakWidawsky · 1d ago
If this is a faster backend for LLVM, does it potentially obviate the niche Cranelift is optimizing for?
npalli · 1d ago
While they used Cranelift IR itself (among others, not just LLVM) to show performance improvements (thus making it complementary and not a replacement), you raise a good point. It's quite possible it's not as full-featured yet, so perhaps in the future, if at all.
> The TPDE-based back-end compiles 4.27x faster than Cranelift and 2.68x faster than Cranelift with its fast register allocator, but is 1.74x slower than Winch
cfallin · 23h ago
They're hitting another design point on the compile time vs. code-quality tradeoff curve, which is interesting. They compile 4.27x faster than Cranelift with default (higher quality) regalloc, but Cranelift produces code that runs 1.64x faster (section 6.2.2).
This isn't too surprising to me, as the person who wrote Cranelift's current regalloc (hi!) -- regalloc is super important to run-time perf, so for Wasmtime's use-case at least, we've judged that it's worth the compile time.
TPDE is pretty cool and it's great to see more exploration in compiler architectures!
fooker · 1d ago
What makes this 'adaptable' and what makes this a 'framework'?
Seems like a pretty neat fast compiler backend for LLVM. Why the extra buzzwords?
t0b1 · 1d ago
TPDE is a framework for writing a back-end for various SSA IRs.
TPDE-LLVM is an LLVM back-end written using TPDE, but TPDE itself is independent of LLVM.
The paper also mentions back-ends written for Cranelift's IR and Umbra IR using TPDE.
xiphias2 · 1d ago
It's a great start, but what would be cooler is if they really went through the boring part, which is putting it into LLVM as the new default -O0 compiler.
Edit: LLM to LLVM
npalli · 1d ago
You mean LLVM? 'Cause I was confused why you would put it into an LLM (which one?)
xiphias2 · 1d ago
Sure, I meant LLVM
vlovich123 · 1d ago
> Performance results on SPECint 2017 show that we can compile LLVM-IR 8--24x faster than LLVM -O0 while being on-par in terms of run-time performance
Wait - it’s 8-24x faster than O0 while producing code on par with O3???
ummonk · 1d ago
No, the generated code is on par with LLVM -O0. It's slower than LLVM -O1, never mind LLVM -O3.
wiz21c · 1d ago
I guess it doesn't include linking? (which takes quite some time)
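My guess at the distinction, just to illustrate (standard clang invocations):

```
% clang -O0 -c main.c -o main.o   # compile step: presumably what the paper measures
% clang main.o -o main            # link step: a separate cost on top
```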
andyferris · 1d ago
One thing I never understood in this context (fast JIT/debug builds/hot reloads/-O0) is why you would need much static linking. Generally your modules are going to have a DAG relationship. Even code inside a large compilation unit could potentially be factored out (automatically) into smaller modules. Could you not just generate a bunch of small dynamically linked libraries? Would the system dynamic loader become the speed bottleneck? Even if so, wouldn't reloading just a portion of the DAG in a hot-reload context be much faster than linking everything beforehand?
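To make that concrete, a minimal sketch of the layout I'm imagining (file names are hypothetical, flags are standard clang ones):

```
# each module becomes its own shared object instead of being statically linked
% clang -O0 -g -fPIC -shared mod_a.c -o libmod_a.so
% clang -O0 -g -fPIC -shared mod_b.c -o libmod_b.so
% clang -O0 -g main.c -o app -L. -lmod_a -lmod_b -Wl,-rpath,'$ORIGIN'
# hot reload after editing mod_b.c: rebuild one small .so instead of relinking everything
% clang -O0 -g -fPIC -shared mod_b.c -o libmod_b.so
```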
https://github.com/tpde2/tpde
In the llvm/llvm-project repository
In the tpde repository
/Stable/bin/clang

There are some failures:

```
% /tmp/out/custom/bin/llvm-lit out/debug/tpde/test/filetest
...
Failed Tests (5):
  TPDE FileTests :: codegen/eh-frame-arm64.tir
  TPDE FileTests :: codegen/eh-frame-x64.tir
  TPDE FileTests :: codegen/simple_ret.tir
  TPDE FileTests :: codegen/tbz.tir
  TPDE FileTests :: tir/duplicate_funcs.tir
```