Show HN: (bits) of a Libc, Optimized for Wasm
To compile SQLite, I use wasi-sdk, which uses wasi-libc, which is based on musl. It's been said that musl is slow(er than glibc), which is true, to a point.
musl uses SWAR on a size_t to implement various functions in string.h. This is fine, except size_t is just 32-bit on Wasm.
I found that implementing a few of those functions with Wasm SIMD128 can make them go around 4x faster.
Other functions don't even use SWAR; redoing those can make them 16x faster.
Smooth sort also has trouble pulling its own weight; a Shell sort seems both simpler and faster, while similarly avoiding recursion, allocations and the addressable stack.
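For illustration, here is a minimal sketch of what a qsort-style Shell sort can look like. It is not the code from the repository: `shell_sort` and `cmp_int` are hypothetical names, and the gap sequence (Knuth's 1, 4, 13, 40, ...) is an assumption, since the post doesn't say which gaps are used. It shows the properties mentioned above: no recursion, no allocation, and only a few scalar locals.

```c
#include <stddef.h>

// Swap two elements of `width` bytes each, one byte at a time.
static void swap_bytes(unsigned char *a, unsigned char *b, size_t width) {
    while (width--) {
        unsigned char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

// qsort-compatible Shell sort (hypothetical name, illustrative only).
void shell_sort(void *base, size_t n, size_t width,
                int (*cmp)(const void *, const void *)) {
    unsigned char *a = base;

    // Assumed gap sequence: 1, 4, 13, 40, ... grown until it reaches about n/3.
    size_t gap = 1;
    while (gap < n / 3) gap = 3 * gap + 1;

    // Integer division by 3 walks the same sequence back down to 1, then 0.
    for (; gap > 0; gap /= 3) {
        // Gapped insertion sort: sink element i into its gap-strided run.
        for (size_t i = gap; i < n; i++) {
            for (size_t j = i; j >= gap; j -= gap) {
                unsigned char *lo = a + (j - gap) * width;
                unsigned char *hi = a + j * width;
                if (cmp(lo, hi) <= 0) break;
                swap_bytes(lo, hi, width);
            }
        }
    }
}

// Example comparator for int elements (hypothetical helper).
int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Like smoothsort, this sorts in place, but it is not stable, and the byte-wise swap is the simplest option rather than the fastest.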
I found that using SIMD intrinsics (rather than SWAR) makes it easier to avoid UB, but the code would definitely benefit from more eyeballs.
See this for some benchmarks on both x86-64 and Aarch64: https://github.com/ncruces/go-sqlite3/actions/runs/145169318...
string.h is missing strstr(). There's an algorithm of similar complexity you might consider: http://0x80.pl/notesen/2016-11-28-simd-strfind.html
If there's interest, the set of implemented functions can definitely be extended.
I've also only really tested wazero. I can't know for sure that this is a straight improvement for other runtimes and architectures.
For instance, the code delays using wasm_i8x16_bitmask as much as possible, because on Aarch64 it can be slower than not using SIMD at all, whereas it's plenty fast on x86-64.
One of the nice things about Go is how much that's a solved issue out of the box, compared to almost everything else; certainly compared to C.
Pinging them in an issue: https://github.com/WebAssembly/wasi-libc/issues/580
It took me a lot longer than it should have to put together this basic module, and even then there's this shared library I had to download to build it, and I couldn't figure out why this requires a libc:
https://github.com/cedws/wasm-wit-test
To answer your question, it needs a libc because you're including stdlib.h, and exporting an allocator (even if you're not otherwise using it). You need a libc for malloc.
This is generally a good idea, if you need to send anything beyond numbers across the API (e.g. you need an allocator if you want to send strings as pointers).
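As a rough sketch of why an allocator ends up exported: the host asks the guest for a buffer, writes the string bytes into it, and then passes a pointer and length to the function it wants to call. The names `guest_alloc`, `guest_free` and `count_spaces` are made up for this example, and this is plain C rather than anything WIT generates. `export_name` is the Clang attribute for naming Wasm exports; the macro makes it a no-op on other targets.

```c
#include <stddef.h>
#include <stdlib.h>

// On Wasm targets, ask Clang to export the function under the given name.
#ifdef __wasm__
#define EXPORT(name) __attribute__((export_name(name)))
#else
#define EXPORT(name)
#endif

// The host calls this to get guest memory it can copy a string into.
EXPORT("guest_alloc")
void *guest_alloc(size_t size) {
    return malloc(size);  // this is the call that pulls in libc
}

// The host calls this once it is done with the buffer.
EXPORT("guest_free")
void guest_free(void *ptr) {
    free(ptr);
}

// An example API that takes a string as (pointer, length).
EXPORT("count_spaces")
size_t count_spaces(const char *ptr, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (ptr[i] == ' ') n++;
    }
    return n;
}
```

If the API only ever passes numbers, none of this is needed, and neither is malloc.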
I never used WIT, so I have no idea if this is a requirement for WIT.
It's kinda frustrating to compile SQLite for Wasm. It can be done, but it's quite troublesome.
SWAR ("SIMD within a register") is generally used for techniques that apply SIMD principles within general-purpose registers and instructions.
Assume you've loaded a 64-bit register (a uint64_t) with 8 bytes (unsigned char) of data. Can you answer the question “is any of these 8 bytes zero (the NUL terminator)?”
If you find a cheap way to do it, you can make strlen go faster by consuming 8 bytes at a time.
Et voilà:
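One well-known trick that answers the question above, sketched here as an illustration (the names `has_zero_byte` and `swar_strlen` are made up for this example, and this is not musl's exact code): subtracting 1 from every byte only sets a byte's high bit if that byte was zero or was already >= 0x80, and `& ~v` filters out the latter. Note that the word-at-a-time loop can read a few bytes past the terminator. That stays inside one aligned 8-byte word, but it is still out of bounds in strict ISO C, so treat it as a sketch of the idea rather than UB-free code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

// Nonzero if and only if at least one of the 8 bytes in v is zero.
static inline uint64_t has_zero_byte(uint64_t v) {
    return (v - ONES) & ~v & HIGHS;
}

// strlen that consumes 8 bytes per iteration once the pointer is aligned.
size_t swar_strlen(const char *s) {
    const char *p = s;

    // Byte-by-byte until p is 8-byte aligned.
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0') return (size_t)(p - s);
        p++;
    }

    // Skip whole words that contain no NUL byte.
    for (;;) {
        uint64_t v;
        memcpy(&v, p, sizeof v);  // compiles to a plain aligned load
        if (has_zero_byte(v)) break;
        p += sizeof v;
    }

    // The terminator is somewhere in this word; find it byte by byte.
    while (*p != '\0') p++;
    return (size_t)(p - s);
}
```

With a 32-bit `size_t`, as on Wasm, the same trick only covers 4 bytes per iteration, which is the limitation the post describes.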
1. Client side browser polyglot "applets" (Java applets were ahead of their time IMO)
2. Server side polyglot "servlets" (Node.js, embedded runtimes, etc.)
3. Language interop/FFI (Lang A -> WASM -> Lang B, like wasm2c)
Why is #3 so interesting? The hardest thing in language conversion is the library calls. WASI standardizes that, so all the proprietary libs will eventually compile down to WASI as a sort of POSIX/libc-like layer. In addition, WASM standardizes the calling convention. The resulting new source code may not look like much, but it will solve the FFI calling convention/marshalling/library issues nicely.
C calling conventions are already the standard for FFI in native code, and that means dropping down to what can be expressed in C if you want to cross that boundary.
It's not a panacea, though; it introduces other issues.