This is a strange and dangerous thing to try to do from assembly. In particular, all these details about write barriers being hand-coded in the assembly are subject to change from release to release.
Better to structure your code so that you do the pointer manipulation (and allocation) in Go code instead, and leave assembly only for what is absolutely necessary for performance (usually things like bulk operations, special instructions, and so on).
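A rough sketch of what that split can look like (names are made up, and the matching .s file is omitted, so this is illustrative rather than complete): the Go side owns allocation and any pointer writes, so the compiler inserts whatever barriers the GC needs, while the assembly routine only ever sees a base pointer and a length and never stores a pointer itself.

    package sum

    // sumWords would be implemented in sum_amd64.s (hypothetical file);
    // it only reads the slice's memory and returns a value, so the
    // hand-written code never touches the write-barrier machinery.
    //go:noescape
    func sumWords(base *uint64, n int) uint64

    // Sum keeps all slice and pointer handling in Go.
    func Sum(xs []uint64) uint64 {
        if len(xs) == 0 {
            return 0
        }
        return sumWords(&xs[0], len(xs))
    }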
titzer · 2h ago
> Better to structure your code so that you do the pointer manipulation (and allocation) in Go code instead, and leave assembly only for what is absolutely necessary for performance (usually things like bulk operations, special instructions, and so on).
While I generally agree with this, one way to mitigate the maintenance issue is to offer a macro assembler instruction that performs a write barrier and is kept up to date with what the Go compiler emits. If the compiler itself uses that macro assembler method, it's already pretty easy.
After many years of racking my brain on how to make each of the things "blessed and bulletproof", I've realized that systems languages will inevitably be used by people building high performance systems. People doing this will go to crazy lengths and low-level hackery that you can't anticipate, e.g. generating their own custom machine code[1], doing funky memory mapping tricks, etc. The boundaries between that machine code and your language are always going to run into representational issues like write barriers and object layouts.
[1] Pretty much all I do is language VMs, and there is no way around this. It will have a JIT compiler.
corsix · 3h ago
In this case, "atomic 128-bit store" is the special instruction, with the twist that half of those 128 bits contain a pointer.
My understanding is that for crypto specifically, constant-time algorithms matter due to security implications, and those are only available when you use specific branchless assembly instructions, so it's not just performance.
foobiekr · 4h ago
Just to pick nits, the important thing is basically no secret-dependent { branches, loop bounds checks, memory accesses }. This is a lot more complex than simple "constant time."
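In Go specifically, crypto/subtle is the standard-library way to follow that rule from pure Go. A small illustration (my example, not from the thread), assuming both MACs have the same length:

    package mac

    import "crypto/subtle"

    // macEqual inspects every byte regardless of where the first mismatch
    // occurs, instead of returning early (an early return would be a
    // secret-dependent branch).
    func macEqual(got, want []byte) bool {
        return subtle.ConstantTimeCompare(got, want) == 1
    }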
charcircuit · 4h ago
CPUs do not guarantee that branchless instructions always take the same amount of time.
Arnavion · 4h ago
CPUs guarantee what they guarantee, and if they guarantee that a certain instruction takes an operand-independent time then it does.
For example a RISC-V CPU implementing the Zkt extension is required to implement a whole bunch of logical and arithmetic integer operations with operand-independent timing. This includes the two "branchless move" instructions from the Zicond extension, czero.eqz and czero.nez.
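The same idea can be spelled out in portable code; here is a rough Go sketch (mine, not from the thread) of the branchless select that those instructions implement in hardware. Whether a compiler keeps such code branchless is, of course, a separate question, which is the point made above about CPU and compiler guarantees.

    // select64 returns a if c == 1 and b if c == 0, with no branch:
    // -(c & 1) is either all zero bits or all one bits, so the result
    // is built purely from masking.
    func select64(c, a, b uint64) uint64 {
        mask := -(c & 1)
        return (a & mask) | (b &^ mask)
    }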
nu11ptr · 5h ago
That is my understanding. It lets you bypass CGo overhead, but I'd be lying if I said I fully understood it.
MangoToupe · 2h ago
Yes, to a certain extent: if you have knowledge about the code you're calling into, you can "cheat" and stay on the Go stack rather than switching to the C one. This takes pretty solid knowledge of both runtimes to manage.
pjmlp · 5h ago
Not really; having an assembler around was quite common in compiled languages before the rise of scripting languages in the 2000s.
For all my complaints about Go's design, that is certainly one that I appreciate.
cyberax · 5h ago
Remember that Go actually compiles the code to machine code directly, so it needs to have an assembler for its compiler. And if you have it, then why not make it available?
Stratoscope · 4h ago
> Go actually compiles the code to machine code directly
True.
> so it needs to have an assembler for its compiler.
No, it doesn't need an assembler for this. As you said correctly, it compiles to machine code directly.
While it was once fairly common to use assembly as an intermediate step, few if any modern compilers do that. They just compile directly to binary machine code.
Go does have a -S flag to generate assembly language so you can review the generated code more easily. But that assembly code isn't part of the compilation pipeline, it's just optional output for human review.
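For context, the usual way to get that listing during a normal build is via -gcflags (the package path here is just a placeholder):

    go build -gcflags=-S ./yourpkg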
kbolino · 2h ago
I think this is just half true.
When dealing with purely high-level code (including C without inline assembly), the compiler doesn't need a discrete assembler. This much is true, and most modern compilers will not spit out assembly unless requested.
However, there's usually still a stage of the compilation process where an assembly-like internal representation is used. Once the compiler has chosen which registers to use and where, and which instructions to use and in what order, etc., it's close to machine code but not fully there yet. At this point, jump targets will still be labels without concrete addresses, instructions will still be referenced by mnemonic, etc.
Serializing this internal representation as assembly will generally produce more readable code than disassembling the final binary. So it's not assembly exactly, but it's pretty close.
Stratoscope · 2h ago
Yes, very true! And more accurate than what I wrote. The point I meant to make was that assembly language in the text form that we would use is not part of the compilation process. But I oversimplified my description rather badly.
Interestingly, Go also has a -d=ssa/all switch that outputs not just an assembly representation of the final code, but also the results of each optimization pass.
Here is a discussion I had with ChatGPT about this: https://chatgpt.com/share/6859aea5-df1c-8012-be70-f2361060fb...
The division between 'assembler' and 'generating machine code directly' can be pretty blurry depending on the interface. I imagine a compiler would basically interleave what the assembler would normally do with the code generation. I don't think this really yields deep insight, though, you're effectively doing the same thing either way.
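A toy illustration of that interleaving (my example, nothing Go-specific): once the compiler has picked an instruction and its operands, "assembling" it is just bit-packing, so emitting text for a separate assembler versus emitting bytes directly is largely an interface choice.

    // encodeADDI packs a RISC-V ADDI instruction; imm must fit in 12
    // signed bits. A textual assembler would print "addi rd, rs1, imm"
    // instead and leave exactly this packing to a later tool.
    func encodeADDI(rd, rs1 uint32, imm int32) uint32 {
        const opImm = 0x13 // OP-IMM opcode; funct3 for ADDI is 000
        return uint32(imm)<<20 | rs1<<15 | rd<<7 | opImm
    }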