This is such a small thing, but I love the inclusion of the value ranges for the integers! I can never remember which side can go one deeper ("is it [-128 to 127] or [-127 to 128]"). Bookmarking this for reference later!
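For what it's worth, Rust's built-in constants settle it directly:

    fn main() {
        // The "one deeper" side is the negative one.
        assert_eq!(i8::MIN, -128);
        assert_eq!(i8::MAX, 127);
    }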
newpavlov · 6h ago
Tangential note: I sometimes wish that signed integers were symmetrical. i8 would represent the range of [-127 to 127], with 0xFF representing NaN. Any operation which cannot be computed (division by zero, overflow, an operation with another NaN, etc.) would result in NaN. For further symmetry we could do the same for unsigned integers as well.
Yes, it's possible to encode such types manually, but it will not be efficient since CPUs do not natively support such operations.
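For illustration, here is a minimal sketch of encoding such a type manually in Rust. It reserves 0x80 (i8::MIN) as the NaN pattern rather than 0xFF, so all other values keep their two's complement meaning while the range stays a symmetric [-127 to 127]; the NanI8 name and API are made up, and every operation pays for the checks in software, which is exactly the inefficiency mentioned above:

    /// Hypothetical NaN-able i8: one bit pattern (0x80) is reserved
    /// as NaN, leaving the symmetric value range [-127, 127].
    #[derive(Clone, Copy, PartialEq, Debug)]
    struct NanI8(i8);

    impl NanI8 {
        const NAN: NanI8 = NanI8(i8::MIN);

        fn new(v: i8) -> NanI8 {
            NanI8(v) // note: new(i8::MIN) is NaN by construction
        }

        fn is_nan(self) -> bool {
            self.0 == i8::MIN
        }

        /// NaN-propagating add: NaN in, or an unrepresentable
        /// result, gives NaN out.
        fn add(self, rhs: NanI8) -> NanI8 {
            if self.is_nan() || rhs.is_nan() {
                return NanI8::NAN;
            }
            match self.0.checked_add(rhs.0) {
                Some(v) if v != i8::MIN => NanI8(v),
                _ => NanI8::NAN, // overflow, or landed on the NaN pattern
            }
        }
    }

    fn main() {
        let a = NanI8::new(100);
        assert!(a.add(a).is_nan());          // 200 overflows -> NaN
        assert!(NanI8::NAN.add(a).is_nan()); // NaN propagates
        assert_eq!(a.add(NanI8::new(-73)), NanI8(27));
    }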
lock1 · 6h ago
Wouldn't this make CPU flags useless? I think it would complicate branch instructions too, as most modern CPUs tend to use integer operations for branching.
Also, this in-band signaling would probably invite something similar to the `null` mess in type systems. I can't wait to tell the CPU to JMP NaN.
newpavlov · 6h ago
>Wouldn't this make CPU flags useless?
They would, but I agree with RISC-V here: CPUs should not rely on them in the first place.
I do not understand your argument about branches; how would it hinder the jump instructions?
We would still need separate "wrapping" instructions (e.g. for implementing bigints and cryptographic algorithms), but they could probably be limited to unsigned operations only.
>I can't wait to tell the CPU to JMP NaN.
How is it different from jumping to null? If you do such a jump, it means you have a huge correctness problem in your code.
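As an aside, this split already exists at the software level in Rust: the checked methods signal failure out-of-band (None playing a NaN-like role, though it doesn't propagate by itself), while the wrapping methods give the modular behavior that bigint and crypto code wants:

    fn main() {
        // Checked: an unrepresentable result is reported out-of-band.
        assert_eq!(100u8.checked_add(200), None);
        assert_eq!(100u8.checked_add(55), Some(155));

        // Wrapping: plain modular arithmetic.
        assert_eq!(100u8.wrapping_add(200), 44); // 300 mod 256
    }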
lock1 · 5h ago
> I do not understand your argument about branches; how would it hinder the jump instructions?
An extra set of logic for handling the NaN cases? I don't think it's impossible, just less intuitive. A jump through an integer without NaN is always valid, while a jump through a NaN-able integer is sometimes invalid (ignoring whether the memory address can actually be accessed).
newpavlov · 5h ago
For absolute jumps you don't need extra logic, since CPUs could declare the last page always unmapped, so such jumps would always result in a page fault (similarly to the null page on most systems).
For relative non-immediate jumps the added logic is extremely simple (hardware exception on NaN) and should not (AFAIK) hinder performance of jumps in any way.
zokier · 6h ago
That sounds like a surprisingly reasonable idea for signeds. Less so for unsigneds, though. Has there been any architecture doing anything like that?
newpavlov · 5h ago
I cannot name an ISA with such instructions off the top of my head.
As for unsigned integers, as I mentioned in the other comment, we probably need two separate instruction sets for "wrapping" and NaN-able operations on unsigned integers.
throwawaymaths · 11h ago
It's always negative. 0x80...0 cannot have a two's complement (negating it overflows), and the top bit is set.
delusional · 10h ago
I find that the easiest way to remember it is that 0 is positive but has no negative counterpart.
high_priest · 5h ago
The "0 is positive" part is not true, but hopefully some day you are going to get it.
The true answer is that negative numbers have the top bit set, which therefore can't be used for positive numbers. Hence the positives are one bit short.
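The resulting asymmetry is easy to poke at directly: the most negative value is the one bit pattern whose negation doesn't fit.

    fn main() {
        // -(-128) would be +128, which i8 cannot represent.
        assert_eq!(i8::MIN.checked_neg(), None);
        assert_eq!((-127i8).checked_neg(), Some(127));
        // With wrapping semantics, 0x80 negates to itself.
        assert_eq!(i8::MIN.wrapping_neg(), i8::MIN);
    }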
jibal · 9h ago
I can't imagine suffering from that. Understanding two's complement representation is an essential programming skill. And a byte value of 128? What is that in hex?
dzaima · 8h ago
You could pretty easily have an integer representation using [-127; 128]; 128 being 0x80 of course (all other values being the same as in two's complement). Still would hold that -n == 1 + ~n, zero is all-zeroes, and the property that add/sub needn't care about signed vs unsigned. Only significant difference being that top bit doesn't determine negativeness, though of course it's still "x < 0" in code. (at the hardware level, sign extension & comparisons might also get very slightly more complicated, but that's far outside what typical programmers would need to know)
For most practical purposes outside of low-level stuff all that really matters about two's complement is Don't Get Near 2^(width-1) Or Bad™ Things Happen. Including +128 would even have the benefit of 1<<7 staying positive.
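A quick sketch of that hypothetical decoding (no real hardware does this): every bit pattern reads as in two's complement, except 0x80 means +128.

    /// Hypothetical [-127, 128] interpretation of a byte: identical
    /// to two's complement except 0x80 decodes to +128, not -128.
    fn decode(bits: u8) -> i32 {
        if bits == 0x80 { 128 } else { (bits as i8) as i32 }
    }

    fn main() {
        assert_eq!(decode(0x80), 128);  // 1 << 7 stays positive
        assert_eq!(decode(0x7F), 127);
        assert_eq!(decode(0x81), -127);
        assert_eq!(decode(0xFF), -1);   // everything else unchanged
    }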
moefh · 7h ago
> Only difference being that you need to do a bit more work to determine negativeness (work which in hardware you'd already likely have the bulk of for determining is-zero).
The work needed to calculate the overflow flag (done in every add/sub operation in most ISAs) is also way more complicated when the high bit does not represent sign.
dzaima · 6h ago
Oh, true. That's even further down the list of low-level, frequently-unused details, though; and RISC-V does without it (and without flags in general) roughly fine.
AnIrishDuck · 8h ago
> Understanding two's complement representation is an essential programming skill
The field of programming has become so broad that I would argue the opposite: the vast majority of developers will never need to think about, let alone understand, two's complement as a numerical representation.
oconnor663 · 8h ago
What is your goal for this comment?
wubrr · 6h ago
> Understanding two's complement representation is an essential programming skill.
It is completely irrelevant for the vast majority of programming.
koakuma-chan · 8h ago
I have no idea what two's complement representation is
koakuma-chan · 8h ago
It just means the most significant bit represents the sign?
craftkiller · 8h ago
It's a little bit more complicated than that. If only the most significant bit represented the sign, then you'd have both positive and negative zero (which is possible with floats), and you'd only be able to go from [-127 to 127]. Instead, it's some incantation where the MSB still indicates the sign, but to negate a number you flip all the bits and add 1. It is only relevant for signed integers, not unsigned integers.
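Concretely, the incantation in action:

    fn main() {
        let n: i8 = 5;
        // Negation in two's complement: flip all the bits, add 1.
        assert_eq!((!n).wrapping_add(1), -5);
        // The resulting bit patterns:
        assert_eq!(5i8 as u8, 0b0000_0101);
        assert_eq!(-5i8 as u8, 0b1111_1011);
    }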
pests · 8h ago
Ben Eater has a really good YT video on this.
lock1 · 7h ago
That's called "ones complement", the most significant bit represents a sign. Like the sibling post mentioned, it does have a weird quirk of having 2 representations for 0: (-0) and (+0).
While "twos complement" turns the MSB unsigned value to a negative instead of a positive. For example, 4-bit twos complement: 1000 represents -8 (in unsigned 4-bit, this supposed to be +8), 0100 represents 4, 0010 represents 2, 0001 represents 1. Some more numbers: 7 (0111), -7 (1001), 1 (0001), -1 (1111).
Intuitively, "ones complement" MSB represents a multiplication by (-1). While "twos complement" MSB adds (-N), with N = 2^(bit length - 1), in case of 4-bit twos complement it's (-2^3) or (-8). Both representation leave non-MSB bits work exactly like unsigned integer.
craftkiller · 8h ago
Eh, how often are you going down to the bit representation of signed integers? Naturally I learned two's complement ages ago, but all of my bitwise manipulation seems to be on unsigned integers (and frankly, I've only used bitwise operations at work once, for implementing Bloom filters; normally I only get to do lower-level stuff like that in side projects). So internalizing two's complement has never seemed relevant.
One part that I especially love about it is that it shows the lifetimes [1] and memory layout [2] of data structures in graphical form. They're as invaluable as API references. I would love to see this in other documentation as well.
[1] https://cheats.rs/#memory-lifetimes
[2] https://cheats.rs/#memory-layout
There aren't that many of them, actually! It almost feels like a periodic table of the elements.
smj-edison · 11h ago
I really like how it scrolls left-to-right on mobile, instead of collapsing down.
adastra22 · 8h ago
Why is PhantomData in the unsafe support group?
john-h-k · 8h ago
It obviously can be used for other things, but it was principally designed for unsafe support (allowing dropck to understand unsafe types that own a value through a pointer). See https://doc.rust-lang.org/nomicon/phantom-data.html
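A sketch of roughly the shape that chapter describes: a Vec-like type that owns its elements through a raw pointer, where the PhantomData<T> field is what tells drop check that dropping the container may drop Ts (the MyVec name is made up):

    use std::marker::PhantomData;

    /// Owns its T elements through a raw pointer. Without the
    /// PhantomData<T> field, the compiler would not know that this
    /// type logically owns values of type T (relevant to dropck).
    struct MyVec<T> {
        ptr: *const T,
        len: usize,
        cap: usize,
        _owns: PhantomData<T>,
    }

    fn main() {
        let v: MyVec<String> = MyVec {
            ptr: std::ptr::null(),
            len: 0,
            cap: 0,
            _owns: PhantomData,
        };
        let _ = (v.ptr, v.len, v.cap);
    }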
saghm · 8h ago
Interesting; I've had to use it a number of times over the years despite never really doing much unsafe. At least to me, it seems pretty well scoped as a workaround for the compiler's requirement that generic type parameters actually be used in type definitions, which certainly isn't something you need to be writing unsafe code to run into. I wouldn't be shocked if it used unsafe under the hood, but then again, so does Vec.
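The safe-code flavor of that workaround, for anyone who hasn't hit it: a type parameter used purely as a compile-time tag, with PhantomData standing in so the "parameter is never used" error (E0392) goes away. The Id/UserTag/OrderTag names here are made up:

    use std::marker::PhantomData;

    /// Same runtime representation as a bare u64, but Id<UserTag>
    /// and Id<OrderTag> are distinct types. Dropping the PhantomData
    /// field makes this definition fail to compile.
    struct Id<Tag> {
        value: u64,
        _tag: PhantomData<Tag>,
    }

    struct UserTag;
    struct OrderTag;

    fn main() {
        let user: Id<UserTag> = Id { value: 7, _tag: PhantomData };
        let order: Id<OrderTag> = Id { value: 7, _tag: PhantomData };
        // user = order; // rejected: Id<UserTag> != Id<OrderTag>
        let _ = (user.value, order.value);
    }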
afdbcreid · 6h ago
The original reason to design it (instead of the previously inferred bivariance) was so that authors of unsafe code that really does not want bivariance, and would be unsound with it, would remember to consider that.
It doesn't use unsafe under the hood; rather, it's compiler magic.
john-h-k · 6h ago
> At least to me, it seems pretty well scoped as a workaround for the compiler's requirement that generic type parameters actually be used in type definitions
The reason those requirements exist is (primarily) to do with unsafe code. Specifically, it's about deciding the variance of the type (which doesn't matter for a truly unused type parameter).
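Concretely, which marker you reach for is what decides the variance the compiler infers. A small sketch (the struct names are made up; the marker-to-variance rules are the documented ones):

    use std::marker::PhantomData;

    struct Covariant<T>(PhantomData<T>);          // covariant in T
    struct Contravariant<T>(PhantomData<fn(T)>);  // contravariant in T
    struct Invariant<T>(PhantomData<fn(T) -> T>); // invariant in T

    // Compiles only because Covariant<T> is covariant in T: a value
    // over &'static str can stand in for one over a shorter &'a str.
    fn shrink<'a>(_scope: &'a str, c: Covariant<&'static str>) -> Covariant<&'a str> {
        c
    }

    fn main() {
        let local = String::from("scope");
        let _ = shrink(&local, Covariant(PhantomData));
    }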