If you want your library to operate on bytes, then rather than taking in an io.Reader and trying to figure out the most efficient way to get bytes out of it, why not just have the library take in a []byte?
If someone has a complex reader and needs to extract it into a temporary buffer, they can do that. But if, as in the author's case, you already have a []byte, just pass that in rather than wrapping it.
I think the issue here is that the author is adding more complexity to the interface than needed.
If you need a []byte, take in a []byte. Your callers should be able to figure out how to get you that when they need to.
With Go, the answer is usually "just do the simple thing and you will have a good time".
Isn't using the stdlib simpler than not for your callers?
I also often hear gophers say to take inspiration from the Go stdlib. The 'net/http' package's 'http.Request.Body' has this same UX. Should there be `Body` and `BodyBytes`, for the cases where your HTTP request wants to refer to a reader vs. bytes you already have?
jchw · 1h ago
The BodyBytes hypothetical isn't particularly convincing because you usually don't actually have the bytes before reading them, they're queued up on a socket.
In most cases I'd argue it really is idiomatic Go to offer a []byte API if that can be done more efficiently. The Go stdlib does sometimes offer both a []byte and Reader API for input to encoding/json, for example. Internally, I don't think it actually streams incrementally.
That said, I do see why this doesn't actually apply here. IMO the big problem is that you can't just pull out the Bytes() method with an upcast, because the wrapper is in the way. If Go had some way to make wrapper types transparent, this possibly wouldn't be an issue. Maybe it should have one.
TheDong · 1h ago
> The BodyBytes hypothetical isn't particularly convincing because you usually don't actually have the bytes before reading them, they're queued up on a socket.
Ah, sorry, we were talking about two different 'http.Request.Body's. For some weird reason both the `http.Client.Do`'s request and `http.Server`'s request are the same type.
You're right that you usually don't have the bytes on the server, but on the client a huge fraction of requests look like `http.NewRequestWithContext(context.TODO(), "POST", "https://api.foo.com", bytes.NewReader(jsonBytesForAPI))`. You clearly have the bytes in that case.
Anyway, another example of the wisdom of the stdlib, you can save on structs by re-using one struct, and then having a bunch of comments like "For server requests, this field means X, for client requests, this is ignored or means Y".
tptacek · 1h ago
It is, but one of the virtues of the Go ecosystem is that it's also often very easy to fork the standard library; people do it with the whole TLS stack all the time.
The tension Ted is raising at the end of the article --- either this is an illustration of how useful casting is, or a showcase of design slipups in the standard library --- well, :why-not-both:. Go is very careful about both the stability of its standard library and the coherency of its interfaces (no popen, popen2, subprocess). Something has to be traded off to get that; this is one of the things. OK!
vjerancrnjak · 57m ago
That's how the leaky abstraction in many standard-library file implementations starts.
You read into a byte buffer, pass in a buffer to read values, pass in a buffer to write values. Then the OS does the same thing: it has its own buffer that accepts your buffer, and the underlying storage volume has its own buffer too.
Buffers all the way down to inefficiency.
int_19h · 1h ago
Because it forces the reader to read data into a temporary buffer in its entirety. If the thing this function is trying to do doesn't actually require it to do its job, that introduces unnecessary overhead.
silverwind · 22m ago
A good API should just accept either, e.g. the union of []byte and io.Reader.
Both have pros and cons and those should be for the user to decide.
thayne · 4m ago
Ah, but go doesn't have union types.
woah · 37m ago
Seems pretty crazy to force a bunch of data to be saved into memory all the time just for programming language aesthetic reasons
dgb23 · 1h ago
Personally I rarely use or even implement interfaces unless some other part needs them. My brain thinks in terms of plain data by default.
I appreciate how they compose, for example when I call io.Copy and how things are handled for me. But when I structure my code that way, it’s extra effort that doesn’t come naturally at all.
jchw · 3h ago
The biggest issue here IMO is the interaction between two things:
- "Upcasting" either to a concrete type or to an interface that implements a specific additional function; e.g. in this case Bytes() would probably be useful
- Wrapper types, like bufio.Reader, that wrap an underlying type.
In isolation, either practice works great and I think they're nice ideas. However, over and over, they're proving to work together poorly. A wrapper type can't easily forward the type it is wrapping for the sake of accessing upcasts, and even if it did, depending on the type of wrapper it might be bad to expose the underlying type, so it has to be done carefully.
So instead this winds up needing to be handled basically for each type hierarchy that needs it, leading to awkward constructions like the Unwrap function for error types (which is very effective but weirder than it sounds, especially because there are two Unwraps) and the ResponseController for ResponseWriter wrappers.
Seems like the language or standard library needs a way to express this situation so that a wrapper can choose to be opaque or transparent and there can be an idiomatic way of exposing this.
movpasd · 2h ago
I'm not sure I fully understand the issue as I don't know Go, but is this something that a language-level delegation feature could help with?
hello_computer · 1h ago
It is the same struggle you can find in any language with private/public props. The stream he wants to read from is actually just a buffer that has been wrapped as a stream, and he’s having a hard time directly accessing the buffer through its wrapper. He could stream it into a new temporary buffer, but he’s trying to avoid that since it’s wasteful. I’ve had the same problem in C++.
> Now, why doesn’t bytes.Reader implement Peek? It’s just a byte slice, it’s definitely possible to peek ahead without altering stream state. But it was overlooked, and instead this workaround is applied.
When I first looked at Go, it seemed to have far too many layers of abstraction on top of one another. Which is so ironic, considering that's one of the main things it was trying to fix about Java. It ended up becoming the thing it fought against.
treyd · 2h ago
I would agree with you, but not so much here specifically. It's much more true of how goroutines and channels work: they're too unstructured and don't compose well, which forces you to build ad-hoc abstractions around them.
millipede · 2h ago
Type inspection is the flaw of Go's interface system. Try to make a type that delegates to another object, and the type inspection breaks. It's especially noticeable with the net/http types, which would be great to intercept, but then breaks things like Flusher or Hijacker.
msteffen · 2h ago
> The bytes.Reader should really implement Peek. I’m pretty sure the reason it doesn’t is because this is the only way of creating read only views of slices. And a naughty user could peek at the bytes and then modify them. Sigh. People hate const poisoning, but I hate this more.
When I was at Google, a team adjacent to ours was onboarding a new client with performance demands that they could not realistically meet with anything resembling their current hardware footprint. Their service was a stateless Java service, so they elected to rewrite it in C++. Now, Java has some overhead because of garbage collection and the JVM, and they hoped that this might move the needle, but what happened was they went from 300 qps/core to 1200, with lower tail latency. Literally a 4x improvement.
Why? Probably a lot of reasons, but the general consensus was: Java has no const, so many of Google’s internal libraries make defensive copies in many places to guarantee immutability (which is valuable in a highly concurrent service, which everything there is). This generates a huge amount of garbage that, in theory, is short-lived, rarely escapes its GC generation, and can all be cleaned up after the request is finished. But their experience was that it’s just much faster to not copy and delete things all over the place, which you can often avoid by using const effectively. I came to believe that this was Java’s biggest performance bottleneck, and when I saw that Go had GC with no const, I figured it would have the exact same problem.
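Go code hits the same pattern: with no const or read-only slice type, a type that must not observe later mutations by its caller has to copy. A sketch with a hypothetical `Config` type, using slices.Clone (Go 1.21+) as the defensive copy on the way in and again on the way out; these are exactly the "just in case" allocations described above.

```go
package main

import (
	"fmt"
	"slices"
)

// Config is a hypothetical type that must stay immutable once built.
type Config struct {
	hosts []string
}

// NewConfig copies its input so later changes by the caller are not
// visible inside Config.
func NewConfig(hosts []string) *Config {
	return &Config{hosts: slices.Clone(hosts)}
}

// Hosts returns a copy so callers cannot mutate internal state.
func (c *Config) Hosts() []string {
	return slices.Clone(c.hosts)
}

func main() {
	in := []string{"a", "b"}
	cfg := NewConfig(in)
	in[0] = "mutated" // caller mutates its own slice
	out := cfg.Hosts()
	out[1] = "also mutated" // caller mutates the returned copy
	fmt.Println(cfg.Hosts())
}
```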
hinkley · 2h ago
Java has a little const. Strings are immutable. You can make objects with no mutations, so you can make read only collections fairly easily which is usually where const becomes a problem.
But then you have for instance Elixir, where all functions are pure, so mutating inputs to outputs takes a ton of copying, and any data structure that is not a DAG is a gigantic pain in the ass to modify. I lost count of how many tries it took me to implement Norvig’s sudoku solver. I kept having to go back and redesign my data structures every time I added more of the logic.
[edit to add]: DTOs exist in Java because some jackass used the term “Value Object” to include mutation despite large swaths of the CS world considering VOs to be intrinsically const. So then they had to make up a new term that meant Value Object without using the concept they already broke.
hnlmorg · 2h ago
Could you explain the problem a little more? “const” does exist as a keyword in Go, so I assume it’s doing something different from what you’re referring to with regard to C++ constants. Is Go not substituting constants like a macro? Or are we discussing something entirely different and I’m misunderstanding the context here?
jerf · 1h ago
Java, Go, and C++ all have enough differences here that at this level of detail you shouldn't assume any other conversion will have exactly the same result that msteffen lays out. Java generally has more sophisticated compilation and a lot more engineering effort poured into it, but Go often ends up with less indirection in the first place and started life with what Java calls records [1] so they are more pervasive throughout the ecosystem. Which effect "wins" can be difficult to guess in advance without either a deep analysis of the code, or just trying it.
What msteffen talks about is a general principle that you can expect even small differences between languages to sometimes have significant impact on code.
I think this is also one of the reasons Rust libraries tend to come out so fast. They're very good at not copying things, but doing it safely without having to make "just in case" copies. It's hard to ever see a benchmark in which this particular effect makes Rust come out faster than any other language, because in the natural process of optimizing any particular benchmark for a non-Rust language, the benchmark will naturally not involve taking random copies "just in case", but this can have a significant impact on all that real code out in the real world not written for benchmarking. Casually written Rust can safely not make lots of copies, casually written code in almost anything else will probably have a lot more copying than the programmer realizes.
They mean const in the sense of readonly guarantees.
In Java, types are generally shared and mutable. Say you want a list input: you generally don't store it as-is, because the caller could modify it at any point, so if you accept a `List` you defensively copy it into an inner type for safety, which has a cost (even more so if you also need to defensively copy the list contents).
The same applies on output, since otherwise the caller could downcast and modify it (in that specific case you could wrap it in an unmodifiableList, but not all types have an unmodifiable view available).
kentm · 2h ago
I’m assuming they mean const function/method parameters. In C++, being able to mark function inputs as const guarantees that they aren’t mutated, which often means you can just pass the value by reference safely.
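For contrast with C++, Go's const only declares compile-time constants (boolean, numeric, or string values), so it offers nothing like const parameters or read-only views. A small sketch; `maxHeader`, `magic`, and `touch` are illustrative names, and the commented-out declaration is one that does not compile.

```go
package main

import "fmt"

// Go constants are compile-time values of boolean, numeric, or string type.
const (
	maxHeader = 512
	magic     = "GIF8"
)

// The following would not compile, because a slice is not a constant:
// const header = []byte("GIF8")

// touch shows what const parameters would prevent: nothing in the
// signature says b is read-only, and this function writes through it.
func touch(b []byte) {
	if len(b) > 0 {
		b[0] = 'X'
	}
}

func main() {
	data := []byte(magic)
	touch(data)
	fmt.Println(string(data), maxHeader) // XIF8 512
}
```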
bobbylarrybobby · 43m ago
I am not a gopher, so this may be a dumb question: when an io.Reader produces a buffer of its contents, does it not have the option of just returning the buffer it wraps if it does in fact wrap a buffer? Something like (pseudocode) `if self isa BufferedReader { self.takeBuffer() } else { let buffer = newBuffer(); self.fill(buffer); buffer }`.
hkpack · 2h ago
It seems the Go standard library is OK with you paying the performance price when using io.Reader/io.Writer on in-memory structures.
You can write clean idiomatic code, but it won’t be the fastest. So for maximum results you should always do everything manually for your use case: i.e. don’t use additional readers/writers and operate on []byte directly if that is what you are working with.
I think it is mostly a good thing - you can quickly write simple but slower code and refactor everything later when needed.
liampulles · 2h ago
The reason interface smuggling exists as a pattern in the Go standard library and others is because the Go team (and those who agree with its philosophy) take breaking API changes really seriously.
It is no small feat that Go is still on major version 1.
treyd · 2h ago
Wouldn't you say that it's a design oversight that the interface system leads to tight constraints on what you can do without breaking APIs?
liampulles · 49m ago
I can't call it a design oversight no, because I'm not sure what reasonable alternatives were considered before Go v1 was released. I also don't have context of all the decision factors that went into Go's spec. To be honest, I'm not anywhere near an expert on programming language design - I'm probably the wrong person to ask.
I am thankful that they haven't broken the spec to change that design, but maybe others don't care about that as much as I do.
38 · 1h ago
> What I would like is for my image decoding function to notice that the io.Reader it has been given is in fact a bytes.Reader so we can skip the copy.
What a terrible idea. If you want a bytes.Reader, then use that in the function signature, or better yet just a byte slice. It should have been a red flag when your solution involved the unsafe package.
[1]: https://blogs.oracle.com/javamagazine/post/records-come-to-j...
Thanks for the explanation