If you've ever done this to a C library, the first thing that you'll look at when someone else does it is not the FILE type, but how stdin, stdout, and stderr have changed.
The big breaking change is usually the historical implementation of the standard streams as addresses of elements of an array rather than as named pointers. (Plauger's example implementation had them as elements 0, 1, and 2 of a _Files[] array, for example.) It's possible to retain binary compatibility with unrecompiled code that uses the old getc/putc/feof/ferror/clearerr/&c. macros by preserving structure layouts, but changing stdin, stdout, and stderr can make things not link.
And indeed that has happened here.
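To make the linking point concrete, here is a minimal sketch of the two styles. Everything in it (`struct my_file`, `my_files`, `OLD_STDIN`, `my_stdin`, `stream_fd`) is a made-up stand-in, not any real libc's layout or symbol names.

```c
/* Hypothetical stand-in for a libc's FILE; not any real layout. */
struct my_file { int fd; int flags; };

/* Old style: the streams are elements of an exported array, and the
 * stream macros expand to element addresses. Compiled callers end up
 * referencing the symbol my_files plus an offset that depends on
 * sizeof(struct my_file). */
struct my_file my_files[3] = { {0, 0}, {1, 0}, {2, 0} };
#define OLD_STDIN  (&my_files[0])
#define OLD_STDOUT (&my_files[1])
#define OLD_STDERR (&my_files[2])

/* New style: each stream is its own named pointer object, so callers
 * reference my_stdout etc. and never need to know the struct's size. */
struct my_file *my_stdin  = &my_files[0];
struct my_file *my_stdout = &my_files[1];
struct my_file *my_stderr = &my_files[2];

/* Returns the descriptor behind a stream, however it was named. */
int stream_fd(const struct my_file *f) { return f->fd; }
```

Old binaries built against the first style want the array symbol to exist; if the library stops exporting it and only provides the pointers, those binaries no longer link, even when the struct layout itself is unchanged.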
loeg · 3h ago
I think FreeBSD tried to make FILE opaque[1], but it was reverted[2], and FILE is still non-opaque in main[3].
OpenBSD tends to commit to breaking changes much more aggressively than others. Something tells me they're not reverting.
loeg · 33m ago
I think FreeBSD is also more concerned with performance regression than OpenBSD is.
abnercoimbre · 4h ago
Can someone elaborate? I always treated FILE as opaque, but never imagined people could poke into it?
pm215 · 3h ago
The MH and nmh mail clients used to directly look into FILE internals. If you look for LINUX_STDIO in this old version of the relevant file you can see the kind of ugliness that resulted:
It's basically searching an email file to find the contents of either a given header or the mail body. These days there is no need to go under the hood of libc for this (and this code got ripped out over a decade ago), but back when the mail client was running on elderly VAXen this ate up significant time. Sneaking in and reading directly from the internal stdio buffer lets you avoid copying all the data the way an fread would. The same function also used to have a bit of inline vax assembly for string searching...
The only reason this "works" is that traditionally the FILE struct is declared in a public header so libc can have some of its own functions implemented as macros for speed, and that there was not (when this hack was originally put in in the 1980s) yet much divergence in libc implementations.
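Roughly what that kind of buffer-peeking looks like, as a sketch only: `struct my_file`, `peek_line_len`, and `consume` are hypothetical, loosely in the spirit of classic stdio, and are not the nmh code or any real FILE layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffered-stream layout (an assumption for illustration). */
struct my_file {
    const unsigned char *ptr; /* next unread byte in the buffer */
    size_t cnt;               /* unread bytes remaining in the buffer */
};

/* Find the end of the current line by scanning the stream's own buffer,
 * with no copy into a caller-supplied buffer. Returns the line length
 * including the '\n', or 0 if no newline is buffered yet. */
size_t peek_line_len(const struct my_file *f)
{
    const unsigned char *nl = memchr(f->ptr, '\n', f->cnt);
    return nl ? (size_t)(nl - f->ptr) + 1 : 0;
}

/* Mark n bytes as consumed after using them in place. */
void consume(struct my_file *f, size_t n)
{
    if (n > f->cnt)
        n = f->cnt;
    f->ptr += n;
    f->cnt -= n;
}
```

The saving is the copy that fread or fgets would make into the caller's buffer; the cost is that the code breaks as soon as the real struct's fields move or disappear.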
fweimer · 3h ago
In gnulib, there is code that patches FILE internals for various platforms to modify behavior of <stdio.h> functions, or implement new functionality.
Yes, it's not a good idea to do this. There are more questionable pieces in gnulib, like closing stdin/stdout/stderr (because fflush and fsync are deemed too slow, and regular close reports some errors on NFS on some systems that would otherwise go unreported).
collinfunk · 3h ago
Yes, that part of Gnulib has caused some problems previously. It is mostly used to implement <stdio_ext.h> functions on non-glibc systems. However, it is also needed for some buggy implementations of ftello, fseeko, and fflush.
P.S. Hi Florian :)
quotemstr · 1h ago
> Yes, it's not a good idea to do this. There are more questionable pieces in gnulib, like closing stdin/stdout/stderr (because fflush and fsync are deemed too slow, and regular close reports some errors on NFS on some systems that would otherwise go unreported).
Hyrum's law strikes again. People cast dl_info and poke at internal bits all the time too.
glibc and others should be using kernel-style compiler-driven struct layout randomization to fight it.
jancsika · 34m ago
> Hyrum's law strikes again.
Is there a name for APIs that are drawn directly from some subset of observed behaviors?
Like Crockford going, "Hey, there's a nice little data format buried in these JS objects. Schloink"
recipe19 · 4h ago
The standard doesn't specify any serviceable parts, and I don't think there are any internals of the struct defined in musl libc on Linux (glibc may be a different story). However, on OpenBSD, it did seem to have some user-visible bits:
If you expose it, someone will probably sooner or later use it, but probably not in any sane / portable code. On the face of it, it doesn't seem like a consequential change, but maybe they're mopping up after some vulnerability in that one weird package that did touch this.
loeg · 3h ago
Historically some FILE designs exposed the structure somewhere so that some of the f* methods could be implemented as macros or inline functions (e.g., `fileno()`).
asveikau · 43m ago
I've seen old code do this over the years. Consider, for example, that snprintf() wasn't standardized until the late 1990s; people would mock up a fake FILE* and use fprintf.
pjmlp · 4h ago
People use reflection for monkey patching and complain when using compiled languages less supportive of such approaches.
So it wouldn't surprise me that a few folks would do some tricks with FILE internals.
bitwize · 3h ago
Hyrum's Law applies: the API of any software component is the entire exposed surface, not just what you've documented. Hence, if you have FILE well-defined somewhere in a programmer-accessible header, somebody somewhere can and will poke at the internal bits in order to achieve some hack or optimization.
krylon · 3h ago
On the one hand, yes.
OTOH, when coding, I consider FILE to be effectively opaque in the sense that it probably is not portable, and that the implementers might change it at any time.
Yes, it would not be sane to depend on implementation details of something like this.
But the sad reality is that many developers (myself included earlier in my career) will do insane things to fix a critical bug or performance problem when faced with a tight deadline.
ksherlock · 3h ago
*BSD stdio.h used to include macro versions of some stdio functions (feof, ferror, clearerr, fileno, getc, putc) so they would be inlined.
/*
 * This has been tuned to generate reasonable code on the vax using pcc.
 */
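For anyone who hasn't seen one, a macro getc in this style looks roughly like the sketch below. `struct my_file`, `my_fill`, `MY_GETC`, and `my_fgetc` are invented names for illustration, not the BSD definitions; the point is only that the macro cannot be written unless the struct's fields are visible in the header.

```c
#include <stddef.h>

/* Hypothetical stream layout; the macro below needs to see these fields. */
struct my_file {
    const unsigned char *ptr; /* next unread byte */
    size_t cnt;               /* unread bytes left in the buffer */
};

/* Slow path, always a real function. A real libc would refill the
 * buffer here; this sketch just reports end of input. */
int my_fill(struct my_file *f) { (void)f; return -1; }

/* Macro form: the common case is inlined at every call site. Note that
 * it evaluates its argument more than once, like the classic getc. */
#define MY_GETC(f) \
    ((f)->cnt > 0 ? ((f)->cnt--, (int)*(f)->ptr++) : my_fill(f))

/* Function form: same behaviour, one copy of the code, one call each time. */
int my_fgetc(struct my_file *f) { return MY_GETC(f); }
```

Once callers have been compiled with the macro, the field offsets are baked into their binaries, which is exactly why the layout becomes ABI.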
zahlman · 2h ago
I always assumed that people could poke into it, but shuddered at the thought.
cperciva · 2h ago
In addition to "some code frobs internals", non-opaque FILE also allows for compatibility with code which puts FILE into a structure, since an opaque FILE doesn't have a size.
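A small sketch of the size issue, using made-up types (`visible_file`, `opaque_file`, `by_value`, `by_pointer`, `embedded_size`) rather than the real FILE:

```c
#include <stddef.h>

/* Non-opaque (hypothetical layout): the type is complete, so it has a size. */
struct visible_file { int fd; int flags; };

/* Opaque: only declared here; the definition would live inside the library. */
struct opaque_file;

/* Embedding by value works only for the complete type. */
struct by_value { struct visible_file f; int extra; };

/* With the opaque type, callers can hold only a pointer.
 * `struct bad { struct opaque_file f; };` would fail to compile
 * because the member has incomplete type. */
struct by_pointer { struct opaque_file *f; int extra; };

/* Callers can take sizeof only when the type is complete. */
size_t embedded_size(void) { return sizeof(struct by_value); }
```

So code that declares a FILE member (rather than a FILE * member) stops compiling the moment FILE becomes an incomplete type.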
p0w3n3d · 4h ago
However, who should really rely on internals of FILE? Isn't this a bad practice?
vitaut · 4h ago
In general, it is a bad practice. However, it can be useful for some low-level libraries. For example, https://github.com/fmtlib/fmt provides a type-safe replacement for `printf` that can write directly to the FILE buffer, providing performance comparable to or better than native stdio.
Retr0id · 3h ago
Doesn't fwrite more or less write directly to the FILE buffer, if buffering is enabled?
I'm curious to take a closer look at fmtlib/fmt. Which APIs treat FILE as non-opaque?
In SunOS 4.x `FILE` was not opaque, and `int fileno(FILE *)` was a macro, not a function, and the field of the struct that held the fd number was a `char`. Yeah, that sucked for ages, especially since it bled into the Solaris 2.x 32-bit ABI.
It was a then-important optimization to do the most common operations with macros since calling a function for every getc()/putc() would have slowed I/O down too much.
That's why there is also fgetc()/fputc() -- they're the same as getc()/putc() but they're always defined as functions so calling them generated less code at the callsite at the expense of always requiring a function call. A classic speed-vs-space tradeoff.
But, yeah, it was a mistake that it originally used a "char" to store the file descriptor. Back then it was typical to limit processes to 20 open files ( https://github.com/dspinellis/unix-history-repo/blob/Researc... ) so a "char" I'm sure felt like plenty.
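A sketch of why the narrow field hurts. The layout is hypothetical (`struct old_file`, `OLD_FILENO`, `old_set_fd` are invented), and it uses `unsigned char` with 8-bit bytes as an assumption so the wraparound is well-defined; it is not the actual SunOS definition.

```c
/* Hypothetical layout: the descriptor lives in a single byte.
 * unsigned char is an assumption made here so overflow is well-defined. */
struct old_file { unsigned char fd; };

/* Macro fileno in the old style: reads the field directly, so the
 * field's width is baked into every compiled caller. */
#define OLD_FILENO(f) ((int)(f)->fd)

/* Store a descriptor; values above 255 silently wrap (8-bit char assumed). */
void old_set_fd(struct old_file *f, int fd) { f->fd = (unsigned char)fd; }
```

With a 20-file limit, descriptor 19 round-trips fine; descriptor 300 comes back as 44. Widening the field would change the struct layout that the macro exposed to callers, which is how the limit got stuck in the ABI.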
notepad0x90 · 2h ago
I don't know if I agree, but this is one shining example of what makes the *BSDs great: not being afraid of change. Linux should take note. So many of Windows' headaches stem from not wanting to break things, and needing to support old client code.
justincormack · 2h ago
There isn't really much of "Linux" here - this code is in libc, so glibc, but that was built for portability; it isn't very Linux-specific. Linux doesn't have an all-encompassing community for userspace.
somat · 3h ago
To misquote the Street Fighter movie, OpenBSD to Linux:
"For you the day you changed your ABI was the most important day in your life, but for me? It was Tuesday"
I enjoy the dichotomy between how bad the Linux project is at changing their ABI and how good OpenBSD is at the same task.
For the most part, Linux just decides to live with the bad ABI forever, and if they do decide it actually needs to be changed, it is a multi-year drama with much crying and many missteps.
I mean, sure, Linux has additional considerations that make breaking the ABI very scary for them. The big one is the corpus of closed-source software, but being an orders-of-magnitude bigger project with looser overall integration does not help any.
viraptor · 3h ago
This has nothing to do with Linux-the-project. An equivalent change would be in glibc / musl / ...
ioasuncvinvaer · 3h ago
I think the difference is just the number of people using the technology.
quotemstr · 1h ago
CHERI would defend against access to internal data structures without having to bounce between address spaces, FWIW.
[1]: https://github.com/freebsd/freebsd-src/commit/c17bf9a9a5a3b5...
[2]: https://github.com/freebsd/freebsd-src/commit/19e03ca8038019...
[3]: https://github.com/freebsd/freebsd-src/blob/main/include/std...
https://cgit.git.savannah.gnu.org/cgit/nmh.git/tree/sbr/m_ge...
https://cgit.git.savannah.gnu.org/cgit/gnulib.git/tree/lib/s...
https://github.com/openbsd/src/commit/b7f6c2eb760a2da367dd51...
I am reminded of this fine article by Raymond Chen, which covers a similar situation on Windows way back when: https://devblogs.microsoft.com/oldnewthing/20031015-00/?p=42...
Edit: ah, found some of the magic, I think: https://github.com/fmtlib/fmt/blob/35dcc58263d6b55419a5932bd...
I'm curious how much speedup is gained from this.