ZFS: Apple's New Filesystem that wasn't (2016)

66 points by jitl | 36 comments | 4/27/2025, 9:25:36 AM | ahl.dtrace.org

Comments (36)

ahl · 28m ago
Back in 2016, Ars Technica picked up this piece from my blog [1] as well as a longer piece reviewing the newly announced APFS [2] [3]. Glad it's still finding an audience!

[1]: https://arstechnica.com/gadgets/2016/06/zfs-the-other-new-ap...

[2]: https://ahl.dtrace.org/2016/06/19/apfs-part1/

[3]: https://arstechnica.com/gadgets/2016/06/a-zfs-developers-ana...

rrdharan · 58m ago
Kind of odd that the blog states that "The architect for ZFS at Apple had left" and links to the LinkedIn profile of someone who doesn't have any Apple work experience listed on their resume. I assume the author linked to the wrong profile?
nikhizzle · 50m ago
Ex-Apple File System engineer here who shared an office with the other ZFS lead at the time. Can confirm they link to the wrong profile for Don Brady.

This is the correct person: https://github.com/don-brady

Also can confirm Don is one of the kindest, nicest principal engineer level people I’ve worked with in my career. Always had time to mentor and assist.

ahl · 42m ago
Not sure how I fat-fingered Don's LinkedIn, but I'm updating that 9-year-old typo. Agreed that Don is a delight. In the years after this article I got to collaborate more with him, but left Delphix before he joined to work on ZFS.
jFriedensreich · 38m ago
The death of ZFS on macOS was a huge shift in the industry. It has to be seen in combination with Microsoft killing its hugely ambitious WinFS; together, they felt like the death of desktop innovation.
thyristan · 10m ago
Both are imho linked to "offline desktop use cases are not important anymore". Both companies saw their future gains elsewhere, in internet-related functions and what became known as "the cloud". No need for a fancy, featureful, and expensive filesystem when it is only used as a cache for remote cloud stuff.
volemo · 2h ago
It was just yesterday I relistened to the contemporary Hypercritical episode on the topic: https://hypercritical.fireside.fm/56
jitl · 4h ago
Besides the licensing issue, I wonder if optimizing ZFS for low latency + low RAM + low power on iPhone would have been an uphill battle or easy. My experience running ZFS years ago was poor latency and large RAM use with my NAS, but that hardware and drive configuration was optimized for low $ per GB stored and used parity stuff.
twoodfin · 1h ago
This seems like an early application of the Tim Cook doctrine: Why would Apple want to surrender control of this key bit of technology for their platforms?

The rollout of APFS a decade later validated this concern. There’s just no way that flawless transition happens so rapidly without a filesystem fit to order for Apple’s needs from Day 0.

TheNewsIsHere · 1h ago
(Edit: My comment is simply about the logistics and work involved in a very well executed filesystem migration. Not about whether ZFS is good for embedded or memory constrained devices.)

What you describe hits my ear as more NIH syndrome than technical reality.

Apple’s transition to APFS was managed like you’d manage any mass-scale filesystem migration. I can’t imagine they’d have done anything differently if they’d adopted ZFS.

Which isn’t to say they wouldn’t have modified ZFS.

But with proper driver support and testing it wouldn’t have made much difference whether they wrote their own file system or adopted an existing one. They have done a fantastic job of compartmentalizing and rationalizing their OS and user data partitions and structures. It’s not like every iPhone model has a production run that has different filesystem needs that they’d have to sort out.

There was an interesting talk given at WWDC a few years ago on this. The rollout of APFS came after they’d already tested the filesystem conversion on randomized groups of devices, and then eventually on every single device that upgraded to one of the point releases prior to iOS 10.3. The way they did this was basically to run the conversion in memory as a logic test against real data. At the end they’d have the superblock for the new APFS volume, and on a successful exit they simply discarded it instead of writing it to persistent storage. If it errored, it would send a trace back to Apple (a rough sketch of that dry-run pattern is below).

Huge amounts of testing, plus consistency in OS and user data partitioning and directory structures, are a big part of why that migration worked so flawlessly.
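
Roughly, that dry-run approach looks like the following Python sketch. It only illustrates the pattern described above, not Apple's actual converter; the function names and the "metadata" it builds are hypothetical stand-ins.

    # Dry-run conversion sketch: build the would-be new filesystem metadata in
    # memory from a read-only walk, discard it on success, report on failure.
    # All names here are hypothetical; this is not Apple's conversion code.
    import hashlib
    import os

    def build_candidate_superblock(mount_point: str) -> bytes:
        """Stand-in for the real conversion logic: derive a summary of the
        existing volume (think: would-be superblock) without modifying it."""
        digest = hashlib.sha256()
        for root, _dirs, files in os.walk(mount_point):
            for name in sorted(files):
                path = os.path.join(root, name)
                st = os.lstat(path)  # read-only access only
                digest.update(path.encode())
                digest.update(st.st_size.to_bytes(8, "little"))
        return digest.digest()

    def dry_run_convert(mount_point: str) -> bool:
        try:
            candidate = build_candidate_superblock(mount_point)
        except OSError as err:
            print(f"dry run failed; a real system would report a trace: {err}")
            return False
        del candidate  # success: the result is never written to persistent storage
        return True

    if __name__ == "__main__":
        print("conversion would succeed:", dry_run_convert("/tmp"))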

jeroenhd · 1h ago
I don't see why ZFS wouldn't have gone over equally flawlessly. None of the features that make ZFS special were in HFS(+), so conversion wouldn't be too hard. The only challenge would be maintaining the legacy compression algorithms, but ZFS is configurable enough that Apple could've added their custom compression to it quite easily.

There are probably good reasons for Apple to reinvent ZFS as APFS a decade later, but none of them technical.

I also wouldn't call the rollout of APFS flawless, per se. It's still a terrible fit for (external) hard drives and their own products don't auto convert to APFS in some cases. There was also plenty of breakage when case-sensitivity flipped on people and software, but as far as I can tell Apple just never bothered to address that.

jonhohle · 21m ago
HFS compression, AFAICT, is all done in user space with metadata and extended attributes.
hs86 · 1h ago
While its deduplication feature clearly demands more memory, my understanding is that the ZFS ARC is treated by the kernel as a driver with a massive, persistent memory allocation that cannot be swapped out ("wired" pages). Unlike the regular file system cache, ARC's eviction is not directly managed by the kernel. Instead, ZFS itself is responsible for deciding when and how to shrink the ARC.

This can lead to problems under sudden memory pressure. Because the ARC does not immediately release memory when the system needs it, userland pages might get swapped out instead. This behavior is more noticeable on personal computers, where memory usage patterns are highly dynamic (applications are constantly being started, used, and closed). On servers, where workloads are more static and predictable, the impact is usually less severe.
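
As a concrete illustration of watching the ARC from userland, here is a small Python sketch. It assumes the Linux OpenZFS kstat file /proc/spl/kstat/zfs/arcstats (the layout differs on Solaris/illumos, and the file doesn't exist on macOS) and simply compares the current ARC size against its target and cap:

    # Read ARC size / target / cap from the Linux OpenZFS kstats.
    # Assumes /proc/spl/kstat/zfs/arcstats exists (Linux with OpenZFS loaded).
    ARCSTATS = "/proc/spl/kstat/zfs/arcstats"

    def read_arcstats(path: str = ARCSTATS) -> dict:
        stats = {}
        with open(path) as f:
            for line in f.readlines()[2:]:  # skip the two kstat header lines
                name, _kind, value = line.split()
                stats[name] = int(value)
        return stats

    if __name__ == "__main__":
        s = read_arcstats()
        gib = 1024 ** 3
        print(f"ARC size:   {s['size'] / gib:.2f} GiB")
        print(f"ARC target: {s['c'] / gib:.2f} GiB")
        print(f"ARC max:    {s['c_max'] / gib:.2f} GiB")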

I do wonder if this is also the case on Solaris or illumos, where there is no intermediate SPL between ZFS and the kernel. If so, I don't think that a hypothetical native integration of ZFS on macOS (or even Linux) would adopt the ARC in its current form.

fweimer · 1h ago
If I recall correctly, ZFS error recovery was still “restore from backup” at the time, and iCloud acceptance was more limited. (ZFS basically gave up if an error was encountered after the checksum showed that the data was read correctly from storage media.) That's fine for deployments where the individual system does not matter (or you have dedicated staff to recover systems if necessary), but phones aren't like that. At least not from the user perspective.
zoky · 2h ago
If it were an issue it would hardly be an insurmountable one. I just can't imagine a scenario where Apple engineers go “Yep, we've eked out all of the performance we possibly can from this phone, the only thing left to do is change out the filesystem.”
klodolph · 2h ago
Does it matter if it’s insurmountable? At some point, the benefits of a new FS outweigh the drawbacks. This happens earlier than you might think, because of weird factors like “this lets us retain top filesystem experts on staff”.
karlgkk · 1h ago
It’s worth remembering that the filesystem they were looking to replace was HFS+. It was introduced in the 90s as a modernization of HFS, itself introduced in the 80s.

Now, old does not necessarily mean bad, but in this case….

smittywerben · 42m ago
Thanks for sharing; I was just looking for what happened to Sun. I like the second-hand quote comparing IBM and HP to "garbage trucks colliding", plus the inclusion of blog posts with links to the court filings.

Is it fair to say ZFS made the most sense on Solaris, using Solaris Containers on SPARC?

ahl · 23m ago
ZFS was developed in Solaris, and at the time we were mostly selling SPARC systems. That changed rapidly, and the biggest commercial push was in the form of the ZFS Storage Appliance that our team (known as Fishworks) built at Sun. Those systems were based on the AMD servers that Sun was making at the time, such as Thumper [1]. Also in 2016, Ubuntu leaned into the use of ZFS for containers [2]. There was nothing that specific to Solaris about ZFS, and even less of a connection to the SPARC architecture.

[1]: https://www.theregister.com/2005/11/16/sun_thumper/

[2]: https://ubuntu.com/blog/zfs-is-the-fs-for-containers-in-ubun...

thyristan · 1m ago
We had those things at work as fileservers, so no containers or anything fancy.

Sun salespeople tried to sell us on the idea of "ZFS filesystems are very cheap, you can create many of them, you don't need quotas" (which ZFS didn't have at the time), which we tried out. It was abysmally slow. It was even slow with just one filesystem on it. We scrapped the whole idea, just put Linux on them, and suddenly fileserver performance doubled. Which is something we weren't used to with older Solaris/SPARC/UFS or VxFS systems.

We never tried another generation of those, and soon after Sun was bought by Oracle anyways.

ghaff · 8m ago
Yeah, I think if it hadn’t been for the combination of Oracle and the CDDL, Red Hat would have been more interested in it for Linux. As it was, they basically went with XFS and volume management. Fedora did eventually go with btrfs, but I don’t know if there are any plans for a copy-on-write FS for RHEL at any point.
throw0101b · 4m ago
Apple and Sun couldn't agree on a 'support contract'. From Jeff Bonwick, one of the co-creators of ZFS:

>> Apple can currently just take the ZFS CDDL code and incorporate it (like they did with DTrace), but it may be that they wanted a "private license" from Sun (with appropriate technical support and indemnification), and the two entities couldn't come to mutually agreeable terms.

> I cannot disclose details, but that is the essence of it.

* https://archive.is/http://mail.opensolaris.org/pipermail/zfs...

Apple took DTrace, licensed via CDDL—just like ZFS—and put it into the kernel without issue. Of course a file system is much more central to an operating system, so they wanted much more of a CYA for that.

jeroenhd · 1h ago
I wonder what ZFS in the iPhone would've looked like. As far as I recall, the iPhone didn't have error correcting memory, and ZFS is notorious for corrupting itself when bit flips hit it and break the checksum on disk. ZFS' RAM-hungry nature would've also forced Apple to add more memory to their phone.
amarshall · 54m ago
> ZFS is notorious for corrupting itself when bit flips hit it and break the checksum on disk

ZFS does not need or benefit from ECC memory any more than any other FS. A bit flip corrupts the data regardless of ZFS. Any other FS is just oblivious; ZFS will at least tell you your data is corrupt, but happily keep operating.

> ZFS' RAM-hungry nature

ZFS is not really RAM-hungry, unless one uses deduplication (which is not enabled by default, nor generally recommended). It can often seem RAM hungry on Linux because the ARC is not counted as “cache” like the page cache is.

---

ZFS docs say as much as well: https://openzfs.github.io/openzfs-docs/Project%20and%20Commu...

williamstein · 49m ago
And even dedup was finally rewritten to be significantly more memory efficient, as of the new 2.3 release of ZFS: https://github.com/openzfs/zfs/discussions/15896
ahl · 35m ago
It's very amusing that this kind of legend has persisted! ZFS is notorious for *noticing* when bits flip, something APFS designers claimed was rare given the robustness of Apple hardware.[1][2] What would ZFS on iPhone have looked like? Hard to know, and that certainly wasn't the design center.

Neither here nor there, but DTrace was ported to iPhone--it was shown to me in hushed tones in the back of an auditorium once...

[1]: https://arstechnica.com/gadgets/2016/06/a-zfs-developers-ana...

[2]: https://ahl.dtrace.org/2016/06/19/apfs-part5/#checksums

terlisimo · 40m ago
> ZFS is notorious for corrupting itself when bit flips

That is a notorious myth.

https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-y...

Dylan16807 · 53m ago
> ZFS is notorious for corrupting itself when bit flips hit it and break the checksum on disk

I don't think it is. I've never heard of that happening, or seen any evidence ZFS is more likely to break than any random filesystem. I've only seen people spreading paranoid rumors based on a couple pages saying ECC memory is important to fully get the benefits of ZFS.

thfuran · 47m ago
They also insist that you need about 10 TB RAM per TB disk space or something like that.
yjftsjthsd-h · 29m ago
There is a rule of thumb that you should have at least 1 GB of RAM per TB of disk when using deduplication. That's.... Different.
thfuran · 14m ago
So you've never seen the people saying you should steer clear of ZFS unless you're going to have an enormous ARC even when talking about personal media servers?
williamstein · 17m ago
Fortunately, this has significantly improved since dedup was rewritten as part of the new ZFS 2.3 release. Search for zfs “fast dedup”.
mrkeen · 48m ago
> ZFS is notorious for corrupting itself when bit flips hit it and break the checksum on disk.

What's a bit flip?

ahl · 32m ago
Sometimes data on disk and in memory are randomly corrupted. For a pretty amazing example, check out "bitsquatting"[1]: it's like domain name squatting, but instead of typos, you squat on domains that would be looked up in the case of random bit flips. These can occur due to, e.g., cosmic rays. On disk, HDDs and SSDs can produce the wrong data. It's uncommon to see actual invalid data rather than have an IO fail on ECC, but it certainly can happen (e.g. due to firmware bugs).

[1]: https://en.wikipedia.org/wiki/Bitsquatting
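
To make that concrete, here is a tiny Python sketch of the bitsquatting idea: flip one bit at a time in a name and keep the variants that are still valid hostname characters. The domain used is purely illustrative.

    # Enumerate single-bit-flip neighbors of a hostname label ("bitsquatting").
    import string

    ALLOWED = set(string.ascii_lowercase + string.digits + "-")

    def bit_flip_variants(label: str) -> list:
        variants = []
        for i, ch in enumerate(label):
            for bit in range(8):
                flipped = chr(ord(ch) ^ (1 << bit))  # flip exactly one bit
                if flipped in ALLOWED and flipped != ch:
                    variants.append(label[:i] + flipped + label[i + 1:])
        return variants

    if __name__ == "__main__":
        # e.g. "axample.com", "uxample.com": one flipped bit away from "example.com"
        for v in bit_flip_variants("example"):
            print(v + ".com")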

zie · 34m ago
Basically, it's that memory changes out from under you. As we know, computers use binary, so everything boils down to a 0 or a 1. A bit flip is changing what was, say, a 0 into a 1.

Usually attributed to "cosmic rays", but it can really happen for any number of less exciting-sounding reasons.

Basically, there is zero double-checking in your computer for almost everything except stuff that goes across the network. Memory and disks are not checked for correctness, basically ever, on any machine anywhere. Many servers (but certainly not all) are the rare exception when it comes to memory: they usually have ECC (Error-Correcting Code) memory, basically a checksum on the memory to ensure that if memory is corrupted, it's noticed and fixed.

Essentially every filesystem everywhere does zero data integrity checking:

  MacOS APFS: Nope
  Windows NTFS: Nope
  Linux EXT4: Nope
  BSD's UFS: Nope
  Your mobile phone: Nope

ZFS is the rare exception: a filesystem that actually double-checks that the data you save to it is the data you get back from it. Every other filesystem is just a big ball of unknown data. You probably get back what you put in, but there are no promises or guarantees.
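
A toy Python sketch of that end-to-end checksum idea (nothing like ZFS's real on-disk format, just the detect-don't-trust pattern): store a checksum next to each block on write, then recompute and compare it on every read.

    # Toy block store with end-to-end checksums: it can detect corruption,
    # though this sketch cannot repair it. Not ZFS's actual format.
    import hashlib

    class ChecksummedStore:
        def __init__(self):
            self._blocks = {}  # block id -> (sha256 digest, data)

        def write(self, block_id: int, data: bytes) -> None:
            self._blocks[block_id] = (hashlib.sha256(data).digest(), data)

        def read(self, block_id: int) -> bytes:
            checksum, data = self._blocks[block_id]
            if hashlib.sha256(data).digest() != checksum:
                raise IOError(f"checksum mismatch in block {block_id}")
            return data

    store = ChecksummedStore()
    store.write(0, b"hello")
    assert store.read(0) == b"hello"

    # Simulate a bit flip in the stored data: the next read fails loudly
    # instead of silently returning corrupted bytes.
    checksum, data = store._blocks[0]
    store._blocks[0] = (checksum, bytes([data[0] ^ 0x01]) + data[1:])
    try:
        store.read(0)
    except IOError as err:
        print(err)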
crazygringo · 4m ago
> disks are not checked for correctness, basically ever on any machine anywhere.

I'm not sure that's really accurate -- all modern hard drives and SSDs use error-correcting codes, as far as I know.

That's different from implementing additional integrity checking at the filesystem level. But it's definitely there to begin with.