TernFS – An exabyte scale, multi-region distributed filesystem

80 points by rostayob on 9/18/2025, 2:36:44 PM · xtxmarkets.com

Comments (16)

eigenvalue · 6m ago
With its focus on immutability and redundancy, this sounds like it would be a good underpinning for a decentralized blockchain file storage system.
mrbluecoat · 1h ago
Cool project and kudos for open sourcing it. Noteworthy limitation:

> TernFS should not be used for tiny files — our median file size is 2MB.

jandrewrogers · 37m ago
I have worked on exabyte-scale storage engines; there is a good engineering reason for this type of limitation.

If you had a 1 KiB average file size, you would have quadrillions of metadata objects to search and manage quickly and at fine granularity. The kinds of operations and coordination you need to do with metadata are difficult to achieve reliably when the metadata structure itself is many PB in size. Many interesting edge cases show up when you have to do deep paging of this metadata off of storage. Making this not slow requires unorthodox design choices that introduce a lot of complexity. Almost none of the metadata fits in memory, including many parts that conventional architectures assume will easily fit in memory.

A mere trillion objects is right around the limit where the allocators, metadata, etc. can be made to scale with heroic effort before conventional architectures break down and things start to become deeply weird on the software design side. Storage engines need to be reliable, so steering clear of that design frontier makes a lot of sense if you can.

It is possible to break this barrier, but doing so introduces myriad interesting design and computer science problems for which there is little literature.
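
To make the quadrillion-object claim concrete, here is a back-of-envelope calculation in Go; the 128 bytes of metadata per object is an illustrative assumption, not a figure from the comment or from TernFS:

```go
package main

import "fmt"

// Rough scale of the metadata problem described above: an exabyte of
// 1 KiB files. The per-object metadata size is an assumed round number.
func main() {
	const (
		capacity    = 1 << 60 // 1 EiB in bytes
		avgFileSize = 1 << 10 // 1 KiB average file size
		metaPerObj  = 128     // assumed metadata bytes per object
	)
	objects := float64(capacity) / avgFileSize
	metaBytes := objects * metaPerObj
	fmt.Printf("objects:  %.0e\n", objects)               // ~1e+15, a quadrillion
	fmt.Printf("metadata: %.0f PiB\n", metaBytes/(1<<50)) // 128 PiB, far beyond RAM
}
```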

heipei · 1h ago
Yeah, that was the first thing I checked as well. Being suited to small / tiny files is a great property of SeaweedFS.
pandemic_region · 1h ago
What happens if you put a tiny file on it then? Bad perf, possible file corruption, ... ?
jleahy · 1h ago
It's just not optimised for tiny files. It would absolutely work: you could store 100 billion 1 kB files with zero problems (and that is 100 terabytes of data, probably on flash, so no joke). What you can't do is store 1 exabyte of 1 kilobyte files (at least not yet).
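
A quick sanity check of those two numbers (decimal units, as in the comment; a minimal sketch):

```go
package main

import "fmt"

// The two scenarios above: 100 billion 1 kB files is only ~100 TB,
// while an exabyte of 1 kB files means a quadrillion files.
func main() {
	const fileSize = 1e3 // 1 kB, decimal units as in the comment

	files := 100e9 // 100 billion files
	fmt.Printf("100e9 files x 1 kB = %.0f TB\n", files*fileSize/1e12) // 100 TB

	capacity := 1e18 // 1 EB
	fmt.Printf("1 EB / 1 kB = %.0e files\n", capacity/fileSize) // 1e+15 files
}
```
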
redundantly · 1h ago
Probably wasted space and lower performance.
ttfvjktesd · 1h ago
How does TernFS compare to CephFS, and why not CephFS, given that it is also proven at multi-petabyte scale?
rostayob · 1h ago
(Disclaimer: I'm one of the authors of TernFS, and while we evaluated Ceph I am not intimately familiar with it.)

Main factors:

* Ceph stores both metadata and file contents in the same object store (RADOS). TernFS uses a specialized database for metadata that takes advantage of various properties of our datasets (immutable files, few moves between directories, etc.; see the sketch after this list).

* While Ceph is capable of storing PBs, we currently store ~600PB on a single TernFS deployment. Last time we checked, this was an order of magnitude more than even very large Ceph deployments.

* More generally, we wanted a system that we knew we could easily adapt to our needs and, more importantly, quickly fix when something went wrong; we estimated that building something new would be less costly overall than adapting Ceph (or some other open-source solution).
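
As a purely hypothetical illustration of what the immutability in the first bullet buys the metadata layer: if file contents never change after creation, each metadata record can be written exactly once, so the store needs no in-place updates, per-file locking, or versioning. The struct and field names below are invented for illustration and are not TernFS's actual schema:

```go
package main

import "fmt"

// Hypothetical metadata record for an immutable file. Because contents
// never change after creation, every field is written exactly once; the
// store never needs in-place updates, per-file locks, or versioning.
// Illustrative only: this is not TernFS's actual schema.
type FileMeta struct {
	ID        uint64   // file identifier, assigned at creation
	Size      uint64   // final size, known once the file is written
	CreatedNs int64    // creation timestamp (ns), never modified
	BlockIDs  []uint64 // locations of the immutable data blocks
}

func main() {
	f := FileMeta{ID: 42, Size: 2 << 20, CreatedNs: 1700000000000000000,
		BlockIDs: []uint64{7, 8, 9}}
	// Reads need no coordination: once a record is visible, it is final.
	fmt.Printf("file %d: %d bytes in %d blocks\n", f.ID, f.Size, len(f.BlockIDs))
}
```

Write-once records like this are also one plausible reason a specialized metadata database can outperform a general-purpose object store for such a workload.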

mgrandl · 43m ago
There are definitely insanely large Ceph deployments; I have seen hundreds of PBs in production myself. Also, your use case sounds like something that should be quite manageable for Ceph, given the limited metadata activity, which tends to be the main pain point with CephFS.
rostayob · 19m ago
I'm not fully up to date, since we looked into this a few years ago; at the time the CERN deployments of Ceph were cited as particularly large examples, and they topped out at ~30PB.

Also note that when I say "single deployment" I mean that the full storage capacity is not subdivided in any way (i.e. there are no "zones", "realms", or similar concepts). We wanted this to be the case after experiencing significant overhead from having to rebalance different storage buckets (albeit with a different piece of software, not Ceph).

If there are EB-scale Ceph deployments I'd love to hear more about them.

kachapopopow · 30m ago
Ceph is more of a "here's a raw block of data, do whatever the hell you want with it" kind of system; it's not really geared towards immutable data.
bananapub · 31m ago
seems like a colossusly nice design.
VikingCoder · 13m ago
I see what you did there.
nunobrito · 50m ago
Thanks for sharing.
sreekanth850 · 51m ago
Wow, great project.