Git-Annex

130 keepamovin 31 8/25/2025, 4:18:56 AM git-annex.branchable.com ↗

Comments (31)

nolist_policy · 5h ago

I use git-annex to manage all my data on all my drives. It automatically keeps track of which files are on which drives, it ensures that there are enough copies and it checksums everything. It works perfectly with offline drives.

git-annex can be a bit hard to grasp, so I suggest creating a throw-away repository, following the walkthrough[1], and trying things out. See also workflows[2].

[1] https://git-annex.branchable.com/walkthrough/

[2] https://git-annex.branchable.com/workflow/

albertzeyer · 4h ago

How much data do you have? I'm using git-annex on my photos, which is around 100k-1M files, several TB of data, on ZFS. In the beginning everything was fine, but it has become increasingly slow, such that every operation takes several minutes (5-30 mins or so).

I wonder a bit whether that is ZFS, or git-annex, or maybe my disk, or something else.

warp · 1h ago

My experience is the same, git-annex just doesn't work well with lots of small files. With annexes on slow USB disks, connected to a Raspberry Pi 3 or 4, I'm already annoyed when working with my largest annex (in file count) of 25000 files.

However, I mostly use annex as a way to archive stuff and make sure I have enough copies in distinct physical locations. So for photos I now just tar them up with one .tar file per family member per year. This works fine for me for any data I want to keep safe but don't need to access directly very often.
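The "one .tar per family member per year" approach above could be scripted roughly as follows. This is only a sketch: the `photos/<person>/<year>/` directory layout, the function name `archive_by_person_and_year`, and the `<person>-<year>.tar` naming are all assumptions for illustration, not something the comment specifies.

```python
import tarfile
from pathlib import Path


def archive_by_person_and_year(photo_root: Path, out_dir: Path) -> list[Path]:
    """Write one <person>-<year>.tar for each photo_root/<person>/<year>/ directory.

    The directory layout is an assumption made for this sketch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for person_dir in sorted(p for p in photo_root.iterdir() if p.is_dir()):
        for year_dir in sorted(y for y in person_dir.iterdir() if y.is_dir()):
            archive = out_dir / f"{person_dir.name}-{year_dir.name}.tar"
            # Plain (uncompressed) tar: the goal is fewer files, not smaller ones.
            with tarfile.open(archive, "w") as tar:
                tar.add(year_dir, arcname=f"{person_dir.name}/{year_dir.name}")
            archives.append(archive)
    return archives
```

The resulting handful of large archives can then be added to the annex with `git annex add`, so the annex tracks a few big files instead of tens of thousands of small ones.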

egwor · 45m ago

One thing to check is whether any security/monitoring software might be causing issues. Since there are so many files in git repos, it can put a lot of load on that type of software.

matrss · 1h ago

I had tested a git-annex repository with about 1.5M files and it got pretty slow as well. The plain git repo size grew to multiple GiB and plain git operations were super slow, so I think this was mostly a git limitation. DataLad's approach of nested subdatasets (in practice git submodules where each submodule is a git-annex repository) can help, if it fits the data and workflows.

riedel · 4h ago

It would be great to have comprehensive benchmarks for git lfs, git annex, dvc and the like. I always get annoyed with one or the other, e.g. due to the hashing overhead. However, in my case many of the annoyances come from bad filesystem integration on Windows.

rurban · 3h ago

My guess is the Windows virus scanner.

_Algernon_ · 2h ago

I have thought about doing this in the past but ran into issues (one of them being the friction in permanently deleting files once added). I'd be curious how you use it if you have time to share.

goku12 · 42m ago

My only problem with git-annex is Haskell. I don't hate the language itself, but the sheer number of dependencies it has to install is staggering. Many of those dependencies are not used by anything else, or may be incompatible versions when more than one application uses it. The pain is when you install them using the system package manager. Just two Haskell applications - annex and pandoc - are enough to fill your daily updates with maybe a dozen little Haskell packages. God forbid you're on a distro that installs from source!

It's quite safe to just statically link most, if not all of them directly into the application, even when some of them are shared by other applications. I have seen this complaint repeated a few times. The reply from the Haskellers seems to be that this is for the fine-grained modularity of the library ecosystem. But why do they treat it like everything starts and ends with Haskell? Sometimes, there are other priorities like system administration. None of the other compiled languages have this problem - Rust, Go, Zig, ... Even plain old C and C++ aren't this frustrating with dependencies.

I need to clarify that I'm not hostile towards the Haskell language, its ecosystem and its users. It's something I plan to learn myself. But why does this problem exist? And is there a solution?

IsTom · 21m ago

> It's quite safe to just statically link most, if not all of them directly into the application

If you're talking about distro's repos, isn't this a matter of distro and package manager policy?

aragilar · 23m ago

Which package manager are you using? I've not seen any issues with apt-based systems with Haskell?

internet_points · 3h ago

The page doesn't say it, but git-annex was created by https://www.patreon.com/joeyh who also made the wonderful https://joeyh.name/code/moreutils/ and https://etckeeper.branchable.com/

ttiurani · 5h ago

Relevant discussion 9 days ago about the new native git large object promisers in "The future of large files in Git is Git":

https://news.ycombinator.com/item?id=44916783

avar · 1h ago

Thanks, also not-so-relevant, for the reasons I noted in a comment in that thread: https://news.ycombinator.com/item?id=44922405

I.e. annex is really in a different problem space than "big files in git", despite the obvious overlap.

A good way to think about it is that git-annex is sort of a git-native and distributed solution to the storage problem at the "other side" of something like LFS, and to reason about it from there.

kajika91 · 2h ago

I'm using my self-hosted Forgejo. I don't see any benefit of git-annex over LFS so far, and I'm not even sure I could set up annex as easily.

Digging a little bit I found that git-annex is coded in Haskell (not a fan) and seems to be 50% slower (expected from Haskell, but I've only found one source so far, so that's not really reliable).

I don't see the appeal of the complexity of the commands, though they probably serve a purpose. Once you've opened a .gitattributes from git-LFS you pretty much know all you need and you barely need any commands anymore.

Also I like how setting up a .gitattributes makes everything transparent, the same way .gitignore works. I don't see any equivalent with git-annex.

Lastly any "tutorial" or guide about git-annex that won't show me an equivalent of 'git lfs ls-files' will definitely not appeal to me. I'm a big user of 'git status' and 'git lfs ls-files' to check/re-check everything.

avar · 1h ago

Annex isn't slow because it's written in Haskell, it tends to be slow because of I/O and paranoia that's warranted as the default behavior in a distributed backup tool.

E.g. if you drop something it'll by default check the remotes it has access to for that content in real time, it can be many orders of magnitude faster to use --fast etc., to (somewhat unsafely) skip all that and trust whatever metadata you have a local copy of.
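The trade-off described above can be pictured with a toy model. This is not git-annex's implementation or API: `Remote`, `can_drop`, `location_log` and the `fast` parameter are hypothetical names, and the model only contrasts "verify copies on the remotes now" with "trust the locally recorded locations".

```python
from dataclasses import dataclass


@dataclass
class Remote:
    name: str
    has_content: bool       # what a live check of the remote would find
    reachable: bool = True  # an offline drive or down server cannot confirm anything


def can_drop(location_log, remotes, numcopies, fast=False):
    """Decide whether the local copy may be dropped (toy model, hypothetical names).

    location_log: names of remotes that local metadata *believes* hold the content.
    numcopies:    how many other copies must exist before dropping is allowed.
    """
    believed = [r for r in remotes if r.name in location_log]
    if fast:
        # Trust local metadata only: no I/O against remotes, but the log may be stale.
        return len(believed) >= numcopies
    # Paranoid default: count only copies a reachable remote confirms right now.
    confirmed = [r for r in believed if r.reachable and r.has_content]
    return len(confirmed) >= numcopies
```

With a stale log (one remote silently lost the file, another is offline), the fast path would allow a drop that the verifying path refuses, which is the "somewhat unsafe" part of the speed-up.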

seanparsons · 1h ago

LFS and git-annex have subtly different use cases in my experience. LFS is for users developing something with git that has large files in the repo like the classic game development example. git-annex is something you'd use to keep some important stuff backed up which happens to involve large files, like a home folder with music or whatever in it. In my case I do the latter.

aragilar · 14m ago

What it works really well at is storing research data. LFS can't upload to arbitrary webdav/S3/sharepoint/other random cloud service.

aragilar · 16m ago

How big are the repos you have? The largest git-annex repo I have is multiple TB (spread across multiple systems), with some files 10s of GB.

I'm not sure what you are doing, but from looking at the git-lfs-ls-files manpage `git annex list --in here` is likely what you want?

stv0g · 1h ago

There is a soft-fork of Forgejo which adds support for git-annex:

https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo

EmilStenstrom · 6h ago

Happy to see use cases front and center in command line documentation. They seem to always start with “obscure command flag that you’ll probably never use”.

Munksgaard · 5h ago

Git-Annex is a cool piece of technology, but my impression is that it works best for single-user repositories. So for instance, as @nolist_policy described in a sibling comment, managing all your personal files, documents, music, etc. across many different devices.

I tried using it for syncing large files in a collaborative repository, and the use of "magic" branches didn't seem to scale well.

ygritte · 6h ago

Could this be abused to simulate something like SVN externals? I always found git submodules to be a very bad replacement for that.

fragmede · 6h ago

GitHub really embraced the Microsoft-esque NIH with LFS, instead of adopting git-annex.

mathstuf · 5h ago

While I also find git-annex more elegant, its cross-platform story is weaker. Note that LFS was originally a collaboration between GitHub and Bitbucket (maybe? Some forge vendor I think). One had the implementation and the other had the name. They met at a Git conference and we have what we have today. My main gripes these days are the woefully inadequate limits GitHub has in place for larger projects. Coupled with the "must have all objects locally to satisfy an arbitrary push", any decently sized developer community will blow the limit fairly quickly.

FD: I have contributed to git-lfs.

keepamovin · 6h ago

To its absolute detriment

Here is a talk by a person who adores it: Yann Büchau: Staying in Control of your Scientific Data with Git Annex https://www.youtube.com/watch?v=IdRUsn-zB2s

codemac · 5h ago

While Yann has built many things with git-annex, we should be clear that the creator of git-annex is relatively singular, Joey Hess.

keepamovin · 5h ago

Here is a comment about Joey: https://news.ycombinator.com/item?id=14908529

And an interview: "When power is low, I often hack in the evenings by lantern light." https://usesthis.com/interviews/joey.hess/

andrewmcwatters · 5h ago

git-annex has some really awkward documentation.

You can apparently do, sort of, but not really, the same thing git-fetch-file[1] does, with git-annex:

    git fetch-file add https://github.com/icculus/physfs.git "**" lib/physfs-main
    git fetch-file pull

`add` creates this at `.git-remote-files`:

    [file "**"]
    commit = 9d18d36b5a5207b72f473f05e1b2834e347d8144
    target = lib/physfs-main
    repository = https://github.com/icculus/physfs
    branch = main

But git-annex's documentation goes on and on about a bunch of commands I don't really want to read about, whereas those two lines and that .git-remote-files manifest just told you what git-fetch-file does.

[1]: https://github.com/andrewmcwattersandco/git-fetch-file

nolist_policy · 5h ago

Not at all. git-annex is for managing large files in git and unlike git-lfs it preserves the distributed nature of git.

keepamovin · 5h ago

Here is a guide you might like: https://www.youtube.com/watch?v=p0eVyhv2rbk
