The impact of file position on code review

53 whatever3 31 7/17/2025, 6:14:41 PM arxiv.org ↗

Comments (31)

kevinsync · 5h ago
IMO file/folder structure in a project is a geographic mapping exercise. It's not that far off from city planning. I think it's so important to design the structure and naming conventions of a project with the ultimate goal of aesthetically-pleasing, human-understandable spatial organization of digital artifacts. You "go here" to "see that". This "lives here". You travel to the destination of the thing you're looking for, and gain a deeper understanding of how the world is constructed that you are inhabiting.

This is one of my pet peeves with any kind of cloud file storage -- oftentimes they try to reinvent or abstract away the idea of a deeply-nested filesystem tree, and it just feels anti-human to me lol

Edit: I hit submit and then acknowledged I was kind of ranting about something else entirely from the paper itself!

astrange · 2h ago
This is my problem with almost every UI framework invented in the overengineered era of the 90s-2000s.

Try to find a feature in basically any open-source C++ or Java GUI app just by looking for the code that does it. You can't, because there isn't any code that does anything! Instead there's several different widely scattered pieces of boilerplate, none of which individually has any meat in it.

suzzer99 · 4h ago
Agreed. I always try to design apps so that renaming or changing the file structure causes minimal disruption. I want as little friction to refactoring as possible.

I'm a huge fan of files that just work wherever you put them (think pages on a website), and then relative import paths to all their subcomponents. For subcomponents shared throughout the app, I always use absolute paths so that reafactoring my page-level components into subfolders won't break the import.

stillpointlab · 3h ago
I totally agree, and I use LLMs for this. I have a script that will output my directory structure (just the folders and filenames + the lines of code; aggregate for folders and per file). Then I ask the LLM to give me feedback. Surprisingly this has lead to some very useful feedback and improvements.
jsbg · 5h ago
> We found files shown earlier in a PR to receive more comments than files shown later

I often stop reviewing a PR if I have too many comments on it already that need to be addressed so I wouldn't say this is necessarily what it sounds like (later files get less attention). Also, a comment on an early file might address a concern in a later file that will be fixed by the time I actually review the later files.

That said, keep your PRs short and you won't have to worry about stuff like that.

devnullbrain · 3h ago
Read this as though it has a visibly bulging temple vein:

Does that mean you do the thing where I respond to your review, request a re-review, then get comments about code that was already there a week ago?!

lmm · 1h ago
If your PR had lots of stuff that needs comments then I don't think you can reasonably expect me to check it fully in the first pass. E.g. if your first 3 functions are indented wrong, I'm going to write that they're indented wrong and not bother reading the rest of your PR until you've fixed that.
zoover2020 · 4h ago
Can't seem to get anyone else to Baird the train of small CRs. It's so obvious
analog31 · 4h ago
This is kind of a weird aside, but I attended many of my kids' music lessons. The teacher would ask them to play a selection, and then stop them after the first few measures, to work on some point of technique or interpretation. It meant that they never got to the end of anything, until the recital.

"Measure position" had a strong impact on review.

I realized that perfecting the entire selection isn't the point. Instead, the point is for the student to learn how to critique themselves, and improve their practicing.

semperos · 6h ago
My coworker Toby Crawley recently developed this Firefox plugin to randomize the order of files in GitHub PR reviews:

https://addons.mozilla.org/en-US/firefox/addon/github-pr-fil...

progbits · 5h ago
People review files top down?

I can't imagine doing that. I glance at the list, and try to spot the highest level change (schema changes, interfaces, etc) and start there. If the core idea is wrong no point wasting time on the rest.

Then I go up from there to implementation details and tests. Often jumping between modules or functions, coming back to some things later when I have more context.

devnullbrain · 3h ago
I think people putting enough thought and effort into reviewing, and developing it as a skill, are a small minority.
FajitaNachos · 4h ago
I would bet the vast majority of people do this. If they didn't, or preferred something else, it wouldn't be presented in the UI this way.
taeric · 4h ago
I'm curious if this actually tells us anything about the quality of the reviews, though? You could easily surmise that you make some basic comments on structure and some patterns in a file and send it back to the coder with a general "take that advice to the full change, let me know when I can look at it again."

That is, it isn't necessarily the case that the other files weren't reviewed. Just any overarching comments could have been left as an exercise for the coder. No?

MarkusQ · 3h ago
This is why I always review the files with bugs in them first.
IceDane · 1h ago
I find it easiest to just leave out the bugs in the first place. I put them aside for a rainy day.

I now have a collection spanning a decade's worth of bugs. Just waiting for a reason to unleash them on the world. It will be biblical.

slongfield · 5h ago
I've recently started using Github code reviews for a lot of C++, and one thing that I wish it would do is show the header (.h) files before the implementation (.cc) files.

Small PRs help, but I often end up just opening a handful of windows to have everything open at once.

rwmj · 5h ago
You can't do that in github as far as I know, but you can get git to display diffs that way (eg. for local review or email based workflows). You have to use a git "orderfile". Example:

https://gitlab.com/nbdkit/nbdkit/-/blob/master/scripts/git.o...

We found it useful to display header files, interface files and documentation first, but maybe the linked paper will make us review that!

devnullbrain · 2h ago
Github similarly lacks the ability to override the diff algorithm. This is a shame, because `histogram` makes some diffs look much more like what the human intended.

The default, Myers, is from 1986!

devnullbrain · 3h ago
For this reason I usually either check it out locally or use the Github IDE (. or >)
aerostable_slug · 6h ago
Note to self: bury malicious code in "later"-reviewed files.
kridsdale3 · 5h ago
Similar, if you're going up against a promotion committee, or parole board in jail, you want to go when the judges have been recently fed.
teddyh · 3h ago
Wasn’t that debunked?
gruez · 3h ago
jpalawaga · 5h ago
One thing my colleagues and I do is expressly offer up a starting point when reviewing changes: "the meat of the change is in xxx.go; the rest is ancillary/supporting". This allows the reader to contextualize themselves more quickly in the change, and understand the rest of the changes with greater ease.

You could take an adversarial view of it: maybe it's better if people get dropped into the changes so they're left to workout if the changes in each file stand alone rather, decontextualized from their original change.

gavinhoward · 6h ago
One of the few times I've read a paper before it was posted.

I like this paper. While it's not anything groundbreaking or sensational, it did help me with the design of something I am working on.

deckar01 · 6h ago
I have always preferred small, self-contained commits for this reason. The GUI tooling for line based commits has left me wanting though. Maybe an LLM can be trained to demux commits.

No comments yet

breatheoften · 5h ago
I think the whole concept of source files as unordered blobs on the filesystem is just _wrong_ -- and a serious drag on the ability of a reader to quickly digest a codebase and make inferences about changes in isolation ...

Codebases are _ordered_ constructs -- regardless of the ridiculous broken build system abstractions that do everything possible to obscure this truth ...

Daviey · 6h ago
It's a angle I hadn't considered, but I'm aware that I likely do this so the impact of the paper is that I be more dilligent. Thank you for sharing.
12_throw_away · 6h ago
Heh, I've had certain modules where I ended up prefixing the files with `a_initialization.lang` `b_startup.lang` `c_run_the_thing.lang` [...] to make it easier to read for humans. Just like, in, say, a book, where the chapters appear in a specific order, the order of things matters! Probably this is a literate programming sort of thing?
loeg · 6h ago
Makes sense. As a reviewer I take active steps to focus primarily on the main change by "minimizing" (Phabricator) test and other incidental file changes in my first pass.