Formatting code should be unnecessary
320 points by MaxLeiter, 9/7/2025, 11:08:42 PM, 438 comments (maxleiter.com)
Source code formatting programs are not the same as lint[0] programs. The former rewrites source code files such that the output is conformant with a set of layout rules without altering existing logic. The latter is a category of idempotent source code analysis programs typically used to identify potential implementation errors within otherwise valid constructs.
Some language tools support both formatting and source code analysis, but this is an implementation detail.
0 - https://en.wikipedia.org/wiki/Lint_(software)
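To make the lint side of that distinction concrete, here is a minimal Python sketch of a check built on the standard `ast` module. It reports `== None` / `!= None` comparisons (the pattern pycodestyle flags as E711) but never rewrites the file. The function name `find_none_comparisons` is hypothetical and not any real tool's API.

```python
import ast


def find_none_comparisons(source: str) -> list[int]:
    """Return line numbers where code compares to None with == or !=.

    Like a linter, this only analyses the source; it never rewrites it.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        for op, right in zip(node.ops, node.comparators):
            is_eq = isinstance(op, (ast.Eq, ast.NotEq))
            is_none = isinstance(right, ast.Constant) and right.value is None
            if is_eq and is_none:
                hits.append(node.lineno)
    return sorted(hits)


code = "a = 1\nif a == None:\n    pass\nif a is None:\n    pass\n"
print(find_none_comparisons(code))  # [2]
```

A formatter, by contrast, would emit new source text and leave the `== None` comparison alone.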
They slyly add git noise and pollute your audit trails by just going through and moving shit around whenever you save a file.
And sometimes, they actually insert bugs - string formatting errors are my favorite example.
It's for people who think good code is about adhering to aesthetic ideologies instead of making things documented and accountable.
This is most noticeable in open source contributions. Sometimes I'll get a pull request with like 2 lines of change and 120 lines from some reformatting tool.
You think I accept that?
It's not a good idea
This only happens because the file doesn't already adhere to the rules the tool is implementing. These tools are normally highly configurable, and once your code complies with a standard, the tool prevents future code from pulling you away from that standard.
> And sometimes, they actually insert bugs - string formatting errors are my favorite example.
Do you have a concrete example?
> Sometimes I'll get a pull request with like 2 lines of change and 120 lines of some reformating tool.
Is your existing code formatting at least consistent?
> You think I accept that?
This is a social issue rather than a technical one. You can tell people in your development readme to use specific style rules, or even a project-wide precommit hook. If your own code is formatted with one of these tools, you can even (to my understanding) set up automated checks on GitHub's side.
But of course you are free to reject any PR you want.
I left out my largest critique - spacing is semantic both for the compiler and the human.
Often I police the whitespace very thoughtfully with long comments for clarity.
I care deeply about maintainability and legibility of code and try to consider future human readers everywhere.
Then the formatter says "haha, fuck that!"
That's my biggest personal gripe with it. It's consistency over clarity, conformity over craft.
This all depends on what kind of code you're writing.
But honestly the jobs I sign up for demand that kind of care, so I am really frustrated when I'm prevented from exercising my professional judgement and doing what I think is best due to some bureaucratic red tape.
Formatters also value consistency over clarity.
I break formatting all the time for the sake of clarity.
Sometimes my comments are paragraphs long with citations and things are carefully broken down with interstitial comments and references and then the formatter fucks it all up and the linter says "wah this oblivious pedant rule isn't followed"
The problem is it doesn't treat me like an adult. And I'm not in this industry for dumb nanny tools that scold me because they don't understand things.
You can solve the Git noise issue by enforcing formatting in CI and keeping formatter configuration in repo. This is what most high quality open source projects will do. The purpose of this is not about "adhering to aesthetic ideologies", it's about not bothering people with the minutiae of yet another pointless set of formatting conventions. Most developers couldn't give a shit less where you think braces should go, or whether you like tabs or spaces, or whatever else, they care about more important things like data structures and writing more correct code. Having auto formatting enables them to effortlessly follow project norms without needing to, for every single repo they work in, carefully try to adhere to the documented formatting (which usually winds up being inconsistent eventually anyways, in projects without auto formatting, because humans are fallible.)
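A rough sketch of what "enforce formatting in CI" boils down to: run the formatter and fail the build if it would change anything. Real tools expose this directly (`black --check`, `prettier --check`, `gofmt -l`); the `toy_format` function below is a hypothetical stand-in that only strips trailing whitespace and normalizes the final newline, so the example stays self-contained.

```python
def toy_format(source: str) -> str:
    """Hypothetical formatter: strip trailing whitespace, end with exactly one newline."""
    lines = [line.rstrip() for line in source.splitlines()]
    return "\n".join(lines).rstrip("\n") + "\n"


def files_needing_format(files: dict[str, str]) -> list[str]:
    """Names of files whose content would change if the formatter ran."""
    return sorted(name for name, text in files.items() if toy_format(text) != text)


def ci_gate(files: dict[str, str]) -> int:
    """Exit code for a CI step: 0 if everything is already formatted, 1 otherwise."""
    bad = files_needing_format(files)
    for name in bad:
        print(f"would reformat {name}")
    return 1 if bad else 0


repo = {"a.py": "x = 1\n", "b.py": "y = 2   \n\n\n"}
print(ci_gate(repo))  # prints "would reformat b.py", then 1
```

Because a well-behaved formatter is idempotent (formatting already-formatted code is a no-op), the gate keeps passing once the repository has been formatted in one go, so the large reformatting diff only happens once.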
The reason why people submit code with a huge formatting diff is usually because your project didn't ship a formatter manifest but their editor is configured to format on save. That's because probably most of the projects people work on now do actually use some form of automatic formatting, be it clang-format, gofmt, prettier, black, etc. so it winds up being necessary to special case your project to not try to run a formatter. It's still a beginner's mistake to actually commit and PR a huge reformatting, but it definitely happens by accident to even experienced devs when working on projects that have weird manual formatting.
This wouldn't happen nearly as much if you had a defined set of formatting rules plugged into CI instead of chaos
The reformatting tools should be CI-enforced so you'll only end up with sudden massive changes like this once when you start using auto-formatters.
Regardless, tell your teammates to separate out formatting changes vs logic changes into separate commits (preferably separate PRs). Since they're auto-formatters it wouldn't even be any additional work, just:
https://github.com/orgs/community/discussions/5033
Also, there are many linters that also do formatting, blurring the "line" you're pointing at.
Yes, I can get used to other layouts, but that by no means means all layouts are equal to me in terms of how readable they are, and how well things stand out when they should, or blend in when they should.
I recognise this isn't the case for everyone: some people read code beginning to end, and it doesn't matter how it's laid out. But I pattern match visually, read fragments based on layout, and remember code based on visual patterns.
Ironically, I have aphantasia and don't visualise things with my "mind's eye", yet I still remember things by visual appearance and spatial cues better than by text.
Sorted lists (or sorted includes) are also something that makes my life easier. If they're not sorted, then everyone adds their new things to the end, which means many times more merge conflicts. Sorting doesn't mean there are zero conflicts, but it does mean there are fewer than with "append to the end". So, just like an auto-formatter is there to save time, don't waste my time by not sorting where possible.
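A minimal Python sketch of that argument, assuming two hypothetical branches that each add one import to the same block: with "append to the end" both edits land at the same position, which a line-based merge tends to report as a conflict, while with sorted insertion (via `bisect`) they usually land in different places.

```python
import bisect


def insertion_index(entries: list[str], new_entry: str, keep_sorted: bool) -> int:
    """Line index at which `new_entry` would be inserted into `entries`."""
    if keep_sorted:
        return bisect.bisect_left(entries, new_entry)
    return len(entries)  # "append to the end" convention


base = ["import json", "import os", "import sys"]
branch_a, branch_b = "import argparse", "import re"  # hypothetical additions

# Appending: both branches edit the same spot at the end of the block.
print(insertion_index(base, branch_a, False), insertion_index(base, branch_b, False))  # 3 3
# Sorted: the edits land at different positions, separated by unchanged lines.
print(insertion_index(base, branch_a, True), insertion_index(base, branch_b, True))  # 0 2
```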
Also, my OCD hates inconsistency. So one style is OK, and another style is OK, but mixing them is not. I don't care which, but pick ONE style, not two styles!
All that really matters is consistency. Let a team make some decisions and then just move forward.
I’ve seen companies with such a large amount of developer churn that literally one person was left defending the status quo saying “we do X here, we voted on it once in 2019 and we’re not changing it just for new people”. 90% of the team were newcomers.
(The better teams I’ve worked on maintain a core set of leaders who are capable of building consensus through being very agreeable and smart. Gregarious Technocracy >> Popular Democracy!)
Not so! The number of tokens correlates with perceived code complexity for some people. One example is how some people can't unsee or look past Lisp's parentheses.
Another example is how some people get used to longDescriptiveVariableNames but others find that overwhelming (me for instance) when you have something like:
The above isn't bad, but imagine variables named that verbosely used over and over, especially on the same line. Compare it to:
The second example loses some information, but I'd argue it doesn't matter too much given the context one would typically have in a function named `userSignup`. I've had codebases where consistency required naming all variables like `firstNameInputField` rather than just `firstName`, and it made functions unreadable because it made the unimportant parts seem more important than they were simply by taking up more space.
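A hypothetical Python version of that contrast (snake_case rather than camelCase; the function and field names are made up for illustration). Both functions do the same thing; the second leans on the enclosing function name for context.

```python
def user_signup_verbose(first_name_input_field: str, last_name_input_field: str,
                        email_address_input_field: str) -> dict:
    full_name_from_input_fields = f"{first_name_input_field} {last_name_input_field}"
    normalized_email_address_input_field = email_address_input_field.strip().lower()
    return {"name": full_name_from_input_fields,
            "email": normalized_email_address_input_field}


def user_signup(first_name: str, last_name: str, email: str) -> dict:
    # Inside a function named user_signup, "input field" is already implied.
    return {"name": f"{first_name} {last_name}", "email": email.strip().lower()}


print(user_signup("Ada", "Lovelace", " Ada@Example.com "))
```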
And if there is something more important, then instead of micro-optimizing the rules when there is strong disagreement, it’s probably best if one of the parties takes the high road and lives with it so you can all focus on what matters.
Not to mention the overhead of running these worthless inefficient tools on every commit (even locally).
Tools like this just raise the debate from different opinions about formatting to different opinions about workflows. Workflows impact productivity a lot more than formatting.
But the scale of technical debt this insight has revealed is depressing.
I'm just suddenly slightly terrified someone's going to see this and think it's genuinely a good idea and make it part of the next popular scripting language, where lists are defined by starting commas or something :S
Mine hates trailing commas :)
More seriously, I don't like having lists like that in the code in the first place. I don't want multiple lines taken up for just constant values, and if it turns out to require maintenance then the data should be in a config file instead anyway.
This judgement rests on a strong personal opinion (which I don't claim is wrong, but also don't consider god-given) about what counts as one change and what counts as two changes in the code:
- If you consider adding an additional item to the end of the list to be one code change, I agree that a trailing comma makes sense
- On the other hand, it is also a sensible judgment to consider this to be a code change of two lines:
1. an item (say 'peach') is added to the end of the list
2. 'orange' has been turned from the last element of the list to a non-last element of the list
If you are a proponent of the second interpretation, the version that you consider to be non-advantageous is the one that does make sense.
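Whichever interpretation you prefer, it's easy to check what a line-based diff tool counts. This sketch runs Python's standard `difflib` on a hypothetical `fruits` list: without a trailing comma, appending `"peach"` shows up as three changed lines (`"orange"` removed and re-added with a comma, plus `"peach"`); with a trailing comma it is a single added line.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """The '+'/'-' lines of a unified diff, without the two file-header lines."""
    diff = list(difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm=""))
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


no_trailing_before = 'fruits = [\n    "apple",\n    "banana",\n    "orange"\n]'
no_trailing_after = 'fruits = [\n    "apple",\n    "banana",\n    "orange",\n    "peach"\n]'
trailing_before = 'fruits = [\n    "apple",\n    "banana",\n    "orange",\n]'
trailing_after = 'fruits = [\n    "apple",\n    "banana",\n    "orange",\n    "peach",\n]'

for line in changed_lines(no_trailing_before, no_trailing_after):
    print(line)  # -    "orange" / +    "orange", / +    "peach"
for line in changed_lines(trailing_before, trailing_after):
    print(line)  # +    "peach",
```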
Then why not consider it four changes?
3. 'banana' has been turned from the last-but-one element of the list to the last-but-two element of the list
4. 'apple' has been turned from the last-but-two element of the list to the last-but-three element of the list
There do exist reasons why this can make sense:
- In an Algebraic Data Type implementation of a non-empty list, the last symbol is a different type constructor than the one used to append an item to the front of an existing non-empty list (similar to how, in an Algebraic Data Type implementation of an arbitrary list, the type constructor for the initial empty list is "special").
- In a single-linked list implementation, sometimes (depending on the implementation) the terminal element of the list is handled differently.
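A small Python sketch of the first point, using dataclasses as stand-ins for ADT constructors (the names `Last`, `Cons`, `to_list` and `append` are illustrative, not from any library). Appending `"peach"` really does change `"orange"`: its node goes from the `Last` constructor to a `Cons`, which is the second interpretation above expressed in types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Last:
    """Constructor for the final element of a non-empty list."""
    value: str


@dataclass(frozen=True)
class Cons:
    """Constructor that puts `head` in front of an existing non-empty list."""
    head: str
    tail: NonEmpty


NonEmpty = Union[Last, Cons]


def to_list(xs: NonEmpty) -> list[str]:
    out = []
    while isinstance(xs, Cons):
        out.append(xs.head)
        xs = xs.tail
    out.append(xs.value)
    return out


def append(xs: NonEmpty, item: str) -> NonEmpty:
    """Add `item` at the end; the old final node is rebuilt as a Cons."""
    if isinstance(xs, Last):
        return Cons(xs.value, Last(item))
    return Cons(xs.head, append(xs.tail, item))


fruits = Cons("apple", Cons("banana", Last("orange")))
longer = append(fruits, "peach")
print(to_list(longer))  # ['apple', 'banana', 'orange', 'peach']
print(type(fruits.tail.tail).__name__, type(longer.tail.tail).__name__)  # Last Cons
```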
---
By the way: at work, because adding parameters at the beginning of a (parameter) list of a function is "special" (because in the code for many functions the first parameters serve a very special purpose), but adding some additional parameter at the end is not, we commonly use parameter lists formatted like
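One plausible shape for such a layout, shown as an assumed illustration in Python (which accepts commas at the start of continuation lines inside parentheses): a leading-comma parameter list. Appending a parameter adds one new line, while inserting one before the first parameter also forces an edit to the old first line, so the diff flags it as the "special" change. The function `connect` and its parameters are hypothetical.

```python
def connect(
      host: str
    , port: int = 5432
    , timeout: float = 5.0
    , retries: int = 3  # appending a parameter adds one new ", ..." line below
):
    """Hypothetical function; only the parameter-list layout matters here."""
    return {"host": host, "port": port, "timeout": timeout, "retries": retries}


print(connect("db.example.internal", timeout=2.0))
```

A formatter such as Black would rewrite this layout, so it only survives in a project that doesn't run one on this code.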
I thought I was very smart. Like, really really smart, maybe the smartest programmer in the team.
And as such my opinion was very important. Maybe the most important opinion in the team. Everyone had to listen to it!
That is all. Also, I was wrong.
This is probably the only useful takeaway, but can you explain why you were wrong?
First and foremost I was wrong thinking that I was smarter than others — that's not even how intelligence works.
Second I was wrong being so stubbornly pro-tabs / anti-spaces (for example). It doesn't make that much of a difference, so there's no point in being so passionate about it.
And third I was wasting everyone's time (and my persuasion powers) by not choosing my battles more wisely.
My suggestion would be nowadays: let's choose a popular style guide, set up a linter and be done with it.
Yes, both git and all these programming languages are actually damn stupid to take lines at face value instead of doing something more elegant like Ada does. In my 20+ year career, I've been offered a project involving Ada only once.
It's hard to come up with something elegant and efficient. It's even harder to make it reach a top-tier global presence, all the more so when the ecological niche is already filled with good-enough stuff.
I can still live with it. And I like the clean, minimal version when I don’t have to edit. Just adding that “style” can have impact beyond how it looks involving ease of editing. And it stinks when your preferences clash with the community.
About once every other project, some portion of the source benefits from source code being arranged in a tabular format. Long lines which are juxtaposed help make dissimilar values stand out. The following table is not unlike code I have written:
Even if we add 4-5 more operational parameters, I find this arrangement much more readable than the short-line equivalent. Or worse, the formatter may keep the long lines but normalize the spaces, ruining the tabular alignment. Sometimes a neat, human-maintained block of 200-character lines brings order to chaos, even if you have to scroll a little.
The pain point you describe is real, which is why that was intentionally added as a feature.
Of course it requires a language that allows trailing commas, and a formatter that uses that convention.
1) Horizontal scrolling sucks
2) Changing values easily requires manually realigning all the other rows, which is not productive developer time
3) When you make a change to one small value, git shows the whole line changing
And I ultimately concluded that code files are not the place for aligned tabular data. If the data is small enough that it belongs in a code file rather than a CSV you import, then great, but bothering with alignment just isn't worth it. Just stick to the short-line equivalent. It's the easiest to edit and maintain, which is ultimately what matters most.
I've often wished that formatters had some threshold for similarity between adjacent lines. If some X% of the characters on a line match the characters right above, then it might be tabular, and the formatter could do something to maintain the tabular layout.
Bonus points if it's able to do something like diff the adjacent lines to detect table-like layouts, figure out if something nudged a field or two out of alignment, and then insert spaces to fix the table layout.
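A rough Python sketch of that wish, under loudly stated assumptions: `column_similarity` measures the fraction of positions where two lines hold the same character, `looks_tabular` applies a hypothetical threshold to every adjacent pair, and `realign` re-pads whitespace-separated columns (so it assumes cells never contain spaces). None of these are features of an existing formatter.

```python
import re


def column_similarity(above: str, below: str) -> float:
    """Fraction of character positions where two lines hold the same character."""
    width = max(len(above), len(below))
    if width == 0:
        return 1.0
    same = sum(1 for a, b in zip(above, below) if a == b)
    return same / width


def looks_tabular(lines: list[str], threshold: float = 0.5) -> bool:
    """Hypothetical heuristic: every adjacent pair must be at least `threshold` similar."""
    if len(lines) < 2:
        return False
    return all(column_similarity(a, b) >= threshold for a, b in zip(lines, lines[1:]))


def realign(lines: list[str]) -> list[str]:
    """Re-pad whitespace-separated cells so each column starts at the same offset."""
    rows = [re.split(r"\s+", line.strip()) for line in lines]
    n_cols = max(len(row) for row in rows)
    widths = [max(len(row[i]) for row in rows if i < len(row)) for i in range(n_cols)]
    return [
        " ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    ]


print(looks_tabular(["x = 1", "y = 2"]))             # True
print(looks_tabular(["def f():", "    return 1"]))   # False
nudged = ["fast  10   0.5", "slow 200  0.25", "medium 3 1.0"]
print("\n".join(realign(nudged)))
```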
And sometimes, if the code doesn't look good after automatic formatting, the code itself needs to be fixed. I'm specifically thinking about e.g. long or nested ternary statements; as soon as the auto formatter spreads it over multiple lines, you should probably refactor it.
This was more about lamenting the need for such things. Clang-format can already somewhat tabularize code by aligning equals signs in consecutive cases. I was just wishing it had an option to detect and align other kinds of code to make or keep it more table like. (Destroying table-like structuring being the main places I tend to disagree with its formatting.)
I like short lines in general, as having a bunch of short lines (which tend to be the norm in code) and then suddenly a very long line is terrible for readability. But everything has exceptions. It's also very dependent on the programming language.
In a post-modern editor (by which I mean any modern editor that takes this kind of thing into consideration, which I don't think any do yet), it should be possible for the editor to determine similarity between lines and achieve a tabular layout, perhaps also with styling for dissimilar values in cases where the table has a higher degree of similarity than the one above. Perhaps also with collapsing of tables, with some indicator that what is collapsed is not just a sub-tree but a table.
But are there more examples? Maybe it's not a high price to pay. I'm using either the second or third approach for my code and I've never had many issues. Yes, the first example is pretty, but it's not a huge deal for me.
However, it is the formatting I adopt when forced to bow down to line length formatters.
This is why a Big Dictator should just make a standard. Everyone who doesn't like the standard approach just gets used to it.
Thus 80 or perhaps 120 char line lengths!
Especially 80 characters is a ridiculously low limit that encourages people to name their variables and functions some abbreviated shit like mbstowcs instead of something more descriptive.
Because "I" might be older or sight-impaired, and have "my" font at size 32, and it actually fills "my" (wider than yours) screen completely?
Would you advise me to "fix my eyes" too? I'd love to!
"Why should I accommodate others" is a terrible take.
80-column line lengths is a pretty severe ask.
80 is probably too low these days but it's nice for git commit header length at least.
Just use descriptive variable names, and break your lines up logically and consistently. They are not mutually exclusive, and your code will be much easier for you and other people to read and edit and maintain, and git diffs will be much more succinct and precise.
What a terrible attitude to have when working with other people.
"Oh, I'm the only one who writes Python? Fix your setup. why should I, who know python, not write it for your sake?"
"Oh, I'm the only one who speaks German? Fix your setup. Why should I, who know German, not speak it for your sake?"
How about doing it because your colleagues, who you presumably like collaborating with to reach a goal, asks you to?
Working together with others should not mean having to limit everyone to the lowest common denominator, especially when there are better options for helping those with limitations that don't impact everyone else.
>How about doing it because your colleagues, who you presumably like collaborating with to reach a goal, asks you to?
If someone wants me to do a certain thing in a certain way, they simply have to state it in terms of:
- some benefit they want to achieve
- some drawback they want to avoid
- as little as an acknowledged unexamined preference like "hey I personally feel more comfortable with approach X, how bout we try that instead"
I'm happy to learn from their perspective, and gladly go out of my way to accommodate them. Sometimes even against my better judgment, but hell, I still prefer to err on the side of being considerate. Just like you say, I like to work with people in terms of a shared goal, and just like you do, in every scenario I prefer to assume that's what's going on.
If, however, someone insists on certain approaches while never going deeper in their explanations than arbitrary, non-falsifiable qualifiers such as "best practice", "modern", "clean", etc., then I know they haven't actually examined the choices they now insist others should comply with. They're just parroting whatever version of industry-wide consensus they imagine describes their accidental comfort zone. And then boy, do they hate my "make your setup assume less! it's the only way to be sure!". But no, I ain't reifying their meme instead of what I've seen work with my own two eyes.
You're moving the goalposts of this discussion. The guy I was responding to said "fix your setup" to another person saying "Your table wrapped for me. The short line equivalent looks best on my screen." That's a stated preference based on a benefit he'd like to achieve.
We are not discussing "best practice" type arguments here.
Unless formatters have been in use since the start of a project, existing code should never be affected by them in passing; that's unnecessary churn. If a formatter is introduced later in a project (or a formatting rule is changed), it should be applied to all code in one go, and no new code should be accepted if it hasn't passed through the formatter.
I think nobody should have to think about code formatting, and no diff should contain "just" formatting changes unless there's also an updated formatting rule in there. But also, you should be able to escape the automatic formatting if there is a specific use case for it, like the data table mentioned earlier.
But that's the core of this article, too: these days it's the norm to store plain-text source code in git and share it, but the article mentions a code- and formatting-agnostic storage format, where it's up to people's editors (and diff tools, etc.) to render the code. That's not actually unusual, since things like images are also unreadable if you look at their source, but tools like GitHub render them in a human-digestible format.
With some expressions, like lookup tables or bit strings, hand wrapping and careful whitespace use is the difference between “understandable and intuitive” and “completely meaningless”. In the JS world, `// prettier-ignore` above such an expression preserves it, but ideally there’d be a more universal way to express this.
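Python's Black formatter has a comparable escape hatch: it leaves code between `# fmt: off` and `# fmt: on` comments untouched (and honors `# fmt: skip` for a single line). A sketch with a hypothetical seven-segment lookup table, where the column alignment is the whole point:

```python
# fmt: off
# Which of the seven segments (a-g) light up for each digit.
SEVEN_SEGMENT = {
    #   a  b  c  d  e  f  g
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}
# fmt: on


def lit_segments(digit: int) -> str:
    """Names of the segments that are on for `digit`, e.g. 'bc' for 1."""
    return "".join(name for name, on in zip("abcdefg", SEVEN_SEGMENT[digit]) if on)


print(lit_segments(1), lit_segments(7))  # bc abc
```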
Boy that was fast.
could. Yesterday notepad (win 10) just plainly refused.
Some languages (java) really need the extra horizontal space if you can afford it and aren’t too hard to read when softwrapped.
Log statements however I think have an effectively unbounded length. Nothing I hate more than a stupid linter turning a sprinkling of logs into 7 line monsters. cargo fmt is especially bad about this. It’s so bad.
Sent from my 49” G9 Ultrawide.
https://en.wikipedia.org/wiki/Line_length#cite_note-dykip-8
All that said, I'm interested in this 132 number. Where does it come from?
Interesting here, perhaps, is that even back then it was recognized that different display modes were advantageous in different situations.
I'd forgotten that; now that was a fugly font. I don't think anyone ever used it (aside from the "Setup" banner on the settings screen).
I think the low pixel count was rather mitigated by the persistence of the phosphor, though; there are reproductions of the fonts that had to take this into account. See the stuff about font stretching here: https://vt100.net/dec/vt220/glyphs
It really suits each language, IMO. Although I could probably get away with 80, the habit of using Tailwind classes can get messy compared to 120.
What I actually want from a linter is “120, unless the trailing bits aren’t interesting in which case 140+ is fine”. The ideal rule isn’t hard and fast! It’s not pure science. There’s an art to it.
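A hypothetical Python sketch of such a rule, under one assumed definition of "not interesting": a line may exceed the soft limit only when everything past it sits inside a trailing `#` comment, with an optional hard cap. The comment detection is deliberately naive (it treats the first `#` as the start of a comment, even inside a string literal).

```python
from typing import Optional


def line_ok(line: str, soft: int = 120, hard: Optional[int] = None) -> bool:
    """True if `line` respects the soft limit, or overflows it only inside a trailing comment."""
    if len(line) <= soft:
        return True
    if hard is not None and len(line) > hard:
        return False
    # Naive: the first "#" is assumed to start a comment, even inside a string.
    comment_start = line.find("#")
    # Everything from index `soft` onwards must be comment text.
    return comment_start != -1 and comment_start <= soft


print(line_ok("x = 1"))                          # True
print(line_ok("retry(job)  # " + "why " * 40))   # True: only the comment overflows
print(line_ok("call(" + "arg, " * 40 + ")"))     # False: code overflows
```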
16:9 is rarely what you want for anything that is mainly text.
But someone will always have to either scroll horizontally or wrap the text. I’m speaking as someone who often views code on my phone, with a ~40 characters wide screen.
In typography, it’s well accepted that an average of ~66 chars per line increases readability of bulk text, with the theory being that short lines require you to mentally «jump» to the beginning of the next line frequently which interrupts flow, but long lines make it harder to mentally keep track of where you are in each line. There is however a difference between newspapers and books, since shorter ~40-char columns allows rapid skimming by moving your eyes down a column instead of zigzagging through the text.
But I don’t think these numbers translate directly to code, which is usually written with most lines indented (on the left) and most lines shorter than the maximum (few statements are so long). Depending on language, I could easily imagine a line length of 100 leading to an average of ~66 chars per line.
In my experience, with programming you rarely have lines of 140 printable characters. A lot of it is indentation. So it’s probably rarely a problem to find your way back on the next line.
For C/C++ headers, I absolutely despise verbose Doxygen bullshit comments that spread relatively straightforward functions across 10 lines of comments and args.
I want to be able to quickly skim function names and then read arguments only if deemed relevant. I don’t want to read every single word.
I like splitting long text as in log statements into appropriate source lines, just like you would a Markdown paragraph. As in:
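A minimal Python sketch of that style, using adjacent string literals inside parentheses (which Python joins at compile time) and the standard `logging` module; the logger name and message are made up for illustration.

```python
import logging

logger = logging.getLogger("retry-demo")  # hypothetical logger name


def log_retry(attempt: int, max_attempts: int, delay: float) -> str:
    # Each source line holds one phrase, like lines in a Markdown paragraph;
    # note the trailing space inside each literal, since they are joined verbatim.
    template = (
        "Retrying request after a transient failure: "
        "attempt %d of %d, "
        "waiting %.1f seconds before the next try"
    )
    logger.warning(template, attempt, max_attempts, delay)
    return template % (attempt, max_attempts, delay)


print(log_retry(2, 5, 1.5))
```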
I agree that many formatters are bad about this, like introducing an indent for all but the first content line, or putting the concatenation operator at the front instead of the back, thereby also causing non-uniform alignment of the text content.
I once made a stupid mistake of having a list of directories to delete:
Can you spot the error? I somehow forgot the comma in the list. That meant that rather than creating a tuple of directories, I created a single string. So when the `for` loop ran, it iterated over the individual characters of the string. What was the first character? "/" of course. I essentially did an `rm -rf /` because of the implicit concatenation.
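A safe Python reproduction of that bug (nothing is deleted; the paths are made up): the two adjacent literals are concatenated, so the "tuple" is really one string, and iterating over it yields characters starting with `"/"`. One cheap guard, sketched here with a hypothetical `checked_dirs` helper, is to refuse a bare string wherever a collection of paths is expected.

```python
DIRS_TO_DELETE = (
    "/tmp/demo-build"
    "/tmp/demo-cache"  # missing comma on the line above: these two literals are joined
)


def checked_dirs(dirs) -> list[str]:
    """Refuse a bare string where a collection of directory paths is expected."""
    if isinstance(dirs, str):
        raise TypeError(f"expected a collection of paths, got one string: {dirs!r}")
    return list(dirs)


print(repr(DIRS_TO_DELETE))       # '/tmp/demo-build/tmp/demo-cache'
print(list(DIRS_TO_DELETE)[:4])   # ['/', 't', 'm', 'p']
```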
Note that, in my mind, this visualization is not automatically generated, but lovingly created by humans who wish their code to be understood by others. It is not separate from the code, as typical design documentation is, but an integral part of it, stored in metadata. Consider it an extension of variable and function naming.
There is of course "literate programming" [1], but somehow (improvements of) that never took off in larger systems.
[1] https://en.wikipedia.org/wiki/Literate_programming
My guess is it is the same reason why the most common form of creating source code is typing and not other readily available mechanisms:
Graphical visualizations are approachable representations and very useful for introductory, infrequent, and/or summary needs. However, they become cumbersome either when a well-defined repetitive workflow is used or when usage variations are not known a priori.
Examples of both are the emacs and vi editors. The vast majority of supported commands are at most a few keystrokes, and any programming language's source code can be manipulated with them.
I suppose this is because nobody has been able to create good tooling for it (the visualization itself, the efficient editing, etc.). You'll have to deal with the text version of it at some point unless every tool we rely on gets a version for the new visualization.
Another hypothesis is that it might not matter this much that we work with text directly after all.
> Note that, in my mind, this visualization is not automatically generated, but lovingly created by humans who wish their code to be understood by others.
If you allow manual crafting there, I suspect you'll need some sort of linting too.
I really wish we lived in a universe where a Lisp became the lingua franca of the world instead of JavaScript, as almost happened with Netscape, but alas ...
Virtually all programming languages are parsed into ASTs, and these ASTs can be serialized back. This is what formatters/"prettifiers" usually do.
Did I miss something?
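Python's standard library shows the idea in miniature: `ast.parse` followed by `ast.unparse` (Python 3.9+) regenerates source in one canonical layout. It also shows the catch raised elsewhere in this thread: the AST has no place for comments or hand-made alignment, so they are simply lost, which is why real formatters generally work from a concrete syntax tree that keeps them. The helper name `ast_roundtrip` is illustrative.

```python
import ast


def ast_roundtrip(source: str) -> str:
    """Parse source to an AST and serialize it back (requires Python 3.9+)."""
    return ast.unparse(ast.parse(source))


print(ast_roundtrip("x=[1,2,3]"))                  # x = [1, 2, 3]
print(ast_roundtrip("def f(a,b):\n  return a+b"))  # 'def f(a, b):' then '    return a + b'
print(ast_roundtrip("y = 1  # explain why"))       # y = 1   (the comment is gone)
```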
Most people probably do this. These types of discussions (probably) come up when someone else made the choice and other people also need to adhere to this choice. This is important for teams, but sometimes big egos don't want these choices made for them.
Another argument that is a pet peeve of mine is significant white-space vs curly braces. It literally doesn't matter. We often get new Python developers coming from a C# background and the amount of bitching about curly braces is so annoying. Just learn the language bro, it's not that hard.
I've worked with several Development Leads to actually define these. After the initial adjustment period, with everybody's local environment set up properly, no one ever spent time reviewing style and formatting on Pull Requests.
Just decide as a team, auto-apply if possible (less than 5 seconds, even for big changes), enforce it, and be done with it. Stop wasting everybody's time by spending weeks unable to make up your mind about it while not telling your team/Lead about it.
Arthur Witney formats like this:
If your code was formatted automatically like that, do you think you'd get used to it after a week? My point is that there is meaning in how code is formatted, and it affects understanding for certain people.
I think that at a certain point of "reasonable" and for most "normal" people your statements hold true, but I don't want anyone to think that every person caught up on formatting is just doing it for bike-shedding or other trivial reasons.
I don't know what is actionable if what I say is true, but it feels important to say.
See my other comment: https://news.ycombinator.com/item?id=45166670
This, however, usually doesn't affect me if the official format for a project is one way or the other because [drumroll] I just format my tree differently and then format to the official style when I push.
And there's no centralized idea on best practices.
When it comes to formatting, there are other languages (Go, Python?) that have clear, top-down guidelines applied by tooling, at least for code style. I think that's clever, and aside from the odd mailing-list post trying to change it because of a personal preference, it keeps discussion on the really important things rather than on trivialities.
Because 2 vs 4 spaces or line length discussions are ultimately futile; those aren't features, individual preferences don't matter. Codebases have millions of lines and thousands of developers; individual opinions do not matter at scale, consistency does.
Recently, I discovered that the ruff linter for Python doesn't like the assert statement, because it does nothing in "optimized" mode and therefore isn't reliable. But such complaints about unit tests are not particularly useful.
[1] https://docs.astral.sh/ruff/rules/
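The behaviour that rule guards against is easy to demonstrate: under `python -O`, `assert` statements are stripped, so a failing assertion silently passes. The sketch below runs a one-line script in a fresh interpreter with and without the flag. (The ruff rule in question is presumably S101 from the flake8-bandit set; since test suites are rarely run with `-O`, it is commonly disabled for test files via ruff's `per-file-ignores` setting.)

```python
import subprocess
import sys


def exit_code_of_failing_assert(*interpreter_flags: str) -> int:
    """Run `assert False` in a fresh interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", "assert False, 'should fail'"],
        capture_output=True,
    )
    return result.returncode


print(exit_code_of_failing_assert())      # 1: AssertionError raised
print(exit_code_of_failing_assert("-O"))  # 0: the assert was compiled away
```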
>First off, I’d suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it’s a great symbolic gesture.
https://www.kernel.org/doc/html/v4.10/process/coding-style.h...
I think you just answered your own question ;-)
Now you are bikeshedding. Just go with the defaults.
I’ve never understood why people care so much about the linter. Just let people write code and don’t worry about the linter. I don’t need to fight a linter which makes my code worse when I could just write it in a way that doesn’t suck. I promise it’ll be fine. I’m too busy doing actual software engineering to care if code is not perfectly formatted to some arbitrary style specification.
I feel like style linters are horseshoe theory. Use them enough and eventually you wrap back around to just living without them.
99% of the time, in my experience, the linter is not enforcing correctness. It's just enforcing a bunch of subjective aesthetic constraints: which import order, the max number of empty lines between statements, what type of string literal to use, no trailing whitespace, etc. A non-trivial part of my day is spent dealing with this giant catalog of dinner etiquette. Not all of it is auto-fixable. Also, there are plenty of situations where everyone would agree that violating the rule is necessary (e.g. "no use before define" when you need mutual recursion). And sometimes rules are circularly in conflict (e.g. you have to change a line, but there is no way to do it without violating the max-line-length rule).
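The mutual-recursion case is unavoidable by construction: whichever function is defined first must refer to one defined after it, so a rule forbidding any reference to a later definition needs an exception. A minimal Python illustration (Python resolves the name at call time, so the code runs fine; the conflict is purely with such a lint rule, and the functions assume non-negative input):

```python
def is_even(n: int) -> bool:
    # Refers to is_odd, which is only defined further down the file.
    return True if n == 0 else is_odd(n - 1)


def is_odd(n: int) -> bool:
    return False if n == 0 else is_even(n - 1)


print(is_even(10), is_odd(7))  # True True
```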
Linters enforcing rules that need to be broken is a pet peeve of mine, and I agree with you there. Most linters allow for using comments to explicitly exclude certain lines from being linted. This should ~never be necessary. If it is regularly necessary, then either you're programming bad (always a possibility!) or the rule has too many false positives and you should remove it.
To be frank, everyone I've worked with that complained about the linter didn't know much about their tooling. They didn't know about the fix command (even though I put it in the readme and told them about it), they didn't know how to turn on lintfix and prettier on save, wouldn't switch on git hooks and didn't know their lint failed until GitHub said so, and none of the people like this were so productive that it made up for this trait.
I find linters make me faster. Sometimes I’m feeling lazy and I just want to pump out a bunch of lines of ugly code with poorly formatted mappings and bad indents, and just have it all synced up when I save.
Don’t get me wrong: modern linters often annoy me, and devs who spend a lot of time fiddling with those settings tend not to be very good programmers. But sometimes having guardrails is necessary.
There's a python linter named `black` and it converts my code:
into this:
This `black` is non-configurable (because it's "opinionated") and yet, out of some strange cargo cult, people swear by it and try to impose it on everybody.
Why are you caring about formatting? Just write your code, get it working, let Black tidy it up in the standard way. Don't worry about the formatting.
In cases where you're annoyed about some choice the formatter makes, somebody else would be equally annoyed by the choice you would rather make. There is no perfect solution. The whole point is to have a reasonable, plausible default, and to automate it so that nobody has to spend any time thinking about it whatsoever.
Running a standard formatter when code is checked in minimizes the source control churn due to re-formatting. That churn is a pointless waste of time. If you don't run a standard formatter, I guarantee that badly-formatted code will make it into source control, and that's annoying.
There's a quote from Steve Jobs (or maybe his carpenter father):
When you say "Don't worry about the formatting", what you're saying is "use a piece of plywood on the back," and I'm just not going to do that.
I just honestly believe that if you fully automate the formatting, the results are better than if you do it painstakingly by hand; better by virtue of being more consistent. It's using the right tool for the job.
I don’t really care about whether the back is plywood or whatever. I don’t know how to write plywood code. I do care about creating clear, readable code that communicates my intent. Sometimes formatters help with that. Often they hinder, as they reflect the arbitrary aesthetic preferences of their creators.
The trailing comma is an improvement as it makes the diff clearer on future edits.
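To make the diff argument concrete, here is a small sketch using Python's standard `difflib`. The list name `important_numbers` echoes the thread; the helper `changed_lines` is hypothetical. It compares appending one element to an exploded (one-element-per-line) list with and without a trailing comma.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the added/removed lines of a unified diff (no headers, no context)."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Exploded list WITHOUT a trailing comma: appending also touches the old last line.
no_comma_before = "important_numbers = [\n    1,\n    2,\n    3\n]"
no_comma_after = "important_numbers = [\n    1,\n    2,\n    3,\n    4\n]"

# Exploded list WITH a trailing comma: appending is a pure one-line addition.
comma_before = "important_numbers = [\n    1,\n    2,\n    3,\n]"
comma_after = "important_numbers = [\n    1,\n    2,\n    3,\n    4,\n]"

print(changed_lines(no_comma_before, no_comma_after))  # ['-    3', '+    3,', '+    4']
print(changed_lines(comma_before, comma_after))        # ['+    4,']
```

The difference only exists when elements sit on their own lines; for a list written on a single line, both versions produce a one-line diff.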
Edit to add: occurs to me that I oversimplified my position earlier and it probably looks like I'm trying to have it both ways. I do advocate aiming for clean and clear formatting; I'm just against doing this manually. You should instead use automation, and steer it lightly only when you have to.
For example, I explicitly don't want people to manually "tab-align" columns in their code. It looks nice, sure, but it'll inevitably get messed up in future edits. Better to do something simpler and more robust.
In the above example, if I think I have listed all of the `important_numbers`, there is a certain point to not having the trailing comma there.
Here's another terrible example from `black`:
From this:
To this:
The trailing comma it added makes no sense whatsoever because I cannot have any intent of adding more things -- I've already exhausted the parameters in the string!
On top of that, I don't quite get why I need to change the way I write in order to please the machine. Who should be serving whom?
Edit: changed "print" to "my_print" to not have to argue about named parameters of print ("sep", "file" etc.).
Edit 2: here's a variant that `black` has no issues with whatsoever. It does not suggest a trailing comma or any other change:
So the existence of a trailing comma is a product of string length?
Who's to say you don't add a new argument to the function in the future, like
Sorry but it doesn't make any sense to me. If your argument is "a trailing comma is a good thing," it should go into any and all function calls/list declarations/etc. Who's to say I won't add this in the future:
So do I need to have this now? There's a very responsive playground at https://black.vercel.app/ and whatever it does looks strange to me, because the underlying assumptions look inconsistent with one another (to my eye at least). Specifically, "the length of the string should decide whether there is a trailing comma or there isn't" makes zero sense.
No, the argument is quite specifically that a one-line diff to add a new argument/element to the end of a list is preferable to a two-line diff to do the same thing. The presence of the trailing comma is necessary to achieve that only when elements are on their own line.
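A toy version of the rule just described, assuming a fixed line-length limit of 88 (black's default). This is not black's actual algorithm, which tries several splitting strategies before exploding a call; `format_call` is a hypothetical helper that only shows why the comma appears exactly when the arguments end up on their own lines, which in turn depends on how long they are.

```python
def format_call(func: str, args: list[str], max_width: int = 88) -> str:
    """Toy formatter: one line if it fits, otherwise one argument per line."""
    one_line = f"{func}({', '.join(args)})"
    if len(one_line) <= max_width:
        # Fits: appending an argument later is a one-line diff anyway,
        # so no trailing comma is added.
        return one_line
    # Does not fit: explode, and give every argument (including the last)
    # a trailing comma so a future append touches only one line.
    body = "".join(f"    {arg},\n" for arg in args)
    return f"{func}(\n{body})"


print(format_call("my_print", ["'short'", "'args'"]))
print(format_call("my_print", ["'" + "x" * 100 + "'"]))
```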
Edit: sorry for using single quotes, in my 20 years of writing Python it was never an issue, but now with `black` it apparently is.
I write perfectly legible code. More legible than a linter, in fact, because the rules for what is ideal are not simple enough to be encoded in lint rules. Sure, it gets like 95%. But the last 5% is so bad it ruins the positives.
If your goal is “code that is easy to read and understand” then a linter is only maybe the first 20%. Lots of well linted code is thoroughly inscrutable.
I 100% believe you. And for god's sake please use linter.
British and American spelling are both 100% legible English. But when multiple people coauthor a book, they should stick to one instead of letting each author use their favorite spelling.
I'll gladly pay the price of making the one person's code worse if it improves the other nineteen's.
If you write and edit and read and search code every day, code formatting is rather important.
What the pattern is doesn't really matter.
Took me a sec, but well played
1. Assuming at least one person who cares about linter settings isn't utterly confused or moronic, what are their self-described reasons why they care? People's work styles, brains, and even sensory perception differ in some important ways!
2. As freedom-loving developers [1] who want to make our own choices to help our own styles of work, why should we even have to care about "enforcing" one standard for something that isn't really necessary? This one-standard-per-project thing is a downstream result of a design decision upstream (storing source code as plain text).
3. How should we design languages going forward? This brings the conversation back to top-level post (which is why we're here -- to think about what languages could be, not to rehash tired old debates, after all): how can we take what we've learned and build better languages -- perhaps ones where the primary source of truth for source code is not plain text?
[1] Slightly tongue-in-cheek. It is one thing to want to have freedom to do our jobs well; it is another thing to turn this into advocacy for an overarching system such as a political philosophy or various decentralized financial mechanisms and so on. Here, I'm merely referring to the "let me do my job in the way that actually works for my brain" sense.
There's a scissor that cuts through the formatting debate: If initial space width was configurable in their editor of choice, would those who prefer tabs have any other arguments?
All of your examples work better for code with structural knowledge:
- grep: symbol search (I use it about 100x as often as a text grep) or https://github.com/ast-grep/ast-grep
- diff: https://semanticdiff.com (and others), i.e.: hide noisy syntax only changes, attempt to capture moved code. I say attempt, because with projectional programming we could have a more expressive notion of code being moved
- sed: https://npmjs.com/package/@codemod/cli
- version control: I'd look towards languages like Unison to see what funky things we could do here, especially for libraries. A general example: no conflicts due to non-semantic changes (re-orderings, irrelevant whitespaces, etc.)
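As a small illustration of the "symbol search vs. text grep" point in the first item, here is a sketch using Python's standard `ast` module as the shared parser. The sample source and helper names are made up for the example; real tools such as ast-grep are far more general.

```python
import ast

SOURCE = '''\
# parse the config before anything else
def parse(text):
    return text.split("=")

msg = "call parse later"
result = parse("a=b")
'''


def text_grep(source: str, word: str) -> list[int]:
    """Plain grep: every line containing the substring, wherever it appears."""
    return [i for i, line in enumerate(source.splitlines(), start=1) if word in line]


def symbol_search(source: str, name: str) -> dict[str, list[int]]:
    """Structural search: definitions and name references of one identifier."""
    defs, refs = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name == name:
            defs.append(node.lineno)
        elif isinstance(node, ast.Name) and node.id == name:
            refs.append(node.lineno)
    return {"definitions": sorted(defs), "references": sorted(refs)}


print(text_grep(SOURCE, "parse"))      # [1, 2, 5, 6]: includes the comment and the string
print(symbol_search(SOURCE, "parse"))  # {'definitions': [2], 'references': [6]}
```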
It’s a really subtle difference but I can’t quite put my finger on why it is important. I think of all the little text files I’ve made over the decades that record information in various different ways where the only real syntax they share is that they use short lines (80 columns) and use line orientation for semantics (lah-dee-dah way of saying lots of lists!)
I have a lot of experience of being firmly ensconced in software engineering environments where the only resources being authored and edited were source code files.
But I’ve also had a lot of experience of the kind of admin / project / clerical work where you make up files as you go along. Teaching in a high school was a great place to practice that kind of thing.
And there are abilities we lose completely by making text the source of truth, like a reliable version control for "this function moved to a new file".
But if you store ASTs, you _have_ to have support for each language in each of the tools (because each language has its own AST). This basically means a major chicken-and-egg problem: a new language won't be compatible with any of the tools, so adoption will be very low until the editor, diff, sed, etc. are all updated, and those tools won't be updated until the language is popular.
And you still don't get any advantages over text! For example, if you really cared about "this function moved to a new file" functionality, you could have a unique id after each function ("def myfunc{f8fa2bdd}..."), and have your editor insert/hide them. This way the IDE can show a nice definition, while grep/git etc. still work, just with extra noise.
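A minimal sketch of the id idea, assuming the hypothetical `def name{8-hex-digit id}` syntax from the comment (this is not a real Python feature; the files are plain strings here). The editor hides the suffixes for display, while a tool can track a function across files, and even across a rename, by its id.

```python
import re

# Hypothetical syntax from the comment: `def name{8 hex digits}(...)`.
ID_SUFFIX = re.compile(r"\{[0-9a-f]{8}\}")
DEF_WITH_ID = re.compile(r"def (\w+)\{([0-9a-f]{8})\}")


def hide_ids(text: str) -> str:
    """What the editor would display: the same code with the id suffixes removed."""
    return ID_SUFFIX.sub("", text)


def index_ids(files: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each stable id to the (filename, function name) where it currently lives."""
    index = {}
    for filename, text in files.items():
        for name, fid in DEF_WITH_ID.findall(text):
            index[fid] = (filename, name)
    return index


def detect_moves(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Ids present in both snapshots whose file changed: moved (possibly also renamed)."""
    before, after = index_ids(old), index_ids(new)
    return {
        fid: (before[fid], after[fid])
        for fid in before.keys() & after.keys()
        if before[fid][0] != after[fid][0]
    }


old = {"a.py": "def myfunc{f8fa2bdd}(x):\n    return x\n"}
new = {"a.py": "", "b.py": "def my_func{f8fa2bdd}(x):\n    return x\n"}
print(hide_ids(new["b.py"]))
print(detect_moves(old, new))
```

Plain grep and git still see the ids as ordinary text, which is the "extra noise" trade-off mentioned above.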
In fact, I bet that any technology that people claim requires non-readable AST files, can be implemented as text for many extra upsides and no major downsides (with the obvious exception of truly graphical things - naive diffs on auto-generated images, graphs or schematics files are not going to be very useful, no matter what kind of text format is used)
Want each person to see their own formatting style? Reformat to that person's style on load and format back to the project style on save. Modern formatters are so fast people won't even notice.
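A deliberately tiny sketch of "reformat on load, format back on save", assuming the only personal preference is indentation width and that the project stores 4-space indents on disk (`CANONICAL_INDENT` and the helper names are hypothetical). A real setup would run a full formatter in both directions; the point is only that the file on disk stays canonical.

```python
CANONICAL_INDENT = 4  # assumption: the project's on-disk style uses 4-space indents


def reindent(text: str, from_width: int, to_width: int) -> str:
    """Rescale leading-space indentation from one width to another.

    Assumes indentation is a whole multiple of `from_width`; leftover spaces
    are kept as-is, so odd indents are not round-tripped exactly.
    """
    out = []
    for line in text.split("\n"):  # split("\n") keeps a trailing newline intact
        stripped = line.lstrip(" ")
        levels, leftover = divmod(len(line) - len(stripped), from_width)
        out.append(" " * (levels * to_width + leftover) + stripped)
    return "\n".join(out)


def load_for_user(disk_text: str, user_indent: int) -> str:
    """Editor open: canonical on-disk text -> the user's preferred view."""
    return reindent(disk_text, CANONICAL_INDENT, user_indent)


def save_from_user(editor_text: str, user_indent: int) -> str:
    """Editor save: the user's view -> canonical on-disk text."""
    return reindent(editor_text, user_indent, CANONICAL_INDENT)


disk = "def f(x):\n    if x:\n        return 1\n    return 0\n"
view = load_for_user(disk, 2)
print(view)
print(save_from_user(view, 2) == disk)
```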
Want fast semantic search? Maintain the binary cache files, but use text as source-of-truth.
Want better diff output? Same deal, parse and cache.
Want to have no files, but instead have function list and edit each one directly, a la Smalltalk? Maintain files transparently with text code - maybe one file per function, or one file per class, or one per project...
The reason people keep source code as text is that it's really a global maximum. A non-text format gives you a modest speedup, but at the expense of imposing incredible version-compatibility pain.
I'm also not saying we can't have all these good things, but they are not free, and the costs are more spread out and thus less obviously noticeable than the ones projectional code imposes.
If the runtime, then I bet almost no one will notice, especially if the appropriate caching is used.
If the programming-time - sure, but it's not like you can avoid parsers altogether. If the parsers are not in the tools, they must be in IDE. Factor out that parsing logic, and make it a library all the tools can use (or a one-shot LSP server if you are in the language that has hard-to-use bindings).
Note that even with the AST-in-file approach, you _still_ need a library to read and write that AST; it's not like you can have a shared AST schema for multiple languages. So either way, tools like diff will need to have a wide variety of libraries linked in, one for each language they support. And at that point, there is not much difference between an AST reader and a code parser.
Cross-language libraries don't seem to be super common for this. The recovering-sense-from-text tools I named all use different parsers in their respective languages.
Again, reading an AST from a file in a data-exchange format (and yes, technically that's also parsing) is orders of magnitude simpler. And for parsing these schemas there are battle-tested cross-language solutions, e.g. protobuf.
And yet it didn't, it reversed. I think the fact that "plain text for all source files" actually won in the actual ecosystem wasn't just because too many developers had the wrong idea/short-sightedness -- because in fact most influential people wanted and believed in what you say. It's because there are real factors that make the level of investment required for the other paths unsustainable, at least compared to the text source path.
it's definitely related to the "victory" of unix and unix-style OSs. Which is often understood as the victory of a philosophy of doing it cheaper, easier, simpler, faster, "good enough".
It's also got to do with how often languages and platforms change -- both change within a language/platform and languages/platforms rising and falling. Sometimes I wish this was less quick, I'm definitely a guy who wants to develop real expertise with a system by using it over a long time, and think you can work so much more effectively and productively when you have done such. But the actual speed of change of platforms and languages we see depends on reduced cost of tooling.
* [Difftastic](https://difftastic.wilfred.me.uk/) — my go-to diff tool for years
* [Nu shell](https://www.nushell.sh/) — a promising idea, but still lacking in design/implementation maturity
What I’d really like to see is a *viable projectional editor* and a broader shift from text-centric to data-centric tools.
The issue is that nearly everything we use today (editors, IDEs, coreutils) is built around text, and there’s no agreed-upon data interchange format. There have been attempts (Unison, JetBrains MCP, Nu shell), but none have gained real traction.
Rare “miracles” like the C++ --> Rust migration show paradigm shifts can happen. But a text → projectional transition would be even bigger. For that to succeed, someone influential would need to offer a *clear, opt-in migration path* where:
* some people stick with text-based tools,
* others move to semantic model editing,
* and both can interoperate in the same codebase.
What would be needed:
* Robust, data-native alternatives to [coreutils](https://wiki.archlinux.org/title/Core_utilities) operating directly on structured data (avoid serialize ↔ parse boundaries). Learn from Nushell’s mistakes, and aim for future-compatible, stable, battle-tested tools.
* A more declarative-first mindset.
* Strong theoretical foundations for the new paradigm.
* Seamless conversion between text-based and semantic models.
* New tools that work with mainstream languages (not niche reinventions), and enforce correctness at construction time (no invalid programs).
* Integration of the semantic model with existing version control systems.
* Shared standards for semantic models across languages/tools (something on the scale of MCP or LSP — JetBrains’ are better, but LSP won thanks to Microsoft’s push).
* Dual compatibility in existing editors/IDEs (e.g. VSCode supporting both text files and semantic models).
* Integrate knowledge across many different projects to distill the best way forward -> for example learn from Roslyn's semantic vs syntax model, look into tree-sitter, check how difftastic does tree diffing, find tree regex engines, learn from S-expressions and Lisp-like languages, check Unison, adopt the Helix/Vim editing model, see how it can be integrated with LSP and MCP, etc.
This isn’t something you can brute-force — it needs careful planning and design before implementation. The train started on text rails and won’t stop, so the only way forward is to *build an alternative track* and make switching both gradual and worthwhile. Unfortunately it is pretty impossible to do for an entity without enough influence.
https://docs.helix-editor.com/syntax-aware-motions.html
https://www.masteringemacs.org/article/combobulate-structure...
https://zed.dev/blog/syntax-aware-editing
Etc etc.
Without tools in mainstream editors I don't see how it can push us forward instead of staying a niche barely anyone knows about.
The goal of having every developer viewing the code with their own preferences just isn't that important. On every team I've been on, we just use a standard style guide, enforced by formatter, and while not everyone agrees with every rule, it just doesn't matter. You get used to it.
Arguing and obsessing about code formatting is simply useless bikeshedding.
https://astyle.sourceforge.net/astyle.html#_style=whitesmith
And then someone said: oh yeah? Hold my beer https://astyle.sourceforge.net/astyle.html#_style=pico
Unless it's an accessibility issue, and it is an accessibility issue sometimes.
Bah! So, what is more important? Is the average convenience of the herd more important? Average of the convenience, even if there was ever such a thing.
What if you really liked reading books in paper format, but were forced to read them on displays for... reasons?
If we had a formatting tool that operated solely on the AST, checked-in code could be in a canonical form for a given AST. Editors could then parse the AST and display the source with a different formatting of the user's choice, and convert back to canonical form when writing the file to disk.
About grep and diff working on a textual representation of the AST, it would be like grepping JavaScript source code when the actual source code is TypeScript or some other more distant language that compiles to JavaScript (does anybody remember CoffeeScript?). We want to see only the source code we typed in.
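Python's standard library happens to contain a crude version of "canonical form for a given AST": `ast.parse` followed by `ast.unparse` (Python 3.9+). It is only an illustration, not a usable formatter, because the AST discards comments, so round-tripping real code through it would delete them. `to_canonical` is a hypothetical helper name.

```python
import ast


def to_canonical(source: str) -> str:
    """Parse to an AST and print it back: the same AST always yields the same text."""
    return ast.unparse(ast.parse(source))


messy = "x = [1,2,\n     3]"
exploded = "x = [\n    1,\n    2,\n    3,\n]\n"

print(to_canonical(messy))     # x = [1, 2, 3]
print(to_canonical(exploded))  # x = [1, 2, 3]
print(to_canonical("def f( a,b ):\n  return a+b"))
```

An editor following the comment's design would store the canonical text and apply the user's own printer only for display.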
By the way, add git diff to the list of tools that should work on the AST but show us the real source code.
If we can’t progress our ecosystem because we are reliant on one very specific 50+ year old line parser, then that says more about the inflexibility of the industry to move forward than it does about the “new” ideas being presented.
Grep works great.
If languages compile to a common byte code then you just need one tool. You already see examples of this with things like the IR assembly produced by LLVM, various Microsoft languages that compile to CLR, and the different languages that target JVM.
There are also already common ways to create reusable parsing rules like LSP for IDEs and treesitter.
In fact there are already grep-like utilities that are based on treesitter.
So it’s not only very possible to create reusable tools for different languages; these tools already exist and are being used by a great many developers.
> Grep works great
For LF-separated lists it does. But if it worked great for structured content then we wouldn’t be having this conversation to begin with.
So the real choice is either:
- new tool: grep with caching reverse-formatter filter.
- new tool: ast-grep with understanding of AST serialization format for your specific language.
At least in the first case, you still have fall back.
the unix philosophy on the other hand only "thrives" if every other tool is designed around (and contains code to parse) "plain text"
And how did that work out for them?
This seems like one of the many cases where unix won out by being a lowest common denominator. Every platform can handle plain text.
The lowest common denominator rather is binary blobs. :-)
You still work with text, the text just isn't the canonical stored representation. You get diffs to resolve only when structure is changed.
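One way to decide whether a change is "structural" is to compare ASTs instead of text. A minimal sketch with Python's `ast` module (`is_structural_change` is a hypothetical name). `ast.dump` omits line/column attributes by default, so pure layout changes compare equal. The caveat: comments are not in the AST, so this check also treats comment edits as non-structural, and a real tool must not discard them.

```python
import ast


def is_structural_change(old_src: str, new_src: str) -> bool:
    """True if the two sources parse to different ASTs (layout is ignored)."""
    return ast.dump(ast.parse(old_src)) != ast.dump(ast.parse(new_src))


print(is_structural_change("x=[1,2,3]", "x = [\n    1,\n    2,\n    3,\n]\n"))  # False
print(is_structural_change("x = [1, 2, 3]", "x = [1, 2, 4]"))                    # True
print(is_structural_change("x = 1  # a", "x = 1  # b"))  # False: comments are invisible here
```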
You get most of the same benefit with a pre-commit linter hook, though.
What happens when you stage the line `} else return {`? git doesn't allow staging specific AST nodes. It would also mean that you can't stage partial code (code that produces syntax errors).
You would still store text, and still check out text, just transformed text. You could still check in anything you want, including partial code, syntax errors, or any other arbitrary text. Diffs would work the same way they do now.
All the same tools can exist with a text backend, and you get grep/sed support for free too!
This becomes an issue with say CI where maybe I add a gate to check something with grep. But whose format do I assume? My local (that I used to test it locally) or the canonical (which means I need to switch local format to test it)?
You would use the format on disk for the grep. "Your format" only exists displayed in your editor.
Yes, of course, because tab width is *dynamically* flexible, so initial space width isn't enough
But for "dirty-width" indents, eg, after some text that can vary in size (proportional fonts or some special chars even in fixed fonts) you can't align with spaces while a tab width can be auto-adjusted to match the other line
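The "tab width auto-adjusted to match the other line" idea is what elastic tabstops do. A simplified sketch: it treats all given lines as one block and pads the cell before each tab to the widest cell in that column plus some padding. Real elastic tabstops compute column widths per run of adjacent lines that share the column; `elastic_align` and the padding of 2 are assumptions for illustration.

```python
def elastic_align(lines: list[str], padding: int = 2) -> list[str]:
    """Render tab-separated cells so each tab stop lines up across all lines.

    Simplification: the whole input is one block. Expects at least one line.
    """
    rows = [line.split("\t") for line in lines]
    ncols = max(len(row) for row in rows) - 1  # number of tab-terminated cells
    widths = [
        max((len(row[i]) for row in rows if i < len(row) - 1), default=0) + padding
        for i in range(ncols)
    ]
    out = []
    for row in rows:
        # Every cell before a tab is padded to its column width; the last cell is not.
        cells = [cell.ljust(widths[i]) for i, cell in enumerate(row[:-1])]
        out.append("".join(cells) + row[-1])
    return out


for line in elastic_align(["x\t= 1", "longer_name\t= 2"]):
    print(line)
```

Because widths come from the actual cell contents, a varying-width prefix never breaks the alignment, which fixed-width spaces cannot guarantee.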
Perhaps this is rather a design mistake in how UNIX handles things and is so focused on text.
> Everyone had their own pretty-printing settings for viewing [DIANA] however they wanted.
I’m still confused because they specifically call the IR DIANA, and they talk about viewing the IR. It isn’t clear to me if the IR is more like bytecode or something, or more like just the original source code with a little processing done to it. They also have a quote,
> Grady Booch summarizes it well: R1000 was effectively a DIANA machine. We didn't store source code: source code was simply a pretty-printing of the DIANA tree.
So maybe the other visualizations they could do by transforming the IR were so nice that nobody even cared to look at the original Ada that they’d written to generate it?
What I would be curious on is tracing from errors back to the source code. Nearly every language I’ve used prints line number and offset on the line for the error. How that worked in the Diana world would be interesting to learn.
[1]: https://github.com/Wilfred/difftastic
Yes. Because YAML exists. And mixing tabs and spaces is horrible in it. And the rules are very finicky.
Optimal tab usage is emit 2-4 spaces.
xslt was a Diana like pre-parsed representation of dsssl. oh how I miss dsssl (a scheme based sgml transformation language) but no. dsssl was a lisp! with hygienic macros! "ikes" they went and invented XSLT.
the "logic" escapes me to this day.
no. plain text it is. human readable. and grep/sed/diff able.
https://naildrivin5.com/blog/2013/05/17/source-code-typograp...
(That said, it must be possible to make a more sophisticated formatter for the source code too.)
I wouldn't draw any conclusions about autoformatters from clang-format.
What’s the point of such heavy obfuscation of the intent, really? Let’s take the first example.
If we are fine with the "lengthy" register, why not spell out character as a full word? Or, if we want something shorter, sign would actually be semantically more on point in general. What's with the star to designate a pointer? Why not sign-pointer? Or pin for short, if we dare to use a pretty straightforward metaphor, so sign-pin. Ah yes, by the way, using "dot" (.) or "dash, greater than" (->) is such typographical nonsense.
And as a side note *char brings nothing in readability compared to sign-pin-pin. Remember that most people read words or even word sequences as a whole. And let’s compare **char to something like sign-pin-back-5.
What's with strcpy? Do we want to play at code obfuscation to look smart by being able to decode this pile of letters? What’s wrong with string·copy* or even stringcopy (compare photocopy)? Or even simply copy? If we want to avoid a redundant identifier without relying on overloading by argument types, English is rich in synonyms: duplicate, replicate, reproduce, for example.
Various parentheses could just as well be optional to ease code browsing if proper typography is already in place, and English already provides many adverbs/prepositions that could replace or complement them with linguistically more usual counterparts.
Speaking of prepositions, using from and to as identifiers for things that would be far more aptly described with nouns is a really confusing choice. What’s wrong with origin/source and destination/target? It’s also a bit counterproductive to put the identifier, which is the main point of interest, at the very end of its declaration statement.
Using = for assignment is really just an artifact of more relevant symbols like ← or ≔ being unavailable, because most keyboard layouts stem from disastrous design. But using a more adequate symbol is really pushing for unnecessarily obscure notation.
A mandatory semicolon to end a statement is obviously also typographical nonsense.
If a parameter is to be left blank in a for loop, we would obviously be better served by a separate control-flow construct than by any way of highlighting that it's not filled in for that use.
So packing it all:
Given that in that case the parentheses and commas are purely ornamental, the compiler could just ignore them and would have enough information with something like
Or even
Now explain a declaration like "char *argv[]"...
> We’ve also re-set the data type such that there is no space between char and * - the data type of both of these variables is “pointer to char”, so it makes more sense to put the space before the argument name, not in the middle of the data type’s name (update: it should be pointed out that this only makes sense for a single declaration. A construct like char* a, b will create a pointer to char, a, and a regular char, b).
Ah, yes, the delusional C++ formatting style. At least it's nice that the update provides the explanation why it should be avoided.
You also don't think about dollars differently than other units, just because the sign goes before the number.
> Some of us even align other parts of our code, such repeated inline comments
> Now, the arguments block forms a table of three columns. The modifiers make up the first column, the data types are aligned in the second column, and the names are in the third column
These feel like pretty trivial routines that can be encompassed by code formatting.
We can contrive more extreme examples, like the for loop, but super-custom formatting ("typesetting") like that has always made me feel awkward; it feels like it gives license for people to use all manner of arbitrary formatting. The author has some intent, but when you run into an inconsistent codebase with lots of things going on, the variance doesn't feel informative or helpful: it sucks and it's a drain.
What's stored is perhaps more minimal, some kind of reference encoding, maybe prettier-ifies for js. The meat of this article to me is that it shouldn't matter: the IDE should let you view and edit as you like:
> Everyone had their own pretty-printing settings for viewing it however they wanted.
Status quo fallacy alert. Arguments are not forever mired in a current state of affairs. People can learn and can build tools to help them do better.
This could change quickly; e.g. if Claude or GitHub or (Your Team) decide to prioritize how source code looks.
1. The developer has enough experience to understand that formatting matters.
2. The developer has enough discipline to stick with their chosen formatting rules.
3. The developer has the taste necessary to choose good formatting rules.
4. The developer has the judgement necessary to identify when other concerns justify one-off violations of the rules.
These are really important attributes for a developer to have. They affect every aspect of the code, not just formatting. Formatting is just a very quick proxy to measure those by.
Unfortunately, things like autoformatting and linter rules are destroying the signal. Goodhart's law strikes again.
To go through the details: The post explicitly complained about a linter enforcing style rules. It did not object to the presence of mechanically-enforced style rules. In fact, it glorified them implicitly by saying how great it would be if everything was formatted at presentation-time. This glorification is the exact thing I was criticizing.
I think machine-enforced rules are bad because they destroy a communication channel that importantly has point 4 that I listed - when well-formatted code breaks its conventions, there must be a reason for it. That is important information that enforced presentation rules force to be put into another channel.
And it's certainly true that other channels do convey this other information, but I find more value in having it conveyed in the presentation channel than I do in having that channel replaced by mechanistic formatting.
This is the premise underlying the article that I object to. It is present so heavily in the subtext that if you pretend it's not, the post becomes incoherent.
And FWIW, HN rules say not to accuse people of not having read the article. I think that rule is mostly there because someone can read the article and notice something you missed, and it's wiser to not post than it is to assume you absorbed 100% of the context of the post.
- they have probably never worked on a codebase where files are edited by more than 1 person
- they have never done any significant amount of merging between branches
- they have never maintained a large codebase
- they have never had to refactor a large codebase
- they don't use diff/comparison tools to read the history of their codebase
- they have never written any tooling for their codebase
- they are not good team-players and/or only care about their own stuff
Furthermore, instead of nitpicking over small details, it can actually be a good idea to just leave everything on default, forgo whatever your individual style might be and stick to what's been deemed to be good enough as the default - so the code will look more familiar to anyone who picks it up (and has used the tools you use for linting and formatting). Yes, formatting is different from linting; though if you set up one, you might as well do the other.
In my very limited experience, I learned the importance of penmanship in that profession.
In my much larger experience since, I've learned the irrelevance of penmanship to writing code. I don't practice my blueprint handwriting anymore. It would be wholly unfit-for-purpose without a bunch of practice. But I understand its value in that context.
If I understand the thrust of your comment correctly, you're pointing towards removing formatting as a channel being a net positive, despite the loss of all these indicators. I might almost agree with that, except for my point 4. Sometimes it's better, on the whole, to break conventions. Mechanical formatting systems cannot make these judgement calls.
I think the minor friction of explicit formatting is a net positive. I think the communication channel it adds carries more value than the friction it imposes hurts. (And I'm calling it explicit formatting because it doesn't have to be manual - it just has to be done with intention, judgement, and approval.)
I don't think the massive friction imposed by submitting code as ink on paper provides enough value to be worth its costs, by contrast.
> The developer has the taste necessary to choose good formatting rules
Rely on this and you’re in trouble. More time will be lost just arguing about which style is better. Go with the built-in formatter approach of Go and Rust.
It's talking about the Ada programming language and that its code was apparently stored not as plaintext but an intermediate representation (IR) that could then be transformed back into code.
So formatting was handled by tooling by the nature of the setup. Developers would each have their own custom settings for "pretty printing" the code.
The author isn't saying don't use code formatters. They're highlighting an unusual approach that the industry at large isn't aware of. Instead of getting rid of arguments about code style via formatters, you can get rid of them by saving code in an IR instead of plaintext.
Refactorings (when done right) are syntax tree transformations that preserve things like referential integrity, etc. that ensure code does the same thing before and after applying a refactoring.
A rename becomes trivial if you are simply working on the symbol directly. For that to work with file based source trees, you need to parse the whole thing, keep track of where symbols are referred in files, rename the symbol and then update all the places in the source tree. That stuff becomes a lot easier when the code representation isn't a bunch of files but the syntax tree. The symbol just gets a different name. Anything that uses the symbol will still use the same symbol.
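For contrast with a text search-and-replace, here is a sketch of a rename as a syntax tree transformation using Python's standard `ast.NodeTransformer`. `RenameSymbol` is a hypothetical helper and deliberately naive: it matches by name only and ignores scoping, whereas a real refactoring resolves which binding each reference points to. It does show the key property that string contents and unrelated text are left alone.

```python
import ast


class RenameSymbol(ast.NodeTransformer):
    """Rename one identifier in function definitions and name references.

    Simplification: matches by name only and ignores scoping, so a local
    variable that happens to share the name would also be renamed.
    """

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the function body
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


source = 'def total(xs):\n    return sum(xs)\n\nprint("total:", total([1, 2]))\n'
renamed = ast.unparse(RenameSymbol("total", "grand_total").visit(ast.parse(source)))
print(renamed)  # the "total:" string literal is untouched
```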
People like editing files of course and that has resulted in a lot of friction developing richer tools that don't store text but something that preserves more structure. The fact that we're still going on about formatting issues a quarter century later maybe shows that this is something to revisit. For many languages and editors, robust symbol renames are still somewhat science fiction. And that's just the most basic refactoring.
> That stuff becomes a lot easier when the code representation isn't a bunch of files but the syntax tree
You are just mixing abstraction layers here. That syntax tree still needs to be stored in file(s) somehow, and nothing prevents syntax-tree-aware (or smarter) tooling from operating on human-readable files. Deserializing an AST and parsing source code are basically the same thing. The storage format really isn't that significant a factor here.
So what is needed is better tools rather than fiddling with the storage format. Microsoft's Roslyn is an obvious example, but plenty of modern compilers are moving in the direction of exposing APIs for interacting with the codebase.
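To make the "syntax-tree-aware tooling on plain files" point concrete, here's a small sketch using Python's built-in parser: it reads ordinary source text and answers a structural question (which function calls what) that grep can't answer reliably. `call_graph` is a hypothetical helper; it only sees direct calls by plain name and folds nested functions into their parent's entry.

```python
import ast


def call_graph(source: str) -> dict[str, list[str]]:
    """Map each function name to the sorted plain-name calls made inside it."""
    graph: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = sorted(
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            )
    return graph


source = '''
def load(path):
    return open(path).read()

def main():
    data = load("input.txt")
    print(len(data))
'''

print(call_graph(source))
```

The file on disk stays plain text; the structure is recovered on demand by parsing.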
Sure, but there are less flaky ways than spreading a syntax tree across files. VisualAge actually used a database for this back in the day. Smalltalk did something similar by storing code in an image file that contained both bytecode and method definitions. You could export source code if you wanted, but you typically wouldn't do that while developing. That approach never caught on, but it has some advantages.
What you are describing is what Eclipse did with Java. Eclipse was the successor to VisualAge. The Eclipse incremental compiler for Java updated an internal data structure for the IDE. It could do neat things such as partial compilation, which let you run tests even in the presence of some compile errors. It was also really fast: by the time you stopped typing, it had already compiled your code. Running the tests was similarly fast.
Keeping a tree of source files in sync with an AST is just a bit hard. IntelliJ never came close to this and has always had lots of trouble keeping its internal caches coherent. There's even a top-level "Invalidate Caches" option in the File menu (still there, I checked, right next to the Repair IDE option). It was off by 2-3 orders of magnitude: seconds (at best) instead of milliseconds. I still miss Eclipse's speed every day I use IntelliJ.
Some compilers are taking steps toward supporting more advanced IDEs. The Rust compiler is one of those, though I don't know its current state, and the compiler itself is not well known for blazing performance. Beyond what JetBrains provides, there aren't many IDEs that take advantage of this. VS Code support varies between languages, but mostly it's very limited on this front. I'm not sure whether JetBrains leverages many of those features in its Rust IDE (I'm not a Rust developer).
Consider the following (pseudo-)code example:
Should this code be formatted this way? Or should it be formatted to emphasize that three assignments are done? Or should this code be formatted
to make the "depth" of the structure variables more tabular, so that you can immediately see from the tabular shape which "depth" a member variable has? We can go even further, like
which emphasizes that the author considers it very important that the reader can easily grasp the magnitudes of the numbers involved (which is why numbers are right-aligned by default in Excel or LibreOffice Calc). Or combining this with making the depth "tabular":
Each of these formattings emphasizes different aspects of the code that the author wants to emphasize. This information cannot be deduced from an abstract syntax tree alone. Rather, it needs additional information from the programmer about how the structure behind the code is meant to be "interpreted".
Storing the AST instead of the text is a lossy encoding, but would we lose something more valuable than what we gain? If your example is the best thing we'd lose, I'd say it's still a massive net win.
And there are ways to emphasize different parts that would survive the round trip through the AST. E.g. one way to emphasize depth:
Or to emphasize the data:
Or heck, you could allow style overrides if you really wanted to preserve this kind of styling:
Here's an old video of JetBrains MPS rendering a table from code: https://www.youtube.com/watch?v=XolJx4GfMmg&t=63s
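The lossiness is easy to see with Python's `ast` module as a stand-in: hand-aligned assignments and plain ones parse to the same tree, so printing the tree back can't tell which layout the author chose. Python 3.9+ assumed for `ast.unparse`; the variable names are made up.

```python
import ast

# Assignments hand-aligned on "=" to emphasise that they belong together.
aligned = (
    "width  = 10\n"
    "height = 200\n"
    "depth  = 3\n"
)
plain = "width = 10\nheight = 200\ndepth = 3\n"

# Both layouts parse to the same tree...
assert ast.dump(ast.parse(aligned)) == ast.dump(ast.parse(plain))

# ...so printing the tree back cannot recover the alignment.
roundtrip = ast.unparse(ast.parse(aligned))
print(roundtrip)
```

Preserving the alignment would need extra per-node layout hints stored alongside the tree, which is what the style-override suggestion amounts to.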
I’m hoping for an IDE able to render dictionaries as tables -- my wishlist doesn’t stop there.
Currently, we have a glimpse of those features, such as code folding, inlay hints, or docstrings rendered as HTML:
https://x.com/efortis/status/1922427544470438381
The plain text encoding itself is the product of an incremental, path-dependent development from Morse code signals to Unicode, resulting in a "Gigantic Lookup Table" (GLUT, my coinage) approach to symbolic comprehension. The underlying assumption is useful: lots of features can "just work" by knowing that a particular bit pattern is always a particular symbol.
If we push up the abstraction level, we get a different set of symbols that are better suited to the application, but no equivalent GLUT tooling. Instead we usually get parsing of plain text as a transport, CSV parsing for example. It is sloppy; it is also good enough.
Edit: XML is also a key example. It goes out of its way to respect the text-transport approach, and there are dedicated XML editors. But people want to edit it as plain text, and they can't quite get there because funny business with character escaping gets in the way, wrapping the symbols they want to edit in ampersands and semicolons. Thus we have ended up with Markdown, "the CSV of hypertext documents".
https://github.com/airbnb/javascript/issues/1271
https://github.com/airbnb/javascript/issues/1122
I literally spent over an hour adapting an existing project to the airbnb config, even though the code was perfectly correct, clear and maintainable. I ended up disabling those specific rules locally and never used the config in another project. (Looks like the whole project is no longer maintained. Good riddance.)
The airbnb config is, in my view, the perfect example of unnecessarily wasting people's productivity when linting is done badly.
With things like tree-sitter, I sometimes daydream about what an efficient and effective HCI for an AST or IR would look like.
Things like F#'s ordered compilation often make code reviews simpler for me, but that's because a piece of the intermediate form (dependency order) is exposed to me as a first-class item. I find it much simpler to reason about than small changes in code with laxer ordering requirements, where I often find myself jumping up and down and back and forth in a diff, and through all the related interfaces, abstract classes and implementations, to understand what effect the delta has on the program as a whole.
All of this seems doable; I just think that for the most part we don't care very much about our preferences, since they have very little impact on readability. It's definitely doable, though: we could view the code however we wanted and have it stored in a different format. It might not be 100% round-trip stable, but that probably doesn't matter.
There is always the better option where the defaults can be overridden and formatting forced, and where we only format new and changed lines to reduce potential instability; but again, go fmt doesn't really suffer from this, so it's possible to make things pretty reliable. It's simple really: there is a default formatting, the code is stored that way, and our view of choice reformats the code however we want it. When it's stored, it's stored in the default.
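A minimal sketch of "store one default, view it however you like", assuming Python and treating whatever `ast.unparse` emits as the canonical stored layout (a real tool would define its own style and would have to keep comments, which `ast` drops). `to_canonical` is a hypothetical save hook: two differently laid-out versions of the same code end up byte-identical on disk, and normalizing an already-stored file is a no-op.

```python
import ast


def to_canonical(source: str) -> str:
    """Normalise source to the single stored layout (here: ast.unparse output)."""
    return ast.unparse(ast.parse(source))


# Two developers' local layouts of the same code.
mine = "x=[1,2,\n   3]\nif x :  print( x )\n"
yours = "x = [1, 2, 3]\nif x:\n    print(x)\n"

stored_mine = to_canonical(mine)
stored_yours = to_canonical(yours)
print(stored_mine)
```

The idempotence (canonicalizing twice changes nothing) is what keeps this stable in version control, the same property that makes go fmt dependable.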
I guess Lisp still has whitespace? That seems like the only meaningful way it isn't already just what the post is describing.
In theory, a system could be built where this level of code isn't what's actually stored, but is just a reverse pretty-print-with-my-preferences version of the code, as the post mentions. SBCL compiles my function when I enter it, and I can ask SBCL to describe it back to me:
I can also ask SBCL to show me the disassembly; perhaps, again in theory, a system could be built where you get and edit text at that level of abstraction before putting it back in. (SBCL does actually let you modify the compiled code directly if you felt the urge to do such a thing. You just get a pointer to the given origin address and offset and write away.)
But going back to the Lisp source form, it's close enough that you could recover the original and format it a few different ways depending on different preferences. For example, someone might prefer the first expression given to handler-case to be on the same line instead of on a new line as I did. But for such a person, is that preference universal, or does it depend on the specific expressions involved? There are other, not strictly formatting, preferences at play here too, like the use of "cl-bcrypt" vs "bcrypt" as the package name, or one could arrange to have no explicit package name at all. My own preferences on both matters are context-sensitive. The closest thing to a universal preference I have on this general topic is that I really hate enforced formatting tools, even if they bent to my specific desires 100% of the time.
I'd say the closest modern renditions of what the post is talking about are expressed by node editors. Unreal's Blueprints or Blender's shader editor are two examples, ETL tools are another. But people tend to work at the node level (and may have formatting arguments about the node layout) rather than a pretty-printed text representation of the same data. I think in the ETL world it's perhaps more common to go under the hood a little and edit some text representation, which may be an XML file (and XML can be pretty-printed for many different preferences) or a series of SQL statements or something CSV or INI like... whether or not that text is a 'canonical' representation or a projection would depend on the tool.
That's true, but there is a very big difference between S-expressions stored as text and other programming languages stored as text because there is a standard representation of S-expressions as text, and Common Lisp provides functions that implement that standard in both directions (READ and PRINT) as part of its standard library. Furthermore, the standard ensures READ-PRINT equivalency, i.e. if you READ the result of PRINTing an object the result is an equivalent object. So there is a one-to-one mapping (modulo copying) between the text form and the internal representation. And, most importantly, the semantics of the language are defined on the internal representation and not the textual form. So if you wanted to store S-expressions in, say, a relational database rather than a text file, that would be an elementary exercise. This is why many CL implementations provide alternative serializations that can be rendered and parsed more efficiently than the standard one, which is designed to be human-readable.
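The READ/PRINT round trip can be mimicked in a few lines. This is a toy Python sketch, not Common Lisp's reader: the hypothetical `read`/`write` pair only knows symbols, integers and lists, and `write` prints everything in one canonical layout, so however the text was indented, reading what was written gives back an equal structure.

```python
from typing import Union

# A toy S-expression is a symbol, an integer, or a list of S-expressions.
SExpr = Union[str, int, list]


def read(text: str) -> SExpr:
    """Parse one S-expression (toy reader: no strings, quotes or dotted pairs)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    expr, pos = _read(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"trailing tokens: {tokens[pos:]}")
    return expr


def _read(tokens: list[str], pos: int) -> tuple[SExpr, int]:
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        items: list = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = _read(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing )")
        return items, pos + 1  # skip the closing paren
    if tok == ")":
        raise SyntaxError("unexpected )")
    try:
        return int(tok), pos + 1
    except ValueError:
        return tok, pos + 1  # anything else is a symbol


def write(expr: SExpr) -> str:
    """Print an S-expression in one canonical, single-line layout."""
    if isinstance(expr, list):
        return "(" + " ".join(write(e) for e in expr) + ")"
    return str(expr)


source = "(defun  second (x)\n   (car (cdr x)))"
tree = read(source)
print(write(tree))
assert read(write(tree)) == tree  # READ-PRINT round trip
```

Because the structure, not the text, is the thing of record, storing it in a database instead of a file would only mean swapping out `write` and `read`.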
This is in very stark contrast to nearly every other programming language, where the semantics are defined directly on the textual form. The language standard typically doesn't even require that an AST exist, let alone define a canonical form for it. Parsers for other languages are typically embedded deep inside compilers, and not provided as part of the standard library. Every one is bespoke, and they are often byzantine. There are no standard operations for manipulating an AST. If you want to write code that generates code, the output must be text, and the only way to run that code is to parse and compile it using the bespoke parser that is an opaque part of the language compiler. (Python is a notable exception.)
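On the Python exception: the standard library's `ast` module lets a program build code as a tree and compile it without ever producing source text. The sketch below constructs `lambda x: c0 * x ** 0 + c1 * x ** 1 + ...` node by node; `polynomial_lambda` is a made-up name, and it assumes a non-empty coefficient list and Python 3.9+.

```python
import ast


def polynomial_lambda(coeffs: list[int]) -> ast.Expression:
    """Build, as an AST rather than text, the expression
    lambda x: c0 * x ** 0 + c1 * x ** 1 + ..."""

    def x() -> ast.Name:
        return ast.Name(id="x", ctx=ast.Load())

    terms = [
        ast.BinOp(
            left=ast.Constant(value=c),
            op=ast.Mult(),
            right=ast.BinOp(left=x(), op=ast.Pow(), right=ast.Constant(value=power)),
        )
        for power, c in enumerate(coeffs)
    ]
    body = terms[0]
    for term in terms[1:]:
        body = ast.BinOp(left=body, op=ast.Add(), right=term)

    args = ast.arguments(
        posonlyargs=[], args=[ast.arg(arg="x")], vararg=None,
        kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[],
    )
    tree = ast.Expression(body=ast.Lambda(args=args, body=body))
    return ast.fix_missing_locations(tree)  # fill in line/column info


tree = polynomial_lambda([1, 2, 3])
poly = eval(compile(tree, filename="<generated>", mode="eval"))
print(ast.unparse(tree))  # text is produced only for humans, after the fact
print(poly(2))
```

For coefficients `[1, 2, 3]` and `x = 2` this evaluates 1 + 4 + 12, i.e. 17.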
By that I mean highlighting the diff between these:
With the diff highlighting that `car` changed to `cdr`, rather than just showing the raw lines as changed. I'm pretty sure this exists, but it's uncommon (at least to me it's uncommon).
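A naive version of that kind of structural diff, sketched in Python over nested lists standing in for S-expressions (the expression is hypothetical): it reports the path to the changed atom instead of the changed line. `tree_diff` matches children by position only, which is precisely why real structural diff, with insertions and moves, is hard.

```python
from typing import Any

Path = tuple[int, ...]


def tree_diff(old: Any, new: Any, path: Path = ()) -> list[tuple[Path, Any, Any]]:
    """Report (path, old, new) for every leaf or subtree that differs.

    Naive on purpose: children are matched by position only, so an inserted
    or moved element makes the whole enclosing list show up as replaced.
    """
    if isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        changes: list[tuple[Path, Any, Any]] = []
        for i, (a, b) in enumerate(zip(old, new)):
            changes.extend(tree_diff(a, b, path + (i,)))
        return changes
    return [] if old == new else [(path, old, new)]


before = ["defun", "second", ["x"], ["car", ["cdr", "x"]]]
after = ["defun", "second", ["x"], ["cdr", ["cdr", "x"]]]

for path, old, new in tree_diff(before, after):
    print(f"at {path}: {old!r} -> {new!r}")
```

Here the path `(3, 0)` means "fourth element of the top-level form, then its first element", i.e. the `car` symbol.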
Also, structural diff is actually a very hard problem.
I had never heard of DIANA but I love old ideas being new again. (Plus you made me laugh)
e.g. if the formatter is really shifting stuff around, your code might be too nested - if you have a compiler, let it take the strain.
Black is great, but maybe it's just me since it aligns with how I like the code formatted.
Would there be any downsides to Python (or Git?) defining a standard way of formatting a saved file, with all the formatting needed for reading a file happening in the IDE that shows it?
That would very much fit the Python ethos: 'There should be one-- and preferably only one --obvious way to do it.'
I can't see a crazy huge downside from a python point of view, but seems like a much bigger upside than flexible formatting would be needed to justify breaking from all of that stuff.
Actually, this could be a really easy feature for the IDE and could already work easily.
You have to get everyone set up to use it, whereas everyone is already, of necessity, set up to use plain text.
And we aren't all using the same programming language and the same hardware setup.
Thus, specifically:
* everyone has to agree on an IR standard; if it can't accommodate every programming language, then there needs to be coverage for all the programming languages, and a way for software systems to know which one to use
* everyone has to have local software that can convert back and forth (they can't just rely on something built into the "development system", which I assume was burned into a ROM)
* everyone's version control setup has to invoke that software as a commit hook
* the IR has to be designed in a way that allows for meaningful diffs, and the version-control software needs to be aware of how to diff and patch (which potentially also means a new standard for diff files)
They don't have to agree on anything.
The poster child for this is Smalltalk, an untraditional environment despite having been around for some 50 years. The source code is stored locally in an internal file tied directly to the runtime image. You can export/import code through "traditional" avenues, but not just anything: it needs to be structured source code. You can't readily move raw text in and out.
Despite this impedance mismatch with "the rest of the world", ST folks have been developing code and collaborating for decades. They even manage to get things accomplished.
Also, consider the many modern logging platforms that log to databases rather than just raw files. While some grouse about that, others manage to make do, letting the structured log managers handle the lower-level details and provide a better UX.
The game is to make sure that your UX for your system is capable of the task, not worrying about interoperating with everyone else.
There are most likely good reasons why Ada and DIANA are not in widespread use.
It's such a cool idea, though I haven't spent much time using it in anger, so it's hard to say whether it's a useful idea.
I'm just waiting for a breakthrough project to show that it's ready for wider adoption. Leaving text-based tooling is a big ask.
The principles behind Unison, for those who haven't read them yet: https://www.unison-lang.org/docs/the-big-idea/#richer-codeba...
> Each Unison definition is identified by a hash of its syntax tree.
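A loose Python analogue of identifying code by a hash of its syntax tree: hash `ast.dump` of the parsed source rather than the raw text, so whitespace, redundant parentheses and comments don't change the identity. This is not Unison's actual scheme; the hypothetical `definition_hash` keeps every identifier in the hashed tree, so even a rename produces a new hash.

```python
import ast
import hashlib


def definition_hash(source: str) -> str:
    """Hash source by its syntax tree, so layout and comments don't matter.

    Loose analogue only: every name stays in the hashed tree, so renaming
    anything changes the hash.
    """
    tree_text = ast.dump(ast.parse(source))  # excludes line/column info by default
    return hashlib.sha256(tree_text.encode("utf-8")).hexdigest()[:16]


v1 = "def area(w, h):\n    return w * h\n"
v2 = "def area(w,h):  # compact\n  return (w*h)\n"
v3 = "def area(w, h):\n    return w + h\n"

print(definition_hash(v1), definition_hash(v2), definition_hash(v3))
```

`v1` and `v2` differ only in layout and a comment, so they share a hash; `v3` changes the logic and gets a different one.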
The project is dead enough that they no longer own the company's domain. As far as I know, the only remnants of the project are YouTube recordings of demos given at conferences.
If you want everyone to see their own preference of format, either write a script or get AI to format it for you.
A. not everyone on your team is using prettier
B. not everyone is using the same config/agrees on what it should be
I heard this many years ago, when we used Perforce. The Perforce consultant we dealt with told us this as an example of triggers. Back then, I was told that Google was a big Perforce shop (maybe just a part of Google, I dunno).
I have heard that this was one of the goals of developing IDLs. I think the vision was that you could have a dozen different programmers working in multiple languages (for example, C for the drivers, Haskell for the engine, and Lua for the UI). Their code would be converted to a common IDL when submitted to configuration management, and then extracted from that when the user looks at it.
I can't see that working, but a lot of stuff that I used to think was crazy has happened, so, who knows?
I was on an internal tools team doing distinctly unsexy LAMP-stack work, but all the documentation I ever saw talked about Perforce/p4.
With modern tools it is easy to add formatting on save or on commit, so I don't understand what the fuss is about.
At the same time, for the most important tool in software engineering, Git, it matters which lines are changed. It is better to see only actual logic changes, not have them swamped by tabs-vs-spaces or other purely formatting changes.
That said, I would love to see more of this split between the actual internal representation and the view. Don't like something in the style guide (or even syntax, like curly brackets vs. indentation)? Just change the view, as with folding.
Yes, we should expect better from our tools and languages.
I can write code that (IMHO) is substantially better than any formatter. But I've realized that there is no way to make other people on a team have the same opinions and skill as me, so I accept automatic code formatters.
But I'll also mention that this pretty much already exists. You can have whitespace options for git. I also imagine there's some setup using hooks that uses one formatter locally, and another for remote.
Also, the common IR already exists: it's just the AST. It was "solved" back in the day, when people were throwing whatever they could at the wall to see what would stick, since it was all so new. With the benefit of hindsight, I think we can say that it's not that good of an idea.
https://git-scm.com/book/pt-br/v2/Customizing-Git-Git-Attrib...
What about comments? Were they part of the IR?
(I agree with others that version control, grep etc. are also very important, and kind of a deal breaker).
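On the comments question, at least for Python's standard `ast` module (this says nothing about how DIANA handled them): there is no node type for comments, so a parse/unparse round trip silently drops them, while the tokenizer still sees them. An AST-as-storage scheme has to carry comments separately or attach them to nodes itself. The example line is made up.

```python
import ast
import io
import tokenize

src = "rate = 0.2  # VAT; revisit after the budget\n"

# The standard-library AST has no node type for comments, so they vanish.
print(ast.unparse(ast.parse(src)))

# The tokenizer still sees them; a tool would have to carry them separately.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(src).readline)
    if tok.type == tokenize.COMMENT
]
print(comments)
```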
Raw text is amazing at smaller scales. The ability to apply a bunch of intermediate incorrect transformations to reach a valid destination is invaluable (like doing a bunch of hacky find/replace).
Projectional editors like JetBrains MPS have tons of disadvantages vs text, and the few advantages don't make up for it.
Formatting is a silly problem to have, but, far beyond that, why are we manipulating text files directly rather than editing a live program (à la Smalltalk)? Text can just be the on-disk serialization format you never look at.
(Raw text is still how you edit individual functions and methods in Smalltalk, there just isn't any actual text file on disk)
Code is flat out complicated, with lots and lots and lots of steps, each with perhaps even more detail.
And it's hard to do that efficiently with visual editors. Imagine a display where, instead of thousands of lines of code, you have thousands of symbols.
Or the visual editors break things down into components that are so small they do not convey the "big picture" well.
It's a personal complaint with the way Smalltalk works. Lots of methods, small (ideally) snippets of code, all viewed in isolation.
It's common (at least for me) to put related code together in the source file. It's useful to scan the whole file to get a feel for the flow of the code, and the system. Looking at isolated code, out of context, has always been a struggle for me. There's a reason my code is not sorted alphabetically by function name.
Maybe if you organized code visually: perhaps the upper left is the startup code, and the lower right is some core math, all collected together like beads in a pot. "All the red ones go here, all the 1" ones go there."
Granted I have not worked on such a tool or such a project. But the linear presentation of code as structured text has worked well for me, even when I bounce around between modules in the IDE.
Though it too breaks down, because the relations between various bits of code may be so complex that there's no good way to "linearize" them.
And you should be documenting your code, but documentation comments take up space on the screen since they are linearly arranged in the same file, so you see fewer functions at a time.
Imagine as an alternative that the documentation was presented on a side view of the functions, like how you can open two files side by side in VSCode. Then you'd be able to see many more functions at the same time.
If you have any unit tests, then it would be great if you could see them (and run them) while editing the function. In Rust you can put tests in the same file as the function (very nice), but usually in a submodule at the bottom of the file rather than near the function itself. Again, the problem of trying to linearize everything in a single file.
The issue with visual programming tools is that they don't put any thought into this, into how they could actually help get you the information you want to see. Instead they focus on letting you make cute drawings.
It's a UX problem, we should be able to do better than text files, even if what we end up editing is still text (because of all the advantages it has).
https://btmc.substack.com/p/thoughts-on-visual-programming
https://www.unison-lang.org/
There's nothing special about whitespace (unless you write Python).
Capitalization and a bunch of other stuff in your coding convention document are usually just signs that you have poor tooling and lack of skill.
Give me a PR that satisfies the requirements and the appropriate test cases and i'll happily rewrite it to spaces only indented with curly braces on newlines and etc... as I see fit.
The hard part is the first two tasks, you can train an intern to do the third
However 'if (x) == (1) {}' is totally fine with the formatter. As is an assignment of '(x) = (y)'.
It's actively annoying too because, like, extra parentheses often have important meaning.
For example, consider the following code:
In that case, the code is obviously temporarily commented out, but with Go's formatting, if you comment it out like that, run fmt, and then uncomment it and forget to re-add the parens, you get shot in the foot.
I've hit that far more times than it's uhh... I dunno, I guess removed parentheses I didn't want? I don't write them if I don't want them.
"It must have been good because Grady Booch says so".
Code should be generally written so it's easy to read.
<picks pop-corns>
I've never found a single formatter that formats my way though...
On those machines you were able to abbreviate keywords.
At the same time, they support full screen editing. That meant you could just cursor up over some code, make changes, hit enter, and the changes would take place.
However, when using the abbreviations, it was possible to create lines that were too long. I don't recall the specifics, but there was a line limit for BASIC input. Let's say it was 80 chars (for discussion).
Using abbreviations (like ? for PRINT), you could end up with a line that would LIST as more than 80 characters, but if you tried to change it with the screen editor, the line would be too long and would be silently truncated.
So you had to be cautious with your use of the abbreviations.
The speccy was more advanced in terms of this (as mentioned in the parent comment), and it had the better BASIC for sure.
Imagine Java if you could…
But formatting still doesn't matter. Outside of whitespace-dependent languages, formatting is a subjective thing -- it's a people concern, not a computer concern. I can store my JavaScript as AST if I want to.
Leave code formatting up to the primary owner of the file. It is pretty rare for a file not to have one person who does 95% of its edits, so let them own the formatting. In the rare case of shared files with shared edits, it is OK to mandate some sort of enforced format, but those are so rare that it generally isn't worth discussing. The proposed approach here ignores all the messy non-standard stuff that happens because of the margins, or the rules that are very hard to build in when codifying a personal coding style.
Let me have my messy desk and I'll let you have yours.
Would a few decades help in universally having such a translator in all the tools?
i wonder how many default formatting decisions are made this way (including go fmt, etc)
Something between "everything fits on one short line" and "every argument gets its own line" would be nice too. Spreading a function definition or call across ten lines when it would fit on two or three doesn't feel like an automatic win.
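The in-between layout is usually done by greedy line filling, which clang-format calls bin-packing (its `BinPackArguments` option). A minimal Python sketch, where `pack_call` is a hypothetical helper and not any formatter's actual algorithm: keep the call on one line if it fits, otherwise fill each argument line up to the width before wrapping.

```python
def pack_call(name: str, args: list[str], width: int = 40, indent: str = "    ") -> str:
    """Lay out a call on one line if it fits; otherwise fill each argument
    line greedily up to `width` instead of giving every argument its own line.

    Hypothetical helper: an argument longer than `width` simply overflows.
    """
    one_line = f"{name}({', '.join(args)})"
    if len(one_line) <= width:
        return one_line

    lines: list[str] = []
    current = ""
    for arg in args:
        piece = arg + ","  # trailing comma on every wrapped argument
        if not current:
            current = indent + piece
        elif len(current) + 1 + len(piece) <= width:
            current += " " + piece
        else:
            lines.append(current)
            current = indent + piece
    lines.append(current)
    return "\n".join([f"{name}("] + lines + [")"])


print(pack_call("configure", ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"], width=30))
```

With `width=30`, the six arguments land on two lines instead of six.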
I'm mainly just being pedantic to be honest; I realise my comment is essentially me saying "what could possibly go wrong?"
The bigger problem is you now need custom tooling for your IDE, version control, diff & merge, code review, code hosting, etc. etc.
re: intermediate representation and projectional editing: yes, editors are now getting better at helping you refactor code (rename function in language XYZ is possible in language servers for IDEs, /no AI required, it works better when a human coded AST tool does it/)
projectional editors aren't around /because the more complex parts of it are harder/ - BUT - I could definitely see more intelligent refactor tooltips written by humans.
For example: in Rust, if I've been passing a pointer vs borrowing (or whatever), pattern A for most of my code, then pattern B and it complains, it would be useful to have a tooltip that goes "do you want to refactor all the other references/parameters to pattern B" instead of Rust's default "this function isn't using pattern A" borrow checker error.
btw: have a look at how much disdain was reserved for systemd and its plethora of binary blobs and custom tools (e.g. the journal stuff)... and that was basically forced on users by the distributions.
It doesn’t get much less formatted than Minified JavaScript, except maybe Perl or Brainfuck.
Having something that sometimes checks the types and sometimes doesn't is not a feasible solution.