Stop writing CLI validation. Parse it right the first time

81 dahlia 33 9/6/2025, 6:20:26 PM hackers.pub

Comments (33)

12_throw_away · 26m ago
I like this advice, and yeah, I always try to make illegal states unrepresentable, possibly even to a fault.

The problem I run into here is - how do you create good error messages when you do this? If the user has passed you input with multiple problems, how do you build a list of everything that's wrong with it if the parser crashes out halfway through?

ambicapter · 11m ago
Most validation libraries worth their salt give you options to deal with this sort of thing? They'll hand you an aggregate error with an 'errors' array, or they'll let you write an error message "prettify-er" to make a particular validation error easier to read.
akoboldfrying · 9m ago
Agree. It should definitely be possible to get error messages on par with what TypeScript gives you when you try to assign an object literal to an incompatibly typed variable; whether that's currently the case, and how difficult it would be to get there if not, I don't know.
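
A minimal sketch of the "aggregate error" idea discussed above, assuming a hand-rolled `Result` type rather than any real library. Every name here (`Result`, `ServerConfig`, `parseHost`, `parsePort`, `parseConfig`) is hypothetical. The point is only that parsing into a typed value and reporting every problem at once are compatible: each field parser runs independently, and the combiner merges all failures instead of stopping at the first.

```typescript
// Hypothetical names throughout; this is not any particular library's API.
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; errors: string[] };

interface ServerConfig {
  host: string;
  port: number;
}

// Parse one field into either a typed value or a list of messages.
function parseHost(raw: string | undefined): Result<string> {
  if (raw === undefined || raw === "") {
    return { ok: false, errors: ["--host is required"] };
  }
  return { ok: true, value: raw };
}

function parsePort(raw: string | undefined): Result<number> {
  if (raw === undefined) {
    return { ok: false, errors: ["--port is required"] };
  }
  const n = Number(raw);
  if (!Number.isInteger(n)) {
    return { ok: false, errors: [`--port must be an integer, got "${raw}"`] };
  }
  if (n < 1 || n > 65535) {
    return { ok: false, errors: [`--port out of range: ${n}`] };
  }
  return { ok: true, value: n };
}

// Run every field parser first, then combine. All errors are kept,
// not just the first one encountered.
function parseConfig(
  args: Record<string, string | undefined>,
): Result<ServerConfig> {
  const host = parseHost(args["host"]);
  const port = parsePort(args["port"]);
  if (host.ok && port.ok) {
    return { ok: true, value: { host: host.value, port: port.value } };
  }
  const errors = [
    ...(host.ok ? [] : host.errors),
    ...(port.ok ? [] : port.errors),
  ];
  return { ok: false, errors };
}
```

With this sketch, `parseConfig({ port: "3.14" })` fails with two messages, one for the missing host and one for the non-integer port, while a successful call yields a `ServerConfig` that needs no further checks.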
nine_k · 3h ago
This is a recurring idea: "Parse, don't validate". Previously:

https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va... (2019, using Haskell)

https://www.lelanthran.com/chap13/content.html (April 2025, using C)

jetrink · 2h ago
The author credits Alexis King at the beginning and links to that post.
SoftTalker · 1h ago
I like just writing functions for each valid combination of flags and parameters. Anything that isn’t handled is default rejected. Languages like Erlang with pattern matching and guards make this a breeze.
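
A rough TypeScript approximation of the "one clause per valid combination, reject everything else" style described above. TypeScript has no Erlang-style pattern matching with guards, so this sketch uses destructuring plus plain conditions. The `Command` type, the subcommands, and the flag names are all made up for illustration.

```typescript
// Hypothetical command shapes; only these three forms are accepted.
type Command =
  | { kind: "help" }
  | { kind: "serve"; port: number }
  | { kind: "build"; outDir: string };

function parseArgs(argv: readonly string[]): Command {
  const [cmd, flag, value, ...extra] = argv;

  // Each branch handles exactly one accepted shape of argv.
  if (extra.length === 0) {
    if (cmd === "--help" && flag === undefined) {
      return { kind: "help" };
    }
    if (cmd === "serve" && flag === "--port" &&
        value !== undefined && /^\d+$/.test(value)) {
      return { kind: "serve", port: Number(value) };
    }
    if (cmd === "build" && flag === "--out" && value !== undefined) {
      return { kind: "build", outDir: value };
    }
  }

  // Anything not matched above is rejected by default.
  throw new Error(`unrecognized arguments: ${argv.join(" ")}`);
}
```

The trade-off is the one raised elsewhere in the thread: the default-reject branch is safe, but it produces a single generic message rather than saying which part of the input was wrong.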
andrewguy9 · 7m ago
Docopt!

http://docopt.org/

Make the usage string be the specification!

A criminally underused library.

jmull · 3h ago
> Think about it. When you get JSON from an API, you don't just parse it as any and then write a bunch of if-statements. You use something like Zod to parse it directly into the shape you want. Invalid data? The parser rejects it. Done.

Isn’t writing code and using zod the same thing? The difference being who wrote the code.

Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.

akoboldfrying · 28m ago
Yes, both are writing code. But nearly all the time, the constraints you want to express can be expressed with zod, and in that case using zod means you write less code, and the code you do write is more correct.

> Of course, you hope zod is robust, tested, supported, extensible, and has docs so you can understand how to express your domain in terms it can help you with. And you hope you don’t have to spend too much time migrating as zod’s api changes.

Yes, judgement is required to make depending on zod (or any library) worthwhile. This is not different in principle from trusting those same things hold for TypeScript, or Node, or V8, or the C++ compiler V8 was compiled with, or the x86_64 chip it's running on, or the laws of physics.

SloopJon · 1h ago
I don't see anything in the post or the linked tutorial that gives a flavor of the user experience when you supply an invalid option. I tried running the example, but I've forgotten too much about Node and TypeScript to make it work. (It can't resolve the @optique references.) What happens when you pass --foo, --target bar, or --port 3.14?
macintux · 36m ago
I had a similar question: to me, the output format “or” statement looks like it might deterministically pick one winner instead of alerting the user that they erred. A good parser is terrific, but it needs to give useful feedback.
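
One way the "or" concern above could be addressed, sketched without reference to how Optique actually behaves (that is not confirmed anywhere in this thread). The hypothetical `exactlyOne` helper treats two alternatives appearing together as an error instead of silently picking a winner. The `--json` and `--yaml` flags are invented for the example.

```typescript
// Hypothetical sketch; names and behaviour are assumptions, not Optique's.
type Format = "json" | "yaml";
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Succeeds only when exactly one of the listed flags appears in argv.
function exactlyOne<T>(
  argv: readonly string[],
  alternatives: ReadonlyArray<readonly [string, T]>,
): Result<T> {
  const matched = alternatives.filter(([flag]) => argv.indexOf(flag) !== -1);
  if (matched.length > 1) {
    const names = matched.map(([flag]) => flag).join(", ");
    return { ok: false, error: `options are mutually exclusive: ${names}` };
  }
  const only = matched[0];
  if (only === undefined) {
    const names = alternatives.map(([flag]) => flag).join(", ");
    return { ok: false, error: `expected one of: ${names}` };
  }
  return { ok: true, value: only[1] };
}

function parseFormat(argv: readonly string[]): Result<Format> {
  return exactlyOne<Format>(argv, [
    ["--json", "json"],
    ["--yaml", "yaml"],
  ]);
}
```

Here `parseFormat(["--json", "--yaml"])` returns a failure naming both flags, so the user is told they erred rather than getting whichever alternative happened to be tried first.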
bsoles · 1h ago
>> // This is a parser

>> const port = option("--port", integer());

I don't understand. Why is this a parser? Isn't it just a way of enforcing a type in a language that doesn't have types?

I was expecting something like a state machine that takes the command line text and parses it to validate the syntax and values.
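
To illustrate why a line like the quoted one can reasonably be called a parser, here is a hand-rolled sketch. It borrows the names `option` and `integer` from the quote, but the implementation is an assumption written for this example and is not Optique's code. In this reading, a parser is just a function from raw command-line tokens to either a typed value or an error, and `option("--port", integer())` builds such a function out of smaller ones.

```typescript
// Hand-rolled illustration; NOT the real Optique implementation.
type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Turns one raw string into a typed value.
type ValueParser<T> = (raw: string) => ParseResult<T>;
// Turns the whole argument list into a typed value.
type ArgParser<T> = (argv: readonly string[]) => ParseResult<T>;

function integer(): ValueParser<number> {
  return (raw) =>
    /^-?\d+$/.test(raw)
      ? { ok: true, value: Number(raw) }
      : { ok: false, error: `expected an integer, got "${raw}"` };
}

// Combinator: finds the flag, then hands its value to the value parser.
function option<T>(name: string, parseValue: ValueParser<T>): ArgParser<T> {
  return (argv) => {
    const i = argv.indexOf(name);
    if (i === -1) return { ok: false, error: `missing ${name}` };
    const raw = argv[i + 1];
    if (raw === undefined) return { ok: false, error: `${name} needs a value` };
    return parseValue(raw);
  };
}

// This is a parser: it consumes text and yields a number or an error.
const port = option("--port", integer());
```

In this sketch, `port(["--port", "8080"])` yields the number `8080`, and `port(["--port", "3.14"])` yields the error `expected an integer, got "3.14"`. The syntax checking asked about above still happens at runtime; it is simply packaged so that the success case carries a typed value.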

lihaoyi · 1h ago
That's basically what my MainArgs Scala library does: take either a method definition or class structure and use its structure to parse your command line arguments. You get the final fields you want immediately without needing to imperatively walk the args array (and probably getting it wrong!)

https://github.com/com-lihaoyi/mainargs

dvdkon · 4h ago
I, for one, do think the world needs more CLI argument parsers :)

This project looks neat, I've never thought to use parser combinators for something other than left-to-right string/token stream parsing.

And I like how it uses Typescript's metaprogramming to generate types from the parser code. I think that would be much harder (or impossible) in other languages, making the idiomatic design of a similar library very different.

ThinkBeat · 2h ago
And that is why there are plenty of parser generators so you don't have to write the parser yourself every time.
yakshaving_jgt · 4h ago
I've noticed that many programmers believe that parsing is some niche thing that the average programmer likely won't need to contend with, and that it's only applicable in a few specific low-level cases, in which you'll need to reach for a parser combinator library, etc.

But this is wrong. Programmers should be writing parsers all the time!

WJW · 3h ago
Last week my primary task was writing a github action that needed to log in to Heroku and push the current code on main and development branches to the production and staging environments respectively. The week before that, I wrote some code to make sure the type of the object was included in the filters passed to an API call.

Don't get me wrong, I actually love writing parsers. It's just not required all that often in my day-to-day work. 99% of the time when I need to write a parser myself it's for an Advent of Code problem; usually I just import whatever JSON or YAML parser is provided for the platform and go from there.

yakshaving_jgt · 3h ago
Do you not write validation? Or handle user input? Or handle server responses? Surely there’s some data processing somewhere.
eska · 3h ago
I think most security issues are just due to people not parsing input at all/properly. Then security consultants give each one a new name as if it was something new. :-)
dkubb · 3h ago
The three most common things I think about when coding are DAGs, State Machines and parsing. The latter two come up all the time in regexps which I probably write at least once a day, and I’m always thinking about state transitions and dependencies.
nine_k · 3h ago
I'd say that engineers should use the highest-level tools that are adequate for the task.

Sometimes it's going down to machine code, or rolling your own hash table, or writing your own recursive-descent parser from first principles. But most of the time you don't have to reach that low, and things like parsing are but a minor detail in the grand scheme. The engineer should not spend time on building them, but should be able to competently choose a ready-made part.

I mean, creating your own bolts and nuts may be fun, but most of the time, if you want to build something, you just pick a few from an appropriate box, and this is exactly right.

thealistra · 4h ago
Isn’t this like argparse from Python for typescript?
whilenot-dev · 3h ago
What OP calls a "combinatorial parser" I'd call object schema validation and that's more similar to pydantic[0] than argparse in python land.

[0]: https://docs.pydantic.dev/latest/

nhumrich · 1h ago
So, Typer then
parhamn · 3h ago
> Try to access it and TypeScript yells at you. No runtime validation needed.

I was recently thinking about how type safety and validation strategies are particularly thorny in languages where the typings are just annotations. E.g. the Typescript/Zod or Python/Pydantic universes. Especially in IO cases where the data doesn't originate in the same type system.

In a language like Go (just an example, not endorsing) if you parse something into say a struct you know worst case you're getting that struct with all the fields set to zero, and you just have to handle the zero values. In typescript-likes you can get a totally different structure and run into all sorts of errors.

All that is to say, the runtime validation is always somewhere (perhaps in the library, as they often are?), and the feature here isn't no runtime validation but typed cli arguments. Which is cool and great.

metaltyphoon · 2h ago
> worst case you're getting that struct with all the fields set to zero, and you just have to handle the zero values

In the field I work in, zero values are valid, and doing it in Go would be a nightmare.

parhamn · 38m ago
Agreed, the pointer or "<field>_empty: bool" patterns are annoying. Point still stands though, you always get the structure you ask for.
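
A small sketch of the point made earlier in this sub-thread that TypeScript typings are only annotations and the runtime check has to live somewhere. The `Config` shape and the `parseConfig` function are hypothetical. The first part shows a type assertion letting a wrongly shaped value through; the second shows a parse function that does the runtime work once, so that callers downstream can trust the type.

```typescript
// Hypothetical config shape for illustration.
interface Config {
  port: number;
}

// A type assertion is only an annotation: nothing is checked at runtime.
// The static type says `port` is a number, but the value is a string.
const unchecked = JSON.parse('{"port": "8080"}') as Config;

// Parsing from `unknown` forces the runtime check to happen here, once.
function parseConfig(input: unknown): Config {
  if (typeof input !== "object" || input === null) {
    throw new Error("expected an object");
  }
  const port = (input as Record<string, unknown>)["port"];
  if (typeof port !== "number" || !Number.isInteger(port)) {
    throw new Error("port must be an integer");
  }
  // Only a value that passed the checks is given the Config type.
  return { port };
}
```

Passing the same JSON through `parseConfig` throws instead of producing a `Config` whose `port` is secretly a string, which is the sense in which the validation has moved into the parser rather than disappeared.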
HL33tibCe7 · 4h ago
Stopped reading after realising this is written by ChatGPT
akoboldfrying · 12m ago
I found the content novel and helpful (applying a known but underappreciated technique (Parse, Don't Validate) to a common problem where I hadn't thought to use it before) and the tone very enjoyable. In fact, it's so idiomatically written that I can't even believe it's just a machine translation of something written in another language.

In short, a great article.

bfung · 3h ago
Looked human-ish to me, what signs did you see?
cazum · 3h ago
What makes you think that and not that it's just an average auto-translate job from the author's native language (Korean)?
urxvtcd · 3h ago
I’ll go one step further: what makes you think it’s an average auto-translate job? I didn’t notice anything weird, felt like your average, slightly ranty HN post. I’m not a native speaker though.
AfterHIA · 57m ago
You've got to be careful; if you validate the CLI too much you might get URA in your validator. #chugalug #house