Parsing JSON in Forty Lines of Awk

68 thefilmore 26 6/28/2025, 3:23:48 PM akr.am ↗

Comments (26)

latexr · 1h ago

> Another option is to use Python, which is ubiquitous enough that it can be expected to be installed on virtually every machine

Not on macOS. You can do it easily by just invoking /usr/bin/python3, but you’ll get a (short and non-threatening) dialog to install the Xcode Command-Line Developer Tools. For macOS environments, JavaScript for Automation (i.e. JXA, i.e /usr/bin/osascript -l JavaScript) is a better choice.

However, since Sequoia, jq is now installed by default on macOS.

ameliaquining · 1h ago

I feel like the takeaway here is that jq should probably be considered an indispensable part of the modern shell environment. Because without it, you can't sensibly deal with JSON in a shell script, and JSON is everywhere now. It's cool that you can write a parser in awk but the point of this kind of scripting language is to not have to do things like that.

chaps · 7h ago

Awk is great and this is a great post. But dang, awk really shoots itself so much with its lack of features that it so desperately needs!

Like: printing all but one column somewhere in the middle. It turns into long, long commands that really pull away from the spirit of fast fabrication unix experimentation.

jq and sql both have the same problem :)

thrwwy9234 · 6h ago

  $ echo "one two three four five" | awk '{$3="";print}'
  one two  four five

chaps · 3h ago

Oh dang, that's good.

shawn_w · 2h ago

As long as you don't mind the extra space in the middle.

chaps · 46m ago

Often times I don't! Entirely depends on what I'm doing. #1 thing off the top of my head is to remove That One Column that's a bajillion characters long that makes exploratory analysis difficult.

saghm · 43m ago

I wonder if it's possible to encode the backspace character in the replacement string?

fragmede · 1h ago

if you do:

    sed 's/  / /g'

lucb1e · 25m ago

Or sticking with awk, I have this bash alias to remove excess whitespace that is just:

    awk '{$1=$1};1'

SoftTalker · 5h ago

> awk really shoots itself so much with its lack of features that it so desperately needs

Whence perl.

librasteve · 1h ago

or raku

https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Gra...

saghm · 45m ago

I suspect the rationale for Perl is that most Linux systems will probably have it installed already. Installing something you're familiar with is great when you can, but I'm guessing the awk script linked to here was picked more for its ubiquity than elegance.

chaps · 14m ago

Kinda, but not really. Of the infrastructures I've worked on, not a single one has been consistent in installing perl on 100% of hosts. The ones that get close are usually like that because one high up person really, really likes perl. And they send a lot of angry emails about perl not being installed.

Within infrastructures where perl is installed on 95% of hosts, that 5% really bites you in the ass and leads to infrastructure rot very quickly. You're kinda stuck writing and maintaining two separate scripts to do the same thing.

jcynix · 4h ago

>awk really shoots itself so much with its lack of features that it so desperately needs!

That's why I use Perl instead (besides some short one liners in awk, which in some cases are even shorter than the Perl version) and do my JSON parsing in Perl.

This

diff -rs a/ b/ | ask '/identical/ {print $4}' | xargs rm

is one of my often used awk one liners. Unless some filenames contain e.g. whitespace, then it's Perl again

8n4vidtmkvmk · 1h ago

I've been using perl instead of sed because PCRE is just better and it's the same regex that PHP uses which I've been coding in for nearly 20 years. I still don't actually know perl, but apparently Gemini does. It wrote a particularly crazy find and replace for me. Never got around to using or learning awk. Only time I see it come up is when you want to parse some tab delimited output

ptspts · 2h ago

This is much safer: xargs -d '\n' rm -f --

jcynix · 2h ago

Sure, but my example was just that and I actually use /identical$/ as the pattern. Sorry for the typo.

And I use this "historic" one liner only when I know about the contents of both directories. As soon as I need a "safer" solution I use a Perl script and pattern matching, as I said.

mauvehaus · 6h ago

...And once you get away from the most basic, standard set of features, the several awks in existence have diverging sets of additional features.

chaps · 4h ago

Things are already like that, friend! We have mawk, gawk and nawk. But it's fun to think about how we could improve our ideal tooling if we had a time machine.

chubot · 5h ago

JSON is not a friendly format to the Unix shell — it’s hierarchical, and cannot be reasonably split on any character

Yes, shell is definitely too weak to parse JSON!

(One reason I started https://oils.pub is because I saw that bash completion scripts try to parse bash in bash, which is an even worse idea than trying to parse JSON in bash)

I'd argue that Awk is ALSO too weak to parse JSON

The following code assumes that it will be fed valid JSON. It has some basic validation as a function of the parsing and will most likely throw an error if it encounters something strange, but there are no guarantees beyond that.

Yeah I don't like that! If you don't reject invalid input, you're not really parsing

---

OSH and YSH both have JSON built-in, and they have the hierarchical/recursive data structures you need for the common Python/JS-like API:

    osh-0.33$ var d = { date: $(date --iso-8601) }

    osh-0.33$ json write (d) | tee tmp.txt
    {
      "date": "2025-06-28"
    }

Parse, then pretty print the data structure you got:

    $ cat tmp.txt | json read (&x)

    osh-0.33$ = x
    (Dict)  {date: '2025-06-28'}

Create a JSON syntax error on purpose:

    osh-0.33$ sed 's/"/bad/"' tmp.txt | json read (&x)
    sed: -e expression #1, char 9: unknown option to `s'
      sed 's/"/bad/"' tmp.txt | json read (&x)
                                     ^~~~
    [ interactive ]:20: json read: Unexpected EOF while parsing JSON (line 1, offset 0-0: '')

(now I see the error message could be better)

Another example from wezm yesterday: https://mastodon.decentralised.social/@wezm/1147586026608361...

YSH has JSON natively, but for anyone interested, it would be fun to test out the language by writing a JSON parser in YSH

It's fundamentally more powerful than shell and awk because it has garbage-collected data structures - https://www.oilshell.org/blog/2024/09/gc.html

Also, OSH is now FASTER than bash, in both computation and I/O. This is despite garbage collection, and despite being written in typed Python! I hope to publish a post about these recent improvements

packetlost · 3h ago

I don't really buy that shell / awk is "too weak" to deal with JSON, the ecosystem of tools is just fairly immature as most of the shells common tools predate JSON by at least a decade. `jq` being a pretty reasonable addition to the standard set of tools included in environments by default.

IMO the real problem is that JSON doesn't work very well at as a because it's core abstraction is objects. It's a pain to deal with in pretty much every statically typed non-object oriented language unless you parse it into native, predefined data structures (think annotated Go structs, Rust, etc.).

alganet · 3h ago

> Yes, shell is definitely too weak to parse JSON!

Parsing is a trivial, rejecting invalid input is trivial, the problem is representing the parsed content in a meaningful way.

> bash completion scripts try to parse bash in bash

You're talking about ble.sh, right? I investigated it as well.

I think they made some choices that eventually led to the parser being too complex, largely due to the problem of representing what was parsed.

> Also, OSH is now FASTER than bash, in both computation and I/O.

According to my tests, this is true. Congratulations!

izabera · 1h ago

did something along those lines a while ago https://github.com/izabera/j trying to keep an interface similar to jq

teddyh · 7h ago

“except Unicode escape sequences”

wutwutwat · 5h ago

Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should.