> Regardless of whether YAML 1.2 has been (or will be) widely adopted, it does not help those who want to parse a JSON document with a YAML parser.
Who are these hypothetical lunatics?
makeitdouble · 1h ago
It works in most trivial cases, so I'm assuming it's widely done irl.
Imagine building some small app that reads an external config file. You personally only care about yaml files, but your avid users ask for official json support as well because of a few of their files.
It's not high priority, but as you remember the old saying, json is just a subset, so you try a bunch of files, confirm it works well enough, and decide to pipe JSON files to your yaml parser as well. Done and done.
And it's not some big block of code or anything that jumps out at the users; the dev just happens to share a parser between formats.
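In code, that shortcut might look something like this (a hypothetical sketch with PyYAML; `load_config` and the file handling are made up for illustration):

    import yaml  # PyYAML handles the yaml files, and "json is just a subset", right?

    def load_config(path: str) -> dict:
        # One code path for config.yaml and config.json alike: whatever the
        # user hands us goes straight to the YAML parser.
        with open(path) as f:
            return yaml.safe_load(f)

It works on most trivial JSON, and that's exactly how the article's edge cases sneak in.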
randallsquared · 2h ago
Anyone who wants to convert a piece at a time to yaml, but would prefer to just use a single parse step (as suggested by one of the quoted bits in the article). It's a niche case, but a sensible one.
osigurdson · 19m ago
Kubernetes is a good example. Objects can be read / written as json or yaml interchangeably.
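For instance (a rough PyYAML sketch; the ConfigMap below is a toy, and this only shows the parsing side, not anything the Kubernetes API server does):

    import yaml

    as_yaml = """
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: demo
    """

    as_json = '{"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "demo"}}'

    # Both spellings describe the same object, and a YAML parser accepts both.
    assert yaml.safe_load(as_yaml) == yaml.safe_load(as_json)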
MyOutfitIsVague · 54m ago
Ruby on Rails has a database.yml that is preprocessed by ERB. I often do something like:
field: <%= File.read('/foo/bar').to_json %>
Though I have myriad criticisms of YAML, this article's arguments are not a concern at all if you can ensure that your YAML parser always uses 1.2 regardless of any %YAML directive.
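For example, a minimal Python sketch of that difference, assuming both PyYAML (1.1-style implicit typing) and ruamel.yaml (1.2 by default) are installed:

    import yaml                     # PyYAML: 1.1-style implicit typing
    from ruamel.yaml import YAML    # ruamel.yaml: YAML 1.2 by default

    doc = "answer: no"

    print(yaml.safe_load(doc))           # {'answer': False} -- 1.1 treats "no" as a bool
    print(YAML(typ="safe").load(doc))    # {'answer': 'no'}  -- 1.2 keeps it a string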
0xbadcafebee · 19m ago
As someone who has written a lot of YAML-compatible software, this is my biggest takeaway: every YAML-parsing tool is incompatible with every other one (in some way). If two tools work together, it's an accident. They will work for a while, until someone tries to do something with one tool that the other tool doesn't like.
If all you do with YAML is serialize data, from one tool, to the exact same tool, it's fine. For all other purposes you should seek a different data format (if you don't want to deal with the eventual bugs).
(Note that I don't mean parser/library, I mean tool. The tool using the library will often use, or not use, certain options, which increases the complexity of the interactions and leads to more possible failures.)
coderjames · 2h ago
> those who want to parse a JSON document with a YAML parser.
I've done it. We already had a YAML parser in an internal library I maintain, since we were already ingesting YAML files for other reasons. So when we later added new files for a different purpose that someone decided should be in JSON instead, it was easier and cleaner to keep using the existing YAML parser rather than add a separate JSON parser alongside it.
zamalek · 1h ago
You want to use a machine to create a document for something that accepts JSON. E.g., something I was playing around with was having a GitHub workflow that generates a GitHub workflow. Or generating docker compose files.
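A sketch of what that generation might look like (hypothetical; the workflow content is a toy, and whether a given consumer's YAML parser really accepts pure JSON is exactly what this thread is about):

    import json

    # Build the document as plain data and emit JSON: no indentation rules,
    # no quoting rules, no implicit typing surprises on the way out.
    workflow = {
        "name": "generated",
        "on": {"push": {"branches": ["main"]}},
        "jobs": {
            "build": {
                "runs-on": "ubuntu-latest",
                "steps": [{"run": "echo hello"}],
            }
        },
    }

    # Illustrative path: a YAML-consuming tool reads this file as-is.
    with open(".github/workflows/generated.yml", "w") as f:
        json.dump(workflow, f, indent=2)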
deathanatos · 51m ago
Even if you have a YAML 1.2 parser, here's another one:
    In [1]: import json, yaml
    In [2]: v = "\N{PILE OF POO}"
    In [3]: yaml.load(json.dumps(v), yaml.SafeLoader) == v
    Out[3]: False
The specification is woefully underspecified with regard to Unicode escapes. E.g., it uses "Unicode characters" practically throughout, a construct that doesn't exist in Unicode & (AFAICT) is not defined by YAML. A reasonable interpretation of that leads us to
\uabcd
(which the spec says is "Escaped 16-bit Unicode character.") …decoding to the USV with value 0xabcd. But that's not compatible with JSON.
(PyYAML is not the only library with that reading of the spec, either. Rust's will outright error on the input here, as its `str` type is equivalent to [USV], whereas Python's `str` is not. (The value Python decodes in the example above is a representable but illegal value.))
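Concretely, continuing the session above (PyYAML, with json.dumps and its default ensure_ascii=True): the astral character gets escaped as a surrogate pair, and each \u escape is decoded on its own:

    In [4]: json.dumps(v)
    Out[4]: '"\\ud83d\\udca9"'

    In [5]: yaml.load(json.dumps(v), yaml.SafeLoader)
    Out[5]: '\ud83d\udca9'

The result is two unpaired surrogates rather than U+1F4A9, which is why the comparison earlier is False.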
zzo38computer · 56m ago
> 1e2 is a valid JSON number, but YAML 1.1 requires it to be written as 1.0e+2
Then a program can be made that writes it as "1.0e+2", which is valid in both JSON and YAML, regardless of what the reader expects. (However, some formats will not need numbers that require scientific notation anyway.)
It does not help if you are trying to use a YAML parser to parse a JSON file, but at least, it avoids a different problem.
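A minimal sketch of such a writer (my own illustration, not from the comment; it ignores inf/nan and leans on Python's float repr):

    def yaml11_number(x: float) -> str:
        """Format a float so both a JSON parser and a YAML 1.1 resolver see a number."""
        mantissa, _, exponent = repr(x).partition("e")
        if "." not in mantissa:
            mantissa += ".0"                 # YAML 1.1's float pattern requires a dot
        return mantissa + (f"e{exponent}" if exponent else "")

    assert yaml11_number(1e2) == "100.0"     # valid JSON, valid YAML 1.1
    assert yaml11_number(1e22) == "1.0e+22"  # dot plus signed exponent keeps 1.1 happy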
If you are making your own format which does not need to work with an existing one, then you do not necessarily need to use YAML or JSON; you can choose the appropriate format for it. You can consider what numbers and what character sets you intend to use, whether you will use binary data, etc. If you like the structured data but do not need a text format, then DER is another option.
sshine · 3h ago
I wonder how widespread YAML 1.1 is.
If you assume that YAML 1.2 is the default, you don't need that nasty %YAML header.
This doesn't translate to arbitrary, open environments, but you can make that choice in closed environments.
While JSON numbers are grammatically simple, they're almost always distinct from how you'd implement numbers in any language that has JSON parsers, syntactically, exactness and precision-wise.
So while YAML is a lot more complex, you always need to limit what kinds of numbers you actually try to express in JSON. This is especially true for scientific numbers, big numbers, and numbers exact to many digits.
jmillikin · 2h ago
Among ecosystems based on YAML-formatted configuration, defaulting to YAML 1.1 is nearly universal. The heyday of YAML was during the YAML 1.1 era, and those projects can't change their YAML parsers' default version to 1.2 without breaking extant config files.
By the time YAML 1.2 had been published and implementations written, greenfield projects were using either JSON5 (a true superset of JSON) or TOML.
> While JSON numbers are grammatically simple, they're almost always distinct
> from how you'd implement numbers in any language that has JSON parsers,
> syntactically, exactness and precision-wise.
For statically-typed languages the range and precision is determined by the type of the destination value passed to the parser; it's straightforward to reject (or clamp) a JSON number `12345` being parsed into a `uint8_t`.
For dynamically-typed languages there's less emphasis on performance, so using an arbitrary-precision numeric type (Python's Decimal, Go's "math/big" types) provides lossless decoding.
The only language I know of that really struggles with JSON numbers is, ironically, JavaScript -- its BigInt type is relatively new and not well integrated with its JSON API[0], and it doesn't have an arbitrary-precision type.
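As a concrete illustration of the dynamically-typed case (a small Python sketch, not from the comment above): the stdlib json module lets you substitute Decimal for float during parsing, so nothing gets rounded through a double on the way in:

    import json
    from decimal import Decimal

    doc = '{"price": 0.10000000000000000000000001, "qty": 12345678901234567890}'

    lossy = json.loads(doc)                       # default: float
    exact = json.loads(doc, parse_float=Decimal)  # Decimal keeps every digit

    print(lossy["price"])   # 0.1
    print(exact["price"])   # 0.10000000000000000000000001
    print(exact["qty"])     # 12345678901234567890 (Python ints are arbitrary precision)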
> If you assume that YAML 1.2 is the default, you don't need that nasty %YAML header.
Indeed the YAML 1.2 spec says a document without a YAML directive should be assumed to be 1.2[1]:
> A version 1.2 YAML processor must accept documents with an explicit “%YAML 1.2” directive, as well as documents lacking a “YAML” directive. Such documents are assumed to conform to the 1.2 version specification.
It's only the YAML 1.2 spec that says it's a superset of JSON. The YAML authors weren't aware of JSON when publishing version 1.1[2]:
> The YAML 1.1 specification was published in 2005. Around this time, the developers became aware of JSON. By sheer coincidence, JSON was almost a complete subset of YAML (both syntactically and semantically).
> In 2006, Kyrylo Simonov produced PyYAML and LibYAML. A lot of the YAML frameworks in various programming languages are built over LibYAML and many others have looked to PyYAML as a solid reference for their implementations.
> The YAML 1.2 specification was published in 2009. Its primary focus was making YAML a strict superset of JSON. It also removed many of the problematic implicit typing recommendations.
The middle paragraph there is the reason this is a problem people keep running into: most implementations are based on LibYAML, which is an implementation of YAML 1.1 that does not really support 1.2[3]. Indeed, the last example from the post doesn't actually work for me on Ruby 3.4.4 with LibYAML 0.2.5; it produces the exact same output as the one before it.
[0] See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... for the incantation needed to encode a BigInt as a number.
[1]: https://yaml.org/spec/1.2.2/#681-yaml-directives
[2]: https://yaml.org/spec/1.2.2/#12-yaml-history
[3]: https://github.com/yaml/libyaml/issues/20
Absolutely no truck with this either. If you want another whitespace-obsessed bug farm, you can give it a new name.
Stay with XML. It's fine. I wrote a bunch earlier this evening, even though you're not really meant to edit it by hand, and that was fine too. Emacs totally understands what XML is.
joaohaas · 1h ago
YAML sucking is no excuse to keep using XML. JSON, JSON5 and TOML are all great alternatives for projects.
osigurdson · 16m ago
YAML is better than all of those things imo. It is easier for me to read and write and works better with more complex configs when you have mixtures of other types of formats in a single file (e.g. xml, json, bash, etc., which is sometimes useful in Kubernetes).
chowells · 1h ago
On multiple occasions, I've wanted a standard format that allows large multi-line text blocks to be unquoted. JSON, JSON5, and TOML don't do that. You know what does? YAML and XML. I'm not really a fan of either of them, but where's the better option that still gives me large unquoted text blocks?
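For reference, the YAML feature being described is the literal block scalar; a small PyYAML sketch:

    import yaml

    doc = """
    script: |
      #!/bin/sh
      echo "no quoting or escaping needed in here"
      grep -v '^#' /etc/passwd
    """

    # The block scalar is taken literally: no quotes, no escapes.
    print(yaml.safe_load(doc)["script"])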
mdaniel · 10m ago
I hate toml more than most people, but in fairness it does have two kinds of multiline strings: https://toml.io/en/v1.0.0#string
JonChesterfield · 1h ago
XML is also game for blocks of text, with clumsy <xsl:value-of select='$thing' /> scattered through them for ad hoc string substitution in those unquoted text blocks. Lua has a nice notation for large blocks of literal text too.
JonChesterfield · 1h ago
Doubtful. JSON has javascript semantics all over it. TOML I've had to look up, and it seems vaguely fine as such things go, but doesn't have schemas.
XML grows on you. XSL transforms are _probably_ not a good idea but they also kind of grow on you. It turns into html rather easily as well.