Ratatoi is a C library that wraps stdlib's strtol (as atoi does), but it's evil.

28 rept0id-2 33 5/21/2025, 7:00:50 PM github.com ↗

Comments (33)

threeducks · 10h ago

I think strtol is just a badly designed function. The return value should have been an error code and the actual long should have been "returned" via a pointer. Checking the return value is much easier than checking endptr and errno and remembering to set errno before calling strtol.
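
A minimal sketch of the interface described above, for illustration only. The name `parse_long` is hypothetical (it is not part of any libc), the base is fixed at 10, and trailing characters are treated as an error, which is a design assumption rather than something strtol itself requires.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns 0 on success or an errno-style code on
   failure. The parsed value is only stored through `out` on success. */
int parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;                  /* strtol only sets errno on failure */
    long v = strtol(s, &end, 10);

    if (errno != 0)
        return errno;           /* typically ERANGE for out-of-range input */
    if (end == s)
        return EINVAL;          /* no digits were consumed */
    if (*end != '\0')
        return EINVAL;          /* trailing garbage after the number */

    *out = v;
    return 0;
}
```

With this shape the caller writes `if (parse_long(arg, &n) != 0) { ... }` and never touches errno or endptr directly.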

The fact that the strtol example in the manual is 50 lines long, of which most is error handling, speaks for itself. https://man7.org/linux/man-pages/man3/strtol.3.html#:~:text=...

That being said, I can't imagine many applications where crashing is a good solution.

wahern · 9h ago

OpenBSD added strtonum to <stdlib.h>. It's quite opinionated, but fits the usage patterns OpenBSD developers prefer.

> 50 lines long, of which most is error handling

That's a little exaggerated considering the example is an entire C program, including main. It's more like 14 lines, ignoring the '\0' check (which isn't necessary if you don't want to parse additional items), and even that includes whitespace and stderr logging.

I agree the biggest headache is reliance on errno and the nuanced--albeit conventional, consistent[1], and documented--semantics of only setting errno on error. Some POSIX APIs, at least those defined from scratch (e.g. pthreads, as opposed to incorporated common extensions) prefer returning the error code directly as you recommend. But this would technically only save you a single line, though it might save a lot of confusion about semantics.

However, in this case I might keep returning the value directly and take an error pointer, similar to OpenBSD's strtonum. Otherwise you would need separate routines for char, short, int, long, and long long. Though there's still the issue of unsigned integers, and using _Generic it might be possible to at least hide all the type-specific variants behind a single interface.
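
A rough, portable sketch of that shape: the value is returned directly and the error comes back through a pointer. The function name `strtonum_sketch` is hypothetical; the parameter list and the error strings are modeled on OpenBSD's strtonum, but this is not OpenBSD's implementation and it clobbers errno, which the real one may not.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical strtonum-like helper: returns the parsed value, or 0 with
   *errstr set to a short description when parsing or range checks fail. */
long long strtonum_sketch(const char *s, long long minval, long long maxval,
                          const char **errstr)
{
    const char *err = NULL;
    long long v = 0;
    char *end;

    if (minval > maxval) {
        err = "invalid";
    } else {
        errno = 0;
        v = strtoll(s, &end, 10);
        if (end == s || *end != '\0')
            err = "invalid";        /* no digits, or trailing garbage */
        else if ((errno == ERANGE && v == LLONG_MIN) || v < minval)
            err = "too small";
        else if ((errno == ERANGE && v == LLONG_MAX) || v > maxval)
            err = "too large";
    }

    if (errstr != NULL)
        *errstr = err;              /* NULL means success */
    return err != NULL ? 0 : v;
}
```

Because the caller supplies the bounds, one routine can serve char, short, int, and long targets, e.g. `port = (int)strtonum_sketch(arg, 1, 65535, &errstr);`.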

And there's still the issue of needing to check for trailing garbage, or dropping the ability to use the interface inline with other parsing code. There's a lot of dimensions to the problem. Parsing integers may be conceptually simple[2], but designing an interface that's easy to integrate into applications across a variety of contexts is much less simple.

[1] Consistent across libc interfaces, excepting those that may invoke syscalls internally, like printf, where errno may be incidentally modified even on success.

[2] I personally often prefer to just write a simple loop to manually parse integers. It's sometimes easier to integrate the desired error checking inline, or even elide it altogether (garbage-in, garbage-out).
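
For illustration, a hand-written loop of the kind footnote [2] mentions might look like the sketch below. The name `parse_uint_digits` and the cursor-style interface are assumptions made for this example; it handles unsigned decimal digits only, with no sign or whitespace handling.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: parses one or more decimal digits starting at *sp.
   On success stores the value in *out and advances *sp past the digits.
   On failure (no digits, or overflow) leaves *sp and *out untouched. */
bool parse_uint_digits(const char **sp, unsigned *out)
{
    const char *p = *sp;
    unsigned v = 0;

    if (*p < '0' || *p > '9')
        return false;                   /* must start with a digit */

    for (; *p >= '0' && *p <= '9'; p++) {
        unsigned d = (unsigned)(*p - '0');
        if (v > (UINT_MAX - d) / 10)
            return false;               /* v * 10 + d would overflow */
        v = v * 10 + d;
    }

    *sp = p;
    *out = v;
    return true;
}
```

Since the cursor is advanced in place, this drops into other parsing code easily: parsing "12:34" is two calls with a check for ':' in between.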

roelschroeven · 6h ago

> conventional, consistent[1], and documented--semantics of only setting errno on error.

From that man page:

> The implementation may also set errno to EINVAL in case no conversion was performed (no digits seen, and 0 returned).

There are error cases where the implementation may set errno to EINVAL. There's not even a guarantee. I did a quick test. errno is only set if you pass an invalid base, or if the string does contain a number but it is out of range. If you pass a string which doesn't even remotely look like a number, errno is not set. You have to check endptr.

  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
    const char *s = "this is not a number";
    printf("strtol(\"%s\")\n", s);
    errno = 0;
    long a = strtol(s, NULL, 10);
    printf("errno: %d; strerror: %s\n", errno, strerror(errno));
    printf("result: %ld\n", a);
    return 0;
  }

Output:

  strtol("this is not a number")
  errno: 0; strerror: Success
  result: 0

https://godbolt.org/z/TbEKss8zM

That's just rubbish design. In what is in my opinion the most common error case, namely that the string doesn't represent a number while you expect that it does, errno is not set.

spyrja · 4h ago

Personally I'd just go the wrapper approach and ignore errno entirely.

  #include <assert.h>
  #include <ctype.h>
  #include <limits.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdlib.h>

  bool copy_str_to_long_long(const char* str, long long* ptr) {
    while (isspace((unsigned char)*str))
      ++str;
    if (*str == 0)
      return false;
    char* parsed = NULL;
    long long value = strtoll(str, &parsed, 0);
    while (isspace((unsigned char)*parsed))
      ++parsed;
    if (*parsed != 0)
      return false;
    *ptr = value;
    return true;
  }

  long long str_to_long_long(const char* str) {
    long long result;
    bool ok = copy_str_to_long_long(str, &result);
    assert(ok); /* call kept outside assert() so it still runs under NDEBUG */
    return result;
  }

  bool copy_str_to_long(const char* str, long* ptr) {
    long long buffer;
    if (!copy_str_to_long_long(str, &buffer) || buffer < LONG_MIN || buffer > LONG_MAX)
      return false;
    *ptr = buffer;
    return true;
  }

  long str_to_long(const char* str) {
    long result;
    bool ok = copy_str_to_long(str, &result);
    assert(ok); /* call kept outside assert() so it still runs under NDEBUG */
    return result;
  }

  /*
    ...define copy_str_to_short, copy_str_to_int32, etc...
  */

hedora · 6h ago

Crashing on error is almost always the right default behavior. If you do anything else by default, then you get into the land of data corruption and security holes.

If you do this, and then find your thing crashes in production, then that means you didn't test it well enough.

Before someone mentions safety critical systems, consider the fact that the first Apollo landing's computer crashed when it detected it was violating realtime bounds ("alarm 1201" means the OS detected CPU exhaustion + rebooted without clearing process state). At that point, it went into a reboot loop, and got close enough to the surface for Neil Armstrong to nudge the lander to a safe landing.

https://apollo11space.com/apollo-11-computer-problem/

AlotOfReading · 2h ago

The Apollo guidance computer specifically had a pretty unusual definition of "crashing" too. What it would do is save all the critical data, turn everything off, reinitialize the hardware, and restart all the jobs from designated reentry points. Very unlike what we think of as crashing. Regardless, a modern boot chain is much more complicated than anything the Apollo computers had to deal with, where rebooting (as opposed to the safe restart procedure above) simply involved setting the program counter to some default value.

lunar-whitey · 1h ago

The LM computer’s recovery routines for this scenario are more accurately described as “checkpoint/restore” than “crash” (as most people would understand the latter). The former technique is still used in high availability systems today.

lunar-whitey · 2h ago

This summary of events is incorrect. The LM guidance computer’s executive routines detected an overload (which occurred because Buzz Aldrin had left the rendezvous radar on) and continued to operate in a degraded state where lower priority work was not performed. These faults were reported and the landing proceeded with the computer remaining responsive to inputs despite the concerns the alarms produced. Armstrong’s control changes near the end of the descent were made for reasons unrelated to the rendezvous radar error.

Modern systems use more formal techniques to achieve the same goal: graceful degradation.

For a detailed account of computer operations during the Apollo 11 landing, you can read this analysis from an engineer who worked on the program:

https://www.doneyles.com/LM/Tales.html

bmink · 5h ago

I love the Apollo 11 computer stories but by today’s standards it was more of an MCU than a computer. And sure, in the embedded space it is true that in a lot of cases error recovery doesn’t make a whole lot of sense and it makes more sense to reset quickly.

But there are many systems today that take a long time to restart so you can’t just abort if you have a chance to recover.

duneroadrunner · 7h ago

So I don't write much C code these days, but I recently encountered strtol() again and am I mistaken or does the interface also violate const correctness? I mean it takes a const char* as the first parameter and then gives you back a (non-const) char* potentially pointing into the same string, right? Like, does strtol() get a pass because it's old, or is const correctness (still) not generally a concern of C programmers?

tedunangst · 7h ago

The idea is that if the input was not const, it's really inconvenient to get a const endptr back out. If your intention is to break your program, there are easier ways to do so than washing the pointer through strtol.

spyrja · 7h ago

More than a few C library functions do that kind of thing. Like `strstr`, which takes const strings as arguments but returns a readily modifiable pointer to char. Const-correctness just wasn't on the top of the list when they standardized this stuff, I guess. (Heck, back in those days, most PROGRAMS for that matter weren't written with much care for it.)

wahern · 6h ago

It's a consequence of the peculiarity of C type semantics, which disallows implicit conversions of pointer-to-pointer to pointer-to-pointer-to-const. C23 6.5.16.1 EXAMPLE 3 explains why:

  const char **cpp; char *p;
  const char c = 'A';
  cpp = &p;   // constraint violation
  *cpp = &c;  // valid
  *p = 0;     // valid

  The first assignment is unsafe because it would allow the 
  following valid code to attempt to change the value of the 
  const object c.

There are proposals on the table for C2y to redefine various APIs, including strtol, strchr, memcpy, etc, to preserve const correctness. Implementations might make use of _Generic (there are some issues there, though), newly specified language features, or possibly use internal extensions not available in the language proper, to accomplish this.
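
As a rough illustration of the _Generic idea (not how any proposal or libc actually specifies it), a C11 macro can pick the return type from the argument type. The macro name `find_char` is hypothetical, and the sketch assumes a compiler that applies array-to-pointer conversion to the controlling expression, as current GCC and Clang do.

```c
#include <string.h>

/* Hypothetical const-preserving wrapper around strchr:
   const char * in gives const char * out, char * in gives char * out. */
#define find_char(s, c) _Generic((s),                   \
    const char *: (const char *)strchr((s), (c)),       \
    char *:       strchr((s), (c)))
```

With this, `char *p = find_char(const_string, 'x');` draws a diagnostic about discarding the qualifier, while passing a mutable buffer still yields a writable pointer.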

jefftk · 7h ago

There are unfortunately a lot of old C library functions that violate const correctness. Consider dirname: https://www.jefftk.com/p/dirname-is-evil

gpderetta · 8h ago

C can actually return structs by value and small structs are actually handled quite efficiently in some ABIs, so a pair result/error would be quite convenient although I guess not idiomatic.
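
A small sketch of what such a result/error pair could look like. The struct and function names are made up for this example, and treating trailing characters as an error is an assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result/error pair, returned by value. */
struct long_result {
    long value;   /* meaningful only when err == 0 */
    int  err;     /* 0 on success, otherwise an errno-style code */
};

struct long_result parse_long_pair(const char *s)
{
    struct long_result r = { 0, 0 };
    char *end;

    errno = 0;
    r.value = strtol(s, &end, 10);
    if (errno != 0)
        r.err = errno;                       /* e.g. ERANGE */
    else if (end == s || *end != '\0')
        r.err = EINVAL;                      /* no digits or trailing garbage */
    return r;
}
```

Usage reads as `struct long_result r = parse_long_pair(arg); if (r.err) { ... }`, with no out-parameters at the call site.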

PaulDavisThe1st · 8h ago

> The return value should have been an error code and the actual long should have been "returned" via a pointer.

Oh, you mean like:

    int ret = sscanf (str, "%d", &value);

?

sltkr · 8h ago

Yes, that API is actually great, but the problem with (s)scanf() is that reading an invalid value is undefined behavior. So you can't use it if you don't already know the result fits in &value, which is exactly the situation where you'd use strtol() instead.

CamperBob2 · 7h ago

Presumably 'undefined behavior' means you'll get an undefined int value back -- which you will (of course) range-check -- not that it will wipe out the next 600KB of memory starting at &value or do something similarly hazardous.

roelschroeven · 6h ago

No, that's very much not what undefined behavior means. Undefined Behavior (the man page on my system actually capitalizes both words) means that there are no guarantees at all about the behavior of the whole program. It can very much wipe out whole chunks of memory, or crash (not necessarily in or around the sscanf call), or get stuck in an infinite loop, or whatever.

PaulDavisThe1st · 5h ago

From the man page on my system:

> Use of the numeric conversion specifiers produces Undefined Behavior for invalid input. See C11 7.21.6.2/10 ⟨https://port70.net/%7Ensz/c/c11/n1570.html#7.21.6.2p10⟩. This is a bug in the ISO C standard, and not an inherent design issue with the API. However, current implementations are not safe from that bug, so it is not recommended to use them. Instead, programs should use functions such as strtol(3) to parse numeric input. This manual page deprecates use of the numeric conversion specifiers until they are fixed by ISO C.

CamperBob2 · 5h ago

If the CRTL maintainers don't care, why should I? Such behavior is broken, not "undefined."

As the other poster points out, the bug is in the spec, and I'd be astonished if the library function itself actually misbehaves with any given input.

cmovq · 8h ago

Interestingly, a complete implementation of strtol [1] is shorter than this wrapper. If you don't like strtol's API or error handling, just implement your own.

[1]: https://github.com/gcc-mirror/gcc/blob/master/libiberty/strt...

> If an overflow is detected, it calls abort()

An aside, but this doesn't detect overflows on Windows due to both long and int being 32 bits (you'd want strtoll for that).

rept0id-2 · 11h ago

Ratatoi is a C library that wraps stdlib's strtol (as atoi does), but it's evil.

If an overflow is detected, it calls abort(), you crash and get Aborted (core dumped).

This way, you prioritize memory safety over silently running in a wrong state, without needing to call strtol and check errors manually everywhere.

eqvinox · 5h ago

"long" is not guaranteed to be larger than "int", and in fact it isn't on 64-bit Windows. (Let alone 32-bit platforms in general.)

https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_m...

You need to use "long long" or "intmax_t" (and matching strtoll/strtoimax)
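
A sketch of the range check done through intmax_t, so it does not depend on long being wider than int. The function name `parse_int_checked` is hypothetical and this is not Ratatoi's actual code; like atoi, it ignores anything after the digits.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical atoi-like parse with explicit int range checking.
   Returns false if there are no digits or the value does not fit in int. */
bool parse_int_checked(const char *s, int *out)
{
    char *end;

    errno = 0;
    intmax_t v = strtoimax(s, &end, 10);
    if (end == s || errno == ERANGE)
        return false;               /* nothing parsed, or beyond intmax_t */
    if (v < INT_MIN || v > INT_MAX)
        return false;               /* fits intmax_t but not int */

    *out = (int)v;
    return true;
}
```

A caller that wants crash-on-overflow behavior could call abort() when this returns false.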

ksherlock · 9h ago

Your errno checks aren't correct.

errno is only set on an error. It's not cleared on success. If errno was previously set, the function will always abort(). So you need to do something like:

    int saved_errno = errno;
    errno = 0;
    ....
    errno = saved_errno;
    return aInt;
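
Spelled out as a complete function, the save-and-restore pattern might look like the sketch below. The helper name `strtol_out_of_range` is invented for this example and only reports whether strtol flagged a range error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: reports whether s is out of range for long,
   leaving the caller's errno exactly as it was on entry. */
bool strtol_out_of_range(const char *s)
{
    int saved_errno = errno;    /* remember whatever the caller had */

    errno = 0;                  /* so a stale value cannot look like an error */
    (void)strtol(s, NULL, 10);
    bool out_of_range = (errno == ERANGE);

    errno = saved_errno;        /* restore before returning */
    return out_of_range;
}
```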

kazinator · 6h ago

No ISO C standard library function sets errno to zero, but functions not in the standard library are not so obliged.

Functions that clobber errno to zero make it impossible to call several functions and use a nonzero value of errno to conclude that one or more of them went wrong.

If you don't think that such code is a good idea (for instance because you believe that every function that can fail should be checked for its specific error), then you probably have nothing against functions which clobber errno to zero.

ben0x539 · 9h ago

Doesn't it set errno to 0 first thing?

ksherlock · 8h ago

ahh, you're right. It's still polite to save and restore though :)

kazinator · 6h ago

Where is it documented that atoi wraps strtol?

snarfy · 9h ago

pun driven development

timewizard · 10h ago

"This way, you open yourself up to DDoS attacks, instead of just handling your own errors correctly."

kstrauser · 10h ago

I prefer this approach. It's kind of like using `.expect()` or `.unwrap()` in Rust when there's no plausible way the call should fail. Like if my program writes out a JSON file and then reads it back in, and the file isn't valid JSON, panicking is a reasonable way to deal with the situation. Somehow the world got itself into a strange state I can't reasonably recover from.

Well, same here. If you're using strtol on data that must be well-formed, and it's not, and you're at an earlier stage of startup like parsing the config file, go ahead and blow up. That's almost certainly better than plowing ahead with invalid data.

LukeShu · 9h ago

> Like if my program writes out a JSON file and then reads it back in, and the file isn't valid JSON, panicking is a reasonable way to deal with the situation.

Like how if I hit ctrl-C at just the right moment during the build, the next time I build, cmake will segfault and I have to delete the build directory.
