You can tell how much they cared about data quality because they never took the time to look at context-dependent glyph equivalencies. And some context-sensitive algorithms might not make the same mistakes as a naive “guess what characters are here” algorithm that just uses glyph shapes. You run into this a LOT with ALPR systems because some of the presses excluded some characters. O and 0 are the most common equivalent pair, but only in certain positions.
OCR is actually complicated if you’re trying to rely on the data for something.
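A rough sketch of what position-dependent equivalency could look like, in Python. Everything here is an assumption for illustration: the confusable pairs, the `normalize` helper, and the "LLLDDD" style pattern (L = letter slot, D = digit slot) are hypothetical, not taken from any real ALPR system.

```python
from typing import Optional

# Hypothetical look-alike pairs; a real system would derive these from the plate font.
TO_DIGIT = {"O": "0", "I": "1", "S": "5", "B": "8"}
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}


def normalize(raw: str, pattern: str) -> Optional[str]:
    """Resolve look-alike glyphs using a positional pattern.

    pattern uses 'L' for a letter slot and 'D' for a digit slot.
    Returns None when the reading cannot fit the pattern.
    """
    if len(raw) != len(pattern):
        return None
    out = []
    for ch, kind in zip(raw.upper(), pattern):
        if kind == "D":
            ch = TO_DIGIT.get(ch, ch)      # O -> 0 only where a digit is expected
            if not ch.isdigit():
                return None
        elif kind == "L":
            ch = TO_LETTER.get(ch, ch)     # 0 -> O only where a letter is expected
            if not ch.isalpha():
                return None
        else:
            raise ValueError(f"unknown pattern symbol: {kind!r}")
        out.append(ch)
    return "".join(out)


print(normalize("A0C1O3", "LLLDDD"))  # AOC103
```

The same glyph ends up as O in the letter slots and 0 in the digit slots, which is the "only in certain positions" part.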
mensetmanusman · 5h ago
Naming an event after its date will have a limited run.
esafak · 6h ago
tl;dr: It's an OCR error
dahart · 5h ago
Or, sometimes, not; one of the more interesting takeaways was typewritten lowercase ells instead of ones: “When the algorithm read October llth, it was far more correct than we have been giving it credit.”
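For what it's worth, here is a minimal Python sketch of handling that case after OCR, assuming the only substitutions are lowercase l for 1 and uppercase O for 0, and only touching tokens that look like day-of-month ordinals. The `fix_ordinals` name and the regex are mine, not from the article.

```python
import re

# Assumed typewriter substitutions: lowercase ell for one, uppercase oh for zero.
TYPEWRITER_DIGITS = str.maketrans({"l": "1", "O": "0"})

# One or two digit-like characters followed by an ordinal suffix, e.g. "llth", "2Oth".
ORDINAL = re.compile(r"\b([0-9lO]{1,2})(st|nd|rd|th)\b")


def fix_ordinals(text: str) -> str:
    """Rewrite digit look-alikes, but only inside ordinal tokens."""
    return ORDINAL.sub(
        lambda m: m.group(1).translate(TYPEWRITER_DIGITS) + m.group(2),
        text,
    )


print(fix_ordinals("October llth"))  # October 11th
```

Restricting the rewrite to ordinal tokens is the point: the ells in ordinary words are left alone.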
strogonoff · 4h ago
The latent font designer in me balks at the thought of taking a typeface and intentionally making one character look more like another character.
Was it some technical constraint of the typewriter that caused “1” to become more like “l” come XX century?
adrianmonk · 10m ago
> Was it some technical constraint of the typewriter that caused “1” to become more like “l” come XX century?
The typewriter I grew up with simply didn't have a key for it. It also didn't have a 0 or an exclamation mark or a plus sign. There were well known substitutes:
For the number 1, type lowercase letter l.
For the number 0, type uppercase letter o.
For the exclamation mark, type a period, hit backspace, and type an apostrophe / single quote.
For the plus sign, I'm not aware of a good substitute. You could maybe superimpose a slash on a hyphen, but it would look bad.
There was no division sign, and using a slash to denote division was not yet something I'd ever seen anyone do. You could probably have superimposed a hyphen and a colon to get ÷.
Oddly enough, it did have other characters which you won't find on a standard US keyboard today: ¼, ½, and ¢. The cent sign was useful, and it seems logical to me that if you're going to have $ you should have ¢ too!
thedufer · 4h ago
Typewriter keys cost money, and dropping the 1 allowed them to drop a key without significantly affecting the use of it. As far as I can tell, that's effectively the entire rationale.
This wasn't meaningfully the case prior; the printing press would've just needed more copies of 'l' if they'd dropped the 1s, and letters weren't as significant a portion of the cost of the machine, anyway. And afterwards came computers, which need to distinguish between the characters even if they're displayed the same way.
marcosdumay · 1h ago
> Typewriter keys cost money
They didn't just cost money. They were competing for the limited space around the typing area, which meant they were constrained to the edge of a circle that would be entirely filled with mechanisms. In other words, the cost in money, size, and weight depended on the square of the number of keys.
hidingfearful · 3h ago
Was it that in prior years a reader could usually distinguish 1 from l by context? Even today, very few things cause me to need to te11 a 1 from an l.
(typo 0n purpose)
It matters when reading code and random strings (what we now call passwords, though back then passwords were things you could pronounce, unlike say ywtr466Nh%vX).
It doesn't matter for much else.
Though it did make an interesting plot twist in The Miocene Arrow.
bediger4000 · 4h ago
My parents had a typewriter without a 1 or a 0. I always thought it was to provide room for two other valuable characters like the old "cents" c with a bar through it.
https://drhagen.com/blog/the-missing-23rd-of-the-month/
It's one reason median is preferred over mean, at the outset, as well as throwing out outliers just to see what things look like.
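A quick Python illustration of the point, with made-up numbers (five plausible counts and one bad reading). The `drop_outliers` helper uses Tukey's 1.5 × IQR fences, which is just one common convention and my choice here, not something from the article.

```python
from statistics import mean, median, quantiles

# Made-up counts: five plausible values and one wild reading.
counts = [100, 102, 98, 101, 99, 1000]


def drop_outliers(values, k=1.5):
    """Keep values inside Tukey's fences (quartiles +/- k * IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


print(mean(counts))           # 250
print(median(counts))         # 100.5
print(drop_outliers(counts))  # [100, 102, 98, 101, 99]
```

The single bad value drags the mean far from anything actually observed, while the median barely moves.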
https://en.m.wikipedia.org/wiki/Twyman%27s_law