German has an 'ß' problem of a similar nature. There is a corresponding capital 'ẞ' in Unicode, and Germany has since officially adopted 'ẞ' as an alternative, but in Unicode's SpecialCasing.txt the uppercase of 'ß' is still 'SS'. Since the lowercase of 'S' is of course 's', there's no going back once you've folded to uppercase. The lowercase of 'ẞ', however, is still 'ß'.
So by alternating case you end up with ß→SS→ss or ẞ→ß→SS. That certainly has the potential to screw with naive attempts at case-insensitive comparison via case folding. Then again, Unicode adopting 'ẞ' as the uppercase of 'ß' in some future version would probably only increase that potential further.
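A quick way to see the asymmetry is Python, whose `str.upper()`, `str.lower()` and `str.casefold()` apply Unicode's full case mappings (including SpecialCasing) regardless of the OS locale. A minimal sketch, assuming Python 3; `naive_equal` and `folded_equal` are hypothetical helper names for illustration:

```python
# Round trips through upper/lower case are not lossless for German sharp s.
print("ß".upper())          # SS  (full uppercase mapping from SpecialCasing.txt)
print("ß".upper().lower())  # ss  (no way back to ß)
print("ẞ".lower())          # ß   (capital sharp s lowercases to ß)
print("ẞ".lower().upper())  # SS

def naive_equal(a: str, b: str) -> bool:
    """Hypothetical helper: case-insensitive comparison via lower()."""
    return a.lower() == b.lower()

def folded_equal(a: str, b: str) -> bool:
    """Hypothetical helper: case-insensitive comparison via Unicode case folding."""
    return a.casefold() == b.casefold()

print(naive_equal("Straße", "STRASSE"))   # False: 'straße' != 'strasse'
print(folded_equal("Straße", "STRASSE"))  # True: both fold to 'strasse'
print(folded_equal("STRAẞE", "strasse"))  # True: 'ẞ' folds to 'ss'
```

Folding is for matching, not display, and it can over-match: 'Maße' and 'Masse' both fold to 'masse'.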
I'm interested to hear from people dealing with a lot of German text how much of a problem this is in practice.
virtualritz · 2h ago
There is an esthetic issue here too.
The ‘ß’ is a ligature of the old ‘long s’ [1] which was written ‘ſ’ (because it’s common in old texts there is a Unicode code point for it).
This letter has no upper case version. Capitalized words starting with a long ‘ſ’ always used ‘S’.
Now, in German, to turn this lowercase long ‘ſ’ into a sharp ‘s’, ‘ſ’ was followed by ‘z’, written ‘ſz’.
And these two were often typeset as a ligature, ‘ß’, for esthetic reasons.
That ligature then became the common case and eventually a letter recognized in German-speaking countries.
As a hypothetical analogy, imagine an ‘ll’ ligature, as in ‘fallacy’, becoming an English letter by some twist of history.
As we saw, these were lowercase letters. And there is no uppercase version of ‘ſ’.
So the uppercase ‘ẞ’ that is now officially recognized and has a Unicode code point should not look like this.
It's an absolute eyesore, because all that was done was to make the letter look somewhat more like a capital.
But its nature as originally two lowercase letters still makes it stand out for people with a background in typography, like myself.
IMHO it should look like ‘SZ’ (or ‘SS’), made into a ligature.
And as a type designer, I'd either refrain from filling that code point in a font I design, to protest this, or do the above: create a ligature of ‘SZ’ or ‘SS’ (alternative) and put that there.
First of all, 'ß' was a ligature -- a long time ago. It is a letter today. Disassembling it according to its original construction makes no sense today for any kind of argument about typesetting or Unicode. Further, 'ſ' is not used today in German at all, except for meta discussions like this or to stress how things used to be spelled. It makes no sense to mention it unless you are talking about font design or historic use of German (and other languages, for that matter).
Also, if you do mention it for the sake of talking about font design: in Latin fonts, 'ſs' is actually the basis for the design of 'ß', not 'ſz'. The latter was mainly done in Blackletter/Fraktur, where the 'z' looked different, maybe a bit like 'ʒ' (I used Unicode's 'ezh' here, hoping it looks right), so that the old-style 'ß' looks like a ligature of 'ſʒ'. This can still be seen occasionally, e.g., on Berlin street name signs. It is obsolete in most fonts today (although I quite like it).
Moreover, there is an uppercase letter for 'ß': 'ẞ'. And it existed in fine typography long before being adopted into Unicode. In fact, its existence was probably the reason it is now in Unicode. The official German rules are now: use either 'SS' or 'ẞ' for uppercase 'ß'. Most Germans probably don't even know that 'ẞ' is a choice today, although it was used on 'DER GROẞE DUDEN' even before Unicode existed.
And finally, how a glyph is designed does not necessarily depend on whether the historic parts of an ancient ligature had uppercase variants. So the fact that 'ſ' has no uppercase equivalent is irrelevant for both Unicode and type design.
But as a font designer, or as anything else, you can protest. No problem; everyone has the right to protest. But please don't flood the Internet with wrong information, as there is enough of it already.
And I don't think 'SS'<->'ß' is similar to the Turkish 'I with/without dot' problem, because the default Unicode mapping for 'ß' is correct in all languages, while the Turkish (and also Azerbaijani) mapping is correct or broken depending on the language setting. This is far more problematic, because an assumed universal equivalence does not hold. And you need to carefully distinguish whether a string is language-specific or not, e.g., path names or IDs in databases.
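To make the contrast concrete: Python's built-in case methods always use the default, language-independent Unicode mappings, so Turkish-aware casing has to be done explicitly. The `tr_upper`/`tr_lower` helpers below are a hypothetical minimal sketch covering only the i/I pairs; a real application would more likely use a locale-aware library such as ICU.

```python
def tr_upper(s: str) -> str:
    # Turkish: i -> İ (U+0130); ı (U+0131) already uppercases to I by default.
    return s.replace("i", "\u0130").upper()

def tr_lower(s: str) -> str:
    # Turkish: I -> ı (U+0131), İ (U+0130) -> i. Replace first, because the
    # default lowercase of U+0130 is 'i' followed by U+0307 COMBINING DOT ABOVE.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

word = "istanbul"
print(word.upper())     # ISTANBUL  (default mapping: wrong for Turkish)
print(tr_upper(word))   # İSTANBUL  (Turkish mapping)
print("DIŞ".lower())    # diş       (default mapping)
print(tr_lower("DIŞ"))  # dış       (Turkish mapping: a different word)
```

The same input string produces different, equally "correct" results depending on which language the caller assumes, which is exactly why identifiers and user-facing text need to be kept apart.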
alexey-salmin · 1h ago
> And I don't think 'SS'<->'ß' is similar to the Turkish 'I with/without dot' problem, because the default Unicode mapping for 'ß' is correct in all languages, while the Turkish (and also Azerbaijani) problem is correct or broken depending on language setting.
I don't know if this counts as "correct" but it's still very confusing.
Yes, it's definitely weird. But it is independent of locale, so any programmer has a chance to notice it regardless of language setting, instead of their app failing only once it is used by someone from Turkey or Azerbaijan.
yorwba · 1h ago
Indeed, the original Unicode inclusion request justifies the need for an encoding for the character by referencing prior usage going back all the way to 1879: https://www.unicode.org/wg2/docs/n3227.pdf
It may be a typographical abomination, but it's an intentional representation of that particular typographical abomination, just as the ox head in "A" intentionally has its horns pointing down.
JimDabell · 3h ago
Transliterating this character incorrectly resulted in a violent attack causing two deaths:
Based on the murderous reaction from the entire family, I doubt that whether or not the transliteration issue happened would have changed the outcome much. It's a weird consequence of a transliteration issue, but someone prepared to murder someone else over a rude text is a ticking time bomb regardless.
teddyh · 1h ago
There are multitudes of western people today who are very upset, some to the point of death threats, over an interpretation of a hand gesture.
What I’m saying is, this kind of thing is common.
watwut · 29m ago
You mean ... people upset about the Nazi salute, which was used to express sympathy for the Nazi movement and allegiance to it?
omeid2 · 2h ago
It might seem like an overreaction from a Western point of view, but in the context of Central Asian culture such accusations are so extremely sensitive that people from all walks of life, from nobility to the poor, kill and die over them. It is just a different frame of mind.
batuhanicoz · 1h ago
This is an overreaction. It's violence. Trying to justify it by claiming it's part of their culture is not healthy. I think we can have some universal values (don't stab people?), and it is perfectly reasonable to force people to adopt those values. It's their culture? They can leave the violent parts of their culture behind and adapt to the expectations of modern society (not stabbing people).
I'm Turkish. I grew up in Turkey. These things happen, but let's not try to justify them. We should aim to get to a point where people share these "western values" (of not stabbing people).
4gotunameagain · 2h ago
I'm sorry, but stabbing someone over a single text message is not a cultural difference, it's idiocy.
pjc50 · 2h ago
Honor culture makes people do weird and terrible things. The American cultural version would be the same thing but with a gun.
lblume · 36m ago
No matter how much I typically despise American culture, killing people (no matter by which means) over prostitution in an antecedent does not appear to be a part of it.
omeid2 · 1h ago
"A single text" is an absurd reductionism.
People suffer worse than death over words all the time, even in the West. Some folks adhere to honour, some to political groups and ideologies, some to religion, some to their social views; there are words that are treated as violence and responded to accordingly in every context.
tobyhinloopen · 2h ago
I think you're giving this character a bit too much credit here, I feel like the violent attack might have some causes unrelated to transliteration of some characters.
ayhanfuat · 2h ago
That doesn't really make sense. "sıkışınca" would become "sikisinca". No one would read it as "sikişince" (the final "a" is doing the heavy lifting there). A guy with the same name (I would say not a common name-surname combination, considering the same region) was in jail for sexually assaulting a mentally challenged kid. I guess this was just a psychopath's excuse. https://www.hurriyet.com.tr/gundem/parti-binasinda-ozurlu-ki...
eknkc · 3h ago
Haha yep. I'm Turkish and have been using US-layout keyboards my entire life, so I don't use the Turkish characters online. I use S for Ş, G for Ğ, and it just works; nobody has ever complained.
One word that does cause issues is "to get bored":
sık - to bore
sik - to fuck
So if I write "sikildim" to say "I got bored", it actually becomes "I got fucked".
One way around it is to capitalize: SIKILDIM is "I got bored", but now you are yelling. Typing "sıkıldım" is a hassle on a US keyboard, though.
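The lossy ASCII fallback described here can be sketched as a simple character map. `ascii_fallback` is a hypothetical helper covering only the six Turkish-specific letters in both cases; it shows why two different words collide once the dots and cedillas are dropped:

```python
# Turkish-specific letters and the ASCII letters commonly typed in their place.
_TR_TO_ASCII = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")

def ascii_fallback(text: str) -> str:
    """Hypothetical helper: replace Turkish letters with their ASCII stand-ins."""
    return text.translate(_TR_TO_ASCII)

print(ascii_fallback("sıkıldım"))                      # sikildim
print(ascii_fallback("sıkışınca"))                     # sikisinca
print(ascii_fallback("sık") == ascii_fallback("sik"))  # True: the distinction is lost
```

The mapping is many-to-one, so there is no way to recover the original word from the ASCII output alone.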
orphea · 3h ago
> The problem was that Emine's cell phone was not localized properly for Turkish and did not have the letter <ı>; when it displayed Ramazan's message, it replaced the <ı>s with <i>s.
Does that make sense? Could the phone arbitrarily replace characters? Or is it more likely that the guy typed dotted i's?
eknkc · 2h ago
I think the article is somewhat fabricated.
There might be some truth to it but it does not make much sense. Technically, ı would probably show up as □ instead of i if the phone had a hard time displaying it.
There is also the suffix not matching that change: sıkışınca vs sikişince. A becomes E in that suffix when you switch from ı to i. Even if the phone fucked up, "sikişinca" would look weird.
foobahhhhh · 1h ago
The family being utterly insane was a minor factor.
Shame there was no concept of self-defence.
Karliss · 1h ago
This makes me wonder whether there is a programming language that has separate data types for locale-aware and locale-independent strings. I know that Rust has OsString, but that's a slightly different use case.
The problem with the currently widespread approach of a global, application-wide locale setting is that most applications contain a mix of user-facing strings and technical code interacting with file formats or remote APIs. It doesn't matter whether you set it to the current language (or just let the operating system set it) or force it to a language-independent locale; sooner or later something is going to break.
If you are lucky, a programming language might provide some locale-independent string functions, but using them is often clunky and unlikely to be done consistently across the whole code base and all the third-party libraries. It's easier to do things correctly if you are forced to declare the intention from the start, and any mixing of different contexts requires an explicit conversion.
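Python won't enforce this by itself, but a rough sketch of the idea with two wrapper types might look like the following. `SysStr`, `UiStr` and `to_ui` are hypothetical names, and the Turkish rule covers only the dotted i:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SysStr:
    """Locale-independent text: identifiers, file-format keys, protocol keywords."""
    value: str

    def upper(self) -> "SysStr":
        # Python's str.upper() uses the default Unicode mapping, never the process locale.
        return SysStr(self.value.upper())

@dataclass(frozen=True)
class UiStr:
    """User-facing text that carries its language explicitly."""
    value: str
    lang: str

    def upper(self) -> "UiStr":
        if self.lang in ("tr", "az"):
            # Turkish/Azerbaijani: dotted i uppercases to U+0130.
            return UiStr(self.value.replace("i", "\u0130").upper(), self.lang)
        return UiStr(self.value.upper(), self.lang)

def to_ui(s: SysStr, lang: str) -> UiStr:
    """The only bridge between the two worlds is an explicit conversion."""
    return UiStr(s.value, lang)

key = SysStr("quit")
print(key.upper().value)               # QUIT, in every locale
print(to_ui(key, "tr").upper().value)  # QUİT, Turkish display casing
```

Because neither class subclasses `str`, accidentally concatenating one with the other raises a `TypeError` instead of silently mixing contexts.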
sam_lowry_ · 1h ago
С
bob1029 · 3h ago
System.Globalization is quite the feat of engineering. Setting CultureInfo is like getting onto an actual airplane. I don't know of any other ecosystem with docs like:
A classic which breaks lots of applications is the difference between number format "1,234.5" and "1.234,5" (some European countries).
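One way to avoid this class of bug is to keep machine-readable numbers in one fixed format and apply grouping and decimal conventions only at the display edge. A minimal sketch that sidesteps the `locale` module (whose results depend on which locales are installed) by passing the separators explicitly; `format_number` is a hypothetical helper:

```python
def format_number(x: float, decimals: int, group_sep: str, decimal_sep: str) -> str:
    # Format with Python's locale-independent ',' grouping and '.' decimal point,
    # then swap in the requested separators in a single translate pass.
    s = f"{x:,.{decimals}f}"
    return s.translate(str.maketrans({",": group_sep, ".": decimal_sep}))

print(format_number(1234.5, 1, ",", "."))  # 1,234.5
print(format_number(1234.5, 1, ".", ","))  # 1.234,5

# Parsing must use the same fixed rules: float() only accepts '.' as the decimal point.
print(float("1234.5"))  # 1234.5
try:
    float("1.234,5")
except ValueError as e:
    print("not a float:", e)
```

Doing the swap in one `translate` call matters: two sequential `replace` calls would turn every separator into the same character.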
simiones · 2h ago
I was actually responsible, some 10 years ago, for introducing a bug like this into an official release of an industry-standard tool for a somewhat niche industry. Some SQL queries we were generating ended up saying `SELECT x FROM t WHERE x < 1,02` when run on any system with a comma as the decimal separator. We found it and fixed it a few weeks later, and I don't think we ever had a complaint from the field about it, but it was still pretty eye-opening about locales.
The extra irony is that my colleagues and I live in a country that actually uses this kind of locale, but no one on the entire extended team was using it; everyone used a US locale.
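The usual defense is not to format numbers into SQL text at all, but to pass them as bound parameters, so the database never sees a locale-formatted string. A minimal sketch with Python's built-in `sqlite3` module; the table `t` and column `x` come from the query above, and the data is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(0.5,), (1.0,), (1.5,)])

threshold = 1.02
# Risky: building SQL text from a formatted number. With a comma-decimal
# formatter this is how "... WHERE x < 1,02" ends up in the query.
# query = f"SELECT x FROM t WHERE x < {formatted_threshold}"

# Safer: bind the value; it reaches SQLite as a REAL, not as text.
rows = conn.execute("SELECT x FROM t WHERE x < ? ORDER BY x", (threshold,)).fetchall()
print(rows)  # [(0.5,), (1.0,)]
conn.close()
```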
Ylpertnodi · 2h ago
I've had to adjust to/accommodate the difference between 1. and 1, almost daily.
Very expensive if you fuck up.
Very embarrassing if you fuck up, too.
the_mitsuhiko · 2h ago
Over the years this has shown up a few times because PHP was internally using a locale-dependent function to normalize class names, and it was also doing so inconsistently in a few places. The bug was active for years and has resurfaced more than once: https://bugs.php.net/bug.php?id=18556
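Class names are identifiers, not natural-language text, so one defensive option is to fold only ASCII letters and leave every other code point untouched; a locale-aware lowercase, by contrast, can turn the 'I' in a name like `Iterator` into 'ı' under a Turkish locale. A minimal sketch in Python; `ascii_lower`, `registry` and `lookup_class` are hypothetical names, and this is not a claim about how PHP itself resolved the bug:

```python
import string

# Maps only A-Z to a-z; every other code point passes through unchanged.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(identifier: str) -> str:
    """Hypothetical helper: locale-independent, ASCII-only lowercasing."""
    return identifier.translate(_ASCII_LOWER)

# Case-insensitive class lookup keyed on the ASCII-folded name.
registry = {ascii_lower(name): name for name in ["Iterator", "ArrayIterator"]}

def lookup_class(name: str) -> str:
    return registry[ascii_lower(name)]

print(lookup_class("ITERATOR"))       # Iterator
print(lookup_class("arrayiterator"))  # ArrayIterator
```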
ndepoel · 3h ago
Ahh yes, been there, done that.
Several years ago we had issues with certification of our game on PS4 because the capitalization in Sony's Turkish translation of "wireless controller" was wrong. The problem was the Turkish dotless I. What was the cause? Some years prior, we had had issues with internal system strings (read: stringified enums) breaking on certain international PCs because they were being upper/lowercased using locale-specific capitalization rules. As a quick fix, the choice was made to change the culture info to invariant globally across the entire game. This of course meant that all strings were now being upper/lowercased according to English rules, including user-facing UI strings. Hence Turkish strings mixed up dotted and dotless I's in several places. The solution? We just pre-uppercased that one "wireless controller" term in our localization sheet, because that was the only bit of text Sony cared about. An ugly fix, and we really should have gone through the code to properly separate system strings from UI text, but it got the job done.
sebstefan · 3h ago
Boy it would sure be easier if the Turkish i was a different unicode character in lowercase too
lifthrasiir · 3h ago
Impossible, because the decision had already been made by the Turkish legacy encodings, which led Unicode to pick only one option (round-trip compatibility with legacy encodings) out of the possible trade-offs.
alexey-salmin · 3h ago
What were the other possible trade-offs? I don't really see how a lack of round-trip compatibility is worse than what we have now. It breaks the whole idea of Unicode code points, and for what?
thaumasiotes · 40m ago
Actually it reflects the idea of Unicode code points correctly. They are meant to represent graphs, not semantics.
This isn't honored; we have many Unicode code points that look identical by definition and differ only in their secret semantics, but all of those points are in violation of the principles of Unicode. The Turkish 'i' is doing the right thing.
alexey-salmin · 17m ago
How do you define "look identical" outside of fonts which from my understanding were excluded from Unicode consideration on purpose?
E.g. Cyrillic "а" looks the same as Latin "a" most of the time, they both are distant descendants of the Phoenician 𐤀, but they are two different letters now. I'm very glad they have different code points, it would be a nightmare otherwise.
zokier · 1h ago
How would separate code point break round-tripping specifically?
sebstefan · 54m ago
I don't know about round-tripping of anything, but suddenly having an entire nation with keyboards outputting UTF-8 on outdated national systems, probably designed for Latin-1, seems like a tough sell just to fix this issue.
sebstefan · 3h ago
Yep I'm aware
makeitdouble · 3h ago
I'm imagining coding with some random "i" being a completely indistinguishable character that differs from the English "i". Or people writing your name and not matching it in their DB because their local "i" is not your "i".
It's already a potential issue depending on your script, and CJK also has this funny full English alphabet in double-width characters that makes it a PITA for people who can't distinguish the two. But having it on a character as common as "i" would feel especially hellish to me.
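For the fullwidth Latin letters specifically, Unicode compatibility normalization (NFKC) maps them back to plain ASCII, which is one common way to compare such input. A short sketch using Python's `unicodedata` module:

```python
import unicodedata

fullwidth = "\uff21\uff22\uff23\uff11\uff12\uff13"  # ＡＢＣ１２３ (U+FF21.., U+FF11..)
print(fullwidth == "ABC123")                                 # False: different code points
print(unicodedata.normalize("NFKC", fullwidth) == "ABC123")  # True
print(unicodedata.name(fullwidth[0]))                        # FULLWIDTH LATIN CAPITAL LETTER A
```

NFKC does not help with the Turkish i or with Cyrillic lookalikes, since those are distinct letters rather than compatibility variants.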
sebstefan · 3h ago
It wouldn't matter
There's already this problem for cyrillic 'e' and latin 'e' and hundreds of other characters
People use it to create lookalike URLs and phish people
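A crude defense against this kind of lookalike is to flag labels that mix letters from several scripts. The sketch below approximates the script by the first word of each letter's Unicode name (e.g. 'LATIN', 'CYRILLIC'); that is a simplifying assumption, not the real Unicode Script property, and `scripts_in`/`looks_mixed` are hypothetical names:

```python
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Rough heuristic: first word of each letter's Unicode name,
    e.g. 'LATIN', 'CYRILLIC', 'GREEK'. Not the Unicode Script property."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

def looks_mixed(label: str) -> bool:
    return len(scripts_in(label)) > 1

print(scripts_in("apple.com"))        # {'LATIN'}
print(looks_mixed("apple.com"))       # False
print(looks_mixed("\u0430pple.com"))  # True: U+0430 CYRILLIC SMALL LETTER A + Latin
```

A label spelled entirely with Cyrillic lookalikes would pass this check, so it is only one layer of defense.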
Cyrillic 'e' is isolated in that you switch scripts when writing it. I'd compare it to the Greek X.
Turkish isn't a fully separate script: most letters are standard ASCII and only a few are special (it's closer to French or German with their accented characters), so you don't have the explicit switch; it's always mixed.
sebstefan · 46m ago
Then you have the greek question mark ;
alexey-salmin · 2h ago
> But having it on a character as common as "i" would feel specially hellish to me.
It does (U+0131 = Latin Small Letter Dotless I, U+0069 = Latin Small Letter I).
The problem is that uppercasing the dotted i outputs a different character depending on your current locale. Case-insensitive equality checks also break this way (I==i, except in a Turkish locale, so `QUIT ilike quit` is false).
mrspuratic · 39m ago
Irish script traditionally used a dotless "i", something that persists in current road signage (anecdotally to avoid confusion with "í" or with adjacent old-style dotted consonants; I can't find a definitive source to cite). It's only an orthographic/type thing; semantically it's an "i", though the Unicode dotless "i" is sometimes used online to represent it.
rob74 · 3h ago
Yes - the problem is that "i" and "I" are standard ASCII characters, while the dotted I and the dotless i are not. Creating special "Turkish I" and "Turkish i" characters would have been an alternative, but would have had its own issues (e.g. documents where only some "i"s are Turkish and the rest "regular" because different people edited it with different software/settings).
tmtvl · 3h ago
Is it? That's weird, I can't find the code for Latin Small Letter Dotted I. There is a Cyrillic dotted I, but that one doesn't have the dot in capitalised form.
What sebstefan is asking for is a Unicode character which is the non-capitalised form of Latin Capital Letter I With Dot Above (U+0130) which always gets capitalised to U+0130 and which U+0130 gets downcased to.
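The asymmetry is easy to inspect with Python's `unicodedata` module. With the default (non-Turkish) mappings, U+0131 uppercases to plain 'I', while U+0130 has no single-code-point lowercase: its full lowercase mapping is 'i' followed by U+0307 COMBINING DOT ABOVE.

```python
import unicodedata

for ch in ["i", "I", "\u0131", "\u0130"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0069 LATIN SMALL LETTER I
# U+0049 LATIN CAPITAL LETTER I
# U+0131 LATIN SMALL LETTER DOTLESS I
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE

print("\u0131".upper() == "I")                   # True: dotless ı uppercases to ASCII I
lowered = "\u0130".lower()
print([f"U+{ord(c):04X}" for c in lowered])      # ['U+0069', 'U+0307']
print(lowered.upper() == "\u0130")               # False: round trip gives 'I' + U+0307
print(unicodedata.normalize("NFC", lowered.upper()) == "\u0130")  # True after NFC
```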
elevatortrim · 3h ago
Not sure about this. For this to work, one of these would need to happen:
1. Have two "i" characters on Turkish keyboards, one to use when writing in English, one in Turkish. Sounds difficult to get used to. Always need to be conscious about whether writing an "English i", or a "Turkish i".
2. "i" key is interpreted as English "i" when in English locale, as a special unicode character when in Turkish locale. This would be a nightmare as you would then always have to be conscious of your locale. Writing in English? Switch to English locale. Writing code? Switch to English locale. Writing a Turkish string literal in code? Switch to Turkish, then switch back. It would mean constantly switching back and forth, even though both use the Latin alphabet.
JimDabell · 2h ago
> "i" key is interpreted as English "i" when in English locale, as a special unicode character when in Turkish locale. This would be a nightmare as you would then always have to be conscious of your locale.
Isn’t this already the case with other languages? For instance, the same key on the keyboard produces a semicolon (;) in English and a Greek question mark (;) in Greek. These are distinct characters that are rendered the same (and also an easy way to troll a developer who uses an editor that doesn’t highlight non-ASCII confusables).
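Incidentally, the Greek question mark U+037E is one of the easier confusables to neutralize: it has a canonical singleton decomposition to U+003B SEMICOLON, so Unicode normalization turns it into a plain ';'. A quick check in Python:

```python
import unicodedata

greek_q = "\u037e"
print(unicodedata.name(greek_q))                     # GREEK QUESTION MARK
print(greek_q == ";")                                # False
print(unicodedata.normalize("NFC", greek_q) == ";")  # True

line = "x = 1\u037e"
print(any(ord(c) > 127 for c in line))               # True: a cheap non-ASCII lint
```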
alexey-salmin · 2h ago
> 1. Have two "i" characters on Turkish keyboards, one to use when writing in English, one in Turkish. Sounds difficult to get used to. Always need to be conscious about whether writing an "English i", or a "Turkish i".
But you have to do that anyway to be able to produce the correct capitalized version: an "English I" or a "Turkish İ".
elevatortrim · 1h ago
Hmm, you are kind of right but not exactly:
Yes, there are two keys, but their function is not to write the character as a "Turkish i" and an "English i". These keys are necessary because there are 4 variations, that need 2 keys to write with caps lock on and off:
Key 1 - Big and small Turkish "I": Caps Lock on: I, Caps Lock off: ı
Key 2 - Big and small Turkish "İ": Caps Lock on: İ, Caps Lock off: i
For small "Turkish i" and "English i" to be different characters, there would need to be a third key.
daveliepmann · 2h ago
No: a Turkish keyboard has separate i/İ and ı/I keys, and Türkish-writing users with an American/international keyboard use a keyboard layout with modifier keys so that the i/I key can be altered to ı/İ. (I do the latter for idiosyncratic reasons.)
The person you're replying to is pointing out that differentiating English-i from Türkish-i requires some other unwieldy workaround. Would you expect manufacturers to add a third key for English i, or for people with Turkish keyboards to use a modifier key (or locale switching) to distinguish i from i? All workarounds seem extraordinarily unlikely.
sebstefan · 2h ago
Ah, that's because I thought Turks and Azerbaijanis just switched keyboard layouts to type in English and to type in their native language.
elevatortrim · 1h ago
That's a sensible thought but Turkish QWERTY keyboard includes both the English-exclusive (Q, X, W) and Turkish-exclusive characters so switching is rarely required.
hudo · 3h ago
Reminds me of a friend's old but brilliant project: using Unicode to draw art in stack-trace logs!
Enough with boring stack traces in logs; let's make some art there and make life a bit easier for the poor soul who's on support and has to debug the latest prod issue. https://medium.com/@ironcev/stack-trace-art-4b700a8817ea
jongjong · 55m ago
This is one of the reasons why software development is so difficult: most people cannot even begin to imagine how complex the user environment can be. Even within very niche problem domains you may have to deal with a broad range of environments with different locales, spoken languages, operating systems, programming languages, compilers/transpilers, engine versions, server frameworks, cache engines, load balancers, TLS certificate provisioning, container engines, container image versions, container orchestrators, browsers, browser extensions, frontend frameworks, test environments, transfer protocols, databases (with different client and server versions), database indexes, schema constraints, rate limiting... I could probably keep going for hours. Now imagine being aware of all these factors (and many more) and of all their possible permutations; that's what you need in order to be a senior software developer these days. It's a miracle that any human being can produce any working software at all.
As a developer, if some code works perfectly on your own computer, the journey has barely just begun.
[1] https://en.m.wikipedia.org/wiki/Long_s
https://languagelog.ldc.upenn.edu/nll/?p=73
https://learn.microsoft.com/en-us/windows/apps/design/global...
twitch
https://www.pcmag.com/news/chrome-blocks-crafty-url-phishing...
https://en.wikipedia.org/wiki/Dotted_I_(Cyrillic)