Not accepting Accept-Language is one of my major pet peeves. What makes it worse is that many multilingual websites translate their language-switching buttons and the list of languages into the current language... which is beyond fucking stupid and defeats the purpose. Wikipedia does this right. The button to switch languages is clear, using a universal multilanguage icon, and it opens a list of languages (each using the name of the language in that same language) in alphabetical order, with the most likely candidates at the top (presumably based on GeoIP).
E.g. an English Wikipedia page will present me with the following language suggestions:
Suggested languages
Deutsch
Français
Nederlands
When you assume a language, you make an ass of you and of me. Don't be an ass. Be like Wikipedia.
magicalhippo · 119d ago
A related issue that has me fuming is when, after arriving at a page of interest from a search engine, a modal popup forces me to select the country I'm from, and then promptly redirects me to the homepage of the regional website.
Some have an X button to close said dialog, but many don't, which is really aggravating.
wudu · 119d ago
Google does this. I want to check out the new device they just released - "sorry, this product is not available in your country". I just wanted to read the specs, not buy.
jiggawatts · 119d ago
Products don’t get to be informed about the factory in which they are made, or the shop in which they are to be sold.
Well, it's in an order, but I don't know about alphabetical. I clicked on today's English featured article and looked at the languages: "中文" and "Italiano" are "suggested", then the remainder are grouped by geographic region, and aren't particularly alphabetical. They appear to be in groups which are still not alphabetical. Europe seems to have a Cyrillic group, but "Қазақша" is shown after "Українська", which isn't accurate in Kazakh and is probably also unexpected for anybody who isn't familiar with the letter Қ (Қ isn't a letter in Russian; this is probably why this happens). The Chinese languages don't seem to be in stroke order (no expert here), although Korean is below them (because of course, K for Korean alphabetizes after C for Chinese).
Anyways, no hate for Wikipedia; they do a great job of localizing. Just a bit of nuance/pedantry about how you can't "alphabetize" language names in their own language.
bmn__ · 119d ago
> how you can't "alphabetize" language names
Not so, this sort order has been standardised as part of Unicode for at least 28 years. To see it in action, pipe the list of languages as a text file through a conforming tool like `ucsort`. When Қ is falsely sorted after Ч, then the wrong algorithm or no algorithm at all has been used.
> because of course, K for Korean alphabetizes after C for Chinese
That's not how it works.
adastra22 · 119d ago
Sort rules are different in different locales.
vikingerik · 119d ago
It's a circular dependency: how do you sort and list the locales or languages for someone to pick one, when by definition you don't know their locale yet?
You have to either make some best-guess approximation (IP geo, browser headers, etc) or use a locale-invariant sort, both of which will be wrong in some cases.
notpushkin · 119d ago
We can find a sorting order with the minimal total distance between where we place a language entry and where this entry would be in that language. If there’s no pair of languages A and Ä such that A > Ä in one and Ä > A in the other, then (I guess???) this total distance will be zero.
baobun · 118d ago
> A and Ä
Coincidentally, the expected position of "Ä" can vary wildly. Is it an umlauted A, normalized as AE, or a distinct letter coming after Z?
notpushkin · 118d ago
That’s also part of the reason I’ve chosen it for a placeholder / variable name! The actual placing is not important as long as it’s where speakers of the Ä language expect to find it.
Or suppose there are languages Ä₁ and Ä₂, where in Ä₁ the ‘ä’ is the umlauted ‘a’ and in Ä₂ it’s a distinct letter. The language list would be displayed as:
A Ä₁ B C Z Ä₂
The only problem / corner case would be such a language Ä₀ that would e.g. sort ‘ä’ before ‘a’. I would still put it after, since it’s where most other readers would expect to find it.
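A minimal sketch of notpushkin's "minimal total distance" idea, under toy assumptions: each language's own collation is a hypothetical key function (`umlaut_as_a` and `umlaut_after_z` below are made up, not real locale data), and an entry's expected position is its index when the whole list is sorted by its own language's rule. Because the cost is a sum of absolute differences between positions on a line, sorting the entries by expected position already minimises it, so no search over permutations is needed; the total is zero exactly when no two entries claim the same slot.

```python
from typing import Callable

# Hypothetical per-language collation: maps a display name to a sort key.
Collation = Callable[[str], object]


def umlaut_as_a(name: str) -> str:
    # Toy rule: "Ä" is an accented "A" and sorts with the A's.
    return name.replace("Ä", "A")


def umlaut_after_z(name: str) -> list[int]:
    # Toy rule: "Ä" is a separate letter placed right after "Z".
    return [ord("Z") + 1 if c == "Ä" else ord(c) for c in name]


def expected_positions(entries: dict[str, Collation]) -> dict[str, int]:
    # Where each entry lands when the whole list is sorted by its own language's rule.
    names = list(entries)
    return {name: sorted(names, key=rule).index(name) for name, rule in entries.items()}


def compromise_order(entries: dict[str, Collation]) -> tuple[list[str], int]:
    want = expected_positions(entries)
    # Sorting by expected position minimises the sum of |actual - expected|;
    # ties (two entries wanting the same slot) keep their input order.
    order = sorted(entries, key=want.__getitem__)
    cost = sum(abs(i - want[name]) for i, name in enumerate(order))
    return order, cost


# Toy languages: three treat Ä as A, "Ärsk" treats Ä as a letter after Z.
entries = {"Anglo": umlaut_as_a, "Äpfel": umlaut_as_a,
           "Zulu": umlaut_as_a, "Ärsk": umlaut_after_z}
print(compromise_order(entries))  # (['Anglo', 'Äpfel', 'Zulu', 'Ärsk'], 1)
```

"Zulu" and "Ärsk" both want the last slot, so the cost is 1 rather than 0; "Ärsk" still ends up after Z, like Ä₂ in the list above.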
numpad0 · 118d ago
> "Ä"
OT, but this looks like an adorably blushing hen to me
numpad0 · 119d ago
Can't you just sort them all as ints? The code pages usually come roughly sorted, and while no one knows which of 檎 or 橙 comes first, I don't think most people would be particularly offended by whichever way a random app did it.
vikingerik · 119d ago
That would be one locale-invariant sort as I said. Sure, you can pick some way of doing it that's least-bad. The codepages are roughly sorted, but what we're debating is the cases where that fails some definition of correctness. The point is there can be no universally correct answer for sorting locales before the user picks one, because that can depend on already knowing the locale itself.
account42 · 112d ago
There is no such thing as standard codepage numbers.
mananaysiempre · 119d ago
Yes, the DUCET is bound to disappoint everybody (especially users of the Latin script with diacritics, as none of them agree on the sort order and everyone’s preferences are tied to the specific subset of diacritics they need), but at least it disappoints everybody more or less equally.
(Do yourself a favor, though, and use the CLDR root collation instead of the raw DUCET—they are basically the same, except, and I’m quoting the standard here[1], “the DUCET is not entirely well-formed”.)
Yes, that’s confusing and probably hard to find a good balance. Someone speaking Greek or Czech may expect to find their language around E (Ελληνικά) or C (Čeština), but nope, on Wiki it’s all the way after Z.
kazinator · 119d ago
The problem may be that you need to set the locale in order to get a certain alphabetization, but setting the locale won't happen until after the language is chosen.
A reasonable approach might be to sort the list of names by using, as the sort keys, the strings decomposed by Unicode normalization (NFD) with the combining accents dropped, followed by folding to upper case. Then Čeština gets mapped to CESTINA and at least appears among the C's.
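kazinator's sort key, sketched in Python under one interpretation: decompose to NFD, drop the combining marks, upper-case what is left, and use the original string as a tie-breaker. The names are real autonyms, but the list is only a sample. Letters that have no decomposition (ø, ł, đ) pass through unchanged, and non-Latin scripts such as Greek still land after Z, so this helps Čeština but not Ελληνικά.

```python
import unicodedata


def folded_key(name: str) -> tuple[str, str]:
    # NFD splits "Č" into "C" + U+030C COMBINING CARON; dropping the combining
    # marks leaves the base letters, which are then upper-cased.
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original name breaks ties between names that fold to the same key.
    return (base.upper(), name)


names = ["Deutsch", "Ελληνικά", "English", "Čeština", "Français"]
print(sorted(names))                  # ['Deutsch', 'English', 'Français', 'Čeština', 'Ελληνικά']
print(sorted(names, key=folded_key))  # ['Čeština', 'Deutsch', 'English', 'Français', 'Ελληνικά']
```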
nis251413 · 119d ago
Don’t special characters always go after the Latin alphabet? I think this is pretty common, and fairly expected behaviour. Of course nothing is perfect but I feel like the way Wikipedia handles it is consistent.
e-topy · 119d ago
Not in the Czech alphabet:
a, á, b, c, č, d, ď, e, é, ě, ...
Also, we regard 'Ch' as its own letter. So yeah, try sorting alphabetically. I'll wait.
If you want to see bizarre sort rules, look up how French sorts accented characters.
thaumasiotes · 119d ago
> If you want to see bizarre sort rules, look up how French sorts accented characters.
I tried to do this, but there do not appear to be any sources addressing this question.
I did find a French Stack Exchange question asking for this exact information, and complaining that there are no sources (other than an uncited wikipedia page) that address it. There is no answer posted, but there is a comment from a French guy suggesting that there are no official rules.
I notice that post suggests that the Académie française specifies that accents should be sorted in reverse, and includes a link over the words "Académie française", and yet that link doesn't go to a supporting document.
A while ago I complained on this forum that Amazon's hyphenation for Kindle ebooks is abysmally bad. (Which is still true.) Someone responded to say that the hyphenation algorithm for English requires this. I pointed out that the hyphenation algorithm for English is a lookup table; each word has its hyphenation defined in the table, and when you need to hyphenate a word, you look up the hyphenation points.
Another response linked me to a paper describing how this table can be stored as a set of rules that provide hyphenation points in arbitrary letter sequences rather than dictionary words. That paper is very clear about its goals; it is an advance in data compression, proposing a method of storing a lookup table that takes less space than the table does. It carefully goes over how to produce the ruleset from the table.
But somewhere along the line, people confused the data compression algorithm (of storing the lookup table as a ruleset) for the hyphenation algorithm. They will now tell you with a straight face that a single ruleset that seems to have gone around represents the hyphenation algorithm for English, even if the word you want to hyphenate wasn't in the table that that ruleset was prepared from. And this is false.
It looks to me like something similar has happened in English speakers' understanding of French sorting order. It's very easy to explain why the example quadruplet has the sorting order it does:
cote
côte
coté
côté
(Note that the Stack Exchange question from 2024 and the blog post from 2004 use exactly the same example.)
These four words have two pronunciations, and the pronunciations are grouped with each other. After that, "cote" comes first by virtue of bearing no accents, and "o" comes before "ô" for the same reason.
What's happening here is that although French generally pretends that "e" and "é" are the same letter, they aren't, which forces -e (not pronounced) to come before -é (pronounced!). "o" and "ô" actually are the same letter, and can be ordered flexibly.
The rule "sort the accents in reverse" arises as a coincidence; it happens to be the case that this distinction is most significant at the end of French words. But French speakers would reject this ordering:
cetot
cétot
cetôt
cétôt
This doesn't come up because those words don't exist.
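The "sort the accents in reverse" rule being questioned here can be made concrete with a toy collation: compare the unaccented letters first, and only on a tie compare the accents, either left to right or right to left. This is a simplified sketch (accents are compared by code point, and only single words are handled), not a full Unicode Collation Algorithm implementation. Reading the accents backwards reproduces the cote/côte/coté/côté order from the thread, and it also produces exactly the cetot/cétot/cetôt/cétôt order that, per the comment, French speakers would reject.

```python
import unicodedata


def base_letters(word: str) -> str:
    # Primary level: the letters with all accents removed.
    nfd = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))


def accents_per_letter(word: str) -> list[str]:
    # Secondary level: for each base letter, the combining marks attached to it ("" if none).
    marks: list[str] = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and marks:
            marks[-1] += ch
        else:
            marks.append("")
    return marks


def french_key(word: str, backwards_accents: bool) -> tuple[str, list[str]]:
    accents = accents_per_letter(word)
    if backwards_accents:
        accents.reverse()  # compare accents starting from the end of the word
    return (base_letters(word), accents)


words = ["côté", "coté", "cote", "côte"]
print(sorted(words, key=lambda w: french_key(w, backwards_accents=False)))
print(sorted(words, key=lambda w: french_key(w, backwards_accents=True)))
```

Left to right gives cote, coté, côte, côté; right to left gives cote, côte, coté, côté, the order quoted above.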
makapuf · 119d ago
Well, in my language "é" is absolutely not special, and should definitely be placed near "e" (to the point that uppercase é is often written E instead of É).
If I recall correctly, by default it proposes a first list with the items it guesses the user most likely expects, then a more complete list, and in any case it lets you filter by typing. I think it can also change how it behaves if you are logged in and have tweaked your preferences for your account.
bawolff · 119d ago
Wikipedia uses UCA sort order in categories (depending on which language Wikipedia you are reading). Most other lists just sort using Unicode code point order (in NFC). So it depends, but yes, for generated lists other than categories, ASCII characters usually come first.
philistine · 119d ago
That’s English hegemony. Languages have their own sorting that they expect. You can’t impose rules on other languages.
Of course at some point Unicode needs to be ordered, but you don’t get to impose technical details on people around the world because they match how English does it.
That’s where geo-ip guessing becomes relevant. Show a list with the most likely languages at the top.
swiftcoder · 119d ago
Or use the Accept-Language. Since we already know the User understands that one, it's probably a reasonable choice for which sort order they expect too.
adastra22 · 119d ago
That’s not English sort order either.
paulddraper · 119d ago
Sorting by character codes, yes.
But in the language native locale, no.
af78 · 119d ago
I guess the default (when no language is specified) is Unicode order:
U+005A LATIN CAPITAL LETTER Z
U+010C LATIN CAPITAL LETTER C WITH CARON
U+0395 GREEK CAPITAL LETTER EPSILON
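af78's guess is easy to check with plain code-point sorting. One caveat worth showing is that bawolff's "(in NFC)" matters: in NFD, "Čeština" begins with a plain Latin C followed by a combining caron, so the same code-point sort files it among the C's. The sample names and the helper `codepoint_sorted` are only illustrative.

```python
import unicodedata

names = ["Ελληνικά", "Zazaki", "Čeština", "Deutsch"]


def codepoint_sorted(items: list[str], form: str) -> list[str]:
    # Sort by raw code points after normalizing each string to the given form.
    return sorted(items, key=lambda s: unicodedata.normalize(form, s))


for form in ("NFC", "NFD"):
    ordered = codepoint_sorted(names, form)
    # Show the first code point each name starts with in this form.
    print(form, [f"{n} (U+{ord(unicodedata.normalize(form, n)[0]):04X})" for n in ordered])
```

In NFC this gives Deutsch, Zazaki, Čeština, Ελληνικά (Z < Č < Ε, as listed above); in NFD, Čeština moves to the front.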
soulofmischief · 119d ago
When serving that many languages, a search bar is paramount.
thaumasiotes · 119d ago
> The Chinese languages don't seem to be in stroke order (no expert here)
They are for me. In the Asia section, 中文 ["Chinese"] is listed first, followed by 吴语 ["Wu"] and then 粤语 ["Cantonese"]. Stroke order is first by stroke count and then by an obscure criterion that I don't know (and that, in my experience, Chinese people living in China also don't know), but stroke count is unambiguous and these are in order: 中 4, 吴 7, 粤 12.
Note that they aren't in alphabetical order: 中 Z, 吴 W, 粤 Y.
Japanese appears between Wu and Cantonese for unclear reasons.
matvore · 119d ago
It is sorted FIRST by radical and SECOND by stroke order. This is roughly equivalent to the Unicode code point sort if you stay in the Basic Multilingual Plane. The order also puts Literary Chinese after Wu Chinese, which breaks with a pure stroke-count sort:
中文 - 中 = 丨 + 3 strokes
吴语 - 吴 = 口 + 4 strokes
文言 - 文 = 文 + 0 strokes
日本語 - 日 = 日 + 0 strokes
粵語 - 粵 = 米 + 7 strokes
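The "roughly equivalent to the Unicode code point sort" part can be checked for this particular list, because the main CJK Unified Ideographs block is laid out by Kangxi radical and then by residual stroke count. Sorting the five names by code point reproduces the order above. This only holds within that block (the extension blocks and compatibility ideographs are arranged separately), and it says nothing about a stroke-count-first collation.

```python
# Wikipedia language names from the list above, deliberately shuffled.
names = ["粵語", "日本語", "中文", "文言", "吴语"]

for name in sorted(names):  # plain code-point order
    print(name, f"U+{ord(name[0]):04X}")
```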
thaumasiotes · 119d ago
Dictionary lookup is done first by radical and second by stroke count. Collation is not. Stroke count is first.
For example, I have a book of 成语 stories that gives its table of contents in non-alphabetical order. (Since nobody understands the traditional ordering, I also have several such books that put their table of contents in alphabetical order.)
Note that 三's radical is 一, the first Kangxi radical, and that 一 is listed first. Your theory is wrong. 三 isn't even first among the 3-stroke characters, which start (among these) with 口.
Why did you make up a false answer to this question?
matvore · 119d ago
The Wikipedia sort for the languages is as I stated above, with Literary Chinese and Japanese between Wu Chinese and Yue Chinese. I explained why it was sorted that way, because radical is considered first. You could not explain why Japanese appeared between Wu and Yue because you insisted and continue to insist that radicals are not used.
I didn't say sorting is never done by stroke count alone. But I have seen radical+residual stroke count much more often than stroke count alone. Probably a result of the content I'm accessing. It's mostly Japanese and not intended for children.
The dictionary and non-dictionary sorting distinction that you make doesn't sound like a real thing. The audience, the country, and the number of items sorted are bigger factors. But you're not wrong in that stroke count is sometimes used alone.
thaumasiotes · 118d ago
> You could not explain why Japanese appeared between Wu and Yue because you insisted and continue to insist that radicals are not used.
I can't explain that because it's part of a different logical group, with its name written in a different script.† This puts it parallel to the Chinese options and to Korean.
> The Wikipedia sort for the languages is as I stated above
I took you to be describing the sort order for characters, not for Wikipedia. Wikipedia doesn't obey that order either. You can check the page for Jiangsu, where all of the languages mentioned so far appear before the "Latin alphabet" style languages, but 閩南語 and 閩東語 appear after them.
† I also can't explain why Wikipedia seems to have chosen 吴语 but 粵語, 客家語, and 贛語. Jiangsu is on the mainland... and so are Jiangxi and Guangdong.
matvore · 117d ago
> all of the languages mentioned so far appear before the "Latin alphabet"
> style languages, but 閩南語 and 閩東語 appear after them.
Could it have something to do with the Minnan and Mindong Chinese articles being written in a Latin script (despite the language name showing in both Chinese characters and Latin letters)?
thaumasiotes · 117d ago
As far as I know, sure, it could.
bawolff · 119d ago
> Well, it's in an order, but I don't know about alphabetical. I clicked on today's English featured article and looked at the languages
This depends on whether you are viewing desktop site or mobile site. It also depends on if you have a non-default skin set in your preferences.
Seems like desktop (vector-2023) does the region thing.
Mobile does alphabetical by language name (I imagine code point order, but I didn't check).
Some other skins are alphabetical by BCP 47 code.
e-topy · 119d ago
And it even remembers what you chose last time and pushes it to the top.
That is UX. Being actually helpful and not fucking annoying.
whstl · 119d ago
Oh nice! I never noticed that "Suggested languages" shows languages I previously selected.
But additionally, I like how it's not simply "pushing to the top": it does show a previously selected language at the top, but it still keeps a duplicate in the list below, in case the user is going by muscle memory.
To me this is the best way.
Either make it VERY OBVIOUS that you're removing the item from the bottom of the list (which wouldn't be possible here), or don't remove it at all.
If I had a cent for each time a SaaS made my life harder by trying to "help me" I would be CEO of every SaaS I use.
TheJoeMan · 119d ago
I can give a perfect bad example: the Youtube app on my iPhone somehow determined to change to Amharic. This is the Google support article: https://support.google.com/youtube/answer/87604 telling me the buttons to press in English. Also, I don't know/speak Amharic, and so at the time had no idea what language it was, and the iPhone translate doesn't even recognize this Ethiopian language. Bit of a pickle that could have easily been mitigated by a universal multilanguage icon.
JumpCrisscross · 119d ago
> the Youtube app on my iPhone somehow determined to change to Amharic
Was this about 6 weeks ago?
TheJoeMan · 119d ago
Yes, actually. Is there an article about it? After updating to iOS 18.4, Amharic was appended to the end of the list of preferred languages in the iPhone Settings app. However, what's interesting is it was ranked below English, and apps are supposed to use the languages list in order, but perhaps Youtube is alphabetically sorting the list?
JumpCrisscross · 119d ago
> Yes, actually. Is there an article about it?
Not that I know of. It just happened to me, too, around then. I thought it had to do with my pet fascination with the Ethiopian civil war and GERD.
distances · 119d ago
Or the ChatGPT app which can be baffling. My phone language is English, I've set ChatGPT app specifically to English in the app settings, I ask my questions in English, and every now and then it still decides to answer in German.
hombre_fatal · 119d ago
You can set the language individually per app in the iOS settings. But I thought it defaulted to your global setting.
tlb · 119d ago
There are two levels of this. If I get some other European language, I can generally figure out which is their word for English and it's not a big deal to switch. But if it gives me a script I don't know, like Bengali or something, it's a problem.
Perhaps every "choose language" menu should include English and Chinese in non-localized form, as an escape hatch, since almost all web users can recognize enough of one of them to navigate a menu to find their actual language.
adastra22 · 119d ago
Just include languages in their own script only. Why would a user need to select a language from that menu for which they DON’T know the target script? Showing Bengali in Bengali script is exactly what you want.
bawolff · 119d ago
True, but how often do you want to select a language you can't read?
derf_ · 119d ago
My favorite is the sites that do parse Accept-Language, but then pick the last one in the list instead of the first. I have mine in rough order of my competence in them, which gets me my least-competent language on some sites even when my most-competent one is supported.
I get a kick out of it when I see it, because you can understand how it happens. "Well, at least you tried."
bmn__ · 119d ago
It is wrong to blame the server here. For better results in content-negotiation, a user-agent should allow you to assign numeric weights instead of just a list (implying the same level of preference). Example:
Accept-Language: en;q=0.7,pt;q=0.3
If you already send something similar to this, and the server gets it wrong, then this is an outright bug, the software or its operator is out of compliance with HTTP.
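A minimal sketch of weight-aware negotiation, assuming the header syntax from RFC 9110 (a missing q means 1, q=0 means "not acceptable") and a simplified version of the "lookup" fallback from RFC 4647. The function names are hypothetical; a real server should prefer its framework's built-in negotiation. Sorting stably on the weight also covers derf_'s complaint: when weights are equal, the first language listed wins, not the last.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (tag, weight) pairs, highest weight first."""
    prefs = []
    for item in header.split(","):
        tag, *params = [part.strip() for part in item.split(";")]
        if not tag:
            continue
        try:
            # Weight defaults to 1 when no "q=" parameter is given.
            q = next((float(p[2:]) for p in params if p.lower().startswith("q=")), 1.0)
        except ValueError:
            continue  # malformed weight: skip this entry (a choice made for this sketch)
        prefs.append((tag.lower(), q))
    # sorted() is stable, so entries with equal weight keep their header order.
    return sorted(prefs, key=lambda pref: -pref[1])


def negotiate(header: str, available: list[str], default: str) -> str:
    supported = {lang.lower(): lang for lang in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        # Simplified lookup fallback: try "pt-br", then "pt".
        subtags = tag.split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in supported:
                return supported[candidate]
            subtags.pop()
    return default


print(negotiate("en;q=0.7,pt;q=0.3", ["pt", "en"], default="pt"))  # en
```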
drtgh · 119d ago
This parameter, at first glance, appears to be used as numeric weights for automatic translations served by default, which makes browsing very uncomfortable (wrong translations, distorted texts).
E.g. Google, YouTube, Reddit.
Automatic translations should never be served by default, but only be loaded if the user requests it. The classical "do you want translate".
bmn__ · 119d ago
It was and still is used for manually translated text, among other things. Does this help you get the full picture?
drtgh · 118d ago
I don't need help. I'm criticising the use of automatic translations served by default, and the fact that this parameter's weights are being used to do it.
The full picture? The weights seem to be more useful for fingerprinting and perhaps for server SEO than to help the users. Users who in the end will have to give the same weight to all the languages, or rewrite the outgoing headers, in order to be able to browse the Internet.
gus_massa · 119d ago
99% agree, but there is a problem on mobile when switching from Spanish to English: when I click the magnifying glass to search for alternatives, I have to type "Ing" (the first letters of "Inglés"), even though it shows "English" in the list of matches. It would be better if I could type either "Ing" or "Eng".
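A sketch of the search being asked for: match the typed prefix against both the language's own name and its name in the interface language, ignoring case and accents. The data is a hypothetical sample for a Spanish-language interface, and prefix matching is just one reasonable choice.

```python
import unicodedata

# Hypothetical sample data: (autonym, name in a Spanish-language UI)
LANGUAGES = [
    ("Deutsch", "Alemán"),
    ("English", "Inglés"),
    ("Español", "Español"),
    ("Nederlands", "Neerlandés"),
]


def fold(text: str) -> str:
    # Case- and accent-insensitive form: "Inglés" -> "ingles"
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch)).casefold()


def search(query: str, languages=LANGUAGES) -> list[str]:
    q = fold(query)
    # Display autonyms, but accept a match on either name.
    return [autonym for autonym, localized in languages
            if fold(autonym).startswith(q) or fold(localized).startswith(q)]


print(search("Ing"), search("Eng"))  # ['English'] ['English']
```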
netsharc · 119d ago
It's even more amusing when the displayed list looks like it's sorted randomly, but in reality it's sorted alphabetically in a language other than the display language.
e.g.
Nederlands
English
Français
Deutsch
Espagnol
(but if sorted in English: Dutch, English, French, German, Spanish)
elric · 119d ago
I don't understand why you would want to select Inglés instead of English. You want to select English, or Español, or Nederlands, or Deutsch, or whatever language. It makes no sense for it to be translated.
gus_massa · 118d ago
Only in a few very weird corner cases. If it's an article about a city in Germany, I may like to see the article in German and use autotranslation to read it in English or Spanish.
Sometimes the article in the local language has more info. I had that problem in comments about places or events in Argentina. Sometimes the English article has less info than the Spanish article, so I made a link to the autotranslation.
lxgr · 119d ago
Ironically, given TFA, it seems to be primarily using the user's IP:
> How does Universal Language Selector determine which languages I may understand
> ULS queries a service that determines your originating country based on your IP address. This is inaccurate in some cases. Based on the country code, most often spoken languages are suggested for you.
Indeed. Wikimedia wikis' language selection feature relies on Unicode CLDR language-territory data. This is very complex to maintain (and there are still many mistakes to fix), because reliable data is expensive to collect.
Funny. The Wikipedia home page has a "Language" button. Like that, in plain English. And it is translated to the language you switch to.
tapia · 119d ago
I also hate the youtube "feature" that translates the titles of videos to your setting's language. This is so annoying. I can understand English and don't need these automatic translations.
lucasoshiro · 119d ago
> I can understand English and don't need these automatic translations
I think it is far worse than that:
1. If I don't understand a language, that video is probably not for me. Most videos targeted at an international audience are in English, or at least the author translated them themselves.
2. Titles are small sentences, and they don't have enough context to be translated. Once I saw a video called something like "Vamos assistir uma conexão com o passado", which in Portuguese means "Let's watch a connection to the past". I needed to de-translate it in my brain to understand that the original title was "Let's play A Link to the Past"
3. Online resources are a great way to exercise a second language. So, please, don't underestimate my capabilities. At least let me try to read in the original language by myself; if I need a translation, I know how to use Google Translate or a dictionary.
I reckon that this feature makes access to online content more democratic, and that's OK. But at least let me disable it, since it makes the experience worse.
marcosdumay · 119d ago
There's a video that YouTube keeps sending me with a translated title "O segredo das lavadouras" (which translates to "the secret of washing machines") that is about picking screw washers...
But the real problem is when it decides to translate the titles of some perfectly watchable videos in English into something that uses the Cyrillic alphabet, which has no relation to my accepted languages, and is only used halfway across the world from where I am.
avhception · 119d ago
My computer is set to English even though I'm German, and sometimes Youtube will treat me to this really uncanny machine voice with really weird phrases because it auto-translated some German video or advert.
Lidl is worth it, ja!
FinnLobsien · 119d ago
I absolutely hate this. I have the exact same thing. Even if the technology was good, I speak both languages and want to see the original.
Why is it so hard to just add something as a setting/feature and offer it to people without forcing it on the user?
bunderbunder · 119d ago
> Why is it so hard to just add something as a setting/feature and offer it to people without forcing it on the user?
Office politics. Google is famously "performance-driven", so the manager in charge of that feature needs usage metrics to be high for the sake of their own career.
(Speculating, of course.)
FinnLobsien · 119d ago
That's a funny idea—if the KPI was boosting adoption of a feature and the PM just made that feature the default and suddenly adoption was through the roof.
The sad part is we can't rule that out.
preisschild · 119d ago
This would also be good for movies :D
I can speak German, I don't need forced subtitles for the Nazis
jiehong · 119d ago
Sometimes we do, when actors actually don’t speak German very well (or Russian or Chinese or French)
avhception · 119d ago
I wonder if Lidl or the other advertisers know and approve of this.
FinnLobsien · 119d ago
I mean it's probably somewhere, deep in the ToS but pretty sure if you showed that machine voice to the advertisers they wouldn't approve.
dgb23 · 119d ago
Same here. My native tongue is German, I live in Switzerland, but my settings on all devices are English.
I do this on purpose, because I find everything is more searchable. I don't even know any German terms for most technical things I might search or look for. So even if the automatic translations were good, which they aren't, this would be a non-feature.
My browser already tells them what my preferred language is. Just use it.
netsharc · 119d ago
Living in Zurich (German part of Switzerland for those who don't know), Windows 10 in English, the built-in Microsoft Store used to offer content in... French.
Now it's a mix of German and English, e.g. 1 heading is "Spiele-Bestseller", and the next is "Best selling apps". And prices displayed as "28,00 CHF" (correct would be to use the decimal point).
Just dislike the video and move on. I'm guessing Google wants the uploader penalized, and I do feel sorry, but it's not my problem.
genocidicbunny · 119d ago
Not only the titles, but also the audio track. There are a few YouTubers I regularly watch who are trying to branch out into some additional languages by providing fan-made translated audio tracks, and English is sometimes one of those. Every single time I watch one of those videos, I have to manually set the language back to the original, because often the translations lose some of the wordplay or hidden meaning in the original language. Often it also means I need to rewind the video, because it started playing before all the controls had loaded (because YouTube hates FF with YouTube-related extensions) and before I could swap the language track back.
One of the sister replies linked to an extension to help with that, which I'm going to give a try, but it's annoying that there's not a simple toggle in the youtube settings to tell it to always use the original language. On the rare occasion that I want to use the translated audio track, I can do _that_ on my own; I speak enough languages that this is a very rare occasion with the type of content I watch.
This isn't even something I can understand as them being hostile to ad blocking or wanting to push ads. This is a 'convenience' feature that is just poorly implemented. But I'm sure there's some PM that got a pat on the back for it.
vintermann · 119d ago
I'm sure YouTube's algorithm rewards people for using this feature and making their "content accessible", but if you serve me up an ugly machine translated Norwegian title rather than the English one I could read just fine, that's from my experience a signal that your YouTube channel is low quality algorithm-chasing garbage, so I click "never recommend this channel".
bmn__ · 119d ago
What a catastrophe. You punish the wrong person, and even worse, a channel owner will not even receive that signal! The vast majority of channel owners with English content are not aware of what's going on. A friendly e-mail to the channel owner explaining the problem and asking them to manually disable auto-translation is much more likely to achieve what you want.
If you want to get rid of auto-translation on a systematic level, provide feedback to the operators of Youtube through their official communication.
clan · 119d ago
So what you're saying: Please complain through proper channels and hope they accept your input?
Or should he just keep using the signals he gets and immediately clean up his feed?
I actually see this as a feature. YouTube recommends a lot of garbage. I suggested that they improve my feed, but they implemented this signal instead. I use exactly this method to weed out a lot of content I do not care for.
You cannot tell the 500 pound gorilla anything. I prefer my videos without subtitles. I have that set as a preference. Yet when chromecasting it is common for the subtitles to spontaneously turn on. And has done so for a long time.
English is not my first language and my first language is not widely used. Hence I am not used to dubbed movies/programming and I am used to seeing subtitles.
If only native English speakers could understand what a horror show machine-generated subtitles are. If you are used to subtitles, they are extremely hard to ignore. You will read them and get the meaning (often hilariously wrong) before the audio catches up, and you might end up rather confused.
I can understand that an American might have a hard time watching a subbed German movie. That's natural, because it is not common. But when you grew up with subtitles, it is actually effortless. Except when they are poor. Then it becomes worse, because of the cognitive load of two languages and the effort of figuring out what is correct.
Dear English-only speakers: Translation is hard. A poor translation is worse than no translation, as it obfuscates the message. AI is not there yet at all. Maybe impressive, but often not helpful, or it plain and simply distorts the real message.
eCa · 119d ago
As a fellow non English native speaker, I concur with all of the above. But if you only have time for one sentence:
> A poor translation is worse than no translation
bmn__ · 119d ago
What I wanted to transport is the following idea: attacking a channel owner (who is most likely innocent and did nothing wrong) with a metaphoric sledge-hammer when a more gentle and precise tool will do is not a great way to conduct oneself in society. vintermann and clan have a feed now without content that bothers them, but at the cost of lowering the channel owner's reputation in the eyes of the operators of Youtube, with the effect of slashing recommendations for the videos of the channel owner at large and his earnings. That's not nice, we should be considerate of the consequence of our actions. Does this make sense, do you understand this perspective?
This behaviour rankles me; I think it is on the same level as the misuse of the "report this as spam (to some upstream entity/3rd party)" feature for e-mail messages that are not actually spam.
vintermann · 119d ago
Attacking? By saying "don't recommend this", I'm just saying I want to give someone else the chance to be seen, rather than the ones who will make their stuff objectively worse in order to juice their stats for the algorithm.
I'm sure my "don't recommend this" clicks don't in any way make up for Google's promotion of channels that "make their content accessible", because it doesn't even stop them from recommending me more machine-translated videos.
bmn__ · 118d ago
There will be no understanding if you do not even make a token effort to suppress your egocentric worldview and engage in honest conversation.
clan · 118d ago
Did you?
They base the feed on user input. The feed is then (supposedly) adjusted to what I like.
What I call a signal you call an attack.
I signal that I do not like Minecraft videos. But I do not attack them.
Your anger is misdirected. You should be mad at YouTube because they do not seem to understand that there can be multiple signals at once.
The chance that I click on a Minecraft video is low. An auto-translated one, even lower.
So we differ strongly in opinion on how the platform should work. I read your "attack" argument as saying I should write to the Minecraft creators and tell them their content would be better if they played Minesweeper instead.
I do not punish anyone. I just pursue a clean and (for me) high quality feed.
If you are up in arms that I punish your channel that is another signal that I am probably not your target audience.
When dealing with audiences at scale you need to listen to these signals as handling personal opinions in mails from the discerning viewer is not feasible.
vintermann · 119d ago
The vast majority of videos are not translated into borked machine-Norwegian, so if this isn't something you opt in to, it's something everyone opts out of (I doubt it).
> A friendly e-mail to the channel owner explaining the problem and asking to manually disable auto-translation is much more likely to achieve what you want.
No it isn't, because I see what kinds of channels do this, over and over again. They're very clearly publishers who don't care that they make something objectively worse as long as the algorithm rewards them for it.
> If you want to get rid of auto-translation on a systematic level, provide feedback to the operators of Youtube through their official communication.
Ha, as if they ever read that. Probably more Google employees will read this comment than will ever read any of my (many) feedback reports saying "please stop translating things without asking; I know where to find machine translation if I need it, and doing it without asking means I have to translate back from broken Norwegian into English in order to understand what the hell you were trying to say".
bluGill · 119d ago
The channel I'm interested in would be of great interest to English speakers. There are only a few people brave/stupid enough to travel to dangerous places (Ukraine near the front) to make a documentary. I cannot blame the author for turning on translation; it likely expands his reach overall and is a good thing for those who are not interested in his native language. However, I'm trying to learn his native language, and getting dropped into English without any control over it is not helpful to me.
SpicyLemonZest · 119d ago
Youtube really doesn't make it obvious that a title got auto-translated. I now realize that I've seen this happen before, with a video that had a different title on my TV than on my computer, but up until this very second I thought it was my TV's fault.
Even being aware of this - how do I know that it's an auto-translation, rather than someone making AI slop in my native language, without watching the video?
Narishma · 119d ago
One way to tell is when the video has text in the thumbnail. If it's in a different language than the title, the title has likely been auto-translated.
gus_massa · 119d ago
I guess it's the default option. I've seen a few good channels that have that "feature" enabled. I hate it too.
sph · 119d ago
Now Reddit results are translated as well in Google and Kagi, so you think you have found a relevant answer in your language, but it's just a machine translation of an English post.
hengheng · 119d ago
It leads to foreign-language posts on small English-speaking subreddits as well. I see plenty of Portuguese, Spanish, Italian and German in communities that barely have enough traffic to debate in a single language.
But nobody pays to get answers, so it's alright.
aequitas · 119d ago
At least Kagi seems to be working on a solution[0]. But Reddit seems to be fighting back by translating server-side, so it's no longer detectable.
Thanks for the link, good to know. It gives me a fuzzy feeling to pay for a search engine whose devs you can actually interact with and who are actually working on improving their product.
fifnir · 119d ago
I've been noticing the same; this completely breaks searching for Reddit results for me.
nilslindemann · 119d ago
Try "Reddit Untranslate" addon.
reddalo · 119d ago
I'm a bit fed up with having to use a million plugins to make the web usable.
xeeeeeeeeeeenu · 119d ago
You can filter them out by adding this operator to the query:
-inurl:?tl=
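The operator above works because the translated copies live at URLs carrying a `tl` query parameter. A minimal sketch of doing the same check client-side, assuming (not confirmed in this thread) that dropping `tl` from such a URL leads back to the original-language post; `is_translated` and `untranslate` are hypothetical helper names:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def is_translated(url: str) -> bool:
    """True if the URL carries a `tl` (translation target) query parameter."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(key == "tl" for key, _ in query)


def untranslate(url: str) -> str:
    """Drop the `tl` parameter (assumed to select the machine translation), keep the rest."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "tl"]
    # urlunsplit omits the "?" entirely when the remaining query is empty
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    url = "https://www.reddit.com/r/example/comments/abc123/some_title/?tl=de"
    print(is_translated(url))  # True
    print(untranslate(url))    # https://www.reddit.com/r/example/comments/abc123/some_title/
```

Other query parameters are preserved, so only the translation selector is affected.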
qiine · 119d ago
DuckDuckGo seems to do it as well.
whstl · 119d ago
Yep, this is coming from Reddit itself. It's using different URLs, and they seem to be making an effort to SEO-rank those translations.
int_19h · 118d ago
It's interesting that the quality is so low. You can do very good translations for many languages today even with fairly cheap LMs, but for some reason (cost?) automated translation online seems to be still mostly at Google Translate level.
qiine · 117d ago
It's impressive how poor the end result turns out when zero effort is spent.
captainpiggies · 119d ago
It's even worse for videos with "official" dubs. I have been jump-scared by German and French dubs on certain videos recently; I distinctly remember MrBeast, Mark Rober and Nick DiGiovanni. I have set my language to English and my region to US (worldwide). I don't know what gave YT the idea to preselect these dubs for me; I have seldom even watched a video that is not in English.
whstl · 119d ago
Yep. Youtube is the worst:
- If I select German subtitles for a German video, it will auto-translate all English subtitles to German in the future.
- If I select subtitles for an English video, same.
- If the video has an Arabic, Hindi or French human-made subtitle to help that audience, it shows that to me instead of the automatic captions.
Horrible.
Freak_NL · 119d ago
And you can't turn it off. I really hate this non-feature.
Using Brave on iOS I haven't encountered it yet. Perhaps it strips some information? But with the official YT app I have, and it was both fascinating and annoying.
echoangle · 119d ago
I don't even get the point of that. If I need a translation of the title, I won't be able to watch the video anyway. At least it makes some sense with the horribly auto-translated videos now, but they had title translation for a long time while the video was still in the original language.
immibis · 119d ago
There's been automatic subtitle translation for a long time.
echoangle · 119d ago
Good point, I didn’t even think about that because I would never watch a video in a foreign language with auto-translated subtitles.
anton-c · 119d ago
I get you and normally wouldn't either. I was a little impressed when I could switch between like 10 different languages on the fly. For a foreign-language learner, it seems like it could be pretty helpful.
It's probably most useful for utility or news content. Not 'high effort videos' about an interesting topic. I'm imagining you find a video in another language that fixes a problem you have and can switch to your language to watch.
gus_massa · 119d ago
I sometimes watch videos in English with the automatic subtitles. Sometimes I can't understand a few words and the subtitles help me. Most of the time I watch them without subtitles, and rewind a few seconds to rewatch a short part with the subtitles enabled.
tirant · 119d ago
Worse is the auto-dubbing on some channels, which cannot be disabled. That has resulted in me stopping watching a channel completely, because I can't select the original language (YouTube mobile website).
Thanks! This is great. Although embarrassing for YouTube.
nilslindemann · 119d ago
I can also recommend FreeTube
franga2000 · 119d ago
What I don't get is how the feature works. I see it on veeery few videos, and those are usually highly profitable clickbait and/or big-budget productions, so my assumption has been that this is something the uploader has to enable or even fill out. My language is very "small", so it makes sense that only broadly popular, highly profitable videos would be worth translating the titles for.
Unfortunately, all the translations are machine-translated garbage and there is no setting to turn this off as a viewer, so it's just incredibly annoying.
world2vec · 119d ago
Gosh, this annoys me so much. I am a native Portuguese speaker but have all my settings in English. It always tries to auto-dub Portuguese content into English; how do I turn that off?
Unai · 119d ago
Can't recommend "DeArrow" browser extension enough. YouTube is a miserable experience without it (and its sister extension SponsorBlock).
nilslindemann · 119d ago
It is not just that these translations are not needed; they are often (in my case, German) of low quality, contain errors and lose information that the original language contained. And the robot voices lose all the interesting modulation of the original voice. Even a Fireship video sounds terrible when translated.
preisschild · 119d ago
Even Reddit does that now, and for some reason it shows the translated version by default when I find a post through Google.
Very annoying, because instead of just seeing the English post, which I'm easily able to understand, I see half-broken German...
littlecranky67 · 119d ago
So much this. I suspect the idea that a person speaks more than one language is absent in US Silicon Valley. Otherwise I can't explain why YouTube only lets you set one language. Heck, even Google allows you to configure all the languages you speak in your account, the very same Google account you use for YouTube. Yet YouTube ignores it and has its own settings.
tuetuopay · 119d ago
I think the issue is not speaking more than one language, but not preferring your native language over the original content's language. This is a very American-English-centric view of the world, where content is made for your language and your demographic. Consumption from outside the US is the exception.
In the rest of the world, and especially in Europe, this is the norm, not the exception. On one hand there is the prevalence of US English media (hello Hollywood) and US English literature (especially in tech); on the other hand, cross-consumption between EU countries is much more common.
Oh, and the content ends up being in English too, because that's how to reach many people. We don't want those to be translated, because we don't want a double translation. This is something that the US / Silicon Valley mind cannot comprehend.
troupo · 119d ago
> I suspect the idea that a person speaks more than one language is absent in US silicon valley.
Which has been baffling to me considering how many foreigners work at these companies.
genocidicbunny · 119d ago
I think it's more a matter of "why would they have their system language set to X if they speak Y? If they want Y, they should just set their system language to Y!"
It's the idea that the user has a preference for something, and it applies always and everywhere, even when it's not applicable.
littlecranky67 · 119d ago
It should be absolutely clear: when I speak English and German, do not auto-translate any video title in one of those languages into the other. You wouldn't believe how bad the translations are, and how unwanted by me (the user). It's worse when you speak a third or fourth language and tend to watch videos in them. It gets messy.
genocidicbunny · 119d ago
Yeah, I know. I watch videos in six different languages and the automatic translations are pretty universally bad.
Denvercoder9 · 119d ago
> why would they have their system language set to X if they speak Y? If they want Y, they should just set their system language to Y!
If only they respected my system language. All my language settings are set to English, yet I routinely get autotranslated crap to my native language.
genocidicbunny · 119d ago
It was more of an example of how they pick up on _some_ signal about a user's language preference, then arrogantly assume their decision is correct, and that it's the user's fault if they assumed wrong.
int_19h · 118d ago
This is actually something that foreigners working at Big Tech US companies should be able to understand very well, because English as a system language is often how software developers set things up for themselves regardless of their native language.
But they don't make those decisions. It's a UX thing, which means that in practice whoever is in charge of "driving up the numbers" is going to be making the decision; the engineers just get to cuss while implementing it.
fifnir · 119d ago
> I suspect the idea that a person speaks more than one language is absent in US silicon valley.
Exactly, it's like they've never left their own state levels of ignorance
b3lvedere · 119d ago
While I appreciate the effort Mark Rober puts into making his YouTube videos multilingual, I absolutely hate that native voice. It's one of the few YouTube Shorts I have to play twice because of it.
joseda-hg · 119d ago
Jesus H Christ. Once a month I google to see if there's a proper way of stopping this (you can block it with TamperMonkey).
Two things absolutely kill my experience:
1. Messing with titles, especially if the contents of the video are still in a different language. Which Kurzgesagt will I get today? Only YouTube knows. This is annoying if I know the YouTuber could use a different language in the title to make a joke.
2. Messing with the default audio tracks. I don't mind if the YouTuber has a dubbed track, that's awesome for getting more exposure, but I already know and expect a specific voice and it's extremely jarring.
I know what Mark Rober sounds like, leave it be.
mft_ · 119d ago
By the by, one quirk of poor language handling that I think is probably harming YouTube is advert language.
Every one of my subscriptions is an English language channel, and my language choice on all Google properties (where possible) including YouTube is English. It's not hard to judge that English is my favoured language.
And yet... every video advert I receive when travelling is served in the local country's language. It doesn't especially bother me, since I actively avoid listening to adverts (and indeed now pay for Premium Lite to avoid them almost altogether), but it's a weird not-so-edge case that I'd have thought a company as large as Google would have addressed already. They've absolutely got the tech to deliver adverts in any language. (And it could be powerful: imagine receiving adverts for local businesses in your native language while on holiday.)
harshreality · 119d ago
Do you have en-US or en-GB as an alternate, lower-priority language?
If an English variant is in your Accept-Lang: headers, I'd hope YT wouldn't auto-translate English titles.
The other thing that Google might properly use is account-specific language settings. But if they're using GeoIP as has been suggested, I agree they're doing it wrong.
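Honouring the header is not much work. A minimal sketch, assuming a site only auto-translates when none of the user's listed languages (with q > 0) shares the content's primary language subtag; `parse_accept_language` and `should_auto_translate` are hypothetical names, and the parser skips edge cases of the HTTP grammar such as whitespace around `=` or extra parameters:

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse e.g. 'de-DE,de;q=0.9,en;q=0.8' into [(tag, q), ...], highest q first."""
    entries = []
    for part in header.split(","):
        tag, _, params = part.partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q= is given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((tag, q))
    # list.sort is stable, so ties keep the order the browser sent them in
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


def should_auto_translate(content_lang: str, header: str) -> bool:
    """Translate only if the user accepts no variant of the content's language."""
    primary = content_lang.split("-")[0].lower()
    for tag, q in parse_accept_language(header):
        if q > 0 and (tag == "*" or tag.split("-")[0] == primary):
            return False
    return True
```

With `de-DE,de;q=0.9,en;q=0.8`, English titles stay untouched and only, say, French ones would be candidates for translation; an explicit `en;q=0` is the user's way of saying "never English".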
lxgr · 119d ago
> If an English variant is in your Accept-Lang: headers, I'd hope YT wouldn't auto-translate English titles.
Your hope is unfortunately entirely misplaced. Google is one of the worst offenders for assuming language and region from users' IP.
thrance · 119d ago
Hear! Hear! It enrages me. They also automatically turn the subtitles ON, so you constantly have to disable them. There is no way for multilingual users to add a list of the languages they understand, which is an insane limitation that's been driving me crazy for years at this point. Wtf are they even working on at YouTube's HQ? Making video thumbnails larger still?
Jotalea · 119d ago
And most of the time, the translation misses a core part of the title, making it harder or even impossible to understand. A recent example is "I booted windows from Google Drive (part 2)", which got translated to "inicié ventanas de Google Drive" ("I started [glass] windows from Google Drive"), which misses the whole point. Luckily for me, the thumbnail said what the video was about, and I could understand and watch it.
About the translation: sure, "boot" ≈ "iniciar" and "window" = "ventana", but for (Microsoft) Windows, and other names in foreign languages, the same name must be kept.
cenamus · 119d ago
I mean, I can deal with the titles, but recently it has been auto-selecting machine-translated audio tracks, without any way to disable it. And they're bad, maybe one level above a 2010 phone TTS system.
bunderbunder · 119d ago
So much this. I also hate the implicit assumption that everyone understands just one language that's baked into this kind of design. I can comfortably read in four languages, and naturally prefer the original language to (typically) bad localization. So your guess at what language I would prefer based on my IP is virtually guaranteed to be wrong. Seriously, just look at what languages I'm already telling you work for me. There's no reason to assume I'm not smart enough to correctly configure my browser's language settings.
It gets even worse with YouTube and their awful AI dubbing that's always on by default. So now for solidly half the videos I watch, I need to (1) open it, (2) click through the settings to turn off the AI dubbing, and then (3) rewind back to the beginning and start over. It doesn't take a lot of time, but it's incredibly annoying.
danhau · 119d ago
YouTube's AI dubbing is truly awful. It took me 5 minutes to realize the audio I was hearing was coming from the video itself, and not some random ad in a different tab.
I also don't like that video titles are shown translated. It's so weird when I'm watching a video in spoken English, yet the title is in a different language.
tuetuopay · 119d ago
It's especially painful since they don't translate all subtitles.
I'm French, but my browsers are configured for English. Some (but not all!) English titles are translated to French, and some French titles are translated to English. This has actually caused me to miss a few uploads from channels I'm subscribed to, as the language of the title is part of my mental filtering for those channels.
bluGill · 119d ago
I even have Spanish set in Google as one of my known languages and it still auto translates everything to English. I'm not fluent in Spanish, but one of the well known good ways to get there is to consume native content, but YouTube makes that hard.
doix · 119d ago
My biggest annoyance is with Google. They know who I am, they know I am traveling, they know my language preferences (English) and yet I still get language based on my location on certain pages.
I let you track me, Google; please use it for some good UX and not just advertising.
kiliancs · 119d ago
Indeed. Catalan speakers have Spanish forced down their throats even if Spanish has never been associated with the Google account in any way, nor with the system or browser language preferences.
In my case, I live in the United States, but Google is determined to serve me Spanish results even for Catalan-related queries. E.g. preferring the Spanish Wikipedia. The search engine's behavior has had ups and downs over the years, but it has never been great.
This is very much a problem for my children, who don't understand Spanish, as well as for the Catalan-speaking regions of the world that are not in Spain, including Andorra.
In my experience, Gemini easily flags any Catalan content as unsafe and prevents the conversation from continuing. Even for prompts like "summarize this article". This may have improved lately, but still.
Google used to be an example in sensitivity to the world's diversity, being a responsible major player. Way back. Now, although I applaud some efforts multiple teams continue making, it is obvious this is no longer a priority.
Jotalea · 119d ago
>In my experience, Gemini easily flags any Catalan content as unsafe and prevents the conversation from continuing.
I'm curious, what are the keywords that trigger that?
dkjaudyeqooe · 119d ago
Indeed, somehow Google is the worst offender with this.
Lately they've decided that auto translating the local language into English in Maps reviews is the wrong thing to do. They translate every other language into English but somehow since I live in this place I must speak the local language too, so I don't need that in English.
Ditto for search results. Surely you want Wikipedia in the local language! I mean you've been there for so long! You search for things in the local language, surely that's a sign of your preference and not the fact that searching for things locally requires use of the local language.
This also applies to so much other "we must make our software so smart and guess all your preferences". Google fails so consistently at this I cannot understand why they persist other than some sort of misplaced corporate self regard.
edarchis · 119d ago
I've had this argument with a Google Developer.
He told me that for efficiency, they had different stages in content rendering, and that the main page structure didn't have your user information yet. That's rubbish IMHO, because the Accept-Language header should be readily available in that phase.
MoreQARespect · 119d ago
I've seen similar dysfunctions in other big orgs where a feature or bugfix would need to cross team boundaries and the outcome inhabits zones of vaguely defined responsibility.
The guy you argued with sounds like they were semi-justifying this with the typical "noogler" rose colored glasses.
Zak · 119d ago
> He told me that for efficiency, they [had to make a broken product]
That's called premature optimization.
scotty79 · 119d ago
My worst experience was that after arriving in a new country, the Play Store didn't show local apps because my Google account was assigned to the old country. Changing the country wasn't easy and meant abandoning the old country and its apps. Since I travel back and forth a bit, I ended up buying a second phone and creating an account for the new country.
tamirzb · 119d ago
This is indeed extremely annoying and I never understood why so many apps are configured to only be available in specific countries. Like what at all do they stand to gain doing this?
Google will then go on to complain about users installing APKs from shady sources but this practice pushes users to do so. I'm sure a decent amount of users ended up with malware on their phones just because they wanted to install an app that wasn't available in their listed country.
makeitdouble · 119d ago
You solve it the best way to fit your case I guess. On android I created a set of alternative accounts that each belong to a different country.
All accounts can be active at the same time on the same phone; there is a dropdown to switch in the Store app, and that works even with a work profile on the side. I've yet to see real downsides, except of course remembering which account is for which country and switching manually.
scotty79 · 119d ago
Thank you. I may try the same when the time comes to ditch the old phone.
aembleton · 119d ago
It's strange: Google knows I live in the UK and speak English. When I'm signed in to a TV in a hotel room in Spain watching English YouTube videos, it shows me Spanish adverts. It feels really silly when I don't understand them and they know full well that I don't; still, they can charge the advertisers.
pona-a · 119d ago
When I was in Romania for my IELTS, I could not use Google Maps. My Google account specifies my preferred languages as English, Ukrainian and Russian, in that order, and my Accept-Language header was set to English only, yet that was not enough: Google discounted those preferences as a configuration error and served me Romanian.
Using Google Search, which luckily did not decide to show me "local" results for an English query as it often does at home, I found a support thread suggesting I set my Accept-Language to include something other than English as a second language. Lo and behold, the page then decided to respect it.
knorker · 119d ago
Yeah it's amazing that Google is the worst offender.
I think this is because half of Google live their entire career in California, so they don't know about other languages, units, time zones at all.
It's weird, because they employ SO many foreigners, bringing them to California. But somehow upon arrival they all get memory wiped about the existence of anything outside the bay area.
Other companies do this right. Google is user hostile.
No. I will NEVER navigate by bike, foot, or public transport in these strange America-only units.
theshrike79 · 117d ago
About a decade ago Google decided that all maps in Finland should have the street names in Swedish.
Which is kinda valid, in the southern and south-western parts this is done because there is a significant Swedish-speaking minority so most cities and streets have names in both languages.
But at the time I lived in central Finland, where the streets DIDN'T have official Swedish names, they just ... translated them. Which was super fun for navigating.
yonatan8070 · 119d ago
That's so annoying, every time I'm on a new device/browser, Google and all their services start in Hebrew. Even though I'm signed in and have changed it to English a million times already. It's not that I can't read it, I'd just rather have everything in a universal language rather than a translation
lblume · 119d ago
What incentive does Google have to improve UX in this way? I absolutely agree that it should, but the people for whom it matters are (1) completely insignificant wrt the whole user base and (2) mostly care about tracking and try to circumvent it.
plastic3169 · 119d ago
There are 700+ million people living in Europe. The countries are tiny, and most have a bunch of official languages. The fix would be to use the user's selected language and not flip-flop based on location. IP-based location guessing doesn't even get the country right here.
bluGill · 119d ago
That is not an incentive. There is nothing in it for Google.
It is of course useful for those 700+ million, but they are not Google's customers; they are users/the product. As long as you won't go elsewhere (en masse), you don't matter.
account42 · 112d ago
They have the same incentive as they had for adding localization in the first place.
OtherShrezzing · 119d ago
>(1) completely insignificant wrt to the whole user base and
At any one time, there's got to be tens of millions of people accessing Google from a country which has a primary language unknown to the traveller. Even if this number is insignificant compared to Google's full user base, the cost for Google to service 20-30mn people with a feature is presumably lower than their annual ad revenues across 20-30mn people.
sneak · 119d ago
Sad fact: most people don’t go anywhere.
People like us are an edge case.
j16sdiz · 119d ago
One doesn't need to travel to be multilingual.
Many EU countries have more than one official language.
Most former colonies are bilingual.
sneak · 119d ago
I was replying to doix, not TFA.
account42 · 112d ago
Still applies. I don't want to have google forcing the "local" language on me even in the country I live in (which is also the country on my passport).
xorcist · 119d ago
Much more importantly: Never ever auto-translate content to the user's language.
Present what languages you actually have the data in. The user is smart enough to click the "translate" button in the web browser should they want. That translation is also likely to be better quality.
English is not my first language. Or my second. But I understand it well enough to work in it every day. And I never ever want to wade through auto-translated garbage just to find the right button to read the original English version. Because for some reason this is only a problem for English, web sites using other locales never do it, which should be indication enough that international visitors hate it.
If you ever think about using machine translation tools for your web site, first you must do a full translation round trip for every language before publishing. Translate, paste back the result and translate back. That is roughly what you intended to publish. Don't do it.
dominicrose · 119d ago
I hate Reddit because it auto-translates to French, and users have a specific style that would be hard or impossible to translate well even for a human.
Even if I didn't speak English the translated content would not be worth it.
The feature to change the language or show the original content is hard to find, and it depends on whether it's the app, mobile web or desktop web.
They also try too hard to make us download the app, but that's another issue.
carlosjobim · 119d ago
Machine translation has been good enough for a few years, that even native speakers don't notice it.
lesostep · 117d ago
Which makes it soo much worse, actually.
I often go through a few sentences or even entire paragraphs before stumbling onto something that doesn't make sense. And then I have to go back and reread everything.
Microsoft documentation is the worst offender. Their "some parts were translated automatically" notice is hard to, well, notice. And their translation is great! — until it doesn't work, because UI has different translation.
carlosjobim · 117d ago
How would it be better if you couldn't read it or write it?
account42 · 112d ago
The whole point is that in many cases we can read the original language just fine and have our browser set up accordingly, but these companies think they know better.
int_19h · 118d ago
If done with SOTA LLMs, yes, depending on the language (although even then I would dispute the "even native speakers don't notice it" part).
But SOTA LLMs cost money, so what you usually get in practice for auto-translation at scale (like YouTube) is around Google Translate level, which is hardly good enough.
carlosjobim · 118d ago
I don't know what level you consider DeepL or Kagi Translate, but those two can translate at a level which is indistinguishable from a native speaker.
Well, it's in an order, but I don't know about alphabetical. I clicked on today's English featured article and looked at the languages: "中文", "Italiano" are "suggested", then the remainder are grouped by geographic region, and aren't particularly alphabetical. They appear to be in groups which are still not alphabetical. Europe seems to have a Cyrillic group but "Қазақша" is shown after "Українська" which isn't accurate in Kazakh and probably also unexpected for anybody who isn't familiar with the letter Қ (Қ isn't a letter in Russian, this is probably why this happens). The Chinese languages don't seem to be in stroke order (no expert here), although Korean is below them (because of course, K for Korean alphabetizes after C for Chinese).
Anyways, no hate for Wikipedia; they do a great job of localizing. Just a bit of nuance/pedantry about how you can't "alphabetize" language names in their own language.
Not so, this sort order has been standardised as part of Unicode for at least 28 years. To see it in action, pipe the list of languages as a text file through a conforming tool like `ucsort`. When Қ is falsely sorted after Ч, then the wrong algorithm or no algorithm at all has been used.
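The misplacement described above is exactly what a plain code-point sort produces. A small illustration: Python's built-in string ordering compares code points and does not implement the Unicode Collation Algorithm (that would need ICU or a similar library), and Қ is encoded at U+049A, after the whole basic Cyrillic А to я range (U+0410 to U+044F):

```python
# Built-in str ordering compares code points, not any language's alphabet.
names = ["Українська", "Қазақша", "Русский", "Беларуская"]

for name in sorted(names):
    print(f"U+{ord(name[0]):04X} {name}")
# U+0411 Беларуская
# U+0420 Русский
# U+0423 Українська
# U+049A Қазақша

print(ord("Қ") > ord("Ч"))  # True: Қ lands after Ч as well
```

A conforming collation should instead put Қазақша with the К-words, ahead of Русский.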
> because of course, K for Korean alphabetizes after C for Chinese
That's not how it works.
You have to either make some best-guess approximation (IP geo, browser headers, etc) or use a locale-invariant sort, both of which will be wrong in some cases.
Incidentally, the expected position of "Ä" can vary wildly. Is it an umlauted A, normalized as AE, or a distinct letter coming after Z?
Or suppose there are languages Ä₁ and Ä₂, where in Ä₁ the ‘ä’ is the umlauted ‘a’ and in Ä₂ it’s a distinct letter. The language list would be displayed as:
A Ä₁ B C Z Ä₂
The only problem / corner case would be such a language Ä₀ that would e.g. sort ‘ä’ before ‘a’. I would still put it after, since it’s where most other readers would expect to find it.
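The scheme above places each name according to its own language's rule for 'ä'. It can be sketched in Python as follows. The two-rule model is a hypothetical simplification: 'ä' is either an accented 'a' or a separate letter after 'z'. The single-letter names are also illustrative. Real tailorings differ in many more ways.

```python
import string

# Primary weights for the plain Latin letters: a=0 ... z=25.
BASE = {c: i for i, c in enumerate(string.ascii_lowercase)}


def sort_key(name, umlaut_is_separate_letter):
    """Collation key for `name` under its *own* language's rule for 'ä'.

    Hypothetical two-rule model: either 'ä' is an accented 'a' (same
    primary weight, breaking ties after plain 'a'), or it is a separate
    letter placed after 'z'. Raises KeyError for letters outside a-z/ä.
    """
    primary, secondary = [], []
    for ch in name.lower():
        if ch == "ä" and umlaut_is_separate_letter:
            primary.append(len(BASE))  # 26: after 'z'
            secondary.append(0)
        elif ch == "ä":
            primary.append(BASE["a"])
            secondary.append(1)  # accented variant sorts after plain 'a'
        else:
            primary.append(BASE[ch])
            secondary.append(0)
    return (tuple(primary), tuple(secondary))


# (label shown in the menu, name to collate, does this language treat 'ä' as its own letter?)
languages = [
    ("Z", "z", False),
    ("Ä₂", "ä", True),
    ("C", "c", False),
    ("Ä₁", "ä", False),
    ("B", "b", False),
    ("A", "a", False),
]

ordered = sorted(languages, key=lambda entry: sort_key(entry[1], entry[2]))
print(" ".join(label for label, _, _ in ordered))  # A Ä₁ B C Z Ä₂
```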
OT, but this looks like an adorably blushing hen to me
(Do yourself a favor, though, and use the CLDR root collation instead of the raw DUCET—they are basically the same, except, and I’m quoting the standard here[1], “the DUCET is not entirely well-formed”.)
[1] https://www.unicode.org/reports/tr10/#Well_Formed_DUCET
A reasonable approach might be to sort the list of names using, as the sort keys, the strings decomposed with Unicode normalization (NFD), stripped of their combining marks, and folded to upper case. Then Čeština gets mapped to CESTINA and at least appears among the C's.
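Here is a minimal Python sketch of that key, using only the standard `unicodedata` module. Normalization alone does not remove the caron. Decomposition (NFD) splits Č into C plus a combining mark, and that mark then has to be dropped. Letters with no canonical decomposition (Қ, ø, ł) pass through unchanged. So this helps with accented Latin letters, but not with the Kazakh case discussed above.

```python
import unicodedata


def fold_key(name):
    """Sort key: decompose (NFD), drop combining marks, then upper-case."""
    decomposed = unicodedata.normalize("NFD", name)
    base_letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_letters.upper()


names = ["Deutsch", "Čeština", "English", "Català"]
print(fold_key("Čeština"))          # CESTINA
print(sorted(names, key=fold_key))  # ['Català', 'Čeština', 'Deutsch', 'English']
```

Note that this deliberately files Č among the C's, which is not where a Czech reader would expect it.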
Also, we regard 'Ch' as its own letter. So yeah, try sorting alphabetically. I'll wait.
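Assuming the commenter is Czech, 'ch' is a single letter that sorts after 'h'. A collator therefore has to read two characters as one, which collation terminology calls a contraction. Below is a simplified Python sketch. Its alphabet omits the accented vowels, which real Czech collation treats as secondary differences rather than separate letters.

```python
# Simplified Czech alphabet (assumption: á, é, ě, í, ó, ú, ů, ý, ď, ň, ť omitted).
ALPHABET = "a b c č d e f g h ch i j k l m n o p q r ř s š t u v w x y z ž".split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def czech_key(word):
    """Map a word to letter ranks, reading 'ch' as one letter after 'h'.

    Raises KeyError for letters outside the simplified alphabet.
    """
    w = word.lower()
    key, i = [], 0
    while i < len(w):
        if w.startswith("ch", i):  # contraction: two characters, one letter
            key.append(RANK["ch"])
            i += 2
        else:
            key.append(RANK[w[i]])
            i += 1
    return key


words = ["chata", "hrad", "cesta", "čaj", "iglu"]
print(sorted(words))                 # ['cesta', 'chata', 'hrad', 'iglu', 'čaj']
print(sorted(words, key=czech_key))  # ['cesta', 'čaj', 'hrad', 'chata', 'iglu']
```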
If you want to see bizarre sort rules, look up how French sorts accented characters.
I tried to do this, but there do not appear to be any sources addressing this question.
I did find a French Stack Exchange question asking for this exact information, and complaining that there are no sources (other than an uncited wikipedia page) that address it. There is no answer posted, but there is a comment from a French guy suggesting that there are no official rules.
https://french.stackexchange.com/questions/54217/french-dict...
How were you imagining I would look this up?
Or a more technical version at https://www.unicode.org/reports/tr10/#Backward
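The "backward" rule in the linked section can be sketched in a few lines of Python. This is a simplified two-level model, not the real UCA weights. Base letters are compared first, and accents only break ties, compared by the code points of their combining marks. With `backward_accents=True`, the accent comparison starts from the end of the word.

```python
import unicodedata


def two_level_key(word, backward_accents):
    """(primary, secondary) sort key: base letters first, then accents.

    Simplified: accents are compared by the code points of their combining
    marks, not by real UCA secondary weights.
    """
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        primary.append(decomposed[0])     # base letter, e.g. 'o' for 'ô'
        secondary.append(decomposed[1:])  # combining marks, '' if unaccented
    if backward_accents:
        secondary.reverse()  # accents near the end of the word decide first
    return (primary, secondary)


words = ["côté", "coté", "côte", "cote"]
forward = sorted(words, key=lambda w: two_level_key(w, backward_accents=False))
backward = sorted(words, key=lambda w: two_level_key(w, backward_accents=True))
print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']
```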
Another case that is kind of weird is Thai: https://www.unicode.org/reports/tr10/#Rearrangement
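Here is a rough Python sketch of the Thai case. The prevowels เ แ โ ใ ไ (U+0E40 to U+0E44) are written before a consonant but pronounced after it, and dictionary order goes by the consonant. The sketch swaps each prevowel with the next character before comparing. That illustrates the idea behind the linked section; it is a simplification, not the UCA's actual mechanism.

```python
# Thai prevowels: written before the consonant they are pronounced after.
THAI_PREVOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")  # เ แ โ ใ ไ


def rearranged(word):
    """Swap each prevowel with the following character so the consonant sorts first."""
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in THAI_PREVOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


words = ["ขา", "ไก่"]  # "leg" (starts with ข), "chicken" (consonant ก, written after ไ)
print(sorted(words))                  # ['ขา', 'ไก่']  code point order
print(sorted(words, key=rearranged))  # ['ไก่', 'ขา']  consonant order
```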
I notice that post suggests that Académie française specifies that accents should be sorted in reverse, and includes a link over the words "Académie française", and yet that link doesn't go to a supporting document.
A while ago I complained on this forum that Amazon's hyphenation for Kindle ebooks is abysmally bad. (Which is still true.) Someone responded to say that the hyphenation algorithm for English requires this. I pointed out that the hyphenation algorithm for English is a lookup table; each word has its hyphenation defined in the table, and when you need to hyphenate a word, you look up the hyphenation points.
Another response linked me to a paper describing how this table can be stored as a set of rules that provide hyphenation points in arbitrary letter sequences rather than dictionary words. That paper is very clear about its goals; it is an advance in data compression, proposing a method of storing a lookup table that takes less space than the table does. It carefully goes over how to produce the ruleset from the table.
But somewhere along the line, people confused the data compression algorithm (of storing the lookup table as a ruleset) for the hyphenation algorithm. They will now tell you with a straight face that a single ruleset that seems to have gone around represents the hyphenation algorithm for English, even if the word you want to hyphenate wasn't in the table that that ruleset was prepared from. And this is false.
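Here is a Python sketch of the distinction. It uses a hypothetical one-word table and a hypothetical single pattern in the TeX-style format, where digits sit between letters and an odd digit allows a break. The pattern reproduces the table. It also hyphenates a made-up string the table knows nothing about, and nobody ever checked that answer. Real implementations also enforce minimum fragment lengths at the word edges, which is omitted here.

```python
# Hypothetical one-entry lookup table: "the hyphenation" in the comment's sense.
TABLE = {"hyphen": "hy-phen"}

# Hypothetical pattern: the odd digit 3 allows a break between 'y' and 'p'.
# It was chosen so that it reproduces TABLE.
PATTERNS = ["hy3ph"]


def parse_pattern(pattern):
    """Split 'hy3ph' into letters 'hyph' and inter-letter values [0, 0, 3, 0, 0]."""
    letters, values = "", [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters += ch
            values.append(0)
    return letters, values


def hyphenate_by_table(word):
    return TABLE.get(word)  # None: the table has no answer for this word


def hyphenate_by_patterns(word):
    rules = dict(parse_pattern(p) for p in PATTERNS)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    points = [0] * (len(padded) + 1)
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = rules.get(padded[start:end])
            if values:
                for k, v in enumerate(values):
                    points[start + k] = max(points[start + k], v)
    # points[m + 1] is the value just before word[m]; odd means "break allowed".
    pieces, last = [], 0
    for m in range(1, len(word)):
        if points[m + 1] % 2 == 1:
            pieces.append(word[last:m])
            last = m
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate_by_table("hyphen"), hyphenate_by_patterns("hyphen"))  # hy-phen hy-phen
print(hyphenate_by_table("xhyphz"), hyphenate_by_patterns("xhyphz"))  # None xhy-phz
```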
It looks to me like something similar has happened in English speakers' understanding of French sorting order. It's very easy to explain why the example quadruplet has the sorting order it does:
cote, côte, coté, côté
(Note that the Stack Exchange question from 2024 and the blog post from 2004 use exactly the same example.)
These four words have two pronunciations, and the pronunciations are grouped with each other. After that, "cote" comes first by virtue of bearing no accents, and "o" comes before "ô" for the same reason.
What's happening here is that although French generally pretends that "e" and "é" are the same letter, they aren't, which forces -e (not pronounced) to come before -é (pronounced!). "o" and "ô" actually are the same letter, and can be ordered flexibly.
The rule "sort the accents in reverse" arises as a coincidence; it happens to be the case that this distinction is most significant at the end of French words. But French speakers would reject this ordering:
This doesn't come up because those words don't exist.
Of course at some point Unicode needs to be ordered, but you don’t get to impose technical details on people around the world just because they match how English does it.
That’s where geo-ip guessing becomes relevant. Show a list with the most likely languages at the top.
But in the language native locale, no.
U+005A LATIN CAPITAL LETTER Z
U+010C LATIN CAPITAL LETTER C WITH CARON
U+0395 GREEK CAPITAL LETTER EPSILON
They are for me. In the Asia section, 中文 ["Chinese"] is listed first, followed by 吴语 ["Wu"] and then 粤语 ["Cantonese"]. Stroke order is first by stroke count and then by an obscure criterion that I don't know (and that, in my experience, Chinese people living in China also don't know), but stroke count is unambiguous and these are in order: 中 4, 吴 7, 粤 12.
Note that they aren't in alphabetical order: 中 Z, 吴 W, 粤 Y.
Japanese appears between Wu and Cantonese for unclear reasons.
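Here is a small Python sketch of the two orderings. It uses only the stroke counts and pinyin initials quoted above, entered by hand because no stroke data ships with Python. It models only the first criterion. Ties within the same stroke count would need the secondary rule that, per the comment, nobody seems to know.

```python
# First-character data taken from the comment above.
STROKES = {"中": 4, "吴": 7, "粤": 12}
PINYIN_INITIAL = {"中": "Z", "吴": "W", "粤": "Y"}

names = ["粤语", "中文", "吴语"]
by_strokes = sorted(names, key=lambda n: STROKES[n[0]])
by_pinyin = sorted(names, key=lambda n: PINYIN_INITIAL[n[0]])
print(by_strokes)  # ['中文', '吴语', '粤语']
print(by_pinyin)   # ['吴语', '粤语', '中文']
```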
中文 - 中 = 丨 + 3 strokes
吴语 - 吴 = 口 + 4 strokes
文言 - 文 = 文 + 0 strokes
日本語 - 日 = 日 + 0 strokes
粵語 - 粵 = 米 + 7 strokes
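Sorting by (radical, residual strokes) can be sketched the same way in Python. The radicals and residual counts are the ones listed above; the radical numbers are the standard Kangxi numbering, added by hand. Under this key, 日本語 falls between 吴语 and 粵語. That would be consistent with where Japanese was observed in the list, though this does not show that Wikipedia actually uses the rule.

```python
# (Kangxi radical number, residual strokes) for the first character of each name.
RADICAL_STROKE = {
    "中": (2, 3),    # 丨 (radical 2) + 3
    "吴": (30, 4),   # 口 (radical 30) + 4
    "文": (67, 0),   # 文 (radical 67) + 0
    "日": (72, 0),   # 日 (radical 72) + 0
    "粵": (119, 7),  # 米 (radical 119) + 7
}

names = ["粵語", "日本語", "中文", "文言", "吴语"]
ordered = sorted(names, key=lambda n: RADICAL_STROKE[n[0]])
print(ordered)  # ['中文', '吴语', '文言', '日本語', '粵語']
```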
For example, I have a book of 成语 stories that gives its table of contents in non-alphabetical order. (Since nobody understands the traditional ordering, I also have several such books that put their table of contents in alphabetical order.)
Here is the collation order in the book:
一 七 八 入 九 人 口 千 小 三 亡 大 不 专 天 井 见 毛 月 文 风 为 心 水 四 ...
Note that 三's radical is 一, the first Kangxi radical, and that 一 is listed first. Your theory is wrong. 三 isn't even first among the 3-stroke characters, which start (among these) with 口.
Why did you make up a false answer to this question?
I didn't say sorting is never done by stroke count alone. But I have seen radical+residual stroke count much more often than stroke count alone. Probably a result of the content I'm accessing. It's mostly Japanese and not intended for children.
The dictionary and non-dictionary sorting distinction that you make doesn't sound like a real thing. The audience, the country, and the number of items sorted are bigger factors. But you're not wrong in that stroke count is sometimes used alone.
I can't explain that because it's part of a different logical group, with its name written in a different script.† This puts it parallel to the Chinese options and to Korean.
> The Wikipedia sort for the languages is as I stated above
I took you to be describing the sort order for characters, not for wikipedia. Wikipedia doesn't obey that order either. You can check the page for Jiangsu, where all of the languages mentioned so far appear before the "Latin alphabet" style languages, but 閩南語 and 閩東語 appear after them.
† I also can't explain why wikipedia seems to have chosen 吴语 but 粵語, 客家語, and 贛語. Jiangsu is on the mainland... and so are Jiangxi and Guangdong.
This depends on whether you are viewing desktop site or mobile site. It also depends on if you have a non-default skin set in your preferences.
Seems like desktop (vector-2023) does the region thing.
Mobile does alphabetical by language name (I imagine code point order, but I didn't check).
Some other skins are alphabetical by bcp47 code.
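Here is a quick Python illustration of how much these orderings differ, using an arbitrary handful of languages. Sorting "by name" here means raw code point order, which is only a guess at what the mobile site does.

```python
# (BCP 47 code, autonym, English name)
LANGUAGES = [
    ("zh", "中文", "Chinese"),
    ("nl", "Nederlands", "Dutch"),
    ("ru", "Русский", "Russian"),
    ("de", "Deutsch", "German"),
    ("el", "Ελληνικά", "Greek"),
    ("ja", "日本語", "Japanese"),
    ("en", "English", "English"),
]

by_autonym = [a for _, a, _ in sorted(LANGUAGES, key=lambda entry: entry[1])]
by_code = [a for _, a, _ in sorted(LANGUAGES, key=lambda entry: entry[0])]
by_english = [a for _, a, _ in sorted(LANGUAGES, key=lambda entry: entry[2])]

print(by_autonym)  # ['Deutsch', 'English', 'Nederlands', 'Ελληνικά', 'Русский', '中文', '日本語']
print(by_code)     # ['Deutsch', 'Ελληνικά', 'English', '日本語', 'Nederlands', 'Русский', '中文']
print(by_english)  # ['中文', 'Nederlands', 'English', 'Deutsch', 'Ελληνικά', '日本語', 'Русский']
```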
But additionally, I like how it's not simply "pushing to the top", it does shows a previously selected language on the top, but it still keeps it duplicate in the list below, in case the user is going by muscle memory.
To me this is the best way.
Either make it VERY OBVIOUS that you're removing the item from the bottom of the list (which wouldn't be possible here), or don't remove it at all.
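The behaviour described above pins recent choices on top and leaves the full list unchanged. Here is a minimal Python sketch; the function name and the `---` separator are illustrative only.

```python
def build_menu(all_languages, recent):
    """Recent picks on top, followed by the full list left untouched.

    Nothing is removed from the full list, so a user relying on muscle
    memory still finds every entry in its usual place.
    """
    pinned = [lang for lang in recent if lang in all_languages]
    return pinned + ["---"] + list(all_languages)


menu = build_menu(["Deutsch", "English", "Français", "Nederlands"], recent=["Nederlands"])
print(menu)  # ['Nederlands', '---', 'Deutsch', 'English', 'Français', 'Nederlands']
```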
If I had a cent for each time a SaaS made my life harder by trying to "help me" I would be CEO of every SaaS I use.
Was this about 6 weeks ago?
Not that I know of. It just happened to me, too, around then. I thought it had to do with my pet fascination with the Ethiopian civil war and GERD.
Perhaps every "choose language" menu should include English and Chinese in non-localized form, as an escape hatch, since almost every web user can recognize enough of them to navigate a menu to find their actual language.
I get a kick out of it when I see it, because you can understand how it happens. "Well, at least you tried."
I.e., Google, YouTube, Reddit.
Automatic translations should never be served by default, but only loaded if the user requests them. The classic "Do you want to translate this?" prompt.
The full picture? The weights seem more useful for fingerprinting, and perhaps for server-side SEO, than for helping users. In the end, users have to give every language the same weight, or rewrite their outgoing headers, just to be able to browse the Internet.
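The weights are the `q` values in the header, e.g. `de-CH, de;q=0.9, en;q=0.8`. Here is a minimal Python sketch of how a server could honour them. It orders tags by weight, then falls back from `de-CH` to `de` when only the shorter tag is available. It is loosely modelled on RFC 4647 "lookup" matching but simplified: the `*` wildcard is just skipped. The function names are hypothetical.

```python
def parse_accept_language(header):
    """Return language tags, highest q-value first; tags with q=0 are dropped."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        tag, *params = [part.strip() for part in item.split(";")]
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            prefs.append((-q, position, tag.lower()))
    prefs.sort()  # highest q first; equal weights keep header order
    return [tag for _, _, tag in prefs]


def pick_language(header, available, default):
    """Pick the best available language, trying 'de-ch' before 'de', and so on."""
    by_lower = {code.lower(): code for code in available}
    for tag in parse_accept_language(header):
        if tag == "*":
            continue  # wildcard: let the default decide
        subtags = tag.split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()  # de-ch -> de
    return default


print(parse_accept_language("de-CH, de;q=0.9, en;q=0.8"))  # ['de-ch', 'de', 'en']
print(pick_language("de-CH, de;q=0.9, en;q=0.8", ["en", "fr", "de"], "en"))  # de
print(pick_language("fr-CA;q=0.5, en-GB", ["en", "fr"], "en"))  # en
```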
e.g.
(but if sorted in English: Dutch, English, French, German, Spanish)
Sometimes the article in the local language has more info. I had that problem in comments about places or events in Argentina. Sometimes the English article has less info than the Spanish article, so I made a link to the autotranslation.
> How does Universal Language Selector determine which languages I may understand
> ULS queries a service that determines your originating country based on your IP address. This is inaccurate in some cases. Based on the country code, most often spoken languages are suggested for you.
(from https://www.mediawiki.org/wiki/Universal_Language_Selector/F...)
https://www.mediawiki.org/wiki/Universal_Language_Selector/F...
I think it is far worse than that:
1. If I don't understand a language, that video is probably not for me. Most videos targeted at an international audience are in English, or at least the author translated them themselves.
2. Titles are small sentences, and they don't have enough context to be translated. Once I saw a video called something like "Vamos assistir uma conexão com o passado", which in Portuguese means "Let's watch a connection to the past". I needed to de-translate it in my brain to understand that the original title was "Let's play A Link to the Past"
3. Online resources are a great way to exercise a second language. So, please, don't underestimate my capabilities. At least let me try to read the original language by myself; if I need a translation, I know how to use Google Translate or a dictionary.
I acknowledge that this feature makes access to online content more democratic, and that's OK. But at least let me disable it, since it makes the experience worse.
But the real problem is when it decides to translate the titles of some perfectly watchable videos in English into something that uses the Cyrillic alphabet, which has no relation to my accepted languages and is only used halfway across the world from where I am.
Why is it so hard to just add something as a setting/feature and offer it to people without forcing it on the user?
Office politics. Google is famously "performance-driven", so the manager in charge of that feature needs usage metrics to be high for the sake of their own career.
(Speculating, of course.)
The sad part is we can't rule that out.
I can speak German; I don't need forced subtitles for the Nazis.
I do this on purpose, because I find everything is more searchable. I don't even know any German terms for most technical things I might search or look for. So even if the automatic translations were good, which they aren't, this would be a non-feature.
My browser already tells them what my preferred language is. Just use it.
Now it's a mix of German and English, e.g. one heading is "Spiele-Bestseller" ("bestselling games"), and the next is "Best selling apps". And prices are displayed as "28,00 CHF" (correct would be to use the decimal point).
Like Van Halen's brown M&Ms, it just shows how sloppily this thing is programmed: https://www.snopes.com/fact-check/brown-out/
At least I know I didn't mess anything on my WebOS TV.
Just dislike the video and move on. I'm guessing Google wants the uploader penalized, and I do feel sorry, but it's not my problem.
One of the sister replies linked to an extension to help with that, which I'm going to give a try, but it's annoying that there's not a simple toggle in the youtube settings to tell it to always use the original language. On the rare occasion that I want to use the translated audio track, I can do _that_ on my own; I speak enough languages that this is a very rare occasion with the type of content I watch.
This isn't even something I can understand as them being hostile to ad blocking or wanting to push ads. This is a 'convenience' feature that is just poorly implemented. But I'm sure there's some PM that got a pat on the back for it.
If you want to get rid of auto-translation on a systematic level, provide feedback to the operators of Youtube through their official communication.
Or should he just keep using the signals he gets and immediately clean up his feed?
I actually see this as a feature. YouTube recommends a lot of garbage. I suggested that they improve my feed, but they implemented this signal instead. I use exactly this method to weed out a lot of content I do not care for.
You cannot tell the 500 pound gorilla anything. I prefer my videos without subtitles. I have that set as a preference. Yet when chromecasting it is common for the subtitles to spontaneously turn on. And has done so for a long time.
English is not my first language and my first language is not widely used. Hence I am not used to dubbed movies/programming and I am used to seeing subtitles.
If only native English speakers could understand what a horror show machine-generated subtitles are. If you are used to subtitles, they are extremely hard to ignore. You will read them and get the meaning (often hilariously wrong) before the audio catches up, and you may end up rather confused.
I can understand that an American might have a hard time watching a subbed German movie. That's natural, because it is not common. But when you grew up with subtitles, it is actually effortless. Except when they are poor. Then it becomes worse, because of the cognitive load of two languages and the effort of figuring out what is correct.
Dear English-only speakers: Translation is hard. A poor translation is worse than no translation, as it obfuscates the message. AI is not there yet at all. It may be impressive, but it is often not helpful, or it simply distorts the real message.
> A poor translation is worse than no translation
This behaviour rankles me; I think it is on the same level as misusing the "report this as spam (to some upstream entity/3rd party)" feature for e-mail messages that are not actually spam.
I'm sure my "don't recommend this" clicks don't in any way make up for Google's promotion of channels that "make their content accessible", because it doesn't even stop them from recommending me more machine-translated videos.
They base the feed on user input. The feed is then (supposedly) adjusted to what I like.
What I call a signal you call an attack.
I signal that I do not like Minecraft videos. But I do not attack them.
Your anger is misdirected. You should be mad at YouTube because they do not seem to understand that there can be multiple signals at once.
The chances that I click on a Minecraft video is low. Autotranslated even lower.
So we differ strongly in opinion on how the platform should work. I read your "attack" argument as I should write to the Minecraft creators and tell them their content would be better if they played Minesweeper instead.
I do not punish anyone. I just pursue a clean and (for me) high quality feed.
If you are up in arms that I punish your channel that is another signal that I am probably not your target audience.
When dealing with audiences at scale you need to listen to these signals as handling personal opinions in mails from the discerning viewer is not feasible.
> A friendly e-mail to the channel owner explaining the problem and asking to manually disable auto-translation is much more likely to achieve what you want.
No it isn't, because I see what kinds of channels do this, over and over again. They're very clearly publishers who don't care that they make something objectively worse as long as the algorithm rewards them for it.
> If you want to get rid of auto-translation on a systematic level, provide feedback to the operators of Youtube through their official communication.
Ha, as if they ever read that. Probably more Google employees will read this comment than will ever read any of my (many) "please stop translating things without asking, I know where to find machine translation if I need it, doing it without asking means I have to translate back from broken Norwegian into English in order to understand what the hell you were trying to say" feedback reports.
Even being aware of this - how do I know that it's an auto-translation, rather than someone making AI slop in my native language, without watching the video?
But nobody pays to get answers, so it's alright.
- If I select German subtitles for a German video, it will auto-translate all English subtitles to German in the future.
- If I select subtitles for an English video, same.
- If the video has an Arabic, Hindi, or French human-made subtitle to help that audience, it shows it to me instead of the automatic captions
Horrible.
It's probably most useful for utility or news content. Not 'high effort videos' about an interesting topic. I'm imagining you find a video in another language that fixes a problem you have and can switch to your language to watch.
Unfortunately, all the translations are machine-translated garbage and there is no setting to turn this off as a viewer, so it's just incredibly annoying.
Very annoying, because instead of just seeing the English post, which I'm easily able to understand, I see half-broken German...
In the rest of the world, and especially in Europe, this is the norm, not the exception. On one hand there is the prevalence of US English media (hello, Hollywood) and US English literature (especially in tech); on the other hand, cross-consumption between EU countries is much more common.
Oh, and the content ends up being in English too, because that's how to reach many people. We don't want those to be translated, because we don't want a double translation. This is something that the US / Silicon Valley mind cannot comprehend.
Which has been baffling to me considering how many foreigners work at these companies.
It's the idea that the user has a preference for something, and it applies always and everywhere, even when it's not applicable.
If only they respected my system language. All my language settings are set to English, yet I routinely get autotranslated crap to my native language.
But they don't make those decisions. It's a UX thing, which means that in practice whoever is in charge of "driving up the numbers" is going to be making the decision; the engineers just get to cuss while implementing it.
Exactly, it's like they've never left their own state levels of ignorance
Two things absolutely kill my experience:
1. Messing with titles, especially when the video itself is still in a different language. Which Kurzgesagt will I get today? Only YouTube knows. This is particularly annoying when I know the YouTuber might use a different language in the title to make a joke.
2. Messing with the default audio tracks. I don't mind if the YouTuber has a dubbed track; that's awesome for getting more exposure. But I already know and expect a specific voice, and anything else is extremely jarring.
I know what Mark Rober sounds like; leave it be.
Every one of my subscriptions is an English language channel, and my language choice on all Google properties (where possible) including YouTube is English. It's not hard to judge that English is my favoured language.
And yet... every video advert I receive when travelling is served in the local country's language. It doesn't especially bother me, since I actively avoid listening to adverts (and indeed, now pay for Premium lite to avoid them almost altogether) but it's a weird not-so-edge case that I'd have thought a company as large as Google might have addressed already. They've absolutely got the tech to deliver adverts in any language. (And it could be powerful: imagine receiving adverts for local businesses in your native language while on holiday.)
If an English variant is in your Accept-Lang: headers, I'd hope YT wouldn't auto-translate English titles.
The other thing that Google might properly use is account-specific language settings. But if they're using GeoIP as has been suggested, I agree they're doing it wrong.
Your hope is unfortunately entirely misplaced. Google is one of the worst offenders for assuming language and region from users' IP.
About the translation: sure, "boot" ≈ "iniciar" and "window" = "ventana", but for (Microsoft) Windows and other names in foreign languages, the same name must be kept.
It gets even worse with YouTube and their awful AI dubbing that's always on by default. So now for solidly half the videos I watch, I need to (1) open it, (2) click through the settings to turn off the AI dubbing, and then (3) rewind back to the beginning and start over. It doesn't take a lot of time, but it's incredibly annoying.
I also don’t like that video titles are shown translated. It’s so weird when I’m watching a video in spoken English, yet the title is in a different language.
I'm French, but my browsers are configured for English. Some (but not all!) English titles are translated to French, and some French titles are translated to English. This actually caused me to miss a few uploads from channels I'm subscribed to, as the language of the title is part of my mental filtering for those channels.
I let you track me, Google; please use it for some good UX and not just advertising.
In my case, I live in the United States, but Google is determined to serve me Spanish results even for Catalan-related queries. E.g. preferring the Spanish Wikipedia. The search engine's behavior has had ups and downs over the years, but it has never been great.
This is very much a problem for my children, who don't understand Spanish, as well as for the Catalan-speaking regions of the world that are not in Spain, including Andorra.
In my experience, Gemini easily flags any Catalan content as unsafe and prevents the conversation from continuing. Even for prompts like "summarize this article". This may have improved lately, but still.
Google used to be an example of sensitivity to the world's diversity, a responsible major player. Way back. Now, although I applaud the efforts multiple teams continue to make, it is obvious this is no longer a priority.
I'm curious, what are the keywords that trigger that?
Lately they've decided that auto translating the local language into English in Maps reviews is the wrong thing to do. They translate every other language into English but somehow since I live in this place I must speak the local language too, so I don't need that in English.
Ditto for search results. Surely you want Wikipedia in the local language! I mean you've been there for so long! You search for things in the local language, surely that's a sign of your preference and not the fact that searching for things locally requires use of the local language.
This also applies to so much other "we must make our software so smart and guess all your preferences". Google fails so consistently at this I cannot understand why they persist other than some sort of misplaced corporate self regard.
He told me that for efficiency, they had different stages in the content rendering and that the main page structure didn't have your user information yet. That's rubbish IMHO because the accept language header should be readily available in that phase.
The guy you argued with sounds like they were semi-justifying this with the typical "noogler" rose colored glasses.
That's called premature optimization.
Google will then go on to complain about users installing APKs from shady sources but this practice pushes users to do so. I'm sure a decent amount of users ended up with malware on their phones just because they wanted to install an app that wasn't available in their listed country.
All accounts can be active at the same time on the same phone; there is a dropdown to switch in the Store app, and that works even with a work profile on the side. I've yet to see real downsides, except, of course, remembering which account is in which country and switching manually.
Using Google search, which luckily did not decide to show me "local" results to an English query like it often does home, I found a support thread suggesting I set my Accept Language to have something other than English as a second language. Lo and behold, the page decided to now respect it.
I think this is because half of Google live their entire career in California, so they don't know about other languages, units, time zones at all.
It's weird, because they employ SO many foreigners, bringing them to California. But somehow upon arrival they all get memory wiped about the existence of anything outside the bay area.
Other companies do this right. Google is user hostile.
No. I will NEVER navigate by bike, foot, or public transport in these strange America-only units.
Which is kinda valid, in the southern and south-western parts this is done because there is a significant Swedish-speaking minority so most cities and streets have names in both languages.
But at the time I lived in central Finland, where the streets DIDN'T have official Swedish names, they just ... translated them. Which was super fun for navigating.
It is of course useful for those 700+ million, but they are not Google's customers; they are the users/the product. So long as you won't go elsewhere (en masse), you don't matter.
At any one time, there's got to be tens of millions of people accessing Google from a country which has a primary language unknown to the traveller. Even if this number is insignificant compared to Google's full user base, the cost for Google to service 20-30mn people with a feature is presumably lower than their annual ad revenues across 20-30mn people.
People like us are an edge case.
Many EU countries have more than one official language.
Most former colonies are bilingual.
Present what languages you actually have the data in. The user is smart enough to click the "translate" button in the web browser should they want. That translation is also likely to be better quality.
English is not my first language. Or my second. But I understand it well enough to work in it every day. And I never ever want to wade through auto-translated garbage just to find the right button to read the original English version. Because for some reason this is only a problem for English, web sites using other locales never do it, which should be indication enough that international visitors hate it.
If you ever think about using machine translation tools for your web site, first do a full translation round trip for every language before publishing: translate, paste the result back in, and translate it back. That is roughly what you intended to publish. Don't do it.
The feature to change the language or show the original content is hard to find, and it depends on whether it's the app, the mobile web, or the desktop web.
They also try too hard to make us download the app, but that's another issue.
I often go through a few sentences or even entire paragraphs before stumbling onto something that doesn't make sense. And then I have to go back and reread everything.
Microsoft documentation is the worst offender. Their "some parts were translated automatically" notice is hard to, well, notice. And their translation is great! — until it doesn't work, because UI has different translation.