Wikipedia: Signs of AI Writing

34 FergusArgyll 8 7/27/2025, 11:38:37 AM en.wikipedia.org ↗

Comments (8)

mnaimd · 7m ago
There are two major problems with Wikipedia doing this:

1. False positives: phrases like "on the other hand" and "not only x but y" are definitely used by humans. You can't simply accuse someone of using AI just because certain phrases appear in their text (a naive sketch of that kind of check is below). After all, AI itself is trained on text written by humans, so the reason it uses those phrases is that they are common in its training set.

2. By publishing a list of what looks like AI writing, they give people the opportunity to simply tell the AI which phrases NOT to use. Anyone prompting an AI can use the list to make its output read more like a human's. Ironically, that is exactly what Wikipedia was trying to stop.
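
To make point 1 concrete, here is a minimal sketch of the kind of naive phrase check being described (the phrase list and scoring are invented purely for illustration, not taken from Wikipedia's page), and it happily flags ordinary human prose:

    # Naive phrase-based "AI detector": counts occurrences of listed phrases.
    # The phrase list and scoring here are invented purely for illustration.
    SUSPECT_PHRASES = [
        "on the other hand",
        "not only",
        "it is important to note",
    ]

    def naive_ai_score(text: str) -> int:
        lowered = text.lower()
        return sum(lowered.count(phrase) for phrase in SUSPECT_PHRASES)

    # A perfectly human sentence still gets flagged:
    human_text = ("On the other hand, the committee was not only divided "
                  "on the budget but also on the timetable.")
    print(naive_ai_score(human_text))  # -> 2, despite being human-written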

wronex · 28m ago
This is purely anecdotal, but I think I've seen ChatGPT insert special space characters other than a normal space. It also likes to use the various dash characters (en dash, em dash, and hyphen) more often than they would appear in normal text.
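
If you want to check a pasted passage for those characters yourself, here is a rough sketch (the character set is just the obvious suspects, not an exhaustive or official list):

    # Count space and dash characters that rarely appear in hand-typed text.
    # The character set below is just the obvious suspects, not exhaustive.
    SUSPECT_CHARS = {
        "\u00a0": "non-breaking space",
        "\u2009": "thin space",
        "\u2013": "en dash",
        "\u2014": "em dash",
    }

    def unusual_chars(text: str) -> dict:
        return {name: text.count(ch) for ch, name in SUSPECT_CHARS.items() if ch in text}

    sample = "The results\u00a0were mixed \u2014 promising, but preliminary."
    print(unusual_chars(sample))  # -> {'non-breaking space': 1, 'em dash': 1}
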
hackermeows · 34m ago
Cool, I just include this in the prompt when writing for the wiki and ask the LLM specifically not to write like this. What am I missing?
serialNumber · 28m ago
The fact that it’s still highly likely to write like this and hallucinate information.
sertraline · 53m ago
AI models were 'taught' English by cheap Indian and African workers who rated the most suitable words to use, so this is not so much an article about detecting AI as it is an article describing the way Indians and Africans write English.
constantcrying · 1h ago
I think this is actually a bad idea, especially the language and tone part.

You cannot detect AI writing by language and tone. All LLMs are trained and prompted to write in a very particular style, but you can just tell them to write in a different style and they will. What is worse is that the default LLM writing style is actually quite common; if you read through that list, you will also see that many of the items are very much human errors too.

Trying to detect what is and isn't LLM-generated text will only lead to people chasing ghosts: either accusing innocent people or putting faith in text that is simply the result of more careful prompting.

rgoulter · 1h ago
> You can just tell them to write in a different style and they will.

I'm guessing the priorities are to have contributions which stick to Wikipedia's guidelines. The LLM tendencies cited are in violation of those.

I don't think the game is strictly "we only want human contributions"; you can imagine a sophisticated LLM user crafting a reasonable contribution which doesn't get rejected.

The "accidental disclosure" section indicates that some of these bad contributions are just very low effort.

supriyo-biswas · 1h ago
Not in this particular case; the point of Wikipedia is to surface objective and factual information (we could debate what "objective" and "factual" mean, but that's a different issue).

The issue with LLMs is that they tend to insert a lot of judgement about the subject matter without quantification or comparison. A lot of this is already covered by Wikipedia's other rules, such as those about weasel words, verifiability, etc., but it is useful to have rules that specifically target AI content and, by proxy, also catch all the bad human writing along with it.

For example, when asked about person X who discovered a method to do Y, an LLM may write "As a testament to X's ingenuity, he also discovered method Y, which helps achieve Z in a rapid and effective manner". It doesn't really matter whether that was written by an LLM; the style is unsuited for Wikipedia either way. Instead, one would have to quantify it by writing "He/she discovered method Y, a method to do Z, which was regarded as an improvement over historical methods such as P and Q", with references to X discovering Y and to research documenting that improvement.

LLMs could adopt that latter writing style and cite references, but the issue is that a large market simply wants to use them to decompress documents to satisfy the intricacies of the social structures they are embedded in. For example, someone may want to prove to their manager that they produced a well-researched report; since the manager would have to conduct that research themselves to know whether it meets their bar, they use the document's length as a proxy instead. LLMs serve a lot of such use cases, and it would be difficult to take away this "feature".