Wikipedia is using (some) generative AI now

handfuloflight · 5/1/2025, 11:06:04 PM · theverge.com ↗

Comments (4)

jjmarr · 20h ago
There are a ton of boring, rote maintenance tasks on the English Wikipedia that are too complicated for a fully automated bot but simple enough for regex to get mostly right. Stuff like fixing spelling errors, correcting links to article names, etc.

https://en.wikipedia.org/wiki/Wikipedia:AutoWikiBrowser

Right now the tool (AutoWikiBrowser, linked above) is slow, old, Windows-only, and tough to maintain. AI will easily replace this. Heck, I'm already using it to generate the regex rules.
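
To make that concrete, here's a rough sketch in Python of the kind of rule I mean. The two typo patterns and the link-skipping trick are just illustrations, not AWB's actual rule format:

    import re

    # Two illustrative typo rules (pattern, replacement); the real AWB
    # rule list lives on-wiki and is far larger.
    RULES = [
        (re.compile(r"\brecieve(d|s)?\b"), r"receive\1"),
        (re.compile(r"\boccured\b"), "occurred"),
    ]

    def fix_typos(wikitext: str) -> str:
        """Apply each rule to prose only, skipping [[wikilinks]] so link
        targets (article names) are never rewritten."""
        # Split on wikilinks; even-indexed parts are plain prose.
        parts = re.split(r"(\[\[[^\]]*\]\])", wikitext)
        for i in range(0, len(parts), 2):
            for pattern, replacement in RULES:
                parts[i] = pattern.sub(replacement, parts[i])
        return "".join(parts)

    print(fix_typos("It occured when he recieved the [[Recieve]] page."))
    # -> "It occurred when he received the [[Recieve]] page."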

mubou · 20h ago
Automating tasks is exactly what AI/ML should be used for. My concern is that they're going to use LLMs to "translate" other-language articles into English and vice versa. LLMs are horrendously bad at this compared to models trained specifically for translation: they make shit up, invent phrases that weren't in the source text, etc. And with how much blind faith people put in ChatGPT, you can be sure a lot of those hallucinations will go unchecked.

The funny part is, Wikipedia is the #1 data set used for all sorts of machine learning training (not just LLMs). I hope they at least mark articles that were translated/edited by AI, because otherwise the AI machine is gonna start feeding back into itself sooner or later.
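
Even a crude filter would help there. A sketch of what I mean, with made-up template names standing in for whatever marker Wikipedia actually adopts:

    import re

    # Hypothetical marker templates for AI-assisted translation; the
    # actual template names (if any are adopted) would go here instead.
    AI_MARKERS = re.compile(
        r"\{\{\s*(machine[ _]translation|llm[ _-]translated|ai[ _-]generated)",
        re.IGNORECASE,
    )

    def keep_for_training(wikitext: str) -> bool:
        """True if the article carries no AI marker, so model output
        doesn't feed back into the next training set."""
        return AI_MARKERS.search(wikitext) is None

    articles = [
        "{{Machine translation}} Some translated text ...",
        "An ordinary human-written article.",
    ]
    print([keep_for_training(a) for a in articles])  # -> [False, True]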

layer8 · 20h ago
Automation is exactly what current-day generative AI should not be used for, because it lacks reliability and reproducibility.

I agree that it shouldn’t be used to translate Wikipedia pages without thorough review either.

mubou · 20h ago
Sorry, should have been clearer. I meant "AI" in the sense that people refer to anything using machine learning as "AI". (Honestly, "AI" is such a meaningless term; LLMs are anything but intelligent.) But I agree. For most tasks, a non-generative model trained specifically for that task is significantly better. People are taking the output of gen AI and using it as-is, rather than treating it as one component in something larger and programmatic, the way all ML before it was used.
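
To make "part of something larger and programmatic" concrete, here's a hypothetical sketch: let the LLM propose a regex rule, but only adopt it after it compiles and passes deterministic test cases:

    import re

    def validate_rule(pattern: str, replacement: str,
                      cases: list[tuple[str, str]]) -> bool:
        """Gate an LLM-proposed regex rule: it must compile and pass every
        (input, expected output) case before it joins the ruleset."""
        try:
            compiled = re.compile(pattern)
        except re.error:
            return False  # the model produced an invalid pattern
        return all(compiled.sub(replacement, src) == want
                   for src, want in cases)

    # Hypothetical model suggestion, checked deterministically before use:
    proposed = (r"\bteh\b", "the")
    cases = [
        ("in teh box", "in the box"),
        ("Tehran", "Tehran"),  # word boundaries must hold
    ]
    print(validate_rule(*proposed, cases))  # -> True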