Cuss: Map of profane words to a rating of sureness

41 points by tosh | 33 comments | 5/31/2025, 10:18:40 AM | github.com

Comments (33)

Fnoord · 6m ago
The Dutch word 'kunt' (je kunt = you can) gets censored in WoW because of 'cunt', that is, if you have the mature language filter on. I have it on because I have no interest in raging kids in said game, but I do want to read simple, common Dutch words. It annoys me to this day. CS gave the obvious answer (WONTFIX, with the obvious workaround of disabling the mature language filter altogether). It could easily be solved by looking at context instead of simple blacklisting: I connect from a Dutch IPv4 address and I sometimes speak Dutch, and the same would be true for the other endpoint.
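A minimal Python sketch of the kind of context check described above, assuming the filter knows the locale of the chat. The blocklist, the per-locale allowlist, and `censor` are hypothetical illustrations, not how WoW's filter actually works.

```python
# Hypothetical blocklist: 'kunt' is listed as a spelling variant of an English slur.
BLOCKLIST = {"kunt"}

# Hypothetical per-locale exceptions: words that are ordinary vocabulary there.
LOCALE_ALLOWLIST = {"nl": {"kunt"}}  # Dutch: "je kunt" = "you can"


def censor(message: str, locale: str) -> str:
    """Replace blocklisted tokens with asterisks unless the chat locale allows them."""
    allowed = LOCALE_ALLOWLIST.get(locale, set())
    out = []
    for token in message.split():
        # Compare case-insensitively, ignoring trailing/leading punctuation.
        word = token.lower().strip(".,!?;:")
        if word in BLOCKLIST and word not in allowed:
            out.append("*" * len(token))
        else:
            out.append(token)
    return " ".join(out)
```

Inferring the locale is the hard part; the IP address and the language both players actually write in are signals, not guarantees.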
donatj · 3h ago
Something we have had to deal with in managing educational software with a writing component is that what is offensive, to whom, in what context, and where is not universal at all.

One prime example: at one point a number of terms related to homosexuality made it onto the list at the request of a larger district. These are also terms that are being reclaimed, and it was... a difficult problem to try to satisfy everyone, and it did upset other districts. I believe those patterns were all but removed eventually.

We have fought over the list of definitions, and every change provoked controversy. Our current solution is simply to mark items for teacher review without telling them why. We don't say they are offensive, and we don't say what the problematic words are. We just say it might need review. That's worked pretty well so far.

All this is to say, policing speech is a problem best avoided.

bee_rider · 3h ago
Unfortunately, whether or not a term is really offensive depends on a combination of what it is, who said it, and when/where (at least in the common-sense definition). That is unfortunate because it is directly opposed to our sense of fairness (at least in the US, and in most countries rooted in liberalism), which says that rules should apply universally, across all people and in all contexts.

Which is to say… policing speech is a problem best avoided!

morkalork · 3h ago
I worked in a completely different field and had to give up on flagging any variations of "shit". Turns out working-class boomers will utter some form of it in every other sentence. Nothing harmful, just things like "my brother is full of horse shit" or "my job is bullshit".
kps · 2m ago
I'm post-boomer, but let me tell you, shit's still fucked.
bee_rider · 3h ago
“Shit” is pretty good because it is crass but not offensive (in the sense that it doesn’t target any particular group). And of course it describes a lot of what’s happening nowadays.
gherkinnn · 1h ago
Shitty to limit the use of shit to working-class boomers.
SamBam · 26m ago
I'm confused as to the purpose of all the zeros. Since this is far, far from a complete list of all English words, what's the difference between a word not being on the list vs a word being a zero?

I can kind of see "was this a word they considered and scored, vs. not considered?" when trying to assess whether the project is comprehensive, but from a programming standpoint, it just seems like it's going to have a lot of useless overhead, since by the time I'm looking up the word I don't care whether it's a zero or a miss.

(I also find the scoring of "2" for many of the words to be weird, like "yank," "chug," "looser" etc. as they can all have perfectly normal meanings.)
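To illustrate the point: once a miss defaults to 0, an explicit 0 and a missing word behave identically in a lookup, and the distinction only shows up in a coverage check. A Python sketch; the sample ratings are the ones reported in this thread and have not been checked against the package.

```python
# Ratings as reported in the comments here; not verified against the package.
SAMPLE = {"australian": 0, "addict": 1, "yank": 2, "chug": 2, "looser": 2}


def rating(word: str) -> int:
    """Look up a rating; a word missing from the list is treated as 0."""
    return SAMPLE.get(word.lower(), 0)


def was_considered(word: str) -> bool:
    """Only a membership test can tell an explicit 0 from a miss."""
    return word.lower() in SAMPLE


def drop_zeros(table: dict[str, int]) -> dict[str, int]:
    """Strip explicit zeros if you only care about filtering."""
    return {w: r for w, r in table.items() if r > 0}
```

Dropping the zeros shrinks the table without changing any filtering result, at the cost of losing the "considered and judged clean" signal.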

kevin_thibedeau · 6m ago
Australian is on the list as a zero for some reason. Belgians are missing altogether. Then there's the unfortunate word ????ardly, which is also scored zero; it has no history as an offensive word, but nobody gets to use it today.
blueflow · 4h ago
Typical cuss filter UX:

types something in live chat

some random word from the sentence gets censored out

"Why did this just get censored out?"

check Urban Dictionary

"Why?????"

Bonus points if it's regular ethnonyms that are classified as profanities, so people from that place have a hard time saying where they are from.

donatj · 3h ago
I have vivid memories of Digg back in the day censoring out absolutely baffling things in the middle of otherwise regular words.
genewitch · 6m ago
Go gently caress yourself
AriedK · 22m ago
Helpful tool for car makers.

Would probably have saved them from the Mitsubishi Pajero, Ford Pinto, Mazda Laputa

Downside is, it doesn’t analyze phonetics, afaict. The Hebrew name for the Volkswagen Beetle (Hipushit) would have passed as fine.

PaulHoule · 4h ago
Was really amused to see that, last month, a paper on arXiv had English's most prominent profane word in its abstract for the first time:

https://arxiv.org/search/?query=fuck&searchtype=all&source=h...

though somebody did slip in a use in a comment earlier.

Aachen · 2h ago
It seems to require specifying all spelling variants of a word https://github.com/words/cuss/blob/6bab3fef250481e34ba55bc40...

And then fails to do that for words that are not uncommonly written with a space https://github.com/words/cuss/blob/6bab3fef250481e34ba55bc40...

Making this a complete list will probably be a challenge when it needs to be a byte-for-byte match
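One common way around listing every variant is to normalize both the list and the input before matching, and to also try joining adjacent tokens. A hypothetical Python sketch (`CANONICAL`, `normalize` and `find_matches` are illustrative, not part of cuss); it still matches exactly after normalization, so misspellings and leetspeak would need more work.

```python
import re

# Hypothetical table holding one canonical spelling per entry.
CANONICAL = {"bullshit": 2}


def normalize(token: str) -> str:
    """Lowercase and drop hyphens/underscores, so "Bull-Shit" becomes "bullshit"."""
    return re.sub(r"[-_]", "", token.lower())


def find_matches(text: str) -> list[str]:
    """Return canonical entries found in text, including ones split by a space."""
    words = [normalize(t) for t in re.findall(r"[\w-]+", text)]
    hits = []
    for i, word in enumerate(words):
        if word in CANONICAL:
            hits.append(word)
        # Also try gluing this token to the next one ("bull shit").
        if i + 1 < len(words) and word + words[i + 1] in CANONICAL:
            hits.append(word + words[i + 1])
    return hits
```

Joining adjacent pairs keeps it cheap, but it also raises the false-positive risk discussed elsewhere in the thread: two innocent words can concatenate into a listed one.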

Blackarea · 5h ago
Could have been in a language-agnostic format (e.g. CSV).
weinzierl · 3h ago
I think the value add here is being a software package. The lists exist elsewhere and the package authors supplied sources. If you really need a combined list it should be trivial to generate it from the code.
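As a sketch of how small that conversion is, assuming the word → rating mapping has already been loaded into a Python dict (how you get it out of the JavaScript package is left open; `to_csv` is a hypothetical helper):

```python
import csv
import io


def to_csv(ratings: dict[str, int]) -> str:
    """Serialize a word -> rating mapping as CSV with a header row, sorted by word."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["word", "rating"])
    for word in sorted(ratings):
        writer.writerow([word, ratings[word]])
    return buf.getvalue()
```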
CSMastermind · 5h ago
It's certainly an interesting data set, though it has no concept of severity. As far as I can tell, "doodoo" is the same as some racial slurs: we're 100% certain they're bad words.
swah · 4h ago
If I type the word 'doodoo' I'm pretty sure I'm not swearing... Most probably telling someone about baby sounds.
mdaniel · 3h ago
oh, it's about baby something but that sentence didn't end the way I thought it would
mdaniel · 3h ago
I legit thought this said "... rating of success" meaning how likely the project was to be successful on some metric based on the profane words therein. I recall there was a study(?) akin to that for the Linux kernel, as a frame of reference
jollyllama · 2h ago
"Beaver" unlikely to be used in profanity, eh?
Night_Thastus · 35m ago
Pretty unlikely. I can't remember the last time I heard anyone use it aside from talking about the actual animal it refers to.
weinzierl · 3h ago
Good to know that "This package is safe."

When it comes to security, the only thing that beats warm fuzzy words is shiny security seals.

tgv · 3h ago
Where does the rating come from? Do you understand what all those words mean? It looks like you copied someone's rather subjective opinion, because e.g. "bollo" and "caliente" aren't inherently profane in Spanish. Or do people think the hot water tap is leering at them? "Oye, tia, que caliente qu'et-ta!" ("Hey, girl, you're so hot!")
bitcurious · 3h ago
Based on a list of (in part) profane words which includes:

addict africa amateur american angry arab

Aachen · 1h ago
I assume this is meant as criticism, but to be fair to the list, it classifies 5 out of these 6 as 0, which apparently means {Use as a profanity = unlikely, Use in clean text = likely}, and the 6th one (addict) is a 'maybe' on both scales which seems fair to me: wouldn't a respectful source speak of addiction, addictive substances, people who are addicted, etc.?

From just this short list and a handful of other words I looked at, they seem to have done a reasonable job of classifying them, even if I see other issues, such as completeness and what the purpose even is.
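A small Python sketch of how that scale might be consumed, assuming ratings of 0, 1 and 2, with 2 taken to be the mirror image of 0 (the comment above only spells out 0 and "maybe"). `describe` and `flag` are hypothetical helpers; unknown words are treated as 0.

```python
# Meaning of each rating as (use as a profanity, use in clean text).
# 0 and 1 follow the description above; 2 is assumed to mirror 0.
MEANING = {
    0: ("unlikely", "likely"),
    1: ("maybe", "maybe"),
    2: ("likely", "unlikely"),
}


def describe(rating: int) -> str:
    """Turn a numeric rating into a human-readable description."""
    profane, clean = MEANING[rating]
    return f"use as a profanity: {profane}; use in clean text: {clean}"


def flag(words: list[str], ratings: dict[str, int], min_rating: int = 2) -> list[str]:
    """Return the words whose rating is at least min_rating (unknown words count as 0)."""
    return [w for w in words if ratings.get(w.lower(), 0) >= min_rating]
```

Note that the threshold is about certainty of profane use, not severity, so a `min_rating` of 2 treats every confidently profane word the same.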

pimlottc · 3h ago
“Sureness” is not really a word; I had to read through to understand what they meant. “Certainty” or “confidence” would be clearer.
articulatepang · 2h ago
Sureness is most certainly a word! It has been used by writers of the stature of Emerson ("the law holds with equal sureness for all right action") and Edith Wharton ("The moment the reader loses faith in the author's sureness of foot the chasm of improbability gapes.")

It used to mean "certainty", as when T. H. Howard writes, "Uncertainty about our religious condition is quite as unsatisfactory as any doubt about our most sacred domestic relationships. Sureness is vital to peace, and the truly sanctified soul will live in the region of certainty."

But in more modern usage the word has a connotation slightly different from what the author of this library intends. Its meaning is closer to "assuredness": confidence matched with ability. For example, "Proust had an incredible sureness of touch in shedding this prophetic ray on his characters." (again from Edith Wharton).

pimlottc · 11m ago
Point taken. It's not really a common word in modern usage, and probably not really the word the author wants here.
thuanao · 36m ago
Somewhat related: What is with the rampant cursing nowadays? In the US people are openly saying the f-word in professional settings, in public to strangers or acquaintances, in writing and video... seemingly everywhere, even in calm, normal conversations.

I don't remember it being like this decades ago. Is it just me? I remember people used to curse only in private conversation, when angry, and never at the office in meetings and professional contexts.

SamBam · 21m ago
Yeah, there's been a pretty big generational shift, I think mostly from GenZ. I'd posit that texting/social media may be a reason.

I first went to grad school ~20 years ago, and no one cursed in class, especially not the professors.

I recently went back to school and got another master's, and nearly all the students in their mid-20s drop f-bombs in regular classroom talk to the professor constantly, like they don't even hear that they're doing it. Some professors don't mind, and even respond in kind (though much more self-consciously); some are clearly displeased, but the students barely notice.

smitelli · 3h ago
Never realized the "chunky" in my chunky peanut butter was so profane. /s