I’ve used the reverse. When making a new module, I’ve let the LLM make up an API and used the names it suggested as inspiration for what might come naturally to others, to make the module more intuitive to use.
8n4vidtmkvmk · 1h ago
I too am keen on letting the AI pick names. I'll often be vague about what I want so that it can fill in the blank before I refine it. Comes up with some good stuff.
kouteiheika · 11h ago
> LLMs hallucinated a package named "huggingface-cli" [...] it is not the name of the package [...] software is correctly installed with [...] huggingface_hub
It would be a good idea to disallow registering packages which only differ by '-'/'_'. Rust's crates.io does this, so if you register `foo-bar` you cannot register `foo_bar` anymore.
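For comparison, PyPI already treats such names as one project: PEP 503 normalizes a name by collapsing runs of `-`, `_` and `.` into a single `-` and lowercasing the result. Below is a minimal sketch of that rule, plus a hypothetical registry check (`can_register` is illustrative, not a PyPI API) that refuses any name colliding with an existing one after normalization. Note that it would not catch `huggingface-cli` vs `huggingface_hub`, since those differ in more than separators.

```python
import re


def normalize(name: str) -> str:
    # PEP 503: collapse runs of '-', '_' and '.' into one '-' and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def can_register(new_name: str, existing: set[str]) -> bool:
    # Hypothetical registry check: refuse a name whose normalized form is taken.
    taken = {normalize(n) for n in existing}
    return normalize(new_name) not in taken


existing = {"foo-bar", "huggingface_hub"}
print(can_register("foo_bar", existing))          # False: same project as foo-bar
print(can_register("huggingface-cli", existing))  # True: differs by more than separators
```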
voidUpdate · 11h ago
That wouldn't help in this case, though: one is -cli, the other is _hub.
mapmeld · 10h ago
It is a command line tool "huggingface-cli", it's just installed with a differently-named pypi package. I wouldn't call this a full hallucination because anyone could make this mistake.
Hackbraten · 8h ago
If a human user makes this mistake, they’d go to https://pypi.org/project/huggingface-cli/ and see that either a) it doesn’t exist, b) it is an unrelated package, or c) its verified list of maintainers is unrelated to the maintainers on Hugging Face, Inc.’s GitHub repository.
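The manual check described here can be partly scripted against PyPI's JSON API (`https://pypi.org/pypi/<name>/json`, which returns 404 for a nonexistent project). A minimal sketch, assuming you already know which GitHub organization the real package should point to: `fetch_pypi_info` looks the project up and `links_to_expected_repo` tests whether any declared project URL starts with that prefix. Both helper names are hypothetical. Project URLs are self-declared metadata that a squatter can copy, so a match is necessary but not sufficient.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def fetch_pypi_info(name: str) -> Optional[dict]:
    """Return the 'info' dict from PyPI's JSON API, or None if the project does not exist."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)["info"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # case a): no such project
        raise


def links_to_expected_repo(info: Optional[dict], expected_prefix: str) -> bool:
    """True only if the project exists and one of its declared URLs starts with expected_prefix."""
    if info is None:
        return False
    urls = list((info.get("project_urls") or {}).values())
    if info.get("home_page"):
        urls.append(info["home_page"])
    return any(u.startswith(expected_prefix) for u in urls)
```

Usage (needs network): `links_to_expected_repo(fetch_pypi_info("huggingface-cli"), "https://github.com/huggingface/")`.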
woodruffw · 10h ago
That’s already how Python packages work. The problem here isn’t the hyphen/underscore.
Pxtl · 11h ago
Case insensitivity is so important.
Underscore is just capital hyphen.
ikari_pl · 10h ago
I’d rather say underscore is a capital space :D
Pxtl · 9h ago
I'm tempted to make a keybind for this. Then I might actually start using snake_case instead of CamelCase, which on some level I know is better; I just hate typing it.
glenstein · 10h ago
>In May 2025, the potential and prevalence of slopsquatting was detailed in the academic paper "We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs".[1][11] Some of the paper's main findings are that 19.7% of the LLM recommended packages did not exist
At the risk of perhaps misunderstanding or committing a category error, I wonder if there's such a thing as a category of "correct" hallucinating, distinct from things that are, in some sense, "known" via training (e.g. I read about prompting of one model showing it was able to accurately recreate most of the text of Harry Potter, so clearly it's "in there" somewhere).
An interesting upshot of that could be that models "grow" their own knowledge in an evolutionary way via hallucinations that are retained rather than pruned as part of routine filtering and training.
Though I'm sure some might suggest "hallucinating correctly" is just one and the same as ordinary function. I wouldn't agree with that, but I could at least see the argument.
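The 19.7% figure quoted above is a ratio of this kind: suggested package names that do not exist in the registry, over all suggestions. A minimal sketch of computing it for a batch of LLM suggestions, assuming you have a set of known registry names; names are compared after PEP 503 normalization, and the paper's exact methodology may differ from this simple per-suggestion count.

```python
import re


def normalize(name: str) -> str:
    # PEP 503 normalization, so "Huggingface_Hub" and "huggingface-hub" count as the same package.
    return re.sub(r"[-_.]+", "-", name).lower()


def hallucination_rate(suggested: list[str], registry: set[str]) -> float:
    """Fraction of suggested package names that are absent from the registry."""
    if not suggested:
        return 0.0
    known = {normalize(n) for n in registry}
    missing = sum(1 for n in suggested if normalize(n) not in known)
    return missing / len(suggested)


registry = {"requests", "numpy", "huggingface_hub"}
suggested = ["requests", "numpy", "huggingface-cli", "Huggingface_Hub", "numpyy"]
print(hallucination_rate(suggested, registry))  # 0.4
```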
I'm asking more how this meets the notability requirements for a separate Wikipedia article.
I'm all for having lots of small Wikipedia articles, but the past few years they've been tending toward combining small articles together. And this is more like a dictionary entry than an encyclopedia entry.
Mistletoe · 10h ago
I think it’s much better like this. It is its own fascinating thing and not lost in a long general Wikipedia entry.
nemomarx · 11h ago
Right but why not a redirect to that one and a subsection?
I feel like this is going to fall under notability eventually
blueflow · 11h ago
It needs to be a big thing on its own so security people can be onto something important.
cestith · 8h ago
I guess it’s quicker to type than 'hallucinationsquatting'.
ysofunny · 10h ago
all I want to figure out
is how to "manually" (semi-manually) tweak an LLM's parameters so we can alter what it 'knows for sure'
is this doable yet??? or is this one of those questions whose answer is best kept behind NDAs and other such practices?
jen729w · 10h ago
This question makes no sense (logically, not grammatically) in the context of LLMs.
They don't 'know' anything. They are a many-dimensional matrix of the next most likely syllable given all syllables that have come before (roughly speaking).
To ask what it 'knows' is to ask why a chicken crossed the road.
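To make "next most likely token" concrete, here is a deliberately tiny sketch: a bigram counter with greedy decoding. Real LLMs condition on the whole preceding context with a neural network over subword tokens rather than counting word pairs, so treat this only as an illustration of the "pick the most likely continuation" step, not of how such models are built. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def greedy_continue(counts: dict[str, Counter], start: str, steps: int) -> list[str]:
    """Repeatedly append the single most frequent successor of the last token."""
    out = [start]
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break  # never saw anything follow this token: stop
        out.append(followers.most_common(1)[0][0])
    return out


corpus = "pip install requests pip install numpy pip install requests".split()
model = train_bigrams(corpus)
print(greedy_continue(model, "pip", 2))  # ['pip', 'install', 'requests']
```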
iaiuse · 10h ago
Correct—they don’t “know” in the epistemic sense, but they do encode a latent world model that shows up as useful priors.
Put differently: GPT-4 isn’t a knowledge base, it’s a *Bayesian autocomplete* over dense vectors. That’s why it can draft Python faster than many juniors, yet fail a trivial chain-of-thought step if the token path diverges.
The trick in production is to sandwich it: retrieval (facts) → LLM (fluency) → rule checker (logic). Without that third guardrail, you’re betting on probability mass, not truth.
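A minimal sketch of the retrieval, LLM, rule-checker sandwich, with every component stubbed: `retrieve` is a toy keyword lookup standing in for a search index, the "LLM" is any callable from prompt to text (a lambda here), and the rule checker rejects answers that tell the user to `pip install` a package not on an allowlist. All names are hypothetical; the point is only that the last stage is deterministic and can veto the fluent middle stage.

```python
import re
from typing import Callable


def _norm(name: str) -> str:
    # PEP 503-style normalization so "huggingface_hub" and "huggingface-hub" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def retrieve(question: str, docs: dict[str, str]) -> list[str]:
    """Toy retrieval step: return docs whose key phrase appears in the question."""
    q = question.lower()
    return [text for key, text in docs.items() if key in q]


def check_installs(answer_text: str, allowed: set[str]) -> list[str]:
    """Rule checker: every `pip install <pkg>` target must be on the allowlist.

    Toy pattern: only looks at the single word right after `pip install` and ignores flags.
    """
    ok = {_norm(a) for a in allowed}
    targets = re.findall(r"pip install ([A-Za-z0-9][A-Za-z0-9._-]*)", answer_text)
    return [t for t in targets if _norm(t) not in ok]


def answer(question: str, docs: dict[str, str], llm: Callable[[str], str], allowed: set[str]) -> str:
    """Retrieval (facts) -> LLM (fluency) -> rule checker (logic)."""
    context = "\n".join(retrieve(question, docs))
    draft = llm(f"Context:\n{context}\n\nQuestion: {question}")
    bad = check_installs(draft, allowed)
    if bad:
        raise ValueError(f"unverified packages in answer: {bad}")
    return draft


# Usage with a stand-in "LLM" that hallucinates a package name.
docs = {"hugging face": "The Hugging Face CLI ships with the huggingface_hub package."}
try:
    answer("How do I get the Hugging Face CLI?", docs, lambda prompt: "Run: pip install huggingface-cli", {"huggingface_hub"})
except ValueError as err:
    print(err)  # unverified packages in answer: ['huggingface-cli']
```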
zwnow · 10h ago
We still barely know how LLMs really work. Hard to tune things you don't understand.
In before people tell me "akshually we know all about bla bla bla..." No, we don't.
aklemm · 10h ago
It could be that we do know how they work, we just don’t know how to contextualize their output (as in, we ascribe way too much humanity to it) and that obscures what we think must be happening.
ysofunny · 9h ago
which is why, in order to appear smart and well informed, it's recommended to point out how my question is nonsense rather than legitimately trying to figure it out.
zwnow · 8h ago
Just tech things. This place isn't different from Stack Overflow attitude wise.
tekno45 · 7h ago
you're not entitled to an explanation of your nonsense.
mouse_ · 11h ago
What distro/repo though?
Invictus0 · 11h ago
We need to have a professional software engineering license, at least for applications that are handling sensitive data. Why is it that it takes 1000 hours of study to cut people's hair, but anyone off the street can write some software that collects people's driver's licenses? (Looking at you, Tea app developers)
EvanAnderson · 10h ago
> Why is it that it takes 1000 hours of study to cut people's hair...
Protectionism by a de facto trade guild was always my assumption.
There are a lot of activities where bad practitioners present significant danger to society and licensure makes sense. I never understood how cutting hair rises to that level. I'd love to know how licensure in the barber profession is anything other than a bald-faced attempt at building a moat. It seems like the market could correct for a bad practitioner in the barber space pretty easily, and with little risk to society.
fragmede · 8h ago
> It seems like the market could correct for a bad practitioner in the barber space pretty easily,
Why do you assume that? I bet most people don't know their barber personally, and just go to the shop to get a cut. Should getting a haircut be fraught with having to go online and read a bunch of reviews, followed by the inevitable bickering between fake reviews and fake responses on top of that? No, I just want to get a decent cut for a decent price. We can nitpick over how much training is reasonable, and sure, there's an element of protectionism there, but if the Internet has taught us anything, it's that online reviews are bullshit. I would hate to have to rely on them to correct for a bad practitioner when they aren't really able to do anything about bad doctors, who have a much higher bar to practice.
8n4vidtmkvmk · 1h ago
My wife just paid $300 for a root touchup and the stylist did an awful job. Apparently money and licenses and who knows how much experience aren't guarantees of a good result either.
I've had bad haircuts too. And I have the simplest haircut ever. Just buzz it off. But noooo.. on multiple occasions they've missed way too many strands of hair.
snapcaster · 11h ago
I think something like professional licenses are easy to see benefits of but really hard to see the downsides. How many wonderful things _won't_ be created when you start gatekeeping something? Maybe it is worth it, but it's not some free win.
prmoustache · 11h ago
Mandating a professional license for hairdressers to work professionally does not prevent you from cutting your partner's, friends', or family's hair, as long as you don't ask for any money in return.
Aurornis · 10h ago
> easy to see benefits of but really hard to see the downsides
I think like most hypothetical discussions, the commenters proposing these ideas aren’t interested in practical versions of the idea with tradeoffs. They imagine a perfect version of it in their minds with no downsides that accomplishes everything they want.
The demand for professional licensure doesn’t even make sense in this context. Is professional licensing supposed to stop developers from naming their packages names that LLMs produce? Is it going to force the package repos to check that everyone has a professional license before submitting packages from the United States (or other countries with licensure)? Can it be worked around by changing your country in the drop-down box to a country that doesn’t have licensing?
The calls for software licensure never seem to take into account the global nature of the Internet and software development.
8n4vidtmkvmk · 1h ago
Yes. If they nefariously typosquat, that could be grounds for losing your license.
Adding a link to your verified license in your package.json or personal website so that installers can check that the author of the package they are using does have a license sounds perfectly fine.
Proving you reside or are licensed in some country before you can publish to that country's repository sounds very doable too.
We don't even have to do this perfectly. It's not about preventing people from skirting the system, it's about giving users and developers the option to install from only verified sources.
Would you rather get heart surgery from a licensed doctor or an unlicensed one? What if both existed where you live? I'd probably ask to see their license before going through with it.
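No package registry supports anything like this today, so the following is a purely hypothetical sketch of the opt-in "install only from verified sources" check: `PackageMeta.license_id` is an invented metadata field and `license_registry` stands in for whatever body would issue licenses. A real scheme would need the license claim cryptographically bound to the publisher's account rather than a free-text field anyone can fill in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageMeta:
    name: str
    author: str
    license_id: Optional[str]  # hypothetical field: the author's professional-license number


def install_allowed(meta: PackageMeta, license_registry: dict[str, str], only_verified: bool) -> bool:
    """Allow everything unless only_verified is set; then require a license the registry ties to this author."""
    if not only_verified:
        return True  # the user has not opted in, so nothing changes for them
    if meta.license_id is None:
        return False
    return license_registry.get(meta.license_id) == meta.author


registry = {"SWE-12345": "alice"}
print(install_allowed(PackageMeta("goodpkg", "alice", "SWE-12345"), registry, only_verified=True))   # True
print(install_allowed(PackageMeta("squat", "mallory", "SWE-12345"), registry, only_verified=True))   # False
print(install_allowed(PackageMeta("squat", "mallory", None), registry, only_verified=False))         # True
```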
vincnetas · 10h ago
no one is prohibiting hackers from hacking. i do my haircut at home without any licenses. what you need license for is to provide services to other people for money.
Invictus0 · 11h ago
Like I said, the license should be for handling sensitive data. You're free to make doodle jump if you like.
snapcaster · 8h ago
I just don't like when comments on things like this don't engage with the downsides. Gatekeeping isn't "free" or strictly better
Invictus0 · 6h ago
The status quo is that anyone can make an application that leaks a million drivers licenses, with no oversight, penalty, or restrictions whatsoever. This is good?
If hairdressers have to take time to learn how to not cut people's ears off, people publishing applications should have to learn basic security practices. I think you will find that no one finds this controversial. And yet, we are moving to a world where AI is making it easier than ever for an army of vibe coders to make apps without knowing the literal first thing about security.
8n4vidtmkvmk · 1h ago
I'm thinking about this selfishly. I'd have to go and take a test. But my employer would probably pay for it. Maybe if I wasn't already employed, it'd be on my own dime, but even that isn't too unlike the school I've already paid for. And I'm sure I'd pass the test. As long as I don't have to recertify too frequently, it probably wouldn't be awful. And also selfishly, if it keeps some riffraff out, I wouldn't hate that either.
I guess my biggest concern, with parallels to that time I sold life insurance, is that they test for one thing and then in practice you do a different thing. I hear the same is true for realtors. So.. it becomes an exercise in memorizing some BS that you won't use again after the test. If we do this, the software engineering test would need to be updated at least annually, and better be written by some well respected security researchers.
jen729w · 10h ago
In 2023 I was at a talk at the National Press Club in Canberra, Australia, by the deputy head of one of our national intelligence services.
This was just after the Optus leak. Some hundreds of thousands of customers' data, down to the passport and DOB level, leaked. Again. I was going to ask him whether we, the collected IT consultants in the room, simply couldn't be trusted any more.
We've proven that we can't. I firmly believe that independent companies should no longer, by law, be able to collect my identifying information. If you must identify me, the state should provide a service. You hand off to them, they validate me, they send you a token back, I'm validated.
Sadly the microphone never made it to my corner of the room.
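The hand-off described here is roughly how federated identity protocols work. A minimal sketch under stated assumptions: the state service shares a per-company key with each company, and the token carries only a pairwise pseudonym (stable per person and company, unlinkable across companies), an audience, an expiry and yes/no claims such as `over_18`. HMAC from the standard library stands in for what a real deployment would do with public-key signatures (so companies can verify tokens but not mint them), e.g. via OpenID Connect, which defines pairwise subject identifiers for this purpose. All names and secrets are hypothetical.

```python
import hashlib
import hmac
import json
import time

# Hypothetical secret held only by the state identity service.
STATE_SECRET = b"demo-state-secret"


def relying_party_key(rp_id: str) -> bytes:
    """Per-company verification key, derived by the state and handed to that company out of band."""
    return hmac.new(STATE_SECRET, b"rp-key:" + rp_id.encode(), hashlib.sha256).digest()


def issue_token(citizen_id: str, rp_id: str, claims: dict, ttl_seconds: int = 300) -> str:
    """State side, after it has checked the person's documents itself."""
    # Pairwise pseudonym: stable for this person at this company, unlinkable across companies.
    subject = hmac.new(STATE_SECRET, f"{citizen_id}|{rp_id}".encode(), hashlib.sha256).hexdigest()
    payload = {"sub": subject, "aud": rp_id, "exp": int(time.time()) + ttl_seconds, **claims}
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(relying_party_key(rp_id), body.encode(), hashlib.sha256).hexdigest()
    return body + "." + tag


def verify_token(token: str, rp_id: str, rp_key: bytes) -> dict:
    """Company side: check the MAC, audience and expiry; it never sees passport or DOB."""
    body, _, tag = token.rpartition(".")  # the hex tag itself contains no "."
    expected = hmac.new(rp_key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad MAC")
    claims = json.loads(body)
    if claims["aud"] != rp_id or claims["exp"] < time.time():
        raise ValueError("wrong audience or expired")
    return claims


key = relying_party_key("acme-telco")
token = issue_token("PASSPORT-X1234567", "acme-telco", {"over_18": True})
print(verify_token(token, "acme-telco", key)["over_18"])  # True
```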
lan321 · 11h ago
Uni graduates do that still. I wouldn't trust myself to set that up either, as a matter of fact.
Optimally, you'd probably have seniors, and the company itself, obtain some "Security Compliance Certification"; then the product has to be approved by the certified people, and if an issue arises, the certified get reprimanded, especially the company certification, in some exponentially scaling manner so that it doesn't just become a cost of doing business.
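One reading of "exponentially scaling" is a geometric schedule: the n-th incident costs `base * growth**(n - 1)`, so the cumulative total after n incidents is `base * (growth**n - 1) / (growth - 1)`. A minimal sketch with hypothetical numbers (a 100,000 base fine doubling each time); the function names are illustrative.

```python
def penalty(incident_number: int, base: float, growth: float = 2.0) -> float:
    """Fine for the n-th incident (1-based): base * growth**(n - 1)."""
    if incident_number < 1:
        raise ValueError("incident_number starts at 1")
    return base * growth ** (incident_number - 1)


def total_penalty(incidents: int, base: float, growth: float = 2.0) -> float:
    """Cumulative fines after `incidents` incidents (a geometric series)."""
    return sum(penalty(n, base, growth) for n in range(1, incidents + 1))


print([penalty(n, 100_000) for n in range(1, 5)])  # [100000.0, 200000.0, 400000.0, 800000.0]
print(total_penalty(4, 100_000))                   # 1500000.0
```

With doubling, any fixed annual budget for fines is exceeded after a handful of repeat incidents, which is the "not just a cost of doing business" property the comment is after.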
readthenotes1 · 1h ago
The company I was at had to change all of the job titles at one point because the local regulatory agency said that you cannot call yourself an engineer unless you pass the engineering licensing exam, which most software developers could not even hope to do with a few years of study.
Aurornis · 11h ago
What does this have to do with the article? Naming a package to match LLM patterns has nothing to do with licensing.
Invictus0 · 6h ago
You could imagine that to get a SWE license, you would have to have learned basic security practices which could include things like dependency management.
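Part of the dependency-management practice mentioned here is pinning artifacts by hash. pip supports this natively (hash-checking mode via `--require-hashes`, with `--hash=sha256:...` entries in a requirements file); the sketch below only shows the underlying check with the standard library. It guards against a swapped artifact, not against pinning a squatted name in the first place, so reviewing the package name still matters. The wheel filename is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large artifacts don't load into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: Path, pinned: set[str]) -> bool:
    """True if the file's SHA-256 matches one of the hashes pinned for it."""
    return sha256_of(path) in pinned


with tempfile.TemporaryDirectory() as d:
    wheel = Path(d) / "example-1.0-py3-none-any.whl"  # hypothetical artifact
    wheel.write_bytes(b"not really a wheel")
    good = sha256_of(wheel)
    print(verify_artifact(wheel, {good}))  # True
    wheel.write_bytes(b"tampered")
    print(verify_artifact(wheel, {good}))  # False
```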
8n4vidtmkvmk · 1h ago
I was thinking the opposite direction. Squatting is malicious and could cost you your license.
ysofunny · 10h ago
then you will need to input your license every time you open developer tools...
sooner or later command line interfaces will require background checks and be limited to a closed, select group of government-approved individuals, e.g. like guns in Japan.
add-sub-mul-div · 10h ago
1. Most everyone needs haircuts and licensure allows us to take it for granted that we'll get someone with basic competence and that there won't be a citywide outbreak of lice. So much so that we've forgotten the point of the licensure and cite it as folly.
2. Computing was new and mysterious and developed faster than lawmakers could understand it, and by now it's given so much power to the top 1% that they're for all intents and purposes above the law. Cosmetology licensure is from a time when legislation still helped us.
> Slopsquatting is a type of cybersquatting.