I legitimately don't understand why anyone would want a 4B model.
They might as well call all models 4B and smaller after psychedelics because they be hallucinating.
mdaniel · 6h ago
Plus, this specific use case is also "to detect legally relevant text like license declarations in code and documentation", so I guess they really bought into that regex adage about "and now you have two problems" and decided to introduce some floating-point math instead.
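For context, the regex approach being joked about really is only a handful of patterns; here's a minimal sketch (the patterns and coverage are my own illustration, not anything from the project in question):

    import re

    # A few illustrative patterns for common license declarations; real
    # tooling (e.g. SPDX-based matchers) needs far broader coverage.
    LICENSE_PATTERNS = {
        "SPDX tag": re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)"),
        "MIT": re.compile(r"Permission is hereby granted, free of charge", re.I),
        "Apache-2.0": re.compile(r"Licensed under the Apache License, Version 2\.0", re.I),
        "GPL": re.compile(r"GNU (Lesser )?General Public License", re.I),
    }

    def detect_licenses(text):
        """Return the names of the patterns that match the given text."""
        return [name for name, pattern in LICENSE_PATTERNS.items() if pattern.search(text)]

    if __name__ == "__main__":
        sample = "# SPDX-License-Identifier: MIT\nPermission is hereby granted, free of charge..."
        print(detect_licenses(sample))  # ['SPDX tag', 'MIT']

The coverage problem is presumably the argument for reaching for a model instead, but that's exactly the trade the comment is poking at.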
hammyhavoc · 7h ago
And yet so many people on HN are adamant that more tokens means better results, that it's all just a matter of throwing more money at it, and that it will inevitably "get better" somehow because there's "so much money riding on it".
I wonder when the penny will drop?
incomingpain · 4h ago
I was just testing this a bit more.
I grabbed qwen3:4b and cranked the context window to its max of 32k tokens.
It's fast, to be sure, and I'm struggling to get it to hallucinate; but it is giving me a ton of "The provided context does not include specifics".
Resource-wise it's like running a 12-16B model, but faster. But as soon as you expand the 12B to around 10k tokens, the 12B is clearly better for barely any more resources.
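(For anyone wanting to reproduce this, a minimal sketch assuming an Ollama-style local server with qwen3:4b pulled; the port and prompt are placeholders, and num_ctx is the knob for the 32k context window:)

    import json
    import urllib.request

    # Assumes a local Ollama server on its default port with qwen3:4b pulled.
    payload = {
        "model": "qwen3:4b",
        "prompt": "Answer only from the provided context: ...",
        "stream": False,
        "options": {"num_ctx": 32768},  # 32k-token context window
    }

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    # Print the model's reply from the JSON response.
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])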
incomingpain · 6h ago
I wonder if my understanding is flawed. I've tested this using LM Studio, and lots of dials are involved.
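(A sketch of the dials I mean, assuming LM Studio's default local server port and its OpenAI-compatible API; the model id and sampling values are just placeholders, and the context length itself is set when you load the model, not per request:)

    from openai import OpenAI  # pip install openai; LM Studio serves an OpenAI-compatible API

    # LM Studio's local server defaults to port 1234; the api_key value is ignored.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="qwen3-4b",   # whatever identifier LM Studio shows for the loaded model
        messages=[{"role": "user", "content": "Answer only from the provided context: ..."}],
        temperature=0.2,    # lower temperature tends to cut down on free-wheeling answers
        max_tokens=512,
    )
    print(resp.choices[0].message.content)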