People often don't understand why LLMs can be nondeterministic even with deterministic seeding, temperature, and sampling. This paper shows how bad it can get across different hardware and GPU hosts.
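One commonly cited cause (not necessarily the one the paper focuses on) is that floating-point addition is not associative, so a different reduction order on different hardware can change a result slightly. A minimal sketch in plain Python, where `sequential_sum`, the input values, and the `rival` logit are all made up for illustration:

```python
def sequential_sum(values):
    """Add floats strictly left to right, mimicking one fixed reduction order."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, accumulated in two different orders.
order_a = [1e16, 1.0, -1e16]   # the 1.0 is lost to rounding before the cancellation
order_b = [1e16, -1e16, 1.0]   # the cancellation happens first, so the 1.0 survives

logit_a = sequential_sum(order_a)
logit_b = sequential_sum(order_b)

# Hypothetical competing logit: greedy decoding picks whichever is larger.
rival = 0.5
pick_a = "token_X" if logit_a > rival else "token_Y"
pick_b = "token_X" if logit_b > rival else "token_Y"

print(logit_a, logit_b)  # the two sums differ
print(pick_a, pick_b)    # so the greedy choice differs too
```

The numbers here are exaggerated to make the effect visible. In a real model the differences are tiny, but once one token flips, everything generated after it can diverge.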
incomingpain · 5h ago
Q4 is plenty for me, I don't have the budget for FP32 lol.
If money wasn't a thing, I'd probably not be going above Q8.