The best VRAM calculator I have found is https://apxml.com/tools/vram-calculator. It is much more thorough than this one. For example, it understands different models' attention schemes for correct KV cache size calculation, and supports quantization of both the model and the KV cache. It also covers fine-tuning. It has its own limitations, such as only supporting specific models. In practice, though, the generic calculators are not very useful because model architectures vary (mainly in the KV cache), so their estimates end up way off. (Not sure whether it would be better to discuss it separately, but I submitted it at https://news.ycombinator.com/item?id=44677409)
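To illustrate the KV-cache point, here is a rough back-of-the-envelope sketch; the 8B model shape below (32 layers, 8 KV heads, head_dim 128) is purely illustrative and not any specific model:

    # Rough VRAM estimate for inference: weights + KV cache, assuming a
    # Llama-style decoder with grouped-query attention (GQA). Real usage
    # also needs room for activations and framework overhead.
    def estimate_vram_gib(n_params_b, n_layers, n_kv_heads, head_dim,
                          context_len, weight_bits=16, kv_bits=16):
        weights = n_params_b * 1e9 * weight_bits / 8
        # 2x for keys and values, per layer, per KV head, per cached token
        kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bits / 8
        return (weights + kv_cache) / 1024**3

    # Hypothetical 8B GQA model at 32k context:
    # ~15 GiB of fp16 weights plus ~4 GiB of fp16 KV cache.
    print(round(estimate_vram_gib(8, 32, 8, 128, 32768), 1))  # ~18.9

Swap in more query heads for the KV heads (i.e., plain multi-head attention) and the cache grows several times larger, which is exactly why generic calculators that ignore the attention scheme drift so far off.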
zeroq · 4h ago
This one is indeed much better, and it immediately addresses the feedback I wanted to leave on the one originally posted: instead of calculating an artificial scenario, I'd rather state what hardware I actually have at hand and see what I can run on it.
Thanks!
funfunfunction · 7h ago
This is a cheap marketing ploy for a GPU reseller with billboards on highway 101 into SF.
ChadNauseam · 3h ago
Hate those ads. "Inference isn't just a buzzword". Who thought it was? (No comment on whether the linked post is a useful tool; I haven't played with it enough to know.)
amanzi · 6h ago
I would have liked to see the RTX 5060 Ti with 16GB mentioned. I can't tell if it's omitted because it won't work or if it was excluded for some other reason.
amatecha · 5h ago
Yeah, weird miss, but maybe just because it came out more recently. It can be used for ~anything a 5070 could be used for, no? Maybe slower, but still.
LorenDB · 7h ago
Where's AMD support? I have a 9070 XT and would love to see it listed on here.
mdaniel · 3h ago
> 0 Model Available
Who in the world is expected to populate 11 select/text fields with their favorite model data points they just happen to have lying around, only to see an absolutely meaningless "295% Inference" outcome?
What a dumpster
timothyduong · 6h ago
Where's the 3090? Or should that fall into the 4090 (24GB VRAM) category?
chlobunnee · 7h ago
I built a calculator to help researchers and engineers pick the right GPUs for training and inference workloads!
It helps compare GPU options by taking in simple parameters (# of transformer layers, token size, etc.) and letting users know which GPUs are compatible, plus their efficiency for training vs. inference.
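At its simplest, the compatibility check boils down to something like the sketch below (the VRAM figures and the flat 1.2x overhead factor are simplifying assumptions, not the exact formula the tool uses):

    # Simplified sketch: does a model fit on a given GPU for inference?
    # The flat overhead factor stands in for KV cache and activations;
    # training needs much more (gradients + optimizer state).
    GPUS_GB = {"RTX 4090": 24, "A100 80GB": 80, "H100 SXM": 80}

    def fits_for_inference(n_params_b, vram_gb, weight_bits=16, overhead=1.2):
        weights_gb = n_params_b * 1e9 * weight_bits / 8 / 1e9
        return weights_gb * overhead <= vram_gb

    for name, vram in GPUS_GB.items():
        print(name, fits_for_inference(13, vram))  # 13B model in fp16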
The idea came from talking with ML researchers frustrated by slow cluster queues or wasting money on overkill GPUs.
I'd love feedback on what you feel is missing/confusing!
Some things I'm thinking about incorporating next are:
- Allowing users to directly compare 2 GPUs and their specs
- Allowing users to see whether a fraction of the GPU can complete their workload
I would really appreciate your thoughts/feedback! Thanks!
snvzz · 6h ago
Rather than a GPU calculator, this is an NVIDIA calculator.
nodesocket · 6h ago
In case you’ve been living in a cave, Nvidia is the de facto standard for LLM compute.
jakogut · 2h ago
Llama.cpp supports Vulkan, which is supported by all GPU vendors that care about standards and interoperability.
The default should be open and portable APIs, not needlessly furthering a hegemony that is detrimental to us all.