Free Speed Estimator
GPU Tokens Per Second Calculator: LLM Speed Estimator
Pick your GPU (or Mac) and a model — get an estimated tokens-per-second, then see how it stacks up against other hardware.
Your Setup
How we calculate this
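Speed: tokens per second ≈ (memory bandwidth in GB/s × ~70% efficiency) ÷ active model weight size in GB. This models the memory-bandwidth-bound nature of LLM inference: generating each token requires reading the active weights from memory, so bandwidth rather than compute sets the ceiling.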
Note: Real-world speed varies by engine (Ollama, llama.cpp, vLLM, MLX), driver version, and system load. Treat this as a planning estimate, not a guarantee.
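The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, the 0.70 efficiency default mirrors the "~70%" figure above, and the 0.75 words-per-token ratio is an assumed rule of thumb for English text that this page does not state. The hardware numbers in the usage lines are illustrative, not the specs of any particular GPU.

```python
def estimate_tokens_per_second(
    bandwidth_gb_s: float,
    active_weights_gb: float,
    efficiency: float = 0.70,  # assumption: mirrors the "~70%" figure above
) -> float:
    """Bandwidth-bound estimate: usable bandwidth divided by bytes read per token."""
    if bandwidth_gb_s <= 0 or active_weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return bandwidth_gb_s * efficiency / active_weights_gb


def estimate_words_per_second(
    tokens_per_second: float,
    words_per_token: float = 0.75,  # assumption: rough ratio, not from this page
) -> float:
    """Convert a token rate to an approximate word rate."""
    return tokens_per_second * words_per_token


# Illustrative inputs: 1000 GB/s of memory bandwidth, 4 GB of active weights.
tps = estimate_tokens_per_second(1000.0, 4.0)
wps = estimate_words_per_second(tps)
print(f"{tps:.1f} tokens/s, about {wps:.1f} words/s")
```

Because the estimate divides by weight size, halving the active weights (for example through heavier quantization) doubles the estimated speed on the same hardware, which is why the model choice matters as much as the GPU.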
Live Speed Readout
Select your hardware and a model on the left to estimate speed.
Tokens / Second: 0
Good
≈ Words / Second: 0 w/s
Memory Bandwidth: 0 GB/s