Free Speed Estimator

GPU Tokens Per Second Calculator: LLM Speed Estimator

Pick your GPU (or Mac) and a model to get an estimated tokens-per-second figure, then see how it stacks up against other hardware.

Your Setup

How we calculate this

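Speed (tokens per second) ≈ (memory bandwidth × ~70% efficiency) ÷ active model weight size. This treats LLM inference as memory-bandwidth-bound: generating each token requires reading the active weights from memory, so bandwidth rather than compute sets the ceiling.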
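As a rough illustration of the formula above, here is a minimal Python sketch. The function names, the 1000 GB/s bandwidth figure, and the 8-billion-parameter, 4-bit model are hypothetical values chosen for the example, not numbers taken from this calculator. Deriving the weight size from parameter count and bits per weight is also an assumption about how the "active model weight size" might be obtained.

```python
def active_weight_size_gb(active_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in GB of the weights read per token.

    Assumption (not from the calculator): size = parameters x bits / 8,
    ignoring quantization overhead such as scales and zero points.
    """
    if active_params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("parameter count and bits per weight must be positive")
    return active_params_billion * bits_per_weight / 8


def estimate_tokens_per_second(
    bandwidth_gb_s: float,
    weights_gb: float,
    efficiency: float = 0.70,  # the ~70% efficiency factor from the formula
) -> float:
    """Bandwidth-bound estimate: (bandwidth x efficiency) / active weight size."""
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return bandwidth_gb_s * efficiency / weights_gb


# Hypothetical setup: 1000 GB/s of memory bandwidth, 8B active parameters at 4 bits.
weights = active_weight_size_gb(active_params_billion=8, bits_per_weight=4)
speed = estimate_tokens_per_second(bandwidth_gb_s=1000, weights_gb=weights)
print(f"{weights:.1f} GB of active weights, roughly {speed:.0f} tokens/s")
```

With these illustrative inputs the weights come to 4 GB and the estimate is about 175 tokens per second. Doubling the weight size (for example, by using 8-bit instead of 4-bit weights) halves the estimate, which is the behavior the bandwidth-bound model predicts.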

Note: Real-world speed varies by engine (Ollama, llama.cpp, vLLM, MLX), driver version, and system load. Treat this as a planning estimate, not a guarantee.

Live Speed Readout

Select your hardware and a model on the left to estimate speed.