Token Speed Estimator
See how fast your GPU will generate text. Tokens per second (t/s) is bound mainly by memory bandwidth and model weight size.
Fully fits in VRAM
Estimated Speed
285 t/s
Time for 1K Tokens
3.5 sec
Approx. 750 words
VRAM Footprint
5.9 GB
Model: 4.4 GB + 1.5 GB OS
How is this calculated?
Local inference is almost entirely memory-bandwidth-bound. To generate a single token, the GPU must read the entire set of model weights from memory once. We take the 1792 GB/s memory bandwidth of the NVIDIA GeForce RTX 5090, multiply it by a 70% real-world efficiency factor, and divide by the 4.4 GB file size of the 8B (Llama 3) model running at 4-bit (Fastest / Recommended): 1792 × 0.70 ÷ 4.4 ≈ 285 t/s.
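The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula described above, not the estimator's actual source: the function names are hypothetical, and the 70% efficiency factor and 1.5 GB OS allowance are taken from the figures shown on this page.

```python
def estimate_tokens_per_second(bandwidth_gb_s, model_size_gb, efficiency=0.70):
    """Rough upper bound on decode speed for a model that fits fully in VRAM.

    Assumes every generated token requires one full read of the weights,
    so speed = usable bandwidth / model size.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s * efficiency / model_size_gb


def seconds_for_tokens(token_count, tokens_per_second):
    """Time needed to generate token_count tokens at a steady rate."""
    return token_count / tokens_per_second


# Figures from the example above: RTX 5090, 8B (Llama 3) at 4-bit.
bandwidth_gb_s = 1792   # GPU memory bandwidth in GB/s
model_size_gb = 4.4     # weight file size in GB
os_overhead_gb = 1.5    # VRAM allowance for the OS, as listed on this page

speed = estimate_tokens_per_second(bandwidth_gb_s, model_size_gb)
print(f"Estimated speed: {speed:.0f} t/s")                                 # 285 t/s
print(f"Time for 1K tokens: {seconds_for_tokens(1000, speed):.1f} sec")    # 3.5 sec
print(f"VRAM footprint: {model_size_gb + os_overhead_gb:.1f} GB")          # 5.9 GB
```

Because the estimate scales inversely with file size, halving the model size (for example, through heavier quantization) roughly doubles the estimated speed, provided the model still fits fully in VRAM.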