MEMORY BANDWIDTH CALCULATOR

Token Speed Estimator

See how fast your GPU will generate text. Tokens-per-second (t/s) is strictly bound by memory bandwidth and model weight size.

Target GPU

Model Parameters

Quantization Level

Fully fits in VRAM

Estimated Speed

285t/s

Time for 1K Tokens

3.5 sec

Approx. 750 words

VRAM Footprint

5.9 GB

Model: 4.4GB + 1.5GB OS

How is this calculated?

Local inference is almost entirely a memory bandwidth bound operation. In a single token generation step, the entire model weight file must be swept through the GPU memory once. We take the 1792 GB/s memory bandwidth of the NVIDIA NVIDIA GeForce RTX 5090, multiply it by a 70% real-world efficiency factor, and divide it by the 4.4 GB file size of the 8B (Llama 3) model running at 4-bit (Fastest / Recommended).