What Can I Run?
Select your GPU to see every Ollama-compatible local AI model that fits in its VRAM, from tiny 1B chat models to frontier-class 70B reasoners.
| GPU | VRAM | Models Fit |
|---|---|---|
| NVIDIA GeForce RTX 5090 | 32 GB | 99 |
| NVIDIA GeForce RTX 5080 | 16 GB | 81 |
| NVIDIA GeForce RTX 5070 Ti | 16 GB | 81 |
| NVIDIA GeForce RTX 5070 | 12 GB | 73 |
| AMD Radeon RX 9070 XT | 16 GB | 81 |
| AMD Radeon RX 9070 | 16 GB | 81 |
| NVIDIA GeForce RTX 4090 | 24 GB | 99 |
| NVIDIA GeForce RTX 4080 Super | 16 GB | 81 |
| NVIDIA GeForce RTX 4070 Ti Super | 16 GB | 81 |
| NVIDIA GeForce RTX 4070 Super | 12 GB | 73 |
| NVIDIA GeForce RTX 3090 | 24 GB | 99 |
| NVIDIA GeForce RTX 3060 12GB | 12 GB | 73 |
How VRAM Compatibility Works
Quantization: Models are compressed with Q4_K_M quantization, which cuts weight storage to roughly a third of full FP16 (about 4.85 effective bits per weight versus 16) with minimal quality loss.
KV Cache: Our recommendations include ~1-3 GB of VRAM headroom for the KV cache needed during inference. Tighter fits may limit context length. A worked sketch of this arithmetic follows this list.
One Command: Every model shown can be launched through Ollama with a single terminal command, e.g. `ollama run llama3.2` — no CUDA setup, no driver headaches.
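To make the numbers above concrete, here is a minimal Python sketch of the fit check, under stated assumptions: ~4.85 effective bits per weight for Q4_K_M and a flat 2 GB KV-cache headroom (the middle of the ~1-3 GB range). The constants and function names are illustrative, not the site's actual formula, and real catalog counts also depend on which models Ollama ships.

```python
# Illustrative fit check: quantized weight size plus KV-cache headroom
# versus GPU VRAM. Constants below are assumptions, not the site's formula.

Q4_K_M_BITS = 4.85    # approx. effective bits per weight for Q4_K_M
KV_HEADROOM_GB = 2.0  # assumed KV-cache headroom during inference

def model_vram_gb(params_billions: float, bits: float = Q4_K_M_BITS) -> float:
    """Approximate VRAM footprint of the quantized weights, in GB."""
    return params_billions * 1e9 * bits / 8 / 1e9

def fits(params_billions: float, gpu_vram_gb: float) -> bool:
    """True if quantized weights plus KV-cache headroom fit in VRAM."""
    return model_vram_gb(params_billions) + KV_HEADROOM_GB <= gpu_vram_gb

# A 12 GB card (RTX 3060 / 4070 Super / 5070) holds an 8B model easily
# (~4.9 GB weights + 2 GB headroom) but not a 70B model (~42 GB weights).
for size_b in (1, 8, 70):
    print(f"{size_b}B model on 12 GB VRAM: {'fits' if fits(size_b, 12) else 'too big'}")
```

Under the same assumptions, a 24 GB card clears a quantized 32B model (~19 GB of weights plus headroom), which is consistent with the 24 GB and 32 GB cards topping the compatibility counts above.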