VRAM COMPATIBILITY // LOCAL AI

What Can I Run?

Select your GPU to see every Ollama-compatible local AI model that fits in its VRAM, from tiny 1B chat models to frontier-class 70B reasoners.

How VRAM Compatibility Works

01

Quantization: Models are compressed using Q4_K_M quantization, which stores weights at roughly 4.8 bits each, cutting VRAM usage to about a quarter of full FP16 with minimal quality loss.
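
As a back-of-the-envelope check, weight memory is just parameter count times bits per weight. Here is a minimal sketch, assuming ~4.85 effective bits per weight for Q4_K_M (a commonly cited llama.cpp figure); the function name and numbers are illustrative, not part of the tool:

```python
def est_weight_vram_gib(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough weight-only VRAM footprint of a model, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1024**3

# An 8B model: ~4.5 GiB at Q4_K_M vs ~14.9 GiB at FP16 (16 bits/weight).
print(f"Q4_K_M: {est_weight_vram_gib(8):.1f} GiB")
print(f"FP16:   {est_weight_vram_gib(8, bits_per_weight=16):.1f} GiB")
```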

02

KV Cache: Our recommendations include ~1–3 GB of VRAM headroom for the KV cache needed during inference. Tighter fits may limit context length.
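
To see where that headroom goes: the KV cache stores one key and one value vector per layer per token, so it grows linearly with context length. Below is a rough sketch assuming Llama-3.1-8B-style geometry (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and an FP16 cache; Ollama's actual cache layout and precision may differ:

```python
def est_kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                     context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: one K and one V vector per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len / 1024**3

# Llama-3.1-8B-style geometry: 32 layers, 8 KV heads (GQA), head_dim 128.
print(f"{est_kv_cache_gib(32, 8, 128, 8192):.2f} GiB at 8K context")    # ~1.0
print(f"{est_kv_cache_gib(32, 8, 128, 32768):.2f} GiB at 32K context")  # ~4.0
```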

03

One Command: All models shown can be launched via Ollama with a single terminal command: no CUDA setup, no driver headaches.
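
That command is just `ollama run <model>`. If you'd rather script it, a minimal Python wrapper might look like the sketch below; the model tag `llama3.2:1b` is one example from the Ollama library, and any tag that fits your VRAM works the same way:

```python
import subprocess

# Pulls the model on first run, sends one prompt, prints the reply, and exits.
# "llama3.2:1b" is an example tag; swap in any model that fits your VRAM.
subprocess.run(
    ["ollama", "run", "llama3.2:1b", "Explain the KV cache in one sentence."],
    check=True,
)
```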
