What Can the AMD Radeon RX 9070 XT Run?
81 compatible models · 16GB VRAM (GDDR6) · 640 GB/s bandwidth · 304W TBP
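Each card below quotes an estimated footprint (e.g. "9GB / 16GB VRAM"). As a rough sanity check, a Q4_K_M GGUF tends to weigh on the order of 0.64 GB per billion parameters, plus some overhead for the KV cache and runtime buffers. The sketch below uses those two constants as illustrative approximations, not measured values:

```python
# Rough VRAM-fit check for the Q4_K_M entries below.
# Assumption (illustrative, not measured): a Q4_K_M GGUF weighs roughly
# 0.64 GB per billion parameters, plus ~0.5 GB for KV cache and buffers.

def q4_k_m_footprint_gb(params_billions: float, overhead_gb: float = 0.5) -> float:
    """Estimate total VRAM needed to load a Q4_K_M model."""
    return params_billions * 0.64 + overhead_gb

def fits(params_billions: float, vram_gb: float = 16.0) -> bool:
    """Does the estimated footprint fit in the card's VRAM?"""
    return q4_k_m_footprint_gb(params_billions) <= vram_gb

# A 14B model lands near the 9GB figure quoted for Qwen 2.5 14B below:
print(round(q4_k_m_footprint_gb(14), 1))  # 9.5
print(fits(14))  # True: comfortable on a 16GB card
print(fits(30))  # False under this heuristic (~19.7GB needed)
```

Real footprints vary with context length and runtime; treat this as a back-of-envelope check, not a guarantee.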
CogVideoX-5B
THUDM · 5B video · BF16
6-second 720P clips on a single 16GB GPU — the easiest entry point into local video generation. Apache 2.0 commercial license with active ComfyUI integration. Best for social media automation and rapid prototyping.
16GB / 16GB VRAM
Open-Sora
HPC-AI Tech · 4B video · BF16
Full open-source Sora replication — weights, training code, and full data pipeline are all public. The most transparent video generation model. Essential for researchers studying text-to-video architectures and training dynamics.
16GB / 16GB VRAM
LFM2-24B-A2B
Liquid AI · 24B · Q4_K_M
A large hybrid model family designed specifically for efficient on-device deployment.
15GB / 16GB VRAM
Nemotron 3 22B
NVIDIA · 22B · Q4_K_M
A general-purpose reasoning and chat model trained from scratch by NVIDIA, featuring a low-latency MoE architecture.
14.5GB / 16GB VRAM
Codestral 22B
Mistral AI · 22B · Q4_K_M
Mistral's dedicated code model. Industry-leading performance on FIM (fill-in-middle) and complex code generation.
14GB / 16GB VRAM
Mistral Small 22B
Mistral AI · 22B · Q4_K_M
Highly capable 22B model from Mistral AI. Excellent instruction following for enterprise chat applications.
14GB / 16GB VRAM
Ernie-4.5
Baidu · 32B (MoE) · Q4_K_M
A medium-sized Mixture-of-Experts foundation model from Baidu.
14GB / 16GB VRAM
HunyuanVideo 1.5
Tencent · 13B video · BF16
Cinematic-quality video generation accessible at just 14GB VRAM with model offloading. The most democratized cinematic video tool available. Produces Hollywood-grade motion blur, depth of field, and lighting consistency.
14GB / 16GB VRAM
FLUX.1 Schnell
Black Forest Labs · 12B diffusion · BF16
4-step generation with Apache 2.0 commercial license. The fastest high-quality local image model — produces studio-grade output in under 3 seconds on a 24GB GPU. The go-to for commercial product photography pipelines.
12GB / 16GB VRAM
Stable Diffusion 3.5 Large
Stability AI · 8B diffusion · BF16
Best text-in-image rendering and the largest open LoRA fine-tune ecosystem with 50,000+ community models. The creative industry's preferred foundation for style transfer, brand consistency, and character sheet generation.
12GB / 16GB VRAM
MusicGen Large
Meta AI · 4B · BF16
Meta's flagship 4B music model with melody conditioning from reference audio clips. Best for cinematic scoring and mood-driven generation. CC-BY-NC-4.0. Industry standard for AI-assisted film and game soundtrack production.
12GB / 16GB VRAM
gpt-oss
OpenAI · 16B · Q4_K_M
OpenAI's open-weight LLM with configurable reasoning effort (low, medium, high).
11GB / 16GB VRAM
seed-oss
ByteDance · 16B · Q4_K_M
An advanced reasoning model with flexible 'thinking budget' control and self-reflection capabilities.
10.5GB / 16GB VRAM
StarCoder2 15B
BigCode · 15B · Q4_K_M
Specialized code completion model trained on 600+ programming languages. Top-tier for in-IDE completions.
10GB / 16GB VRAM
Devstral 2
Mistral AI · 14B · Q4_K_M
Second-generation Devstral for agentic coding, built for tool use, multi-file editing, and software engineering agents with vision support.
9.5GB / 16GB VRAM
Qwen3-VL
Alibaba · 14B · Q4_K_M
A vision-language model featuring upgrades to visual perception, spatial reasoning, and image understanding.
9.5GB / 16GB VRAM
DeepSeek R1 14B
DeepSeek · 14B · Q4_K_M
Qwen-2.5 distilled reasoning model. Strong chain-of-thought and math at an accessible VRAM cost.
9GB / 16GB VRAM
Qwen 2.5 14B
Alibaba · 14B · Q4_K_M
Exceptional 14B all-rounder. Competitive with many 30B+ models on reasoning and coding benchmarks.
9GB / 16GB VRAM
Qwen 2.5 Coder 14B
Alibaba · 14B · Q4_K_M
Top-tier code model for 12GB+ GPUs. Strong at agentic coding, multi-file edits, and complex refactors.
9GB / 16GB VRAM
Phi 4 14B
Microsoft · 14B · Q4_K_M
Microsoft's flagship small model. Trained on synthetic data with exceptional reasoning and STEM performance.
9GB / 16GB VRAM
Qwen 3.5 14B
Alibaba · 14B · Q4_K_M
Integrates breakthroughs in multimodal learning, architectural efficiency, and reinforcement learning scale.
9GB / 16GB VRAM
phi-4-reasoning
Microsoft · 14B · Q4_K_M
A lightweight open model focused on high-quality, reasoning-dense synthetic data.
9GB / 16GB VRAM
Llama 3.2 Vision 11B
Meta AI · 11B · Q4_K_M
State-of-the-art open-weight vision model. Analyze charts, read documents, describe complex scenes.
8GB / 16GB VRAM
Mistral NeMo 12B
Mistral AI / NVIDIA · 12B · Q4_K_M
128K context window in a 12B model. Joint Mistral AI & NVIDIA collaboration — excellent for long-document tasks.
8GB / 16GB VRAM
Gemma 3 12B
Google · 12B · Q4_K_M
Google's mid-tier Gemma 3. Multimodal capable, 128K context, strong multilingual reasoning.
8GB / 16GB VRAM
SDXL-Lightning
ByteDance · 3.5B diffusion · FP16
50-step SDXL quality in just 4 steps via adversarial diffusion distillation. Fully Apache 2.0 commercial. ComfyUI-native workflow. The fastest path from creative brief to production asset on 8GB VRAM hardware.
8GB / 16GB VRAM
Voxtral TTS
Mistral AI · 7B · Q4_K_M
Matches or beats ElevenLabs Flash on prosody naturalness. 3-second voice cloning from a reference clip with no fine-tuning required. The top locally-run TTS for podcast production, audiobooks, and voice-over automation.
8GB / 16GB VRAM
ACE-Step 1.5
ACE-Step · 2B · BF16
Best local music model for 2026. Generates up to 10 minutes of audio with precise genre, instrument, tempo, and lyrics control. Apache 2.0 commercial license. The definitive tool for indie game composers and content creators.
8GB / 16GB VRAM
GLM-4.6V-Flash
Zhipu AI · 9B · Q4_K_M
A 9B vision-language model optimized for local deployment and low-latency applications.
7GB / 16GB VRAM
GLM-4.7
Zhipu AI · 9B · Q4_K_M
An open-source model specializing in coding and tool calling, built on a new base model.
6.5GB / 16GB VRAM
FunctionGemma 9B
Google · 9B · Q4_K_M
A lightweight, open model built as a foundation for creating specialized function calling models.
6.5GB / 16GB VRAM
Olmo 3
Allen AI · 10B · Q4_K_M
A family of open language models designed to enable scientific research into language modeling.
6.5GB / 16GB VRAM
Devstral
Mistral AI · 8B · Q4_K_M
A coding model from Mistral AI designed for codebase exploration and engineering agents.
6GB / 16GB VRAM
PixArt-Σ
PixArt-alpha · 600M diffusion · FP16
Tiny model with native 4K output — the best image quality per gigabyte of VRAM available. Apache 2.0. Runs on a GTX 1070 8GB. The only sub-1GB model that produces print-resolution imagery with coherent composition.
6GB / 16GB VRAM
MuseTalk
Tencent · 500M · FP16
Photorealistic lip sync at 30+ FPS in real time. The best model for live avatar streaming and talking-head video creation. MIT license with active Discord community. Integrates natively with OBS and streaming tools.
6GB / 16GB VRAM
SadTalker
OpenTalker · 300M · FP16
One photo → talking head video with natural head movement, blinks, and mouth articulation. MIT license. No video sample required — just a single still image and an audio clip. The most accessible local avatar creation tool.
6GB / 16GB VRAM
LLaVA 7B
Haotian Liu et al. · 7B · Q4_K_M
The classic vision-language model. Describe images, answer visual questions locally. Proven and reliable.
5.5GB / 16GB VRAM
Rnj-1
Essential AI · 8B · Q4_K_M
A family of open-weight, dense models trained from scratch by Essential AI.
5.5GB / 16GB VRAM
Ministral 3 8B
Mistral AI · 8B · Q4_K_M
A highly cost-effective, high-performing 8B instruction-tuned model.
5.5GB / 16GB VRAM
olmOCR 2
Allen AI · 7B · Q4_K_M
A specialized Vision Language Model (VLM) for optical character recognition tasks.
5.5GB / 16GB VRAM
gpt-oss-safeguard
OpenAI · 7B · Q4_K_M
Open safety models built on the gpt-oss foundation to help classify and filter text content.
5.5GB / 16GB VRAM
Granite 4.0
IBM · 8B · Q4_K_M
Lightweight open models supporting multilingual tasks, RAG, coding, and tool use.
5.5GB / 16GB VRAM
MiniCPM-V 2.6
OpenBMB · 8B · Q4_K_M
Video + multi-image + text understanding at 8B parameters. The best vision model for 8GB VRAM setups — handles 40-frame video clips, multi-image comparison, and document understanding in a single context window.
5.5GB / 16GB VRAM
CodeGemma 7B
Google · 7B · Q4_K_M
Google's code-tuned Gemma variant. Excellent at code completion tasks inside IDEs.
5GB / 16GB VRAM
Llama 3.1 8B
Meta AI · 8B · Q4_K_M
Meta's powerhouse 8B model with 128K context. Excellent all-rounder for chat, code, and reasoning.
5GB / 16GB VRAM
DeepSeek R1 8B
DeepSeek · 8B · Q4_K_M
Llama-3 distilled reasoning model. Outperforms GPT-4o on several math benchmarks at 8B scale.
5GB / 16GB VRAM
OpenCoder-8B
Infly · 8B dense · Q4_K_M
The 'OLMo of coding' — fully transparent training pipeline with HumanEval 83.5%. Every component is open: weights, data, and methodology. The most trustworthy small coding model for compliance-sensitive teams.
5GB / 16GB VRAM
NVIDIA Nemotron Nano 8B
NVIDIA · 8B dense · Q4_K_M
Math Index 91.0 — the highest math score at the 8GB VRAM tier. NVIDIA's distilled Llama-3.1 with proprietary reward model training. Ideal for STEM tutoring and quantitative analysis on a single mid-range GPU.
5GB / 16GB VRAM
Qwen3-Embedding-8B
Alibaba · 8B · Q4_K_M
Top-ranked self-hosted embedding on MTEB English — outperforms all sub-72B models. 32K context window for ultra-long document encoding. The upgrade path from BGE-M3 for teams needing maximum retrieval precision.
5GB / 16GB VRAM
Mistral 7B
Mistral AI · 7B · Q4_K_M
The model that proved smaller can beat bigger. Mistral 7B outperforms many 13B models with blazing fast speed.
4.5GB / 16GB VRAM
DeepSeek R1 7B
DeepSeek · 7B · Q4_K_M
Distilled reasoning power in a 7B package. Excels at math, logic, and step-by-step problem solving.
4.5GB / 16GB VRAM
Qwen 2.5 7B
Alibaba · 7B · Q4_K_M
Highly competitive 7B model with long context and strong multilingual support. A top value pick.
4.5GB / 16GB VRAM
Qwen 2.5 Coder 7B
Alibaba · 7B · Q4_K_M
Best-in-class 7B code model. Excellent at multi-language completion, bug fixing, and code explanation.
4.5GB / 16GB VRAM
Magicoder-S-DS-6.7B
UIUC · 6.7B dense · Q4_K_M
HumanEval 76.8% at just 6.7B — beats models 10× its size through OSS-Instruct training on real open-source code. The best option for code completion on 6GB VRAM with quality that defies the parameter count.
4GB / 16GB VRAM
WhisperX
OpenAI · 1.5B · FP16
The de facto standard for local speech-to-text. Word-level timestamps, speaker diarization, and 99 language support. Essential for transcription pipelines, meeting summarization, and building voice-first AI interfaces.
4GB / 16GB VRAM
SAM 2
Meta AI · 300M · FP16
Click anywhere on an image or video → instant object segmentation. Apache 2.0. Universal segmentation model used in medical imaging, autonomous driving datasets, and content creation. Zero training required for any object class.
4GB / 16GB VRAM
Depth Pro
Apple · 300M · FP16
Single image → metrically accurate 3D depth map in under 0.3 seconds. Free for research use. Powers 3D scene reconstruction, bokeh simulation, and AR/VR depth estimation pipelines without any calibration data.
4GB / 16GB VRAM
Gemma 3 4B
Google · 4B · Q4_K_M
Google's strong 4B model with multimodal capability and 128K context. One of the best small models.
3GB / 16GB VRAM
Phi 3.5 Mini
Microsoft · 3.8B · Q4_K_M
Microsoft's multilingual tiny model with enormous 128K context window. Exceptional for document tasks.
2.5GB / 16GB VRAM
Phi-4-mini
Microsoft · 3.8B dense · Q4_K_M
Microsoft's best edge release. Fits in 8GB of RAM and runs fast on an M1 MacBook Air in airplane mode. Exceptional at structured reasoning for its size — the top choice for on-device personal assistants and document Q&A.
2.3GB / 16GB VRAM
Qwen 2.5 Coder 3B
Alibaba · 3B · Q4_K_M
Compact code-specialized model. Strong at code completion and debugging on very limited hardware.
2.2GB / 16GB VRAM
gemma-3n
Google · 3B · Q4_K_M
A generative AI model optimized for everyday devices such as laptops and phones.
2.2GB / 16GB VRAM
Llama 3.2 3B
Meta AI · 3B · Q4_K_M
Best-in-class 3B model with 128K context. Outperforms many 7B models on common benchmarks.
2GB / 16GB VRAM
Qwen 2.5 3B
Alibaba · 3B · Q4_K_M
Compact and fast. Excellent multilingual and instruction-following performance at tiny VRAM cost.
2GB / 16GB VRAM
F5-TTS
SWivid · 300M · FP16
Flow-matching TTS with no duration modeling — produces the most natural prosody and sentence rhythm of any local voice model. MIT license. The preferred choice for giving local AI agents a human-sounding voice interface.
2GB / 16GB VRAM
ColPali
Vidore · 3B · Q4_K_M
Encodes PDF pages as images — sidesteps brittle PDF text parsers entirely, enabling accurate retrieval over scanned documents. Apache 2.0. A breakthrough for RAG on government forms, research papers, and historical archives with graphical content.
2GB / 16GB VRAM
Moondream 2
vikhyatk · 1.8B · Q4_K_M
Tiny but capable vision language model. Describe images, read text, answer visual questions locally.
1.5GB / 16GB VRAM
Qwen3.5-2B
Alibaba · 2B dense · Q4_K_M
Runs on iPhone in airplane mode. First sub-3B model with native multimodal support — a landmark for on-device AI. Perfect for privacy-preserving mobile apps that need real conversational capability without a server.
1.3GB / 16GB VRAM
DeepSeek R1 1.5B
DeepSeek · 1.5B · Q4_K_M
Smallest reasoning model you can run locally. Surprising chain-of-thought performance for its size.
1.1GB / 16GB VRAM
Qwen 2.5 1.5B
Alibaba · 1.5B · Q4_K_M
Excellent multilingual capabilities for its size. Particularly strong in Chinese and coding tasks.
1.1GB / 16GB VRAM
SmolLM2-1.7B
HuggingFace · 1.7B dense · Q4_K_M
Runs in-browser via WebGPU — no installation required. Best for Electron apps and Raspberry Pi deployments. HuggingFace's most downloaded edge model with an Apache 2.0 license and full community model ecosystem.
1.1GB / 16GB VRAM
Llama 3.2 1B
Meta AI · 1B · Q4_K_M
Compact Llama 3.2 with impressively long 128K context window. Perfect for edge deployment.
1GB / 16GB VRAM
Gemma 3 1B
Google · 1B · Q4_K_M
Google's smallest Gemma 3. Runs on virtually any GPU or even CPU — great for on-device applications.
0.9GB / 16GB VRAM
mxbai-embed-large
MixedBread AI · 335M · FP32
State-of-the-art embedding model for retrieval tasks. Ranks #1 on multiple MTEB categories.
0.7GB / 16GB VRAM
Florence-2
Microsoft · 770M · FP16
Captioning, object detection, grounding, OCR, and segmentation in one 770M model — MIT license. The Swiss Army knife of computer vision. Runs on almost any GPU and powers automated image tagging pipelines at scale.
0.5GB / 16GB VRAM
BGE-M3
BAAI · 568M · FP16
The default local RAG embedding model for 2026. 100 languages, 8K context, and three retrieval modes (dense, sparse, multi-vector) in one model. MIT license. Used in production by thousands of enterprise RAG pipelines.
0.5GB / 16GB VRAM
Qwen 2.5 0.5B
Alibaba · 0.5B · Q4_K_M
Smallest Qwen 2.5 — blazing fast on any hardware. Surprisingly capable for its size on simple tasks.
0.4GB / 16GB VRAM
Nomic Embed Text
Nomic AI · 137M · FP32
High-quality embedding model with 8K context. Outperforms OpenAI's text-embedding-ada-002 on the MTEB benchmark.
0.3GB / 16GB VRAM
LFM2-350M
Liquid AI · 350M · FP16
Non-Transformer architecture with linear context scaling — never degrades on long sequences. Achieves 40,400 tokens/sec on Apple Silicon. The fastest local model for structured extraction pipelines and IoT edge nodes.
0.3GB / 16GB VRAM
all-MiniLM-L6
Sentence Transformers · 22M · FP32
Ultra-compact sentence embedding model. Perfect for semantic search and RAG pipelines on any hardware.
0.1GB / 16GB VRAM
Kokoro-82M
hexgrad · 82M · CPU
CPU-only TTS that runs on Raspberry Pi. The best quality-per-watt ratio of any voice model — 82M parameters producing studio-quality speech synthesis. Apache 2.0 commercial license with no GPU requirement whatsoever.
0GB / 16GB VRAM
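Several entries above (BGE-M3, mxbai-embed-large, Nomic Embed Text, all-MiniLM-L6) are embedding models for retrieval. The dense retrieval mode mentioned on BGE-M3's card boils down to ranking documents by cosine similarity between vectors; the sketch below uses hand-made placeholder vectors rather than real model output:

```python
# Sketch of dense-mode retrieval: rank documents by cosine similarity
# between embedding vectors. The vectors here are placeholders; a real
# pipeline would embed the text with a model like BGE-M3 first.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

docs = {
    "gpu-guide": [0.9, 0.1, 0.0],
    "cooking":   [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

# Sort documents by similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # ['gpu-guide', 'cooking']
```

Sparse and multi-vector modes score documents differently, but the retrieve-by-score pattern is the same.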
How to Run These Models
1. Install Ollama — download it from ollama.com for Windows, macOS, or Linux. No CUDA setup required.
2. Pick a model — choose one from the list above and note its name.
3. Run it — paste the model's run command (e.g. `ollama run <model>`) into your terminal. Ollama downloads and launches the model automatically on your AMD Radeon RX 9070 XT.
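Once running, Ollama also serves a local REST API (default port 11434), which is handy for scripting the models above. A minimal sketch of calling its documented `/api/generate` endpoint follows; the model tag `llama3.2` is just an example, and the live request is gated behind a flag so the snippet runs even without a server:

```python
# Build a request for Ollama's local /api/generate endpoint.
# The model tag below is an example; use any tag you pulled via `ollama run`.
import json
import urllib.request

def build_generate_payload(model: str, prompt: str) -> dict:
    # "stream": False requests one complete JSON reply instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_generate_payload("llama3.2", "Why is the sky blue?")
print(json.dumps(payload))

RUN_LIVE = False  # set True once the Ollama server is running locally
if RUN_LIVE:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```

The same payload shape works from any language with an HTTP client, which is how editor plugins and chat UIs typically talk to a local Ollama instance.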