Local Models Library
The era of cloud dependency is over. Explore the high-performance, open-weight models that power the local AI revolution.
Showing 124 Models
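The VRAM figures in this list follow roughly from parameter count and quantization: weight bytes plus runtime overhead. A rough sketch of that estimate, assuming an effective ~4.5 bits per weight for Q4_K_M and a flat ~10% overhead for the KV cache and buffers (both numbers are approximations, and listed figures vary with context length and runtime):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.5,
                     overhead: float = 1.10) -> float:
    """Rough VRAM estimate: quantized weight size times an overhead factor.

    1B parameters at 8 bits/weight is 1 GB, so scale by bits/8.
    Both bits_per_weight and overhead are assumed ballpark values.
    """
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

print(estimate_vram_gb(14))   # 14B at Q4_K_M -> ~8.7 GB
print(estimate_vram_gb(70))   # 70B at Q4_K_M -> ~43.3 GB
```

Dense models track this estimate closely; MoE entries below still need memory for their *total* parameters, not just the active subset.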
Mistral AI
Devstral 2
14B | Q4_K_M
Second-generation Devstral for agentic coding, built for tool use, multi-file editing, and software engineering agents with vision support.
Code
9.5GB VRAM
Alibaba
Qwen 3.5 14B
14B | Q4_K_M
Integrates breakthroughs in multimodal learning, architectural efficiency, and reinforcement learning scale.
Chat · Top Pick
9GB VRAM
Alibaba
Qwen3-Coder-Next
80B (MoE) | Q4_K_M
An 80B MoE model with 3B active parameters, designed for coding agents and complex tool usage.
Code
24GB VRAM
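A model like the one above illustrates the key MoE trade-off: all 80B parameters must reside in memory, but each token only exercises the 3B active ones. A rough sketch of the arithmetic, assuming a Q4-class ~4.5 bits per weight (an approximation, not the exact quantization math):

```python
def moe_footprint(total_b: float, active_b: float, bits_per_weight: float = 4.5):
    """Memory is paid for *all* experts; compute only for the active ones."""
    weight_gb = total_b * bits_per_weight / 8        # what must fit in (V)RAM
    flops_per_token = 2 * active_b * 1e9             # ~2 FLOPs per active weight per token
    return weight_gb, flops_per_token

gb, flops = moe_footprint(total_b=80, active_b=3)
print(f"{gb:.0f} GB of weights, {flops / 1e9:.0f} GFLOPs per token")
```

This is why a sparse 80B model can feel as fast as a dense 3B one while needing far more memory than either.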
Alibaba
Qwen3 Next
80B (MoE) | Q4_K_M
A high-sparsity Mixture-of-Experts 80B model utilizing a hybrid attention architecture.
Chat · Top Pick
24GB VRAM
MiniMax
minimax-m2
230B (MoE) | Q4_K_M
A 230B MoE model built specifically for coding and agentic workflows.
Code
85GB VRAM
Alibaba
Qwen3-VL
14B | Q4_K_M
A vision-language model featuring upgrades to visual perception, spatial reasoning, and image understanding.
Vision
9.5GB VRAM
Alibaba
Qwen3-72B
72B | Q4_K_M
The latest version of the Qwen3 family, including both 'thinking' and 'non-thinking' variants.
Chat · Top Pick
43GB VRAM
Google
gemma-3n
3B | Q4_K_M
A generative AI model optimized for use in everyday devices like laptops and phones.
Chat · Fast
2.2GB VRAM
Zhipu AI
GLM-5
744B (MoE) | Q4_K_M
The #1 model on the Artificial Analysis Intelligence Index with a score of 50. Trained entirely on Huawei Ascend 910B chips, with zero US hardware dependency. Achieves 77.8% on SWE-bench and leads on multi-turn long-context tasks.
Chat · Top Pick
160GB VRAM
MiniMax
MiniMax M2.5
230B (MoE) | Q4_K_M
The most-used open-weight model on OpenRouter. 256K native context with interleaved thinking. Runs at 60+ tokens/sec on Apple M5 Max 128GB unified memory, making it the top Mac inference pick.
Chat · Top Pick
128GB VRAM
Alibaba
Qwen3.5-27B
27B dense | Q4_K_M
Community top pick for a single 24GB GPU. Supports 201 languages natively with vision-language capability built in. IFBench 76.5 and AIME 91.3; outperforms GPT-5.2 on instruction following.
Chat · Top Pick
17GB VRAM
DeepSeek
DeepSeek V3.2
685B (MoE) | Q4_K_M
Surpasses GPT-5 on AIME 2025 and HMMT competition math. LiveCodeBench 90%. The dominant open-weight model for advanced math and competitive programming, with MIT-licensed weights.
Reasoning
160GB VRAM
Meta AI
Llama 4 Scout
109B (MoE) | Q4_K_M
Holds the record for the largest context window of any open-weight model at 10 million tokens, enough for an entire codebase. The MoE architecture means only 17B parameters are active per forward pass.
Chat · Top Pick
48GB VRAM
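A context window that large is dominated by the KV cache rather than the weights. A back-of-the-envelope sketch, using a hypothetical Llama-style configuration (48 layers, 8 KV heads, head dim 128, FP16) rather than the model's actual published one:

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elem: int = 2) -> float:
    """KV cache size = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1e9

# Hypothetical config; FP16 cache at the full 10M-token window:
print(kv_cache_gb(10_000_000, layers=48, kv_heads=8, head_dim=128))
```

Under these assumptions the cache alone would run into the terabytes at the full window, which is why very long contexts in practice rely on cache quantization, eviction, or offloading.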
Moonshot AI
Kimi-K2.5
1T (MoE) | Q4_K_M
Powers Cursor Composer 2, the industry benchmark for agentic coding. #2 on the Artificial Analysis Intelligence Index. 1T total parameters with 32B active, delivering near-frontier reasoning at extreme scale.
Reasoning
300GB VRAM
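For sparse models like this, local decode speed is roughly memory-bandwidth-bound: each generated token streams the active weights once. A sketch of that upper bound; the ~4.5 bits/weight and the 800 GB/s bandwidth figure are illustrative assumptions, not measured numbers:

```python
def decode_tokens_per_sec(active_params_b: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Bandwidth-bound ceiling: one full pass over the active weights per token."""
    bytes_per_token_gb = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / bytes_per_token_gb

# 32B active parameters at ~4.5 bits, on a hypothetical 800 GB/s machine:
print(decode_tokens_per_sec(32, 4.5, 800))
```

The estimate ignores the KV cache and kernel overheads, so real throughput lands below it, but it explains why a 1T MoE with 32B active can decode at dense-32B speeds.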
OpenAI
GPT-OSS 120B
117B (MoE) | Q4_K_M
OpenAI's first open-weight release since GPT-2. Matches o4-mini on AIME and MMLU despite being open. Proves that open-weight frontier quality is achievable, a landmark for the local AI community.
Reasoning
80GB VRAM
DeepSeek
DeepSeek R2
236B (MoE) | Q4_K_M
Hybrid thinking model with toggleable chain-of-thought. MATH benchmark 96.6 and AIME 92.1. The most capable local reasoning model for a dual 40GB GPU setup; beats all previous open reasoning models.
Reasoning
64GB VRAM
Shanghai AI Lab
InternLM3-72B
72B dense | Q4_K_M
Best bilingual Chinese/English 72B model available. MATH benchmark 90.1 and CEval 92.1. A 262K context window makes it the top choice for legal, academic, and enterprise document processing across languages.
Chat · Top Pick
44GB VRAM
Alibaba
Qwen3.5-397B-A17B
397B (MoE) | Q4_K_M
Beats GPT-5.2 on IFBench (76.5 vs 75.4) and scores 72.2 on BFCL-V4 tool use. A massive MoE that activates only 17B parameters per token for high efficiency.
Chat · Top Pick
128GB VRAM
Zhipu AI
GLM-4.7-Flash
355B (MoE) | Q4_K_M
Winner of the 24GB VRAM agentic coding challenge. A 355B MoE model that fits a single consumer GPU by activating only 30B parameters. LiveCodeBench 89%; the most capable coding model per dollar of VRAM.
Code
24GB VRAM
Agentica
DeepCoder-V2
236B (MoE) | Q4_K_M
SWE-bench-lite 61.2%, one of the highest scores for an open model on real GitHub issue resolution. Built-in native diff generation and test writing. Ideal for fully automated PR creation and code review pipelines.
Code
64GB VRAM
Alibaba
Qwen3.5-2B
2B dense | Q4_K_M
Runs on iPhone in airplane mode. The first sub-3B model with native multimodal support, a landmark for on-device AI. Perfect for privacy-preserving mobile apps that need real conversational capability without a server.
Chat · Fast
1.3GB VRAM
Liquid AI
LFM2-350M
350M | FP16
Non-Transformer architecture with linear context scaling; never degrades on long sequences. Achieves 40,400 tokens/sec on Apple Silicon. The fastest local model for structured extraction pipelines and IoT edge nodes.
Chat · Fast
0.3GB VRAM
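The "linear context scaling" claim above contrasts with standard self-attention, whose score matrix grows quadratically with sequence length. A toy comparison of the two memory behaviors (not a model of LFM2's actual architecture; the 256-element state size is an arbitrary illustration):

```python
def attn_score_elements(seq_len: int) -> int:
    """Standard self-attention materializes an L x L score matrix per head."""
    return seq_len * seq_len

def linear_state_elements(seq_len: int, state_dim: int = 256) -> int:
    """A recurrent / linear-attention layer carries a fixed-size state instead."""
    return state_dim  # independent of seq_len

for length in (1_000, 100_000):
    print(length, attn_score_elements(length), linear_state_elements(length))
```

At 100K tokens the quadratic term is 10^10 elements per head while the fixed state is unchanged, which is the whole argument for these architectures on long sequences.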
Black Forest Labs
FLUX.2 Dev
32B diffusion | BF16
The best photorealism and text-in-image accuracy of any local model in 2026. Multi-reference image conditioning. Handles 1000-character prompts with full semantic fidelity; the definitive standard for professional AI photography.
Image Gen · Top Pick
24GB VRAM
Alibaba
Wan 2.2
14B video | BF16
Leading open video model for 2026. 720P with full camera motion controls and best-in-class semantic consistency across frames. The top choice for filmmakers and content creators running a single 24GB GPU workstation.
Video Gen · Top Pick
24GB VRAM
Tencent
HunyuanVideo 1.5
13B video | BF16
Cinematic-quality video generation accessible at just 14GB VRAM with model offloading. The most democratized cinematic video tool available. Produces Hollywood-grade motion blur, depth of field, and lighting consistency.
Video Gen · Top Pick
14GB VRAM
Mistral AI
Voxtral TTS
7B | Q4_K_M
Matches or beats ElevenLabs Flash on prosody naturalness. 3-second voice cloning from a reference clip with no fine-tuning required. The top locally run TTS for podcast production, audiobooks, and voice-over automation.
Audio / TTS · Top Pick
8GB VRAM
ACE-Step
ACE-Step 1.5
2B | BF16
Best local music model for 2026. Generates up to 10 minutes of audio with precise genre, instrument, tempo, and lyrics control. Apache 2.0 commercial license. The definitive tool for indie game composers and content creators.
Audio / TTS · Top Pick
8GB VRAM
Alibaba
Qwen3-Embedding-8B
8B | Q4_K_M
Top-ranked self-hosted embedding model on MTEB English; outperforms all sub-72B models. 32K context window for ultra-long document encoding. The upgrade path from BGE-M3 for teams needing maximum retrieval precision.
Embedding
5GB VRAM
Google
Gemma 3 1B
1B | Q4_K_M
Google's smallest Gemma 3. Runs on virtually any GPU or even CPU; great for on-device applications.
Chat · Fast
0.9GB VRAM
DeepSeek
DeepSeek R1 1.5B
1.5B | Q4_K_M
Smallest reasoning model you can run locally. Surprising chain-of-thought performance for its size.
Reasoning
1.1GB VRAM
Google
Gemma 3 4B
4B | Q4_K_M
Google's strong 4B model with multimodal capability and 128K context. One of the best small models.
Chat · Top Pick
3GB VRAM
DeepSeek
DeepSeek R1 7B
7B | Q4_K_M
Distilled reasoning power in a 7B package. Excels at math, logic, and step-by-step problem solving.
Reasoning
4.5GB VRAM
DeepSeek
DeepSeek R1 8B
8B | Q4_K_M
Llama-3 distilled reasoning model. Outperforms GPT-4o on several math benchmarks at 8B scale.
Reasoning
5GB VRAM
Google
Gemma 3 12B
12B | Q4_K_M
Google's mid-tier Gemma 3. Multimodal capable, 128K context, strong multilingual reasoning.
Chat · Top Pick
8GB VRAM
DeepSeek
DeepSeek R1 14B
14B | Q4_K_M
Qwen-2.5 distilled reasoning model. Strong chain-of-thought and math at an accessible VRAM cost.
Reasoning
9GB VRAM
Mistral AI
Mistral Small 22B
22B | Q4_K_M
Highly capable 22B model from Mistral AI. Excellent instruction following for enterprise chat applications.
Chat · Top Pick
14GB VRAM
Google
Gemma 3 27B
27B | Q4_K_M
Google's largest open Gemma model. Competes with 70B-class models from previous generations.
Chat · Top Pick
17GB VRAM
DeepSeek
DeepSeek R1 32B
32B | Q4_K_M
The sweet spot for local reasoning. Competitive with o1-mini on math and coding tasks at 32B scale.
Reasoning
20GB VRAM
Alibaba
QwQ 32B
32B | Q4_K_M
Alibaba's advanced reasoning model. Extended thinking and reflection enable GPT-o1-level problem solving.
Reasoning
20GB VRAM
DeepSeek
DeepSeek R1 70B
70B | Q4_K_M
Full-scale distilled reasoning model. Matches OpenAI o1 on math olympiad and competitive programming tasks.
Reasoning
35GB VRAM
Zhipu AI
GLM-4.7
9B | Q4_K_M
An open-source model specializing in coding and tool calling, built on a new base model.
Chat · Code
6.5GB VRAM
Google
FunctionGemma 9B
9B | Q4_K_M
A lightweight, open model built as a foundation for creating specialized function-calling models.
Code
6.5GB VRAM
NVIDIA
Nemotron 3 22B
22B | Q4_K_M
A general-purpose reasoning and chat model trained from scratch by NVIDIA, featuring a low-latency MoE architecture.
Reasoning · Top Pick
14.5GB VRAM
Zhipu AI
GLM-4.6V-Flash
9B | Q4_K_M
A 9B vision-language model optimized for local deployment and low-latency applications.
Vision
7GB VRAM
Essential AI
Rnj-1
8B | Q4_K_M
A family of open-weight, dense models trained from scratch by Essential AI.
Chat · Fast
5.5GB VRAM
Mistral AI
Ministral 3 8B
8B | Q4_K_M
A highly cost-effective, high-performing 8B instruction-tuned model.
Chat · Top Pick
5.5GB VRAM
NVIDIA
Nemotron 3 Super
120B | Q4_K_M
A large-scale 120B-parameter model designed for high-performance reasoning.
Reasoning
68GB VRAM
Liquid AI
LFM2-24B-A2B
24B | Q4_K_M
A large hybrid model family designed specifically for efficient on-device deployment.
Reasoning · Fast
15GB VRAM
Allen AI
Olmo 3
10B | Q4_K_M
A family of open language models designed to enable scientific research into language modeling.
Reasoning
6.5GB VRAM
Allen AI
olmOCR 2
7B | Q4_K_M
A specialized vision-language model (VLM) for optical character recognition tasks.
Vision
5.5GB VRAM
IBM
Granite 4.0
8B | Q4_K_M
Lightweight open models supporting multilingual tasks, RAG, coding, and tool use.
Code
5.5GB VRAM
ByteDance
seed-oss
16B | Q4_K_M
An advanced reasoning model with flexible 'thinking budget' control and self-reflection capabilities.
Reasoning
10.5GB VRAM
OpenAI
gpt-oss
16B | Q4_K_M
OpenAI's open-weight LLM, supporting configurable reasoning effort (low, medium, high).
Reasoning
11GB VRAM
Mistral AI
Devstral
8B | Q4_K_M
A coding model from Mistral AI designed for codebase exploration and engineering agents.
Code
6GB VRAM
Mistral AI
Magistral
32B | Q4_K_M
An open-weight reasoning model capable of long chains of thought before answering.
Reasoning
20GB VRAM
Microsoft
phi-4-reasoning
14B | Q4_K_M
A lightweight open model trained on high-quality, reasoning-dense synthetic data.
Reasoning
9GB VRAM
Microsoft
Phi-4-mini
3.8B dense | Q4_K_M
Microsoft's best edge release. Fits in 8GB RAM and runs fast on an M1 MacBook Air in airplane mode. Exceptional at structured reasoning for its size; the top choice for on-device personal assistants and document Q&A.
Chat · Fast
2.3GB VRAM
NVIDIA
NVIDIA Nemotron Nano 8B
8B dense | Q4_K_M
Math Index 91.0, the highest math score at the 8GB VRAM tier. NVIDIA's distilled Llama-3.1 with proprietary reward model training. Ideal for STEM tutoring and quantitative analysis on a single mid-range GPU.
Reasoning
5GB VRAM
Nomic AI
Nomic Embed Text
137M | FP32
High-quality embedding model with 8K context. Outperforms OpenAI text-embedding-ada-002 on the MTEB benchmark.
Embedding
0.3GB VRAM
MixedBread AI
mxbai-embed-large
335M | FP32
State-of-the-art embedding model for retrieval tasks. Ranks #1 in multiple MTEB categories.
Embedding
0.7GB VRAM
Alibaba
Qwen 2.5 0.5B
0.5B | Q4_K_M
Smallest Qwen 2.5; blazing fast on any hardware. Surprisingly capable for its size on simple tasks.
Chat · Fast
0.4GB VRAM
Meta AI
Llama 3.2 1B
1B | Q4_K_M
Compact Llama 3.2 with an impressively long 128K context window. Perfect for edge deployment.
Chat · Fast
1GB VRAM
Alibaba
Qwen 2.5 1.5B
1.5B | Q4_K_M
Excellent multilingual capabilities for its size. Particularly strong in Chinese and coding tasks.
Chat · Fast
1.1GB VRAM
vikhyatk
Moondream 2
1.8B | Q4_K_M
Tiny but capable vision-language model. Describe images, read text, and answer visual questions locally.
Vision
1.5GB VRAM
Meta AI
Llama 3.2 3B
3B | Q4_K_M
Best-in-class 3B model with 128K context. Outperforms many 7B models on common benchmarks.
Chat · Top Pick
2GB VRAM
Alibaba
Qwen 2.5 3B
3B | Q4_K_M
Compact and fast. Excellent multilingual and instruction-following performance at a tiny VRAM cost.
Chat · Fast
2GB VRAM
Microsoft
Phi 3.5 Mini
3.8B | Q4_K_M
Microsoft's multilingual tiny model with an enormous 128K context window. Exceptional for document tasks.
Chat · Fast
2.5GB VRAM
Alibaba
Qwen 2.5 Coder 3B
3B | Q4_K_M
Compact code-specialized model. Strong at code completion and debugging on very limited hardware.
Code
2.2GB VRAM
Alibaba
Qwen 2.5 7B
7B | Q4_K_M
Highly competitive 7B model with long context and strong multilingual support. A top value pick.
Chat · Top Pick
4.5GB VRAM
Alibaba
Qwen 2.5 Coder 7B
7B | Q4_K_M
Best-in-class 7B code model. Excellent at multi-language completion, bug fixing, and code explanation.
Code
4.5GB VRAM
Google
CodeGemma 7B
7B | Q4_K_M
Google's code-tuned Gemma variant. Excellent at code completion tasks inside IDEs.
Code
5GB VRAM
Meta AI
Llama 3.1 8B
8B | Q4_K_M
Meta's powerhouse 8B model with 128K context. An excellent all-rounder for chat, code, and reasoning.
Chat · Top Pick
5GB VRAM
Meta AI
Llama 3.2 Vision 11B
11B | Q4_K_M
State-of-the-art open-weight vision model. Analyze charts, read documents, and describe complex scenes.
Vision
8GB VRAM
Mistral AI / NVIDIA
Mistral NeMo 12B
12B | Q4_K_M
A 128K context window in a 12B model. A Mistral AI and NVIDIA collaboration; excellent for long-document tasks.
Chat · Top Pick
8GB VRAM
Alibaba
Qwen 2.5 14B
14B | Q4_K_M
Exceptional 14B all-rounder. Competitive with many 30B+ models on reasoning and coding benchmarks.
Chat · Top Pick
9GB VRAM
Alibaba
Qwen 2.5 Coder 14B
14B | Q4_K_M
Top-tier code model for 12GB+ GPUs. Strong at agentic coding, multi-file edits, and complex refactors.
Code
9GB VRAM
Microsoft
Phi 4 14B
14B | Q4_K_M
Microsoft's flagship small model. Trained on synthetic data with exceptional reasoning and STEM performance.
Reasoning
9GB VRAM
BigCode
StarCoder2 15B
15B | Q4_K_M
Specialized code completion model trained on 600+ programming languages. Top-tier for in-IDE completions.
Code
10GB VRAM
Mistral AI
Codestral 22B
22B | Q4_K_M
Mistral's dedicated code model. Industry-leading performance on fill-in-the-middle (FIM) and complex code generation.
Code
14GB VRAM
Alibaba
Qwen 2.5 32B
32B | Q4_K_M
Near-frontier performance from a local 32B model. Exceptional multilingual reasoning and instruction following.
Chat · Top Pick
20GB VRAM
Alibaba
Qwen 2.5 Coder 32B
32B | Q4_K_M
The best open-weight code model available. Matches GPT-4o on coding benchmarks and runs locally in 20GB VRAM.
Code
20GB VRAM
Meta AI
Llama 3.3 70B
70B | Q4_K_M
Meta's flagship 70B. Competitive with GPT-4o on many tasks. Requires a dual-GPU setup or Mac Studio for local use.
Chat · Top Pick
40GB VRAM
Meta AI
Llama 3.1 70B
70B | Q4_K_M
The original instruction-tuned 70B Llama 3.1. Multilingual, long-context, and highly capable.
Chat · Top Pick
40GB VRAM
Alibaba
Qwen 2.5 72B
72B | Q4_K_M
Qwen's largest open model. Exceptional multilingual and coding capabilities; among the best open-weight 70B+ models.
Chat · Top Pick
43GB VRAM
Meta AI
Llama 3.2 Vision 90B
90B | Q4_K_M
Meta's best vision model. Near-frontier document understanding and complex scene analysis. Needs multi-GPU or a Mac Studio Ultra.
Vision
55GB VRAM
Mistral AI
Mixtral 8x22B
141B (MoE) | Q4_K_M
The gold-standard MoE for performance-per-VRAM on a 48GB system. Battle-tested JSON mode and function calling. The preferred baseline for enterprise RAG pipelines needing 65K context without multi-GPU complexity.
Chat · Top Pick
48GB VRAM
Alibaba
Qwen 2.5 Coder 32B
32B dense | Q4_K_M
HumanEval 92%, the best single-GPU coding model for RTX 5090 owners. Excels at multi-file edits, refactoring, and agentic tool-use pipelines. The definitive open coding model at the 20GB VRAM tier.
Code
20GB VRAM
Infly
OpenCoder-8B
8B dense | Q4_K_M
The 'OLMo of coding': a fully transparent training pipeline with HumanEval 83.5%. Every component is open: weights, data, and methodology. The most trustworthy small coding model for compliance-sensitive teams.
Code
5GB VRAM
UIUC
MagicCoder-S-DS-6.7B
6.7B dense | Q4_K_M
HumanEval 76.8% at just 6.7B; beats models 10× its size through OSS-Instruct training on real open-source code. The best option for code completion on 6GB VRAM, with quality that defies the parameter count.
Code
4GB VRAM
HuggingFace
SmolLM2-1.7B
1.7B dense | Q4_K_M
Runs in-browser via WebGPU with no installation required. Best for Electron apps and Raspberry Pi deployments. HuggingFace's most-downloaded edge model, with an Apache 2.0 license and a full community model ecosystem.
Chat · Fast
1.1GB VRAM
Black Forest Labs
FLUX.1 Schnell
12B diffusion | BF16
4-step generation with an Apache 2.0 commercial license. The fastest high-quality local image model; produces studio-grade output in under 3 seconds on a 24GB GPU. The go-to for commercial product photography pipelines.
Image Gen · Fast
12GB VRAM
ByteDance
SDXL-Lightning
3.5B diffusion | FP16
50-step SDXL quality in just 4 steps via adversarial diffusion distillation. Fully Apache 2.0 commercial. ComfyUI-native workflow. The fastest path from creative brief to production asset on 8GB VRAM hardware.
Image Gen · Fast
8GB VRAM
Stability AI
Stable Diffusion 3.5 Large
8B diffusion | BF16
The best text-in-image rendering and the largest open LoRA fine-tune ecosystem, with 50,000+ community models. The creative industry's preferred foundation for style transfer, brand consistency, and character sheet generation.
Image Gen · Top Pick
12GB VRAM
THUDM
CogVideoX-5B
5B video | BF16
6-second 720P clips on a single 16GB GPU; the easiest entry point into local video generation. Apache 2.0 commercial license with active ComfyUI integration. Best for social media automation and rapid prototyping.
Video Gen · Fast
16GB VRAM
hexgrad
Kokoro-82M
82M | CPU
CPU-only TTS that runs on a Raspberry Pi. The best quality-per-watt ratio of any voice model: 82M parameters producing studio-quality speech synthesis. Apache 2.0 commercial license with no GPU requirement whatsoever.
Audio / TTS · Fast
0GB VRAM
SWivid
F5-TTS
300M | FP16
Flow-matching TTS with no duration modeling; produces the most natural prosody and sentence rhythm of any local voice model. MIT license. The preferred choice for giving local AI agents a human-sounding voice interface.
Audio / TTS · Top Pick
2GB VRAM
OpenAI
WhisperX
1.5B | FP16
The de facto standard for local speech-to-text. Word-level timestamps, speaker diarization, and 99-language support. Essential for transcription pipelines, meeting summarization, and building voice-first AI interfaces.
Audio / TTS · Top Pick
4GB VRAM
OpenBMB
MiniCPM-V 2.6
8B | Q4_K_M
Video, multi-image, and text understanding at 8B parameters. The best vision model for 8GB VRAM setups; handles 40-frame video clips, multi-image comparison, and document understanding in a single context window.
Vision
5.5GB VRAM
Microsoft
Florence-2
770M | FP16
Captioning, object detection, grounding, OCR, and segmentation in one MIT-licensed 770M model. The Swiss Army knife of computer vision. Runs on almost any GPU and powers automated image tagging pipelines at scale.
Vision
0.5GB VRAM
Meta AI
SAM 2
300M | FP16
Click anywhere on an image or video for instant object segmentation. Apache 2.0. A universal segmentation model used in medical imaging, autonomous driving datasets, and content creation. Zero training required for any object class.
Vision
4GB VRAM
BAAI
BGE-M3
568M | FP16
The default local RAG embedding model for 2026. 100 languages, 8K context, and three retrieval modes (dense, sparse, multi-vector) in one model. MIT license. Used in production by thousands of enterprise RAG pipelines.
Embedding
0.5GB VRAM
Mistral AI
Mistral 7B
7B | Q4_K_M
The model that proved smaller can beat bigger. Mistral 7B outperforms many 13B models with blazing-fast speed.
Chat · Top Pick
4.5GB VRAM
Haotian Liu et al.
LLaVA 7B
7B | Q4_K_M
The classic vision-language model. Describe images and answer visual questions locally. Proven and reliable.
Vision
5.5GB VRAM
Mistral AI
Mixtral 8x7B
46.7B (MoE) | Q4_K_M
Mixture-of-Experts: 46.7B total params but only 12.9B active per token. Near-70B quality at lower inference cost.
Chat · Top Pick
24GB VRAM
Haotian Liu et al.
LLaVA 34B
34B | Q4_K_M
High-quality vision model at 34B scale. Significantly better image analysis than the 7B version.
Vision
22GB VRAM
Sentence Transformers
all-MiniLM-L6
22M | FP32
Ultra-compact sentence embedding model. Perfect for semantic search and RAG pipelines on any hardware.
Embedding
0.1GB VRAM
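All of the embedding models in this list are used the same way: embed the query and the documents, then rank by cosine similarity. A dependency-free sketch with toy 3-dimensional vectors; a real pipeline would instead get 384-dimensional vectors from a model such as all-MiniLM-L6:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings"; in practice these come from the embedding model.
docs = {"cat care": [0.9, 0.1, 0.0], "gpu pricing": [0.0, 0.2, 0.9]}
query = [0.8, 0.2, 0.1]
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # -> cat care
```

Semantic search and RAG retrieval are exactly this ranking step, run over precomputed document vectors in a vector index.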
Meta AI
Llama 4 Maverick
400B (MoE) | Q4_K_M
Outperforms Scout on reasoning, instruction following, and mathematical capabilities while offering 1M context length. The flagship enterprise model from Meta.
Chat
80GB VRAM
01.AI
Yi-Lightning-2
200B (MoE) | Q4_K_M
Topped the LMSYS Arena leaderboard for prolonged periods. Exceptional Chinese/English bilingual fluency and high accuracy on long-document summarization tasks.
Chat
128GB VRAM
Google
Gemma 3 27B
27B dense | Q4_K_M
Outstanding performance at the 24GB VRAM tier, with a native Google vision-language encoder integrated directly into the weights. MMLU-Pro score of 67.5.
Chat
17GB VRAM
OpenAI
gpt-oss-safeguard
7B | Q4_K_M
An open safety model built on the gpt-oss foundation to help classify and filter text content.
Chat
5.5GB VRAM
Baidu
Ernie-4.5
32B (MoE) | Q4_K_M
A medium-sized Mixture-of-Experts foundation model from Baidu.
Chat
14GB VRAM
Allen AI
OLMo 2 32B
32B dense | Q4_K_M
The most transparent large language model ever released: weights, training data, code, evaluation harnesses, and training logs are all fully public. Essential for research reproducibility and academic citation.
Chat
20GB VRAM
Mistral AI
Mistral Large 3 (123B)
123B dense | Q4_K_M
A top multilingual and European benchmark performer. Unrivaled for local RAG implementations that need to parse documents natively across 23+ global languages.
Chat
80GB VRAM
TII
Falcon 3 (180B)
180B dense | Q4_K_M
The UAE TII flagship dense model. Currently one of the only truly permissive, royalty-free commercial licenses available at the 150B+ parameter scale.
Chat
160GB VRAM
Cohere
Command A
111B (MoE) | Q4_K_M
The industry standard for strict JSON output and multi-step agentic workflows. By far the most reliable open model when interacting directly with APIs and backend schemas.
Chat
80GB VRAM
CohereForAI
Aya Expanse 32B
32B dense | Q4_K_M
Native support for 23 languages. The single best open multilingual foundation model available for researchers building localized or translation-heavy pipelines.
Chat
20GB VRAM
HuggingFace
Zephyr 141B-A39B
141B (MoE) | Q4_K_M
A HuggingFace DPO alignment fine-tune built on top of Mixtral 8x22B. Renowned for its warm, helpful tone and high consumer chat preference win rates.
Chat
48GB VRAM
PixArt-alpha
PixArt-ฮฃ
600M diffusion | FP16
A tiny model with native 4K output capability; the best image-per-VRAM-dollar ratio available. Apache 2.0. Runs on a GTX 1070 8GB. The only sub-1GB model that produces print-resolution imagery with coherent composition.
Image Gen
6GB VRAM
HPC-AI Tech
Open-Sora
4B video | BF16
A full open-source Sora replication: weights, training code, and the full data pipeline are all public. The most transparent video generation model. Essential for researchers studying text-to-video architectures and training dynamics.
Video Gen
16GB VRAM
Tencent
MuseTalk
500M | FP16
Photorealistic lip sync at 30+ FPS in real time. The best model for live avatar streaming and talking-head video creation. MIT license with an active Discord community. Integrates natively with OBS and streaming tools.
Audio / TTS
6GB VRAM
Vidore
ColPali
3B | Q4_K_M
Encodes PDF pages as images, bypassing broken PDF parsers for perfect scanned-document retrieval. Apache 2.0. The breakthrough for RAG on government forms, research papers, and historical archives with graphical content.
Vision
2GB VRAM
Apple
Depth Pro
300M | FP16
Turns a single image into a metrically accurate 3D depth map in under 0.3 seconds. Free for research use. Powers 3D scene reconstruction, bokeh simulation, and AR/VR depth estimation pipelines without any calibration data.
Vision
4GB VRAM
OpenTalker
SadTalker
300M | FP16
Turns one photo into a talking-head video with natural head movement, blinks, and mouth articulation. MIT license. No video sample required, just a single still image and an audio clip. The most accessible local avatar creation tool.
Audio / TTS
6GB VRAM
Meta AI
MusicGen Large
4B | BF16
Meta's flagship 4B music model with melody conditioning from reference audio clips. Best for cinematic scoring and mood-driven generation. CC-BY-NC-4.0. The industry standard for AI-assisted film and game soundtrack production.
Audio / TTS
12GB VRAM