This model requires a high-VRAM environment. Ensure you have up-to-date CUDA drivers (NVIDIA) or Metal support (Apple Silicon) installed.
Recommended VRAM: 16GB
Origins & History
Codestral 22B by Mistral AI is a 22B-parameter model optimized for code tasks. Running it locally with a Q4_K_M quantization requires approximately 14GB of VRAM. Extending the context window toward the 32,768-token maximum enlarges the KV cache and therefore VRAM use, so leave headroom beyond the model weights. High-bandwidth memory is also strongly advised, because it largely determines generation speed.
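The VRAM figures above can be approximated with some simple arithmetic. The sketch below is a rough estimate, not a measurement: the 4.8 bits-per-weight figure for Q4_K_M and the architecture values (56 layers, 8 KV heads, head dimension 128, 16-bit cache) are assumptions that should be checked against the model's published configuration and your runtime's settings.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size: one key and one value vector
    per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


GB = 1e9

# Assumed values (not from the model card): ~4.8 bits/weight for Q4_K_M.
weights = weight_bytes(22e9, 4.8)

for ctx in (4096, 32768):
    # Assumed architecture: 56 layers, 8 KV heads, head_dim 128, fp16 cache.
    kv = kv_cache_bytes(56, 8, 128, ctx)
    total = weights + kv
    print(f"context {ctx:>6}: weights {weights / GB:.1f} GB "
          f"+ KV cache {kv / GB:.1f} GB = {total / GB:.1f} GB")
```

Under these assumptions the KV cache grows linearly with context length, which is why a long context can push total usage well past the size of the weights alone. Runtime overhead (compute buffers, the CUDA or Metal context) comes on top of this estimate.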
Pros
Full privacy and offline inference capabilities
Highly capable 22B-parameter model
Supports a 32,768-token context window
Cons
Requires at least 14GB of VRAM
Local inference speed is limited largely by memory bandwidth (GB/s)
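The bandwidth limit can be sketched as a back-of-the-envelope ceiling. Generating each token requires reading roughly all of the model weights from memory once, so tokens per second cannot exceed memory bandwidth divided by model size. The bandwidth figures below are illustrative placeholders rather than specifications of any particular GPU, and real throughput will be lower because of compute, KV cache reads, and runtime overhead.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when every token must stream
    the full set of weights from memory once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# ~14 GB is the Q4_K_M footprint quoted in this guide.
MODEL_SIZE_GB = 14.0

# Hypothetical bandwidth values chosen only to show the scaling.
for bandwidth in (200.0, 400.0, 1000.0):
    ceiling = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{bandwidth:>6.0f} GB/s -> at most ~{ceiling:.0f} tokens/s")
```

The same relation explains why a smaller quantization tends to generate faster on the same hardware: fewer bytes have to be read per token.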
Architect's Runtime Strategy
To run Codestral 22B at maximum tokens per second, we recommend LM Studio or Ollama with a GGUF quantization (Q4_K_M or Q6_K). If you have multiple GPUs, vLLM can shard the model across your combined VRAM (tensor parallelism) for higher throughput.
Common Questions
What hardware do I need to run Codestral 22B?
You will need a GPU with at least 16GB of VRAM to run the Q4_K_M quantized version smoothly with a moderate context window.
How do I install Codestral 22B locally?
The simplest method is to install Ollama and run 'ollama run codestral' from your command line. Alternatively, you can search for the model in LM Studio's interface.
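Once the model has been pulled, it can also be called from a script. The following is a minimal sketch that assumes an Ollama server is running on its default local address (http://localhost:11434) and uses its /api/generate endpoint with streaming disabled. The helper names build_generate_request and generate are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "codestral") -> urllib.request.Request:
    """Build a non-streaming generation request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "codestral", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.
    Requires a running Ollama server with the model already pulled."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, generate("Write a Python function that reverses a string.") returns the model's completion as a string. The first call may take a while because the model has to be loaded into VRAM.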