LLM (Large Language Model)
AI models like Llama, Mistral, and DeepSeek that generate human-like text: the software your GPU runs.
Definition
A Large Language Model (LLM) is a transformer-based neural network trained on massive text datasets to predict the next token in a sequence. Modern LLMs (GPT-4, Llama 3, DeepSeek R1) have billions of parameters and can perform reasoning, coding, translation, creative writing, and instruction-following.
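To make the "predict the next token" idea concrete, here is a minimal sketch using the Hugging Face transformers library with gpt2 as a small stand-in checkpoint (any local causal LM such as a Llama or Mistral variant works the same way); the printed continuation is illustrative, not guaranteed:

```python
# Minimal next-token prediction sketch; gpt2 is a small stand-in model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # swap in any local Llama/Mistral/DeepSeek checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Large language models work by predicting the next"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Greedy choice: the single most likely next token given the prompt.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_token_id))
```

Text generation is just this step repeated: the chosen token is appended to the input and the model is run again, until it emits a stop token or hits a length limit.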
Why It Matters
Foundational. Every other concept in this glossary exists to help you run LLMs more effectively on local hardware. Model size (measured in billions of parameters) directly correlates with VRAM requirements.
Real-World Example
Llama 3.3 70B is an LLM with 70 billion parameters. At Q4 quantization each parameter occupies a little over half a byte (4 bits plus quantization scaling metadata), so the weights alone come to roughly 40GB of VRAM. That fits on a dual RTX 4090 setup (2 × 24GB); a single RTX 5090 (32GB) can only run it with some layers offloaded to system RAM.
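A back-of-the-envelope version of that arithmetic (a sketch only: the 4.5 bits-per-parameter figure is an assumed average for a Q4-style quant including its scaling metadata, and real deployments add KV cache and runtime overhead on top):

```python
# Rough VRAM needed just for model weights; excludes KV cache and overhead.
def weight_vram_gb(params_billions: float, bits_per_param: float) -> float:
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal gigabytes

print(weight_vram_gb(70, 16))   # full FP16 weights: ~140 GB
print(weight_vram_gb(70, 4.5))  # Q4-style quant:    ~39 GB
```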
History of LLM (Large Language Model)
The transformer architecture was introduced in 'Attention Is All You Need' (Google, 2017). GPT-1 (OpenAI, 2018) demonstrated unsupervised text generation. GPT-3 (2020) shocked the world with its capabilities at 175B parameters. Meta's LLaMA (2023) democratized the field by releasing competitive open weights, spawning the entire local AI ecosystem.
Frequently Asked Questions
Are local LLMs as smart as GPT-4?
What happens if I turn off my wifi?
Why do some LLMs refuse my prompts locally?
Related Concepts
VRAM
The on-GPU memory that stores model weights. Determines which AI models you can run.
Quantization
Compressing model weights from 16-bit to 4-bit precision to massively reduce VRAM usage.
Tokens per Second (TPS)
The universal speed metric for LLMs: how many tokens (roughly word-sized chunks of text) your GPU generates per second.
Context Window
How much text the AI can 'remember' and process at once; tied directly to VRAM through the KV cache, as the sketch below illustrates.
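To show how the context window feeds back into VRAM, here is a rough KV-cache estimate assuming a Llama-3-style 70B architecture (80 layers, 8 grouped-query KV heads, head dimension 128, FP16 cache entries); the exact numbers vary by model and runtime:

```python
# Rough KV-cache size per context length; architecture values are assumptions
# for a Llama-3-style 70B model and differ between models and runtimes.
def kv_cache_gb(context_tokens: int,
                layers: int = 80,         # transformer blocks
                kv_heads: int = 8,        # grouped-query attention KV heads
                head_dim: int = 128,      # dimension per attention head
                bytes_per_value: int = 2  # FP16 cache entries
                ) -> float:
    # Factor of 2: both a key and a value vector are cached per layer per token.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token_bytes / 1e9

print(kv_cache_gb(8_192))    # ~2.7 GB on top of the weights
print(kv_cache_gb(131_072))  # ~43 GB at a full 128K context
```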