ROCm
AMD's open-source answer to CUDA, enabling AI inference on Radeon GPUs.
Definition
ROCm (Radeon Open Compute) is AMD's open-source software stack for GPU computing. It provides a CUDA compatibility layer (HIP), math libraries, and ML framework support (PyTorch-ROCm). ROCm enables AI inference and training on AMD RDNA 3/4 and CDNA GPUs.
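Because HIP mirrors the CUDA API, PyTorch's ROCm builds expose AMD GPUs through the familiar torch.cuda namespace. A minimal sketch of what that looks like in practice, assuming a ROCm build of PyTorch is installed:

```python
# Verify that a ROCm build of PyTorch can see an AMD GPU. On ROCm builds,
# HIP is surfaced through the torch.cuda namespace, so CUDA-style calls
# work unchanged.
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    # torch.version.hip is set on ROCm builds and None on CUDA builds.
    print("HIP runtime:", torch.version.hip)
    x = torch.randn(1024, 1024, device="cuda")  # "cuda" maps to the AMD GPU here
    print("Matmul result shape:", (x @ x).shape)
else:
    print("No ROCm-visible GPU found.")
```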
Why It Matters
Medium. ROCm compatibility has improved significantly in 2024-2025. Tools like llama.cpp, Ollama, and PyTorch now officially support ROCm, making the AMD RX 7900 XTX (24GB VRAM) a viable local AI option at a lower cost than the RTX 4090. However, debugging is still harder than on CUDA.
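Whether a card is "viable" for local AI mostly comes down to whether a model's weights fit in VRAM. A rough back-of-the-envelope sketch; the bytes-per-parameter figures and the 1.2x overhead factor (KV cache, activations) are illustrative assumptions, not fixed rules:

```python
# Rule of thumb: weight memory ~= parameter count x bytes per parameter,
# plus runtime overhead for the KV cache and activations.
def fits_in_vram(params_billion: float, bytes_per_param: float,
                 vram_gb: float = 24.0, overhead: float = 1.2) -> bool:
    weights_gb = params_billion * bytes_per_param
    return weights_gb * overhead <= vram_gb

# On a 24GB card such as the RX 7900 XTX:
print(fits_in_vram(8, 2.0))    # 8B params at FP16 (~16 GB) -> True
print(fits_in_vram(70, 0.55))  # 70B at ~4-bit quantization (~38 GB) -> False
print(fits_in_vram(70, 0.55, vram_gb=192))  # fits on a 192GB datacenter card -> True
```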
Real-World Example
To run Ollama on an AMD RX 7900 XTX with ROCm: install the ROCm 6.x drivers, then 'ollama run llama3.3' works identically to the NVIDIA experience, just with AMD's compute backend.
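The backend swap is invisible at the API level as well: Ollama's local REST interface behaves the same whether the GPU is AMD or NVIDIA. A minimal sketch against that API, assuming Ollama is serving on its default port (11434) and the llama3.3 model has already been pulled:

```python
# Query a local Ollama server; the request is identical on ROCm and CUDA backends.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.3",
    "prompt": "In one sentence, what is ROCm?",
    "stream": False,  # return a single JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```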
History of ROCm
AMD open-sourced ROCm in 2016 as its response to CUDA's ecosystem lock-in. Progress was slow until 2022-2023, when ML framework maintainers began shipping official ROCm builds. AMD's MI300X (launched late 2023) with 192GB of HBM3 made ROCm critical for datacenter-scale inference.
Frequently Asked Questions
Can an AMD 7900 XTX be used for AI?
Is ROCm supported on Windows?
Why do developers prefer CUDA over ROCm?
Related Concepts
VRAM
The on-GPU memory that stores model weights. Determines which AI models you can run.
CUDA
NVIDIA's proprietary parallel computing platform, and the secret behind why NVIDIA GPUs dominate AI.
Inference
Running a trained AI model to generate outputs; it's what your local GPU does when you chat with an LLM.