Inference
The process by which a trained AI model generates outputs from new inputs.
Inference is what happens when you send a message to an AI and it generates a reply. Training is the expensive, time-consuming process of building the model; inference is using that trained model to produce outputs. It happens every time you interact with an AI.
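For language models, inference is typically autoregressive: the model produces one token at a time, feeding each token back in to predict the next. The sketch below illustrates that loop with a hypothetical toy "model" (a lookup table standing in for a real neural network), purely to show the shape of the process.

```python
def toy_model(context):
    # Hypothetical stand-in for a trained model: maps the last token
    # seen so far to a predicted next token. A real model would run a
    # full forward pass of a neural network here.
    transitions = {"<start>": "Hello", "Hello": "world", "world": "<end>"}
    return transitions.get(context[-1], "<end>")

def generate(prompt_tokens, max_tokens=10):
    # Autoregressive loop: each new token requires one model call,
    # which is why longer outputs cost more time and compute.
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = toy_model(tokens)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["<start>"]))  # → ['<start>', 'Hello', 'world']
```

The key point is structural: output is built step by step, with the model invoked once per generated token, so inference cost grows with response length.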
Training is like years of studying and practice to become a chef. Inference is actually cooking a meal. The learning is done — now the skill is being applied. Every dish the chef makes is inference; culinary school was training.
Inference is not free or instant. Running large models requires significant compute: GPUs, memory, and energy. This is why AI APIs charge per token and why response speed varies. Inference cost is one of the biggest challenges in deploying AI at scale.
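Per-token pricing makes inference cost easy to estimate. The sketch below uses hypothetical prices (real rates vary widely by provider and model) to show how input and output token counts translate into a bill:

```python
# Assumed illustrative prices, not any provider's actual rates:
PRICE_PER_INPUT_TOKEN = 0.000003   # e.g. $3 per million input tokens
PRICE_PER_OUTPUT_TOKEN = 0.000015  # e.g. $15 per million output tokens

def request_cost(input_tokens, output_tokens):
    # Cost of one API call: input and output tokens are usually
    # billed at different rates, with output costing more.
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# 1,000 requests, each with a 500-token prompt and a 200-token reply:
total = 1000 * request_cost(500, 200)
print(f"${total:.2f}")  # → $4.50
```

Note that output tokens dominate here despite being fewer, which is why long generated responses drive costs at scale.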