Inference
The process by which a trained AI model generates outputs from new inputs.
Inference is what happens when you send a message to an AI and it generates a reply. Training is the expensive, time-consuming process of building the model; inference is using that trained model to produce outputs. It happens every time you interact with an AI.
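For language models, inference is typically autoregressive: the model produces one token at a time, feeding each token back in to predict the next. The sketch below illustrates that loop with a hypothetical toy "model" (a lookup table standing in for a real neural network), purely to show the shape of the process.

```python
def toy_model(context):
    # Hypothetical stand-in for a trained model: maps the last token
    # seen so far to a predicted next token. A real model would run a
    # full forward pass of a neural network here.
    transitions = {"<start>": "Hello", "Hello": "world", "world": "<end>"}
    return transitions.get(context[-1], "<end>")

def generate(prompt_tokens, max_tokens=10):
    # Autoregressive loop: each new token requires one model call,
    # which is why longer outputs cost more time and compute.
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = toy_model(tokens)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["<start>"]))  # → ['<start>', 'Hello', 'world']
```

The key point is structural: output is built step by step, with the model invoked once per generated token, so inference cost grows with response length.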
Training is like years of studying and practice to become a chef. Inference is actually cooking a meal. The learning is done — now the skill is being applied. Every dish the chef makes is inference; culinary school was training.
Inference is not free or instant. Running large models requires significant compute: GPUs, memory, and energy. This is why AI APIs charge per token and why response speed varies. Inference cost is one of the biggest challenges in deploying AI at scale.
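Per-token pricing makes inference cost easy to estimate. The sketch below uses hypothetical prices (real rates vary widely by provider and model) to show how input and output token counts translate into a bill:

```python
# Assumed illustrative prices, not any provider's actual rates:
PRICE_PER_INPUT_TOKEN = 0.000003   # e.g. $3 per million input tokens
PRICE_PER_OUTPUT_TOKEN = 0.000015  # e.g. $15 per million output tokens

def request_cost(input_tokens, output_tokens):
    # Cost of one API call: input and output tokens are usually
    # billed at different rates, with output costing more.
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# 1,000 requests, each with a 500-token prompt and a 200-token reply:
total = 1000 * request_cost(500, 200)
print(f"${total:.2f}")  # → $4.50
```

Note that output tokens dominate here despite being fewer, which is why long generated responses drive costs at scale.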