
Attention Head

A mechanism inside a transformer that decides which words to focus on when processing language.

What it actually means

An attention head is a component inside a transformer model that looks at every word in a sentence and works out which ones are most relevant to each other. For each word, it scores every other word, turns those scores into weights, and blends the words' representations according to those weights, as sketched in the code below. A model has many attention heads running in parallel, each learning to notice a different kind of relationship between words.
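To make that concrete, here is a minimal sketch of a single attention head in Python with NumPy. The dimensions, the softmax helper, and the weight matrices Wq, Wk, and Wv are toy stand-ins for what a trained model would learn; this illustrates the computation rather than any particular library's implementation.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, Wq, Wk, Wv):
    # Project each token's embedding three ways:
    Q = x @ Wq  # queries: what each token is looking for
    K = x @ Wk  # keys: what each token offers
    V = x @ Wv  # values: the information to pass along
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # relevance of every token to every other token
    weights = softmax(scores, axis=-1)   # each row sums to 1: where token i "focuses"
    return weights @ V, weights

# Toy setup: 5 tokens, random embeddings, random (untrained) projection weights.
rng = np.random.default_rng(0)
d_model, d_head, n_tokens = 16, 8, 5
x = rng.standard_normal((n_tokens, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
output, weights = attention_head(x, Wq, Wk, Wv)
print(weights.round(2))  # row i shows how strongly token i attends to each token

With random weights the attention pattern here is meaningless noise; training is what shapes Wq, Wk, and Wv so that each head ends up focusing on a useful relationship, such as linking a pronoun back to its referent.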

Real-world analogy

Imagine reading the sentence "The trophy didn't fit in the suitcase because it was too big." Your brain immediately focuses on "trophy" and "big" to understand what "it" refers to. An attention head does the same — it learns to focus on the right words to resolve ambiguity and understand meaning.

Common misconception

Attention heads don't "understand" language the way humans do. They learn statistical patterns about which words tend to relate to each other, extracted from millions of examples. "Understanding" is a human interpretation of what is, at bottom, pattern matching.