AI Blog

by Michele Laurelli

Attention Score

Concept
Definition

The raw similarity between a query vector and a key vector, computed in an attention layer before softmax normalization.

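Score = dot_product(Query, Key) / sqrt(d_k), where d_k is the dimension of the key vectors. Without the scaling, dot products tend to grow in magnitude as d_k increases, which pushes softmax into saturated regions where gradients become very small; dividing by sqrt(d_k) counteracts this. A higher score means the key is more relevant to the query, so it receives a larger weight after softmax.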
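As a minimal sketch of the formula above, the following plain-Python snippet computes scaled dot-product scores for one query against three keys and then normalizes them with softmax. The function names `attention_scores` and `softmax` and the toy vectors are assumptions made for illustration; real implementations typically operate on batched tensors in a numerical library.

```python
import math

def attention_scores(queries, keys):
    """Return the matrix of scaled dot-product scores, one row per query."""
    d_k = len(keys[0])  # dimension of each key vector
    scale = math.sqrt(d_k)
    return [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in keys]
        for q in queries
    ]

def softmax(row):
    """Normalize one row of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data (made up for illustration): one query, three keys, d_k = 4.
queries = [[1.0, 0.0, 1.0, 0.0]]
keys = [
    [1.0, 0.0, 1.0, 0.0],    # same direction as the query
    [0.0, 1.0, 0.0, 1.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0, 0.0],  # opposite direction
]

scores = attention_scores(queries, keys)
weights = [softmax(row) for row in scores]

print(scores[0])   # [1.0, 0.0, -1.0]
print(weights[0])  # attention weights, largest for the first key
```

The raw dot products here are 2, 0, and -2; dividing by sqrt(4) = 2 gives the scores 1, 0, and -1. After softmax the key aligned with the query gets roughly two thirds of the weight, which is what "higher scores mean more relevance" amounts to in practice.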

Examples

1. Scaled dot-product attention
2. Query-key similarity
3. Attention computation

Michele Laurelli - AI Research & Engineering