by Michele Laurelli
The similarity computed between a query vector and a key vector before softmax normalization in attention.
Score = dot_product(Query, Key) / sqrt(d_k). Scaling by sqrt(d_k) keeps the dot products from growing with the vector dimension and pushing the softmax into regions with vanishing gradients. Higher scores indicate greater relevance.
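The formula above can be sketched in a few lines of NumPy; the `query` and `keys` values here are purely illustrative:

```python
import numpy as np

d_k = 4  # dimensionality of the key vectors
query = np.array([1.0, 0.5, -0.5, 2.0])
keys = np.array([[1.0, 0.0, 0.0, 1.0],
                 [0.5, 0.5, 0.5, 0.5]])

# Score = dot_product(Query, Key) / sqrt(d_k), one score per key
scores = keys @ query / np.sqrt(d_k)
```

Each entry of `scores` is the scaled similarity between the query and one key; these raw scores are what the softmax later turns into attention weights.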
Scaled dot-product attention
Query-key similarity
Attention computation
A technique that allows a model to focus on the most relevant parts of the input when producing each element of the output.
The three vectors (Query, Key, Value) used in attention mechanisms: queries are compared against keys to produce weights, which then form weighted combinations of the value vectors.
An activation function that converts a vector of values into a probability distribution summing to 1.
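A minimal softmax sketch in NumPy; the input values are illustrative, and the max is subtracted purely for numerical stability (it does not change the result):

```python
import numpy as np

def softmax(x):
    # Subtracting the max avoids overflow in exp() without changing the output.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Turn raw attention scores into a probability distribution over keys.
probs = softmax(np.array([1.5, 0.75]))
```

The outputs are non-negative and sum to 1, so they can be used directly as attention weights.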