by Michele Laurelli
Three vectors used in attention mechanisms to compute weighted combinations of input elements.
In attention, each input is projected into a Query (what this element is looking for), a Key (what each element offers for matching), and a Value (the content it actually carries). Attention scores are computed as the similarity between a Query and all Keys, typically a scaled dot product normalized with a softmax, and are then used to form a weighted sum of the Values.
Examples: transformer attention; self-attention computation; cross-attention in encoder-decoder models.
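The projection-and-weighting described above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product attention, not any library's implementation; the matrices `W_q`, `W_k`, `W_v` and the helper names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    # Project the inputs into Query, Key, and Value spaces
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Similarity of each Query with every Key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # one probability distribution per query
    return weights @ V                  # weighted combination of the Values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))       # 4 input elements, embedding dim 8
W_q = rng.normal(size=(8, 8))     # hypothetical learned projections
W_k = rng.normal(size=(8, 8))
W_v = rng.normal(size=(8, 8))
out = attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one output vector per input element
```

Because the softmax rows sum to one, each output is a convex combination of the Value vectors, weighted by how well its Query matched each Key.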