by Michele Laurelli
A scalar value that indicates how much focus a model places on a specific part of the input when producing an output.
Attention weights are computed by taking dot products of query and key vectors and normalizing the resulting scores with softmax, so the weights for each query sum to one. A higher weight means the corresponding input element contributes more to the output, which lets models focus on relevant information dynamically.
Typical uses include:
- Machine translation: attention over source-language words when generating each target word
- Image captioning: attention over image regions when generating each caption word
- Document summarization: attention over the passages most relevant to the summary
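The computation described above can be sketched in a few lines. This is a minimal illustration (not a production implementation) of attention weights for a single query over a set of keys, using NumPy; the function name, toy dimensions, and random data are assumptions for the example.

```python
import numpy as np

def attention_weights(query, keys):
    """Attention weights for one query over a set of keys.

    Dot products between the query and each key are scaled by sqrt(d_k)
    and normalized with softmax, so the weights are nonnegative and sum to 1.
    """
    d_k = keys.shape[-1]
    scores = keys @ query / np.sqrt(d_k)  # one scalar score per key
    scores = scores - scores.max()        # subtract max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Toy example: one 4-dimensional query attending over three keys.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(3, 4))
w = attention_weights(q, K)
print(w)  # three scalars summing to 1
```

Each entry of `w` is one attention weight: the key most similar to the query (by dot product) receives the largest weight, and the softmax keeps all weights positive and normalized.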