AI Blog

AI Blog

by Michele Laurelli

Causal Masking

Technique
Definition

Masking technique preventing attention to future positions in autoregressive models.

Ensures token at position i can only attend to positions ≤ i. Essential for GPT-style models. Implemented as triangular mask.

Examples

1

GPT autoregressive generation

2

Decoder self-attention

3

Preventing future leakage

Michele Laurelli - AI Research & Engineering