by Michele Laurelli
A masking technique that prevents attention to future positions in autoregressive models.
It ensures that the token at position i can attend only to positions ≤ i, which is essential for GPT-style decoders. It is typically implemented as a lower-triangular mask that sets attention scores for future positions to −∞ before the softmax.
Typical uses: GPT autoregressive generation, decoder self-attention, preventing future leakage.
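A minimal sketch of the technique, assuming PyTorch; the function name causal_self_attention, the single-head setup, and the toy dimensions are illustrative assumptions, not part of the original entry:

```python
import math
import torch

def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Toy single-head self-attention: queries, keys, and values are
    the input itself, and position i attends only to positions <= i."""
    seq_len, d = x.shape
    # Raw attention scores, scaled as in standard dot-product attention.
    scores = (x @ x.transpose(0, 1)) / math.sqrt(d)  # (seq_len, seq_len)
    # Lower-triangular boolean mask: True where attention is allowed (j <= i).
    allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Future positions get -inf so softmax assigns them zero weight.
    scores = scores.masked_fill(~allowed, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ x

x = torch.randn(5, 8)           # 5 tokens, embedding size 8
out = causal_self_attention(x)  # out[i] depends only on x[:i + 1]
```

Setting masked scores to −∞ rather than zero is the key detail: after the softmax, disallowed positions receive exactly zero weight, so no information from future tokens leaks into the output.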