by Michele Laurelli
A neural network architecture based entirely on attention mechanisms, without recurrent or convolutional layers.
Introduced in "Attention Is All You Need" (Vaswani et al., 2017), Transformers revolutionized NLP by processing entire sequences in parallel with self-attention instead of step by step with recurrence. Key components include multi-head attention, positional encoding, and position-wise feed-forward networks. Models such as GPT and BERT are based on this architecture.
GPT (Generative Pre-trained Transformer)
BERT (Bidirectional Encoder Representations from Transformers)
T5 (Text-to-Text Transfer Transformer)
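The self-attention mechanism at the heart of these models can be sketched as scaled dot-product attention: each token's query is compared against every key, the scores are softmax-normalized, and the result weights a sum over the values. Below is a minimal NumPy sketch (function name and toy shapes are illustrative, not from the article):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # weighted sum of values

# Toy self-attention: 3 tokens with dimension 4, Q = K = V = X
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one attended vector per input token
```

Multi-head attention runs several such attention operations in parallel on learned linear projections of the input and concatenates the results, letting different heads attend to different relationships.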