AI Blog

by Michele Laurelli

Transformer

/trænsˈfɔːrmər/
Architecture
Definition

A neural network architecture based entirely on attention mechanisms, without recurrent or convolutional layers.

Transformers revolutionized NLP by processing entire sequences in parallel using self-attention. Key components include multi-head attention, positional encoding, and feed-forward networks. Models like GPT and BERT are based on this architecture.
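The self-attention mentioned above can be sketched as scaled dot-product attention, the operation at the heart of every Transformer layer. This is a minimal illustration with NumPy; the function names and the toy 3-token input are illustrative, not from any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_len, seq_len) token-to-token scores
    weights = softmax(scores)         # each row is a distribution over tokens
    return weights @ V                # weighted sum of value vectors

# Toy self-attention: a sequence of 3 tokens with 4-dimensional embeddings,
# using the same matrix as queries, keys, and values (Q = K = V = X)
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one updated vector per token
```

Because every token attends to every other token in one matrix multiplication, the whole sequence is processed in parallel, which is what the paragraph above refers to. Multi-head attention runs several of these in parallel on learned projections of the input.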

Examples

1. GPT (Generative Pre-trained Transformer)
2. BERT (Bidirectional Encoder Representations from Transformers)
3. T5 (Text-to-Text Transfer Transformer)

Michele Laurelli - AI Research & Engineering