by Michele Laurelli
A neural network architecture based entirely on attention mechanisms, without recurrent or convolutional layers.
Introduced in "Attention Is All You Need" (Vaswani et al., 2017), Transformers revolutionized NLP by processing entire sequences in parallel with self-attention instead of step by step with recurrence. Key components include multi-head attention, positional encoding, and position-wise feed-forward networks. Models such as GPT and BERT are based on this architecture.
GPT (Generative Pre-trained Transformer)
BERT (Bidirectional Encoder Representations from Transformers)
T5 (Text-to-Text Transfer Transformer)
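The self-attention mechanism at the heart of these models can be sketched as scaled dot-product attention: each token's query is compared against every key, the scores are softmax-normalized, and the result weights a sum over the values. Below is a minimal NumPy sketch (function name and toy shapes are illustrative, not from the article):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # weighted sum of values

# Toy self-attention: 3 tokens with dimension 4, Q = K = V = X
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (3, 4): one attended vector per input token
```

Multi-head attention runs several such attention operations in parallel on learned linear projections of the input and concatenates the results, letting different heads attend to different relationships.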