by Michele Laurelli
A learning paradigm in which a model creates its own supervision signal from unlabeled data.
The model predicts parts of the input from other parts (masked language modeling, image inpainting, next-frame prediction), which enables pre-training on massive unlabeled datasets. It is the foundation of BERT and GPT.
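To make the mechanism concrete, here is a minimal sketch of the masked language modeling objective in PyTorch. Everything in it is illustrative, not BERT's actual implementation: the tiny Transformer, the vocabulary size, the 15% masking rate, and the random tokens standing in for real text are all assumptions made for the demo.

```python
import torch
import torch.nn as nn

VOCAB, DIM, MASK_ID = 100, 32, 0   # hypothetical sizes; id 0 reserved as [MASK]

class TinyMaskedLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)   # predict the original token at each position

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

model = TinyMaskedLM()
tokens = torch.randint(1, VOCAB, (8, 16))   # batch of unlabeled "text"
mask = torch.rand(8, 16) < 0.15             # hide ~15% of positions, as BERT does
inputs = tokens.masked_fill(mask, MASK_ID)  # corrupt the input

logits = model(inputs)                      # (batch, seq, vocab)
# The supervision comes from the data itself: the targets are simply the
# original tokens at the masked positions, so no labels are ever needed.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
print(f"masked-LM loss: {loss.item():.3f}")
```

Note how the corruption step manufactures the training pairs: the input is a damaged copy of the data, and the target is the data itself.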
Examples:
- BERT: masked language modeling, reconstructing randomly masked tokens (sketched above).
- GPT: next-token prediction, predicting each token from its left context.
- Image rotation prediction: classifying which of four rotations was applied to an image (sketched after this list).
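As a second illustration, a sketch of rotation prediction: the "label" is the rotation the code itself applied, so it comes for free from unlabeled images. The random tensors and the tiny linear classifier below are stand-ins assumed purely for demonstration; a real setup would use an actual image dataset and a convolutional network.

```python
import torch
import torch.nn as nn

# Unlabeled images (random tensors standing in for a real dataset).
images = torch.randn(8, 1, 28, 28)

# The supervision is free: rotate each image by a random k * 90 degrees
# and keep k as the classification target.
k = torch.randint(0, 4, (8,))
rotated = torch.stack(
    [torch.rot90(img, int(r), dims=(1, 2)) for img, r in zip(images, k)]
)

# A tiny 4-way rotation classifier.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 4))
loss = nn.functional.cross_entropy(net(rotated), k)
loss.backward()
print(f"rotation-prediction loss: {loss.item():.3f}")
```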