by Michele Laurelli
Pre-training task in which random tokens are masked and the model predicts them from the surrounding context.
15% of input tokens are selected for prediction; of these, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. This is BERT's pre-training objective and enables bidirectional context.
BERT pre-training
Cloze task
Bidirectional language understanding
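A minimal sketch of the 80/10/10 masking rule described above, assuming a generic integer-token vocabulary with a known mask_token_id; names and parameters are illustrative, not BERT's actual implementation.

```python
import random

def mask_tokens(token_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """BERT-style masking: select ~15% of positions, then replace
    80% with [MASK], 10% with a random token, 10% left unchanged."""
    inputs = list(token_ids)
    labels = [-100] * len(inputs)          # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:    # position selected for prediction
            labels[i] = tok                # model must recover the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_token_id  # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels
```

The unchanged and random-token cases keep the pre-training input distribution closer to fine-tuning inputs, where [MASK] never appears.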