AI Blog

by Michele Laurelli

Vanishing Gradient

/ˈvænɪʃɪŋ ˈɡreɪdiənt/
Concept
Definition

A problem in deep networks where gradients become extremely small during training, preventing effective learning in the early layers.

Vanishing gradients arise because backpropagation multiplies layer-by-layer derivatives; when those derivatives are smaller than 1, the product shrinks exponentially with depth, so early layers receive almost no learning signal. The problem is common in deep RNNs and in networks with sigmoid or tanh activations, whose derivatives are bounded well below 1 (the sigmoid's derivative never exceeds 0.25). Common remedies include ReLU activations, LSTM gating, and residual connections (ResNet).
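The exponential shrinkage is easy to see numerically. The sketch below (an illustration, not taken from any particular framework) chains scalar sigmoid layers and applies the chain rule backwards: each backward step multiplies by sigmoid'(z) · w, and since sigmoid'(z) = s(1 − s) ≤ 0.25, the gradient magnitude collapses after a few dozen layers. The depth and weight initialization are arbitrary choices for the demo.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
depth = 50
weights = rng.normal(0.0, 1.0, size=depth)  # one scalar weight per layer

# Forward pass through a chain of scalar sigmoid layers.
x = 0.5
activations = []
for w in weights:
    x = sigmoid(w * x)
    activations.append(x)

# Backward pass: the chain rule multiplies sigmoid'(z) * w at each layer.
# sigmoid'(z) = s * (1 - s) <= 0.25, so the running product shrinks fast.
grad = 1.0
grad_magnitudes = []
for w, a in zip(reversed(weights), reversed(activations)):
    grad *= a * (1.0 - a) * w
    grad_magnitudes.append(abs(grad))

print(f"|gradient| after  1 layer : {grad_magnitudes[0]:.3e}")
print(f"|gradient| after 10 layers: {grad_magnitudes[9]:.3e}")
print(f"|gradient| after 50 layers: {grad_magnitudes[-1]:.3e}")
```

Running this shows the gradient magnitude reaching the earliest layer is many orders of magnitude smaller than at the output, which is why those layers barely update. Swapping the sigmoid for ReLU (derivative 1 on the active region) or adding skip connections keeps the backward product from decaying this way.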

Examples

1. Deep RNNs failing to learn long-range dependencies
2. Sigmoid activations in deep feedforward networks
3. Very deep networks before residual connections (pre-ResNet)

Michele Laurelli - AI Research & Engineering