by Michele Laurelli
A problem in deep networks where gradients become extremely small, preventing effective learning in early layers.
Vanishing gradients occur when backpropagation repeatedly multiplies derivatives smaller than 1, so gradients shrink exponentially as they flow toward earlier layers. The problem is common in deep RNNs and in networks with sigmoid activations, whose derivative never exceeds 0.25. Common remedies: ReLU activations, LSTM cells, ResNet-style skip connections.
Deep RNNs failing to learn long-range dependencies
Sigmoid activation in deep networks
Very deep networks trained before ResNet introduced skip connections
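The exponential shrinkage described above can be sketched numerically. This is a minimal illustration, not a training run: it multiplies the sigmoid derivative (at its maximum value, 0.25) through successive layers, as a best-case bound on the backpropagated gradient factor. The helper names and depth choices are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0

# Multiply the per-layer derivative through n layers, assuming
# pre-activations near 0 (the most favorable case for sigmoid).
grad = 1.0
for depth in range(1, 31):
    grad *= sigmoid_derivative(0.0)
    if depth in (5, 10, 20, 30):
        print(f"depth {depth:2d}: gradient factor <= {grad:.3e}")
```

Even in this best case the factor falls below 1e-6 within 10 layers, which is why early layers in deep sigmoid networks barely update.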
An algorithm for training neural networks that computes gradients of the loss function with respect to the weights by applying the chain rule backward through the network.
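The chain-rule computation can be sketched by hand for a single sigmoid neuron with squared-error loss. This is a toy worked example; the variable names and input values are hypothetical, chosen only to make the backward pass explicit.

```python
import math

# Forward pass: y = sigmoid(w*x + b), loss L = (y - t)^2
w, b = 0.5, 0.1   # weight and bias (arbitrary values)
x, t = 2.0, 1.0   # input and target (arbitrary values)

z = w * x + b
y = 1.0 / (1.0 + math.exp(-z))
loss = (y - t) ** 2

# Backward pass: chain rule, one factor per forward step.
dL_dy = 2.0 * (y - t)       # d(y - t)^2 / dy
dy_dz = y * (1.0 - y)       # sigmoid derivative
dL_dw = dL_dy * dy_dz * x   # dz/dw = x
dL_db = dL_dy * dy_dz       # dz/db = 1
```

Autodiff frameworks automate exactly this bookkeeping for arbitrary computation graphs.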
An optimization algorithm that iteratively adjusts parameters to minimize a loss function by stepping in the direction opposite the gradient.
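A minimal sketch of the update rule on a one-dimensional function, f(x) = (x - 3)^2, whose minimum is at x = 3. The function, starting point, and learning rate are illustrative assumptions.

```python
# f(x) = (x - 3)^2 has derivative f'(x) = 2*(x - 3).
def grad_f(x):
    return 2.0 * (x - 3.0)

x = 0.0    # initial guess
lr = 0.1   # learning rate (step size)
for _ in range(100):
    x -= lr * grad_f(x)  # step opposite the gradient

print(round(x, 4))  # → 3.0
```

Each step shrinks the distance to the minimizer by a constant factor (here 0.8), so the iterates converge geometrically.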
An activation function that outputs the input if positive, otherwise zero: f(x) = max(0, x).
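The definition translates directly into one line of Python; this sketch just applies it to a few sample inputs.

```python
def relu(x):
    """ReLU: pass positive inputs through, zero out the rest."""
    return max(0.0, x)

print([relu(v) for v in (-2.0, -0.5, 0.0, 1.5)])  # → [0.0, 0.0, 0.0, 1.5]
```

Because the derivative is exactly 1 for positive inputs, ReLU avoids the repeated sub-1 multiplications that cause vanishing gradients with sigmoid.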