by Michele Laurelli
A problem where gradients become extremely large during training, causing unstable updates and divergence.
Exploding gradients arise during backpropagation when derivatives with magnitude greater than 1 are multiplied together repeatedly across many layers or time steps, so the gradient grows exponentially with depth. The resulting oversized weight updates destabilize training. Common remedies include gradient clipping, careful weight initialization, and batch normalization (a minimal clipping sketch appears after the symptom list below).
Typical signs: divergence when training RNNs, NaN values appearing in the weights, and a wildly oscillating loss.
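Gradient clipping is the most direct of the remedies above. Below is a minimal sketch in PyTorch, assuming a toy RNN and random data purely for illustration; the model, shapes, and hyperparameters are hypothetical placeholders. The key call, `torch.nn.utils.clip_grad_norm_`, rescales the gradient in place after backpropagation so its global L2 norm cannot exceed `max_norm`.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup, for illustration only.
model = nn.RNN(input_size=8, hidden_size=32, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

inputs = torch.randn(16, 50, 8)    # batch of 16 sequences, 50 time steps each
targets = torch.randn(16, 50, 32)  # dummy targets matching the RNN output shape

for step in range(100):
    optimizer.zero_grad()
    outputs, _ = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    # Rescale the gradient so its global L2 norm is at most 1.0,
    # preventing one huge gradient from producing a destabilizing update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

Clipping by norm, as here, preserves the gradient's direction and only shrinks its magnitude; clipping each component to a fixed range is a cruder alternative that can change the update direction.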