by Michele Laurelli
An optimization technique that accelerates gradient descent by accumulating a velocity vector along directions of persistent reduction in the loss.
Momentum helps the optimizer navigate ravines (regions where the loss surface curves far more steeply in one dimension than in another) and accelerates convergence. It adds a fraction of the previous update vector to the current one, smoothing the optimization trajectory and damping oscillations.
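The classical (heavy-ball) update can be written as follows, with $\mu$ the momentum coefficient, $\eta$ the learning rate, and $v_t$ the velocity (notation chosen here for illustration, not fixed by the entry itself):

$$v_t = \mu\, v_{t-1} - \eta\, \nabla_\theta L(\theta_{t-1}), \qquad \theta_t = \theta_{t-1} + v_t$$

With $\mu = 0$ this reduces to plain gradient descent; values around 0.9 are a common default.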
Examples and variants (a code sketch follows below):
- SGD with momentum 0.9, the most common default setting
- Nesterov momentum, which evaluates the gradient at a look-ahead point along the current velocity
Typical use case: accelerating convergence.
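A minimal NumPy sketch of both variants named above; the helper name, toy loss, and hyperparameters are illustrative assumptions, not from the source:

```python
import numpy as np

def sgd_momentum_step(theta, velocity, grad_fn, lr=0.05, mu=0.9, nesterov=False):
    """One update of SGD with momentum (mu=0.9 by default).

    theta    -- current parameter vector
    velocity -- running velocity, same shape as theta
    grad_fn  -- function returning the loss gradient at a given point
    nesterov -- if True, evaluate the gradient at the look-ahead point
    """
    # Nesterov momentum peeks ahead along the current velocity before
    # computing the gradient; classical momentum uses the current point.
    lookahead = theta + mu * velocity if nesterov else theta
    grad = grad_fn(lookahead)

    # Accumulate a fraction of the previous update, then step downhill.
    velocity = mu * velocity - lr * grad
    return theta + velocity, velocity


# Toy usage on a narrow quadratic "ravine": f(x, y) = 0.5 * (10*x**2 + y**2),
# whose gradient is (10*x, y). The minimum is at the origin.
grad = lambda p: np.array([10.0 * p[0], p[1]])
theta, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    theta, v = sgd_momentum_step(theta, v, grad, lr=0.05, mu=0.9)
print(theta)  # approaches [0, 0]
```

Passing nesterov=True to the same helper yields the Nesterov variant, which often damps oscillations slightly better because the gradient already accounts for the upcoming velocity step.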