by Michele Laurelli
A gradient descent variant that updates weights using the gradient of a single randomly chosen training example at a time.
SGD is faster per update than full-batch gradient descent and enables online learning, but its single-example gradient estimates are noisy. That noise can help the optimizer escape shallow local minima. Mini-batch SGD, which averages gradients over small batches of examples, balances computational efficiency against gradient quality.
Online learning
Large-scale training
Escaping local minima
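To make the update rule concrete, here is a minimal sketch in Python with NumPy (an assumed choice, since the entry names no language): a toy linear model trained first one example at a time, then with mini-batches. The synthetic data, learning rate, batch size, and epoch counts are illustrative assumptions, not part of the original entry.

```python
# Minimal SGD sketch on a toy linear-regression problem.
# All data and hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 2 plus a little noise.
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=200)

lr = 0.05  # learning rate (step size)

# --- Pure SGD: update on a single random example at a time ---
w, b = 0.0, 0.0
for epoch in range(20):
    for i in rng.permutation(len(X)):      # shuffle, then visit one example
        err = w * X[i, 0] + b - y[i]       # gradient of 0.5 * err**2
        w -= lr * err * X[i, 0]            # noisy single-example step
        b -= lr * err

# --- Mini-batch SGD: average the gradient over a small batch ---
w_mb, b_mb, batch_size = 0.0, 0.0, 16
for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        err = w_mb * X[batch, 0] + b_mb - y[batch]
        w_mb -= lr * np.mean(err * X[batch, 0])  # averaged, less noisy step
        b_mb -= lr * np.mean(err)

print(f"pure SGD:       w={w:.2f}, b={b:.2f}  (target: 3, 2)")
print(f"mini-batch SGD: w={w_mb:.2f}, b={b_mb:.2f}")
```

Both loops do the same total work per epoch; the mini-batch version simply trades some of the single-example noise for more stable steps, which is the efficiency/quality balance the entry describes.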