AI Blog

by Michele Laurelli

SGD (Stochastic Gradient Descent)

/stoʊˈkæstɪk ˈɡreɪdiənt dɪˈsɛnt/
Algorithm
Definition

A gradient descent variant that updates weights using gradients from a single random training example at a time.
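In symbols, the standard form of the update is: with learning rate η, per-example loss ℓ, and a training example (xᵢ, yᵢ) sampled uniformly at random at step t,

```latex
w_{t+1} = w_t - \eta \, \nabla_w \, \ell(w_t;\, x_i, y_i)
```

Full-batch gradient descent instead averages ∇ℓ over the entire training set before each update; SGD replaces that average with a single-example estimate.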

SGD is far cheaper per update than full-batch gradient descent and enables online learning, but its gradient estimates are noisy. That noise can actually help the optimizer escape shallow local minima. Mini-batch SGD, which averages gradients over a small batch of examples, balances computational efficiency and gradient quality.
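A minimal sketch of both variants, assuming a linear regression problem with squared loss (all names and data here are illustrative, not from the original post):

```python
import numpy as np

def sgd_step(w, x, y, lr=0.01):
    """One SGD step on a single example (x, y) for squared loss (x·w - y)^2."""
    grad = 2 * (x @ w - y) * x      # per-example gradient: noisy estimate
    return w - lr * grad

def minibatch_sgd_step(w, X, Y, lr=0.01):
    """One step on a mini-batch: averaging gradients reduces the noise."""
    grad = 2 * X.T @ (X @ w - Y) / len(Y)
    return w - lr * grad

# Synthetic data for demonstration
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
Y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
for epoch in range(5):
    for i in rng.permutation(len(Y)):   # visit examples in random order
        w = sgd_step(w, X[i], Y[i])
print(w)  # converges toward true_w despite the noisy updates
```

Swapping the inner loop for `minibatch_sgd_step` over small index slices gives smoother convergence per step at a slightly higher cost per update.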

Examples

1. Online learning
2. Large-scale training
3. Escaping local minima

Michele Laurelli - AI Research & Engineering