by Michele Laurelli
A hyperparameter that controls how much the model weights are adjusted at each training step.
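It is the step size in gradient descent: each update moves the weights against the gradient by the learning rate times the gradient. If it is too high, training becomes unstable; if it is too low, convergence is slow. Typical values are 0.001, 0.01, and 0.1.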
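A minimal sketch of the update w <- w - lr * gradient, assuming a toy one-dimensional loss f(w) = w**2 (gradient 2w, minimum at w = 0). The function name gradient_descent, the starting point, the step count and the value 1.5 (picked only to be "too high" for this particular loss) are illustrative assumptions, not part of the original entry.

```python
# Illustrative sketch: plain gradient descent on the toy loss f(w) = w**2.
# The loss, starting point and step count are assumptions chosen for the demo.

def gradient_descent(learning_rate: float, w0: float = 1.0, steps: int = 50) -> float:
    """Apply `steps` updates w <- w - learning_rate * grad and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of f(w) = w**2
        w = w - learning_rate * grad   # step against the gradient
    return w


# Compare a small, a moderate and an (assumed) too-high learning rate.
for lr in (0.001, 0.1, 1.5):
    w_final = gradient_descent(lr)
    print(f"lr={lr}: final w = {w_final:.4g}, loss = {w_final ** 2:.4g}")
```

On this loss each step multiplies w by (1 - 2 * lr), so plain gradient descent converges only for 0 < lr < 1. After 50 steps, lr = 0.001 leaves w at about 0.905 (slow convergence), lr = 0.1 brings it to about 1.4e-5, and lr = 1.5 multiplies w by -2 at every step, so it grows to about 1.1e15 (divergence). Where the "too high" boundary sits depends on the curvature of the loss, which is why the typical values are starting points to tune rather than fixed rules.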
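Example: a learning rate of 0.001.
A learning rate (LR) that is too high causes divergence: the loss grows instead of shrinking.
Adaptive learning rates: methods that adjust the step size during training, often separately for each parameter.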
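To illustrate adaptive learning rates, here is a hedged sketch of AdaGrad, one such method, chosen only for illustration (Adam and RMSProp are other common ones; the entry does not name a specific method). Each parameter's effective step is base_lr / (sqrt(sum of its past squared gradients) + eps). The toy loss f(x, y) = 10*x**2 + 0.1*y**2, the names adagrad and plain_gd, and all numeric settings are illustrative assumptions.

```python
import math

# Illustrative toy loss with very different curvature per parameter:
# f(x, y) = 10*x**2 + 0.1*y**2 (an assumption made for this demo).
def grad(params):
    x, y = params
    return [20.0 * x, 0.2 * y]


def adagrad(base_lr=0.5, steps=200, eps=1e-8):
    """AdaGrad sketch: per-parameter step = base_lr * g / (sqrt(sum of g**2) + eps)."""
    params = [1.0, 1.0]
    accum = [0.0, 0.0]  # running sum of squared gradients, one entry per parameter
    for _ in range(steps):
        g = grad(params)
        for i in range(len(params)):
            accum[i] += g[i] ** 2
            params[i] -= base_lr * g[i] / (math.sqrt(accum[i]) + eps)
    return params


def plain_gd(lr, steps=200):
    """Fixed-learning-rate gradient descent on the same loss, for comparison."""
    params = [1.0, 1.0]
    for _ in range(steps):
        g = grad(params)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


print("AdaGrad:         ", adagrad())
print("plain GD lr=0.05:", plain_gd(0.05))
```

The design point is the per-parameter normalization: dividing each gradient by the root of its own accumulated squares roughly cancels the gradient's scale, so both x and y shrink at a similar rate even though their curvatures differ by a factor of 100. A single fixed rate cannot do that here: lr = 0.05 is safe for x but leaves y at about 0.134 after 200 steps, while a rate large enough to move y quickly would make x diverge.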