by Michele Laurelli
A hyperparameter that controls how much a model's weights are updated during training.
The learning rate determines the step size taken in gradient descent. If it is too high, training becomes unstable and may diverge; if it is too low, training converges slowly. Common strategies include a fixed rate, decay schedules, cyclical schedules, and adaptive methods such as Adam.
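To make the update rule concrete, here is a minimal sketch of a single gradient descent step in Python; the weight and gradient values are illustrative, not from the source:

```python
# One gradient descent step: the learning rate scales the gradient,
# so it directly sets the size of the update applied to the weights.
learning_rate = 0.01  # illustrative value

weights = [0.5, -1.2, 3.0]  # current model weights (toy example)
grads = [0.1, -0.4, 0.25]   # gradients of the loss w.r.t. each weight

weights = [w - learning_rate * g for w, g in zip(weights, grads)]
```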
Example: a learning rate of 0.001 with the Adam optimizer.
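As a sketch of that configuration in PyTorch (the framework and the linear model are assumptions for illustration; 0.001 is also Adam's default learning rate in PyTorch):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # hypothetical model, for illustration only

# Adam with learning rate 0.001 (PyTorch's default for this optimizer).
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```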
Example: a learning rate decay schedule that reduces the step size as training progresses.
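A minimal sketch of one such schedule in PyTorch, assuming exponential decay; the model, base rate, and decay factor are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Multiply the learning rate by 0.9 at the end of every epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
    # ... training steps for the epoch would go here ...
    scheduler.step()  # lr after epoch k is 0.1 * 0.9**(k + 1)
```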
Example: a warm-up phase followed by decay, increasing the learning rate at the start of training before gradually lowering it.
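One way to sketch warm-up then decay in PyTorch is to chain a linear warm-up with cosine decay via SequentialLR; the epoch counts and base rate here are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # hypothetical model
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Linearly ramp the lr from 10% to 100% of 3e-4 over the first 5 epochs...
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=5)
# ...then decay it toward zero along a cosine curve over the remaining 45.
decay = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=45)

scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[5])

for epoch in range(50):
    # ... training steps for the epoch would go here ...
    scheduler.step()
```

Warm-up is commonly used with adaptive optimizers such as Adam, since very early updates can be noisy before gradient statistics stabilize.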