AI Blog

by Michele Laurelli

Adagrad

Definition

Adagrad is an adaptive learning-rate optimizer that adjusts the learning rate for each parameter individually based on the history of its gradients.

It accumulates the squared gradients of every parameter; the larger a parameter's accumulated sum, the smaller its effective learning rate becomes. This makes Adagrad well suited to sparse data, where rarely updated parameters keep a relatively large step size. Its main drawback is that the accumulator only grows, so the learning rate shrinks monotonically and can eventually become too small for further progress.
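The per-parameter rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production optimizer; the function name, the toy objective f(x) = x², and the learning rate are choices made here for demonstration.

```python
import numpy as np

def adagrad_step(params, grads, accum, lr=0.01, eps=1e-8):
    # Accumulate the squared gradient for each parameter
    accum += grads ** 2
    # Divide the step by the root of the accumulator: parameters with a
    # large gradient history get a smaller effective learning rate
    params -= lr * grads / (np.sqrt(accum) + eps)
    return params, accum

# Toy example: minimize f(x) = x^2 starting from x = 5.0
# (lr=1.0 is tuned for this toy problem, not a recommended default)
x = np.array([5.0])
g_accum = np.zeros_like(x)
for _ in range(500):
    grad = 2 * x  # gradient of x^2
    x, g_accum = adagrad_step(x, grad, g_accum, lr=1.0)
```

Note how the step size shrinks automatically: early iterations move x quickly, while later ones take ever-smaller steps as `g_accum` grows, which is exactly the diminishing-learning-rate behavior mentioned above.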

Examples

1. Sparse gradient optimization

2. NLP embeddings

3. Convex problems

Michele Laurelli - AI Research & Engineering