by Michele Laurelli
Adaptive learning-rate optimization algorithm that maintains an exponential moving average of squared gradients.
It divides the learning rate by the root of this moving average, which addresses Adagrad's monotonically diminishing learning rates. Well suited to RNNs and other non-stationary problems.
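The update rule described above can be sketched as follows; this is a minimal illustration in the style of RMSProp, and the hyperparameter values (learning rate, decay, epsilon) are illustrative assumptions rather than values given in the text.

```python
# Sketch of the update rule: divide the learning rate by the root of an
# exponential moving average (EMA) of squared gradients.
# Hyperparameter defaults are illustrative assumptions.
def adaptive_step(param, grad, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """One parameter update using the EMA of squared gradients."""
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2   # update EMA of g^2
    param = param - lr * grad / (avg_sq ** 0.5 + eps)   # normalized step
    return param, avg_sq

# Toy usage: minimize f(x) = x^2 (gradient 2x) starting from x = 5.0
x, s = 5.0, 0.0
for _ in range(500):
    x, s = adaptive_step(x, 2 * x, s)
```

Because each step is normalized by the gradient's recent magnitude, the effective step size stays roughly constant instead of shrinking without bound as Adagrad's accumulated sum would force it to.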
RNN training
Non-convex optimization
Adaptive learning