by Michele Laurelli
Normalizes layer inputs using batch statistics to stabilize and accelerate training.
Normalizes each feature using the batch mean and variance, then scales and shifts with learnable parameters (gamma, beta). Intended to reduce internal covariate shift and allow higher learning rates.
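A minimal NumPy sketch of the training-time forward pass; the function and parameter names (batch_norm_forward, gamma, beta, eps) are illustrative, not from any particular library.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize x (shape: batch x features) with batch statistics,
    then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalized activations
    return gamma * x_hat + beta               # scale and shift

# Example: 4 samples, 3 features; gamma and beta would be learned in practice.
x = np.random.randn(4, 3)
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
```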
Typical placement: after conv layers, before the activation.
Common use: stabilizing deep networks.
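A sketch of that placement using PyTorch layers; the channel counts and input size are arbitrary examples.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # conv layer
    nn.BatchNorm2d(16),                          # normalize per channel over the batch
    nn.ReLU(),                                   # activation applied after normalization
)

x = torch.randn(8, 3, 32, 32)  # batch of 8 RGB 32x32 images
y = block(x)                   # shape: (8, 16, 32, 32)
```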