by Michele Laurelli
The number of training examples processed in a single iteration (one forward and backward pass) of model training.
Batch size affects training speed, memory usage, and model convergence. Small batches produce noisier gradient estimates; large batches give more stable gradients but are more memory-intensive. Common values: 32, 64, 128, and 256.
Batch size 32 for limited GPU memory
Batch size 256 for faster training
Mini-batch gradient descent
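The trade-off above can be sketched in code. The following is a minimal, dependency-free illustration of mini-batch gradient descent on a toy linear-regression task; the function name `minibatch_sgd` and all hyperparameter values are illustrative choices, not part of any particular library:

```python
import random

def minibatch_sgd(xs, ys, batch_size=32, lr=0.1, epochs=500):
    """Fit y = w*x + b by mini-batch gradient descent.

    Each iteration uses `batch_size` examples to estimate the gradient:
    smaller batches mean noisier updates, larger batches mean smoother
    updates but more memory per step.
    """
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        random.shuffle(data)  # reshuffle examples each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Average the squared-error gradient over the mini-batch.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b

# Toy data generated from y = 3x + 1; the fit should recover w≈3, b≈1.
random.seed(0)
xs = [i / 100 for i in range(100)]
ys = [3 * x + 1 for x in xs]
w, b = minibatch_sgd(xs, ys, batch_size=32)
```

Changing `batch_size` here changes how many examples contribute to each update: with 32 you get several noisy updates per epoch, with 100 (the full dataset) a single smooth one.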