by Michele Laurelli
An activation function that converts a vector of values into a probability distribution summing to 1.
Softmax is used in multi-class classification output layers. It exponentiates each value and normalizes by the sum, producing interpretable probabilities for each class.
Multi-class classification
Language model next-token prediction
Image classification output