by Michele Laurelli
An activation function that maps any real-valued input to a value strictly between 0 and 1: f(x) = 1/(1 + e^(-x)).
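Sigmoid is used for binary classification output layers and was historically popular for hidden layers. However, its derivative f'(x) = f(x)(1 - f(x)) never exceeds 0.25 and approaches 0 for large |x|, so gradients shrink as they are backpropagated through many sigmoid layers (the vanishing gradient problem).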
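To make the formula and the vanishing gradient concrete, here is a minimal sketch in plain Python. The function names (sigmoid, sigmoid_grad, chained_gradient_bound) are illustrative choices, not taken from any library, and the bound ignores the effect of weights, so it only shows the contribution of the activation itself.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x)), computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x),
    # which avoids evaluating exp() on a large positive number.
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Derivative f'(x) = f(x) * (1 - f(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient_bound(depth: int) -> float:
    """Upper bound on the product of `depth` sigmoid derivatives.

    Assumption: weights are ignored, so this isolates the activation's effect.
    """
    return 0.25 ** depth


if __name__ == "__main__":
    for x in (0.0, 2.0, 5.0, 10.0):
        print(f"x={x:5.1f}  f(x)={sigmoid(x):.6f}  f'(x)={sigmoid_grad(x):.6f}")
    print("bound after 10 layers:", chained_gradient_bound(10))
```

Even in the best case (every pre-activation exactly 0), ten stacked sigmoid layers scale the gradient by at most 0.25^10, which is below one in a million.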
Binary classification output
Logistic regression
Gate mechanisms in LSTM
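The uses listed above share one idea: the sigmoid output is read either as a probability or as a soft on/off factor. The sketch below illustrates both under stated assumptions. The helper names (predict_proba, apply_gate), the weights, and the inputs are hypothetical, and apply_gate shows only the elementwise gating step, not a full LSTM cell.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(weights: list[float], bias: float, features: list[float]) -> float:
    """Logistic-regression style output: P(class = 1 | features).

    The linear score w . x + b is squashed into (0, 1) by the sigmoid.
    """
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(score)


def apply_gate(gate_preactivations: list[float], values: list[float]) -> list[float]:
    """Gating step: scale each value by a sigmoid factor in (0, 1).

    A factor near 0 blocks the value; a factor near 1 lets it through.
    """
    return [sigmoid(g) * v for g, v in zip(gate_preactivations, values)]


if __name__ == "__main__":
    # Hypothetical weights and features, chosen only for illustration.
    p = predict_proba([0.8, -0.4], 0.1, [1.5, 2.0])
    print("probability of class 1:", round(p, 4))
    print("predicted label:", int(p >= 0.5))
    print("gated values:", apply_gate([-6.0, 0.0, 6.0], [1.0, 1.0, 1.0]))
```

Thresholding the probability at 0.5 is a common default for binary classification, but the threshold is a design choice that depends on the cost of each kind of error.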