The sigmoid activation function is an exponential form of non-linearity that always produces positive outputs. As we can see in the figure below, the sigmoid's value lies strictly between 0 and 1.
The sigmoid activation function is widely used as the last layer of a neural network, because its (0, 1) output range can be interpreted as a probability. However, the sigmoid suffers from saturation. Mathematically speaking, the function is defined as
$$ \sigma(x) = \frac{1}{1+e^{-(wx+b)}}$$
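For concreteness, here is a minimal NumPy sketch of this formula; the weight `w`, bias `b`, and input values are illustrative assumptions rather than values from the text:

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative weight, bias, and inputs (assumed for the example)
w, b = 2.0, 0.5
x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid(w * x + b))  # every output lies strictly between 0 and 1
```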
The sigmoid function also has a convenient derivative that can be written in terms of the function itself:
$$ \sigma'(x) = \sigma(x)\,(1-\sigma(x)) $$
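As a quick sanity check, the sketch below (with assumed sample points) compares this closed-form derivative against a central finite-difference approximation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-3.0, 0.0, 3.0])   # assumed sample points
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # central difference
print(np.allclose(sigmoid_grad(z), numeric, atol=1e-6))      # True
```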
This leads to two saturation cases:

Case 1: when $wx+b$ is a large positive number, $\sigma(wx+b) \approx 1$. Substituting into the derivative above gives
$$ \sigma'(wx+b) = 1 \cdot (1-1) = 0 $$
Case 2: when $wx+b$ is a large negative number, $\sigma(wx+b) \approx 0$. Similarly,
$$ \sigma'(wx+b) = 0 \cdot (1-0) = 0 $$
In both cases above, the output saturates and the gradient is nearly zero. Such vanishingly small gradients lead to very slow convergence toward the optimum when training with gradient descent.
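To see the saturation numerically, the short sketch below (with assumed pre-activation values) evaluates the sigmoid and its gradient far from zero; at $z = \pm 10$ the gradient has collapsed to roughly $4.5 \times 10^{-5}$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pre-activations far from zero (illustrative values) saturate the unit
z = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
s = sigmoid(z)
grad = s * (1.0 - s)          # sigma'(z) = sigma(z) * (1 - sigma(z))
for zi, si, gi in zip(z, s, grad):
    print(f"z={zi:+6.1f}  sigmoid={si:.5f}  gradient={gi:.5f}")
# At z = +/-10 the gradient is ~4.5e-5: almost no signal flows back during training.
```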