All activation functions are used to introduce non-linearity into the network, but depending on the network, some functions can also mitigate other problems and make the learning process more stable.
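To see why the non-linearity matters, here is a minimal NumPy sketch (the layer sizes and random weights are purely illustrative): stacking two linear layers without an activation collapses to a single linear map, while putting a ReLU between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # batch of 4 inputs with 3 features
W1 = rng.normal(size=(3, 5))   # first linear layer
W2 = rng.normal(size=(5, 2))   # second linear layer

# Two stacked linear layers are equivalent to one linear layer with weights W1 @ W2.
linear_stack = (x @ W1) @ W2
collapsed = x @ (W1 @ W2)
print(np.allclose(linear_stack, collapsed))  # True: no extra expressive power

# With a ReLU in between, the composition is no longer a single linear map.
relu_stack = np.maximum(0, x @ W1) @ W2
print(np.allclose(relu_stack, collapsed))    # False (in general)
```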

There are many activation functions, but the most popular are the following:

- Sigmoid
- Tanh
- ReLU
- Leaky ReLU
- ELU
- Maxout
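As a quick reference, a minimal NumPy sketch of how these functions are usually defined (Maxout is shown with two linear pieces, which is one common choice):

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs to (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered squashing to (-1, 1).
    return np.tanh(x)

def relu(x):
    # Keeps positive values, zeroes out negatives.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but lets a small slope through for negative inputs.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth negative branch that saturates at -alpha.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def maxout(x, W1, b1, W2, b2):
    # Element-wise max over two learned linear functions of the input.
    return np.maximum(x @ W1 + b1, x @ W2 + b2)
```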

Note

The rule of thumb is to start with ReLU and see how it goes, then try one of its variants (including Maxout) to squeeze out marginal gains (see the sketch below). Don't use sigmoid or tanh!
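A hedged PyTorch sketch of how that rule of thumb might look in practice (the layer sizes and the `activation` argument are illustrative assumptions, not a fixed recipe): build the same network with `nn.ReLU` first, then swap in a variant such as `nn.LeakyReLU` and compare validation metrics.

```python
import torch.nn as nn

def make_mlp(activation=nn.ReLU):
    # Same architecture; the activation is passed in so variants can be compared.
    return nn.Sequential(
        nn.Linear(784, 256),
        activation(),
        nn.Linear(256, 256),
        activation(),
        nn.Linear(256, 10),
    )

baseline = make_mlp(nn.ReLU)       # start here
variant = make_mlp(nn.LeakyReLU)   # then try a ReLU variant and compare results
```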


tags:#ai/deep-learning