Activation Functions in Neural Networks

10 Questions

What is the primary purpose of activation functions in artificial neural networks?

To enable the model to learn complex relationships between inputs and outputs

Which activation function is commonly used for binary classification problems?

Sigmoid

What problem does the sigmoid activation function suffer from?

Vanishing gradient problem

What is the primary advantage of ReLU over other activation functions?

It is less prone to the vanishing gradient problem

What is the primary difference between tanh and sigmoid activation functions?

Tanh outputs are centered around 0, sigmoid around 0.5

What is the purpose of the softmax activation function?

To ensure output probabilities sum to 1

What is the primary advantage of leaky ReLU over standard ReLU?

It helps to avoid dying neurons

What is the primary characteristic of the swish activation function?

It is a self-gated activation function

What is the primary use case for the softplus activation function?

As a drop-in replacement for ReLU

What is a common problem faced by the tanh and sigmoid activation functions?

Vanishing gradient problem

Study Notes

Activation Functions

Activation functions are a crucial component of artificial neural networks, introducing non-linearity to the model and enabling it to learn complex relationships between inputs and outputs.
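
To see why this non-linearity matters, the short sketch below (an illustration added for these notes, using NumPy) shows that two stacked linear layers with no activation in between collapse into a single linear layer, so depth alone adds no expressive power.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5,))        # one input vector
W1 = rng.normal(size=(4, 5))     # weights of the first "layer"
W2 = rng.normal(size=(3, 4))     # weights of the second "layer"

# Without an activation, two linear layers equal one layer with weights W2 @ W1.
two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))          # True

# Inserting a non-linearity (here ReLU) between the layers breaks the collapse.
relu = lambda z: np.maximum(0.0, z)
print(np.allclose(W2 @ relu(W1 @ x), one_layer))   # False (in general)
```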

Types of Activation Functions (each is implemented in the code sketch after this list):

  1. Sigmoid:
    • Maps input to a value between 0 and 1
    • Used for binary classification problems
    • Suffers from vanishing gradient problem
  2. ReLU (Rectified Linear Unit):
    • Maps negative inputs to 0 and passes positive inputs through unchanged, i.e. f(x) = max(0, x)
    • Very cheap to compute
    • Less prone to the vanishing gradient problem, though units can "die" and output 0 for every input
  3. Tanh (Hyperbolic Tangent):
    • Maps input to a value between -1 and 1
    • Similar to sigmoid, but outputs are centered around 0
    • Also suffers from vanishing gradient problem
  4. Softmax:
    • Used for multi-class classification problems
    • Ensures output probabilities sum to 1
    • Often used in output layer
  5. Leaky ReLU:
    • A variation of ReLU that gives negative inputs a small non-zero slope (commonly 0.01) instead of mapping them to 0
    • Helps to avoid dying neurons
  6. Swish:
    • A self-gated activation function, introduced in 2017: swish(x) = x · sigmoid(x)
    • Performs better than ReLU and its variants in some cases
  7. Softplus:
    • A smooth approximation of ReLU
    • Can be used as a drop-in replacement for ReLU
  8. Softsign:
    • Similar to tanh: maps input to a value between -1 and 1, using f(x) = x / (1 + |x|)
    • Approaches its asymptotes more gradually than tanh, so it saturates more slowly
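
For concreteness, here is a minimal NumPy sketch of the functions listed above. It is an illustration added to these notes, not a library implementation; the leaky ReLU slope of 0.01 is an assumed default.

```python
import numpy as np

def sigmoid(x):
    # Maps input to (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # 0 for negative inputs, identity for positive inputs.
    return np.maximum(0.0, x)

def tanh(x):
    # Maps input to (-1, 1), centered around 0.
    return np.tanh(x)

def softmax(x):
    # Turns a score vector into probabilities that sum to 1.
    # Subtracting the max keeps the exponentials numerically stable.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def leaky_relu(x, alpha=0.01):
    # Small non-zero slope (alpha) for negative inputs avoids dying neurons.
    return np.where(x > 0, x, alpha * x)

def swish(x):
    # Self-gated: the input is multiplied by its own sigmoid.
    return x * sigmoid(x)

def softplus(x):
    # Smooth approximation of ReLU: log(1 + e^x), computed stably.
    return np.logaddexp(0.0, x)

def softsign(x):
    # Like tanh, but approaches -1 and 1 more gradually.
    return x / (1.0 + np.abs(x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(softmax(x).sum())    # 1.0 (up to floating-point rounding)
```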

Properties of Activation Functions:

  • Non-linearity: Enables the model to learn complex relationships between inputs and outputs
  • Differentiability: Allows gradients to be computed for backpropagation and optimization (see the gradient sketch after this list)
  • Monotonicity: The output does not change direction as the input increases; classic activations such as sigmoid, tanh, and ReLU are monotonic, while swish is not
  • Computational efficiency: Some activation functions are faster to compute than others
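
To make the differentiability and vanishing-gradient points concrete, the sketch below (an added illustration, not part of the original notes) evaluates the analytic derivatives of sigmoid, tanh, and ReLU. The sigmoid derivative never exceeds 0.25, so stacking many sigmoid layers multiplies gradients by small factors during backpropagation, which is the vanishing gradient problem noted above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # peaks at 0.25 when x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # peaks at 1.0, but ~0 once tanh saturates

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 for positive inputs, 0 otherwise

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(sigmoid_grad(x))   # largest value is 0.25 -> gradients shrink layer by layer
print(tanh_grad(x))      # larger near 0, but still tiny for saturated inputs
print(relu_grad(x))      # [0. 0. 0. 1. 1.] -> no shrinkage for active units
```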

Choosing the Right Activation Function:

  • Depends on the specific problem and dataset
  • May require experimentation to find the best activation function for the model (see the sketch after this list for one way to make the activation easy to swap)
  • Consider the properties of the activation function and the requirements of the problem
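
One practical way to run that experimentation is to treat the activation as a constructor argument, so it can be swapped without touching the rest of the model, as in the sketch below. The use of PyTorch here is an assumption (the notes do not name a framework), and the layer sizes are arbitrary; nn.SiLU is PyTorch's implementation of swish.

```python
import torch
from torch import nn

def make_mlp(activation: nn.Module, in_dim: int = 20, hidden: int = 64, n_classes: int = 3):
    """Small classifier whose hidden activation is a swappable choice."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        activation,                       # the activation under test
        nn.Linear(hidden, hidden),
        activation,
        nn.Linear(hidden, n_classes),
        # No softmax here: nn.CrossEntropyLoss applies log-softmax internally.
    )

# Candidate activations from the notes above.
candidates = {
    "relu": nn.ReLU(),
    "leaky_relu": nn.LeakyReLU(0.01),
    "tanh": nn.Tanh(),
    "swish": nn.SiLU(),
    "softplus": nn.Softplus(),
    "softsign": nn.Softsign(),
}

x = torch.randn(8, 20)                    # dummy batch of 8 samples
for name, act in candidates.items():
    model = make_mlp(act)
    print(name, model(x).shape)           # torch.Size([8, 3]) for every candidate
```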

Learn about the different types of activation functions, their characteristics, and uses in artificial neural networks.
