Questions and Answers
What is the Universal Approximation Theorem?
A feedforward network with a single hidden layer containing enough units can approximate any continuous function to arbitrary accuracy.
Combining ReLUs gives a piecewise linear function.
True (A)
What type of classifiers are decision trees?
Non-linear classifiers
Which of the following statements about a single-layer network is true?
What is the distinction between discriminative and generative models?
What is a perceptron?
Which activation function can be used in perceptrons?
A perceptron can solve complex problems such as the XOR function on its own.
What does the activation function do in a perceptron?
What is the role of the bias input in a perceptron?
What are the building blocks of a neural network?
A perceptron computes its output $z$ using the formula $z = h(\sum_{i=0}^{D} w_i x_i)$, where $h$ is the ______.
Linear regression produces a classifier function that can be considered a perceptron.
What limitation do perceptrons have?
What mathematical operation does a perceptron perform?
Neurons are inspired by ______ in biological systems.
Study Notes
Perceptrons
- A perceptron is a function that maps D-dimensional vectors to real numbers.
- It is a simple binary linear classifier, with the ability to compute the Boolean AND function.
- The perceptron model is inspired by the way neurons operate in the brain.
- A perceptron computes its output $z$ in two steps:
- Step 1: $a = \mathbf{w}^T \mathbf{x} = \sum_{i=0}^{D} w_i x_i$
- Step 2: $z = h(a)$
- The bias input ($x_0$) is always equal to 1.
- The bias weight ($w_0$) is optimised during training.
- The activation function ($h$) used in a perceptron is typically a step function or a sigmoid function, and can be changed based on the requirements.
- The sigmoid function allows the use of gradient descent, a powerful method for finding the optimal weights given a training dataset.
- Examples of Boolean functions that can be computed by a perceptron are the AND, OR, and NOT functions.
- The XOR function cannot be computed by a single perceptron but requires a neural network.
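The two-step computation above can be sketched directly in plain Python. This is a minimal illustration, not from the source; the weights shown are one hand-picked choice that makes the perceptron compute Boolean AND.

```python
# Minimal perceptron: a = w^T x (with bias input x0 = 1), then z = h(a).

def step(a):
    """Step activation: 1 if a >= 0, else 0."""
    return 1 if a >= 0 else 0

def perceptron(x, w, h=step):
    # Prepend the bias input x0 = 1, then take the weighted sum.
    x = [1] + list(x)
    a = sum(wi * xi for wi, xi in zip(w, x))
    return h(a)

# w0 = -1.5 (bias weight), w1 = w2 = 1: the unit fires only when x1 = x2 = 1,
# so this weight choice implements Boolean AND.
w_and = [-1.5, 1, 1]
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", perceptron((x1, x2), w_and))
```

Swapping `step` for a sigmoid (and thresholding its output) gives the same classifier while making the unit differentiable, which is what enables gradient descent.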
Neural Networks
- Built by using perceptrons as building blocks
- Inputs to some perceptrons are outputs of other perceptrons
- Can compute several functions
- Consists of units:
- input units
- perceptrons
- Each unit is connected to another unit by weights denoted by $w_{ji}$.
- Weights are optimised by a learning method
- Neural networks are organized into layers:
- input layer
- output layer
- hidden layers (there can be zero or more between the input and output layers)
- The outputs of the hidden layers serve as inputs to the next layer. This process continues until the output layer is reached.
- The XOR function can be computed by a neural network of three units with two inputs ($x_1$, $x_2$) and one output.
- One construction: unit 3 computes $x_1$ OR $x_2$, unit 4 computes $x_1$ AND $x_2$, and output unit 5 computes (unit 3) AND (NOT unit 4), which is exactly XOR.
Universal Approximation Theorem
- Provides a theoretical foundation for the power of neural networks
- A feedforward network with a single hidden layer is sufficient to approximate any continuous function
- Uses rectified linear units (ReLUs)
- This theorem highlights that neural networks are capable of approximating any function, given sufficient complexity (neurons, layers).
- The theorem states that a single hidden layer with enough units can approximate any continuous function to arbitrary accuracy.
- For example, it allows functions such as those computed by decision trees to be represented with a single hidden layer of ReLUs, which combine into piecewise linear functions.
- In simpler terms, a ReLU outputs zero for any input less than zero, and the input value itself for any input greater than or equal to zero.
- These ReLUs can approximate any function: gradually increasing the number of ReLUs (neurons in a single layer) yields an increasingly fine piecewise linear fit.
The Importance of Activation Functions
- Activation functions are crucial for introducing non-linearity into neural networks.
- Without activation functions, a neural network would collapse into a single linear transformation, limiting its representational power.
- Examples of activation functions: step, sigmoid, ReLU
- The chosen activation function can massively impact the network's learning and performance.