AI & Machine Learning 2 PDF

Document Details


Amar Telidji University – Laghouat

2024

Dr. Sarra Boudouh

Tags

AI machine learning deep learning neural networks

Summary

These lecture notes cover AI & Machine Learning 2, a master's course in Data Science and Artificial Intelligence at Amar Telidji University – Laghouat. The document walks through the fundamental concepts of deep learning and introduces the Multilayer Perceptron (MLP).

Full Transcript


Amar Telidji University – Laghouat, Faculty of Sciences, Computer Science Department
AI & Machine Learning 2: Master in Data Science & Artificial Intelligence (DSAI), M02
Dr. Sarra Boudouh, [email protected], 2024/2025

Table of contents

Chapter 01: Introduction to Deep Learning
01 Introduction
02 Multilayer Perceptron (MLP)
03 Feed-forward Process
04 Loss Calculation
05 Back-propagation Process
06 Dropout
07 Conclusion

01 Introduction

Deep learning has revolutionized the field of artificial intelligence by enabling models, particularly Artificial Neural Networks (ANNs), to learn from vast amounts of data and achieve state-of-the-art results in various domains. This chapter introduces the foundational concepts of deep learning through the study of the Multilayer Perceptron (MLP), a fundamental type of ANN.

02 Multilayer Perceptron (MLP)

Multilayer Perceptrons (MLPs) form the foundation of many modern deep-learning models. They are composed of interconnected layers of artificial neurons and are one of the simplest and most widely used architectures in neural networks. MLPs can be used for both classification and regression tasks.

Perceptron

The Perceptron is a classifier that forms the building block of more complex neural networks. It takes an input vector $x = [x_1, x_2, \dots, x_n]$ and computes a weighted sum of the inputs, which is then passed through an activation function to produce an output:

$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$

where $w_i$ are the weights, $b$ is the bias, and $f(\cdot)$ is the activation function. The perceptron is capable of learning linear decision boundaries, making it suitable for linearly separable data.

An activation function in a neural network determines whether a neuron should be activated or not by transforming the summed weighted input. It introduces non-linearity, which allows the model to learn complex patterns and relationships in the data. The bias helps shift the activation function, allowing the model to fit the data better by enabling neurons to fire even when the input is zero, thus improving the flexibility and accuracy of predictions.

[Figure: Perceptron (neuron). Inputs $x_1, \dots, x_n$ are multiplied by weights $w_1, \dots, w_n$ and summed with the bias $b$; the sum $\sum_{i=1}^{n} w_i x_i + b$ is passed through the activation $f$ to produce the output $y$.]

The bias in a neural network is like an extra constant that helps adjust the output. It lets the model make better predictions by shifting the activation, even if all the inputs are zero, giving the model more flexibility.

Linearity means that the relationship between input and output is straightforward and forms a straight line. For example, in linear regression, increasing the input by 1 changes the output by a constant amount, like drawing a straight line on a graph. Non-linearity means that the relationship between input and output is more complex, and changes in the input do not always cause proportional changes in the output. For example, a neural network can learn more complicated patterns like curves or loops, not just straight lines. Linear models therefore work for simple relationships (like predicting house prices based on square footage), while non-linear models can handle more complex data (like recognizing objects in an image).
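To make the perceptron computation concrete, here is a minimal NumPy sketch of the weighted sum followed by an activation, as described above. The sample inputs, weights, and the choice of a step activation are illustrative assumptions, not values from the course.

```python
import numpy as np

def step(z):
    """Step activation: output 1 when the weighted input is positive, else 0."""
    return np.where(z > 0, 1, 0)

def perceptron(x, w, b, f=step):
    """Compute y = f(sum_i w_i * x_i + b) for a single perceptron."""
    z = np.dot(w, x) + b          # weighted sum of the inputs plus the bias
    return f(z)                   # activation decides the output

# Illustrative values (assumed for this example): 3 inputs, hand-picked weights.
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, -0.3, 0.8])
b = 0.1

print(perceptron(x, w, b))        # 0 or 1: a linear decision boundary
```

Because the decision depends only on the sign of the weighted sum, this single unit can only separate data with a straight line (or hyperplane), which is exactly the limitation that hidden layers and non-linear activations address.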
Structure of MLP

The MLP is a fully connected neural network consisting of an input layer, one or more hidden layers, and an output layer. The output of an MLP with $L$ layers is computed layer by layer as

$a^{(l)} = f\left(W^{(l)} a^{(l-1)} + b^{(l)}\right), \qquad a^{(0)} = X,$

where $X$ is the input, $W^{(L)}$ and $b^{(L)}$ are the weights and biases of layer $L$, and $f$ is the activation function.

[Figure: MLP structure, showing the input layer, the hidden layers, and the output layer.]

Input layer: receives the dataset features, performs no computation, and passes the data to the hidden layers.
Hidden layers: where the real computation takes place.
Output layer: produces the final prediction.

Activation Functions

Activation functions add non-linearity, allowing neural networks to model complex patterns. Their selection is crucial for performance, with each function suited to specific layers or tasks. The activation is applied to the weighted input, $f\left(\sum_{i=1}^{n} w_i x_i + b\right)$. Common choices are the sigmoid, the hyperbolic tangent (tanh), the Rectified Linear Unit (ReLU), Leaky ReLU, and softmax.

Sigmoid Function

The sigmoid function compresses inputs into the range (0, 1), making it ideal for binary classification to represent probabilities. It is defined as

$\sigma(x) = \frac{1}{1 + e^{-x}}.$

The sigmoid function maps large negative inputs to values near 0 and large positive inputs to values near 1, making it ideal for output layers in binary models where the output is a probability.

Example of usage. Use case: binary classification (e.g., spam detection). Why use it: sigmoid produces outputs as probabilities, ideal for logistic regression and binary classification in neural networks. Drawback: in deep networks, sigmoid can cause the vanishing gradient problem during backpropagation, slowing or halting learning.

Hyperbolic Tangent (tanh)

The tanh function maps inputs to the range (-1, 1), providing stronger negative and positive responses than sigmoid. It is defined as

$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.$

This symmetric range improves input representation compared to sigmoid, especially in hidden layers, where both negative and positive activations aid in learning complex patterns.

Example of usage. Use case: classification and regression tasks in hidden layers. Why use it: tanh's symmetric nature improves gradient flow, especially in deeper networks. It is preferred over sigmoid because it centers activations around zero, simplifying optimization. However, like sigmoid, tanh can still face the vanishing gradient issue in very deep networks.

Rectified Linear Unit (ReLU)

The ReLU function is the most commonly used activation function in deep learning today due to its simplicity and effectiveness. It is defined as

$\mathrm{ReLU}(x) = \max(0, x).$

ReLU outputs the input if it is positive; otherwise, it outputs zero. This addresses the vanishing gradient issue of sigmoid and tanh, since positive values keep a gradient of 1, enabling faster and more effective training.

Example of usage. Use case: convolutional neural networks (CNNs) for image classification. Why use it: ReLU is computationally efficient, and its sparse activation improves generalization in deep networks by introducing non-linearity without added complexity. However, ReLU may encounter the "dying ReLU" problem, where neurons get stuck outputting 0 and become inactive.

Leaky ReLU

Leaky ReLU modifies the standard ReLU function by allowing a small, non-zero gradient for negative inputs, which prevents the "dying ReLU" issue by introducing a small slope for negative values:

$\mathrm{LeakyReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise} \end{cases}$

where $\alpha$ is a small constant (e.g., 0.01). By allowing small negative values, Leaky ReLU ensures that neurons never completely "die".

Example of usage. Use case: networks that experience dead neurons with standard ReLU. Why use it: it prevents the "dying ReLU" problem and allows learning to continue even for negative inputs, preserving the network's capacity to model more complex patterns.
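The following NumPy sketch implements the four activation functions described above (sigmoid, tanh, ReLU, Leaky ReLU) so their output ranges can be compared side by side. The sample inputs and the default $\alpha = 0.01$ are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    """Squashes inputs into (0, 1); useful for binary-classification outputs."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Squashes inputs into (-1, 1); zero-centered, often used in hidden layers."""
    return np.tanh(x)

def relu(x):
    """Outputs x for positive inputs and 0 otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but keeps a small slope alpha for negative inputs."""
    return np.where(x > 0, x, alpha * x)

# Illustrative inputs to compare the output range of each function.
x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
for name, f in [("sigmoid", sigmoid), ("tanh", tanh),
                ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(f"{name:>10}: {np.round(f(x), 3)}")
```

Running it shows sigmoid and tanh saturating for large inputs (the source of the vanishing gradient problem), while ReLU zeroes out negatives and Leaky ReLU keeps a small negative response.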
Softmax Function

The softmax function is commonly used in the output layer for multiclass classification problems. It converts raw output scores (logits) into a probability distribution over multiple classes:

$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}.$

The function ensures that the sum of the output probabilities is equal to 1, making it ideal for problems where the output must represent probabilities for mutually exclusive classes (e.g., in image classification).

Example of usage. Use case: multiclass classification (e.g., classifying digits in the MNIST dataset). Why use it: softmax transforms raw predictions into probabilities, making it easier to interpret model outputs as belonging to specific classes.

Choosing the Right Activation Function

The choice of activation function depends on the layer type, network depth, and the specific problem being solved. Some general guidelines are:
Output layer for binary classification: sigmoid.
Hidden layers in general: ReLU or Leaky ReLU.
Output layer for multiclass classification: softmax.
Hidden layers requiring symmetric activation: tanh.
Understanding the advantages and limitations of each activation function allows for better performance tuning of neural networks.

03 Feed-forward Process

Neural network training involves feed-forward, where the input passes through the layers to produce predictions, and back-propagation, which adjusts the weights based on prediction errors. Gradients are calculated and weights updated using methods like Gradient Descent or Adam to improve the model.

[Figure: Feed-forward pass from the input layer through the hidden layers to the output layer.]

The feed-forward process is the first step in training a neural network. Input data passes through each layer, from input to output, with each layer applying weights, biases, and an activation function. The computation for each layer is as follows:

$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f\left(z^{(l)}\right),$

where $z^{(l)}$ is the weighted input to layer $l$, $W^{(l)}$ is the weight matrix of layer $l$, $a^{(l-1)}$ is the output (activation) of the previous layer, or the input data if it is the first layer, $b^{(l)}$ is the bias vector for layer $l$, and $f$ is the activation function, which introduces non-linearity into the model.
✓ The output of the final layer $a^{(L)}$, where $L$ is the number of layers, serves as the model's prediction.
✓ For classification problems, this is often a probability distribution over the output classes.
The feed-forward process generates predictions, but learning occurs in back-propagation, where the model adjusts weights based on the error between predicted and actual outputs.

[Figure: Feed-forward pass producing an output whose loss is computed and then minimized.]

04 Loss Calculation

Loss Functions

The loss function measures the error between the predicted output and the true target. Training aims to minimize this loss, improving predictions. Different tasks require different loss functions.

Mean Squared Error (MSE) is widely used for regression tasks with continuous outputs. It measures the average squared difference between predicted and actual values and is defined as

$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of data points (nodes). MSE penalizes larger errors more heavily, making it useful for models where minimizing significant deviations is crucial.
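To illustrate the feed-forward equations and the MSE loss above, here is a minimal NumPy sketch of a two-layer MLP forward pass on a single sample. The layer sizes, random weights, and the tanh hidden / identity output activations are assumptions made for the example, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    return np.mean((y_true - y_pred) ** 2)

# Assumed toy dimensions: 3 input features, 4 hidden units, 1 regression output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden-layer parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output-layer parameters

def forward(x):
    """Feed-forward pass: z(l) = W(l) a(l-1) + b(l), a(l) = f(z(l))."""
    z1 = W1 @ x + b1          # weighted input of the hidden layer
    a1 = np.tanh(z1)          # hidden activation (tanh chosen for the example)
    z2 = W2 @ a1 + b2         # weighted input of the output layer
    return z2                 # identity output, suitable for regression

x = np.array([0.5, -1.2, 0.3])   # one illustrative input sample
y_true = np.array([1.0])         # illustrative regression target
y_pred = forward(x)
print("prediction:", y_pred, "MSE loss:", mse(y_true, y_pred))
```

In training, back-propagation would compute the gradient of this loss with respect to W1, b1, W2, and b2 and update them with an optimizer such as Gradient Descent or Adam.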
Cross-Entropy Loss is used in classification tasks where the output is a probability distribution over multiple classes. For binary classification, the cross-entropy loss is defined as

$\mathcal{L} = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right],$

where $y$ is the true label (either 0 or 1) and $\hat{y}$ is the predicted probability that the input belongs to class 1 (i.e., the output of the sigmoid function). If the true label $y = 1$, the loss reduces to $-\log(\hat{y})$, which penalizes the model if $\hat{y}$ (the predicted probability for class 1) is far from 1. If $y = 0$, the loss reduces to $-\log(1 - \hat{y})$, which penalizes the model if $\hat{y}$ is far from 0.

For multiclass classification, the softmax cross-entropy loss is used:

$\mathcal{L} = -\sum_{i} y_i \log(\hat{y}_i).$

The true label $y_i$ is 1 for the correct class and 0 for the others, so the loss only penalizes the log of the predicted probability for the correct class. The higher the predicted probability for the correct class, the lower the loss, and vice versa. Cross-entropy loss penalizes incorrect predictions more heavily, making it effective for classification problems.
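As a complement to the loss definitions above, this NumPy sketch computes both the binary and the multiclass (softmax) cross-entropy on small hand-made examples. The epsilon clipping, the sample labels, and the predicted probabilities are assumptions added for numerical safety and illustration.

```python
import numpy as np

EPS = 1e-12  # small constant (assumed) to avoid taking log(0)

def binary_cross_entropy(y, y_hat):
    """L = -[y*log(y_hat) + (1-y)*log(1-y_hat)], averaged over samples."""
    y_hat = np.clip(y_hat, EPS, 1 - EPS)
    return np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def categorical_cross_entropy(y_onehot, y_hat):
    """L = -sum_i y_i * log(y_hat_i), averaged over samples."""
    y_hat = np.clip(y_hat, EPS, 1.0)
    return np.mean(-np.sum(y_onehot * np.log(y_hat), axis=1))

# Illustrative binary example: true labels and sigmoid outputs.
y = np.array([1, 0, 1])
y_hat = np.array([0.9, 0.2, 0.6])
print("binary cross-entropy:", binary_cross_entropy(y, y_hat))

# Illustrative 3-class example: one-hot labels and softmax outputs.
y_onehot = np.array([[1, 0, 0], [0, 0, 1]])
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print("softmax cross-entropy:", categorical_cross_entropy(y_onehot, probs))
```

Confident but wrong predictions (a probability near 0 for the correct class) produce a very large loss term, which is why cross-entropy is preferred over MSE for classification.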
