Summary

This document introduces the fundamental concepts of deep learning, including biological neurons, neural network architectures, and diverse applications. It details different types of neural networks, such as Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).

Full Transcript


**UNIT -1** **Introduction to Deep Learning:** Basics: Biological Neuron, Idea of computational units, McCulloch Pitts unit and Thresholding logic, Linear Perceptron, Perceptron Learning Algorithm, Linear separability. Convergence theorem for Perceptron Learning Algorithm.

Example: ![](media/image2.png)

Deep learning models are trained using large amounts of data and algorithms that are able to learn and improve over time, becoming more accurate as they process more data. This makes them well suited to complex, real-world problems and enables them to learn and adapt to new situations. Deep learning is a branch of [machine learning](https://www.coursera.org/learn/machine-learning) that is made up of a neural network with three or more layers:

- **Input layer:** Data enters through the input layer.
- **Hidden layers:** Hidden layers process and transport data to other layers.
- **Output layer:** The final result or prediction is made in the output layer.

Neural networks attempt to model human learning by digesting and analyzing massive amounts of information, also known as training data. They perform a given task with that data repeatedly, improving in accuracy each time. It's similar to the way we study and practice to improve skills.

**Types of Deep Learning**

Deep learning encompasses various architectures, each suited to different types of tasks; the most widely used ones, feedforward, convolutional, and recurrent neural networks, are described in detail under "Types of neural networks" below.

**Biological Neuron:**

Neuron Definition
-----------------

***"Neurons are the fundamental unit of the nervous system specialized to transmit information to different parts of the body."***

- Neurons are the building blocks of the nervous system.
- They receive and transmit signals to different parts of the body. This is carried out in both chemical and electrical forms.
- There are several different types of neurons that facilitate the transmission of information.

The neurons transmit information between:

1. different parts of the brain
2. the brain and the rest of the nervous system

Thus, whatever we think, feel, and later do is all due to the working of the neurons.

Neuron Structure
----------------

A neuron varies in shape and size depending on its function and location. All neurons have three different parts: dendrites, cell body and axon.

![](media/image4.png)

### Parts of Neuron

Following are the different parts of a neuron:

#### Dendrites

These are branch-like structures that receive messages from other neurons and allow the transmission of messages to the cell body.

#### Cell Body

Each neuron has a cell body with a nucleus, Golgi body, endoplasmic reticulum, [mitochondria](https://byjus.com/biology/mitochondria/) and other components.

#### Axon

The axon is a tube-like structure that carries an electrical impulse from the cell body to the axon terminals, which pass the impulse to another neuron.

#### Synapse

It is the chemical junction between the terminal of one neuron and the dendrites of another neuron.

Ex: Schematic illustration of a neuromorphic system. a) Biological model: the biological neuron receives inputs from other neurons through interconnected synapses. b) Equivalent electronic model: the electronic neuron accumulates inputs generated by different pre-neurons through resistive switching (RS) memristor synapses to implement the functions of a spiking neural network.

**Types of neural networks**

Deep learning models are able to automatically learn features from the data, which makes them well suited for tasks such as image recognition, speech recognition, and natural language processing.
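To make the input/hidden/output picture concrete, here is a minimal NumPy sketch of one forward pass through a three-layer network. The layer sizes, random weights, and choice of activations are illustrative assumptions, not from the text; in a real network the weights would be learned from training data.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])        # input layer: 3 features
W1 = rng.normal(size=(3, 4))          # weights: input -> hidden (4 units)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))          # weights: hidden -> output (1 unit)
b2 = np.zeros(1)

h = np.tanh(x @ W1 + b1)                  # hidden layer: weighted sum + activation
y = 1 / (1 + np.exp(-(h @ W2 + b2)))      # output layer: sigmoid prediction in (0, 1)
print(y)
```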
The most widely used architectures in deep learning are as follows:

1. **Feedforward neural networks (FNNs)** are the simplest type of ANN, with a linear flow of information through the network. FNNs have been widely used for tasks such as image classification, speech recognition, and natural language processing.
2. **Convolutional neural networks (CNNs)** are designed specifically for image and video recognition tasks. CNNs are able to automatically learn features from images, which makes them well suited for tasks such as image classification, object detection, and image segmentation.
3. **Recurrent neural networks (RNNs)** are a type of neural network that is able to process sequential data, such as time series and natural language. RNNs maintain an internal state that captures information about previous inputs, which makes them well suited for tasks such as speech recognition, natural language processing, and language translation. A sketch of this state update follows the list.
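The RNN's "internal state" can be written as a recurrence, h_t = tanh(W_x·x_t + W_h·h_{t-1} + b). The sketch below shows that recurrence in NumPy; the dimensions, random weights, and input sequence are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
Wx = rng.normal(size=(2, 5))   # input -> hidden weights (5 hidden units)
Wh = rng.normal(size=(5, 5))   # hidden -> hidden (recurrent) weights
b = np.zeros(5)

h = np.zeros(5)                       # initial hidden state
sequence = rng.normal(size=(4, 2))    # 4 time steps, 2 features each
for x_t in sequence:
    # The new state depends on the current input AND the previous state,
    # which is how the network "remembers" earlier inputs.
    h = np.tanh(x_t @ Wx + h @ Wh + b)
print(h)
```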
**Deep Learning Applications**

The main applications of deep learning can be divided into computer vision, natural language processing (NLP), and reinforcement learning.

**1. Computer vision**

The first deep learning application is computer vision. In computer vision, deep learning models enable machines to identify and understand visual data. Some of the main applications of deep learning in computer vision include:

- **Object detection and recognition:** Deep learning models can be used to identify and locate objects within images and videos, making it possible for machines to perform tasks such as self-driving, surveillance, and robotics.
- **Image classification:** Deep learning models can be used to classify images into categories such as animals, plants, and buildings. This is used in applications such as medical imaging, quality control, and image retrieval.
- **Image segmentation:** Deep learning models can be used to segment images into different regions, making it possible to identify specific features within images.

![](media/image6.png)

**2. Natural language processing (NLP)**

The second application is NLP. In NLP, deep learning models enable machines to understand and generate human language. Some of the main applications of deep learning in NLP include:

- **Automatic text generation:** Deep learning models can learn from a corpus of text, and new text such as summaries and essays can be automatically generated using these trained models.
- **Language translation:** Deep learning models can translate text from one language to another, making it possible to communicate with people from different linguistic backgrounds.
- **Sentiment analysis:** Deep learning models can analyse the sentiment of a piece of text, making it possible to determine whether the text is positive, negative, or neutral. This is used in applications such as customer service, social media monitoring, and political analysis.
- **Speech recognition:** Deep learning models can recognize and transcribe spoken words, making it possible to perform tasks such as speech-to-text conversion, voice search, and voice-controlled devices.

**3. Reinforcement learning**

In reinforcement learning, deep learning is used to train agents that take actions in an environment so as to maximize a reward. Some of the main applications of deep learning in reinforcement learning include:

- **Game playing:** Deep reinforcement learning models have been able to beat human experts at games such as Go, Chess, and Atari.
- **Robotics:** Deep reinforcement learning models can be used to train robots to perform complex tasks such as grasping objects, navigation, and manipulation.
- **Control systems:** Deep reinforcement learning models can be used to control complex systems such as power grids, traffic management, and supply chain optimization.

**Differences between Machine Learning (ML) & Deep Learning (DL):**

Some commonly cited differences:

| Machine Learning | Deep Learning |
|---|---|
| Works well with smaller datasets | Typically requires large amounts of data |
| Features are usually engineered by hand | Features are learned automatically from data |
| Models are simpler and faster to train | Deep, multi-layer networks; training is compute-intensive |
| Can run on ordinary CPUs | Usually benefits from GPUs or specialized hardware |

**Idea of computational unit**

**McCulloch Pitts unit:**

The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.

![](media/image8.png)

It may be divided into two parts. The first part, ***g***, takes an input (ahem, dendrite, ahem), performs an aggregation, and based on the aggregated value the second part, ***f***, makes a decision. We can see that ***g*(x)** is just doing a sum of the inputs, a simple aggregation:

$$g(x) = \sum_{i=1}^{n} x_i, \qquad f(g(x)) = \begin{cases} 1 & \text{if } g(x) \geq \theta \\ 0 & \text{otherwise} \end{cases}$$

And ***theta*** here is called the thresholding parameter. For example, if I always watch the game when the sum turns out to be 2 or more, the ***theta*** is 2 here. This is called Thresholding Logic.

**Linear Perceptron:**

- A **linear perceptron** is a fundamental type of artificial neural network used for binary classification tasks in deep learning.
- Introduced by Frank Rosenblatt in 1957, it is primarily used for **binary classification**.
- It is particularly good at learning **linearly separable patterns**.
- It is the simplest type of feedforward neural network, consisting of a single layer of input nodes that are fully connected to a layer of output nodes.
- It consists of a single layer of neurons that directly map input features to output predictions through a set of weights and an activation function.

Here's an overview of its components and working:

**Architecture of Linear Perceptron:**

**Input Features:** The perceptron takes multiple input features; each input feature represents a characteristic or attribute of the input data: X = {x1, x2, ..., xn}.

**Weights:** Each input feature is associated with a weight, determining the significance of each input feature in influencing the perceptron's output. During training, these weights are adjusted to learn the optimal values: W = {w1, w2, ..., wn}.

**Bias (b):** A constant added to the weighted sum to allow flexibility in the activation threshold.

**Summation Function:** The perceptron calculates the weighted sum of inputs:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

**Activation Function:** The output of the summation is passed through an activation function. In the classic perceptron model, a step function is used: the output y is binary (0 or 1), corresponding to the predicted class. However, modern implementations often use continuous activation functions like sigmoid or ReLU for smoother optimization.

**Training a Linear Perceptron**

The **perceptron learning algorithm** updates weights iteratively to minimize misclassification errors. The weight update rule is:

$$w_i \leftarrow w_i + \eta \,(y_{\text{true}} - y_{\text{pred}})\, x_i$$

where:

- η is the learning rate, controlling how much the weights are adjusted.
- y_true is the actual value.
- y_pred is the predicted value.

Ex: Here's a simple example of a single-layer perceptron used to solve a **binary classification problem (the AND function)**; a runnable sketch follows the applications list below.

![](media/image14.png) ![](media/image16.png) ![](media/image18.png) ![](media/image20.png) ![](media/image22.png)

**Applications**

- Basic pattern recognition tasks.
- Simple binary classification problems.
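As promised above, here is a minimal sketch of the perceptron learning algorithm trained on the AND function, using the update rule given in this section. The learning rate, zero initialization, and epoch count are illustrative assumptions; with this data the weights stop changing after a few passes.

```python
import numpy as np

# Training data for the AND function: inputs and target outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights, initialized to zero
b = 0.0           # bias
eta = 0.1         # learning rate

for epoch in range(10):                        # a few passes over the data suffice here
    for xi, target in zip(X, y):
        y_pred = 1 if xi @ w + b >= 0 else 0   # step activation
        # Perceptron update rule: w_i <- w_i + eta * (y_true - y_pred) * x_i
        w += eta * (target - y_pred) * xi
        b += eta * (target - y_pred)

print(w, b)                                        # learned weights and bias
print([1 if xi @ w + b >= 0 else 0 for xi in X])   # -> [0, 0, 0, 1]
```

The same loop learns the OR function if the targets are changed to [0, 1, 1, 1], since OR is also linearly separable.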
**Activation Functions in Deep Learning:**

In artificial neural networks (ANNs), the activation function helps us determine the output of the neural network. It decides whether a neuron should be activated or not, and it affects the output of a model, its accuracy, and its computational efficiency.

![](media/image24.png)

Inputs are fed into the neurons in the input layer. Each input (xi) is multiplied by its weight (wi), and adding the bias gives the neuron's output: y = Σ(xi · wi) + b. We apply the activation function to **y**, and the result is transferred to the next layer.

Properties that an activation function should hold
===================================================

- ***Derivative or differential:*** the change in the y-axis w.r.t. the change in the x-axis, also known as the slope. It is what backpropagation relies on.
- ***Monotonic function:*** a function which is either entirely non-increasing or non-decreasing.

Most popular activation functions
=================================

Several different types of activation functions are used in deep learning; a runnable sketch of the common ones follows this section. Some of them are explained below:

**1) Step function (Heaviside step function):** This is one of the simplest kinds of activation function. We consider a threshold value, and if the value of the net input, say y, is greater than the threshold, the neuron is activated. Given below is the graphical representation of the step function.

![](media/image26.png)

**2) Sigmoid function (logistic function):** It is one of the most commonly used activation functions. The **sigmoid** function was introduced to ANNs in the 1990s to replace the **step** function. Graphically:

![](media/image28.png)

In the sigmoid function, we can see that its output is in the open interval (0, 1). Whenever we take the derivative of the sigmoid during backpropagation, its value lies between 0 and 0.25. Every activation function has its own pros and cons, and sigmoid is not an exception.

**Pros:**

- Smooth gradient: the output changes gradually rather than jumping between values.
- Output bounded in (0, 1), which is convenient for representing probabilities.

**Cons:**

- Vanishing gradients: for very large or very small inputs the derivative is close to 0, which slows learning in deep networks.
- The output is not zero-centered.
- The exponential computation is relatively expensive.

**3) Hyperbolic tangent (tanh):** Similar to sigmoid with a slight variation; basically, it overcomes a problem present in sigmoid.

**Pros:**

- **Tanh** tends to make each layer's output more or less centered around 0, and this often helps speed up convergence.

Since sigmoid and tanh are almost similar, they also face the same problem.

**Cons:**

- It still suffers from vanishing gradients for large-magnitude inputs.

Tanh is used in the hidden layers of binary classification problems, while the sigmoid function is used in the output layer.

![](media/image31.png)

**4) ReLU (Rectified Linear Unit) activation function:** This is the most popular activation function in deep learning (used in hidden layers).

![](media/image33.png)

If the input is negative, the function returns 0, but for any positive input it returns that value back.

**Pros:**

- Very cheap to compute (a simple max operation).
- Does not saturate for positive inputs, which mitigates the vanishing-gradient problem.

**Cons:**

- "Dying ReLU": neurons whose inputs stay negative output 0 and stop learning.
- The output is not zero-centered.

**5) Softmax activation function:** The softmax function calculates the probability distribution of an event over n different events. In a general way of saying, this function calculates the probability of each target class over all possible target classes (which helps in determining the target class).

![](media/image35.png)

Softmax ensures that smaller values get a smaller probability and are not discarded directly. It is a "max" that is "soft". It returns the probability of a data point belonging to each individual class. Note that the sum of all the values is 1. *Softmax can be described as a combination of multiple sigmoid functions.* Softmax normalizes an input vector into a vector of values that follows a probability distribution whose total sums to 1. The output values lie in the range [0, 1], which is convenient because we can go beyond binary classification and accommodate as many classes as we need in our neural network model. For a multiclass problem, the output layer will have as many neurons as there are target classes; softmax is typically used on the output layer of a neural network to classify inputs into multiple categories.

For example, suppose you have 4 classes [A, B, C, D]. There would be 4 neurons in the output layer. Assume the outputs from the neurons are [2.5, 5.7, 1.6, 4.3]. After applying the softmax function you get approximately [0.03, 0.77, 0.01, 0.19]. These represent the probability of the data point belonging to each class. Looking at the probability values, we can say the input belongs to class B.
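Here is a minimal NumPy sketch of the activation functions discussed in this section. The test values are illustrative; the final line reproduces the 4-class softmax example from the text.

```python
import numpy as np

def step(x):     # Heaviside step: 0 for negative input, 1 otherwise
    return np.where(x >= 0, 1, 0)

def sigmoid(x):  # squashes input into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):     # zero-centered, output in (-1, 1)
    return np.tanh(x)

def relu(x):     # 0 for negative input, identity for positive input
    return np.maximum(0, x)

def softmax(x):  # normalizes a vector of scores into probabilities summing to 1
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(z), sigmoid(z), tanh(z), relu(z), sep="\n")

# The 4-class example from the text:
print(softmax(np.array([2.5, 5.7, 1.6, 4.3])))  # ~[0.03, 0.77, 0.01, 0.19] -> class B
```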
**Perceptron Learning Algorithm:**

Our goal is to find the weight vector **w** that can perfectly classify the positive inputs and the negative inputs in our data.

![](media/image37.png)

**Dot Product of Two Vectors:** Imagine you have two vectors of size n+1, w and x. The dot product of these vectors (w · x) can be computed as follows:

$$w \cdot x = w^{T}x = \sum_{i=0}^{n} w_i x_i$$

The transpose is just to write it in matrix-multiplication form.

**OR Function Using a Perceptron**

For instance, with weights w1 = w2 = 1 and threshold θ = 1 (fire when x1 + x2 ≥ 1), the perceptron outputs 1 whenever at least one input is 1, which is exactly the OR function.

![](media/image39.png)

**Linear Separability:**

- Linear separability refers to data points in binary classification problems that can be separated by a linear decision boundary.
- Data points that can be separated using a line, linear function, or flat hyperplane are considered linearly separable.
- Linear separability implies the existence of a hyperplane separating the two classes.

For example, consider a dataset with two features x1 and x2 in which the points (−1, −1), (1, 1), (−3, −3), (4, 4) belong to one class and (−1, 1), (1, −1), (−5, 2), (4, −8) belong to the other. This dataset is not linearly separable, because there is no linear function that can separate class 1 from class 2. Let's consider a 1-dimensional representation z in terms of x1 and x2 such that the dataset is linearly separable in the 1-dimensional representation corresponding to z. Defining z = x1·x2, the data can be represented in 1 dimension and becomes linearly separable. Using this mapping, the points of class 1, (−1, −1), (1, 1), (−3, −3), (4, 4), become (1), (1), (9), (16), and the points of class 2, (−1, 1), (1, −1), (−5, 2), (4, −8), become (−1), (−1), (−10), (−32).

![](media/image43.png)

As you can see in the above plot, z = 0 now separates class 1 and class 2. Non-linearly separable data can be represented in a linearly separable form after applying a non-linear transformation (z = x1·x2 in this example, resulting in data represented in 1D), as the sketch below shows.
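A minimal sketch of the z = x1·x2 transformation from the example above, checking that the mapped 1-D values are separated by the threshold z = 0:

```python
import numpy as np

class1 = np.array([[-1, -1], [1, 1], [-3, -3], [4, 4]])
class2 = np.array([[-1, 1], [1, -1], [-5, 2], [4, -8]])

# Non-linear feature map: z = x1 * x2
z1 = class1[:, 0] * class1[:, 1]   # -> [ 1,  1,   9,  16]  (all positive)
z2 = class2[:, 0] * class2[:, 1]   # -> [-1, -1, -10, -32]  (all negative)

print(z1, z2)
# The threshold z = 0 now separates the two classes perfectly.
print(np.all(z1 > 0) and np.all(z2 < 0))   # True
```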
EX 2: **No line can separate linearly inseparable 2D data:**

![](media/image45.png)

**Methods for checking linear separability:**

Common approaches include:

1. Visual inspection (feasible for 2D or 3D data).
2. Running the perceptron learning algorithm: it converges only if the data is linearly separable.
3. Linear programming: checking whether a feasible separating hyperplane exists.
4. Computational geometry: the two classes are linearly separable exactly when their convex hulls do not intersect.

**Convergence Theorem for the Perceptron Learning Algorithm**

The **convergence theorem for the perceptron learning algorithm** states that if a given dataset is linearly separable, the perceptron learning algorithm will converge to a solution in a finite number of iterations.

Here's a more formal statement: if there exists a separating hyperplane with margin γ > 0 (i.e., yᵢ(w·xᵢ + b) ≥ γ for some unit-norm w and every training point) and ‖xᵢ‖ ≤ R for all i, then the perceptron learning algorithm will converge to a separating hyperplane (w, b) in a finite number of iterations.

**Key Points:**

1. **Linear Separability:** The theorem requires that the data is linearly separable. If this condition is not met, the algorithm will not converge.
2. **Finite Iterations:** The number of weight updates required for convergence is bounded by

$$\left(\frac{R}{\gamma}\right)^{2}$$

**Limitations:**

- If the data is not linearly separable, the algorithm oscillates indefinitely.
- In practice, the perceptron struggles with noise or overlapping classes.

This theorem highlights the importance of linear separability for the perceptron and provides the theoretical foundation for its convergence. The sketch below illustrates it empirically on separable data.
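This is a minimal empirical sketch of the theorem: on linearly separable data, the perceptron stops making updates after finitely many mistakes. The synthetic dataset, labeling hyperplane, seed, and the cap of 1000 passes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Linearly separable 2-D data: labels given by a known hyperplane.
X = rng.normal(size=(100, 2))
y = np.where(X @ np.array([2.0, -1.0]) + 0.5 >= 0, 1, -1)   # labels in {-1, +1}

w, b, mistakes = np.zeros(2), 0.0, 0
for epoch in range(1000):                # safety cap; convergence happens much sooner
    updated = False
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:       # misclassified (or exactly on the boundary)
            w, b = w + yi * xi, b + yi   # perceptron update
            mistakes += 1
            updated = True
    if not updated:                      # a full pass with no mistakes: converged
        break

print(f"Converged after {mistakes} updates, {epoch + 1} passes")
```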
