6CS012: Intro to Deep Learning & Neural Networks
Herald College Kathmandu, University of Wolverhampton
2025
Siman Giri
Summary
Lecture 03 is an introduction to Deep Learning and Neural Networks, covering the transition from machine learning to deep learning, artificial neural network architectures, and early models of human cognition. It explores logistic regression, challenges in feature extraction, and the use of neural networks for complex tasks such as image and text processing.
Full Transcript
6CS012 – Artificial Intelligence and Machine Learning.
Lecture 03: From Machine Learning to Deep Learning, An Introduction to Artificial Neural Networks.
Siman Giri {Module Leader – 6CS012}

Learning Objectives:
- Discuss the limitations of logistic regression for the image classification task.
- A brief history of neural networks.
- Early models and their limitations.
- An introduction to modern neural networks.

1. Machine → Deep Learning? Why?

1.1 Last Week: Logistic Regression for Classification.
We implemented logistic regression using the ERM (Empirical Risk Minimization) framework.

1.1.1 Last Week: Dissecting Logistic Regression.

1.1.2 Last Week: Application.
We used softmax regression to build "MNIST Handwritten Digit Classification."

1.2 Our Observation – 1: Non-Linear Decision Boundary.
Logistic regression (softmax regression) was good at separating data into classes/labels that were linearly separable. What about data where the classes/labels are not linearly separable? {In ML there exist more advanced algorithms, for example decision trees and SVMs, but they are not well suited to unstructured datasets such as images.}

1.3 Our Observation – 2: Extraction of Features.
When we extracted only pixel values, we got a CSV file with 784 columns, i.e., a very high-dimensional representation, even though our images were only 28 × 28. Imagine how large the dimensionality becomes for bigger images. We can use feature extraction to reduce the dimensionality. What is the challenge? Why is feature extraction a challenge?

1.4 Machine Learning vs. Deep Learning.
Machine learning: one crucial, and often undervalued, aspect of machine learning approaches to solving problems is that human engineering plays an important role. A human still has to frame the problem, acquire and organize data, design a space of possible solutions, select a learning algorithm and its parameters, apply the algorithm to the data, validate the resulting solution to decide whether it is good enough to use, and so on. These steps are of great importance.

1.4.1 Why Is Feature Extraction a Limitation? (A short sketch after this list illustrates the point.)
- Manual effort and expertise required: traditional methods rely on handcrafted features such as edge detectors, texture descriptors, or color histograms. Engineers and researchers must experiment with different feature extraction techniques to find the best ones for specific tasks.
- Suboptimal performance: handcrafted features may not always capture the most relevant information in complex images. They are often designed for specific datasets and may not generalize well across different conditions (e.g., lighting variations, occlusions).
- Curse of dimensionality: extracted features can be high-dimensional, making them computationally expensive to process and requiring dimensionality reduction techniques such as PCA. Irrelevant or redundant features can degrade model performance.
- Lack of adaptability: features designed for one problem (e.g., object classification) might not be suitable for another (e.g., object detection or segmentation). Traditional feature extraction does not automatically adapt to variations in image quality, background, or perspective.
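To make the "manual effort" and "curse of dimensionality" points concrete, here is a minimal sketch of the kind of handcrafted pipeline the slide is describing: flattening a 28 × 28 grayscale image into 784 raw pixel features, then computing a simple intensity histogram as an engineered feature. The random image and the 16-bin histogram are illustrative assumptions, not part of the lecture's MNIST code.

```python
import numpy as np

# A toy 28x28 grayscale "image" standing in for one MNIST digit.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Raw pixel features: flattening yields a 784-dimensional vector,
# which is why the extracted CSV had 784 columns per image.
pixels = image.reshape(-1)
print(pixels.shape)  # (784,)

# A handcrafted feature: a 16-bin intensity histogram. Choosing the
# bins, the normalization, etc. is exactly the kind of manual
# engineering described above, and it may or may not capture what
# actually matters for the task.
hist, _ = np.histogram(image, bins=16, range=(0, 256))
hist = hist / hist.sum()  # normalize so features are comparable across images
print(hist.shape)  # (16,)
```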
1.4 Machine Learning vs. Deep Learning (continued).
Can we learn the underlying features directly from data? Yes: deep learning. {We will talk more about this as we move on to more complex deep learning algorithms.}

1.5 The Modern Breakthrough of Deep Learning.
Deep learning has become one of the main approaches to AI. It has been successfully applied to various pattern recognition, prediction, and analysis problems, and in many of them it has established the state of the art, often exceeding previous benchmarks by large margins and sometimes solving problems that earlier ML methods could not.

1.6 The Common Elements in the Breakthroughs Above.

1.7 Neural Networks Are Taking Over.
What are neural networks? They are said to be inspired by the human ability to think.

2. Introduction to Neural Networks. {From Biological Inspiration to Computational Models.}

2.1 Way Before Neural Networks...
Humans have the remarkable ability to learn and solve problems, recognize patterns and memorize, create, and think deeply. But how exactly do humans function or think? Can we emulate it, i.e., put it into a machine?

2.2 Early Models of Human Cognition.
Associationism (c. 400 BC – 1900 AD: Aristotle, Plato, David Hume, Ivan Pavlov, ...).
Around 360 BC, Aristotle wrote: "Hence, too, it is that we hunt through the mental train, excogitating from the present or some other, and from similar or contrary or coadjacent. Through this process reminiscence takes place. For the movements are, in these cases, sometimes at the same time, sometimes parts of the same whole, so that the subsequent movement is already more than half accomplished."
In plain English: we memorize and reason through association.
Theory of Associationism: "Learning is a mental process that forms associations between temporally related phenomena." Example: "here is a bolt of lightning, so we are going to hear thunder."
Challenge: but where are the associations stored, and how?

2.3 The Beginning of the Era of Connectionism.
{Caution: a great deal of historical development took place; we will discuss the major events starting from the mid-1800s.}
Alexander Bain, in his 1873 work "Mind and Body," proposed that "the information is in the connections," suggesting that the brain is a mass of interconnected neurons. He floated the idea that the brain is a network of neurons in which collections of neurons excite and stimulate one another, memories are stored in the connections, and different combinations of inputs can result in different outputs.
2.3.1 Connectionist Machines.
Neural networks are connectionist machines, as opposed to von Neumann machines:
- The machine has many non-linear processing units.
- The program is the connections between these units.
- The connections may also define memory.
An early example is Alan Turing's connectionist model (1948). But what are these independent processing units?

2.3.2 Connectionist Machines: A Warning.
Current deep learning algorithms emulate neural networks but run on von Neumann machines, which have a fundamentally different architecture from biological brains. This creates both strengths and limitations in modern AI systems.

2.4 Modelling the Brain: Neurons.
What are the units? Answer: a neuron. The major elements of a biological neuron:
- Dendrite: receives signals from other neurons.
- Synapse: point of connection to other neurons.
- Soma: processes the information.
- Axon: transmits the output of this neuron.
How do neurons transmit information? There are billions of neurons.

2.5 Modelling the Brain: Connectionism.

2.6 From Biological Neurons to a Computational Model.
Observations from the working of neurons:
- An interconnected network of hundreds to thousands of neurons works in parallel.
- For a neuron to activate (fire, or spike), a certain threshold must be passed.
- Inputs are received and can be of two kinds:
  - Excitatory inputs/synapses: transmit input to the neuron.
  - Inhibitory inputs/synapses: any signal from an inhibitory input prevents the neuron from firing, regardless of the other inputs.

2.6.1 From Biological Neurons to a Computational Model.
Neurons as biological computational devices: how can we model a single neuron based on the observations above? A neuron either fires or does not fire, so its output can be binary, i.e., 0 or 1. Since the output can only be 0 or 1, if we also make the collection of inputs binary, we can create a threshold function using a logic operator. Our first computational model was based on this thought and was called the McCulloch & Pitts neuron. (Fig: a symbolic representation of "how we want our neuron to be." A minimal code sketch of such a unit follows.)
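The observations above translate almost directly into code. Below is a minimal sketch (my own illustration, not from the slides) of a binary threshold unit: it fires (outputs 1) when enough excitatory inputs are active, and any active inhibitory input vetoes firing outright.

```python
def threshold_unit(excitatory, inhibitory, threshold):
    """Fire (1) iff enough excitatory inputs are on and no inhibitory input is on.

    excitatory, inhibitory: lists of binary inputs (0 or 1).
    threshold: minimum number of active excitatory inputs needed to fire.
    """
    # An active inhibitory synapse absolutely prevents firing.
    if any(inhibitory):
        return 0
    # Otherwise, fire when the total excitation reaches the threshold.
    return 1 if sum(excitatory) >= threshold else 0

print(threshold_unit([1, 1], [], threshold=2))   # 1: enough excitation
print(threshold_unit([1, 0], [], threshold=2))   # 0: below threshold
print(threshold_unit([1, 1], [1], threshold=2))  # 0: inhibition vetoes firing
```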
3. The McCulloch and Pitts Neuron. {An Early Computational/Mathematical Representation of the Neuron.}

3.1 The McCulloch–Pitts Artificial Neuron.
The first computational model of a neuron was proposed by Warren McCulloch (a neuroscientist) and Walter Pitts (a logician) in their 1943 paper "A Logical Calculus of the Ideas Immanent in Nervous Activity." Based on their thinking, and on previous work on neural nets concerning the fundamental elements needed to represent computation in biological neurons, they proposed a model also known as the linear threshold gate, the threshold logic unit, or the MCP neuron. It modeled the neurons of the brain (and the brain itself) as performing propositional logic, where each neuron evaluates the truth value of its inputs (propositions); effectively Boolean logic, aka the synaptic model.

3.2 An MCP Neuron.
- Excitatory synapse: transmits a weighted input to the neuron.
- Inhibitory synapse: any signal from an inhibitory synapse prevents the neuron from firing. The activity of any inhibitory synapse absolutely prevents excitation of the neuron at that time, regardless of the other inputs.

3.2.1 Computational Model of the MCP Neuron.
Mathematical formalization of MCP: the model describes the activity of a single neuron with two states, firing (1) or not firing (0). In symbols, with binary inputs $x_1, \ldots, x_n$ and threshold $\theta$: the neuron outputs $y = 1$ if $\sum_{i=1}^{n} x_i \ge \theta$ (and no inhibitory input is active), and $y = 0$ otherwise. We will use this notation for ease of representation.

3.3 MCP Neurons and Boolean Operations.
- The AND function: it is "activated" only when all incoming inputs are "on", that is, it outputs 1 only when all inputs are 1. What will the threshold be? (For two inputs, $\theta = 2$ works.)
- The OR function: it is "activated" when at least one of the incoming inputs is "on". What will the threshold be? (For two inputs, $\theta = 1$ works.)
- The XOR (exclusive OR) function: it is activated (outputs 1) when exactly one of the incoming inputs is "on", but not both. What will the threshold be? No single threshold works; this phenomenon is also called the "XOR problem" or "XOR realization".

3.4 Limitations of MCP Neurons.
A single McCulloch–Pitts neuron can only represent Boolean functions that are linearly separable. Linear separability (for Boolean functions): there exists a line (or plane) such that all inputs that produce a 1 lie on one side of it and all inputs that produce a 0 lie on the other side. The MCP neuron architecture also lacks several characteristics of biological networks:
- complex connectivity patterns (it represents a single neuron only),
- processing of continuous values,
- a measure of importance,
- a learning procedure.
Regardless of these limitations, the MCP neuron is considered a significant first step toward a computational theory of "mind and body": it laid the foundations for the study of various learning mechanisms and paved the way for advances in artificial intelligence, including the invention of the perceptron and more complex models. (The sketch after this section checks the AND, OR, and XOR cases in code.)
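As a quick check of the linear-separability claim, the sketch below reuses the simple unweighted threshold rule to realize AND ($\theta = 2$) and OR ($\theta = 1$) over two binary inputs, then tries every integer threshold to show that none of them reproduces XOR. The brute-force search is my own illustrative device, not from the lecture.

```python
from itertools import product

def mcp(x1, x2, theta):
    # MCP rule for two excitatory binary inputs: fire iff x1 + x2 >= theta.
    return 1 if x1 + x2 >= theta else 0

inputs = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)

# AND is realized with theta = 2, OR with theta = 1.
print([mcp(a, b, 2) for a, b in inputs])  # [0, 0, 0, 1] -> AND
print([mcp(a, b, 1) for a, b in inputs])  # [0, 1, 1, 1] -> OR

# XOR: no threshold matches the truth table, because XOR's positive
# cases (0,1) and (1,0) are not linearly separable from (0,0) and (1,1).
xor = [0, 1, 1, 0]
hits = [t for t in range(0, 4) if [mcp(a, b, t) for a, b in inputs] == xor]
print(hits)  # [] -> no threshold realizes XOR with this rule
```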
4. Towards a Better Model. {An Introduction to the Perceptron and Perceptron Learning Theory.}

4.1 The Perceptron.
Frank Rosenblatt, psychologist and logician: "inventor of the solution to everything, aka the Perceptron (1958)." "The embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence," New York Times (8 July 1958). "Frankenstein Monster Designed by Navy That Thinks," Tulsa, Oklahoma Times, 1958.

4.2 The Idea Behind the Perceptron Model.
The perceptron, introduced by Frank Rosenblatt and later corrected by Minsky and Papert, extends the McCulloch–Pitts neuron by incorporating learnable numerical weights and an adaptive learning process. It serves as a linear binary classifier, dividing data into two categories based on a linear decision boundary.

A perceptron makes a decision based on a weighted sum of inputs:

$\sum_{i=1}^{n} w_i x_i \ge T$

where $w_i$ are the learnable weights, $x_i$ are the inputs, and $T$ is a learnable threshold. Suppose we do not want to fix the threshold in advance but instead learn it along with the weights. Replacing $T = -w_0$, we can rewrite the rule as:

$\hat{y} = \begin{cases} 1 & \text{if } w_0 + \sum_{i=1}^{n} w_i x_i \ge 0 \\ 0 & \text{if } w_0 + \sum_{i=1}^{n} w_i x_i < 0 \end{cases}$

Remind you of something?

4.3 A Simplified Computational Representation of the Perceptron.
Mathematical formulation: a perceptron takes a set of inputs $x_1, x_2, \ldots, x_n$, assigns them weights $w_1, w_2, \ldots, w_n$, and computes a weighted sum:

$z = \sum_{i=1}^{n} w_i x_i + w_0$

where $w_i$ are the weights learned during training, $w_0$ is the bias (also learned during training; it adjusts the decision boundary), and $z$ is the net weighted input. The perceptron then applies a threshold activation function (aka step function) to $z$:

$f(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$

This threshold function determines whether the perceptron activates (outputs 1) or remains inactive (outputs 0).

4.4 The Perceptron Learning Algorithm.
Input:
- Training dataset $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}$, where $\mathbf{x}_i$ is the input feature vector and $y_i \in \{0, 1\}$ is the class label.
- Learning rate $\eta$ (0.01 or some other small positive value).
- Number of iterations.

Steps:
1. Initialize the weights randomly: assign small random values to $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ and $w_0$.
2. Repeat for each epoch until convergence. For each training sample $(\mathbf{x}_i, y_i)$:
   - Compute the weighted sum: $z = \sum_i w_i x_i + w_0$.
   - Apply the activation (step) function: $\hat{y} = 1$ if $z \ge 0$, else $\hat{y} = 0$.
   - Update the weights if the sample is misclassified ($y \ne \hat{y}$): $w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i$ and $w_0 \leftarrow w_0 + \eta\,(y - \hat{y})$.
3. Stop if no updates occur in an epoch (i.e., convergence).

Output: the learned weights.

Since both $y$ and $\hat{y}$ can take only binary values, the term $(y - \hat{y})$ can be interpreted as follows:
- $(y - \hat{y}) = 1$: the weights are increased to push towards the correct classification.
- $(y - \hat{y}) = -1$: the weights are decreased to correct the misclassification.
- $(y - \hat{y}) = 0$: no update is needed.

4.4 The Perceptron Learning Algorithm: Putting It All Together.

4.5 The Perceptron Learning the "OR" Function.
It looks like it works for the OR function. Let's try the XOR function.

4.5.1 The Perceptron Learning the "XOR" Function.
There do not exist any values of $w_i$ and $w_0$ that satisfy all of the XOR constraints at once. If we force the perceptron learning algorithm to learn XOR, it will not converge, or it will terminate with a very high error. This phenomenon is called the Minsky and Papert correction.

4.5.2 The Minsky and Papert Correction.
The perceptron works for linearly separable data, but XOR is still a problem. In their 1969 book, Minsky and Papert implied that "since a single artificial neuron is incapable of implementing some functions such as the XOR logical function, larger networks also have similar limitations, and therefore should be dropped. Later research on three-layered perceptrons showed how to implement such functions, therefore saving the technique from obliteration." The conclusion from the Minsky and Papert correction: individual elements are weak computational elements; networks of elements are required. (A runnable sketch of the learning rule follows.)
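Here is a minimal sketch of the perceptron learning rule from 4.4, written as a small NumPy loop. Training it on OR converges quickly, while on XOR it keeps misclassifying no matter how long it runs, matching the argument in 4.5.1. The learning rate, epoch cap, and random initialization are illustrative choices, not values from the slides.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=100):
    """Perceptron learning rule: w_i += eta * (y - y_hat) * x_i."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])  # small random weights
    w0 = 0.0                                     # bias, i.e. the learned -T
    for _ in range(epochs):
        updated = False
        for xi, yi in zip(X, y):
            y_hat = 1 if (w @ xi + w0) >= 0 else 0  # step activation
            if y_hat != yi:                          # update only on mistakes
                w += eta * (yi - y_hat) * xi
                w0 += eta * (yi - y_hat)
                updated = True
        if not updated:          # a full pass with no mistakes: converged
            return w, w0, True
    return w, w0, False          # epoch budget exhausted without converging

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
for name, y in [("OR", np.array([0, 1, 1, 1])), ("XOR", np.array([0, 1, 1, 0]))]:
    w, w0, converged = train_perceptron(X, y)
    print(name, "converged:", converged)  # OR converges; XOR never does
```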
4.6 Solving the XOR Problem.
A multi-layer perceptron can solve the XOR problem, since it can compose arbitrarily complicated Boolean functions. In cognitive terms: it can compute arbitrary Boolean functions over sensory input. We will verify this in the tutorial with some exercises, so do not miss the tutorial. More on this next week.

4.7 The Perceptron for Real-Valued Inputs.
The step function is unsuitable for real-valued inputs due to its non-differentiability and binary output, which hinder learning and fine-tuning. The step function may also fail to capture the subtleties of the data: it creates a sharp boundary at 0, meaning that all positive net inputs are classified as class 1 and all negative net inputs as class 0. However, there may be cases where slightly negative net inputs also belong to class 1.

4.8 Sigmoid Neurons.
Sigmoid neurons are a type of artificial neuron that uses the sigmoid "activation" function to model the output of the neuron. The sigmoid function is a smooth, S-shaped curve that maps any real-valued input to a value between 0 and 1, making it especially useful for problems that involve probabilities or binary classification. The sigmoid activation function is defined as:

$\sigma(z) = \dfrac{1}{1 + e^{-z}}$

where $z = w_0 + \sum_i w_i x_i$ is the weighted sum of the inputs to the neuron. (Fig: a single unit of the standard perceptron with a sigmoid.)

4.9 The Activation Function.
In general, activation functions are a group of functions that introduce non-linearity into the output of neurons. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks. The activation function must be differentiable, or else the mechanism for updating weights (gradient descent and backpropagation), which is the core idea of deep learning, fails. There are more kinds of activation functions in use; we will discuss them as required in upcoming classes. (A small sketch contrasting the step and sigmoid functions closes this lecture.)
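To close, here is a tiny sketch (my own illustration) contrasting the two activations discussed above: the step function jumps from 0 to 1 at $z = 0$, while the sigmoid changes smoothly, and its derivative $\sigma(z)(1 - \sigma(z))$ is defined everywhere, which is exactly what gradient-based weight updates need.

```python
import numpy as np

def step(z):
    # Hard threshold: non-differentiable at 0, outputs only 0 or 1.
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # Smooth, S-shaped; maps any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(z))                  # [0. 0. 1. 1. 1.]
print(np.round(sigmoid(z), 3))  # [0.119 0.378 0.5   0.622 0.881]

# The sigmoid's derivative exists everywhere: the property that
# gradient descent and backpropagation rely on.
grad = sigmoid(z) * (1.0 - sigmoid(z))
print(np.round(grad, 3))        # [0.105 0.235 0.25  0.235 0.105]
```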