Slideshow2.pdf

LING 298 - Slideshow 2
Concordia 2024
Alan Bale, ChatGPT-4
Concordia University, Somewhere & Nowhere

Outline
Metaphorical Neurons
The Basic Architecture
Inputs to Outputs
Learning as Weight Adjustment
Differences Between Biological and Artificial Neurons

Section: Metaphorical Neurons

Artificial Neural Network (ANN)
A computational model inspired by the structure of the human brain, consisting of layers of interconnected nodes (neurons).
Purpose: Used to recognize patterns, classify data, and make predictions based on input data.

Structure: Nodes and Connections
1. Nodes/Neurons: Basic units that perform computations, organized into layers (input, hidden, output).
2. Connections: Links between neurons, each associated with a weight that determines the strength of the connection.

Structure: Layers
1. Layers:
a. Input Layer: Receives raw data.
b. Hidden Layers: Intermediate layers where data is processed.
c. Output Layer: Produces the final output or prediction.

Learning Process
1. Forward Propagation: Input data is passed through the network, layer by layer, and predictions are made.
2. Backpropagation: The process of adjusting weights to minimize error by propagating the error backward through the network.

Learning Process (cont.)
1. Activation Functions: Introduce non-linearity, allowing the network to learn complex patterns.
2. Training: Involves repeatedly adjusting weights using gradient descent to minimize a loss function.

Section: The Basic Architecture

Fully Connected Neural Network
[Figure: fully connected network with an input layer, a hidden layer, and an output layer]

Input Layer
Serves as the network’s interface with the external environment.
Receives raw data as numbers which represent various types of input (e.g., pixels of an image, words in a sentence, etc.).
Passes this data either directly to an output layer, or into an intermediate layer for further processing.

Hidden Layer
Performs intermediate computations and transformations.
Extracts and combines features from input data which aid in pattern recognition.
More hidden layers allow for more complex representations, transformations, and pattern recognition.

Output Layer
Produces the final predictions or classifications based on the processed input.
Translates the network’s internal processing into a usable representation (e.g., identifying an image, generating the next word in a sequence, etc.).

Bias
Bias:
▶ A special type of weight added to each neuron to shift the activation function.
▶ Helps the network learn more complex data patterns.
▶ Note: To eliminate clutter, these slides and future slides will often omit the contributions of biases. This should not be taken as a sign that their contributions are insignificant.

Weighted Connections
Weights:
▶ Numerical values assigned to connections between nodes.
▶ Determine the importance of the input to a neuron.
▶ Example: Suppose the input to neuron A is 2 and its connection to neuron B is weighted as 3; then the overall contribution to neuron B would be 2 × 3 = 6.
▶ Example: Suppose the input to neuron A is 2 and its connection to neuron B is weighted as −3; then the overall contribution to neuron B would be 2 × (−3) = −6.
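The weighted-connection examples above can be sketched in a few lines of Python. This is only an illustration: the function names `contribution` and `weighted_input` are hypothetical, and the second function, which sums several contributions and adds a bias, is an assumed extension of the single-connection example.

```python
def contribution(activation, weight):
    """Contribution of one neuron to the next: its value times the connection weight."""
    return activation * weight


def weighted_input(activations, weights, bias=0):
    """Hypothetical helper: sum the contributions arriving at one neuron, then add its bias."""
    return sum(contribution(a, w) for a, w in zip(activations, weights)) + bias


# Neuron A holds the value 2 in both examples.
print(contribution(2, 3))   # positive weight: prints 6
print(contribution(2, -3))  # negative weight: prints -6
```

A negative weight flips the sign of the contribution, so the same input pushes neuron B's total down instead of up.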
Weight Adjustment
Weight Adjustment:
▶ Weights are updated during training to reduce errors in the network’s predictions.
▶ The adjustment process is guided by an algorithm that computes the slope (gradient) of a loss function and adjusts the weights accordingly, aiming to optimize the network’s performance.

Section: Inputs to Outputs

Math Warning!
There is a bunch of math in the next few slides. Do not be intimidated. You will never have to know the math. You don’t even have to understand the math. It is there mostly to remind me how things work so that I can then explain them to you.

Linear Transformations
1. Linear Combination: Each neuron calculates a weighted sum of its inputs.
Basic formula: y = w · x + b, where:
▶ w = weights,
▶ x = input,
▶ b = bias.
▶ Example: if the inputs are x1 = 4 and x2 = 5, the weights are w1 = 2 and w2 = 3, and the bias is −10, then the activation of the neuron would be [(4 × 2) + (5 × 3)] + (−10), which equals 8 + 15 − 10 = 13.
2. Purpose: Enables the network to combine features and create new, more abstract features.

Non-linear Transformations
1. Activation Functions: Apply non-linearity to the output of a neuron.
Allow the network to learn and represent complex patterns.
Common functions include:
a. ReLU (Rectified Linear Unit): Outputs the input if positive, otherwise zero.
b. Sigmoid: Squashes the input to a range between 0 and 1.
c. Tanh: Squashes the input to a range between −1 and 1.
d. Softmax: Converts logits into probabilities for multi-class classification.

ReLU Activation Function
[Figure: plot of ReLU(x), which equals 0 for negative x and equals x for positive x]
Rectified Linear Unit (ReLU):
▶ The ReLU function is defined as: ReLU(x) = max(0, x)
Behavior:
▶ Outputs the input directly if it is positive.
▶ Outputs zero for any negative input.
Example:
▶ For x = 3, ReLU(3) = 3.
▶ For x = −2, ReLU(−2) = 0.

Softmax Activation Function
[Figure: bar chart of output probabilities for Item 1, Item 2, and Item 3, next to a network diagram with input, hidden, and output layers]
Imagine a three-node output, with raw score activations of 2 for the first node, 1 for the second node and 0.1 for the third node. These raw scores are sometimes called logits.
Softmax Function:
▶ Converts a vector of raw scores (logits) into probabilities.

Softmax Function (cont.):
▶ The function is defined as:
$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
where $z_i$ is the i-th element of the input vector, and $K$ is the number of classes.
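The activation functions listed above can be sketched in plain Python. This is a minimal illustration, not how any particular library implements them. The function names are hypothetical, and subtracting the largest logit inside `softmax` is an assumed numerical-stability step that is not in the slides and does not change the result.

```python
import math


def relu(x):
    """ReLU(x) = max(0, x): pass positive inputs through, clamp negatives to zero."""
    return max(0, x)


def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))


def tanh(x):
    """Squash x into the range (-1, 1)."""
    return math.tanh(x)


def softmax(logits):
    """Convert a list of raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)  # assumed stability trick: shift so the largest exponent is 0
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(3), relu(-2))  # prints: 3 0
print(softmax([2, 1, 0.1]))
```

For the logits 2, 1 and 0.1, this computation gives roughly 0.66, 0.24 and 0.10, which sum to 1.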
Softmax Function (cont.):
▶ Example: Given the logits z = [2, 1, 0.1]:
$$\mathrm{Softmax}(z_1) = \frac{e^{2}}{e^{2} + e^{1} + e^{0.1}} \approx 0.66$$
$$\mathrm{Softmax}(z_2) = \frac{e^{1}}{e^{2} + e^{1} + e^{0.1}} \approx 0.24$$
$$\mathrm{Softmax}(z_3) = \frac{e^{0.1}}{e^{2} + e^{1} + e^{0.1}} \approx 0.10$$

Section: Learning as Weight Adjustment

Inefficient Tinkering
1. Individual Weight Adjustments:
Imagine adjusting each weight between nodes one at a time. Either raise it a little, or lower it a little.
Adjustment Strategy: If the adjustment results in less error, keep it. Otherwise reject it. Repeat, Repeat, Repeat...
2. Benefits and Problems:
Benefit: Will reduce the error over time.
Problem 1: Will take too much time, especially if there are billions of weights.
Problem 2: Involves a lot of guesswork, which is inefficient.

Backpropagation
Automated Weight Adjustment:
▶ A systematic method to update weights based on errors.
▶ Ensures a more consistent and efficient learning process.

Error Calculation
Error Calculation:
▶ The difference between the predicted output and the actual output is computed using a loss function.
▶ Common loss functions include Mean Squared Error (MSE) and Cross-Entropy.
▶ The goal is to minimize the loss function over time.

Gradient Descent
Gradient Descent:
▶ Weights are updated in the direction that reduces the error, guided by the gradient of the loss function.
▶ The gradient indicates the steepness of the error surface.

Gradient Descent in 2D
[Figure: error f(x) plotted against a single weight x, with a path descending from Start to Minimum]

Gradient Descent in 3D
[Figure: error surface f(x, y) over two weights, x = weight 1 and y = weight 2, with a path descending from Start to Minimum]

Learning Rate
Learning Rate:
▶ A critical hyperparameter that needs tuning.
▶ Basically determines the “size” of the “steps” down the slope (i.e., how big of an adjustment to the weights).
▶ Too large a learning rate can cause overshooting; too small a learning rate can cause slow convergence.
▶ Adaptive optimizers like Adam adjust the effective step size automatically during training.

Propagation of Error
Error Backpropagation:
▶ The error is propagated backward through the network from the output layer to the input layer.
▶ Each layer’s contribution to the error is computed.
▶ Weights are adjusted in each layer to minimize this contribution, reducing overall error.

Weight Updates
Weight Updates:
▶ Weights are updated iteratively (i.e., the network continuously “steps” down) until the network converges to a minimum error.
▶ Convergence is achieved when subsequent weight updates (i.e., subsequent “steps”) result in minimal changes to the loss function.
▶ Regularization techniques like “dropout” help prevent overfitting during this process.

Backpropagation Example: Learning to Balance on a Bike
Scenario: Imagine a child learning to balance on a bike.
▶ The Goal: Balance without falling over (minimize error).
Step 1: The child tries to ride the bike.
▶ The child wobbles and leans too much to the left.
▶ This imbalance is like the network’s output error.
Step 2: The child recognizes the error (falling to the left).
▶ The child needs to adjust by leaning to the right.
▶ This is similar to how the network measures and assesses error using a loss function.
Step 3: Identify the contributions to error!
▶ The child’s body is similar to the weights between layers.
▶ The legs, arms, and torso all contribute to balancing the bike.
▶ These correspond to the weights between layers (e.g., “arms” = weights between input and hidden layer 1, “torso” = weights between hidden layer 1 and hidden layer 2, “legs” = weights between hidden layer 2 and output).
Step 4: Assess the degree of contributions to error!
▶ The position of the torso might contribute more to the error (e.g., the lean to the left) than the position of the arms.
▶ Thus, the child may need to adjust their torso to a greater degree than their arms.
Step 5: The child’s body makes corrections.
▶ Each body part adjusts to minimize the wobble. The degree of adjustment is analogous to the learning rate!
▶ Body parts that contribute more to the wobble are adjusted more than the body parts that contribute minimally (e.g., adjust torso more than arms).
▶ Likewise, in the network, each set of weights between layers adjusts to reduce its contribution to the error accordingly.
Step 6: Try and try again! With each attempt, the child gets better at balancing.
▶ Over time, fewer adjustments are needed, and the child rides smoothly.
▶ In the network, this is like optimizing weights through repeated backpropagation.

Section: Differences Between Biological and Artificial Neurons

Complexity: Biological vs. Artificial Neurons
Biological Neurons:
Highly complex cells with intricate structures, capable of processing multiple signals simultaneously.
Chemical and electrical signalling.
Artificial Neurons:
Simplified mathematical functions that compute weighted sums of inputs and apply activation functions.
Operate in a more uniform and predictable manner compared to biological neurons.

Signal Transmission
Biological Neurons:
Communicate via synapses using neurotransmitters, with varied and modulated signaling speeds.
Signals can be excitatory or inhibitory, affecting the likelihood of neuron firing.
Artificial Neurons:
Transmit signals as numerical values through weighted connections.
Typically involve unidirectional and uniform signal processing.

Learning Mechanism
Biological Neurons:
Learning involves synaptic plasticity, where the strength of synaptic connections changes over time.
Complex processes like Long-Term Potentiation (LTP) and Long-Term Depression (LTD) affect learning and memory.
Learning is not solely dependent on error correction but also involves other biological factors, such as neurotransmitter release and dendritic growth.
Artificial Neurons:
Learning occurs through backpropagation, where weights are adjusted based on the error between predicted and actual outputs.
Gradient descent is used to optimize weights systematically, driven by a loss function.

Backpropagation in Artificial Neural Networks
1. Process: Requires global knowledge of the network’s state, which is mathematically computed and applied uniformly across the network.
2. Effectiveness: Works well in artificial systems due to controlled environments and precise mathematical operations.

Backpropagation in Biological Neurons (1/2)
1. Challenges:
Biological neurons do not have a centralized mechanism for error calculation and gradient descent as used in artificial networks.
Neurons operate asynchronously, with learning involving a mix of local and global processes that do not align with the requirements of backpropagation.
Synaptic changes are influenced by a range of factors (e.g., local neurotransmitter levels, electrical activity), making the precise, global error correction of backpropagation biologically implausible.

Backpropagation in Biological Neurons (2/2)
2. Alternative Theories: Some researchers propose that the brain might use different, more biologically plausible mechanisms for learning, such as Hebbian learning or other forms of local learning rules that do not rely on the exact mathematical principles of backpropagation.

Weight Distributions in Artificial Neural Networks
1. Weight Adjustments:
Weights are adjusted systematically and globally across the network based on backpropagation and gradient descent.
The distribution of weights is crucial for the network’s learning and performance.

Weight Distributions in Biological Neurons
1. Synaptic Strength:
Synaptic weights (strengths) in biological neurons are influenced by a variety of local factors, including activity-dependent processes and biochemical changes.
There is no direct equivalent to the precise, global weight adjustments seen in artificial neural networks.
2. Distributed Learning:
Learning in the brain is thought to be distributed and decentralized, with each neuron or synapse contributing locally to the overall process.
Weight distributions are likely more complex and less uniform compared to artificial systems.

Summary
1. Metaphor Benefits: Artificial neurons provide a simplified and functional model that helps in developing algorithms for learning and pattern recognition.
2. Metaphor Limitations: The actual complexity of biological neurons and their learning processes is not fully captured by artificial neural networks, particularly in the context of backpropagation and weight adjustments, which are unlikely to have direct biological counterparts.
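As a closing recap of the Learning as Weight Adjustment section, here is a minimal sketch of gradient descent on a single artificial neuron computing y = w · x + b with a Mean Squared Error loss. The training data, learning rate, step count, and function names are all hypothetical choices made for illustration. Because there is only one neuron, no error has to be propagated back through hidden layers, so this shows the weight-update step rather than full backpropagation.

```python
def mse_loss(w, b, data):
    """Mean Squared Error between the neuron's predictions and the target outputs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_neuron(data, learning_rate=0.05, steps=2000):
    """Repeatedly step the weight and bias down the gradient of the MSE loss."""
    w, b = 0.0, 0.0  # arbitrary starting point on the error surface
    n = len(data)
    for _ in range(steps):
        # Gradient (slope) of the loss with respect to the weight and the bias.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # The learning rate sets the size of each step down the slope.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated by the rule y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_neuron(data)
print(round(w, 3), round(b, 3), mse_loss(w, b, data))
```

With these settings the weight and bias settle close to 2 and 1, the values used to generate the data. A much larger learning rate would overshoot the minimum, and a much smaller one would need more steps.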
