Artificial Neural Networks, Lecture 6 (2024)

Summary

This document covers Artificial Neural Networks (ANN) and their use in machine learning, presenting concepts, a definition, and practical examples. It is lecture material, not a past paper.

Full Transcript


Machine Learning for Business: Artificial Neural Networks
Dr. Kamesam

Cognition: the mental action or process of acquiring knowledge and understanding through thought, experience, and the senses.

Data Mining / Machine Learning
[Taxonomy diagram]
– Supervised Learning
  – Regression: Linear Regression, Nonlinear Regression
  – Classification: DT, C&RT, Logistic Regression, ANN (today's topic)
– Unsupervised Learning
  – Association Rules
  – Clustering

Artificial Neural Networks (ANN): Reference Materials
– Introduction to IBM SPSS Modeler and Data Mining (this document is on BB)
– Suggested reading for ANN: Sec 4.7 in the textbook (Tan et al.)
– Algorithmic Guide, SPSS Modeler (on BB) – Chapter 26 on Neural Network Algorithms
– Related materials on Blackboard – articles on Neural Networks, including "Deep Learning"

Introduction: Biological Motivation
The human brain is a densely interconnected network of approximately 10^11 (100 billion) neurons, each connected, on average, to 10^4 (ten thousand) others. A neuron is a cell that performs information processing in the brain. Neuron activity is excited or inhibited through connections to other neurons. The fastest neuron switching times are known to be on the order of 10^-3 sec.

Neural Network
[Figure]

Introduction
Traditional computers are very good at number crunching, but not so good at cognitive tasks. The human brain, on the other hand, is very good (and fast) at cognitive tasks, but cannot do computing tasks like a computer. In order to build a computer system that can "learn", it is a tempting idea to mimic the human brain. The brain can fire all its neurons concurrently (parallelism); serial computers require billions of cycles to perform some tasks that the human brain does in a fraction of a second, e.g., face recognition.

Definition of Neural Network
A Neural Network is a system composed of many simple processing elements operating in parallel which can acquire, store, and utilize experiential knowledge.
Geoffrey Hinton made significant contributions to Artificial Neural Networks and Deep Learning (2018 Turing Award: https://amturing.acm.org/2018-turing-award.cfm).

Biological Neuron
[Figure: a neuron, labeled with synapse, axon, nucleus, cell body, and dendrites]
A neuron has:
– A branching input (the dendritic tree) to which other neurons connect
– A branching output (the axon) to other neurons
The information circulates from the dendrites to the axon via the cell body. The axon connects to dendrites via synapses:
– Synapses vary in strength
– Synapses may be excitatory or inhibitory
– A synapse is where a signal passes from one nerve cell to another

Machine Learning Abstraction
[Figure]

Neuron in ANN: Single Neuron
[Figure: inputs x1 … xm with weights w1 … wm (plus a bias weight w0) feed a summing node Σ; the logistic function calculates the output p from Σ]

Topologies of Neural Networks
– Completely connected
– Feedforward (directed, acyclic)
– Recurrent (feedback connections)

Prototypical Feed-Forward Artificial Neural Networks
[Figure: an input layer x1, x2, …, xn, a 1st and a 2nd hidden layer, and an output layer]
The information is propagated from the inputs to the outputs. There can be any number of hidden layers. Most nonlinear functions can be learnt from this ANN architecture. Where is the learning? The learning is stored in the weights on each connection between layers.

Single hidden layer ANN for classification
[Figure]

AI, ML, and Deep Learning
[Timeline: AI (Artificial Intelligence) 1950, ML (Machine Learning) 1980, DL (Deep Learning) 2010, Generative AI 2020]
Many successful ML models, including DL and Generative AI, are based on ANN.
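To make the "Neuron in ANN" diagram above concrete, here is a minimal Python sketch of how such a neuron computes p: the weighted sum Σ is passed through the logistic function. The input and weight values are made-up illustrations, not numbers from the lecture.

```python
import math

def logistic(z):
    # The logistic (sigmoid) function squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    # Weighted sum w0 + w1*x1 + ... + wm*xm, then the logistic function.
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return logistic(z)

# Illustrative values only (not taken from the lecture):
x = [0.5, 1.2, -0.3]               # inputs x1, x2, x3
w = [0.4, -0.6, 0.9]               # weights w1, w2, w3
p = neuron_output(x, w, bias=0.1)  # the bias plays the role of w0
print(p)                           # a value strictly between 0 and 1
```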
Artificial Neural Networks
Q: ANN is one of many ML algorithms. What is unique about ANN?
A: ANN can learn complex nonlinear separation boundaries. Any nonlinear function can be approximated by an ANN with 2 or more hidden layers. The next few slides illustrate a linear and several nonlinear separation boundaries. All the examples are with 2 features.

Linear separation boundary
[Figure]

The points (in 3D) belong to two classes. Linearly separable?
[Figure]

The plane (a linear equation in 3D) separates the classes
[Figure]

Nonlinear separation boundary
[Figure]

Nonlinear separation
[Four further figure-only slides showing nonlinear separation boundaries]

Trained One-Hidden-Layer ANN
Ex: Predict 5-year survival, given the features (age, gender, stage of cancer).
[Figure: inputs Age = 34, Gender = 2, Stage = 4 feed a hidden layer of Σ units through one set of weights, and the hidden layer feeds an output Σ unit through a second set; the output is 0.6, the "probability of being alive". Labels: input layer, weights, hidden layer, weights, prediction.]
Prediction: will survive, with a confidence of 0.6.

Predicting Survival
[Three further slides repeat the same diagram, highlighting individual paths through the network; labels: independent variables, weights, hidden layer, weights, dependent variable / prediction]

Multi-Layer Networks
Multi-layer networks can represent arbitrary functions, but an effective learning algorithm for such networks was thought to be difficult.
[Figure: input, hidden, and output layers, with an activation at each unit]
The weights determine the function computed. With 2 hidden layers, any nonlinear function can be approximated.
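A short sketch of the forward pass these slides depict may help: three feature values pass through one hidden layer to a single output. The architecture (3 inputs, 2 hidden neurons, 1 output) mirrors the survival figure, but the weight values below are hypothetical placeholders, not the numbers on the slide, and bias terms are omitted for brevity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_weights, output_weights):
    # Each hidden neuron applies the sigmoid to its weighted inputs.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in hidden_weights]
    # The output neuron does the same over the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Features (age, gender, stage) from the slide; the weights are placeholders.
x = [34, 2, 4]
hidden_weights = [[0.1, 0.3, -0.2],   # weights into hidden neuron 1 (hypothetical)
                  [-0.05, 0.2, 0.4]]  # weights into hidden neuron 2 (hypothetical)
output_weights = [0.7, -0.5]
print(forward(x, hidden_weights, output_weights))  # "probability of being alive"
```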
Training a NN: What does it learn?
It learns the function that best translates inputs into outputs, given its architecture. The learning is embodied in the set of weights. The training is iterative, starting with some initial weights:
– For each training example, calculate the output, then calculate the error
– Adjust the weights based on the error
– Go to the next example and repeat the process, until the error is sufficiently small or the weights converge
– "Epoch": one round of training on all samples

Back Propagation Algorithm
Define the error measure TSSE (the total sum of squared errors). Randomly choose the initial coefficients.
Repeat while the TSSE is too large:
– For each training record (presented in random order):
  – Apply the inputs to the ANN
  – Calculate the output for every neuron, from the input layer, through the hidden layer(s), to the output layer
  – Calculate the error at the outputs
  – Use the output error to compute error signals for the pre-output layers
  – Use the error signals to compute coefficient adjustments
  – Adjust the coefficients

Training an Autonomous Vehicle
Show the ALVINN training video.

ANN can be trained for Regression or Classification
Predicting continuous variables:
– Supply attribute values at the input nodes
– Obtain predictions from the output node(s)
Predicting classes:
– Two classes: a single output node with a threshold
– Multiple classes: multiple outputs, one for each class; predicted class = the output node with the highest value
"Universal approximator": a combination of simple non-linear units gives amazing flexibility.

ANN: Strengths & Weaknesses
Strengths:
– Given sufficiently many training examples, can learn very complex functions very accurately – non-linearity is built into the model
– Can be trained for Classification as well as Regression
– Handles noisy data well
Drawbacks:
– Overfitting can be an issue
– Training time can be considerable
– A black box: hard to explain how the ANN learns

Successful Applications
– Text to Speech (NetTalk): https://www.aaai.org/Papers/MAICS/2000/MAICS00-015.pdf
– Speech recognition
– Handwriting recognition
– Machine translation
– Driverless vehicles
– Fraud detection
– Financial applications (FICO credit score) – HNC Software (eventually bought by Fair Isaac), credit approve/disapprove models
– Chemical plant control – Pavilion Technologies
– Game playing (Neurogammon)

How many hidden layers?
– Zero hidden layers – a simple Perceptron; can correctly classify any linearly separable dataset
– One hidden layer – suitable for a single convex region of the decision space
– Two hidden layers – can generate any arbitrary decision boundary

Neural Nets with SPSS Modeler
SPSS Modeler calculates the number of neurons in the hidden layer as max(3, (#input neurons + #output neurons)/2). You can ask for 2 hidden layers, and you then need to specify the number of neurons in each layer. SPSS Neural Nets normalizes the input as X' = (X − min(X)) / (max(X) − min(X)). You can specify a stopping rule:
– Number of minutes of running time
– Number of iterations
– Size of error

End of Lecture 6: SPSS Demo

Perceptron Activation Functions
Perceptron (threshold): output = 1 if Σ_{i=0..n} w_i x_i > 0, else 0
Linear regression: output = net = Σ_{i=0..n} w_i x_i
Sigmoid (logistic): output = σ(net) = 1 / (1 + e^(−net))

Perceptron Training Rule
Update the weights by w_i ← w_i + η (t − o) x_i, where:
– η is the learning rate, a small value (e.g., 0.1), sometimes made to decay as the number of weight-tuning operations increases
– t is the target output for the current training example
– o is the linear unit output for the current training example

[Figure 5: The error surface]
How can we calculate the direction of steepest descent along the error surface? This direction can be found by computing the derivative of E w.r.t. each component of the vector w.
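To make the steepest-descent idea concrete, here is a minimal sketch, assuming the usual squared-error measure E = ½ Σ (t − o)² for a linear unit; the dataset and learning rate below are made up for illustration.

```python
def gradient_step(weights, examples, lr=0.05):
    # One batch gradient-descent step for a linear unit o = sum(w_i * x_i).
    # examples: list of (x, t) pairs, where x includes x0 = 1 for the bias.
    # With E = 0.5 * sum((t - o)**2), the derivative is dE/dw_i = -sum((t - o) * x_i).
    grad = [0.0] * len(weights)
    for x, t in examples:
        o = sum(w * xi for w, xi in zip(weights, x))
        for i, xi in enumerate(x):
            grad[i] += -(t - o) * xi
    # Step against the gradient: the direction of steepest descent on the error surface.
    return [w - lr * g for w, g in zip(weights, grad)]

# Hypothetical data generated by t = 1 + 2*x1 (x0 = 1 is the constant input):
examples = [([1, 0], 1.0), ([1, 1], 3.0), ([1, 2], 5.0)]
w = [0.0, 0.0]
for _ in range(200):
    w = gradient_step(w, examples)
print(w)  # approaches [1.0, 2.0]
```

Repeating the step drives the weights downhill on the error surface until the gradient is near zero.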
Multi-Layer Network
[Figure: the survival-prediction network again – inputs Age = 34, Gender = 2, Stage = 4 feed a hidden layer of Σ units through weighted connections, and the output is 0.6, the "probability of being alive"; labels: independent variables, weights, hidden layer, weights, dependent variable / prediction]

Lecture 7: ANN for Regression

Data Mining / Machine Learning
[Taxonomy diagram]
– Supervised Learning
  – Regression: Linear Regression; Nonlinear Regression (ANN, C&RT)
  – Classification: DT, C&RT, Logistic Regression, ANN (today), Deep Learning
– Unsupervised Learning
  – Association Rules
  – Clustering

Loan Appraiser - revisited
Illustrates that a neural network (feed-forward in this case) is filled with seemingly meaningless weights. The appraised value of this property is $176,228 (but it is not easy to explain how it was arrived at).

Real Estate Appraiser
[Figure]

Loan Prospector
A Neural Network is like a black box that knows how to process inputs to create a useful output. The calculations are quite complex and not easy to interpret.

ANN Examples in WEKA
Prediction (numeric) example:
– Cpu.arff, 6 attributes, output is numerical
– When the output is numerical, ANN does not use the sigmoid function on the output node
– For prediction, first try a linear regression to get an idea, then try ANN
Classification example:
– Wisconsin breast cancer data – output {benign, malignant}

Feature Engineering: Example of Predicting Debt Default
Appropriate formulation of the input variables is crucial.

A Nonlinear Prediction Model
[Figure]

How Does the Brain Work? (2)
Each neuron consists of a SOMA, DENDRITES, an AXON, and SYNAPSEs.

Multiple Linear Regression
For the Houseprices.xls data, we fit a linear regression model as follows:
Houseprice = 136.79 + 276.08 * propertysize + 0.129 * HouseSize − 1.399 * Age
The general form of a linear regression model is
Y = b0 + b1*X1 + b2*X2 + … + bn*Xn
where Y is the target (dependent variable) and X1, X2, …, Xn are the explanatory variables.

What is an artificial neuron?
Definition: a nonlinear, parameterized function with a restricted output range:
y = f(w0 + Σ_{i=1..n} w_i x_i)
where f is an activation function, to be defined.

Neural Nets: Weaknesses
– A "black box" – hard to explain or gain intuition from
– For complex problems, training time could be quite high
– Highly prone to overfitting
– Often requires substantial "fiddling":
  – feature engineering – create (normalized) meaningful, numeric inputs
  – parameter selection/tuning
  – best used in the hands of a neural network expert
  – the network topology has to be specified by the user

Neural Network Learning
A learning approach based on modeling adaptation in biological neural systems.
– Perceptron: the initial algorithm for learning simple (single-layer) neural networks, developed in the 1950s.
– Backpropagation: a more complex algorithm for learning multi-layer neural networks, developed in the 1980s.

A Computing Unit
Now in more detail, but for a particular model only.

Simple Neural Networks
Linear regression:
– A single neuron
– Activation function: Φ(Σ) = Σ (just pass through the sum)
– Output: linear in the inputs – a weighted sum (maybe plus a constant)
Logistic regression:
– Same as above, but the activation function is Φ(Σ) = 1/(1 + e^(−Σ))
– Output: a non-linear ("squashed") weighted sum of the inputs

Logistic Regression Training
[Figure: the single-neuron diagram in a training loop – inputs x1 … xm with weights w0 … wm feed Σ; the logistic function calculates the output p from Σ; the prediction is compared with the target; if there is an error, the weights are adjusted, otherwise training moves to the next training record]
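The "Simple Neural Networks" slide lends itself to a short sketch: the very same one-neuron computation yields multiple linear regression with a pass-through activation and logistic regression with the sigmoid. The coefficient and feature values below are placeholders, not numbers from the lecture.

```python
import math

def neuron(x, weights, activation):
    # x includes x0 = 1 so that weights[0] acts as the constant term b0 / w0.
    net = sum(w * xi for w, xi in zip(weights, x))
    return activation(net)

identity = lambda net: net                          # Φ(Σ) = Σ  -> linear regression
sigmoid = lambda net: 1.0 / (1.0 + math.exp(-net))  # Φ(Σ) = 1/(1+e^-Σ) -> logistic

x = [1, 0.4, 2.0]              # x0 = 1, then two feature values (placeholders)
w = [0.2, -1.1, 0.8]           # placeholder coefficients
print(neuron(x, w, identity))  # a real-valued prediction (regression)
print(neuron(x, w, sigmoid))   # a probability in (0, 1) (classification)
```

Swapping only the activation function switches the model between regression and classification, which is exactly why both fit the one-layer ANN picture.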
A Single Neuron with Sigmoid Activation
[Figure: the survival example as a single neuron – the independent variables (features) Age = 34, Gender = 1, Stage = 4 feed Σ through coefficients, and the output (prediction) is 0.6, the "probability of being alive"]
σ = 1 / (1 + e^(−Σ_{i=0..n} w_i x_i))

Artificial Neural Net for Classification
[Figure: Data (features and target) feeds the Learner (ANN), which produces a Classifier (model) that maps features to an output]
SPSS Modeler refers to ANN as a Multi-Layer Perceptron.

Single Neuron
[Figure: inputs x0, x1, …, xm (x0 = 1, always) with weights w0, w1, …, wm feed the summing function Σ_{i=0..n} w_i x_i, followed by the activation function φ(·), producing the output y]

Logistic Regression: a One-Layer ANN
[Same diagram] The logistic function is the activation in logistic regression.

Multiple Linear Regression: a One-Layer ANN
[Same diagram] MLR is the special case with no activation.

Perceptron Classifier
The perceptron is a type of artificial neural network which can be seen as the simplest kind of feed-forward neural network, with a single neuron: a linear classifier. It was introduced in the late 1950s.
Perceptron convergence theorem (Rosenblatt, 1962): the perceptron will learn to classify any linearly separable set of inputs.

Artificial Neural Networks
We will learn a computing model inspired by the biological neural network. However, everything we will learn today is implemented in software, on traditional computers. (There are some research projects that implement neural networks in hardware – IBM SyNAPSE, among others.) ANN are a popular learning model for Supervised Learning (Classification as well as Regression), and the Deep Learning methodology is based on ANN.

Artificial Neural Networks
Drawback: not easy to explain to most people – a black box.
Attractions: the ANN's ability to generalize and learn from data "mimics" a human's ability to learn from experience. Very useful in Data Mining – better results are the hope.
– Can handle noisy data well
– Long training times may be needed
– Once training is done, evaluation is fast
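As a closing illustration of the perceptron classifier and the convergence theorem, here is a sketch that trains a single threshold unit on a linearly separable dataset. The logical AND function is used as the example data; it is an assumption made for illustration, not data from the lecture.

```python
def train_perceptron(data, lr=0.1, epochs=20):
    # data: list of (x, t) pairs, with x including x0 = 1 for the bias weight.
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, t in data:
            # Threshold activation: fire iff the weighted sum is positive.
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            # Perceptron training rule: w_i <- w_i + lr * (t - o) * x_i
            w = [wi + lr * (t - o) * xi for wi, xi in zip(w, x)]
    return w

# Logical AND is linearly separable (illustrative data, not from the lecture):
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = train_perceptron(data)
for x, t in data:
    pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
    print(x[1:], "->", pred, "(target", str(t) + ")")
```

Because AND is linearly separable, the loop settles on weights that classify all four points correctly, as the convergence theorem guarantees.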
