Lecture 2.pdf
Lecture 2: Introduction to Machine Learning
Joaquim Gromicho
Analytics for a Better World, Lecture 2

Learning objectives for today and next week
- Understand how the perceptron works and its limitations.
- Understand how combining perceptrons in (neural) networks expands their capabilities.
- Preview more advanced methodologies, such as Deep Learning, Reinforcement Learning and Generative AI. These are beyond the scope of this course.
- Understand the potential of analytics in helping overcome major struggles the Earth and her ecosystem face.

Next week we address Machine Learning methodologies that you will become able to use! In fact, the second half of the coming week's exercises are mostly based on the next lecture, to be given on Tuesday the 10th.

Image taken from "Deep Learning vs. Machine Learning – What's The Difference?"

In the beginning…
There was the perceptron! A simple binary linear classifier.
Image from https://news.cornell.edu/stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon showing Frank Rosenblatt.

Neural networks
This first part of the slide deck has been adapted from Lecture 12 of the course CSE 4309 – Machine Learning by Vassilis Athitsos at the Computer Science and Engineering Department of the University of Texas at Arlington.

Perceptrons
A perceptron is a function that maps D-dimensional vectors to real numbers. Its output is z = h(w^T x).
For notational convenience, we add a zero-th dimension to every input vector that is always equal to 1, so x = (x_0, x_1, x_2, …, x_D)^T with x_0 = 1.
x_0 is called the bias input; it is always equal to 1.
w_0 is called the bias weight; it is optimized during training.

Perceptrons
A perceptron computes its output z in two steps:
First step: a = w^T x = Σ_{i=0}^{D} w_i x_i
Second step: z = h(a)
In a single formula: z = h(Σ_{i=0}^{D} w_i x_i)
h is called an activation function. For example, h could be the sigmoidal function h(a) = σ(a) = 1 / (1 + e^{−a}).

Perceptrons
We will see perceptrons later, but we will not call them perceptrons. For example, linear regression produces a classifier function y(x) = h(w^T x). If we set h to the identity function, then y(x) is a perceptron. More on this Tuesday!

Perceptrons and Neurons
Perceptrons are inspired by neurons. Neurons are the cells forming the nervous system and the brain. Neurons somehow sum up their inputs, and if the sum exceeds a threshold, they "fire".
Since brains are "intelligent", computer scientists have long been hoping that perceptron-based systems can be used to model intelligence.
Image from https://towardsdatascience.com/the-concept-of-artificial-neurons-perceptrons-in-neural-networks-fab22249cbfc

Activation Functions
A perceptron produces output z = h(w^T x). One choice for the activation function h: the step function.
h(a) = 0 if a < 0, and h(a) = 1 if a ≥ 0.
The step function is useful for providing some intuitive examples. It is not useful for actual real-world systems: since it is not differentiable, it does not allow optimization via gradient descent, a very powerful method to find the 'optimal weights' given a training data set.

Activation Functions
A perceptron produces output z = h(w^T x). Another choice for the activation function: the sigmoidal function.
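The two-step computation above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the weights w and input x below are made-up numbers, with x[0] = 1 as the bias input.

```python
import math

def perceptron(w, x, h):
    """Compute z = h(a) with a = w^T x, where x[0] is the bias input (always 1)."""
    a = sum(wi * xi for wi, xi in zip(w, x))  # first step: a = w^T x
    return h(a)                               # second step: z = h(a)

def step(a):
    """Step activation: 0 if a < 0, 1 if a >= 0."""
    return 1 if a >= 0 else 0

def sigmoid(a):
    """Sigmoidal activation: 1 / (1 + e^(-a))."""
    return 1 / (1 + math.exp(-a))

# Hypothetical weights and input: w = [w0, w1, w2] with bias weight w0.
w = [-1.0, 2.0, 0.5]
x = [1, 1.0, 2.0]   # x0 = 1 is the bias input
print(perceptron(w, x, step))     # a = -1 + 2 + 1 = 2 >= 0, so 1
print(perceptron(w, x, sigmoid))  # sigma(2), roughly 0.88
```

The only difference between the two outputs is the choice of h; the weighted sum a is identical.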
σ(a) = 1 / (1 + e^{−a})
The sigmoidal function is often used in real-world systems. It is a differentiable function, so it allows the use of gradient descent.

A perceptron learning how to classify
From this Medium article by Mohammad Abdin. More on this in the next lecture!

What can perceptrons do? And what can they not…

Example: The AND Perceptron
false AND false = false
false AND true = false
true AND false = false
true AND true = true
Suppose we use the step function for activation, and that Boolean false is represented as the number 0 and Boolean true as the number 1. Then the perceptron on the right computes the Boolean AND function!

Example: The OR Perceptron
false OR false = false
false OR true = true
true OR false = true
true OR true = true
With the same step activation and 0/1 encoding of false/true, the perceptron on the right computes the Boolean OR function!

Example: The NOT Perceptron
NOT false = true
NOT true = false
With the step activation, the 0/1 encoding, and weight w_1 = −1, the perceptron on the right computes the Boolean NOT function!

The XOR Function
Could a single perceptron with step activation compute XOR? The four cases impose:
- false XOR false = false requires w_0·1 + w_1·0 + w_2·0 < 0, i.e. w_0 < 0.
- true XOR false = true requires w_0 + w_1 ≥ 0.
- false XOR true = true requires w_0 + w_2 ≥ 0.
- true XOR true = false requires w_0 + w_1 + w_2 < 0.
Adding the two middle inequalities gives 2w_0 + w_1 + w_2 ≥ 0, so w_0 + w_1 + w_2 ≥ −w_0 > 0 (using w_0 < 0), contradicting the last inequality. Impossible!

What now?
What a perceptron cannot do alone, more than one perceptron can do together!
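The AND, OR and NOT perceptrons can be checked by hand or in code. The slides only state w_1 = −1 for NOT; the other weight values below are one possible choice, not the lecture's, picked so that the step activation produces the right truth tables.

```python
def step(a):
    return 1 if a >= 0 else 0

def perceptron(w, x):
    """Step-activation perceptron; x includes the bias input x[0] = 1."""
    return step(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical weight vectors [w0, w1, w2] (only w1 = -1 for NOT is on the slide):
AND_w = [-1.5, 1.0, 1.0]  # fires only when x1 + x2 >= 1.5, i.e. both inputs are 1
OR_w  = [-0.5, 1.0, 1.0]  # fires when x1 + x2 >= 0.5, i.e. at least one input is 1
NOT_w = [0.5, -1.0]       # fires only when x1 = 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", perceptron(AND_w, [1, x1, x2]),
              "OR:", perceptron(OR_w, [1, x1, x2]))
print("NOT 0 =", perceptron(NOT_w, [1, 0]), " NOT 1 =", perceptron(NOT_w, [1, 1]))
```

Any weights with the same sign pattern relative to the thresholds would work equally well; the decision boundary is a line, which is exactly why XOR is out of reach for a single unit.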
Our First Neural Network: XOR
A neural network is built using perceptrons as building blocks: the inputs to some perceptrons are outputs of other perceptrons. Here is an example neural network computing the XOR function: inputs x_1 and x_2 feed units 3 and 4, whose outputs feed unit 5, which produces the output.

To simplify the picture, we do not show the bias input anymore; we just show the bias weights w_{j,0}. Besides the bias input, there are two inputs: x_1 and x_2.

Validation of the XOR network:
- false XOR false = false: with inputs (0, 0), unit 5 outputs h(−1) = 0. Correct!
- false XOR true = true: with inputs (0, 1), unit 5 outputs h(1) = 1. Correct!
- true XOR false = true: with inputs (1, 0), unit 5 outputs h(1) = 1. Correct!
- true XOR true = false: with inputs (1, 1), unit 5 outputs h(−1) = 0. Correct!

Anatomy of the XOR network
Unit 3 computes A = x_1 OR x_2, unit 4 computes B = x_1 AND x_2, and unit 5 computes A AND (NOT B), which is exactly x_1 XOR x_2. Check this out!

Terminology

Neural Networks
This neural network example consists of six units: three input units (including the not-shown bias input) and three perceptrons.
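The three-unit XOR network can be validated in code. The weights below are one possible assignment reproducing the anatomy described above (unit 3 = OR, unit 4 = AND, unit 5 = A AND NOT B); the slide's actual weight values may differ.

```python
def step(a):
    return 1 if a >= 0 else 0

def unit(w, inputs):
    """One step-activation unit: w[0] is the bias weight, w[1:] the input weights."""
    return step(w[0] + sum(wi * zi for wi, zi in zip(w[1:], inputs)))

def xor_network(x1, x2):
    z3 = unit([-0.5, 1.0, 1.0], [x1, x2])   # unit 3: x1 OR x2
    z4 = unit([-1.5, 1.0, 1.0], [x1, x2])   # unit 4: x1 AND x2
    z5 = unit([-0.5, 1.0, -1.0], [z3, z4])  # unit 5: z3 AND (NOT z4)
    return z5

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, "XOR", x2, "=", xor_network(x1, x2))
```

The key point is structural: unit 5 never sees x_1 and x_2 directly, only the outputs of units 3 and 4, and that indirection is what makes the non-linearly-separable XOR computable.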
Yes, in the notation we will be using, inputs count as units.

Neural Networks
Weights are denoted as w_{ji}. Weight w_{ji} belongs to the edge that connects the output of unit i with an input of unit j. Units 0, 1, …, D are the input units (units 0, 1, 2 in this example).

Neural Networks
Oftentimes, neural networks are organized into layers.
- The input layer is the initial layer of input units (units 0, 1, 2 in our example).
- The output layer is at the end (unit 5 in our example).
- Zero, one or more hidden layers can be between the input and output layers.

Neural Networks
There is only one hidden layer in our example, containing units 3 and 4.
- Each hidden layer's inputs (except bias inputs) are outputs from the previous layer.
- Each hidden layer's outputs are inputs to the next layer.
- The first hidden layer's inputs come from the input layer.
- The last hidden layer's outputs are inputs to the output layer.

Common activation functions
https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba092

Break

What can we do with neural networks?
Some of the following slides are adapted from Wouter Kool.

Universal Approximation Theorem
"A feedforward network with a single layer is sufficient to represent any function…"
Let me show you why…

Universal Approximation Theorem
Combining ReLUs gives a piecewise linear function!
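The piecewise-linear idea behind the Universal Approximation Theorem can be seen with three ReLUs. The shifts and scales below are illustrative choices, not taken from the slides: they build a "hat" that rises on [0, 1] and falls back on [1, 2].

```python
def relu(a):
    """Rectified linear unit: max(0, a)."""
    return max(0.0, a)

# A weighted sum of shifted ReLUs is piecewise linear. This particular
# combination is a hat function: 0 outside [0, 2], peaking at 1 for x = 1.
def bump(x):
    return relu(x) - 2 * relu(x - 1) + relu(x - 2)

for x in [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:
    print(x, bump(x))
```

Summing many such bumps of different positions, widths and heights approximates any continuous function on an interval, which is the intuition the next slides build on.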
https://towardsdatascience.com/can-neural-networks-really-learn-any-function-65e106617fc6

Universal Approximation Theorem
"A feedforward network with a single decision tree is sufficient to represent any function"
The figure shows a depth-2 decision tree: the root splits on x < 1/2 versus x ≥ 1/2, the left child splits on x < 1/4 versus x ≥ 1/4, and the right child on x < 3/4 versus x ≥ 3/4, so the four leaves hold the constant predictions 0.03, 0.07, −0.01 and 0.05.

A decision tree…
On Monday we will learn more about decision (and in particular classification) trees. These are non-linear classifiers that are in fact piecewise constant!

Universal Approximation Theorem
"A feedforward network with a single layer is sufficient to represent any function… but the layer may be infeasibly large and may fail to learn and generalize correctly." (Ian Goodfellow)

An important domain: image recognition, and beyond, image captioning
Image from https://www.aimtechnologies.co/2023/07/04/image-recognition-revolutionizing-visual-data-analysis/
"A survey of evolution of image captioning techniques", A. Kumar & S. Goel
https://www.slideshare.net/GrokkingVN/grokking-techtalk-21-deep-learning-in-computer-vision

Deep learning
From "What is a Deep Learning Neural Net, or Deep Neural Network? Part 4"

Learning without data! REINFORCE
Repeat: do something, look at the result.
- Good! Do more often!
- Bad! Do less often!
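The depth-2 tree from the figure is easy to write out as nested conditionals. The split thresholds (1/2, then 1/4 and 3/4) come from the slide; matching the leaf values 0.03, 0.07, −0.01 and 0.05 to the leaves in left-to-right order is an assumption about the figure.

```python
def tree(x):
    """Depth-2 decision tree: a piecewise constant function of x.
    Leaf values are assigned left to right (an assumed reading of the figure)."""
    if x < 0.5:
        return 0.03 if x < 0.25 else 0.07
    else:
        return -0.01 if x < 0.75 else 0.05

for x in [0.1, 0.3, 0.6, 0.9]:
    print(x, tree(x))
```

Note the contrast with the ReLU construction: a tree of this kind is piecewise constant, while a ReLU network is piecewise linear; both can approximate a function arbitrarily well by using enough pieces.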
In other words: increase the chance of repeating actions with good results, decrease the chance of repeating actions with bad results.

Learning to play Atari
https://github.com/jihoonerd/Human-level-control-through-deep-reinforcement-learning
"Reinforcement Learning in Pacman", Abeynaya Gnanasekaran, Jordi Feliu Faba, Jing An

Major news: AlphaGo
Uses a (deep) neural network to predict which part of the search tree to expand.

Source: Forbes article on self-driving cars

Generative models, as for instance ChatGPT

Dealing with unseen data: our models will be discriminative
We will study models that learn from the given data only. These are discriminative models. In technical terms: discriminative models focus on learning the conditional probability distribution of the target labels given the input data. Our introduction will not focus on probabilities and distributions, since there is still a lot that you need to learn about that! In simple terms: if a model learns about cat pictures, it can be given a new picture and decide whether it is a cat or not.

Generative models
Generative models aim to model the joint probability distribution of the input data and the target labels. In other words, they learn how data is generated from the underlying probability distribution. So, if a model learns about cat pictures, it can be asked to produce an original picture of a cat!

ChatGPT
"ChatGPT uses deep learning, a subset of machine learning, to produce humanlike text through transformer neural networks. The transformer predicts text -- including the next word, sentence or paragraph -- based on its training data's typical sequence." From https://www.techtarget.com/whatis/definition/ChatGPT
Basically, it read a lot of documents and learnt their content as part of a bigger model of the knowledge in those documents. When generating answers, it starts 'thinking with the mouth open' and determines each time the 'most likely next word to say'. Under the hood it's all about conditional probabilities!
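The "do more often / do less often" loop above can be sketched as a tiny REINFORCE-style update on a two-action problem. Everything numeric here (the rewards, the learning rate 0.1, the baseline 0.6, the number of iterations) is illustrative, not from the lecture: the point is only that actions with above-baseline results gain probability.

```python
import math
import random

random.seed(0)
prefs = [0.0, 0.0]    # one preference score per action
rewards = [0.2, 1.0]  # action 1 gives the better result

def probs():
    """Softmax of the preference scores: the current action probabilities."""
    e = [math.exp(p) for p in prefs]
    s = sum(e)
    return [v / s for v in e]

for _ in range(500):
    p = probs()
    a = 0 if random.random() < p[0] else 1      # do something (sample an action)
    r = rewards[a]                              # observe the result
    for i in range(2):
        grad = (1 - p[i]) if i == a else -p[i]  # gradient of log pi(a) w.r.t. prefs[i]
        # good result (r above baseline): increase chance to repeat; bad: decrease
        prefs[i] += 0.1 * (r - 0.6) * grad

print(probs())  # the better action now has most of the probability
```

No labelled data appears anywhere in the loop: the only learning signal is the reward, which is exactly the "learning without data" contrast the slide draws.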
A personal view on analytics
Joaquim's thoughts on the power of Analytics to contribute to a better world.

Data Science produces predictions and descriptions; Operations Research produces decisions.
- Examples of predictions and descriptions (Data Science): demand for a product, gas use, number of travelers, travel times on roads, risk of water damage, churn probability, machine breakdown, supply chain segments, most similar other customer.
- Examples of decisions (Operations Research): network design, hub locations, inventory levels, route plans, load plans, production schedule, marketing actions, price levels, sales & operations plan, maintenance plan.
Starting from a situation description, Data Science produces predictions and descriptions, and Operations Research turns these into decisions. But how?

[Curriculum overview: the first- and second-year courses of the programme, semester by semester, including Mathematics 1–3, Probability Theory and Statistics 1–2, Introduction to Programming, Introduction to Data Science: Data Preprocessing, Analytics for a Better World, Operations Research (deterministic and stochastic methods), Econometrics 1, Algorithms and Data Structures in Python, and Machine Learning.]

Why does this matter?
https://reliefweb.int/map/world/hunger-map-2020
Mankind is in trouble… what about the planet? Sounds familiar?
https://overshoot.footprintnetwork.org/newsroom/past-earth-overshoot-days/
https://overshoot.footprintnetwork.org/newsroom/country-overshoot-days/
www.overshootday.org

Fortunately…
Saving on resource utilization has the desired effect of reducing environmental depletion! Almost all analytics applications are immediately green!

But…
We need to do much better if we want to save the planet… If you want a reality check, please read this concerning report from the United Nations.

What have we learned?
- How a perceptron works and its limitations, and how networks of perceptrons overcome those.
- How impressive Deep Learning, Reinforcement Learning and Generative AI are.
- That analytics can help our future on Earth.