Lecture 3 Introduction to Machine Learning: classical methods
Joaquim Gromicho

Weekly exercises on Canvas
The solutions for the first graded assignment and the solutions for the first week's tutorial exercises can now be found on Canvas. The exercises for this week can also be found on Canvas.

Analytics for a Better World Lecture 3 2

How you did with Code Grade
Careful!

In the first part of the course, we learn about Machine Learning and its role in predictive analytics.

Relation with the textbook
The part of this course devoted to Machine Learning and its role in Predictive Analytics is only briefly mentioned in the book, in section 2.4. Therefore, last week's and this week's slides, plus the accompanying notebooks, are the main source of the knowledge that you need to answer the questions.

From machine learning you are expected to understand:
◼ Perceptron (discussed in the previous lecture, today and also next lecture)
◼ Clustering (this and next lecture)
◼ Classification (this and next lecture)
◼ Linear regression (this and next lecture)
Alongside the slides one may also read this introduction to the perceptron and this introduction to clustering, classification and regression.

Acknowledgment: today's slides include materials from Wouter Kool and Ronald Buitenhek from ORTEC.

Good example of how to learn from the materials
Your seminar teacher Nicole Guarnieri experienced what was once said by Seneca the Younger: while we teach, we learn. Nicole created this notebook while preparing herself to teach the first week's exercises. Isn't this a great example that you all can follow?

Learning objectives: at the end of today's lecture, you should…
◼ Be able to interpret and summarize the key characteristics of tabular data, including rows, columns, and various data types.
◼ Be able to explain the learning process of a perceptron, including weight updates, and describe how it acts as a linear classifier.
◼ Be able to differentiate between supervised and unsupervised learning by comparing their key features and use cases.
◼ Be able to explain the purposes of classification, clustering, and regression, and identify appropriate scenarios for using each method.
◼ Be able to apply key algorithms such as decision trees, k-means clustering, and linear regression to solve classification, clustering, and regression problems.

Introduction to data

Business data is often organized in tables
Example taken from: S. Moro, P. Cortez and P. Rita, A Data-Driven Approach to Predict the Success of Bank Telemarketing, Decision Support Systems, Elsevier, 62:22-31, June 2014.

Data tables contain observed values on variables
Or at least this is how the analyst needs the data: tidy. In a tidy table the columns are variables, the rows are observations, and each cell holds one value; the column headers give the variable names and each row carries an observation "name" or id.
Hadley Wickham, Tidy data, The Journal of Statistical Software, vol. 59, 2014.

Typical data types in tables
◼ Categorical variables (qualitative data)
  ◼ Ordinal variables: there is a logical order. For example: Perfect, Good, Acceptable, Bad.
  ◼ Nominal variables: unordered labels. For example: Male, Female, Other.
  ◼ Binary variables: two values, Yes/No, True/False, 0/1.
◼ Numerical variables (quantitative data): real values or whole numbers.
Note: "age" is replaced by "age category".

Can you guess what you see on the next slide? It's also data… What do we 'see' here? This was a simple exercise in prediction… But what are predictions?

We can see a perceptron as a predictor: the AND perceptron predicts the outcome of the Boolean AND function.

Encoding with numbers
Each pair of truth values (A, B) becomes a point in the plane: the point has the inputs as coordinates and the output of the AND function as its 'value'.
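The data types above can be sketched in a small pandas table. This is a minimal illustration, not data from the lecture: the table, its columns and its values are made up, and pandas is assumed to be available, as in the course notebooks.

```python
import pandas as pd

# A tiny tidy table: each row is an observation, each column a variable.
# All columns and values below are made up for illustration.
df = pd.DataFrame({
    "quality": ["Good", "Perfect", "Bad", "Acceptable"],  # ordinal
    "gender": ["Male", "Female", "Other", "Female"],      # nominal
    "subscribed": [True, False, False, True],             # binary
    "age_category": ["0-20", "21-40", "41-60", "21-40"],  # "age" as a category
    "balance": [103.5, -20.0, 511.2, 0.0],                # numerical (real values)
})

# Mark the ordinal variable with its logical order so comparisons make sense.
df["quality"] = pd.Categorical(
    df["quality"],
    categories=["Bad", "Acceptable", "Good", "Perfect"],
    ordered=True,
)

print(df.dtypes)
print(df["quality"].min())  # the lowest quality present is "Bad"
```

Declaring the order of an ordinal variable matters: without it, `min()` or sorting on `quality` would fall back to alphabetical order.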
Example: The AND Perceptron
false AND false = false
false AND true = false
true AND false = false
true AND true = true
Assuming the step function for activation, and representing the Boolean value false as the number 0 and the Boolean value true as the number 1, we saw in the previous lecture that the perceptron on the right, with inputs x0 = 1, x1 and x2 and output z = h(wᵀx), correctly computes the Boolean AND function.

Rosenblatt's Perceptron Learning Algorithm
1. Start with initial values for the weights (we take 0).
2. Check the prediction: for each input example, the perceptron makes a prediction using the current weights.
3. Correct mistakes: if the prediction is wrong, the algorithm adjusts the weights slightly to make the correct output more likely in the future. It does this by:
   1. increasing the weights if the output was too low;
   2. decreasing the weights if the output was too high.
4. Repeat until right: this process is repeated a predefined number of times or until no mistakes occur.
In short, every time the perceptron makes a mistake, it "learns" by adjusting the weights in the right direction to reduce future mistakes. For more information, including a more detailed history and attribution, please read the Wikipedia page on the perceptron.

Animation of learning the AND function

Back to predictions
We tend to think like this… but are all our predictions in the future? Each of us has been the subject of predictions: insemination, gestation and birth (ok, this was future!).

How does that relate to data?
Picture a data table with an id column "Obs", predictor columns X1, X2, X3, …, Xn, and a response column Y, with rows Obs 1, Obs 2, Obs 3, …, Obs many.
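The learning steps of Rosenblatt's algorithm above can be sketched in plain Python. This is a minimal sketch under the slide's assumptions (step activation, false/true encoded as 0/1, weights starting at 0, bias input x0 = 1); the learning rate of 1 and the function names are choices made here for illustration.

```python
# A minimal sketch of Rosenblatt's perceptron learning rule for the
# Boolean AND function, with false encoded as 0 and true as 1.

def step(a):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    return 1 if a > 0 else 0

def predict(w, x):
    """z = h(w^T x) with h the step function."""
    return step(sum(wi * xi for wi, xi in zip(w, x)))

def train_perceptron(examples, epochs=20, lr=1.0):
    w = [0.0, 0.0, 0.0]  # weights for x0 (bias), x1, x2 -- start at 0
    for _ in range(epochs):
        mistakes = 0
        for x, target in examples:
            error = target - predict(w, x)  # +1 if output too low, -1 if too high
            if error != 0:
                mistakes += 1
                # Increase weights if the output was too low, decrease if too high.
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:  # repeat until no mistakes occur
            break
    return w

# Truth table of AND; each input is prefixed with the constant bias input x0 = 1.
and_examples = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = train_perceptron(and_examples)
for x, target in and_examples:
    print(x[1], "AND", x[2], "=", predict(w, x))
```

Because the AND function is linearly separable, the perceptron convergence theorem guarantees this loop stops with a weight vector that classifies all four examples correctly.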
The "Obs" column identifies the observations, the columns X1, …, Xn are the predictors, and the column Y is the response. BIG DATA is long (many rows) and wide (many columns).

From historical data with known predictors and responses we build a model. For a new observation the values X1, …, Xn are known but Y is not: the model produces a prediction for Y.

A model with known relationships
Sometimes the form of the relationship is known, for example Y = aX + b, and only a and b need to be estimated from the historical data. Machine learning aims to learn the relationship itself: machine learning explores the study and construction of algorithms that can learn from and make predictions on data.

How to choose a methodology? Accuracy? Interpretability? These concepts will be defined on Friday!

What enables predictions? We learn from data! Methodologies learn patterns. It is all about correlation! Be careful: we want to believe that we learn causality, but causation is not the same as correlation (see https://www.tylervigen.com/spurious-correlations). For example, ice cream sales and sunburn are correlated, but both are caused by dry, hot and sunny summer weather. Consider watching the TEDx talk by Prof. Dr. Ionica Smeets on the danger of mixing up causality and correlation as well. (See also https://xkcd.com/552/.)

How do humans classify?
Case: fruit classification with a decision tree
A decision tree is a flowchart of questions. For example:
◼ Is it yellow?
◼ If so, is it long and skinny?
◼ If so, is it easy to peel?
◼ Then it is a banana!
Try to create your first decision tree with questions like the ones on the left that decides on the fruits: Apple, Banana, Grapefruit, Orange, and Plantain.

The tree on the slide:
◼ Is it yellow?
  ◼ no: Is it orange?
    ◼ no: Apple
    ◼ yes: Is it about the size of a tennis ball?
      ◼ no: Grapefruit
      ◼ yes: Orange
  ◼ yes: Is it long and skinny?
    ◼ no: Apple
    ◼ yes: Is it easy to peel?
      ◼ no: Plantain
      ◼ yes: Banana

Areas that we will highlight
Regression – Classification – Clustering

Name   Age  Gender  Country    Height
John     0  M       NL             60
Jack    10  M       UK            130
Julie   50  F       Ghana         160
Joyce   80  F       USA           155
Jesse   60  F       Argentina     178
Yifan   25  F       China         175

Regression is for predicting continuous variables: for example, predict Height from the other columns.
Classification is for predicting discrete variables: for example, reorder the columns so that Gender is the response and predict it from the others.

Classification
The response belongs to a category: category 1, category 2 or category 3; true or false; will move house or will not move house; male or female; good move or bad move; A or B.

Food        Sweetness  Crunchiness  Food type
Cheese              1            1  Proteins
Fish                3            1  Proteins
Shrimp              2            2  Proteins
Bacon               1            4  Proteins
Nuts                4            5  Proteins
Green bean          4            7  Vegetables
Cucumber            3            8  Vegetables
Lettuce             1            9  Vegetables
Celery              3           10  Vegetables
Carrot              7           10  Vegetables
Apple              10            9  Fruits
Pear               10            6  Fruits
Grape              10            5  Fruits
Orange              8            2  Fruits
Banana             10            1  Fruits
Tomato              5            4  ?

Regression
What is the height of the person? Which number? What will the new project cost? Which Y? How much…? How many…? The X values are known.

Unsupervised learning
Regression – Classification – Clustering
Here there is no response variable: for example, a table with Name, Age, Gender, Country and Height only (John, Jack, Jorge, Joyce, Jesse and Yifan), and the task is to group similar observations.

How do machines classify?
The available data: only length and width. Fruit sizes plotted: what do you see? Outliers!

Exercise 2: Clean the data
◼ Investigate the nature of the outliers.
◼ If required, suggest corrected values for either the fruit type (Name), the Length or the Width, or all of them.
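The fruit decision tree discussed above is just a flowchart of yes/no questions, so it can be written directly as nested if/else statements. This is a sketch: the function name, the boolean argument names and the branch order are choices made here to mirror the slide's flowchart.

```python
# The fruit flowchart as nested if/else questions. Each argument is one
# yes/no question from the tree; argument names are made up for illustration.

def classify_fruit(yellow, orange_colored=False, long_and_skinny=False,
                   easy_to_peel=False, tennis_ball_sized=False):
    if yellow:
        if long_and_skinny:
            # Bananas peel easily; plantains do not.
            return "Banana" if easy_to_peel else "Plantain"
        return "Apple"          # yellow but not long and skinny
    if orange_colored:
        # An orange is about the size of a tennis ball; a grapefruit is bigger.
        return "Orange" if tennis_ball_sized else "Grapefruit"
    return "Apple"              # neither yellow nor orange

print(classify_fruit(yellow=True, long_and_skinny=True, easy_to_peel=True))    # Banana
print(classify_fruit(yellow=False, orange_colored=True, tennis_ball_sized=False))  # Grapefruit
```

Algorithms such as CART build exactly this kind of question tree automatically, choosing the questions from the data instead of from human knowledge of fruit.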
Outlier analysis answer
In the plotted fruit sizes, some outliers have Length and Width swapped; others are a factor 10 too small.

Let's get our hands on this example
Please open this notebook to put your hands on this dataset. The same notebook includes classification trees, clustering and regression, and illustrates this whole lecture.

Draw boundaries
… to create areas of Length-Width values that identify a fruit type. What is the first logical line to draw?

A model trained on Apples, Oranges, Grapefruit and Bananas… cannot recognize melons. If the data does not contain small bananas, then for the model they do not exist.

Clustering: grouping similar things together
Case: clustering. Visualize: create a scatter plot.

Question: manual clustering
◼ Which clusters would you create?
◼ Why?

The most well-known clustering algorithm is k-means
Note: k-means works for numerical data. It is based on the distances between the observations, so we need to be able to measure the distance between the points.

K = 3, iteration 1:
◼ Choose 3 random initial centroids.
◼ Assign observations to the nearest centroids.
K = 3, iterations 2, 3 and 4:
◼ Update the cluster centroids.
◼ Assign observations to the nearest centroids.

k-means: overview of all iterations (iterations 1 to 4).

The k-means algorithm
Results from the k-means algorithm for our example: k = 2, k = 3, k = 4, k = 6 and k = 91.

Regression: predicting values
Linear regression
Regression: Y = f(X). Linear regression: Y = a + b X. A model for the relation between numerical variables: linear regression finds the best straight line through the data.
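The k-means iterations described above can be sketched in a few lines of Python. This is a minimal sketch, not the lecture notebook's code: the 2D points are made up for illustration, Euclidean distance is used, and the loop stops when the assignment of points to centroids no longer changes.

```python
# A minimal k-means sketch: random initial centroids, then alternate
# "assign to nearest centroid" and "update centroids to cluster means".
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # iteration 1: random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assign each observation to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # converged: assignments are stable
            break
        assignment = new_assignment
        # Update each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

# Three made-up blobs of 2D points, for illustration only.
points = [(1, 1), (1.5, 2), (1, 1.5),
          (8, 8), (8.5, 8), (8, 9),
          (1, 8), (1.5, 8.5), (2, 8)]
assignment, centroids = kmeans(points, k=3)
print(assignment)
```

Note that the result depends on the random initial centroids: rerunning with a different seed can give a different (possibly worse) clustering, which is why practical implementations restart k-means several times.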
This also works for more than one X: Y = β0 + β1 X1 + β2 X2.

Which line to choose?
Linear regression: ŷ = β0 + β1 x. The values of β0 and β1 should be such that Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² is minimized. Setting the gradient to zero yields:

β1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
β0 = ȳ − β1 x̄

with x̄ = Σx/n and ȳ = Σy/n.

Check: did we…
◼ Become able to interpret and summarize the key characteristics of tabular data, including rows, columns, and various data types?
◼ Become able to explain the learning process of a perceptron, including weight updates, and describe how it acts as a linear classifier?
◼ Become able to differentiate between supervised and unsupervised learning by comparing their key features and use cases?
◼ Become able to explain the purposes of classification, clustering, and regression, and identify appropriate scenarios for using each method?
◼ Become able to apply key algorithms such as decision trees, k-means clustering, and linear regression to solve classification, clustering, and regression problems?

On Friday
We will learn about accuracy and bias. We will also learn what the best tree, the best clustering and the best linear regression model are, and how they are computed. We try to find optimal predictors!
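As a final hands-on check, the closed-form least-squares formulas from this lecture, β1 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²) and β0 = ȳ − β1 x̄, can be turned into a short script. The data points below are made up to lie exactly on a known line, so the fit can be verified by eye.

```python
# Closed-form simple linear regression, straight from the lecture's formulas.

def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    beta1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    beta0 = sy / n - beta1 * (sx / n)  # beta0 = ybar - beta1 * xbar
    return beta0, beta1

# Made-up points lying exactly on y = 2x + 1: the fit should recover it.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
beta0, beta1 = fit_line(xs, ys)
print(beta0, beta1)  # 1.0 2.0
```

Because the points are exactly collinear, the minimized sum of squared residuals is 0 and the recovered coefficients match the generating line; with noisy data the same formulas give the best straight line through the cloud.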