Machine Learning (CSE 446): Geometry: Nearest Neighbors and K-means

Summary

This document contains lecture notes from the Machine Learning course CSE 446 at the University of Washington. After a review of overfitting, test/dev sets, inductive bias, and the Bayes optimal classifier, the notes cover geometric methods: feature vectors, Euclidean distance, nearest-neighbor and K-nearest-neighbor classification, and K-means clustering. These are course notes, not a past exam paper.

Full Transcript


Machine Learning (CSE 446): Geometry: Nearest Neighbors and K-means
Sham M Kakade, © 2018 University of Washington, [email protected]

Review

Danger: Overfitting

[Plot: error rate (lower is better) versus depth of the decision tree, with one curve for training data and one for unseen data; the gap between them is labeled "overfitting".]

Estimating your Error and Some help in avoiding overfitting

- Test set: use these to estimate the performance of your learning algorithm. The cardinal rule of machine learning: don't touch your test data.
- Dev set: use this during learning to help avoid overfitting. If we have hyperparameters (like depth or width), we can tune these on development data.

One limit of learning: Inductive Bias

- Just as you had a tendency to focus on a certain type of function f, you want your algorithm to be "biased" towards the correct classifier (so that it can learn with a "small" number of examples).
- BUT remember there is "no free lunch": this "bias" means you must do worse on other (hopefully unrealistic) problems.

Another Limit of Learning: The Bayes Optimal Classifier

    f^(BO)(x) = argmax_y D(x, y)

Theorem: the Bayes optimal classifier achieves the minimal expected classification error (e.g. zero/one loss, ℓ(y, ŷ) = ⟦y ≠ ŷ⟧) of any deterministic classifier. See CIML and the lecture notes for a proof.

Today

Features

Data derived from https://archive.ics.uci.edu/ml/datasets/Auto+MPG:

    mpg; cylinders; displacement; horsepower; weight; acceleration; year; origin

- All features are really represented as real values (they are really "tuples").
- The "1-2-3" values of origin suggest ordinality, which is misleading.
- Side note: we can convert the discrete origin feature into three binary features as follows:
    1/america → (1, 0, 0)
    2/europe  → (0, 1, 0)
    3/asia    → (0, 0, 1)

Instance x Becomes Vector x

The first example in the data, "Chevrolet Chevelle Malibu," becomes:

    [8, 307.0, 130.0, 3504, 12.0, 70, 1, 0, 0]

"Buick Skylark 320" becomes:

    [8, 350.0, 165.0, 3693, 11.5, 70, 1, 0, 0]

Euclidean Distance

General formula for the Euclidean distance between two d-length vectors:

    dist(x, x') = sqrt( sum_{j=1}^{d} (x[j] − x'[j])^2 ) = ||x − x'||_2

The distance between the Chevrolet Chevelle Malibu and the Buick Skylark 320:

    sqrt( (8−8)^2 + (307−350)^2 + (130−165)^2 + (3504−3693)^2 + (12−11.5)^2 + (70−70)^2 + (1−1)^2 + (0−0)^2 + (0−0)^2 )
      = sqrt( 1849 + 1225 + 35721 + 0.25 )
      ≈ 196.965

Training Data in R^d

[figure slide]

Classifying a New Example in R^d

[figure slide]

Nearest Neighbor Classifier

    Algorithm 1: NNTest
    Data: training data D = ⟨(x_n, y_n)⟩_{n=1}^{N}, input x
    Result: predicted class
    let n* = argmin_{n ∈ {1, ..., N}} dist(x_n, x);
    return y_{n*};

Concept: Decision Boundary

[two figure slides]

Classifying a New Example in R^d

[figure slide]

K-Nearest Neighbors Classifier

    Algorithm 2: KNNTest
    Data: training data D = ⟨(x_n, y_n)⟩_{n=1}^{N}, input x
    Result: predicted class
    S = ∅;
    for n ∈ {1, ..., N} do
        S = S ∪ {(dist(x_n, x), y_n)};
    end
    # sort on distances
    L = sort(S);
    return majorityClass(L[1], ..., L[K]);

K-Nearest Neighbors: Comments

- Inductive bias:
    - Neighbors have the same label; classes align to contiguous "regions" in feature space.
    - All features are equally important.
- What is the training error of 1-NN?
- Should K be even or odd?
- What about high dimensions?

(Short Python sketches of the origin encoding, the distance computation, and the nearest-neighbor classifiers follow.)
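A minimal Python sketch of the one-hot conversion of the origin feature described in the Features slide; the dictionary and the helper name encode_origin are illustrative choices, not from the slides.

    # One-hot encoding of the discrete `origin` feature (1/america, 2/europe, 3/asia).
    ORIGIN_ONE_HOT = {
        1: (1, 0, 0),  # america
        2: (0, 1, 0),  # europe
        3: (0, 0, 1),  # asia
    }

    def encode_origin(origin):
        """Map the ordinal-looking origin code to three binary features."""
        return ORIGIN_ONE_HOT[origin]

    print(encode_origin(1))  # (1, 0, 0)
    print(encode_origin(3))  # (0, 0, 1)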
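A short, dependency-free check of the Euclidean-distance computation above, using the two hand-built feature vectors from the "Instance x Becomes Vector x" slide; the function and variable names are mine.

    import math

    def dist(x, x_prime):
        """Euclidean distance between two equal-length feature vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))

    # Feature order: cylinders, displacement, horsepower, weight, acceleration,
    # year, then the one-hot origin (america, europe, asia). mpg is not a feature here.
    chevelle_malibu = [8, 307.0, 130.0, 3504, 12.0, 70, 1, 0, 0]
    skylark_320     = [8, 350.0, 165.0, 3693, 11.5, 70, 1, 0, 0]

    print(dist(chevelle_malibu, skylark_320))  # ≈ 196.965, as on the slide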
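A minimal Python rendering of Algorithm 1 (NNTest) and Algorithm 2 (KNNTest); Euclidean distance is assumed, the toy training set is made up for illustration, and ties in the majority vote are broken arbitrarily by Counter.

    import math
    from collections import Counter

    def dist(x, x_prime):
        """Euclidean distance, as in the Euclidean Distance slide."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))

    def nn_test(train, x):
        """Algorithm 1 (NNTest): label of the single nearest training example."""
        _, y_star = min(train, key=lambda pair: dist(pair[0], x))
        return y_star

    def knn_test(train, x, k):
        """Algorithm 2 (KNNTest): sort examples by distance to x and return the
        majority class among the K closest."""
        by_distance = sorted((dist(x_n, x), y_n) for x_n, y_n in train)
        top_k_labels = [y_n for _, y_n in by_distance[:k]]
        return Counter(top_k_labels).most_common(1)[0][0]

    # Toy 2-d training data with labels "a"/"b", purely for illustration:
    train = [([0.0, 0.0], "a"), ([0.0, 1.0], "a"), ([5.0, 5.0], "b"), ([6.0, 5.0], "b")]
    print(nn_test(train, [0.5, 0.5]))      # "a"
    print(knn_test(train, [4.0, 4.0], 3))  # "b"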
Detour: Unsupervised Learning

Unsupervised Learning

The training dataset consists only of ⟨x_n⟩_{n=1}^{N}. There might, or might not, be a test set with correct classes y. The simplest kind of unsupervised learning: cluster the data into K groups.

How should we cluster the points?

This is a "chicken-and-egg" problem:

- If we know which points are grouped together, then it is easy to figure out the mean of any cluster.
- If we know the means of the clusters, then it is easy to group the points together.

So how do we cluster our points?

An Iterative Clustering Algorithm

The slides step through the algorithm as a sequence of figures (a Python sketch follows below):

- The stars are cluster centers, randomly initialized.
- Assign each example to its nearest cluster center.
- Recalculate cluster centers to reflect their respective examples.
- Repeat the assignment and recalculation steps (the slides run three such rounds).
- At that point, nothing will change; we have converged.
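A compact Python sketch of the iterative clustering loop walked through above: randomly initialize K centers, assign each example to its nearest center, recompute each center as the mean of its examples, and stop when the assignments no longer change. The function names, the convergence test, and the toy data are my choices, not from the slides.

    import math
    import random

    def dist(x, x_prime):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))

    def mean(points):
        """Coordinate-wise mean of a non-empty list of points."""
        d = len(points[0])
        return [sum(p[j] for p in points) / len(points) for j in range(d)]

    def kmeans(points, k, seed=0):
        """Alternate assignment and update steps until nothing changes."""
        rng = random.Random(seed)
        centers = [list(p) for p in rng.sample(points, k)]  # random initialization
        assignment = None
        while True:
            # Assignment step: each example goes to its nearest cluster center.
            new_assignment = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
            if new_assignment == assignment:  # converged: nothing changed
                return centers, assignment
            assignment = new_assignment
            # Update step: recalculate each center as the mean of its examples.
            for c in range(k):
                members = [p for p, a in zip(points, assignment) if a == c]
                if members:  # keep the old center if a cluster happens to be empty
                    centers[c] = mean(members)

    # Toy 2-d data with two obvious groups:
    data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    centers, labels = kmeans(data, k=2)
    print(centers)
    print(labels)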
