Week1 A Machine_Learning_for_Predictive_Data_Analytics.pdf

Full Transcript


CP322 Machine Learning
Jiashu (Jessie) Zhao

Evaluation

References
- Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. By John D. Kelleher, Brian Mac Namee and Aoife D'Arcy (ISBN: 9780262331722).
- Pattern Recognition and Machine Learning. By Christopher Bishop (ISBN-13: 978-0387310732).
- Introduction to Machine Learning. By Ethem Alpaydin (ISBN-13: 9780262028189).

What is Predictive Data Analytics? What is ML? How Does ML Work? Underfitting/Overfitting Lifecycle Summary

Introduction
1 What is Predictive Data Analytics?
2 What is Machine Learning?
3 How Does Machine Learning Work?
4 What Can Go Wrong With ML?
5 The Predictive Data Analytics Project Lifecycle: CRISP-DM
6 Summary

What is Predictive Data Analytics?

Predictive Data Analytics encompasses the business and data processes and computational models that enable a business to make data-driven decisions.

Figure: Predictive data analytics moving from data to insights to decisions.

Example Applications:
- Price Prediction
- Fraud Detection
- Dosage Prediction
- Risk Assessment
- Propensity Modelling
- Diagnosis
- Document Classification
- ...

What is Machine Learning?
(Supervised) Machine Learning techniques automatically learn a model of the relationship between a set of descriptive features and a target feature from a set of historical examples.

- Classification: categorical target
- Regression: continuous target

Unsupervised learning
Semi-supervised learning
Reinforcement learning

Figure: Using machine learning to induce a prediction model from a training dataset.

Figure: Using the model to make predictions for new query instances.

ID  OCCUPATION    AGE  LOAN-SALARY RATIO  OUTCOME
1   industrial    34   2.96               repaid
2   professional  41   4.64               default
3   professional  36   3.22               default
4   professional  41   3.11               default
5   industrial    48   3.80               default
6   industrial    61   2.52               repaid
7   professional  37   1.50               repaid
8   professional  40   1.93               repaid
9   industrial    33   5.25               default
10  industrial    32   4.15               default

What is the relationship between the descriptive features (OCCUPATION, AGE, LOAN-SALARY RATIO) and the target feature (OUTCOME)?

if LOAN-SALARY RATIO > 3 then
    OUTCOME = 'default'
else
    OUTCOME = 'repay'
end if

This is an example of a prediction model.
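As a minimal sketch, the one-feature loan rule can be written as a small Python function and applied to new query instances. The function name `predict_outcome`, the tuple layout and the two query values are illustrative assumptions, not part of the slides. The rule's 'repay' label is written as "repaid" here so that it matches the OUTCOME values in the table.

```python
# Training instances copied from the table:
# (ID, OCCUPATION, AGE, LOAN-SALARY RATIO, OUTCOME)
LOANS = [
    (1, "industrial", 34, 2.96, "repaid"),
    (2, "professional", 41, 4.64, "default"),
    (3, "professional", 36, 3.22, "default"),
    (4, "professional", 41, 3.11, "default"),
    (5, "industrial", 48, 3.80, "default"),
    (6, "industrial", 61, 2.52, "repaid"),
    (7, "professional", 37, 1.50, "repaid"),
    (8, "professional", 40, 1.93, "repaid"),
    (9, "industrial", 33, 5.25, "default"),
    (10, "industrial", 32, 4.15, "default"),
]


def predict_outcome(loan_salary_ratio):
    """One-feature rule from the slide: a ratio above 3 predicts default."""
    return "default" if loan_salary_ratio > 3 else "repaid"


# Hypothetical query instances (assumed values, not from the slides).
query_ratios = [2.10, 3.75]
query_predictions = [predict_outcome(r) for r in query_ratios]

# Predictions for the historical examples, in table order.
training_predictions = [predict_outcome(ratio) for (_, _, _, ratio, _) in LOANS]

print(query_predictions)
print(training_predictions)
```

Only the LOAN-SALARY RATIO column is read by the function; OCCUPATION and AGE are carried along in `LOANS` but never used by this model.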
The LOAN-SALARY RATIO rule is also an example of a consistent prediction model.

Consistent: the model makes a correct prediction for every instance in the dataset.

Notice that this model does not use all the features, and the feature that it uses is a derived feature (in this case a ratio). Feature design and feature selection are two important topics that we will return to again and again.

What is the relationship between the descriptive features and the target feature (OUTCOME) in the following dataset?
ID  Amount   Salary  Loan-Salary Ratio  Age  Occupation    House      Type  Outcome
1   245,100  66,400  3.69               44   industrial    farm       stb   repaid
2   90,600   75,300  1.2                41   industrial    farm       stb   repaid
3   195,600  52,100  3.75               37   industrial    farm       ftb   default
4   157,800  67,600  2.33               44   industrial    apartment  ftb   repaid
5   150,800  35,800  4.21               39   professional  apartment  stb   default
6   133,000  45,300  2.94               29   industrial    farm       ftb   default
7   193,100  73,200  2.64               38   professional  house      ftb   repaid
8   215,000  77,600  2.77               17   professional  farm       ftb   repaid
9   83,000   62,500  1.33               30   professional  house      ftb   repaid
10  186,100  49,200  3.78               30   industrial    house      ftb   default
11  161,500  53,300  3.03               28   professional  apartment  stb   repaid
12  157,400  63,900  2.46               30   professional  farm       stb   repaid
13  210,000  54,200  3.87               43   professional  apartment  ftb   repaid
14  209,700  53,000  3.96               39   industrial    farm       ftb   default
15  143,200  65,300  2.19               32   industrial    apartment  ftb   default
16  203,000  64,400  3.15               44   industrial    farm       ftb   repaid
17  247,800  63,800  3.88               46   industrial    house      stb   repaid
18  162,700  77,400  2.1                37   professional  house      ftb   repaid
19  213,300  61,100  3.49               21   industrial    apartment  ftb   default
20  284,100  32,300  8.8                51   industrial    farm       ftb   default
21  154,000  48,900  3.15               49   professional  house      stb   repaid
22  112,800  79,700  1.42               41   professional  house      ftb   repaid
23  252,000  59,700  4.22               27   professional  house      stb   default
24  175,200  39,900  4.39               37   professional  apartment  stb   default
25  149,700  58,600  2.55               35   industrial    farm       stb   default

if LOAN-SALARY RATIO < 1.5 then
    OUTCOME = 'repay'
else if LOAN-SALARY RATIO > 4 then
    OUTCOME = 'default'
else if AGE < 40 and OCCUPATION = 'industrial' then
    OUTCOME = 'default'
else
    OUTCOME = 'repay'
end if
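The multi-feature rule can be checked against the 25 instances in the same way. The sketch below is one possible encoding, not the course's own code: the names `ROWS`, `predict` and `mismatches` are assumptions, only the three columns the rule reads are kept, and 'repay' is again written as "repaid" to match the table's Outcome labels.

```python
# (ID, Loan-Salary Ratio, Age, Occupation, Outcome) copied from the table.
# Amount, Salary, House and Type are omitted because the rule ignores them.
ROWS = [
    (1, 3.69, 44, "industrial", "repaid"),
    (2, 1.2, 41, "industrial", "repaid"),
    (3, 3.75, 37, "industrial", "default"),
    (4, 2.33, 44, "industrial", "repaid"),
    (5, 4.21, 39, "professional", "default"),
    (6, 2.94, 29, "industrial", "default"),
    (7, 2.64, 38, "professional", "repaid"),
    (8, 2.77, 17, "professional", "repaid"),
    (9, 1.33, 30, "professional", "repaid"),
    (10, 3.78, 30, "industrial", "default"),
    (11, 3.03, 28, "professional", "repaid"),
    (12, 2.46, 30, "professional", "repaid"),
    (13, 3.87, 43, "professional", "repaid"),
    (14, 3.96, 39, "industrial", "default"),
    (15, 2.19, 32, "industrial", "default"),
    (16, 3.15, 44, "industrial", "repaid"),
    (17, 3.88, 46, "industrial", "repaid"),
    (18, 2.1, 37, "professional", "repaid"),
    (19, 3.49, 21, "industrial", "default"),
    (20, 8.8, 51, "industrial", "default"),
    (21, 3.15, 49, "professional", "repaid"),
    (22, 1.42, 41, "professional", "repaid"),
    (23, 4.22, 27, "professional", "default"),
    (24, 4.39, 37, "professional", "default"),
    (25, 2.55, 35, "industrial", "default"),
]


def predict(ratio, age, occupation):
    """The if / else-if rule from the slide, tested in the same order."""
    if ratio < 1.5:
        return "repaid"
    if ratio > 4:
        return "default"
    if age < 40 and occupation == "industrial":
        return "default"
    return "repaid"


# IDs of instances where the rule disagrees with the recorded outcome.
mismatches = [
    row_id
    for (row_id, ratio, age, occupation, outcome) in ROWS
    if predict(ratio, age, occupation) != outcome
]
is_consistent = not mismatches
print(is_consistent, mismatches)
```

With the values as transcribed in the table, the rule agrees with all 25 recorded outcomes, so it is consistent with this dataset in the sense defined earlier.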
The real value of machine learning becomes apparent in situations like this, when we want to build prediction models from large datasets with multiple features.

How Does Machine Learning Work?

Machine learning algorithms work by searching through a set of possible prediction models for the model that best captures the relationship between the descriptive features and the target feature.

An obvious search criterion to drive this search is to look for models that are consistent with the data.

However, because a training dataset is only a sample, ML is an ill-posed problem. An ill-posed problem is a problem for which a unique solution cannot be determined using only the information that is available.
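To make the idea of searching a model space concrete, here is a minimal sketch over a deliberately tiny space: rules of the form "LOAN-SALARY RATIO > t predicts default", with t on a 0.01 grid from 2.50 to 3.50. The grid, the names `DATA`, `is_consistent` and `consistent`, and the use of integer hundredths (to avoid floating-point comparisons) are assumptions for illustration. The data are the ten loan instances from the first loan table.

```python
# (LOAN-SALARY RATIO in hundredths, OUTCOME) for the ten loan instances.
DATA = [
    (296, "repaid"), (464, "default"), (322, "default"), (311, "default"),
    (380, "default"), (252, "repaid"), (150, "repaid"), (193, "repaid"),
    (525, "default"), (415, "default"),
]


def is_consistent(threshold):
    """True if 'ratio > threshold means default' is right for every instance."""
    return all(
        (ratio > threshold) == (outcome == "default")
        for ratio, outcome in DATA
    )


# Assumed model space: thresholds 2.50, 2.51, ..., 3.50 (in hundredths).
candidates = range(250, 351)

# The search: keep every candidate model that is consistent with the data.
consistent = [t for t in candidates if is_consistent(t)]

print(len(consistent), consistent[0] / 100, consistent[-1] / 100)
```

Every threshold from 2.96 to 3.10 classifies all ten instances correctly, so even in this small model space consistency alone does not single out one model.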
Table: A simple retail dataset

ID  BBY  ALC  ORG  GRP
1   no   no   no   couple
2   yes  no   yes  family
3   yes  yes  no   family
4   no   no   yes  couple
5   no   yes  yes  single

Descriptive features indicate whether a customer buys a certain type of product.
BBY: baby food
ALC: alcohol
ORG: organic vegetables

Table: A full set of potential prediction models before any training data becomes available.

BBY  ALC  ORG  GRP  M1      M2      M3      M4      M5      ...  M6,561
no   no   no   ?    couple  couple  single  couple  couple       couple
no   no   yes  ?    single  couple  single  couple  couple       single
no   yes  no   ?    family  family  single  single  single       family
no   yes  yes  ?    single  single  single  single  single  ...  couple
yes  no   no   ?    couple  couple  family  family  family       family
yes  no   yes  ?    couple  family  family  family  family       couple
yes  yes  no   ?    single  family  family  family  family       single
yes  yes  yes  ?    single  single  family  family  couple       family

Table: A sample of the models that are consistent with the training data

BBY  ALC  ORG  GRP     M1      M2      M3      M4      M5      ...  M6,561
no   no   no   couple  couple  couple  single  couple  couple       couple
no   no   yes  couple  single  couple  single  couple  couple       single
no   yes  no   ?       family  family  single  single  single       family
no   yes  yes  single  single  single  single  single  single  ...  couple
yes  no   no   ?       couple  couple  family  family  family       family
yes  no   yes  family  couple  family  family  family  family       couple
yes  yes  no   family  single  family  family  family  family       single
yes  yes  yes  ?       single  single  family  family  couple       family

Of the models shown, M2, M4 and M5 agree with every known GRP value; M1, M3 and M6,561 each contradict at least one training instance.

Notice that there is more than one candidate model left! It is because a single consistent model cannot be found based on a sample training dataset that ML is ill-posed.

Consistency ≈ memorizing the dataset.
Consistency with noise in the data isn't desirable.
Goal: a model that generalises beyond the dataset and that isn't influenced by the noise in the dataset.
So what criteria should we use for choosing between models?
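The counts behind the retail tables can be reproduced by brute force. The sketch below assumes that a "model" is a lookup table assigning one of the three GRP values (single, couple, family) to each of the eight BBY/ALC/ORG combinations, which gives 3^8 = 6,561 models and matches the M6,561 column label. The names `GROUPS`, `COMBOS`, `TRAINING`, `all_models` and `consistent_models` are illustrative.

```python
from itertools import product

GROUPS = ("single", "couple", "family")

# All 2^3 combinations of the binary features (BBY, ALC, ORG).
COMBOS = list(product(("no", "yes"), repeat=3))

# The five training instances from the simple retail dataset.
TRAINING = {
    ("no", "no", "no"): "couple",
    ("yes", "no", "yes"): "family",
    ("yes", "yes", "no"): "family",
    ("no", "no", "yes"): "couple",
    ("no", "yes", "yes"): "single",
}

# Each model maps every feature combination to one group.
all_models = [
    dict(zip(COMBOS, labels))
    for labels in product(GROUPS, repeat=len(COMBOS))
]

# Keep the models that reproduce every training instance.
consistent_models = [
    model
    for model in all_models
    if all(model[features] == group for features, group in TRAINING.items())
]

print(len(all_models), len(consistent_models))
```

Under this assumption, five of the eight combinations are fixed by the training data and three remain free, so 3^3 = 27 models are still consistent. The training data narrows the search but cannot pick a single model.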
