Automated Machine Learning (AutoML) PDF

Because learning changes everything.®

Chapter 07: Automated Machine Learning
Part 03: Analytical Methods for Supervised Learning

© McGraw Hill LLC. All rights reserved. No reproduction or distribution without the prior written consent of McGraw Hill LLC.

Chapter 07 Learning Objectives
LO 07-01: Define Automated Machine Learning.
LO 07-02: Identify and compare various uses of automated modeling.
LO 07-03: Investigate the Automated Machine Learning Process.
LO 07-04: Summarize the value of ensemble models.
LO 07-05: Construct and assess an Automated Machine Learning model.

What Is Automated Machine Learning (AutoML)?
Recall there are two main types of analytic methods. Supervised methods have a defined target variable. Unsupervised methods have no target variable.
Running each supervised technique individually and comparing accuracy results is too time consuming.
An efficient alternative is Automated Machine Learning (AutoML). This is a supervised approach that explores and selects models using different algorithms and compares their predictive performance.
Users still must understand the underlying elements involved in developing the model.

What Questions Might Arise?
How was the data collected and prepared for analysis?
How did the model arrive at a particular conclusion?
What is the blueprint of the model?
What are the reasons the recommended model produced the most accurate decision?
Are there data issues that could be impacting the validity of the model?
Why did the model arrive at a particular conclusion?
Is the model consistent in its predictions?
What variables had the greatest impact on the predicted outcome?
Why is the model a good predictor?
What patterns exist in the data?
How accurate is the model?

AutoML in Marketing
Forty percent of companies report already using machine learning to improve sales and marketing performance.
The adoption rate for AutoML is expected to increase substantially.

Which Companies Are Actively Using AutoML?
Facebook.
Blue Health Intelligence (BHI).
AirBnB.
United Airlines.
Sumitomo Mitsui Banking Corporation (SMBC).
URBN.
Disney.
Kroger.
Pelephone.
The Philadelphia 76ers.
Salesforce Einstein.

What Are Key Steps in the Automated Machine Learning Process?
There are four key steps.
Preparing the data.
Building models.
Creating ensemble models.
Recommending models.

Data Preparation
Data preparation may include handling:
Missing data.
Outliers.
Variable selection.
Data transformation.
Data standardization in order to maintain a common format.
Invalid and unreliable data result in “garbage in, garbage out.”
Appropriate data preparation is a fundamental first step in producing accurate model predictions.

Model Building
Many models are built automatically after the analyst specifies the dependent variable.
The purpose of a model is to extract insights from data.
AutoML uses pre-established modeling techniques that make modeling accessible to anyone from novices to data science experts.

Creating Ensemble Models
Sometimes the best approach is to combine different algorithms, blending information from more than one model into a single “super model.” This type of model is referred to as an ensemble model.
This process reduces issues such as noise, bias, and inconsistent or skewed variance that cause prediction problems.
An ensemble model usually generates the best overall predictive performance.
Keep in mind that understanding how different variables have contributed to an outcome can be difficult.

Simple Approaches to Ensemble Modeling
For continuous target variables, one method is to take the average of predictions from multiple models.
With two models, for example, you first run each model separately to create two prediction scores. Then you average the two scores to create a new ensemble score.
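The averaging approach for continuous targets can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular AutoML tool; the function name `average_ensemble` and the prediction scores are hypothetical.

```python
from statistics import mean

def average_ensemble(*model_predictions):
    """Average the prediction scores row by row across models."""
    lengths = {len(p) for p in model_predictions}
    if len(lengths) != 1:
        raise ValueError("every model must score the same observations")
    return [mean(scores) for scores in zip(*model_predictions)]

# Hypothetical prediction scores from two models that were run separately
model_a = [200.0, 150.0, 90.0]
model_b = [220.0, 130.0, 110.0]

ensemble = average_ensemble(model_a, model_b)
print(ensemble)  # [210.0, 140.0, 100.0]
```

Each ensemble score sits between the two model scores for that observation, which is why averaging tends to smooth out the errors of any single model.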
Another more advanced technique involves using a weighted average. The higher quality data would be assigned greater importance and thus weighted higher.
For categorical target variables, the most common category, or “majority” rule, can be used.

Advanced Ensemble Methods – Bagging
Bagging, short for “Bootstrap Aggregating,” involves two main steps.
Step 1 generates multiple random small samples from the larger sample. Because an observation is only copied, not removed from the original sample, it can be copied again and placed in a second or third sample. This process is referred to as “bootstrap sampling.”
Step 2 is to execute a model on each sample and then combine the results. For continuous outcomes, the combined result is the average across all samples. For categorical outcomes, it is the majority of case results.

Exhibit 7-3: Bagging (Bootstrap Aggregating)
Naik, Amey. “Bagging: Machine Learning through visuals. #1: What is ‘Bagging’ ensemble learning?” Medium. June 24, 2018. https://medium.com/machine-learning-through-visuals/machine-learning-through-visuals-part-1-what-is-bagging-ensemble-learning-432059568cc8.

Advanced Ensemble Methods – Boosting
The objective of boosting is to reduce error in the model. Boosting achieves this by observing the error records in a model and then oversampling misclassified records in the next model created.
During the first step, the model is applied to a sample of the data.
A new sample is drawn that is more likely to select records that were misclassified in the first model.
Next, the second model is applied to the new sample.
The steps are repeated multiple times by fitting a model over and over.
The purpose of boosting is to improve performance and reduce misclassification. The final model generally has better predictive performance than any of the individual models.

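The bagging and boosting steps described above can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions, not AdaBoost, gradient boosting, or any library's implementation: the stand-in models (a sample mean and a one-threshold rule), the weight-doubling rule, and all data values are hypothetical.

```python
import random
from statistics import mean

def bootstrap_sample(records, size, rng, weights=None):
    """Copy `size` records at random. Records are copied, not removed,
    so the same record can land in a sample more than once."""
    return rng.choices(records, weights=weights, k=size)

def bagging_predict(values, n_samples, sample_size, seed=0):
    """Bagging for a continuous outcome: one stand-in model per sample, averaged."""
    rng = random.Random(seed)
    # Stand-in "model": predict the sample mean (an assumption for brevity)
    per_sample = [mean(bootstrap_sample(values, sample_size, rng))
                  for _ in range(n_samples)]
    return mean(per_sample)

def fit_stump(sample):
    """Stand-in classifier: threshold t (predict 1 when x > t) with fewest errors."""
    def errors(t):
        return sum((x > t) != (label == 1) for x, label in sample)
    return min((x for x, _ in sample), key=errors)

def boosting_rounds(records, rounds, sample_size, seed=0):
    """Boosting by resampling: misclassified records become more likely to be drawn."""
    rng = random.Random(seed)
    weights = [1.0] * len(records)
    thresholds = []
    for _ in range(rounds):
        sample = bootstrap_sample(records, sample_size, rng, weights)
        t = fit_stump(sample)
        thresholds.append(t)
        for i, (x, label) in enumerate(records):
            if (x > t) != (label == 1):
                weights[i] *= 2.0  # assumed reweighting rule, for illustration
    return thresholds, weights

def majority_vote(thresholds, x):
    """Combine the fitted rules for a categorical outcome by majority rule."""
    votes = sum(x > t for t in thresholds)
    return 1 if votes * 2 > len(thresholds) else 0

# Hypothetical data: continuous values for bagging, (x, label) pairs for boosting
loan_amounts = [5.0, 7.5, 9.0, 12.0, 15.0, 18.5, 22.0, 30.0]
labelled = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 0), (7, 1), (8, 1)]

bagged = bagging_predict(loan_amounts, n_samples=25, sample_size=5)
thresholds, weights = boosting_rounds(labelled, rounds=5, sample_size=8)
print(bagged, thresholds, weights, majority_vote(thresholds, 7.5))
```

The only difference between the two sampling steps is the `weights` argument: bagging draws every record with equal probability, while the boosting loop raises the draw probability of records the previous model got wrong.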
Exhibit 7-4: Boosting

Model Recommendation
Multiple predictive models are examined and the model with the most accurate predictions is recommended.
Accuracy is determined by how well a model identifies relationships and patterns in a dataset and uses this knowledge to predict outcomes.
Higher accuracy means better predictions for observations that were not in the original dataset used to develop the model.
The most accurate prediction model(s) is then used to make better decisions.

Case Study – Loan Data: Understanding When and How to Support Fiscal Responsibility in Customers
Lending Club is a P2P lending platform that wants to reduce its default rate from 10 to 8 percent within a year.
A supervised model is needed to identify borrowers with a high chance of default.
You will upload data into DataRobot for AutoML analysis. After identifying the target variable, you run the model and evaluate the results before applying the model to predict new cases.
The AutoML results identified 2 of 11 customers as more likely to default on their loans.
Lending Club can send targeted messages to these customers about bill paying, penalties, and free access to financial advisors.

www.mheducation.com
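The model recommendation step described earlier, where candidates are compared on observations that were not used to develop them, can be sketched as follows. This is a minimal illustration and not how DataRobot or any other AutoML product ranks models; the two candidate models, the debt-to-income feature, and every data value are hypothetical.

```python
def fit_majority(train):
    """Baseline: always predict the most common label in the training data."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_threshold(train):
    """Rule model: predict default (1) when x exceeds the best training threshold."""
    def errors(t):
        return sum((x > t) != (label == 1) for x, label in train)
    best = min((x for x, _ in train), key=errors)
    return lambda x: 1 if x > best else 0

def accuracy(model, rows):
    """Share of rows whose predicted label matches the actual label."""
    return sum(model(x) == label for x, label in rows) / len(rows)

def recommend(candidates, train, holdout):
    """Fit every candidate on train, score it on holdout, return the most accurate."""
    scores = {name: accuracy(fit(train), holdout)
              for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical (debt-to-income ratio, defaulted) records
train = [(0.10, 0), (0.15, 0), (0.22, 0), (0.30, 0),
         (0.35, 1), (0.41, 1), (0.18, 0), (0.45, 1)]
holdout = [(0.12, 0), (0.38, 1), (0.28, 0), (0.50, 1)]

best, scores = recommend(
    {"majority": fit_majority, "threshold": fit_threshold}, train, holdout
)
print(best, scores)  # threshold {'majority': 0.5, 'threshold': 1.0}
```

Scoring on the holdout rows rather than the training rows is the point of the sketch: a model that only memorizes the data it was built on would look accurate on `train` but not on `holdout`.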
