
Practical Machine Learning Course Notes
Xing Su

Contents

Prediction
In Sample vs Out of Sample Errors
Prediction Study Design
    Sample Division Guidelines for Prediction Study Design
    Picking the Right Data
Types of Errors
    Notable Measurements for Error – Binary Variables
    Notable Measurements for Error – Continuous Variables
Receiver Operating Characteristic Curves
Cross Validation
    Random Subsampling
    K-Fold
    Leave One Out
caret Package (tutorial)
    Data Slicing
    Training Options (tutorial)
    Plotting Predictors (tutorial)
    Preprocessing (tutorial)
Covariate Creation/Feature Extraction
    Creating Dummy Variables
    Removing Zero Covariates
    Creating Splines (Polynomial Functions)
    Multicore Parallel Processing
Preprocessing with Principal Component Analysis (PCA)
    prcomp Function
    caret Package
Predicting with Regression
    R Commands and Examples
Prediction with Trees
    Process
    Measures of Impurity (Reference)
    Constructing Trees with caret Package
Bagging
    Bagging Algorithms
Random Forest
    R Commands and Examples
Boosting
    R Commands and Examples
Model Based Prediction
    Linear Discriminant Analysis
    Naive Bayes
    Compare Results for LDA and Naive Bayes
Model Selection
    Example: Training vs Test Error for Combination of Predictors
    Split Samples
    Decompose Expected Prediction Error
    Hard Thresholding
    Regularized Regression Concept (Resource)
    Regularized Regression - Ridge Regression
    Regularized Regression - LASSO Regression
Combining Predictors
    Example - Majority Vote
    Example - Model Ensembling
Forecasting
    R Commands and Examples
Unsupervised Prediction
    R Commands and Examples
Prediction

- prediction process = population → probability and sampling to pick a set of data → split into training and test sets → build prediction function → predict for new data → evaluate
    - Note: choosing the right dataset and knowing what the specific question is are paramount to the success of the prediction algorithm (Google Flu Trends failed to predict accurately when people's search habits changed)
- components of a predictor = question → input data → features (extracted variables/characteristics) → algorithm → parameters (estimated) → evaluation
- relative order of importance = question (concrete/specific) > data (relevant) > features (properly extracted) > algorithm
- data selection
    - Note: "garbage in = garbage out" → having the correct/relevant data decides whether the model is successful
    - data measuring what you are trying to predict is the most helpful
    - more data → better models (usually)
- feature selection
    - good features lead to data compression, retain relevant information, and are created based on expert domain knowledge
    - common mistakes → automated feature selection (can yield good results but is likely to behave inconsistently with slightly different data), not understanding/dealing with skewed data and outliers, throwing away information unnecessarily
- algorithm selection
    - matters less than one would expect
    - a sensible approach/algorithm is the basis for a successful prediction
    - more complex algorithms can yield incremental improvements
    - ideally interpretable (simple to explain), accurate, and scalable/fast (may leverage parallel computation)
- prediction is effectively about trade-offs
    - find the correct balance among interpretability vs accuracy vs speed vs simplicity vs scalability
    - interpretability is especially important in conveying how features are used to predict the outcome
    - scalability is important because, for an algorithm to be of practical use, it must be implementable on large datasets without incurring large costs (computational complexity/time)
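The prediction process above can be sketched end to end in R. This is a minimal illustration only: the `iris` data and linear-model choice here are arbitrary placeholders, and the caret functions used (`createDataPartition`, `train`) are covered later in the Data Slicing and Predicting with Regression sections.

```r
# sketch of the prediction process: data -> split -> build -> predict -> evaluate
library(caret); data(iris); set.seed(123)

# split into training and test sets (75/25)
inTrain  <- createDataPartition(iris$Sepal.Length, p = 0.75, list = FALSE)
training <- iris[inTrain, ]
testing  <- iris[-inTrain, ]

# build a prediction function (here, a simple linear regression)
modFit <- train(Sepal.Length ~ Petal.Length, data = training, method = "lm")

# predict for new data and evaluate (RMSE on the held-out test set)
pred <- predict(modFit, testing)
sqrt(mean((pred - testing$Sepal.Length)^2))
```

Evaluating on `testing`, which played no role in fitting, is what gives an honest estimate of out of sample performance.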
In Sample vs Out of Sample Errors

- in sample error = error that results from applying your prediction algorithm to the dataset you built it with
    - also known as resubstitution error
    - often optimistic (smaller than on a new sample), as the model may be tuned to the noise of the sample
- out of sample error = error that results from applying your prediction algorithm to a new dataset
    - also known as generalization error
    - out of sample error is the most important, as it better evaluates how the model will perform on new data
- in sample error < out of sample error
    - the reason is over-fitting: the model is too adapted/optimized to the initial dataset
        - data have two parts: signal vs noise
        - the goal of the predictor (which should be simple/robust) = find the signal
        - it is possible to design a highly accurate in-sample predictor, but it captures both signal and noise
        - such a predictor won't perform as well on a new sample
    - often it is better to give up a little accuracy for more robustness when predicting on new data
- example

```r
# load data
library(kernlab); data(spam); set.seed(333)
# pick a small subset (10 values) from the spam data set
smallSpam <- spam[sample(nrow(spam), size = 10), ]
```
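To make the contrast concrete, the example can be continued with a sketch along these lines (the cutoff-tuning below is illustrative, not the course's exact rule): choose a `capitalAve` threshold that maximizes accuracy on the 10-point subset, then apply the same rule to the full `spam` data set. Because the cutoff chases the noise in those few observations, the resubstitution error is typically the more optimistic of the two.

```r
library(kernlab); data(spam); set.seed(333)
smallSpam <- spam[sample(nrow(spam), size = 10), ]

# tune a cutoff to this particular 10-point subset (over-fitting on purpose):
# try each observed capitalAve value as a threshold, keep the most accurate
cands  <- sort(smallSpam$capitalAve)
acc    <- sapply(cands, function(c)
            mean((smallSpam$capitalAve > c) == (smallSpam$type == "spam")))
cutoff <- cands[which.max(acc)]
rule   <- function(x) ifelse(x > cutoff, "spam", "nonspam")

# in sample (resubstitution) error: the subset the rule was tuned to
mean(rule(smallSpam$capitalAve) != smallSpam$type)

# out of sample error: the full spam data set
mean(rule(spam$capitalAve) != spam$type)
```

The gap between the two error rates is the optimism of the in sample error: the tuned threshold encodes noise specific to the 10 sampled rows.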
