Machine Learning Lecture Notes PDF
Dr. Nor Azuana Ramli
Summary
This document is a lecture note on the fundamental concepts of machine learning. It covers an introduction to machine learning and the different types of machine learning. It also explains the bias-variance trade-off, overfitting and how to avoid it, as well as cross-validation.
Full Transcript
BSD3523 CHAPTER 1: FUNDAMENTAL CONCEPTS OF MACHINE LEARNING

CONTENTS
Introduction to Machine Learning
Machine Learning Pipeline
Machine Learning Applications
Machine Learning with Python
The Bias-Variance Trade-Off
Overfitting and Underfitting
Avoiding Overfitting

COURSE OUTCOMES
By the end of this chapter, you should be able to:
- Understand the meaning and concept of machine learning.
- Know the terms used in machine learning such as bias, variance, underfit and overfit.
- Know how machine learning is applied in the real world.

WHAT IS MACHINE LEARNING?
"Machine Learning is defined as the study of computer programs that leverage algorithms and statistical models to learn through inference and patterns without being explicitly programmed."

Artificial Intelligence VS Machine Learning
AI:
- AI allows a machine to simulate human intelligence to solve problems.
- The goal is to develop an intelligent system that can perform complex tasks.
- We build systems that can solve complex tasks like a human.
- AI has a wide scope of applications.
- AI uses technologies in a system so that it mimics human decision-making.
- AI works with all types of data: structured, semi-structured and unstructured.
- AI systems use logic and decision trees to learn, reason and self-correct.
ML:
- ML allows a machine to learn autonomously from past data.
- The goal is to build machines that can learn from data to increase the accuracy of the output.
- We train machines with data to perform specific tasks and deliver accurate results.
- Machine learning has a limited scope of applications.
- ML uses self-learning algorithms to produce predictive models.
- ML can only use structured and semi-structured data.
- ML systems rely on statistical models to learn and can self-correct when provided with new data.

MACHINE LEARNING VS DATA MINING

TYPES OF MACHINE LEARNING

SUPERVISED LEARNING
Supervised learning algorithms are trained using labelled examples, that is, inputs where the desired output is known. For example, a segment of text could have a category label, such as:
- Spam vs legitimate email
- Positive vs negative movie review
The network receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with the correct outputs to find errors. It then modifies the model accordingly. Supervised learning is commonly used in applications where historical data predicts likely future events.

UNSUPERVISED LEARNING
Unsupervised machine learning holds the advantage of being able to work with unlabeled data. This means that human labor is not required to make the dataset machine-readable, allowing much larger datasets to be worked on by the program.
In supervised learning, the labels allow the algorithm to find the exact nature of the relationship between any two data points. Unsupervised learning, however, has no labels to work from, so it instead creates hidden structures. Relationships between data points are perceived by the algorithm in an abstract manner, with no input required from human beings.
The creation of these hidden structures is what makes unsupervised learning algorithms versatile. Instead of a defined and set problem statement, unsupervised learning algorithms can adapt to the data by dynamically changing hidden structures. This offers more post-deployment development than supervised learning algorithms.
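To make the contrast concrete, here is a minimal scikit-learn sketch, not taken from the original slides: the iris dataset, the LogisticRegression classifier, the KMeans clusterer and all parameter values are illustrative assumptions. It trains a supervised model on labelled data and then runs an unsupervised clustering algorithm on the same inputs with the labels withheld.

# Illustrative sketch only: supervised vs unsupervised learning on the iris data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are given, so the model learns the input-output mapping
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("Supervised test accuracy:", clf.score(X_test, y_test))

# Unsupervised: the labels are withheld; KMeans looks for hidden structure (clusters)
km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(X)
print("Cluster sizes:", [list(clusters).count(c) for c in range(3)])

The classifier is scored against the known labels, while KMeans only groups similar points together; which approach is appropriate depends on whether labelled data is available.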
SEMI-SUPERVISED LEARNING
A learning problem that involves a small number of labeled examples and a large number of unlabeled examples. Learning problems of this type are challenging, as neither supervised nor unsupervised learning algorithms are able to make effective use of the mixture of labeled and unlabeled data. As such, specialized semi-supervised learning algorithms are required.
One way to do semi-supervised learning is to combine clustering and classification algorithms. There are other ways to do semi-supervised learning, including semi-supervised support vector machines (S3VM), a technique introduced at the 1998 NIPS conference.

REINFORCEMENT LEARNING
✓ Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.
✓ In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions and learn through trial and error.
✓ This learning method has been adopted in artificial intelligence (AI) as a way of directing unsupervised machine learning through rewards and penalties.
✓ Current use cases include gaming, resource management, personalized recommendations, robotics and others.
✓ Gaming is likely the most common usage field for reinforcement learning; it is capable of achieving superhuman performance in numerous games. A common example involves the game Pac-Man.
✓ One of the barriers to deployment of this type of machine learning is its reliance on exploration of the environment.
✓ The time required to ensure the learning is done properly through this method can limit its usefulness and be intensive on computing resources. As the training environment grows more complex, so too do the demands on time and compute resources.

STAGES OF MACHINE LEARNING

MACHINE LEARNING WITH PYTHON
The scikit-learn package is used since it is the most popular ML package for Python and has many algorithms built in. To install it, just run "pip install scikit-learn" (if you use Google Colab, there is no need to install it).
Example of simple coding using Google Colab: [figure not reproduced]

Scikit-learn Algorithm Cheat-Sheet [figure not reproduced]

BIAS
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the model. It always leads to high error on training and test data.
Formula for bias:
Bias[ŷ] = E[ŷ] - y
▪ Here, ŷ is the prediction of our model and y is the correct value we are trying to predict.

VARIANCE
Variance is the variability of model prediction for a given data point, or a value which tells us the spread of our data. Models with high variance perform very well on training data but have high error rates on test data.
When a model does not perform as well on new data as it does with the training data set, there is a possibility that the model has high variance. Variance basically tells how scattered the predicted values are from the actual values.
▪ At the same time, a model that fits the training data too closely won't help us to generalize to new data and derive true patterns from it. The model, as a result, will perform poorly on datasets that weren't seen before. We call this situation high variance in machine learning.
Formula for variance:
Var[ŷ] = E[(ŷ - E[ŷ])²]

Generalization: Peril of Overfitting
Overfitting occurs when a model tries to fit the training data so closely that it does not generalize well to new data. An overfit model gets a low loss during training but does a poor job predicting new data. Overfitting is caused by making a model more complex than necessary. The fundamental tension of machine learning is between fitting our data well, but also fitting the data as simply as possible.
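This tension can be seen in a small experiment. The sketch below is not from the original slides; the synthetic data, the noise level and the polynomial degrees are arbitrary choices made here for illustration. A high-degree polynomial drives the training error close to zero but does much worse on held-out data (overfitting), while a degree-1 model does poorly on both (underfitting).

# Illustrative sketch: a too-complex model fits the training points but generalizes poorly
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 3, 15):            # underfit, reasonable, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")

Typically the degree-15 model reports a near-zero training error and a noticeably larger test error, while degree 1 is poor on both; the exact numbers depend on the random seed.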
[Figure: example of overfitting]

UNDERFIT MODEL
✓ The opposite scenario is underfitting. When a model is underfit, it doesn't perform well on the training sets and won't do well on the testing sets either, which means it fails to capture the underlying trend of the data.
✓ Underfitting may occur if we aren't using enough data to train the model, just like we will fail the exam if we don't review enough material. It may also happen if we're trying to fit the wrong model to the data, just like we will score low in exercises or exams if we take the wrong approach and learn the material the wrong way.
✓ We call any of these situations high bias in machine learning, although the variance is low, as the performance on the training and test sets is pretty consistent, in a bad way.

[Figure: example of underfitting]  [Figure: example of well-fitting]

BIAS-VARIANCE TRADE-OFF
If our model is too simple and has very few parameters, then it may have high bias and low variance. On the other hand, if our model has a large number of parameters, then it is going to have high variance and low bias. So we need to find the right balance without overfitting or underfitting the data.
This trade-off in complexity is why there is a trade-off between bias and variance: an algorithm can't be more complex and less complex at the same time.

Side note
Bias: error on the training data
Variance: error on the test data

IDEAL SCENARIO
**To build a good model, we need to find a good balance between bias and variance such that it minimizes the total error.

AVOIDING OVERFITTING
- Cross-validation
- Regularization
- Feature selection and dimensionality reduction

CROSS VALIDATION
Definition of cross-validation: Cross-validation is a technique used to assess how well our machine learning models perform on unseen data.
Why is it needed? To overcome over-fitting problems.
Cross-validation is a resampling technique with the fundamental idea of splitting the dataset into 2 parts: training data and test data. The train data is used to train the model and the unseen test data is used for prediction. If the model performs well on the test data and gives good accuracy, it means the model hasn't overfitted the training data and can be used for prediction.

1. Hold-out method
This is the simplest evaluation method and is widely used in machine learning projects. Here the entire dataset (population) is divided into 2 sets: a train set and a test set. The data can be divided 70-30, 60-40, 75-25, 80-20 or even 50-50, depending on the use case. As a rule, the proportion of training data has to be larger than the test data.
The data split happens randomly, and we can't be sure which data ends up in the train and test buckets during the split unless we specify random_state. This can lead to extremely high variance, and every time the split changes, the accuracy will also change (see the sketch after the code below).

Disadvantages of this method:
- The test error rates are highly variable (high variance); they depend entirely on which observations end up in the training set and the test set.
- Only a part of the data is used to train the model (high bias), which is not a very good idea when the data is not huge, and this will lead to overestimation of the test error.
Advantage of this method:
- It is computationally inexpensive compared to other cross-validation techniques.

PYTHON CODE
from sklearn.model_selection import train_test_split

# Hold out 30% of the data as the test set; random_state makes the split reproducible
X = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
X_train, X_test = train_test_split(X, test_size=0.3, random_state=1)
print("Train:", X_train, "Test:", X_test)
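To make the variability of the hold-out estimate concrete, the short sketch below repeats the same 70-30 split with different random_state values and prints the resulting test accuracies. It is an added illustration, not part of the original notes; the iris dataset and the decision-tree classifier are arbitrary choices.

# Illustrative sketch: hold-out test accuracy changes with the random split
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print(f"random_state={seed}: test accuracy = {clf.score(X_test, y_test):.3f}")

Because each split leaves out a different 30% of the observations, the estimated test error moves around; the resampling methods below average over several splits to reduce this variability.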
2. Leave-one-out cross-validation
In this method, we divide the data into train and test sets, but with a twist. Instead of dividing the data into 2 subsets, we select a single observation as the test data, everything else is labeled as training data and the model is trained. Then the 2nd observation is selected as the test data and the model is trained on the remaining data. This process continues 'n' times, and the average of all these iterations is calculated and used as the estimate of the test set error.
This method gives unbiased estimates (low bias), but it has an extremely high variance. The major drawback of this method is that it is computationally expensive, as the model is run 'n' times to test every observation in the data.

PYTHON CODE
from sklearn.model_selection import LeaveOneOut

X = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
loo = LeaveOneOut()
# Each iteration holds out exactly one observation as the test set
for train, test in loo.split(X):
    print("%s %s" % (train, test))

3. K-fold cross-validation
In this resampling technique, the whole dataset is divided into k sets of almost equal size. The first set is selected as the test set and the model is trained on the remaining k-1 sets. The test error rate is then calculated by evaluating the fitted model on the test data. In the second iteration, the 2nd set is selected as the test set, the remaining k-1 sets are used to train the model, and the error is calculated again. This process continues for all k sets.
This method has low variance and intermediate bias. Typically, k-fold cross-validation is performed using k=5 or k=10, as these values have been shown empirically to yield test error estimates that have neither high bias nor high variance.

PYTHON CODE
from sklearn.model_selection import KFold

X = ["a", "b", "c", "d", "e", "f"]
# 3 folds, no shuffling, so consecutive samples end up in the same fold
kf = KFold(n_splits=3, shuffle=False)
for train, test in kf.split(X):
    print("Train data", train, "Test data", test)

4. Stratified k-fold cross-validation
This is a slight variation of k-fold cross-validation which uses 'stratified sampling' instead of 'random sampling', so that each fold preserves the class proportions of the full dataset.

PYTHON CODE
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
y = np.array([0, 0, 1, 0, 1, 1])
# Each fold keeps roughly the same ratio of class 0 to class 1 as y
skf = StratifiedKFold(n_splits=3, shuffle=False)
for train_index, test_index in skf.split(X, y):
    print("Train:", train_index, "Test:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

To conclude, the cross-validation technique that we choose depends highly on the use case and the bias-variance trade-off.

REGULARIZATION
Regularization adds an extra penalty term to the error function we're trying to minimize, in order to penalize complex models.

EXAMPLE OF REGULARIZATION
Compared with a high-order polynomial fit, the linear model is preferable as it may generalize better to more data points drawn from the underlying distribution. We can use regularization to reduce the influence of the high orders of the polynomial by imposing penalties on them. This will discourage complexity, even though a less accurate and less strict rule is learned from the training data.

IMPORTANT NOTE ON REGULARIZATION
Regularization should be kept at a moderate level or, to be more precise, fine-tuned to an optimal level. Too little regularization doesn't make any impact; too much regularization will result in underfitting, as it moves the model away from the ground truth.
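As a rough illustration of this note, the sketch below (not from the lecture; the synthetic data, the polynomial degree and the alpha values are assumptions made here) fits the same degree-10 polynomial with ridge regression, a regularized linear model in scikit-learn, at different regularization strengths. A tiny alpha leaves the model essentially unregularized, a very large alpha pushes it towards underfitting, and a moderate alpha usually gives the best test error.

# Illustrative sketch: the effect of the regularization strength alpha in ridge regression
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(1)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

for alpha in (1e-8, 1e-2, 1e3):      # almost none, moderate, too much regularization
    model = make_pipeline(PolynomialFeatures(degree=10, include_bias=False),
                          StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    print(f"alpha={alpha:g}  train MSE={mean_squared_error(y_train, model.predict(X_train)):.3f}"
          f"  test MSE={mean_squared_error(y_test, model.predict(X_test)):.3f}")

Exact numbers depend on the random data, but the pattern matches the note: too little regularization changes almost nothing, too much drives the model towards underfitting, and the value in between has to be tuned, for example with the cross-validation techniques above.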