Supervised Learning (1) (Linear Regression).pptx
Supervised Learning

Supervised learning is where the model is trained on a labelled dataset. A labelled dataset is one that has both input and output parameters. The labelled dataset used in supervised learning consists of input features and corresponding output labels. The input features are the attributes or characteristics of the data that are used to make predictions, while the output labels are the desired outcomes or targets that the algorithm tries to predict.

Training the system: While training the model, the data is usually split in the ratio 80:20, i.e. 80% as training data and the rest as testing data. For the training data, we feed both the input and the output for that 80% of the data, and the model learns from the training data only. We use different machine learning algorithms to build our model; learning means that the model builds some logic of its own. Once the model is ready, it can be tested. At the time of testing, the input is taken from the remaining 20% of the data, which the model has never seen before; the model predicts some value, and we compare it with the actual output to calculate the accuracy.

More formally, the task of supervised learning is this: given a training set of N example input-output pairs (x1, y1), (x2, y2), ..., (xN, yN), where each pair was generated by an unknown function y = f(x), discover a function h that approximates the true function f. The function h is called a hypothesis about the world. It is drawn from a hypothesis space H of possible functions; equivalently, h is a model of the data drawn from a model class H, or a function drawn from a function class. We call the output yi the ground truth, the true answer we are asking our model to predict.

How do we choose a hypothesis space? We might have some prior knowledge about the process that generated the data. If not, we can perform exploratory data analysis: examining the data with statistical tests and visualizations such as histograms, scatter plots, and box plots.

How do we choose a good hypothesis from within the hypothesis space? We could hope for a consistent hypothesis: an h such that h(xi) = yi for each xi in the training set. With continuous-valued outputs we cannot expect an exact match to the ground truth; instead we look for a best-fit function for which each h(xi) is close to yi.

The true measure of a hypothesis is not how it does on the training set, but rather how well it handles inputs it has not yet seen. We can evaluate that with a second sample of (xi, yi) pairs called a test set. We say that h generalizes well if it accurately predicts the outputs of the test set.
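The 80:20 split and the idea of judging a hypothesis by its test-set performance can be illustrated with a short sketch. This is a minimal example, not part of the slides: the synthetic sine-plus-noise data, the exact split, and the two hypothesis spaces (a straight line and a degree-9 polynomial) are assumptions chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic labelled dataset: 13 (x, y) pairs generated by an unknown function plus noise.
x = np.linspace(0, 1, 13)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

# 80:20 split: roughly 80% of the pairs for training, the rest held out as a test set.
idx = rng.permutation(len(x))
n_train = int(0.8 * len(x))          # 10 training points, 3 test points
train, test = idx[:n_train], idx[n_train:]

def fit_poly(degree):
    # Hypothesis space: polynomials of the given degree, fit to the training data by least squares.
    return np.poly1d(np.polyfit(x[train], y[train], degree))

# Degree 1 is the "straight line" hypothesis space; degree 9 has 10 coefficients, so it can
# interpolate the 10 training points exactly (a consistent hypothesis), analogous to the
# degree-12 polynomial fitting 13 points discussed below.
for degree in (1, 9):
    h = fit_poly(degree)
    train_err = np.mean((h(x[train]) - y[train]) ** 2)
    test_err = np.mean((h(x[test]) - y[test]) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")

Typically the more flexible polynomial reaches near-zero training error yet a worse test error than a simpler fit, which is the generalization issue the next paragraphs describe.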
The figure in the slides shows that the function h a learning algorithm discovers depends on the hypothesis space H it considers and on the training set it is given. Each of the four plots in the top row has the same training set of 13 data points in the (x, y) plane. The four plots in the bottom row have a second set of 13 data points; both sets are representative of the same unknown function f(x). Each column shows the best-fit hypothesis h from a different hypothesis space:

Column 1: Straight lines; functions of the form h(x) = ω1x + ω0. There is no line that would be a consistent hypothesis for the data points.
Column 2: Sinusoidal functions of the form h(x) = ω1x + sin(ω0x). This choice is not quite consistent, but fits both data sets very well.
Column 3: Piecewise-linear functions where each line segment connects the dots from one data point to the next. These functions are always consistent.
Column 4: Degree-12 polynomials. These are consistent: we can always find a degree-12 polynomial that perfectly fits 13 distinct points. But just because a hypothesis is consistent does not mean it is a good guess.

Bias: the tendency of a predictive hypothesis to deviate from the expected value when averaged over different training sets. Bias often results from restrictions imposed by the hypothesis space.
Underfitting: when the hypothesis fails to find a pattern in the data.
Variance: the amount of change in the hypothesis due to fluctuation in the training data.
Overfitting: when the function pays too much attention to the particular data set it is trained on, causing it to perform poorly on unseen data.
Bias-variance tradeoff: a choice between more complex, low-bias hypotheses that fit the training data well and simpler, low-variance hypotheses that may generalize better.

Types of Supervised Learning Algorithm

Supervised learning is typically divided into two main categories: regression and classification. In classification, the algorithm learns to predict a categorical output variable or class label: the output is one of a finite set of values, such as whether a customer is likely to purchase a product or not (true/false). In regression, the algorithm learns to predict a number (such as tomorrow's temperature, measured either as an integer or a real number).

Regression

Regression is a supervised learning technique used to predict continuous numerical values based on input features. It aims to establish a functional relationship between independent variables and a dependent variable, such as predicting house prices based on features like size, number of bedrooms, and location. The goal is to minimize the difference between predicted and actual values using algorithms such as linear regression, decision trees, or neural networks, ensuring the model captures the underlying patterns in the data.

Linear Regression

Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between the dependent variable and one or more independent features by fitting a linear equation to observed data. When there is only one independent feature, it is known as Simple Linear Regression, and when there is more than one feature, it is known as Multiple Linear Regression. Similarly, when there is only one dependent variable, it is considered Univariate Linear Regression, while when there is more than one dependent variable, it is known as Multivariate Regression.

Types of Linear Regression

There are two main types of linear regression:

Simple Linear Regression
This is the simplest form of linear regression, and it involves only one independent variable and one dependent variable. The equation for simple linear regression is:
Y = β0 + β1X
where:
Y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope

Multiple Linear Regression
This involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:
Y = β0 + β1X1 + β2X2 + … + βnXn
where:
Y is the dependent variable
X1, X2, …, Xn are the independent variables
β0 is the intercept
β1, β2, …, βn are the slopes

In regression, a set of records with X and Y values is available, and these values are used to learn a function, so that if you want to predict Y for a new, unseen X, this learned function can be used. Since regression requires finding the value of Y, we need a function that predicts a continuous Y given X as the independent feature(s).
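The coefficients of the simple linear regression equation Y = β0 + β1X can be estimated directly with the ordinary least-squares formulas. A minimal sketch follows; the toy house-size/price numbers are invented for illustration and are not from the slides.

import numpy as np

# Hypothetical data: house size in square metres (X) and price in thousands (y).
X = np.array([50.0, 70.0, 80.0, 100.0, 120.0])
y = np.array([150.0, 200.0, 230.0, 285.0, 340.0])

# Ordinary least-squares estimates for Y = beta0 + beta1 * X.
beta1 = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean()) ** 2)
beta0 = y.mean() - beta1 * X.mean()

print(f"intercept beta0 = {beta0:.2f}, slope beta1 = {beta1:.2f}")
print("predicted price for a 90 m^2 house:", beta0 + beta1 * 90)

For multiple linear regression the same idea extends to several input columns: stacking a column of ones with X1, X2, ... into a design matrix and solving it with np.linalg.lstsq would return β0, β1, β2, ... in one call.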
Our primary objective in linear regression is to locate the best-fit line, which means that the error between the predicted and actual values should be kept to a minimum; the best-fit line has the least error. The best-fit line equation provides a straight line that represents the relationship between the dependent and independent variables. The slope of the line indicates how much the dependent variable changes for a unit change in the independent variable(s).

Here Y is called the dependent or target variable and X is called the independent variable, also known as the predictor of Y. There are many types of functions or models that can be used for regression; a linear function is the simplest. Here, X may be a single feature or multiple features representing the problem. Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x); hence the name linear regression.

Hypothesis function in Linear Regression

Assume that our independent feature is the experience, i.e. X, and the respective salary Y is the dependent variable. Let's assume there is a linear relationship between X and Y; then the salary can be predicted using:
Ŷ = θ1 + θ2X, or equivalently ŷi = θ1 + θ2xi
Here,
yi ∈ Y (i = 1, 2, …, n) are the labels of the data (supervised learning)
xi ∈ X (i = 1, 2, …, n) are the input independent training data (univariate: one input variable/parameter)
ŷi ∈ Ŷ (i = 1, 2, …, n) are the predicted values
The model gets the best regression fit line by finding the best θ1 and θ2 values:
θ1: intercept
θ2: coefficient of x
Once we find the best θ1 and θ2 values, we get the best-fit line, and when we finally use the model for prediction, it will predict the value of y for a new input value of x.

How do we update θ1 and θ2 to get the best-fit line? To achieve the best-fit regression line, the model aims to predict the target value Ŷ such that the error between the predicted value Ŷ and the true value Y is minimum. So it is very important to update the θ1 and θ2 values so as to minimize the error between the predicted y values and the true y values, i.e. to minimize (1/n) Σi (ŷi − yi)².

Cost function for Linear Regression

The cost function or loss function is nothing but the error or difference between the predicted value Ŷ and the true value Y. In linear regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values ŷi and the actual values yi. The purpose is to determine the optimal values for the intercept θ1 and the coefficient of the input feature θ2, providing the best-fit line for the given data points. The linear equation expressing this relationship is ŷi = θ1 + θ2xi, and the MSE cost function can be written as:
J(θ1, θ2) = (1/n) Σ from i = 1 to n of (ŷi − yi)²

Utilizing the MSE function, the iterative process of gradient descent is applied to update the values of θ1 and θ2. This ensures that the MSE value converges to the global minimum, signifying the most accurate fit of the linear regression line to the dataset. The final result is a linear regression line that minimizes the overall squared differences between the predicted and actual values, providing an optimal representation of the underlying relationship in the data.
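A minimal sketch of gradient descent on the MSE cost for the univariate hypothesis ŷ = θ1 + θ2x is shown below. The experience/salary numbers, the learning rate, and the iteration count are assumptions made for illustration only.

import numpy as np

# Hypothetical data: years of experience (x) and salary in thousands (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([40.0, 48.0, 57.0, 63.0, 72.0])

theta1, theta2 = 0.0, 0.0          # intercept and coefficient of x
lr, n_iters = 0.02, 5000           # learning rate and number of iterations (assumed values)
n = len(x)

for _ in range(n_iters):
    y_pred = theta1 + theta2 * x
    error = y_pred - y
    # Gradients of the MSE cost J = (1/n) * sum((y_pred - y)^2) with respect to theta1 and theta2.
    grad1 = (2.0 / n) * np.sum(error)
    grad2 = (2.0 / n) * np.sum(error * x)
    theta1 -= lr * grad1
    theta2 -= lr * grad2

mse = np.mean((theta1 + theta2 * x - y) ** 2)
print(f"theta1 = {theta1:.3f}, theta2 = {theta2:.3f}, MSE = {mse:.4f}")

With this small, well-scaled dataset the updates converge to essentially the same θ1 and θ2 that the closed-form least-squares formulas would give; in practice the learning rate has to be tuned so the MSE actually decreases at each step.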
Assumptions of Simple Linear Regression

Linear regression is a powerful tool for understanding and predicting the behavior of a variable; however, it needs to meet a few conditions in order to provide accurate and dependable results.

1. Linearity: The independent and dependent variables have a linear relationship with one another. This implies that changes in the dependent variable follow those in the independent variable(s) in a linear fashion, meaning there should be a straight line that can be drawn through the data points. If the relationship is not linear, then linear regression will not be an accurate model.
2. Independence: The observations in the dataset are independent of each other. This means that the value of the dependent variable for one observation does not depend on the value of the dependent variable for another observation. If the observations are not independent, then linear regression will not be an accurate model.
3. Homoscedasticity: Across all levels of the independent variable(s), the variance of the errors is constant. This indicates that the value of the independent variable(s) has no impact on the variance of the errors. If the variance of the residuals is not constant, then linear regression will not be an accurate model.
4. Normality: The residuals should be normally distributed. This means that the residuals should follow a bell-shaped curve. If the residuals are not normally distributed, then linear regression will not be an accurate model.
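In practice these assumptions are usually examined by inspecting the residuals of a fitted model. The sketch below is a minimal illustration, assuming SciPy is available and using invented data; the Shapiro-Wilk test is just one common normality check, and the split-in-half comparison of residual spread is only a rough stand-in for a proper homoscedasticity test.

import numpy as np
from scipy import stats

# Hypothetical data and a simple least-squares fit.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([41.0, 50.0, 56.0, 65.0, 71.0, 80.0, 86.0, 95.0])
beta1, beta0 = np.polyfit(x, y, 1)          # slope, intercept
residuals = y - (beta0 + beta1 * x)

# Normality: Shapiro-Wilk test on the residuals (a large p-value is consistent with normal residuals).
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")

# Homoscedasticity (rough check): compare residual spread in the lower and upper halves of x.
half = len(x) // 2
print("residual std (low x): ", residuals[:half].std())
print("residual std (high x):", residuals[half:].std())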