Logistic Regression PPT PDF
Uploaded by CleverNobelium1412
Summary
This presentation explains logistic regression, a statistical method used in machine learning for binary classification. It covers the equation, assumptions, and different types of logistic regression (binary, multinomial, and ordinal). The document focuses on the theoretical aspects of the technique.
Full Transcript
Logistic Regression

What is Logistic Regression?

Logistic regression is a supervised machine learning algorithm that performs binary classification by predicting the probability of an outcome, event, or observation. The model delivers a binary (dichotomous) outcome limited to two possible values: yes/no, 0/1, or true/false. Logistic regression is a type of statistical model and is often used for classification and predictive analytics. It analyzes the relationship between one or more independent variables and classifies data into discrete classes.

Logistic regression is extensively used in predictive modeling, where the model estimates the probability that an instance belongs to a specific category. It is commonly used in binary classification problems where the outcome variable takes one of two categories (0 or 1). Examples where a binary response is expected or implied include:

- Determining the probability of a heart attack
- Predicting the possibility of enrolling in a university
- Identifying spam emails

Key advantages of logistic regression

1. Easier to implement than other machine learning methods: Training a logistic regression model does not demand high computational power. As such, logistic regression is easier to implement, interpret, and train than many other ML methods.
2. Suitable for linearly separable datasets: In logistic regression, the y variable takes only two values. Hence, one can effectively classify data into two separate classes if the data is linearly separable.
3. Provides valuable insights: The fitted coefficients indicate the direction (positive or negative) and the strength of each independent variable's association with the outcome.

Logistic Regression Equation and Assumptions

Logistic regression uses a logistic function, called the sigmoid function, to map predictions to probabilities. The sigmoid function is an S-shaped curve that converts any real value to a value between 0 and 1.
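The S-shaped mapping described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard sigmoid definition 1 / (1 + e^(-x)); the function name `sigmoid` and the sample inputs are chosen here for illustration and do not come from the slides.

```python
import math


def sigmoid(x: float) -> float:
    """Map any real number to a value between 0 and 1."""
    # Branch on the sign of x so math.exp never receives a large
    # positive argument (which would raise OverflowError).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Illustrative inputs: large negative values map close to 0,
# large positive values map close to 1, and 0 maps to exactly 0.5.
for value in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"sigmoid({value:+.1f}) = {sigmoid(value):.4f}")
```

Both branches compute the same quantity; the split is only a numerical-stability choice for inputs far from zero.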
If the output of the sigmoid function (the estimated probability) is greater than a predefined threshold, the model predicts that the instance belongs to the class. If the estimated probability is less than the threshold, the model predicts that the instance does not belong to the class.

For example, with a threshold of 0.5, an output above 0.5 is classified as 1 and an output below 0.5 is classified as 0. As the input moves further toward the negative end of the curve, the predicted value of y approaches 0; as it moves toward the positive end, the predicted value approaches 1. In other words, if the output of the sigmoid function is 0.65, there is a 65% chance of the event occurring, comparable to tossing a coin biased to land heads 65% of the time.

The sigmoid function is referred to as an activation function for logistic regression and is defined as:

$$f(x) = \frac{1}{1 + e^{-x}}$$

where,
e = base of natural logarithms
x = numerical value one wishes to transform

The following equation represents logistic regression:

$$y = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$

where,
e = base of natural logarithms
x = input value
y = predicted output
$b_0$ = bias or intercept term
$b_1$ = coefficient for input (x)

Key properties of the logistic regression equation

Typical properties of the logistic regression equation include:

1. The dependent variable follows a Bernoulli distribution.
2. Estimation/prediction is based on maximum likelihood.

Key Assumptions for implementing logistic regression

While implementing logistic regression, one needs to keep in mind the following key assumptions:

1. The dependent/response variable is binary
2. Little or no multicollinearity between the predictor/explanatory variables
3. Linear relationship of independent variables to log odds
4. A large sample size
5. No extreme outliers
6. Independent observations

1. The dependent/response variable is binary: The first assumption of logistic regression is that the response variable can take only two possible outcomes, for example pass/fail or male/female. This assumption can be checked by counting the unique outcomes of the dependent variable (the class distribution). If more than two possible outcomes surface, the assumption is violated.

2. Little or no multicollinearity between the predictor variables: This assumption implies that the predictor variables (the independent variables) should be independent of each other. Multicollinearity occurs when two or more independent variables are highly correlated. Such variables do not provide unique information to the regression model and can lead to misleading interpretation of the coefficients.

3. Linear relationship of independent variables to log odds: Log odds are a way of expressing probabilities, but they are different from probabilities. Odds are the ratio of successes to failures, while probability is the ratio of successes to everything that can occur. For example, suppose you play twelve tennis games with your friend and win five of them. Here, the odds of you winning are 5 to 7 (or 5/7), while the probability of you winning is 5/12 (as the total games played = 12). The assumption is that the log odds, $\ln\left(\frac{y}{1 - y}\right) = b_0 + b_1 x$, are a linear function of the independent variables.

4. A large sample size: Logistic regression analysis yields more reliable and robust results when a larger sample is used.

5. No extreme outliers: This assumption can be checked by calculating Cook’s distance (Di) for each observation to identify influential data points that may negatively affect the regression model. When outliers exist, one can apply one of the following solutions:

- Remove the outliers
- Replace the outliers with the mean or median value, or
- Keep the outliers in the model but record them when reporting the regression results

6. Independent observations: This assumption states that the observations in the dataset should be independent of each other. It can be checked by plotting the residuals against time, that is, against the order of the observations. The residuals should appear randomly scattered. If a non-random pattern is detected, this assumption may be considered violated.

Types of Logistic Regression with Examples

1. Binary logistic regression

Binary logistic regression models the relationship between the independent variables and a binary dependent variable. Examples:

- Deciding whether or not to offer a loan to a bank customer: Outcome = yes or no.
- Evaluating the risk of cancer: Outcome = high or low.
- Predicting a team’s win in a football match: Outcome = yes or no.

2. Multinomial logistic regression

In multinomial logistic regression, the categorical dependent variable has three or more discrete outcomes with no natural order. For example: when estimating the type of food consumed by pets, the outcome may be wet food, dry food, or junk food.

3. Ordinal logistic regression

Ordinal logistic regression applies when the dependent variable is ordered (i.e., ordinal). The dependent variable (y) has three or more categories or levels with a meaningful order. For example: predicting survey answers, where Outcomes = Disagree/Unsure/Agree.

Sum Up

Questions
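As a closing illustration of the equation, the threshold rule, and the odds example covered above, the following Python sketch works through the numbers. It is a minimal sketch under stated assumptions: the coefficient values `b0 = -1.0` and `b1 = 0.5`, the input `x = 4.0`, and all function names are hypothetical and chosen only for illustration, and an output of exactly 0.5 is assigned to class 0 because the slides do not say how ties are handled.

```python
import math
from fractions import Fraction


def odds_from_probability(p: Fraction) -> Fraction:
    """Odds = successes / failures = p / (1 - p)."""
    return p / (1 - p)


def predict_probability(x: float, b0: float, b1: float) -> float:
    """Logistic regression equation: e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    # Algebraically equal forms, chosen by sign to avoid overflow in exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 above the threshold, else 0 (ties go to 0: an assumption)."""
    return 1 if probability > threshold else 0


# Tennis example from the slides: 5 wins out of 12 games.
p_win = Fraction(5, 12)
odds_win = odds_from_probability(p_win)
print("probability:", p_win)   # prints "probability: 5/12"
print("odds:", odds_win)       # prints "odds: 5/7"
print("log odds:", math.log(odds_win))

# Hypothetical coefficients and input, for illustration only.
b0, b1, x = -1.0, 0.5, 4.0
p = predict_probability(x, b0, b1)
print("predicted probability:", round(p, 4))
print("predicted class:", classify(p))

# The log odds of the prediction recover the linear term b0 + b1*x.
print("log odds of prediction:", math.log(p / (1 - p)))
```

The last line illustrates the third assumption: taking the log odds of the predicted probability returns the linear expression b0 + b1*x, which is why the relationship is required to be linear in the log odds rather than in the probability itself.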