Logistic Regression Part I: Introduction
Indian Institute of Management Kashipur
s.patra
Summary
This document introduces logistic regression. It covers topics such as customer churn, the concept of odds and log-odds, and the logistic function. The provided text also features a dataset analysis with R code for churn modeling.
Full Transcript
Logistic Regression Part I: Introduction
Indian Institute of Management Kashipur

Logistic Regression
The logistic regression model describes the relationship between a discrete outcome variable, the “response”, and a set of explanatory variables.
The response variable is often binary (dichotomous), although extensions to the model permit multi-category (polytomous) outcomes.
The explanatory variables may be continuous or (with factor variables) discrete.

Example: Customer Churn Modelling
Customer churn occurs when customers or subscribers stop doing business with a firm or service.
Customer churn is a critical metric because it is much less expensive to retain existing customers than to acquire new ones.
To reduce customer churn, telecom companies need to predict which customers are at high risk of churn.
Consider the Churn_Modelling.csv dataset. Here the variable Exited indicates customer churn: 1 means the customer discontinued and 0 means the customer continued. The other variables (except CustomerId) are explanatory variables.

Churn Modelling Dataset
library(dplyr)

# The read call was lost in extraction; read.csv() on the file named above is an assumption.
churn <- read.csv("Churn_Modelling.csv") %>%
  # treat the categorical and 0/1 variables as factors
  mutate_at(c("Geography", "Gender", "HasCrCard", "IsActiveMember", "Exited"), as.factor) %>%
  select(-1) %>%  # drop the first column (presumably CustomerId)
  glimpse()
## Rows: 10,000
## Columns: 11
## $ CreditScore     619, 608, 502, 699, 850, 645, 822, 376, 501, 684, 528,~
## $ Geography       France, Spain, France, France, Spain, Spain, France, G~
## $ Gender          Female, Female, Female, Female, Female, Male, Male, Fe~
## $ Age             42, 41, 42, 39, 43, 44, 50, 29, 44, 27, 31, 24, 34, 25~
## $ Tenure          2, 1, 8, 1, 2, 8, 7, 4, 4, 2, 6, 3, 10, 5, 7, 3, 1, 9,~
## $ Balance         0.00, 83807.86, 159660.80, 0.00, 125510.82, 113755.78,~
## $ NumOfProducts   1, 1, 3, 2, 1, 2, 2, 4, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, ~
## $ HasCrCard       1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, ~
## $ IsActiveMember  1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, ~
## $ EstimatedSalary 101348.88, 112542.58, 113931.57, 93826.63, 79084.10, 1~
## $ Exited          1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ~

Preliminaries: Odds & Log-odds
Odds may be familiar from gambling. They are often stated as wins to losses (wins:losses), e.g. one win to five losses is stated as 1:5, which corresponds to a winning probability of 1/6.
Given the probability of exit p, the odds of exit are the probability of exit divided by the probability of not exiting:
$$\text{odds of exit} = \frac{p}{1 - p}.$$
The logarithm of the odds is called the log-odds:
$$\text{log odds} = \log\frac{p}{1 - p},$$
and therefore
$$p = \frac{1}{1 + \exp(-\text{log odds})} = \frac{\exp(\text{log odds})}{1 + \exp(\text{log odds})}.$$

Preliminaries: Odds & Log-odds
prob_of_exit = seq(0.01, 0.99, length = 5)
prob_of_exit
## 0.010 0.255 0.500 0.745 0.990
odds_of_exit