Econometrics 2 - Qualitative and Limited Dependent Variable Models PDF
Document Details
University of Southeastern Philippines
Feby Kirstine A. Evangelio
Summary
These lecture notes cover Econometrics 2, focusing on qualitative and limited dependent variable models. Different models, including Linear Probability Models, Probit, Logit, and Tobit, are explored. The material is geared toward an economics course at the university level.
Full Transcript
UNIVERSITY OF SOUTHEASTERN PHILIPPINES
College of Applied Economics
Econometrics 2
1. Qualitative and Limited Dependent Variable Models
Instructor: Feby Kirstine A. Evangelio

Qualitative and Limited Dependent Variable Models
Qualitative: the dependent variable takes the form of a set of alternatives.
Limited: the dependent variable is continuous, but its values are not completely observable.

Economic Application
Example: An economic model explaining why some individuals take a second or third job and engage in "moonlighting".
$$y_i = \begin{cases} 1 & \text{if individual } i \text{ engages in moonlighting} \\ 0 & \text{otherwise} \end{cases}$$
An economic model of why some legislators in the U.S. House of Representatives vote for a particular bill and others do not.
An economic model explaining why some loan applications are accepted and others are not at a large metropolitan bank.
An economic model explaining why some individuals vote for increased spending in a school board election and others vote against.
An economic model explaining why some female college students decide to study engineering and others do not.

Binary Choice Models
Review of the regression model: the interest is in $E(y \mid x)$.
Let $x_i = (x_{i0} = 1, x_{i1}, x_{i2}, \dots, x_{ik})$, with $k$ independent variables.
Then $P(y_i = 1 \mid x_i) = p(x_i)$ and $P(y_i = 0 \mid x_i) = 1 - p(x_i)$ (complementary probability).
Proof: The conditional probability function for $y_i$ is
$$f(y_i \mid x_i) = p(x_i)^{y_i} \, [1 - p(x_i)]^{1 - y_i}, \quad y_i \in \{0, 1\}$$
Following this,
$$f(1 \mid x_i) = p(x_i)^{1} \, [1 - p(x_i)]^{1-1} = p(x_i)^{1} \, [1 - p(x_i)]^{0} = p(x_i)$$
$$f(0 \mid x_i) = p(x_i)^{0} \, [1 - p(x_i)]^{1-0} = [1 - p(x_i)]^{1} = 1 - p(x_i)$$
Now, how are we going to approximate $P(y_i = 1 \mid x_i)$ (or $f(1 \mid x_i)$, or $p(x_i)$)?
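As a quick check of the proof above, the following minimal sketch evaluates the conditional probability function at both outcomes. The function name `bernoulli_pmf` and the value 0.3 for $p(x_i)$ are assumptions chosen for illustration only; they do not come from the lecture.

```python
def bernoulli_pmf(y, p):
    """Conditional probability function f(y | x) = p^y * (1 - p)^(1 - y)."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return (p ** y) * ((1.0 - p) ** (1 - y))


p_x = 0.3  # hypothetical value of p(x_i), chosen only for illustration

f1 = bernoulli_pmf(1, p_x)  # reduces to p(x_i)
f0 = bernoulli_pmf(0, p_x)  # reduces to 1 - p(x_i)

print(f"f(1|x) = {f1:.2f}, f(0|x) = {f0:.2f}, sum = {f1 + f0:.2f}")
```

The two values are complementary and sum to one, which is all the proof needs.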
Transportation Economics
Case: How do individuals decide between driving and commuting to work?
Assumption: there are only two alternatives.
Possible factors
Individual characteristics: age, sex, income
Automobile characteristics: reliability, comfort, fuel economy
Public transportation characteristics: reliability, cost, safety

Transportation Economics
Case: Single factor: commuting time as $x_i$
$x_i$ = commuting time by bus minus commuting time by car, for the $i$th individual:
$$x_i = time_{bus,i} - time_{car,i}$$
A priori, as $x_i$ increases, commuting by bus takes increasingly longer than commuting by car, so, holding all else constant, an individual would be more inclined to drive.
There is a positive relationship between the difference in commuting time and the probability that an individual will drive to work: an increase in $x_i$ → an increase in $P(y_i = 1 \mid x_i)$.

Qualitative and Limited Dependent Variable Models
A. Dummy Dependent Variable
1. Linear Probability Model (LPM)
2. Probit Model
3. Logit Model
4. Multinomial Logit
B. Censored Dependent Variable
5. Tobit Model

Linear Probability Model (LPM)
Conditional expected value:
$$E(y_i \mid x_i) = \sum_{y_i = 0}^{1} y_i \, f(y_i \mid x_i)$$
$$E(y_i \mid x_i) = 0 \cdot f(0 \mid x_i) + 1 \cdot f(1 \mid x_i)$$
$$E(y_i \mid x_i) = 0 \cdot [1 - p(x_i)] + 1 \cdot p(x_i)$$
$$E(y_i \mid x_i) = p(x_i)$$
Following this,
$$p(x_i) = E(y_i \mid x_i) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$$

Linear Probability Model (LPM)
Let $e_i$ be the random error (observed outcome minus conditional mean):
$$e_i = y_i - E(y_i \mid x_i)$$
Then,
$$y_i = E(y_i \mid x_i) + e_i$$
$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + e_i$$
and
$E(e_i \mid x_i) = 0$ (the least squares estimators of the parameters are unbiased)
$E(e_i x_i) = 0$ (the least squares estimators of the parameters are consistent)

Marginal effect of LPM
For a continuous variable $x_{ij}$, $j = 1, 2, \dots, k$, the marginal effect in the LPM is given by
$$ME_{LPM} = \frac{\partial E(y_i \mid x_i)}{\partial x_{ij}} = \beta_j$$

Problems of LPM:
1. Logical inconsistencies. Suppose $\beta_j > 0$. A unit increase in $x_{ij}$ then leads to an increase of $\beta_j$ in $p(x_i)$, where $\beta_j$ is constant (the slope of a line). This means the predicted probability can exceed 1 when $x_{ij}$ gets large. However, recall that we can only have $0 \le p(x_i) \le 1$.

Problems of LPM:
2. Because $y_i$ takes only the two values 0 and 1, the random error $e_i$ also takes only two values for a given $x_i$:
$$y_i \in \{0, 1\}, \qquad e_i \in \{-E(y_i \mid x_i),\; 1 - E(y_i \mid x_i)\}$$
Implication for the random error $e_i$:
For $y_i = 0$: $e_i = 0 - \left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right) = -\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)$
For $y_i = 1$: $e_i = 1 - \left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right)$
The error is therefore discrete and cannot be normally distributed.

Problems of LPM:
3. The conditional variance of the random error is heteroskedastic.
$$var(e_i \mid x_i) = p(x_i) \, [1 - p(x_i)] = \sigma_i^2$$

To solve the problems of the LPM, a transformation is needed so that $p(x_i)$ lies in the interval $[0, 1]$. One suitable choice is the cumulative normal distribution function.

Probit Model
With the probability given by $p(x_i) = P_i = \Phi(Z_i)$, the probit model is
$$Z_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + e_i$$
$Z$ is a standard normal variable with mean 0 and variance 1, and $\Phi(\cdot)$ is its cumulative distribution function, read from the standard normal ($Z$) table.

Marginal effect of Probit Model
The marginal effect of variable $x_{ij}$ is given by
$$\frac{\partial P_i}{\partial x_{ij}} = \beta_j \cdot \phi(Z_i)$$
where
$\frac{\partial P_i}{\partial x_{ij}}$ is the marginal effect of variable $x_{ij}$,
$\beta_j$ is the parameter of variable $x_{ij}$,
$\phi(Z_i)$ is the standard normal density (the derivative of $\Phi$) evaluated at $Z_i$, with $Z_i$ computed at the means of $x_{ij}$.

Comparison: Probit and Logit Models
The logit model uses the cumulative logistic distribution as its underlying distribution, where the probit model uses the cumulative normal distribution.

Logit Model
Logistic distribution:
$$P_i = \frac{1}{1 + e^{-Z_i}} \;\rightarrow\; Z_i = \ln\left(\frac{P_i}{1 - P_i}\right)$$
The logit model is then given by
$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + e_i$$
where $P_i$ is the probability of an event happening and $1 - P_i$ is the probability of an event not happening.
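To make the contrast between the LPM and the probit and logit transformations concrete, here is a minimal sketch using the commuting-time setting. All coefficient values are hypothetical and are not estimates from any data set. The LPM uses its own pair of coefficients, while the probit and logit share one index $Z_i$ purely for illustration; in practice the two models give differently scaled estimates. The error term is left out because only the systematic part is evaluated.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic CDF, P = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, not estimates from any data set.
LPM_B0, LPM_B1 = 0.5, 0.01      # LPM: p = b0 + b1 * x
INDEX_B0, INDEX_B1 = 0.0, 0.05  # probit/logit index: Z = b0 + b1 * x

# x = commuting time by bus minus commuting time by car (minutes).
x_values = [-60, -20, 0, 20, 60]

rows = []
for x in x_values:
    p_lpm = LPM_B0 + LPM_B1 * x   # linear, so it is unbounded
    z = INDEX_B0 + INDEX_B1 * x
    rows.append((x, p_lpm, normal_cdf(z), logistic_cdf(z)))

for x, p_lpm, p_probit, p_logit in rows:
    print(f"x={x:>4}  LPM={p_lpm:6.3f}  probit={p_probit:6.3f}  logit={p_logit:6.3f}")
```

At the extreme values of $x_i$ the LPM prediction falls below 0 or rises above 1, while the probit and logit probabilities stay strictly inside the unit interval.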
Marginal effect of Logit Model
The marginal effect of variable $x_{ij}$ is given by
$$\frac{\partial P_i}{\partial x_{ij}} = \beta_j \, P_i (1 - P_i)$$
where
$\frac{\partial P_i}{\partial x_{ij}}$ is the marginal effect of variable $x_{ij}$,
$P_i = \frac{1}{1 + e^{-Z_i}}$ is the probability,
$Z_i$ is computed at the means of $x_{ij}$.

Estimation methods
Ordinary Least Squares (OLS) – linear
Ordinary least squares minimizes the sum of squared residuals.
Maximum Likelihood Estimation (MLE) – probabilistic
Maximum likelihood estimation maximizes the probability of observing the dataset given a model and its parameters.

Model comparison
Pseudo R2: goodness of fit of the model
Pseudo R2 fit statistics for generalized linear models take on similar values to their ordinary least squares counterparts, but are based on maximum likelihood estimates instead of sums of squares. In general, higher values indicate that the model is better at discriminating.
Information Criterion: estimation of prediction error
An information criterion is used to select the best model from a set of models by rewarding a high likelihood of the data while penalizing the number of parameters to prevent overfitting.
Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC)
AIC is an estimate of a constant plus the relative distance between the unknown true likelihood function of the data and the fitted likelihood function of the model, whereas BIC is an estimate of a function of the posterior probability of a model being true, under a certain Bayesian setup.

Multinomial Choices
In probit and logit models, the decision maker chooses between two alternatives. But what if the choices involve more than two alternatives?
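Before turning to multinomial examples, the model-comparison statistics described above can be sketched in a few lines. This is an illustration under stated assumptions. The outcomes and fitted probabilities are made-up numbers rather than the result of an actual estimation, McFadden's version is used as one common pseudo R2 (the lecture does not say which variant it has in mind), and the fitted model is assumed to have two parameters (an intercept and one slope).

```python
import math


def bernoulli_loglik(y, p):
    """Log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p) over observations."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden pseudo R2 = 1 - lnL(model) / lnL(intercept-only)."""
    return 1.0 - ll_model / ll_null


def aic(ll, n_params):
    """AIC = 2k - 2 lnL (smaller is better)."""
    return 2 * n_params - 2 * ll


def bic(ll, n_params, n_obs):
    """BIC = k ln(n) - 2 lnL (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * ll


# Hypothetical outcomes and fitted probabilities (not from real data).
y = [1, 0, 1, 1, 0, 1, 0, 1]
p_fitted = [0.8, 0.3, 0.7, 0.9, 0.2, 0.6, 0.4, 0.7]

# Intercept-only benchmark: every observation gets the sample share of ones.
p_bar = sum(y) / len(y)
ll_model = bernoulli_loglik(y, p_fitted)
ll_null = bernoulli_loglik(y, [p_bar] * len(y))

r2 = mcfadden_pseudo_r2(ll_model, ll_null)
print(f"lnL model = {ll_model:.3f}, lnL null = {ll_null:.3f}, pseudo R2 = {r2:.3f}")
print(f"AIC model = {aic(ll_model, 2):.3f}, AIC null = {aic(ll_null, 1):.3f}")
print(f"BIC model = {bic(ll_model, 2, len(y)):.3f}, BIC null = {bic(ll_null, 1, len(y)):.3f}")
```

The log-likelihood function here is the quantity that MLE maximizes for a binary outcome; both information criteria start from it and add a penalty that grows with the number of parameters.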
Multinomial choice economic application examples:
If you are shopping for a laundry detergent, which one do you choose? Tide, Ariel, Breeze, Surf, and so on. The consumer is faced with a wide array of alternatives. Marketing researchers relate these choices to prices of the alternatives, advertising, and product characteristics.

Multinomial Choices
If you enroll in the business school, will you major in economics, marketing, management, finance, or accounting?
$$y_i = \begin{cases} 0 & \text{economics} \\ 1 & \text{marketing} \\ 2 & \text{management} \\ 3 & \text{finance} \\ 4 & \text{accounting} \end{cases}$$
If you are going to a mall on a shopping spree, which mall will you go to, and why?
When you graduated from high school, you had to choose between not going to college and going to a private four-year college, a public four-year college, or a two-year college. What factors led to your decision among these alternatives?

Multinomial Logit Model
Multinomial logit is a technique used when there are more than 2 categories for the dummy variable:
$$E(Y_i^*) = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij} = Z_i$$
where $Y^*$ is an index with values 0, 1, 2, 3, …
Unordered Categories
Adopt different technologies without ranking the preference
Ordered Categories (Ordered Logit Model)
High, Medium, Low

Multinomial Logit Model
Dummy variables with $m$ categories require the calculation of $m - 1$ equations.
Reference category: usually the one with the highest frequency
$$\ln\left(\frac{P(Y_i = m)}{P(Y_i = 0)}\right) = \beta_m + \sum_{j=1}^{k} \beta_{mj} X_{ij} = Z_{mi}$$
where the numerator is the probability of the event happening and the denominator is the probability of the event not happening.

Multinomial Logit Model
Computation of Probabilities for Unordered Logit
Reference category: usually the one with the highest frequency
$$P(Y_i = 0) = \frac{1}{1 + \sum_{m} e^{Z_{mi}}}$$
where the sum runs over the non-reference categories.
[...]
$$Y_i = \begin{cases} Y_i^* = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij} + \varepsilon_i & \text{if } Y_i^* > 0 \\ 0 & \text{if } Y_i^* \le 0 \end{cases}$$
where $Y_i^*$ is observed if $Y_i^* > 0$ and unobserved if $Y_i^* \le 0$.

Limited Dependent Variable: Tobit Model
Generally, the dependent variable can be left-censored, right-censored, or both, and the lower and/or upper limit of the dependent variable can be any number:
$$Y_i = \begin{cases} a & \text{if } Y_i^* \le a \\ Y_i^* = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij} + \varepsilon_i & \text{if } a < Y_i^* < b \\ b & \text{if } Y_i^* \ge b \end{cases}$$
where $a$ is the lower limit and $b$ is the upper limit of the dependent variable $Y_i$.

Limited Dependent Variable: Tobit Model
Computation of Probability of $Y_i = 0$
$$P_i = \Phi\left(-\frac{Z_i}{\sigma}\right)$$
which is similar to probit, but the $Z$-value is divided by the standard deviation $\sigma$.

Marginal Effect of Tobit Model
The marginal effect of variable $x_i$ is given by
$$\frac{\partial E(y_i \mid x_i)}{\partial x_i} = \beta_2 \cdot \Phi\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right)$$
where $\Phi$ is the cumulative distribution function (cdf) of the standard normal random variable, evaluated at the estimates and a particular $x$-value.
Because the cdf values are positive, the sign of the coefficient tells the direction of the marginal effect, but the magnitude of the marginal effect depends on both the coefficient and the cdf.
If $\beta_2 > 0$, as $x$ increases, the cdf approaches one, and the slope of the regression function approaches that of the latent variable model (see figure in the next slide).

Marginal Effect of Tobit Model
The marginal effect can be decomposed into two factors, called the "McDonald-Moffitt" decomposition:
$$\frac{\partial E(y_i \mid x_i)}{\partial x_i} = P(y_i > 0) \cdot \frac{\partial E(y_i \mid x_i, y_i > 0)}{\partial x_i} + E(y_i \mid x_i, y_i > 0) \cdot \frac{\partial P(y_i > 0)}{\partial x_i}$$
The first factor accounts for the marginal effect of a change in $x$ for the portion of the population whose $y$-data is already observed.
The second factor accounts for changes in the proportion of the population who switch from the $y$-unobserved category to the $y$-observed category when $x$ changes.
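The decomposition can be checked numerically. The sketch below assumes a Tobit model censored at zero with a single regressor and normally distributed errors, and uses the standard closed-form expressions for $E(y \mid x)$, $E(y \mid x, y > 0)$ and $P(y > 0)$ in that case. The parameter values `B1`, `B2`, `SIGMA` and the evaluation point `x0` are hypothetical, chosen only for illustration.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical parameters for y* = B1 + B2 * x + e, e ~ N(0, SIGMA^2),
# with y = y* if y* > 0 and y = 0 otherwise.
B1, B2, SIGMA = -1.0, 0.5, 2.0


def expected_y(x):
    """E(y | x) for the Tobit model censored at zero."""
    xb = B1 + B2 * x
    z = xb / SIGMA
    return norm_cdf(z) * xb + SIGMA * norm_pdf(z)


def decomposition(x):
    """Return the two terms of the McDonald-Moffitt decomposition at x."""
    xb = B1 + B2 * x
    z = xb / SIGMA
    lam = norm_pdf(z) / norm_cdf(z)            # inverse Mills ratio
    prob_pos = norm_cdf(z)                     # P(y > 0)
    mean_pos = xb + SIGMA * lam                # E(y | x, y > 0)
    d_mean_pos = B2 * (1.0 - lam * (z + lam))  # dE(y | x, y > 0)/dx
    d_prob_pos = (B2 / SIGMA) * norm_pdf(z)    # dP(y > 0)/dx
    return prob_pos * d_mean_pos, mean_pos * d_prob_pos


x0 = 4.0
total = B2 * norm_cdf((B1 + B2 * x0) / SIGMA)  # beta_2 * Phi((beta_1 + beta_2 x)/sigma)
first, second = decomposition(x0)

# Central finite difference of E(y | x) as an independent check.
h = 1e-5
numeric = (expected_y(x0 + h) - expected_y(x0 - h)) / (2 * h)

print(f"total = {total:.6f}, first + second = {first + second:.6f}, numeric = {numeric:.6f}")
```

The two terms add up to $\beta_2 \cdot \Phi(\cdot)$, and the finite-difference slope of $E(y \mid x)$ agrees with it, which is the identity the decomposition expresses.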