Lecture 5 Econometrics
Nicolas Apfel, Southampton, 2024
Summary
These lecture notes cover least squares prediction: predicting the dependent variable at a hypothetical value of the independent variable. They discuss the forecast error and its variance, prediction intervals, goodness-of-fit and its relation to correlation, scaling of the data, choice of functional form, residual diagnostics, and prediction in the log-linear model, together with the relevant formulas and examples.
Full Transcript
# Lecture 5

## Saturday, 2 November 2024, 12:50 PM

## Least Squares Prediction

- Imagine we want to predict the value of *y* given a hypothetical value *x₀*. (This is a typical real-world application of econometrics.)
- We assume the SLR model holds, along with SLR1-SLR5, so we can write:
  * *y₀* = *β₁* + *β₂x₀* + *e₀*  (1)
- Hence we have:
  * *y₀* = *E[y₀|x₀]* + *e₀*
- The prediction of *y₀* is then:
  * *ŷ₀* = *Ê[y₀|x₀]* + *E[e₀]* = *b₁* + *b₂x₀* + 0  (2)

## Least Squares Prediction

- Define the forecast (or prediction) error:
  * *f* = *y₀* − *ŷ₀* = (*β₁* + *β₂x₀* + *e₀*) − (*b₁* + *b₂x₀*)  (3)
- Since *E[f]* = *β₁* + *β₂x₀* + 0 − *E[b₁]* − *E[b₂]x₀* = *β₁* + *β₂x₀* − *β₁* − *β₂x₀* = 0, we have that *ŷ₀* is an unbiased predictor of *y₀*.
- We can calculate the variance of the forecast error:
  * *var(f)* = *σ²* [ 1 + 1/*N* + (*x₀* − *x̄*)² / Σ(*xᵢ* − *x̄*)² ]  (4)
- The variance of the forecast error depends on:
  * the model uncertainty *σ²*
  * the sample size *N*
  * the variation in the regressor, Σ(*xᵢ* − *x̄*)²
  * the value of (*x₀* − *x̄*)², i.e. how far *x₀* is from the sample mean of *x*

## Least Squares Prediction

- **Estimated forecast error variance**
  * *var̂(f)* = *σ̂²* [ 1 + 1/*N* + (*x₀* − *x̄*)² / Σ(*xᵢ* − *x̄*)² ]
- **Standard error of the forecast**
  * *se(f)* = √*var̂(f)*
- **Prediction interval** (a numerical sketch of this calculation follows at the end of this section)
  * *ŷ₀* ± *t_c* *se(f)*, where *t_c* is the critical value from the *t* distribution with *N* − 2 degrees of freedom

## Measuring Goodness-of-Fit

- Recall our DGP:
  * *yᵢ* = *β₁* + *β₂xᵢ* + *eᵢ*
- And the predicted value implied by our estimation:
  * *ŷᵢ* = *b₁* + *b₂xᵢ*
- We can decompose *yᵢ* into our estimate of the systematic component and the unexplained component:
  * *yᵢ* = *ŷᵢ* + *êᵢ*
- Subtracting the sample mean of *y*:
  * *yᵢ* − *ȳ* = (*ŷᵢ* − *ȳ*) + *êᵢ*  (5)

## Measuring Goodness-of-Fit

- Recall the definition of the sample variance of *y*:
  * *σ̂y²* = Σ(*yᵢ* − *ȳ*)² / (*N* − 1)
- By squaring equation (5), taking the sum, and using the fact that Σ(*ŷᵢ* − *ȳ*)*êᵢ* = 0, we get:
  * Σ(*yᵢ* − *ȳ*)² = Σ(*ŷᵢ* − *ȳ*)² + Σ*êᵢ*²
- This gives us a decomposition of the total sample variation in *y* into its explained and unexplained components:
  * Σ(*yᵢ* − *ȳ*)² = total sum of squares (SST): a measure of total variation in *y* about the sample mean.
  * Σ(*ŷᵢ* − *ȳ*)² = sum of squares due to the regression (SSR): that part of total variation in *y*, about the sample mean, that is explained by, or due to, the regression. Also known as the explained sum of squares.
  * Σ*êᵢ*² = sum of squares due to error (SSE): that part of total variation in *y* about its mean that is not explained by the regression. Also known as the unexplained sum of squares, the residual sum of squares, or the sum of squared errors.
  * If the intercept is included, we always have SST = SSR + SSE.

## The Sum of Squares Decomposition

- (*yᵢ* − *ȳ*)² = [(*ŷᵢ* − *ȳ*) + *êᵢ*]² = (*ŷᵢ* − *ȳ*)² + *êᵢ*² + 2(*ŷᵢ* − *ȳ*)*êᵢ*
- Σ(*yᵢ* − *ȳ*)² = Σ(*ŷᵢ* − *ȳ*)² + Σ*êᵢ*² + 2Σ(*ŷᵢ* − *ȳ*)*êᵢ*
- Σ(*ŷᵢ* − *ȳ*)*êᵢ* = Σ*ŷᵢêᵢ* − *ȳ*Σ*êᵢ* = Σ(*b₁* + *b₂xᵢ*)*êᵢ* − *ȳ*Σ*êᵢ*
  = *b₁*Σ*êᵢ* + *b₂*Σ*xᵢêᵢ* − *ȳ*Σ*êᵢ*

## The Sum of Squares Decomposition

- Σ*êᵢ* = Σ(*yᵢ* − *b₁* − *b₂xᵢ*) = Σ*yᵢ* − *Nb₁* − *b₂*Σ*xᵢ* = 0
- Σ*xᵢêᵢ* = Σ*xᵢ*(*yᵢ* − *b₁* − *b₂xᵢ*) = Σ*xᵢyᵢ* − *b₁*Σ*xᵢ* − *b₂*Σ*xᵢ*² = 0
- Hence Σ(*ŷᵢ* − *ȳ*)*êᵢ* = 0.
- If the model contains an intercept, it is therefore guaranteed that SST = SSR + SSE (the numerical check at the end of this section verifies this).
- If, instead, the model does not contain an intercept, then in general Σ*êᵢ* ≠ 0 and SST ≠ SSR + SSE.
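To make the prediction formulas above concrete, here is a minimal numerical sketch. The sample values, the hypothetical point *x₀* = 8, and the 95% confidence level are all made up for illustration; only the formulas follow the notes.

```python
# Minimal sketch of least squares prediction (hypothetical data, not from the lecture).
import numpy as np
from scipy import stats

# Hypothetical sample
x = np.array([2.0, 3.0, 5.0, 7.0, 9.0, 11.0])
y = np.array([5.1, 6.8, 9.9, 13.2, 17.1, 20.4])
N = len(y)

# Least squares estimates b1 (intercept) and b2 (slope)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

# Residuals and estimated error variance (divide by N - 2 in the SLR model)
e_hat = y - (b1 + b2 * x)
sigma2_hat = np.sum(e_hat ** 2) / (N - 2)

# Point prediction at a hypothetical x0
x0 = 8.0
y0_hat = b1 + b2 * x0

# Estimated forecast-error variance, equation (4) with sigma^2 replaced by its estimate
var_f = sigma2_hat * (1 + 1 / N + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
se_f = np.sqrt(var_f)

# 95% prediction interval using the t critical value with N - 2 degrees of freedom
t_c = stats.t.ppf(0.975, df=N - 2)
lower, upper = y0_hat - t_c * se_f, y0_hat + t_c * se_f
print(f"y0_hat = {y0_hat:.3f}, se(f) = {se_f:.3f}, 95% PI = [{lower:.3f}, {upper:.3f}]")
```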
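The sum-of-squares decomposition can also be verified numerically: with an intercept in the model, SST equals SSR + SSE up to floating-point error, while a regression through the origin generally breaks the identity. The data below are again purely hypothetical.

```python
# Sketch verifying SST = SSR + SSE when an intercept is included (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([2.3, 2.9, 5.1, 5.8, 8.2, 8.9])

# OLS with intercept
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x
e_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum(e_hat ** 2)
print("sum of residuals:", e_hat.sum())          # ~ 0 when an intercept is included
print("SST:", SST, " SSR + SSE:", SSR + SSE)     # equal up to rounding

# Without an intercept the residuals need not sum to zero and SST != SSR + SSE in general
b2_no_int = np.sum(x * y) / np.sum(x ** 2)
e_no_int = y - b2_no_int * x
print("no intercept, sum of residuals:", e_no_int.sum())
```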
## Coefficient of Determination

- *R²* = SSR / SST = 1 − SSE / SST
- The closer *R²* is to one, the closer the sample values *yᵢ* are to the fitted regression equation *ŷᵢ* = *b₁* + *b₂xᵢ*.
  * If *R²* = 1, then all the sample data fall exactly on the fitted line, so SSE = 0 and the model fits the data "perfectly".
  * If the sample data for *y* and *x* are uncorrelated and show no linear association, then the least squares fitted line is horizontal, so SSR = 0 and *R²* = 0.
  * When 0 < *R²* < 1, *R²* is interpreted as the proportion of the variation in *y* about its mean that is explained by the regression model.

## Correlation analysis

- Correlation coefficient *ρxy* between *x* and *y*:
  * *ρxy* = *cov(x, y)* / (*σx σy*)
- Sample (estimated) correlation coefficient:
  * *rxy* = *σ̂xy* / (*σ̂x σ̂y*)
- where:
  * *σ̂xy* = Σ(*xᵢ* − *x̄*)(*yᵢ* − *ȳ*) / (*N* − 1)
  * *σ̂x* = √[ Σ(*xᵢ* − *x̄*)² / (*N* − 1) ]
  * *σ̂y* = √[ Σ(*yᵢ* − *ȳ*)² / (*N* − 1) ]

## Correlation analysis and *R²*

- There are two relations between correlation and the coefficient of determination:
  * The first one highlights the existence of a linear relation between two variables:
    * *r²xy* = *R²*
  * The second one holds also for multiple regressions and highlights the closeness between observations and predictions:
    * *R²* = *r²y,ŷ*
- Since *R²* measures the linear association between the sample data and their predicted values (second relation), *R²* is used as a measure of goodness-of-fit. (The first sketch after this section verifies both relations numerically.)

## Modeling issues: scaling the data

- Changing the scale of *x*:
  * y = β₁ + β₂x + e = β₁ + (cβ₂)(x/c) + e = β₁ + β₂∗x∗ + e
  * where β₂∗ = cβ₂ and x∗ = x/c
- Changing the scale of *y*:
  * y/c = (β₁/c) + (β₂/c)x + (e/c), or y∗ = β₁∗ + β₂∗x + e∗
  * where y∗ = y/c, β₁∗ = β₁/c, β₂∗ = β₂/c, and e∗ = e/c
- Question: how are the *t*-ratio and *R²* affected? (See the scaling check after this section.)

## Choosing a Functional Form

- The SLR model is linear in the parameters, but it can accommodate many non-linear relationships between *x* and *y* simply by transforming the variables.
- Examples of transformations:
  * Polynomial: if *x* is a variable, then *x^p* means raising the variable to the power *p*; examples are the quadratic (*x²*) and cubic (*x³*) transformations.
  * The natural logarithm: if *x* is a variable, then its natural logarithm is *ln(x)*.
  * The reciprocal: if *x* is a variable, then its reciprocal is 1/*x*.
- The log-log model (a simulated example follows after this section):
  * *ln(y)* = *β₁* + *β₂ln(x)*
  * The parameter *β₂* is the elasticity of *y* with respect to *x*.
- The log-linear model:
  * *ln(yᵢ)* = *β₁* + *β₂xᵢ*
  * A one-unit increase in *x* leads to approximately a 100·*β₂* percent change in *y*.
- The linear-log model:
  * *y* = *β₁* + *β₂ln(x)*, so that Δ*y* = (*β₂*/100)[100(Δ*x*/*x*)]
  * A 1% increase in *x* leads to approximately a *β₂*/100 unit change in *y*.

## Guidelines

- Remark: given this array of models that involve different transformations of the dependent and independent variables, some of which have similar shapes, what are some guidelines for choosing a functional form?
  * Choose a shape that is consistent with what economic theory tells us about the relationship.
  * Choose a shape that is sufficiently flexible to fit the data.
  * Choose a shape such that assumptions SLR1-SLR6 are satisfied, which ensures that the least squares estimators have the desirable properties we saw in previous lectures.

## Visual inspection of residuals

- There are two types of model validation check:
  * *Estimation output:* an incorrect sign or lack of statistical significance for a relevant variable is a potential sign of a wrong functional form or of an SLR1-SLR6 assumption not holding.
  * *Plot of residuals:* visual inspection of the estimation residuals is also important; the residuals should look erratic, with no pattern.
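To illustrate the two relations between correlation and *R²* stated above (*r²xy* = *R²* in the simple regression, and *R²* equal to the squared correlation between *y* and *ŷ*), here is a small numerical check on hypothetical data.

```python
# Sketch checking r_xy^2 = R^2 and R^2 = corr(y, y_hat)^2 in the SLR model (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 8.0, 9.0])
y = np.array([1.8, 3.1, 3.9, 6.2, 6.8, 9.1, 10.2])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)
R2 = 1 - SSE / SST

r_xy = np.corrcoef(x, y)[0, 1]         # sample correlation between x and y
r_yyhat = np.corrcoef(y, y_hat)[0, 1]  # sample correlation between y and its prediction

print("R^2           :", R2)
print("r_xy^2        :", r_xy ** 2)      # equals R^2 in the simple regression
print("corr(y,yhat)^2:", r_yyhat ** 2)   # equals R^2; this relation also holds in multiple regression
```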
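The scaling question above can be explored numerically. Rescaling *x* by a constant rescales the slope and its standard error by the same factor, so the *t*-ratio and *R²* are unchanged; the sketch below checks this with hypothetical data and an arbitrary scale factor *c* = 100.

```python
# Sketch: rescaling x leaves the slope t-ratio and R^2 unchanged (hypothetical data).
import numpy as np

def slr(x, y):
    """Return slope, its t-ratio, and R^2 from a simple linear regression with intercept."""
    N = len(y)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    e = y - (b1 + b2 * x)
    sigma2_hat = np.sum(e ** 2) / (N - 2)
    se_b2 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))
    R2 = 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
    return b2, b2 / se_b2, R2

x = np.array([1.0, 3.0, 4.0, 6.0, 8.0, 9.0, 11.0])
y = np.array([2.1, 4.8, 6.2, 8.9, 12.3, 13.1, 16.8])

c = 100.0
print("original x :", slr(x, y))       # (b2, t-ratio, R^2)
print("x / c      :", slr(x / c, y))   # slope becomes c*b2; t-ratio and R^2 unchanged
```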
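To make the functional-form discussion concrete, the following sketch simulates data from a log-log relationship and recovers the elasticity as the slope of the regression of *ln(y)* on *ln(x)*. The data-generating values (including the "true" elasticity of 0.8) are purely hypothetical.

```python
# Sketch: the slope of a log-log regression estimates the elasticity of y with respect to x (simulated data).
import numpy as np

rng = np.random.default_rng(2)
N = 500
x = rng.uniform(1.0, 10.0, size=N)
true_elasticity = 0.8                                        # hypothetical value
ln_y = 0.5 + true_elasticity * np.log(x) + rng.normal(0, 0.1, size=N)

lx = np.log(x)
b2 = np.sum((lx - lx.mean()) * (ln_y - ln_y.mean())) / np.sum((lx - lx.mean()) ** 2)
print("estimated elasticity:", b2)  # close to 0.8: a 1% increase in x raises y by about b2 percent
```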
## Examples of residuals

- Are the regression errors normally distributed?
- The Jarque-Bera test statistic is given by:
  * *JB* = (*N*/6) [ *S²* + (*K* − 3)² / 4 ]
  * where *N* is the sample size, *S* is the skewness, and *K* is the kurtosis of the residuals.
- Under *H₀* of normality, *JB* ~ *χ²* with 2 degrees of freedom.
- In the food expenditure example:
  * *JB* = (40/6) [ (−0.097)² + (2.99 − 3)²/4 ] = 0.063
- The critical value at the 5% significance level is *χ²(2)* = 5.99. Since 0.063 < 5.99, we fail to reject *H₀* of normality. (A computational sketch follows at the end of this section.)

## The Log-Normal distribution

- Suppose that the variable *y* has a normal distribution with mean *μ* and variance *σ²*.
- If we consider *w* = *e^y*, then *ln(w)* = *y* ~ *N(μ, σ²)*, and *w* is said to have a log-normal distribution.
- The first two moments of the log-normal distribution (these can be checked by simulation; see below):
  * *E[w]* = *e^(μ + σ²/2)* = *e^μ* *e^(σ²/2)*
  * *var[w]* = *e^(2μ + σ²)* (*e^(σ²)* − 1)
- The median:
  * *Med[w]* = *e^μ*

## The Log-linear model

- Given the log-linear model *ln(yᵢ)* = *β₁* + *β₂xᵢ* + *eᵢ*, if we assume that *e* ~ *N(0, σ²)*, how can we obtain predictions of *y*?
- One suboptimal possibility is to simply take the exponential of the fitted value of *ln(y)*:
  * *ŷₙ* = exp(*b₁* + *b₂x*)
- A better alternative is to use the properties of the log-normal distribution:
  * *ln(yᵢ)* = *β₁* + *β₂xᵢ* + *eᵢ*
  * *yᵢ* = *e^(β₁ + β₂xᵢ + eᵢ)*
  * *E[yᵢ]* = *E[e^(β₁ + β₂xᵢ + eᵢ)]* = *E[e^(β₁ + β₂xᵢ) e^(eᵢ)]* = *e^(β₁ + β₂xᵢ)* *E[e^(eᵢ)]* = *e^(β₁ + β₂xᵢ)* *e^(σ²/2)* = *e^(β₁ + β₂xᵢ + σ²/2)*
  * This suggests the corrected predictor *ŷc* = *e^(b₁ + b₂x + σ̂²/2)* = *ŷₙ* *e^(σ̂²/2)*.

## The Log-linear model

- How can we obtain an estimate of the rate of return in the wage equation?
  * *r* is a nonlinear function of *β₂*:
    * *β₂* = *ln(1 + r)*, so *r* = *e^(β₂)* − 1
  * A natural estimate is simply:
    * *r̂* = *e^(b₂)* − 1
- Recall that:
  * *b₂* ~ *N*(*β₂*, *σ²*/Σ(*xᵢ* − *x̄*)²)
- So *e^(b₂)* is log-normal with *E[e^(b₂)]* = *e^(β₂ + var(b₂)/2)*, meaning *r̂* tends to overestimate *r*; a corrected estimate is:
  * *r̂c* = *e^(b₂ − var̂(b₂)/2)* − 1
- There is a measure of goodness-of-fit that we can use in many contexts (e.g. in a log-linear model), the generalized *R²* (see the final sketch below):
  * *R²g* = [corr(*y*, *ŷc*)]² = *r²y,ŷc*
- In the wage-equation example:
  * the *R²* of the log-linear regression is 0.2577, while
  * *R²g* = [corr(*y*, *ŷc*)]² = (0.4739)² = 0.2246
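The Jarque-Bera calculation in the food expenditure example can be reproduced directly from the reported skewness and kurtosis (*N* = 40, *S* = −0.097, *K* = 2.99). The helper below also shows how the statistic would be computed from a vector of residuals; the simulated residual vector at the end is hypothetical.

```python
# Sketch of the Jarque-Bera normality test.
import numpy as np
from scipy import stats

def jarque_bera(residuals):
    """JB = (N/6) * (S^2 + (K - 3)^2 / 4), with S = skewness and K = (non-excess) kurtosis."""
    N = len(residuals)
    S = stats.skew(residuals)
    K = stats.kurtosis(residuals, fisher=False)  # fisher=False gives kurtosis near 3 for normal data
    return (N / 6) * (S ** 2 + (K - 3) ** 2 / 4)

# Reproducing the numbers reported in the lecture (N = 40, S = -0.097, K = 2.99):
JB_example = (40 / 6) * ((-0.097) ** 2 + (2.99 - 3) ** 2 / 4)
chi2_crit = stats.chi2.ppf(0.95, df=2)
print(f"JB = {JB_example:.3f}, 5% critical value = {chi2_crit:.2f}")  # 0.063 < 5.99: fail to reject H0

# Applying the same function to a simulated residual vector:
rng = np.random.default_rng(0)
print("JB on simulated residuals:", jarque_bera(rng.normal(size=40)))
```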
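The log-normal moment formulas can be checked by simulation: with hypothetical values of *μ* and *σ²*, the sample mean, variance, and median of *w* = *e^y* should be close to their theoretical counterparts.

```python
# Sketch checking the log-normal mean, variance, and median formulas by simulation (hypothetical mu, sigma^2).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 1.0, 0.25                      # hypothetical parameters
y = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)
w = np.exp(y)                               # w is log-normal

print("E[w]  :", w.mean(),     "theory:", np.exp(mu + sigma2 / 2))
print("var[w]:", w.var(),      "theory:", np.exp(2 * mu + sigma2) * (np.exp(sigma2) - 1))
print("median:", np.median(w), "theory:", np.exp(mu))
```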
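Finally, a sketch of prediction in the log-linear model: the natural predictor exponentiates the fitted value of *ln(y)*, the corrected predictor multiplies it by *e^(σ̂²/2)*, and the generalized *R²* is the squared correlation between *y* and the corrected predictions; it also computes the natural and bias-corrected estimates of the return. The wage data and coefficients here are simulated, so the numbers will not match the lecture's 0.2577 and 0.2246.

```python
# Sketch of prediction and generalized R^2 in a log-linear model (simulated data).
import numpy as np

rng = np.random.default_rng(1)
N = 200
educ = rng.integers(8, 18, size=N).astype(float)           # hypothetical years of education
ln_wage = 1.0 + 0.09 * educ + rng.normal(0, 0.4, size=N)   # hypothetical log-wage equation
wage = np.exp(ln_wage)

# OLS of ln(wage) on educ
b2 = np.sum((educ - educ.mean()) * (ln_wage - ln_wage.mean())) / np.sum((educ - educ.mean()) ** 2)
b1 = ln_wage.mean() - b2 * educ.mean()
resid = ln_wage - (b1 + b2 * educ)
sigma2_hat = np.sum(resid ** 2) / (N - 2)

# Natural and corrected predictors of wage
y_n = np.exp(b1 + b2 * educ)            # exponentiated fitted value of ln(wage)
y_c = y_n * np.exp(sigma2_hat / 2)      # corrected predictor using the log-normal mean

# Generalized R^2: squared correlation between y and the corrected predictions
R2_g = np.corrcoef(wage, y_c)[0, 1] ** 2

# Estimated return to education: natural and bias-corrected versions
var_b2 = sigma2_hat / np.sum((educ - educ.mean()) ** 2)
r_hat = np.exp(b2) - 1
r_hat_c = np.exp(b2 - var_b2 / 2) - 1
print(f"R^2_g = {R2_g:.4f}, r_hat = {r_hat:.4f}, corrected r_hat = {r_hat_c:.4f}")
```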