Document Details

Uploaded by ReasonableDerivative

Southampton

2024

Nicolas Apfel

Tags

econometrics, least squares prediction, regression analysis, statistical modeling

Summary

These lecture notes cover econometrics concepts, specifically least squares prediction. They explain how to predict a dependent variable given a hypothetical value of an independent variable, illustrate the use of linear regression models and concepts such as the forecast error and goodness-of-fit, and include the relevant formulas and examples.

Full Transcript

# Lecture 5

## Saturday, 2 November 2024, 12:50 PM

## Least Squares Prediction

- Imagine we want to predict the value of $y$ given a hypothetical value $x_0$. (This is a typical real-world application of econometrics.)
- We assume the SLR model holds, along with SLR1-SLR5, so we can write:
  $$y_0 = \beta_1 + \beta_2 x_0 + e_0 \quad (1)$$
- Hence we have $y_0 = E[y_0 \mid x_0] + e_0$.
- The prediction of $y_0$ is then
  $$\hat{y}_0 = b_1 + b_2 x_0, \quad (2)$$
  obtained by replacing $E[y_0 \mid x_0] = \beta_1 + \beta_2 x_0$ with its estimate $b_1 + b_2 x_0$ and using $E[e_0] = 0$.

## Least Squares Prediction

- Define the forecast (or prediction) error:
  $$f = y_0 - \hat{y}_0 = (\beta_1 + \beta_2 x_0 + e_0) - (b_1 + b_2 x_0) \quad (3)$$
- Since $E[f] = \beta_1 + \beta_2 x_0 + 0 - E[b_1] - E[b_2] x_0 = \beta_1 + \beta_2 x_0 - \beta_1 - \beta_2 x_0 = 0$, $\hat{y}_0$ is an unbiased predictor of $y_0$.
- We can calculate the variance of the forecast error:
  $$\operatorname{var}(f) = \sigma^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} \right] \quad (4)$$
- The variance of the forecast error depends on:
  * the model uncertainty $\sigma^2$;
  * the sample size $N$;
  * the variation in the regressor, $\sum_i (x_i - \bar{x})^2$;
  * the value of $(x_0 - \bar{x})^2$.

## Least Squares Prediction

- **Estimated forecast error variance**
  $$\widehat{\operatorname{var}}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} \right]$$
- **Standard error**
  $$\operatorname{se}(f) = \sqrt{\widehat{\operatorname{var}}(f)}$$
- **Prediction interval**
  $$\hat{y}_0 \pm t_c \, \operatorname{se}(f),$$
  where $t_c$ is the critical value from the $t_{(N-2)}$ distribution.

## Measuring Goodness-of-Fit

- Recall our DGP:
  $$y_i = \beta_1 + \beta_2 x_i + e_i$$
- And the predicted value implied by our estimation:
  $$\hat{y}_i = b_1 + b_2 x_i$$
- We can decompose $y_i$ into our estimates of the systematic and unexplained components:
  $$y_i = \hat{y}_i + \hat{e}_i$$
- Subtracting the sample mean of $y$:
  $$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + \hat{e}_i \quad (5)$$

## Measuring Goodness-of-Fit

- Recall the definition of the sample variance of $y$:
  $$\hat{\sigma}_y^2 = \frac{\sum_i (y_i - \bar{y})^2}{N - 1}$$
- By squaring equation (5), summing over $i$, and using $\sum_i (\hat{y}_i - \bar{y}) \hat{e}_i = 0$, we get:
  $$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat{e}_i^2$$
- This decomposes the total sample variation in $y$ into its explained and unexplained components:
  * $\sum_i (y_i - \bar{y})^2$ = total sum of squares (SST): a measure of the total variation in $y$ about the sample mean.
  * $\sum_i (\hat{y}_i - \bar{y})^2$ = sum of squares due to the regression (SSR): the part of the total variation in $y$ about the sample mean that is explained by, or due to, the regression. Also known as the explained sum of squares.
  * $\sum_i \hat{e}_i^2$ = sum of squares due to error (SSE): the part of the total variation in $y$ about its mean that is not explained by the regression. Also known as the unexplained sum of squares, the residual sum of squares, or the sum of squared errors.
  * If the intercept is included, we always have SST = SSR + SSE.

## The Sum of Squares Decomposition

- $(y_i - \bar{y})^2 = [(\hat{y}_i - \bar{y}) + \hat{e}_i]^2 = (\hat{y}_i - \bar{y})^2 + \hat{e}_i^2 + 2 (\hat{y}_i - \bar{y}) \hat{e}_i$
- $\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i \hat{e}_i^2 + 2 \sum_i (\hat{y}_i - \bar{y}) \hat{e}_i$
- $\sum_i (\hat{y}_i - \bar{y}) \hat{e}_i = \sum_i \hat{y}_i \hat{e}_i - \bar{y} \sum_i \hat{e}_i = \sum_i (b_1 + b_2 x_i) \hat{e}_i - \bar{y} \sum_i \hat{e}_i = b_1 \sum_i \hat{e}_i + b_2 \sum_i x_i \hat{e}_i - \bar{y} \sum_i \hat{e}_i$

## The Sum of Squares Decomposition

- From the least squares first-order conditions:
  $$\sum_i \hat{e}_i = \sum_i (y_i - b_1 - b_2 x_i) = \sum_i y_i - N b_1 - b_2 \sum_i x_i = 0$$
  $$\sum_i x_i \hat{e}_i = \sum_i x_i (y_i - b_1 - b_2 x_i) = \sum_i x_i y_i - b_1 \sum_i x_i - b_2 \sum_i x_i^2 = 0$$
- Hence $\sum_i (\hat{y}_i - \bar{y}) \hat{e}_i = 0$.
- If the model contains an intercept, it is guaranteed that SST = SSR + SSE.
- If, instead, the model does not contain an intercept, then in general $\sum_i \hat{e}_i \neq 0$ and SST ≠ SSR + SSE.
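As a complement to the formulas above, here is a minimal numerical sketch (not part of the lecture) that computes the least squares prediction, its interval, and the sum-of-squares decomposition on simulated data. The parameter values, sample size, seed, and the hypothetical point `x0 = 20` are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(42)
N = 40
beta1, beta2, sigma = 80.0, 10.0, 30.0          # illustrative "true" parameters
x = rng.uniform(5, 30, size=N)
y = beta1 + beta2 * x + rng.normal(0, sigma, size=N)

# Least squares estimates b1, b2
xbar, ybar = x.mean(), y.mean()
b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b1 = ybar - b2 * xbar

yhat = b1 + b2 * x
ehat = y - yhat
sigma2_hat = np.sum(ehat ** 2) / (N - 2)        # estimate of sigma^2

# Point prediction and 95% prediction interval at a hypothetical x0
x0 = 20.0
y0_hat = b1 + b2 * x0
var_f = sigma2_hat * (1 + 1 / N + (x0 - xbar) ** 2 / np.sum((x - xbar) ** 2))
se_f = np.sqrt(var_f)
t_c = t.ppf(0.975, df=N - 2)
print(f"prediction: {y0_hat:.2f}, interval: "
      f"[{y0_hat - t_c * se_f:.2f}, {y0_hat + t_c * se_f:.2f}]")

# Sum-of-squares decomposition (holds because the model includes an intercept)
SST = np.sum((y - ybar) ** 2)
SSR = np.sum((yhat - ybar) ** 2)
SSE = np.sum(ehat ** 2)
print(f"SST = {SST:.2f}, SSR + SSE = {SSR + SSE:.2f}")
print(f"R^2 = SSR/SST = {SSR / SST:.4f} = 1 - SSE/SST = {1 - SSE / SST:.4f}")
```

Because the simulated model includes an intercept, the last two printed lines should show SST equal to SSR + SSE and the two expressions for $R^2$ agreeing.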
## Coefficient of Determination

- $R^2 = \text{SSR}/\text{SST} = 1 - \text{SSE}/\text{SST}$
- The closer $R^2$ is to one, the closer the sample values $y_i$ are to the fitted regression equation $\hat{y}_i = b_1 + b_2 x_i$:
  * If $R^2 = 1$, all the sample data fall exactly on the fitted line, so SSE = 0 and the model fits the data "perfectly".
  * If the sample data for $y$ and $x$ are uncorrelated and show no linear association, the least squares fitted line is horizontal, so SSR = 0 and $R^2 = 0$.
  * When $0 < R^2 < 1$, $R^2$ is interpreted as the proportion of the variation in $y$ about its mean that is explained by the regression model.

## Correlation analysis

- The correlation coefficient $\rho_{xy}$ between $x$ and $y$:
  $$\rho_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$
- The sample (estimated) correlation coefficient:
  $$r_{xy} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x \hat{\sigma}_y},$$
  where
  $$\hat{\sigma}_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{N - 1}, \quad \hat{\sigma}_x = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{N - 1}}, \quad \hat{\sigma}_y = \sqrt{\frac{\sum_i (y_i - \bar{y})^2}{N - 1}}$$

## Correlation analysis and $R^2$

- There are two relations between the correlation coefficient and the coefficient of determination:
  * The first highlights the existence of a linear relation between the two variables:
    $$r_{xy}^2 = R^2$$
  * The second also holds for multiple regressions and highlights the closeness between the observations and the predictions:
    $$R^2 = r_{y\hat{y}}^2$$
- Since $R^2$ measures the linear association between the sample data and their predicted values (the second relation), $R^2$ is used as a measure of goodness-of-fit.

## Modeling issues: scaling the data

- Changing the scale of $x$:
  $$y = \beta_1 + \beta_2 x + e = \beta_1 + (c\beta_2)(x/c) + e = \beta_1 + \beta_2^* x^* + e,$$
  where $\beta_2^* = c\beta_2$ and $x^* = x/c$.
- Changing the scale of $y$:
  $$y/c = (\beta_1/c) + (\beta_2/c) x + (e/c) \quad \text{or} \quad y^* = \beta_1^* + \beta_2^* x + e^*$$

## Question: How are the $t$-ratio and $R^2$ affected?

- A numerical check is given in the sketch at the end of this section.

## Choosing a Functional Form

- The SLR model is linear in the parameters, but it can accommodate many non-linear relationships between $x$ and $y$ simply by transforming the variables.
- Examples of transformations:
  * Polynomial: if $x$ is a variable, then $x^p$ means raising the variable to the power $p$; examples are the quadratic ($x^2$) and cubic ($x^3$) transformations.
  * The natural logarithm: if $x$ is a variable, its natural logarithm is $\ln(x)$.
  * The reciprocal: if $x$ is a variable, its reciprocal is $1/x$.
- The log-log model:
  $$\ln(y) = \beta_1 + \beta_2 \ln(x)$$
  The parameter $\beta_2$ is the elasticity of $y$ with respect to $x$.
- The log-linear model:
  $$\ln(y_i) = \beta_1 + \beta_2 x_i$$
  A one-unit increase in $x$ leads to a $100 \times \beta_2$ percent change in $y$.
- The linear-log model:
  $$y = \beta_1 + \beta_2 \ln(x) \quad \text{or} \quad \frac{\Delta y}{100(\Delta x / x)} = \frac{\beta_2}{100}$$
  A 1% increase in $x$ leads to a $\beta_2 / 100$ unit change in $y$.

## Guidelines

- Given this array of models, which involve different transformations of the dependent and independent variables, and some of which have similar shapes, what are some guidelines for choosing a functional form?
  * Choose a shape that is consistent with what economic theory tells us about the relationship.
  * Choose a shape that is sufficiently flexible to fit the data.
  * Choose a shape such that assumptions SLR1-SLR6 are satisfied, which ensures that the least squares estimators have the desirable properties we saw in previous lectures.
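The following sketch (simulated data with arbitrary parameter values, not from the lecture) checks the scaling question raised above numerically: rescaling $x$ or $y$ changes the estimated coefficients by the scaling factor but leaves the $t$-ratio of the slope and $R^2$ unchanged. It also verifies that, in a simple regression with an intercept, $R^2 = r_{xy}^2 = r_{y\hat{y}}^2$.

```python
import numpy as np

def slr(x, y):
    """Simple linear regression: return b1, b2, the t-ratio of b2, and R^2."""
    N = len(x)
    xbar, ybar = x.mean(), y.mean()
    b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b1 = ybar - b2 * xbar
    ehat = y - b1 - b2 * x
    sigma2_hat = np.sum(ehat ** 2) / (N - 2)
    se_b2 = np.sqrt(sigma2_hat / np.sum((x - xbar) ** 2))
    R2 = 1 - np.sum(ehat ** 2) / np.sum((y - ybar) ** 2)
    return b1, b2, b2 / se_b2, R2

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 50)
y = 2 + 0.5 * x + rng.normal(0, 1, 50)

# Rescaling x multiplies b2 by c; rescaling y divides b1 and b2 by c;
# the t-ratio of b2 and R^2 are unchanged in both cases.
for label, xs, ys in [("original", x, y), ("x/100", x / 100, y), ("y/100", x, y / 100)]:
    b1, b2, t_b2, R2 = slr(xs, ys)
    print(f"{label:>8}: b2 = {b2:10.4f}, t-ratio = {t_b2:7.3f}, R^2 = {R2:.4f}")

# In the SLR with an intercept, R^2 equals both r_{xy}^2 and r_{y,yhat}^2
b1, b2, _, R2 = slr(x, y)
yhat = b1 + b2 * x
print(R2, np.corrcoef(x, y)[0, 1] ** 2, np.corrcoef(y, yhat)[0, 1] ** 2)
```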
## Visual inspection of residuals

- There are two types of model validation check:
  * *Estimation output:* an incorrect sign or a lack of statistical significance for a relevant variable is a potential sign of a wrong functional form or of an SLR1-SLR6 assumption not holding.
  * *Plot of residuals:* visual inspection of the estimation residuals is also important; the residuals should look erratic, with no pattern.

## Examples of residuals

- Are the regression errors normally distributed?
- The Jarque-Bera test statistic is given by
  $$JB = \frac{N}{6} \left( S^2 + \frac{(K - 3)^2}{4} \right),$$
  where $N$ is the sample size, $S$ is the skewness, and $K$ is the kurtosis of the residuals.
- Under $H_0$ of normality, $JB \sim \chi^2_{(2)}$.
- In the food expenditure example:
  $$JB = \frac{40}{6} \left( (-0.097)^2 + \frac{(2.99 - 3)^2}{4} \right) = 0.063$$
- The critical value at the 5% significance level is $\chi^2_{(2)} = 5.99$. Since 0.063 < 5.99, we fail to reject $H_0$ of normality.

## The Log-Normal distribution

- Suppose that the variable $y$ has a normal distribution, with mean $\mu$ and variance $\sigma^2$.
- If we consider $w = e^y$, then $\ln(w) = y \sim N(\mu, \sigma^2)$, so $w$ is said to have a log-normal distribution.
- The first two moments of the log-normal distribution are
  $$E[w] = e^{\mu + \sigma^2/2} = e^{\mu} e^{\sigma^2/2}, \qquad \operatorname{var}[w] = e^{2\mu + \sigma^2} (e^{\sigma^2} - 1)$$
- The median:
  $$\operatorname{Med}[w] = e^{\mu}$$

## The Log-linear model

- Given the log-linear model $\ln(y_i) = \beta_1 + \beta_2 x_i + e_i$, if we assume that $e_i \sim N(0, \sigma^2)$, how can we obtain predictions of $y$?
- One suboptimal possibility is to simply take the "natural" predictor:
  $$\hat{y}_n = \exp(\widehat{\ln(y)}) = \exp(b_1 + b_2 x)$$
- A better alternative is to use the properties of the log-normal distribution:
  $$\ln(y_i) = \beta_1 + \beta_2 x_i + e_i \quad \Rightarrow \quad y_i = e^{\beta_1 + \beta_2 x_i + e_i}$$
  $$E[y_i] = E[e^{\beta_1 + \beta_2 x_i + e_i}] = E[e^{\beta_1 + \beta_2 x_i} e^{e_i}] = e^{\beta_1 + \beta_2 x_i} E[e^{e_i}] = e^{\beta_1 + \beta_2 x_i} e^{\sigma^2/2} = e^{\beta_1 + \beta_2 x_i + \sigma^2/2}$$

## The Log-linear model

- How do we obtain an estimate of the rate of return $r$ in the wage equation?
  * $r$ is a nonlinear function of $\beta_2$: $\beta_2 = \ln(1 + r)$, so $r = e^{\beta_2} - 1$.
  * A natural estimate is simply $\hat{r} = e^{b_2} - 1$.
- Recall that $b_2 \sim N\!\left(\beta_2, \, \sigma^2 / \sum_i (x_i - \bar{x})^2\right)$, so $E[e^{b_2}] = e^{\beta_2 + \operatorname{var}(b_2)/2}$.
- A corrected estimate is therefore
  $$\hat{r}_c = e^{b_2 - \widehat{\operatorname{var}}(b_2)/2} - 1$$
- There is a measure of goodness-of-fit that we can use in many contexts (e.g. in a log-linear model), the generalized $R^2$:
  $$R_g^2 = [\operatorname{corr}(y, \hat{y}_c)]^2 = r_{y, \hat{y}_c}^2$$
- In the example, the log-linear regression reports $R^2 = 0.2577$, while
  $$R_g^2 = [\operatorname{corr}(y, \hat{y}_c)]^2 = (0.4739)^2 = 0.2246$$
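To tie the last few slides together, here is a small sketch on simulated data (not the food-expenditure or wage data from the lecture): it estimates a log-linear model, computes the Jarque-Bera statistic from the residuals, and compares the natural predictor $\hat{y}_n$ with the corrected predictor $\hat{y}_c$, along with the generalized $R^2$. The DGP parameters and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
x = rng.uniform(0, 20, N)
y = np.exp(1.0 + 0.1 * x + rng.normal(0, 0.4, N))   # illustrative log-linear DGP

# Fit ln(y) = b1 + b2*x by least squares
ly = np.log(y)
xbar, lybar = x.mean(), ly.mean()
b2 = np.sum((x - xbar) * (ly - lybar)) / np.sum((x - xbar) ** 2)
b1 = lybar - b2 * xbar
ehat = ly - b1 - b2 * x
sigma2_hat = np.sum(ehat ** 2) / (N - 2)

# Jarque-Bera statistic from the residuals: JB = (N/6) * (S^2 + (K - 3)^2 / 4)
m2, m3, m4 = (np.mean(ehat ** k) for k in (2, 3, 4))
S = m3 / m2 ** 1.5                    # skewness
K = m4 / m2 ** 2                      # kurtosis
JB = N / 6 * (S ** 2 + (K - 3) ** 2 / 4)
print(f"JB = {JB:.3f}  (chi-square(2) critical value at 5%: 5.99)")

# "Natural" vs "corrected" predictors of y, and the generalized R^2
yhat_n = np.exp(b1 + b2 * x)
yhat_c = yhat_n * np.exp(sigma2_hat / 2)          # scales up by exp(sigma^2 / 2)
R2_g = np.corrcoef(y, yhat_c)[0, 1] ** 2
print(f"mean(yhat_n) = {yhat_n.mean():.3f}, mean(yhat_c) = {yhat_c.mean():.3f}, "
      f"mean(y) = {y.mean():.3f}")
print(f"generalized R^2 = {R2_g:.4f}")
```

Note that the correlation between $y$ and $\hat{y}_c$ equals that between $y$ and $\hat{y}_n$, since the two predictors differ only by the constant factor $e^{\hat{\sigma}^2/2}$; the correction matters for the level of the predictions, not for $R_g^2$.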
