Week 1 (ii) Lecture PDF - Linear Regression

Lecture 1: A brief overview of the classical linear regression model

Essential reading: Chapter 3 in Brooks.
Dr Artur Semeyutin
BIE0014: Econometrics
Huddersfield Business School
w/c 16/01/2023

Regression

Regression is probably the single most important tool at the econometrician’s disposal. But what is regression analysis? It is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable) and one or more other variables (usually known as the independent variable(s)).

Some Notation

Denote the dependent variable by $y$ and the independent variable(s) by $x_1, x_2, \ldots, x_k$, where there are $k$ independent variables.

Some alternative names for the $y$ and $x$ variables:

y                     x
dependent variable    independent variables
regressand            regressors
effect variable       causal variables
explained variable    explanatory variables

Note that there can be many $x$ variables, but we will limit ourselves to the case where there is only one $x$ variable to start with. In our set-up, there is only one $y$ variable.

Regression is different from Correlation

If we say $y$ and $x$ are correlated, it means that we are treating $y$ and $x$ in a completely symmetrical way. In regression, we treat the dependent variable ($y$) and the independent variable(s) ($x$’s) very differently. The $y$ variable is assumed to be random or “stochastic” in some way, i.e. to have a probability distribution. The $x$ variables are, however, assumed to have fixed (“non-stochastic”) values in repeated samples.

Simple Regression

For simplicity, say $k = 1$. This is the situation where $y$ depends on only one $x$ variable.
Examples of the kind of relationship that may be of interest include:
- How asset returns vary with their level of market risk
- Measuring the long-term relationship between stock prices and dividends
- Constructing an optimal hedge ratio

Simple Regression: An Example

Suppose that we have the following data on the excess returns on a fund manager’s portfolio (“fund XXX”) together with the excess returns on a market index:

Year, t    Excess return            Excess return on market index
           = $r_{XXX,t} - rf_t$     = $rm_t - rf_t$
1          17.8                     13.7
2          39.0                     23.2
3          12.8                     6.9
4          24.2                     16.8
5          17.2                     12.3

We have some intuition that the beta on this fund is positive, and we therefore want to find whether there appears to be a relationship between $x$ and $y$ given the data that we have. The first stage would be to form a scatter plot of the two variables.

Graph (Scatter Diagram)

[Figure: scatter plot of excess return on fund XXX (vertical axis) against excess return on market portfolio (horizontal axis).]

Finding a Line of Best Fit

We can use the general equation for a straight line, $y = a + bx$, to get the line that best “fits” the data. However, this equation ($y = a + bx$) is completely deterministic. Is this realistic? No. So what we do is add a random disturbance term, $u$, to the equation:

$y_t = \alpha + \beta x_t + u_t$, where $t = 1, 2, 3, 4, 5$

Why do we include a Disturbance term?

The disturbance term can capture a number of features:
- We always leave out some determinants of $y_t$
- There may be errors in the measurement of $y_t$ that cannot be modelled
- Random outside influences on $y_t$ which we cannot model

Determining the Regression Coefficients

So how do we determine what $\alpha$ and $\beta$ are?
Choose $\alpha$ and $\beta$ so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible):

[Figure: fitted line through a scatter of data points, $y$ against $x$.]

Ordinary Least Squares

The most common method used to fit a line to the data is known as OLS (ordinary least squares). What we actually do is take each distance and square it (i.e. take the area of each of the squares in the diagram) and minimise the total sum of the squares (hence least squares). Tightening up the notation, let
- $y_t$ denote the actual data point $t$
- $\hat{y}_t$ denote the fitted value from the regression line
- $\hat{u}_t$ denote the residual, $y_t - \hat{y}_t$

Actual and Fitted Value

[Figure: for an observation at $x_t$, the actual value $y_t$, the fitted value $\hat{y}_t$ on the line, and the residual $\hat{u}_t$ between them.]

How OLS Works

So minimise $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise $\sum_{t=1}^{5} \hat{u}_t^2$. This is known as the residual sum of squares. But what was $\hat{u}_t$? It was the difference between the actual point and the line, $y_t - \hat{y}_t$. So minimising $\sum (y_t - \hat{y}_t)^2$ is equivalent to minimising $\sum \hat{u}_t^2$ with respect to $\hat{\alpha}$ and $\hat{\beta}$.

Deriving the OLS Estimator

But $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, so let

$L = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$

We want to minimise $L$ with respect to (w.r.t.) $\hat{\alpha}$ and $\hat{\beta}$, so differentiate $L$ w.r.t. $\hat{\alpha}$ and $\hat{\beta}$:

$\frac{\partial L}{\partial \hat{\alpha}} = -2 \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$    (1)

$\frac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$    (2)

From (1),

$\sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \iff \sum y_t - T\hat{\alpha} - \hat{\beta} \sum x_t = 0$

Deriving the OLS Estimator (Cont’d)

But $\sum y_t = T\bar{y}$ and $\sum x_t = T\bar{x}$.
So we can write $T\bar{y} - T\hat{\alpha} - T\hat{\beta}\bar{x} = 0$, or

$\bar{y} - \hat{\alpha} - \hat{\beta}\bar{x} = 0$    (3)

From (2),

$\sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$    (4)

From (3),

$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$    (5)

Deriving the OLS Estimator (Cont’d)

Substitute into (4) for $\hat{\alpha}$ from (5):

$\sum_t x_t (y_t - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta} x_t) = 0$

$\sum_t x_t y_t - \bar{y} \sum x_t + \hat{\beta}\bar{x} \sum x_t - \hat{\beta} \sum x_t^2 = 0$

$\sum_t x_t y_t - T\bar{x}\bar{y} + \hat{\beta} T \bar{x}^2 - \hat{\beta} \sum x_t^2 = 0$

Rearranging for $\hat{\beta}$,

$\hat{\beta} \left( T\bar{x}^2 - \sum x_t^2 \right) = T\bar{x}\bar{y} - \sum x_t y_t$

So overall we have

$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$

This method of finding the optimum is known as ordinary least squares.

What do We Use $\hat{\alpha}$ and $\hat{\beta}$ For?

In the CAPM example used above, plugging the 5 observations into the formulae given above would lead to the estimates $\hat{\alpha} = -1.74$ and $\hat{\beta} = 1.64$. We would write the fitted line as:

$\hat{y}_t = -1.74 + 1.64 x_t$

Question: If an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be?

Solution: We can say that the expected value of $y$ = ‘$-1.74 + 1.64 \times$ value of $x$’, so plug $x = 20$ into the equation to get the expected value for $y$:

$\hat{y}_t = -1.74 + 1.64 \times 20 = 31.06$

Accuracy of Intercept Estimate

Care needs to be exercised when considering the intercept estimate, particularly if there are no or few observations close to the $y$-axis:

[Figure: scatter of $y$ against $x$ with the fitted line extended back to $x = 0$.]

The Population and the Sample

The population is the total collection of all objects or people to be studied, for example:

Interested in                          Population of interest
predicting outcome of an election      the entire electorate

A sample is a selection of just some items from the population. A random sample is a sample in which each individual item in the population is equally likely to be drawn.
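
As a quick numerical check of the fund XXX example from earlier in the lecture, the closed-form OLS formulae for $\hat{\alpha}$ and $\hat{\beta}$ can be evaluated directly on the five observations. The sketch below is plain Python; the function name `ols_simple` and the variable names are illustrative choices and are not part of the lecture material.

```python
# Excess returns (in percent) from the five-year fund XXX example.
x = [13.7, 23.2, 6.9, 16.8, 12.3]    # excess return on the market index
y = [17.8, 39.0, 12.8, 24.2, 17.2]   # excess return on fund XXX


def ols_simple(x, y):
    """Return (alpha_hat, beta_hat) for the model y_t = alpha + beta * x_t + u_t.

    Uses the closed-form expressions
        beta_hat  = (sum(x*y) - T*x_bar*y_bar) / (sum(x^2) - T*x_bar^2)
        alpha_hat = y_bar - beta_hat * x_bar
    """
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    beta_hat = (sum_xy - T * x_bar * y_bar) / (sum_xx - T * x_bar ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_simple(x, y)

# Residuals from the fitted line; the first-order condition (1) says they sum to zero.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]

# Predicted excess return on the fund if the market excess return is 20.
forecast = alpha_hat + beta_hat * 20

print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
print(f"forecast at x = 20: {forecast:.2f}")
```

Because the code carries the unrounded coefficients through, the forecast at $x = 20$ comes out at about 31.1 rather than the 31.06 obtained from the rounded values $-1.74$ and $1.64$.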

The DGP and the PRF

The population regression function (PRF) is a description of the model that is thought to be generating the actual data (the data generating process, DGP) and the true relationship between the variables (i.e. the true values of $\alpha$ and $\beta$).

The PRF is $y_t = \alpha + \beta x_t + u_t$

The sample regression function (SRF) is $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, and we also know that $\hat{u}_t = y_t - \hat{y}_t$.

We use the SRF to infer likely values of the PRF. We also want to know how “good” our estimates of $\alpha$ and $\beta$ are.

Linearity

In order to use OLS, we need a model which is linear in the parameters ($\alpha$ and $\beta$). It does not necessarily have to be linear in the variables ($y$ and $x$). Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed etc. Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model

$Y_t = e^{\alpha} X_t^{\beta} e^{u_t} \iff \ln Y_t = \alpha + \beta \ln X_t + u_t$

Then let $y_t = \ln Y_t$ and $x_t = \ln X_t$:

$y_t = \alpha + \beta x_t + u_t$

Linear and Non-linear Models

This is known as the exponential regression model. Here, the slope coefficient $\beta$ can be interpreted as an elasticity. Similarly, if theory suggests that $y$ and $x$ should be inversely related:

$y_t = \alpha + \frac{\beta}{x_t} + u_t$

then the regression can be estimated using OLS by substituting $z_t = \frac{1}{x_t}$. But some models are intrinsically non-linear, e.g.

$y_t = \alpha + x_t^{\beta} + u_t$

Estimator or Estimate?

Estimators are the formulae used to calculate the coefficients. Estimates are the actual numerical values for the coefficients.

The Assumptions Underlying the Classical Linear Regression Model (CLRM)

The model which we have used is known as the classical linear regression model.
We observe data for $x_t$, but since $y_t$ also depends on $u_t$, we must be specific about how the $u_t$ are generated. We usually make the following set of assumptions about the $u_t$’s (the unobservable error terms):

Technical notation                                  Interpretation
(1) $E(u_t) = 0$                                    The errors have zero mean
(2) $\mathrm{var}(u_t) = \sigma^2$                  The variance of the errors is constant and finite over all values of $x_t$
(3) $\mathrm{cov}(u_i, u_j) = 0$ for $i \neq j$     The errors are linearly independent of one another
(4) $\mathrm{cov}(u_t, x_t) = 0$                    There is no relationship between the error and the corresponding $x$ variate

The Assumptions Underlying the Classical Linear Regression Model (CLRM) (Cont’d)

An alternative assumption to (4), which is slightly stronger, is that the $x_t$’s are non-stochastic or fixed in repeated samples. A fifth assumption is required if we want to make inferences about the population parameters (the actual $\alpha$ and $\beta$) from the sample parameters ($\hat{\alpha}$ and $\hat{\beta}$):

Additional assumption
(5) $u_t$ is normally distributed

Properties of the OLS Estimator

If assumptions (1) through (4) hold, then the estimators $\hat{\alpha}$ and $\hat{\beta}$ determined by OLS are known as Best Linear Unbiased Estimators (BLUE). What does the acronym stand for?
- ‘Estimator’: $\hat{\alpha}$ and $\hat{\beta}$ are estimators of the true values of $\alpha$ and $\beta$
- ‘Linear’: $\hat{\alpha}$ and $\hat{\beta}$ are linear estimators
- ‘Unbiased’: on average, the actual values of $\hat{\alpha}$ and $\hat{\beta}$ will be equal to their true values
- ‘Best’: the OLS estimator $\hat{\beta}$ has minimum variance among the class of linear unbiased estimators; the Gauss–Markov theorem proves that the OLS estimator is best.

Consistency/Unbiasedness/Efficiency

Consistent

The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are consistent. That is, the estimates will converge to their true values as the sample size increases to infinity. We need the assumptions $E(x_t u_t) = 0$ and $\mathrm{Var}(u_t) = \sigma^2 < \infty$ to prove this.
Consistency implies that

$\lim_{T \to \infty} \Pr\left[ |\hat{\beta} - \beta| > \delta \right] = 0 \quad \forall\, \delta > 0$

Unbiased

The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are unbiased. That is, $E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$. Thus on average the estimated values will be equal to the true values. To prove this also requires the assumption that $E(u_t) = 0$. Unbiasedness is a stronger condition than consistency.

Consistency/Unbiasedness/Efficiency (Cont’d)

Efficiency

An estimator $\hat{\beta}$ of parameter $\beta$ is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of $\beta$.

Precision and Standard Errors

Any set of regression estimates $\hat{\alpha}$ and $\hat{\beta}$ is specific to the sample used in their estimation. Recall that the estimators of $\alpha$ and $\beta$ ($\hat{\alpha}$ and $\hat{\beta}$) are given by

$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$

Precision and Standard Errors (Cont’d)

What we need is some measure of the reliability or precision of the estimators ($\hat{\alpha}$ and $\hat{\beta}$). The precision of the estimate is given by its standard error. Given assumptions (1) to (4) above, the standard errors can be shown to be given by

$SE(\hat{\alpha}) = s \sqrt{\frac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}} = s \sqrt{\frac{\sum x_t^2}{T \left( \sum x_t^2 - T\bar{x}^2 \right)}}$

$SE(\hat{\beta}) = s \sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} = s \sqrt{\frac{1}{\sum x_t^2 - T\bar{x}^2}}$

where $s$ is the estimated standard deviation of the residuals.

Estimating the Variance of the Disturbance Term

The variance of the random variable $u_t$ is given by

$\mathrm{Var}(u_t) = E\left[ (u_t - E(u_t))^2 \right]$

which, since $E(u_t) = 0$ by assumption (1), reduces to

$\mathrm{Var}(u_t) = E(u_t^2)$

We could estimate this using the average of $u_t^2$:

$s^2 = \frac{1}{T} \sum u_t^2$

Unfortunately this is not workable since $u_t$ is not observable.
We can use the sample counterpart to $u_t$, which is $\hat{u}_t$:

$s^2 = \frac{1}{T} \sum \hat{u}_t^2$

But this estimator is a biased estimator of $\sigma^2$.

Estimating the Variance of the Disturbance Term (Cont’d)

An unbiased estimator of $\sigma^2$ is given by $s^2 = \frac{\sum \hat{u}_t^2}{T - 2}$, so the standard error of the regression is

$s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}}$

where $\sum \hat{u}_t^2$ is the residual sum of squares and $T$ is the sample size.

Some Comments on the Standard Error Estimators

1. Both $SE(\hat{\alpha})$ and $SE(\hat{\beta})$ depend on $s^2$ (or $s$). The greater the variance $s^2$, the more dispersed the errors are about their mean value and therefore the more dispersed $y$ will be about its mean value.
2. The sum of the squares of $x$ about their mean appears in both formulae. The larger the sum of squares, the smaller the coefficient variances. Consider what happens if $\sum (x_t - \bar{x})^2$ is small or large:
   [Figure: two scatter plots of $y$ against $x$, each marking $\bar{x}$ and $\bar{y}$.]
3. The larger the sample size, $T$, the smaller will be the coefficient variances. $T$ appears explicitly in $SE(\hat{\alpha})$ and implicitly in $SE(\hat{\beta})$. $T$ appears implicitly since the sum $\sum (x_t - \bar{x})^2$ is from $t = 1$ to $T$.
4. The term $\sum x_t^2$ appears in $SE(\hat{\alpha})$. The reason is that $\sum x_t^2$ measures how far the points are away from the $y$-axis.

Example: How to Calculate the Parameters and Standard Errors

Assume we have the following data calculated from a regression of $y$ on a single variable $x$ and a constant over 22 observations.
Data: $\sum x_t y_t = 830102$, $T = 22$, $\bar{x} = 416.5$, $\bar{y} = 86.65$, $\sum x_t^2 = 3919654$, $RSS = 130.6$

Calculations:

$\hat{\beta} = \frac{830102 - (22 \times 416.5 \times 86.65)}{3919654 - 22 \times (416.5)^2} = 0.35$

$\hat{\alpha} = 86.65 - 0.35 \times 416.5 = -59.12$

We write $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$:

$\hat{y}_t = -59.12 + 0.35 x_t$

Example: How to Calculate the Parameters and Standard Errors (Cont’d)

$SE(\text{regression})$, $s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}} = \sqrt{\frac{130.6}{20}} = 2.55$

$SE(\hat{\alpha}) = 2.55 \times \sqrt{\frac{3919654}{22 \times (3919654 - 22 \times 416.5^2)}} = 3.35$

$SE(\hat{\beta}) = 2.55 \times \sqrt{\frac{1}{3919654 - 22 \times 416.5^2}} = 0.0079$

We now write the results with the standard errors in parentheses beneath the coefficients:

$\hat{y}_t = -59.12 + 0.35 x_t$
            (3.35)   (0.0079)

Essential Reading

Please read the textbook chapter: Chris Brooks, Introductory Econometrics for Finance, 4th Edition (2019), Cambridge University Press, Chapter 3 (and 5).
Or read: Jeffrey Wooldridge, Introductory Econometrics, 7th Edition (2019), Cengage, Chapter 2 (and 3).
There is an R script with seminar exercises on Brightspace for the upcoming week. Please practise and practise again before our next session. Econometrics is often learnt by doing!
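
The worked example on calculating the parameters and standard errors can also be reproduced from the summary statistics alone. The following is a minimal Python sketch under that assumption; the variable names (`sum_xy`, `sxx`, `se_alpha` and so on) are illustrative and do not come from the lecture or the seminar R script.

```python
import math

# Summary statistics given in the worked example (22 observations).
T = 22
sum_xy = 830102.0     # sum of x_t * y_t
x_bar = 416.5         # sample mean of x
y_bar = 86.65         # sample mean of y
sum_xx = 3919654.0    # sum of x_t squared
rss = 130.6           # residual sum of squares

# Sum of squared deviations of x about its mean: sum(x^2) - T * x_bar^2.
sxx = sum_xx - T * x_bar ** 2

# OLS coefficient estimates.
beta_hat = (sum_xy - T * x_bar * y_bar) / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Standard error of the regression, with T - 2 degrees of freedom.
s = math.sqrt(rss / (T - 2))

# Coefficient standard errors.
se_alpha = s * math.sqrt(sum_xx / (T * sxx))
se_beta = s * math.sqrt(1.0 / sxx)

print(f"beta_hat  = {beta_hat:.4f}  (SE {se_beta:.4f})")
print(f"alpha_hat = {alpha_hat:.4f}  (SE {se_alpha:.4f})")
print(f"s         = {s:.4f}")
```

The script keeps full precision at every step, whereas the hand calculation rounds $\hat{\beta}$ to 0.35 and $s$ to 2.55 before using them, so the printed intercept and standard errors can differ from the slide values in the last reported digit.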
