Chapter 3 - Basic Ideas of Linear Regression
Universiti Malaya
Saizal Pinjaman
Summary
This document presents an introduction to linear regression, a statistical method used in econometrics to model the relationship between a dependent variable and one or more explanatory variables. The document covers basic concepts such as the meaning of regression, objectives of regression, a hypothetical example, stochastic specification, sample regression function, and estimation of parameters. It further explains the importance of considering errors and the use of least squares principles.
Full Transcript
Chapter 3: Basic ideas of linear regression
BT22203: Econometrics, Lecturer: Saizal Pinjaman

The meaning of regression
Regression is concerned with the study of the relationship between one variable, called the explained or dependent variable, and one or more other variables, called independent or explanatory variables.
Although regression analysis deals with the relationship between a dependent variable and one or more independent variables, it does not necessarily imply causation.
Causality must be justified, or inferred, from the theory that underlies the phenomenon that is tested empirically.

The objectives of regression
To estimate the mean value of the dependent variable, given the values of the independent variables.
To test hypotheses about the nature of the dependence.
To predict or forecast the mean value of the dependent variable, given the value(s) of the independent variable(s) beyond the sample range.
One or more of the preceding objectives combined.

A hypothetical example
Suppose we are interested in finding out whether a student’s family income is related to how well the student scores on the mathematics section of the S.A.T. Let Y represent the math S.A.T. score and X represent annual family income.

In Figure 2-1, the connected circles are formally called the conditional mean (or conditional expected) values. They are the mean values of Y at each income level.
The line connecting the conditional means is called the population regression line (PRL).

Since the PRL in Figure 2-1 is approximately linear, we can express it mathematically in the following functional form:
$$E(Y \mid X_i) = B_1 + B_2 X_i \tag{2.1}$$
$E(Y \mid X_i)$ is the mean value of Y corresponding to, or conditional upon, a given value of X. The subscript i refers to the ith subpopulation.
The regression of Y on X can be defined simply as the mean of the distribution of Y values at a given X.
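As a minimal sketch of the conditional-mean idea, the Python snippet below groups a small set of (income, score) observations by income level and averages the scores within each group. The numbers are invented for illustration and are not the values in Table 2-1, and `conditional_means` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (annual income, math score) pairs; NOT the data in Table 2-1.
observations = [
    (5000, 400), (5000, 420), (5000, 440),
    (15000, 430), (15000, 450), (15000, 470),
    (25000, 460), (25000, 480), (25000, 500),
]

def conditional_means(pairs):
    """Return {x: mean of Y over observations with that x}, i.e. E(Y | X = x)."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}

means = conditional_means(observations)
for x in sorted(means):
    print(x, means[x])
```

In this toy population the conditional means rise by the same amount for each equal step in income, so the line joining them is exactly straight; with real data it is usually only approximately so.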
$B_1$ and $B_2$ are called the parameters or regression coefficients. $B_1$ is also known as the intercept and $B_2$ as the slope.
The slope coefficient measures the rate of change in the mean value of Y per unit change in X.
From here on, expressions like $E(Y \mid X_i)$ will be written simply as $E(Y)$.

Statistical or stochastic specification of the population regression function (PRF)
Looking at Table 2-1, we know that corresponding to X = $75,000, the average Y is 528 points.
But if we pick one student at random from the 10 students corresponding to this income, we know that the math S.A.T. score for that student will not necessarily be equal to the mean value of 528.
How do you explain the score of an individual student in relation to income?
Any individual’s math S.A.T. score is equal to the average for that group plus or minus some quantity.

Mathematical expression:
$$Y_i = B_1 + B_2 X_i + u_i \tag{2.2}$$
$u_i$ is known as the stochastic or error term. Eq. (2.2) is called the stochastic (or statistical) PRF.
Systematic or deterministic component: $B_1 + B_2 X_i$
Nonsystematic, random, or noise component: $u_i$

The nature of the stochastic error term
The error term represents the influence of those variables that are not explicitly included in the model.
It reflects inherent randomness in human behavior.
It represents errors of measurement.
It reflects the principle of Ockham’s razor: the regression model should be kept as simple as possible until proved inadequate.

The sample regression function
Can we estimate the PRF from the sample data?
We rarely have the entire population at our disposal.
We may not be able to estimate the PRF accurately because of sampling fluctuations, or sampling error.

Which of the two sample regression lines (SRLs) represents the true PRL?
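One way to get a feel for this question is to simulate it. The sketch below assumes a hypothetical PRF with $B_1 = 400$ and $B_2 = 0.002$, ten fixed income levels, and normally distributed errors with standard deviation 20. All of these values are invented for illustration. It draws two samples and fits a line to each by least squares, the method introduced later in this chapter; `draw_sample` and `fit_line` are hypothetical helper names.

```python
import random

B1, B2 = 400.0, 0.002                      # hypothetical population parameters
X_LEVELS = list(range(5000, 100000, 10000))  # ten hypothetical income levels

def draw_sample(rng):
    """One observation per income level from Y = B1 + B2*X + u, u ~ N(0, 20^2)."""
    ys = [B1 + B2 * x + rng.gauss(0, 20) for x in X_LEVELS]
    return X_LEVELS, ys

def fit_line(xs, ys):
    """Least-squares intercept and slope (b1, b2) for one sample."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

rng = random.Random(0)                     # fixed seed so the run is repeatable
srl_1 = fit_line(*draw_sample(rng))
srl_2 = fit_line(*draw_sample(rng))
print("SRL 1 (b1, b2):", srl_1)
print("SRL 2 (b1, b2):", srl_2)
```

The two fitted lines differ from each other and from the assumed PRF, even though both samples come from the same population.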
There is no way we can be sure that either of the SRLs shown in Figure 2-3 represents the true PRL. We would get K different SRLs for K different samples, and these SRLs are not likely to be the same.

We can develop the concept of the sample regression function (SRF) to represent the SRL. The sample counterpart of Eq. (2.1) may be written as:
$$\hat{Y}_i = b_1 + b_2 X_i \tag{2.3}$$
The stochastic version of Eq. (2.3):
$$Y_i = b_1 + b_2 X_i + e_i$$
$e_i$ is the estimator of $u_i$. It is called the residual term, or simply the residual, and it represents the difference between the actual Y values and their estimated values from the sample regression.

Keep in mind that we do not actually observe $B_1$, $B_2$, and $u$. What we observe are their proxies, namely $b_1$, $b_2$, and $e$, once we have a specific sample.

Granted that the SRF is only an approximation of the PRF, can we find a method or a procedure that will make this approximation as close as possible? It is fascinating to consider that this can be done even though we never actually determine the PRF itself.

The special meaning of the term ‘linear’ regression
Linearity in the variables
This is the more “natural” meaning of linearity: the conditional mean value of the dependent variable is a linear function of the independent variable(s).
A function in which X enters with a power other than 1, for example $E(Y) = B_1 + B_2 X_i^2$, is not linear in the variables.

Linearity in the parameters
The conditional mean of the dependent variable is a linear function of the parameters, meaning that each B appears with a power of 1 only. Such a model may or may not be linear in the variables.
A model such as $E(Y) = B_1 + B_2^2 X_i$ is nonlinear in the parameters.

Simple versus multiple regression
A multiple regression model is one in which the dependent variable is a function of more than one explanatory variable.
With k − 1 explanatory variables, the stochastic specification of the model can be expressed as
$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + \cdots + B_k X_{ki} + u_i$$

2.8 Estimation of parameters: The method of ordinary least squares
How then do we estimate the PRF? The method used most frequently is that of least squares (LS), more popularly known as the method of ordinary least squares (OLS).

The least squares principle
Since the PRF is not directly observable, we estimate it from the SRF:
$$Y_i = b_1 + b_2 X_i + e_i$$
which we can write as:
$$e_i = Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_i$$
The best way to estimate the PRF is to choose $b_1$ and $b_2$, the estimators of $B_1$ and $B_2$, in such a way that the residuals $e_i$, as measured by the residual sum of squares (RSS), are as small as possible. The least squares principle states:
$$\text{Minimize } \sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2$$

How do we actually determine these values? We obtain the following solutions for $b_1$ and $b_2$:
$$b_1 = \bar{Y} - b_2 \bar{X}$$
$$b_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
The sample intercept is thus the sample mean value of Y minus the estimated slope times the sample mean value of X. The small letters denote deviations from the sample mean values: $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.

Interesting features of OLS
The SRF obtained by the method of OLS passes through the sample mean values of X and Y: $\bar{Y} = b_1 + b_2 \bar{X}$.
The mean value of the residuals is always zero: $\bar{e} = \sum e_i / n = 0$.
The sum of the product of the residuals $e_i$ and the values of the explanatory variable $X_i$ is zero: $\sum e_i X_i = 0$.
The sum of the product of the residuals $e_i$ and the estimated values $\hat{Y}_i$, that is, $\sum e_i \hat{Y}_i$, is zero.

Putting it all together
The computations shown in Table 2-4 yield the sample math S.A.T. score regression. In that regression Y carries a cap (written $\hat{Y}$) to remind us that it is an estimator of the true population mean corresponding to the given level of X.
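The OLS solutions and the features listed above can be checked numerically. The following is a minimal sketch in plain Python using the deviation-form formulas; the five (X, Y) pairs are invented for illustration and are not the data in Table 2-4.

```python
# Hypothetical sample (annual income X, math score Y); NOT the data in Table 2-4.
X = [5000, 15000, 25000, 35000, 45000]
Y = [410, 420, 440, 490, 500]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the sample means (the "small letters" x_i and y_i).
x = [xi - x_bar for xi in X]
y = [yi - y_bar for yi in Y]

# OLS solutions: b2 = sum(x_i * y_i) / sum(x_i^2), b1 = Y_bar - b2 * X_bar.
b2 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
b1 = y_bar - b2 * x_bar

# Fitted values and residuals from the sample regression function.
Y_hat = [b1 + b2 * xi for xi in X]
e = [yi - yh for yi, yh in zip(Y, Y_hat)]

print("b1 =", b1, " b2 =", b2)
print("sum of residuals      =", sum(e))
print("sum of e_i * X_i      =", sum(ei * xi for ei, xi in zip(e, X)))
print("sum of e_i * Y_hat_i  =", sum(ei * yh for ei, yh in zip(e, Y_hat)))
print("RSS                   =", sum(ei ** 2 for ei in e))
```

For this toy sample the slope works out to 0.0025 and the intercept to 389.5, and the three sums are zero up to floating-point rounding. Any other choice of intercept and slope would give a larger RSS on the same data.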