ARBAMINCH UNIVERSITY COLLEGE OF BUSINESS AND ECONOMICS DEPARTMENT OF ECONOMICS

Econometrics I (Econ 3061) MODULE, AMU 2023

Introduction to the module

The principal objective of the course, "Econometrics I", is to provide an elementary but comprehensive introduction to the art and science of econometrics. It enables students to see how economic theory and statistical and mathematical methods are combined in the analysis of economic data, with the purpose of giving empirical content to economic theories and verifying or refuting them. This course includes four chapters. The first chapter introduces students to the definition and some fundamental concepts of econometrics. In chapter two a fairly detailed treatment of the simple classical linear regression model will be made. In this chapter students will be introduced to the basic logic, concepts, assumptions, estimation methods, and interpretations of the simple classical linear regression model and its applications in economic science. Chapter three, which deals with Multiple Regression Models, is basically an extension of the simple regression model: in chapter three the linear regression model is expanded by incorporating more than one explanatory variable (regressor) into the model. Chapter four pays due attention to violations of the CLRM assumptions, their consequences, and the remedial measures. Specifically, Autocorrelation, Heteroscedasticity, and Multicollinearity problems will be given much focus.

Contents of the Module in Brief

Chapter 1. Introduction
Definition and scope of econometrics
Economic models vs. econometric models
Methodology of econometrics
Desirable properties of an econometric model
Goals of econometrics

Chapter 2.
The Classical Regression Analysis: The Simple Linear Regression Models
Stochastic and non-stochastic relationships
The Simple Regression model
The basic Assumptions of the Classical Regression Model
OLS Method of Estimation
Properties of OLS Estimators
Inferences/Predictions

Chapter 3. The Classical Regression Analysis: The Multiple Linear Regression Models
Assumptions
Ordinary Least Squares (OLS) estimation
Matrix Approach to the Multiple Regression Model
Properties of the OLS estimators
Inferences/Predictions

Chapter 4. Violations of the assumptions of Classical Linear Regression models
4.1 Heteroscedasticity
4.2 Autocorrelation
4.3 Multicollinearity

Chapter One: Introduction

1.1 Definition and scope of econometrics

The economic theories we learn in various economics courses suggest many relationships among economic variables. For instance, in microeconomics we learn demand and supply models in which the quantities demanded and supplied of a good depend on its price. In macroeconomics, we study the 'investment function' to explain how the amount of aggregate investment in the economy changes as the rate of interest changes, and the 'consumption function' that relates aggregate consumption to the level of aggregate disposable income. Each of such specifications involves a relationship among economic variables. As economists, we may be interested in questions such as: If one variable changes by a certain magnitude, by how much will another variable change? Also, given that we know the value of one variable, can we forecast or predict the corresponding value of another? The purpose of studying the relationships among economic variables, and of attempting to answer questions of the type raised here, is to help us understand the real economic world we live in. However, economic theories that postulate relationships between economic variables have to be checked against data obtained from the real world.
If empirical data verify the relationship proposed by economic theory, we accept the theory as valid. If the theory is incompatible with the observed behavior, we either reject the theory or, in the light of the empirical evidence, modify it. To provide a better understanding of economic relationships and better guidance for economic policy making, we also need to know the quantitative relationships between the different economic variables. We obtain these quantitative measurements from the real world. The field of knowledge which helps us to carry out such an evaluation of economic theories in empirical terms is econometrics.

Distance students! Having said this as background in our attempt to define 'ECONOMETRICS', we may now formally define what econometrics is.

WHAT IS ECONOMETRICS?

Literally interpreted, econometrics means "economic measurement", but the scope of econometrics is much broader, as described by leading econometricians. Various econometricians have used different wordings to define econometrics. But if we distill the fundamental features/concepts of all the definitions, we may obtain the following definition.

"Econometrics is the science which integrates economic theory, economic statistics, and mathematical economics to investigate the empirical support of the general schematic law established by economic theory. It is a special type of economic analysis and research in which the general economic theories, formulated in mathematical terms, are combined with empirical measurements of economic phenomena. Starting from the relationships of economic theory, we express them in mathematical terms so that they can be measured. We then use specific methods, called econometric methods, in order to obtain numerical estimates of the coefficients of the economic relationships."

Measurement is an important aspect of econometrics. However, the scope of econometrics is much broader than measurement.
As D. Intriligator rightly stated, the "metric" part of the word econometrics signifies 'measurement', and hence econometrics is basically concerned with the measuring of economic relationships. In short, econometrics may be considered as the integration of economics, mathematics, and statistics for the purpose of providing numerical values for the parameters of economic relationships and verifying economic theories.

1.2 Econometrics vs. mathematical economics

Mathematical economics states economic theory in terms of mathematical symbols. There is no essential difference between mathematical economics and economic theory: both state the same relationships, but while economic theory uses verbal exposition, mathematical economics uses mathematical symbols. Both express economic relationships in an exact or deterministic form. Neither mathematical economics nor economic theory allows for random elements which might affect the relationship and make it stochastic. Furthermore, they do not provide numerical values for the coefficients of economic relationships. Econometrics differs from mathematical economics in that, although econometrics presupposes that the economic relationships be expressed in mathematical form, it does not assume exact or deterministic relationships. Econometrics assumes random relationships among economic variables. Econometric methods are designed to take into account random disturbances which create deviations from the exact behavioral patterns suggested by economic theory and mathematical economics. Furthermore, econometric methods provide numerical values for the coefficients of economic relationships.

1.3 Econometrics vs. statistics

Econometrics differs from both mathematical statistics and economic statistics. An economic statistician gathers empirical data, records them, tabulates or charts them, and attempts to describe the pattern in their development over time and perhaps detect some relationship between various economic magnitudes.
Economic statistics is mainly a descriptive aspect of economics. It does not provide explanations of the development of the various variables and it does not provide measurements of the coefficients of economic relationships. Mathematical (or inferential) statistics deals with methods of measurement which are developed on the basis of controlled experiments. But such statistical methods of measurement are not appropriate for a number of economic relationships, because for most economic relationships controlled or carefully planned experiments cannot be designed, due to the fact that the nature of the relationships among economic variables is stochastic or random. Yet the fundamental ideas of inferential statistics are applicable in econometrics, but they must be adapted to the problems of economic life. Econometric methods are adjusted so that they may become appropriate for the measurement of economic relationships which are stochastic. The adjustment consists primarily in specifying the stochastic (random) elements that are supposed to operate in the real world and enter into the determination of the observed data.

1.4 Economic models vs. econometric models

i) Economic models: Any economic theory is an abstraction from the real world. For one reason, the immense complexity of the real world economy makes it impossible for us to understand all interrelationships at once. Another reason is that not all the interrelationships are equally important for the understanding of the economic phenomenon under study. The sensible procedure is therefore to pick out the important factors and relationships relevant to our problem and to focus our attention on these alone. Such a deliberately simplified analytical framework is called an economic model. It is an organized set of relationships that describes the functioning of an economic entity under a set of simplifying assumptions. All economic reasoning is ultimately based on models.
Economic models consist of the following three basic structural elements:
1. a set of variables;
2. a list of fundamental relationships; and
3. a number of strategic coefficients.

ii) Econometric models: The most important characteristic of economic relationships is that they contain a random element, which is ignored by mathematical economic models that postulate exact relationships between economic variables.

Example: Economic theory postulates that the demand for a commodity depends on its price, on the prices of other related commodities, on consumers' income and on tastes. This is an exact relationship which can be written mathematically as:

Q = b0 + b1P + b2P0 + b3Y + b4t

The above demand equation is exact. However, many more factors may affect demand. In econometrics the influence of these 'other' factors is taken into account by introducing a random variable into the economic relationship. In our example, the demand function studied with the tools of econometrics would be of the stochastic form:

Q = b0 + b1P + b2P0 + b3Y + b4t + u

where u stands for the random factors which affect the quantity demanded.

1.5 Methodology of econometrics

Econometric research is concerned with the measurement of the parameters of economic relationships and with the prediction of the values of economic variables. The relationships of economic theory which can be measured with econometric techniques are relationships in which some variables are postulated as causes of the variation of other variables. Starting with the postulated theoretical relationships among economic variables, econometric research or inquiry generally proceeds along the following lines/stages:
1. Specification of the model
2. Estimation of the model
3. Evaluation of the estimates
4. Evaluation of the forecasting power of the estimated model

1. Specification of the model

In this step the econometrician has to express the relationships between economic variables in mathematical form.
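As a rough numerical sketch of the difference between the exact and the stochastic demand specifications above, the two can be simulated as follows. All data and coefficient values here are hypothetical, chosen only for illustration, and the taste variable t is treated as a simple index:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: own price P, price of a related good P0,
# consumer income Y, and a taste/trend index t
P = rng.uniform(1.0, 10.0, n)
P0 = rng.uniform(1.0, 10.0, n)
Y = rng.uniform(20.0, 100.0, n)
t = np.arange(n, dtype=float)

# Hypothetical coefficient values
b0, b1, b2, b3, b4 = 50.0, -2.0, 0.8, 0.3, 0.01

# Exact (mathematical) model: no random element
Q_exact = b0 + b1 * P + b2 * P0 + b3 * Y + b4 * t

# Econometric model: adds the disturbance u standing for omitted factors
u = rng.normal(0.0, 2.0, n)
Q = b0 + b1 * P + b2 * P0 + b3 * Y + b4 * t + u

# The gap between the two specifications is exactly the random term u
print(bool(np.allclose(Q - Q_exact, u)))  # True
```

The point of the sketch is only that the econometric specification differs from the mathematical one by the disturbance term; everything else in the two equations is identical.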
This step involves the determination of three important tasks:
i) the dependent and independent (explanatory) variables which will be included in the model;
ii) the a priori theoretical expectations about the size and sign of the parameters of the function;
iii) the mathematical form of the model (number of equations, specific form of the equations, etc.).

Note: The specification of the econometric model will be based on economic theory and on any available information related to the phenomena under investigation. Thus, specification of the econometric model presupposes knowledge of economic theory and familiarity with the particular phenomenon being studied. Specification of the model is the most important and the most difficult stage of any econometric research. It is often the weakest point of most econometric applications. In this stage there exists an enormous likelihood of committing errors or incorrectly specifying the model. Some of the common reasons for incorrect specification of econometric models are:
1. the imperfections and looseness of statements in economic theories;
2. the limitation of our knowledge of the factors which are operative in any particular case;
3. the formidable obstacles presented by data requirements in the estimation of large models.

The most common errors of specification are:
a. omission of some important variables from the function;
b. omission of some equations (for example, in a simultaneous equations model);
c. a mistaken mathematical form of the functions.

2. Estimation of the model

This is purely a technical stage which requires knowledge of the various econometric methods, their assumptions, and the economic implications for the estimates of the parameters. This stage includes the following activities:
a. Gathering of the data on the variables included in the model.
b. Examination of the identification conditions of the function (especially for simultaneous equations models).
c.
Examination of the aggregation problems involved in the variables of the function.
d. Examination of the degree of correlation between the explanatory variables (i.e. examination of the problem of multicollinearity).
e. Choice of the appropriate econometric technique for estimation, i.e. deciding on a specific econometric method to be applied in estimation, such as OLS, MLM, Logit, or Probit.

3. Evaluation of the estimates

This stage consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically satisfactory. It enables the econometrician to evaluate the results of the calculations and determine the reliability of the results. For this purpose we use various criteria which may be classified into three groups:
i. Economic a priori criteria: These criteria are determined by economic theory and refer to the size and sign of the parameters of economic relationships.
ii. Statistical criteria (first-order tests): These are determined by statistical theory and aim at the evaluation of the statistical reliability of the estimates of the parameters of the model. The correlation coefficient test, standard error test, t-test, F-test, and R² test are some of the most commonly used statistical tests.
iii. Econometric criteria (second-order tests): These are set by the theory of econometrics and aim at investigating whether the assumptions of the econometric method employed are satisfied or not in any particular case. The econometric criteria serve as second-order tests (as tests of the statistical tests), i.e. they determine the reliability of the statistical criteria; they help us establish whether the estimates have the desirable properties of unbiasedness, consistency, etc. Econometric criteria aim at the detection of the violation or validity of the assumptions of the various econometric techniques.

4. Evaluation of the forecasting power of the model

Forecasting is one of the aims of econometric research.
However, before using an estimated model for forecasting, we must in some way assess the predictive power of the model. It is possible that the model may be economically meaningful and statistically and econometrically correct for the sample period for which it has been estimated, yet not be suitable for forecasting, for various reasons. Therefore, this stage involves the investigation of the stability of the estimates and their sensitivity to changes in the size of the sample. Consequently, we must establish whether the estimated function performs adequately outside the sample of data, i.e. we must test the extra-sample performance of the model.

1.6 Desirable properties of an econometric model

An econometric model is a model whose parameters have been estimated with some appropriate econometric technique. The 'goodness' of an econometric model is judged customarily according to the following desirable properties.
1. Theoretical plausibility. The model should be compatible with the postulates of economic theory. It must describe adequately the economic phenomena to which it relates.
2. Explanatory ability. The model should be able to explain the observations of the actual world. It must be consistent with the observed behaviour of the economic variables whose relationship it determines.
3. Accuracy of the estimates of the parameters. The estimates of the coefficients should be accurate in the sense that they should approximate as closely as possible the true parameters of the structural model. The estimates should, if possible, possess the desirable properties of unbiasedness, consistency and efficiency.
4. Forecasting ability. The model should produce satisfactory predictions of future values of the dependent (endogenous) variables.
5. Simplicity. The model should represent the economic relationships with maximum simplicity.
The fewer the equations and the simpler their mathematical form, the better the model is considered, ceteris paribus (that is to say, provided that the other desirable properties are not affected by the simplifications of the model).

1.7 Goals of Econometrics

Three main goals of econometrics are identified:
i) Analysis, i.e. testing economic theory;
ii) Policy making, i.e. obtaining numerical estimates of the coefficients of economic relationships for policy simulations;
iii) Forecasting, i.e. using the numerical estimates of the coefficients in order to forecast the future values of economic magnitudes.

Review questions
How would you define econometrics? How does it differ from mathematical economics and statistics?
Describe the main steps involved in any econometric research.
Differentiate between economic and econometric models.
What are the goals of econometrics?

Chapter Two: THE CLASSICAL REGRESSION ANALYSIS [The Simple Linear Regression Model]

Economic theories are mainly concerned with the relationships among various economic variables. These relationships, when phrased in mathematical terms, can predict the effect of one variable on another. The functional relationships of these variables define the dependence of one variable upon the other variable(s) in a specific form. The specific functional forms may be linear, quadratic, logarithmic, exponential, hyperbolic, or any other form. In this chapter we shall consider a simple linear regression model, i.e. a relationship between two variables related in a linear form. We shall first discuss two important forms of relation, stochastic and non-stochastic, of which we shall be using the former in econometric analysis.

2.1 Stochastic and Non-stochastic Relationships

A relationship between X and Y, characterized as Y = f(X), is said to be deterministic or non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y).
On the other hand, a relationship between X and Y is said to be stochastic if for a particular value of X there is a whole probability distribution of values of Y. In such a case, for any given value of X, the dependent variable Y assumes some specific value only with some probability. Let's illustrate the distinction between stochastic and non-stochastic relationships with the help of a supply function. Assuming that the supply of a certain commodity depends on its price (other determinants taken to be constant) and the function is linear, the relationship can be put as:

Q = f(P) = α + βP ....................(2.1)

The above relationship between P and Q is such that for a particular value of P, there is only one corresponding value of Q. This is, therefore, a deterministic (non-stochastic) relationship, since for each price there is always only one corresponding quantity supplied. This implies that all the variation in Y is due solely to changes in X, and that there are no other factors affecting the dependent variable. If this were true, all the points of price-quantity pairs, if plotted on a two-dimensional plane, would fall on a straight line. However, if we gather observations on the quantity actually supplied in the market at various prices and plot them on a diagram, we see that they do not fall on a straight line. The deviation of the observations from the line may be attributed to several factors:
a. omission of variables from the function
b. random behavior of human beings
c. imperfect specification of the mathematical form of the model
d. errors of aggregation
e. errors of measurement

In order to take into account the above sources of error, we introduce into econometric functions a random variable which is usually denoted by the letter 'u' or 'ε' and is called the error term or random disturbance or stochastic term of the function, so called because u is supposed to 'disturb' the exact linear relationship which is assumed to exist between X and Y.
By introducing this random variable into the function, the model is rendered stochastic, of the form:

Yi = α + βXi + ui ....................(2.2)

Thus a stochastic model is a model in which the dependent variable is determined not only by the explanatory variable(s) included in the model but also by others which are not included in it.

2.2 The Simple Linear Regression Model

The above stochastic relationship (2.2) with one explanatory variable is called the simple linear regression model. The true relationship which connects the variables involved is split into two parts: a part represented by a line and a part represented by the random term 'u'. The scatter of observations represents the true relationship between Y and X. The line represents the exact part of the relationship, and the deviation of each observation from the line represents the random component of the relationship. Were it not for the errors in the model, we would observe all the points Y'1, Y'2, ..., Y'n on the line, corresponding to X1, X2, ..., Xn. However, because of the random disturbance, we observe Y1, Y2, ..., Yn corresponding to X1, X2, ..., Xn. These points diverge from the regression line by u1, u2, ..., un. Thus

Yi = (α + βXi) + ui

where Yi is the dependent variable, (α + βXi) is the regression line, and ui is the random variable. The first component, (α + βXi), is the part of Y explained by the changes in X, and the second, ui, is the part of Y not explained by X, that is to say, the change in Y due to the random influence of ui.

2.2.1 Assumptions of the Classical Linear Stochastic Regression Model

The classicals made important assumptions in their analysis of regression. The most important of these assumptions are discussed below.

1. The model is linear in parameters.
The classicals assumed that the model should be linear in the parameters, regardless of whether the explanatory and dependent variables are linear or not. This is because if the parameters are non-linear they are difficult to estimate, since their values are not known and you are only given the data on the dependent and independent variables.

Example:
1. Y = α + βX + u is linear in both the parameters and the variables, so it satisfies the assumption.
2. ln Y = α + β ln X + u is linear only in the parameters. Since the classicals worry only about the parameters, the model satisfies the assumption.

Dear distance students! Check yourself whether the following models satisfy the above assumption and give your answer to your tutor.
a. ln Y² = α + β ln X² + Ui
b. Yi = α + βXi + Ui

2. Ui is a random real variable and its mean value in any particular period is zero.
This means that the value which u may assume in any one period depends on chance; it may be positive, negative or zero, and every value has a certain probability of being assumed by u in any particular instance. It also means that for each value of X the random variable u may assume various values, some greater than zero and some smaller than zero, but if we considered all the possible positive and negative values of u for any given value of X, they would have an average value equal to zero. In other words, the positive and negative values of u cancel each other out. Mathematically,

E(Ui) = 0 ....................(2.3)

3. The variance of the random variable (U) is constant in each period (the assumption of homoscedasticity).
For all values of X, the u's will show the same dispersion around their mean. In Fig. 2.c this assumption is denoted by the fact that the values that u can assume lie within the same limits, irrespective of the value of X: for X1, u can assume any value within the range AB; for X2, u can assume any value within the range CD, which is equal to AB; and so on.
Graphically, this is shown in Fig. 2.c. Mathematically:

Var(Ui) = E[Ui − E(Ui)]² = E(Ui²) = σ²   (since E(Ui) = 0)

This constant variance assumption is called the homoscedasticity assumption, and the constant variance itself is called homoscedastic variance.

4. The random variable (U) has a normal distribution.
This means the values of u (for each x) have a bell-shaped, symmetrical distribution about their zero mean and constant variance σ², i.e.

Ui ~ N(0, σ²) ....................(2.4)

5. The random terms of different observations, Ui and Uj, are independent (the assumption of no autocorrelation).
This means the value which the random term assumed in one period does not depend on the value which it assumed in any other period. Algebraically,

Cov(ui, uj) = E{[ui − E(ui)][uj − E(uj)]} = E(ui uj) = 0 ....................(2.5)

6. The Xi are a set of fixed values in the hypothetical process of repeated sampling which underlies the linear regression model.
This means that, in taking a large number of samples on Y and X, the Xi values are the same in all samples, but the ui values do differ from sample to sample, and so of course do the values of Yi.

7. The random variable (U) is independent of the explanatory variables.
This means there is no correlation between the random variable and the explanatory variable. If two variables are unrelated, their covariance is zero. Hence

Cov(Xi, Ui) = 0 ....................(2.6)

Proof:
Cov(Xi, Ui) = E{[Xi − E(Xi)][Ui − E(Ui)]}
= E[(Xi − E(Xi))Ui]   (given E(Ui) = 0)
= E(XiUi) − E(Xi)E(Ui)
= Xi E(Ui) − Xi E(Ui)   (given that the Xi are fixed, so E(Xi) = Xi)
= 0

8. The explanatory variables are measured without error.
U absorbs the influence of omitted variables and possibly errors of measurement in the y's; i.e., we will assume that the regressors are error-free, while the y values may or may not include errors of measurement.

Dear students!
We can now use the above assumptions to derive the following basic concepts.

A. The dependent variable Yi is normally distributed, i.e.

Yi ~ N(α + βXi, σ²) ....................(2.7)

Proof:
Mean: E(Yi) = E(α + βXi + ui) = α + βXi   (since E(ui) = 0)
Variance: Var(Yi) = E[Yi − E(Yi)]² = E[(α + βXi + ui) − (α + βXi)]² = E(ui²) = σ²   (since E(ui²) = σ²)

var(Yi) = σ² ....................(2.8)

The shape of the distribution of Yi is determined by the shape of the distribution of ui, which is normal by the normality assumption above. Since α and β are constants, they do not affect the distribution of Yi. Furthermore, the values of the explanatory variable Xi are a set of fixed values by the fixed-regressor assumption and therefore do not affect the shape of the distribution of Yi. Hence Yi ~ N(α + βXi, σ²).

B. Successive values of the dependent variable are independent, i.e. Cov(Yi, Yj) = 0.

Proof:
Cov(Yi, Yj) = E{[Yi − E(Yi)][Yj − E(Yj)]}
= E{[α + βXi + Ui − E(α + βXi + Ui)][α + βXj + Uj − E(α + βXj + Uj)]}   (since Yi = α + βXi + Ui and Yj = α + βXj + Uj)
= E[(α + βXi + Ui − α − βXi)(α + βXj + Uj − α − βXj)]   (since E(ui) = 0)
= E(UiUj) = 0   (from equation (2.5))

Therefore, Cov(Yi, Yj) = 0.

2.2.2 Methods of estimation

Specifying the model and stating its underlying assumptions are the first stage of any econometric application. The next step is the estimation of the numerical values of the parameters of economic relationships. The parameters of the simple linear regression model can be estimated by various methods. Three of the most commonly used methods are:
1. Ordinary least squares method (OLS)
2. Maximum likelihood method (MLM)
3. Method of moments (MM)
Here, we will deal with the OLS and the MLM methods of estimation.
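Before working through the formulas, here is a minimal numerical sketch of what estimation means in this setting. Data are simulated from the classical model Yi = α + βXi + ui with hypothetical parameter values, and α and β are then recovered by ordinary least squares using the deviation-form slope formula derived later in this section:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Fixed regressor values and hypothetical true parameters
X = np.linspace(1.0, 10.0, n)
alpha, beta, sigma = 2.0, 1.5, 1.0

# Classical model: Yi = alpha + beta*Xi + ui with ui ~ N(0, sigma^2)
u = rng.normal(0.0, sigma, n)
Y = alpha + beta * X + u

# OLS in deviation form: beta_hat = sum(x*y) / sum(x^2)
x = X - X.mean()
y = Y - Y.mean()
beta_hat = (x * y).sum() / (x * x).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()

# The first-order conditions imply sum(e) = 0 and sum(X*e) = 0
e = Y - (alpha_hat + beta_hat * X)
print(abs(e.sum()) < 1e-8, abs((X * e).sum()) < 1e-6)  # True True
```

Because the sample disturbances do not average out exactly, the estimates are close to, but not equal to, the true α and β; that gap is what the statistical properties of the estimators, discussed below, are about.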
2.2.2.1 The ordinary least squares (OLS) method

The model Yi = α + βXi + Ui is called the true relationship between Y and X because Y and X represent their respective population values, and α and β are called the true parameters since they are computed from the population values of Y and X. But it is difficult to obtain the population values of Y and X for technical or economic reasons. So we are forced to take sample values of Y and X. The parameters estimated from the sample values of Y and X are called the estimators of the true parameters α and β and are symbolized as α̂ and β̂. The model Yi = α̂ + β̂Xi + ei is called the estimated relationship between Y and X, since α̂ and β̂ are estimated from a sample of Y and X, and ei represents the sample counterpart of the population random disturbance Ui. Estimation of α and β by the least squares method (OLS), or classical least squares (CLS), involves finding values for the estimates α̂ and β̂ which will minimize the sum of the squared residuals (Σei²). From the estimated relationship Yi = α̂ + β̂Xi + ei, we obtain:

ei = Yi − (α̂ + β̂Xi) ....................(2.6)

Σei² = Σ(Yi − α̂ − β̂Xi)² ....................(2.7)

To find the values of α̂ and β̂ that minimize this sum, we partially differentiate Σei² with respect to α̂ and β̂ and set the partial derivatives equal to zero.

1. ∂Σei²/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) = 0 ....................(2.8)

Rearranging this expression we get:

ΣYi = nα̂ + β̂ΣXi ....................(2.9)

If we divide (2.9) by n and rearrange, we get

α̂ = Ȳ − β̂X̄ ....................(2.10)

2. ∂Σei²/∂β̂ = −2ΣXi(Yi − α̂ − β̂Xi) = 0 ....................(2.11)

Note at this point that the term in parentheses in equations (2.8) and (2.11) is the residual, ei = Yi − α̂ − β̂Xi. Hence it is possible to rewrite (2.8) and (2.11) as −2Σei = 0 and −2ΣXiei = 0.
It follows that:

Σei = 0 and ΣXiei = 0 ....................(2.12)

If we rearrange equation (2.11) we obtain:

ΣYiXi = α̂ΣXi + β̂ΣXi² ....................(2.13)

Equations (2.9) and (2.13) are called the Normal Equations. Substituting the value of α̂ from (2.10) into (2.13), we get:

ΣYiXi = (Ȳ − β̂X̄)ΣXi + β̂ΣXi²
ΣYiXi = ȲΣXi − β̂X̄ΣXi + β̂ΣXi²
ΣYiXi − nX̄Ȳ = β̂(ΣXi² − nX̄²)

β̂ = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²) ....................(2.14)

Equation (2.14) can be rewritten in a somewhat different way as follows:

Σ(X − X̄)(Y − Ȳ) = Σ(XY − X̄Y − XȲ + X̄Ȳ) = ΣXY − nX̄Ȳ − nX̄Ȳ + nX̄Ȳ = ΣXY − nX̄Ȳ ....................(2.15)

Σ(X − X̄)² = ΣX² − nX̄² ....................(2.16)

Substituting (2.15) and (2.16) in (2.14), we get

β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

Now, denoting (Xi − X̄) as xi and (Yi − Ȳ) as yi, we get:

β̂ = Σxiyi / Σxi² ....................(2.17)

The expression in (2.17), used to estimate the parameter coefficient, is termed the formula in deviation form.

2.2.2.2 Estimation of a function with zero intercept

Suppose it is desired to fit the line Yi = α + βXi + Ui subject to the restriction α = 0. To estimate β̂, the problem is put in the form of a restricted minimization problem and then the Lagrange method is applied. We minimize:

Σei² = Σ(Yi − α̂ − β̂Xi)²

subject to: α̂ = 0.

The composite function then becomes Z = Σ(Yi − α̂ − β̂Xi)² − λα̂, where λ is a Lagrange multiplier. We minimize the function with respect to α̂, β̂, and λ:

∂Z/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) − λ = 0 ....(i)
∂Z/∂β̂ = −2Σ(Yi − α̂ − β̂Xi)(Xi) = 0 ....(ii)
∂Z/∂λ = −α̂ = 0 ....(iii)

Substituting (iii) in (ii) and rearranging we obtain:

ΣXi(Yi − β̂Xi) = 0
ΣYiXi − β̂ΣXi² = 0

β̂ = ΣXiYi / ΣXi² ....................(2.18)

This formula involves the actual values (observations) of the variables and not their deviation forms, as in the case of the unrestricted value of β̂.

2.2.2.3 Statistical Properties of Least Squares Estimators

There are various econometric methods with which we may obtain the estimates of the parameters of economic relationships. We would like an estimate to be as close as possible to the value of the true population parameter, i.e.
to vary within only a small range around the true parameter. How are we to choose, among the different econometric methods, the one that gives "good" estimates? We need some criteria for judging the "goodness" of an estimate.

"Closeness" of the estimate to the population parameter is measured by the mean and variance (or standard deviation) of the sampling distribution of the estimates produced by the different econometric methods. We assume the usual process of repeated sampling: we draw a very large number of samples, each of size n; we compute the estimates from each sample for each econometric method, and we form their distributions. We then compare the means (expected values) and the variances of these distributions, and we choose among the alternative estimates the one whose distribution is concentrated as closely as possible around the true population parameter.

PROPERTIES OF OLS ESTIMATORS

The ideal or optimum properties that the OLS estimates possess may be summarized by the well-known Gauss-Markov theorem.

Statement of the theorem: "Given the assumptions of the classical linear regression model, the OLS estimators have the minimum variance in the class of linear unbiased estimators, i.e. the OLS estimators are BLUE."

According to this theorem, under the basic assumptions of the classical linear regression model, the least squares estimators are linear, unbiased and have minimum variance (i.e. they are best among all linear unbiased estimators). The theorem is sometimes referred to as the BLUE theorem: Best, Linear, Unbiased Estimator. An estimator is called BLUE if it is:

a. Linear: a linear function of a random variable, such as the dependent variable Y.
b. Unbiased: its average or expected value is equal to the true population parameter.
c. Minimum variance: it has minimum variance in the class of linear unbiased estimators. An unbiased estimator with the least variance is known as an efficient estimator.
According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties. Detailed proofs of these properties are presented below. Dear student! Let us prove these properties one by one.

a. Linearity (for $\hat{\beta}$)

Proposition: $\hat{\alpha}$ and $\hat{\beta}$ are linear in Y.

Proof: From (2.17), the OLS estimator of $\beta$ is given by:

$\hat{\beta} = \dfrac{\sum x_iy_i}{\sum x_i^2} = \dfrac{\sum x_i(Y_i - \bar{Y})}{\sum x_i^2} = \dfrac{\sum x_iY_i - \bar{Y}\sum x_i}{\sum x_i^2}$

(but $\sum x_i = \sum(X_i - \bar{X}) = \sum X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$)

$\hat{\beta} = \dfrac{\sum x_iY_i}{\sum x_i^2}$. Now, let $k_i = \dfrac{x_i}{\sum x_i^2}$ $(i = 1, 2, \dots, n)$.

$\hat{\beta} = \sum k_iY_i$ .........................................(2.19)

$\hat{\beta} = k_1Y_1 + k_2Y_2 + k_3Y_3 + \dots + k_nY_n$

$\Rightarrow \hat{\beta}$ is linear in Y.

Check yourself question: Show that $\hat{\alpha}$ is linear in Y. Hint: $\hat{\alpha} = \sum\left(\frac{1}{n} - \bar{X}k_i\right)Y_i$. Derive this relationship between $\hat{\alpha}$ and Y.

b. Unbiasedness

Proposition: $\hat{\alpha}$ and $\hat{\beta}$ are unbiased estimators of the true parameters $\alpha$ and $\beta$.

From your statistics course, you may recall that if $\hat{\theta}$ is an estimator of $\theta$, then $E(\hat{\theta}) - \theta$ is the amount of bias, and $\hat{\theta}$ is an unbiased estimator of $\theta$ if the bias is zero, i.e.

$E(\hat{\theta}) - \theta = 0 \iff E(\hat{\theta}) = \theta$

In our case, $\hat{\alpha}$ and $\hat{\beta}$ are estimators of the true parameters $\alpha$ and $\beta$. To show that they are unbiased estimators of their respective parameters means to prove that:

$E(\hat{\beta}) = \beta$ and $E(\hat{\alpha}) = \alpha$

Proof (1): Prove that $\hat{\beta}$ is unbiased, i.e. $E(\hat{\beta}) = \beta$.

We know that $\hat{\beta} = \sum k_iY_i = \sum k_i(\alpha + \beta X_i + U_i) = \alpha\sum k_i + \beta\sum k_iX_i + \sum k_iu_i$, where $\sum k_i = 0$ and $\sum k_iX_i = 1$:

$\sum k_i = \sum\dfrac{x_i}{\sum x_i^2} = \dfrac{\sum(X_i - \bar{X})}{\sum x_i^2} = \dfrac{\sum X_i - n\bar{X}}{\sum x_i^2} = \dfrac{n\bar{X} - n\bar{X}}{\sum x_i^2} = 0$

$\Rightarrow \sum k_i = 0$ .........................................(2.20)

$\sum k_iX_i = \dfrac{\sum X_i(X_i - \bar{X})}{\sum x_i^2} = \dfrac{\sum X_i^2 - \bar{X}\sum X_i}{\sum x_i^2} = \dfrac{\sum X_i^2 - n\bar{X}^2}{\sum X_i^2 - n\bar{X}^2} = 1$

$\Rightarrow \sum k_iX_i = 1$ .........................................(2.21)

Hence $\hat{\beta} = \beta + \sum k_iu_i$

$\hat{\beta} - \beta = \sum k_iu_i$ .........................................(2.22)

$E(\hat{\beta}) = \beta + \sum k_iE(u_i) = \beta$, since the $k_i$ are fixed and $E(u_i) = 0$. Therefore, $\hat{\beta}$ is an unbiased estimator of $\beta$.

Proof (2): Prove that $\hat{\alpha}$ is unbiased, i.e. $E(\hat{\alpha}) = \alpha$.

From the proof of the linearity property under 2.2.2.3(a), we know that:

$\hat{\alpha} = \sum\left(\frac{1}{n} - \bar{X}k_i\right)Y_i = \sum\left(\frac{1}{n} - \bar{X}k_i\right)(\alpha + \beta X_i + U_i)$, since $Y_i = \alpha + \beta X_i + U_i$

$= \alpha + \beta\frac{\sum X_i}{n} + \frac{1}{n}\sum u_i - \alpha\bar{X}\sum k_i - \beta\bar{X}\sum k_iX_i - \bar{X}\sum k_iu_i$

$= \alpha + \frac{1}{n}\sum u_i - \bar{X}\sum k_iu_i$, using $\sum k_i = 0$ and $\sum k_iX_i = 1$

$\hat{\alpha} - \alpha = \frac{1}{n}\sum u_i - \bar{X}\sum k_iu_i = \sum\left(\frac{1}{n} - \bar{X}k_i\right)u_i$ .........................................
(2.23)

$E(\hat{\alpha}) = \alpha + \frac{1}{n}\sum E(u_i) - \bar{X}\sum k_iE(u_i) = \alpha$ .........................................(2.24)

$\Rightarrow \hat{\alpha}$ is an unbiased estimator of $\alpha$.

c. Minimum variance of $\hat{\alpha}$ and $\hat{\beta}$

Now we have to establish that, in the class of linear unbiased estimators of $\alpha$ and $\beta$, $\hat{\alpha}$ and $\hat{\beta}$ possess the smallest sampling variances. For this, we shall first obtain the variances of $\hat{\alpha}$ and $\hat{\beta}$, and then establish that each has the minimum variance in comparison with the variances of other linear unbiased estimators obtained by any econometric method other than OLS.

a. Variance of $\hat{\beta}$

$\mathrm{var}(\hat{\beta}) = E\big(\hat{\beta} - E(\hat{\beta})\big)^2 = E(\hat{\beta} - \beta)^2$ .........................................(2.25)

Substituting (2.22) into (2.25) we get

$\mathrm{var}(\hat{\beta}) = E\left(\sum k_iu_i\right)^2 = E\left[k_1^2u_1^2 + k_2^2u_2^2 + \dots + k_n^2u_n^2 + 2k_1k_2u_1u_2 + \dots + 2k_{n-1}k_nu_{n-1}u_n\right]$

$= E\left(\sum k_i^2u_i^2\right) + E\left(\sum_{i \ne j} k_ik_ju_iu_j\right) = \sum k_i^2E(u_i^2) + 2\sum_{i < j} k_ik_jE(u_iu_j) = \sigma^2\sum k_i^2$ (since $E(u_iu_j) = 0$ for $i \ne j$)

$k_i = \dfrac{x_i}{\sum x_i^2}$, and therefore $\sum k_i^2 = \dfrac{\sum x_i^2}{(\sum x_i^2)^2} = \dfrac{1}{\sum x_i^2}$

$\Rightarrow \mathrm{var}(\hat{\beta}) = \sigma^2\sum k_i^2 = \dfrac{\sigma^2}{\sum x_i^2}$ .........................................(2.26)

b. Variance of $\hat{\alpha}$

$\mathrm{var}(\hat{\alpha}) = E\big(\hat{\alpha} - E(\hat{\alpha})\big)^2 = E(\hat{\alpha} - \alpha)^2$ .........................................(2.27)

Substituting equation (2.23) into (2.27), we get

$\mathrm{var}(\hat{\alpha}) = E\left[\sum\left(\frac{1}{n} - \bar{X}k_i\right)u_i\right]^2 = \sigma^2\sum\left(\frac{1}{n} - \bar{X}k_i\right)^2 = \sigma^2\sum\left(\frac{1}{n^2} - \frac{2\bar{X}k_i}{n} + \bar{X}^2k_i^2\right)$

$= \sigma^2\left(\frac{1}{n} - \frac{2\bar{X}}{n}\sum k_i + \bar{X}^2\sum k_i^2\right) = \sigma^2\left(\frac{1}{n} + \bar{X}^2\sum k_i^2\right)$, since $\sum k_i = 0$

$= \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right)$, since $\sum k_i^2 = \dfrac{\sum x_i^2}{(\sum x_i^2)^2} = \dfrac{1}{\sum x_i^2}$

Again: $\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2} = \dfrac{\sum x_i^2 + n\bar{X}^2}{n\sum x_i^2} = \dfrac{\sum X_i^2}{n\sum x_i^2}$

$\Rightarrow \mathrm{var}(\hat{\alpha}) = \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right) = \sigma^2\dfrac{\sum X_i^2}{n\sum x_i^2}$ .........................................(2.28)

Dear student! We have computed the variances of the OLS estimators. Now it is time to check whether these variances of the OLS estimators do possess the minimum variance property compared to the variances of other estimators of the true $\alpha$ and $\beta$, other than $\hat{\alpha}$ and $\hat{\beta}$. To establish that $\hat{\alpha}$ and $\hat{\beta}$ possess the minimum variance property, we compare their variances with those of some other alternative linear unbiased estimators of $\alpha$ and $\beta$, say $\alpha^*$ and $\beta^*$.
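The unbiasedness result and the variance formula (2.26) can be checked numerically by simulating the repeated-sampling process described earlier. The sketch below assumes hypothetical true parameters and a fixed set of X values (both are illustrative choices, not part of the proof):

```python
# Monte Carlo check that beta_hat is unbiased and that its sampling
# variance matches sigma^2 / sum(x_i^2). True parameters are assumed.
import random

random.seed(0)
alpha, beta, sigma = 2.0, 0.5, 1.0
X = [float(i) for i in range(1, 21)]          # fixed regressor values
n = len(X)
X_bar = sum(X) / n
sum_x2 = sum((Xi - X_bar) ** 2 for Xi in X)   # sum of squared deviations

betas = []
for _ in range(5000):                         # repeated sampling
    Y = [alpha + beta * Xi + random.gauss(0.0, sigma) for Xi in X]
    Y_bar = sum(Y) / n
    b = sum((Xi - X_bar) * (Yi - Y_bar) for Xi, Yi in zip(X, Y)) / sum_x2
    betas.append(b)

mean_b = sum(betas) / len(betas)
var_b = sum((b - mean_b) ** 2 for b in betas) / len(betas)

print(mean_b)                        # close to the true beta = 0.5
print(var_b, sigma**2 / sum_x2)      # close to the theoretical variance
```

The simulated mean of the $\hat{\beta}$'s clusters around the true $\beta$, and their simulated variance clusters around $\sigma^2/\sum x_i^2$, which is exactly what the theorem asserts.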
Now, we want to prove that any other linear unbiased estimator of the true population parameter, obtained from any other econometric method, has a larger variance than the OLS estimators. Let us first show the minimum variance of $\hat{\beta}$ and then that of $\hat{\alpha}$.

1. Minimum variance of $\hat{\beta}$

Suppose $\beta^*$ is an alternative linear unbiased estimator of $\beta$, and let

$\beta^* = \sum w_iY_i$ .........................................(2.29)

where $w_i \ne k_i$; say $w_i = k_i + c_i$.

$\beta^* = \sum w_i(\alpha + \beta X_i + u_i)$, since $Y_i = \alpha + \beta X_i + U_i$

$= \alpha\sum w_i + \beta\sum w_iX_i + \sum w_iu_i$

$E(\beta^*) = \alpha\sum w_i + \beta\sum w_iX_i$, since $E(u_i) = 0$

Since $\beta^*$ is assumed to be an unbiased estimator of $\beta$, it must be true that $\sum w_i = 0$ and $\sum w_iX_i = 1$ in the above equation.

But $w_i = k_i + c_i$, so $\sum w_i = \sum(k_i + c_i) = \sum k_i + \sum c_i$. Therefore $\sum c_i = 0$, since $\sum k_i = \sum w_i = 0$.

Again, $\sum w_iX_i = \sum(k_i + c_i)X_i = \sum k_iX_i + \sum c_iX_i$. Since $\sum w_iX_i = 1$ and $\sum k_iX_i = 1$, we get $\sum c_iX_i = 0$.

From these values we can derive $\sum c_ix_i = 0$, where $x_i = X_i - \bar{X}$:

$\sum c_ix_i = \sum c_i(X_i - \bar{X}) = \sum c_iX_i - \bar{X}\sum c_i = 0$, since $\sum c_iX_i = 0$ and $\sum c_i = 0$

Thus, from the above calculations we can summarize the following results:

$\sum w_i = 0$, $\sum w_iX_i = 1$, $\sum c_i = 0$, $\sum c_iX_i = 0$, $\sum c_ix_i = 0$

To prove whether $\hat{\beta}$ has minimum variance or not, let us compute $\mathrm{var}(\beta^*)$ to compare with $\mathrm{var}(\hat{\beta})$:

$\mathrm{var}(\beta^*) = \mathrm{var}\left(\sum w_iY_i\right) = \sum w_i^2\,\mathrm{var}(Y_i) = \sigma^2\sum w_i^2$, since $\mathrm{var}(Y_i) = \sigma^2$

But $\sum w_i^2 = \sum(k_i + c_i)^2 = \sum k_i^2 + 2\sum k_ic_i + \sum c_i^2$

$\Rightarrow \sum w_i^2 = \sum k_i^2 + \sum c_i^2$, since $\sum k_ic_i = \dfrac{\sum c_ix_i}{\sum x_i^2} = 0$

Therefore, $\mathrm{var}(\beta^*) = \sigma^2(\sum k_i^2 + \sum c_i^2) = \sigma^2\sum k_i^2 + \sigma^2\sum c_i^2$

$\mathrm{var}(\beta^*) = \mathrm{var}(\hat{\beta}) + \sigma^2\sum c_i^2$

Given that the $c_i$ are arbitrary constants (not all zero), $\sigma^2\sum c_i^2$ is positive, i.e. greater than zero. Thus $\mathrm{var}(\beta^*) > \mathrm{var}(\hat{\beta})$. This proves that $\hat{\beta}$ possesses the minimum variance property. In a similar way we can prove that the least squares estimate of the constant intercept, $\hat{\alpha}$, possesses minimum variance.

2. Minimum variance of $\hat{\alpha}$

We take a new estimator $\alpha^*$, which we assume to be a linear and unbiased estimator of $\alpha$. The least squares estimator $\hat{\alpha}$ is given by:

$\hat{\alpha} = \sum\left(\frac{1}{n} - \bar{X}k_i\right)Y_i$

By analogy with the proof of the minimum variance property of $\hat{\beta}$, let us use the weights $w_i = k_i + c_i$. Consequently:

$\alpha^* = \sum\left(\frac{1}{n} - \bar{X}w_i\right)Y_i$
The least square estimator ˆ is given by: ˆ ( 1 n Xki )Yi By analogy with that the proof of the minimum variance property of ˆ , let‟s use the weights wi = ci + ki Consequently; * ( 1n Xwi )Yi 31 Since we want * to be on unbiased estimator of the true , that is, (*) , we substitute for Y xi ui in * and find the expected value of *. * ( 1n Xwi )( X i ui ) X ui ( Xw XX w Xw u ) i i i i i n n n * X ui / n Xwi Xwi X i Xwiui For * to be an unbiased estimator of the true , the following must hold. (wi ) 0, (wi X i ) 1 and (wiui ) 0 i.e., if wi 0, and wi X i 1. These conditions imply that ci 0 and ci X i 0. As in the case of ˆ , we need to compute Var( * ) to compare with var(ˆ ) var( *) var( 1 n Xw )Y i i ( 1 Xw )2 var(Y ) n i i 2( 1 n Xw )i 2 2( 1 n X 2 wi 2 2 1 nXw i) 2 2 ( n n X 2 w i 2 2 X 1 nw i) 2 var( *) 2 1 n X w i 2 2 ,Since wi 0 but w 2 k 2 c 2 i i i var( *) 2 1 n X (k c2 2 2 i i 1 X2 2 2 2 2 var( *) n x 2 X c i i X i2 nx 2 X ci 2 2 2 2 i The first term in the bracket it var(̂ ) , hence var( *) var(̂ ) 2 X 2 c 2i var( *) var(̂ ) , Since 2 X 2c i2 0 32 Therefore, we have proved that the least square estimators of linear regression model are best, linear and unbiased (BLU) estimators. The variance of the random variable (Ui) Dear student! You may observe that the variances of the OLS estimates involve 2 , which is the population variance of the random disturbance term. But it is difficult to obtain the population data of the disturbance term because of technical and economic reasons. Hence it is difficult to compute 2 ; this implies that variances of OLS estimates are also difficult to compute. But we can compute these variances if we take the unbiased estimate of 2 which is ˆ 2 computed from the sample value of the disturbance term ei from the expression: ei2............................................................ 
$\hat{\sigma}_u^2 = \dfrac{\sum e_i^2}{n-2}$ .........................................(2.30)

To use $\hat{\sigma}^2$ in the expressions for the variances of $\hat{\alpha}$ and $\hat{\beta}$, we have to prove that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$, i.e. that

$E(\hat{\sigma}^2) = E\left(\dfrac{\sum e_i^2}{n-2}\right) = \sigma^2$

To prove this we have to express $\sum e_i^2$ in terms of $Y$, $\hat{Y}$, $y$, $\hat{y}$ and $e_i$.

Proof:

$Y_i = \hat{\alpha} + \hat{\beta}X_i + e_i$ and $\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i$, so that

$Y_i = \hat{Y}_i + e_i$ .........................................(2.31)

$\Rightarrow e_i = Y_i - \hat{Y}_i$ .........................................(2.32)

Summing (2.31) will result in the following expression:

$\sum Y_i = \sum\hat{Y}_i + \sum e_i = \sum\hat{Y}_i$, since $\sum e_i = 0$

Dividing both sides of the above by n will give us

$\dfrac{\sum Y_i}{n} = \dfrac{\sum\hat{Y}_i}{n} \Rightarrow \bar{Y} = \bar{\hat{Y}}$ .........................................(2.33)

Putting (2.31) and (2.33) together and subtracting:

$Y_i - \bar{Y} = (\hat{Y}_i - \bar{\hat{Y}}) + e_i$

$\Rightarrow y_i = \hat{y}_i + e_i$ .........................................(2.34)

From (2.34):

$e_i = y_i - \hat{y}_i$ .........................................(2.35)

where the y's are in deviation form.

Now we have to express $y_i$ and $\hat{y}_i$ in other forms, as derived below. From $Y_i = \alpha + \beta X_i + U_i$ and $\bar{Y} = \alpha + \beta\bar{X} + \bar{U}$ we get, by subtraction,

$y_i = Y_i - \bar{Y} = \beta(X_i - \bar{X}) + (U_i - \bar{U}) = \beta x_i + (U_i - \bar{U})$ .........................................(2.36)

Note that we assumed earlier that $E(u) = 0$, i.e. in taking a very large number of samples we expect U to have a mean value of zero, but in any particular single sample $\bar{U}$ is not necessarily zero.

Similarly, from $\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i$ and $\bar{Y} = \hat{\alpha} + \hat{\beta}\bar{X}$ we get, by subtraction,

$\hat{y}_i = \hat{Y}_i - \bar{Y} = \hat{\beta}(X_i - \bar{X}) = \hat{\beta}x_i$ .........................................
(2.37)

Substituting (2.36) and (2.37) into (2.35) we get

$e_i = \beta x_i + (u_i - \bar{u}) - \hat{\beta}x_i = (u_i - \bar{u}) - (\hat{\beta} - \beta)x_i$

Squaring and summing over the n sample values of the residuals yields:

$\sum e_i^2 = \sum\left[(u_i - \bar{u}) - (\hat{\beta} - \beta)x_i\right]^2 = \sum(u_i - \bar{u})^2 + (\hat{\beta} - \beta)^2\sum x_i^2 - 2(\hat{\beta} - \beta)\sum x_i(u_i - \bar{u})$

Taking expected values we have:

$E\left(\sum e_i^2\right) = E\left[\sum(u_i - \bar{u})^2\right] + E\left[(\hat{\beta} - \beta)^2\sum x_i^2\right] - 2E\left[(\hat{\beta} - \beta)\sum x_i(u_i - \bar{u})\right]$ .........................................(2.38)

The right-hand-side terms of (2.38) may be evaluated as follows.

a. $E\left[\sum(u_i - \bar{u})^2\right] = E\left(\sum u_i^2 - n\bar{u}^2\right) = \sum E(u_i^2) - nE(\bar{u}^2)$

$= n\sigma_u^2 - nE\left[\left(\frac{\sum u_i}{n}\right)^2\right] = n\sigma_u^2 - \frac{1}{n}E\left(\sum u_i^2 + 2\sum_{i \ne j} u_iu_j\right)$

$= n\sigma_u^2 - \frac{1}{n}\left(n\sigma_u^2\right)$, given $E(u_iu_j) = 0$ for $i \ne j$

$= \sigma_u^2(n-1)$ .........................................(2.39)

b. $E\left[(\hat{\beta} - \beta)^2\sum x_i^2\right] = \sum x_i^2 \cdot E(\hat{\beta} - \beta)^2$

Given that the X's are fixed in all samples, and we know that $E(\hat{\beta} - \beta)^2 = \mathrm{var}(\hat{\beta}) = \dfrac{\sigma_u^2}{\sum x_i^2}$,

hence $\sum x_i^2 \cdot E(\hat{\beta} - \beta)^2 = \sum x_i^2 \cdot \dfrac{\sigma_u^2}{\sum x_i^2} = \sigma_u^2$ .........................................(2.40)

c. $-2E\left[(\hat{\beta} - \beta)\sum x_i(u_i - \bar{u})\right] = -2E\left[(\hat{\beta} - \beta)\left(\sum x_iu_i - \bar{u}\sum x_i\right)\right] = -2E\left[(\hat{\beta} - \beta)\sum x_iu_i\right]$, since $\sum x_i = 0$

But from (2.22), $\hat{\beta} - \beta = \sum k_iu_i = \dfrac{\sum x_iu_i}{\sum x_i^2}$, and substituting this in the above expression we will get:

$-2E\left[\dfrac{(\sum x_iu_i)^2}{\sum x_i^2}\right] = -\dfrac{2}{\sum x_i^2}E\left(\sum x_i^2u_i^2 + 2\sum_{i \ne j} x_ix_ju_iu_j\right) = -\dfrac{2}{\sum x_i^2}\sum x_i^2E(u_i^2)$, given $E(u_iu_j) = 0$

$= -2\sigma_u^2$ .........................................(2.41)

Consequently, equation (2.38) can be written in terms of (2.39), (2.40) and (2.41) as follows:

$E\left(\sum e_i^2\right) = \sigma_u^2(n-1) + \sigma_u^2 - 2\sigma_u^2 = (n-2)\sigma_u^2$ .........................................(2.42)

From which we get

$E\left(\dfrac{\sum e_i^2}{n-2}\right) = E(\hat{\sigma}_u^2) = \sigma_u^2$ .........................................(2.43)

since $\hat{\sigma}_u^2 = \dfrac{\sum e_i^2}{n-2}$.

Thus, $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$ is an unbiased estimate of the true variance of the error term, $\sigma^2$.

Dear student!
The conclusion that we can draw from the above proof is that we can substitute $\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n-2}$ for $\sigma^2$ in the variance expressions of $\hat{\alpha}$ and $\hat{\beta}$, since $E(\hat{\sigma}^2) = \sigma^2$. Hence the formulas for the variances of $\hat{\beta}$ and $\hat{\alpha}$ become:

$\mathrm{var}(\hat{\beta}) = \dfrac{\hat{\sigma}^2}{\sum x_i^2} = \dfrac{\sum e_i^2}{(n-2)\sum x_i^2}$ .........................................(2.44)

$\mathrm{var}(\hat{\alpha}) = \hat{\sigma}^2\dfrac{\sum X_i^2}{n\sum x_i^2} = \dfrac{\sum e_i^2}{n(n-2)} \cdot \dfrac{\sum X_i^2}{\sum x_i^2}$ .........................................(2.45)

Note: $\sum e_i^2$ can be computed as $\sum e_i^2 = \sum y_i^2 - \hat{\beta}\sum x_iy_i$.

Dear student! Do not worry about the derivation of this expression; we will perform the derivation in a subsequent subtopic.

2.2.2.4 Statistical tests of significance of the OLS estimators (first order tests)

After the estimation of the parameters and the determination of the least squares regression line, we need to know how "good" the fit of this line is to the sample observations of Y and X; that is to say, we need to measure the dispersion of the observations around the regression line. This knowledge is essential because the closer the observations are to the line, the better the goodness of fit, i.e. the better the explanation of the variation of Y by the changes in the explanatory variables.

We divide the available criteria into three groups: the theoretical a priori criteria, the statistical criteria, and the econometric criteria. In this section our focus is on the statistical criteria (first order tests). The two most commonly used first order tests in econometric analysis are:

i. The coefficient of determination (the square of the correlation coefficient, i.e. R2). This test is used for judging the explanatory power of the independent variable(s).
ii. The standard error tests of the estimators. This test is used for judging the statistical reliability of the estimates of the regression coefficients.
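The estimate $\hat{\sigma}^2 = \sum e_i^2/(n-2)$ and the variance formulas (2.44) and (2.45) can be illustrated numerically. A minimal sketch with hypothetical data:

```python
# Sketch of estimating sigma^2 by sum(e_i^2)/(n-2) and plugging it into
# var(beta_hat) = sigma2_hat / sum(x^2)  and
# var(alpha_hat) = sigma2_hat * sum(X^2) / (n * sum(x^2)).
# The data are hypothetical, for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
x = [Xi - X_bar for Xi in X]
y = [Yi - Y_bar for Yi in Y]
sum_x2 = sum(xi**2 for xi in x)

beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum_x2
alpha_hat = Y_bar - beta_hat * X_bar

# residuals and the unbiased estimate of sigma^2, eq. (2.30)
e = [Yi - (alpha_hat + beta_hat * Xi) for Xi, Yi in zip(X, Y)]
sigma2_hat = sum(ei**2 for ei in e) / (n - 2)

var_beta = sigma2_hat / sum_x2                                   # eq. (2.44)
var_alpha = sigma2_hat * sum(Xi**2 for Xi in X) / (n * sum_x2)   # eq. (2.45)
se_beta, se_alpha = var_beta ** 0.5, var_alpha ** 0.5            # standard errors

print(sigma2_hat, se_beta, se_alpha)
```

The square roots of the two variances are the standard errors used in the significance tests that follow.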
1. TESTS OF THE "GOODNESS OF FIT" WITH R2

R2 shows the percentage of the total variation of the dependent variable that can be explained by the changes in the explanatory variable(s) included in the model. To elaborate this, let us draw a horizontal line corresponding to the mean value of the dependent variable $\bar{Y}$ (see figure (d) below). By fitting the line $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1X$ we try to obtain the explanation of the variation of the dependent variable Y produced by the changes of the explanatory variable X.

[Figure (d): Actual and estimated values of the dependent variable Y. For an observation above the fitted line, the figure marks the distances $Y - \hat{Y} = e$, $\hat{Y} - \bar{Y}$, and $Y - \bar{Y}$.]

As can be seen from figure (d), $Y - \bar{Y}$ measures the variation of the sample observation of the dependent variable around the mean. However, the variation in Y that can be attributed to the influence of X (i.e. the regression line) is given by the vertical distance $\hat{Y} - \bar{Y}$. The part of the total variation in Y about $\bar{Y}$ that cannot be attributed to X is equal to $e = Y - \hat{Y}$, which is referred to as the residual variation.

In summary:

$e_i = Y_i - \hat{Y}_i$ = deviation of the observation $Y_i$ from the regression line;
$y_i = Y_i - \bar{Y}$ = deviation of Y from its mean;
$\hat{y}_i = \hat{Y}_i - \bar{Y}$ = deviation of the regressed (predicted) value $\hat{Y}_i$ from the mean.

Now, we may write the observed Y as the sum of the predicted value $\hat{Y}$ and the residual term:

$Y_i$ (observed) $= \hat{Y}_i$ (predicted) $+ e_i$ (residual)

From equation (2.34) we have the same relation in deviation form: $y_i = \hat{y}_i + e_i$. By squaring and summing both sides, we obtain the following expression:

$\sum y_i^2 = \sum(\hat{y}_i + e_i)^2 = \sum\hat{y}_i^2 + \sum e_i^2 + 2\sum\hat{y}_ie_i$

But $\sum\hat{y}_ie_i = \hat{\beta}\sum x_ie_i = 0$, since $\hat{y}_i = \hat{\beta}x_i$ and $\sum e_ix_i = 0$ (also $\sum e_i = 0$)

$\Rightarrow \sum\hat{y}_ie_i = 0$ .........................................
(2.46)

Therefore:

$\sum y_i^2 = \sum\hat{y}_i^2 + \sum e_i^2$ .........................................(2.47)

i.e. total variation = explained variation + unexplained variation, or

Total sum of squares (TSS) = Explained sum of squares (ESS) + Residual sum of squares (RSS)

i.e. $TSS = ESS + RSS$ .........................................(2.48)

Mathematically, the explained variation as a percentage of the total variation is:

$\dfrac{ESS}{TSS} = \dfrac{\sum\hat{y}^2}{\sum y^2}$ .........................................(2.49)

From equation (2.37) we have $\hat{y} = \hat{\beta}x$. Squaring and summing both sides gives us

$\sum\hat{y}^2 = \hat{\beta}^2\sum x^2$ .........................................(2.50)

We can substitute (2.50) in (2.49) and obtain:

$ESS/TSS = \dfrac{\hat{\beta}^2\sum x^2}{\sum y^2}$ .........................................(2.51)

$= \left(\dfrac{\sum xy}{\sum x^2}\right)^2\dfrac{\sum x^2}{\sum y^2}$, since $\hat{\beta} = \dfrac{\sum x_iy_i}{\sum x_i^2}$

$= \dfrac{\sum xy}{\sum x^2} \cdot \dfrac{\sum xy}{\sum y^2}$ .........................................(2.52)

Comparing (2.52) with the formula of the correlation coefficient:

$r = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X\sigma_Y} = \dfrac{\sum xy / n}{\sigma_X\sigma_Y} = \dfrac{\sum xy}{(\sum x^2\sum y^2)^{1/2}}$ .........................................(2.53)

Squaring (2.53) will result in:

$r^2 = \dfrac{(\sum xy)^2}{\sum x^2\sum y^2}$ .........................................(2.54)

Comparing (2.52) and (2.54), we see exactly the same expressions. Therefore:

$ESS/TSS = r^2 = \dfrac{(\sum xy)^2}{\sum x^2\sum y^2}$

From (2.48), RSS = TSS − ESS. Hence R2 becomes:

$R^2 = \dfrac{TSS - RSS}{TSS} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum e_i^2}{\sum y^2}$ .........................................(2.55)

From equation (2.55) we can derive:

$RSS = \sum e_i^2 = \sum y_i^2(1 - R^2)$ .........................................(2.56)

The limits of R2: the value of R2 falls between zero and one, i.e. $0 \le R^2 \le 1$.

Interpretation of R2: suppose $R^2 = 0.9$. This means that the regression line gives a good fit to the observed data, since this line explains 90% of the total variation of the Y values around their mean. The remaining 10% of the total variation in Y is unaccounted for by the regression line and is attributed to the factors included in the disturbance variable $u_i$.

Check yourself questions:
a. Show that $0 \le R^2 \le 1$.
b. Show that the square of the coefficient of correlation is equal to ESS/TSS.
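The decomposition (2.47)-(2.48) and the definition of R2 can be verified numerically. A sketch with hypothetical data:

```python
# Sketch of the variance decomposition TSS = ESS + RSS and
# R2 = ESS/TSS = 1 - RSS/TSS. Data are hypothetical, for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
x = [Xi - X_bar for Xi in X]
y = [Yi - Y_bar for Yi in Y]

beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
alpha_hat = Y_bar - beta_hat * X_bar
Y_pred = [alpha_hat + beta_hat * Xi for Xi in X]

TSS = sum((Yi - Y_bar) ** 2 for Yi in Y)                     # total variation
ESS = sum((Yp - Y_bar) ** 2 for Yp in Y_pred)                # explained
RSS = sum((Yi - Yp) ** 2 for Yi, Yp in zip(Y, Y_pred))       # unexplained

R2 = ESS / TSS        # equivalently 1 - RSS/TSS, eq. (2.55)
print(TSS, ESS + RSS, R2)
```

For this data set ESS + RSS reproduces TSS exactly (up to floating-point rounding), and R2 is about 0.997, i.e. the fitted line explains about 99.7% of the variation of Y around its mean.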
Exercise: Suppose $r_{XY}$ is the correlation coefficient between Y and X, given by:

$r_{XY} = \dfrac{\sum x_iy_i}{\sqrt{\sum x_i^2}\sqrt{\sum y_i^2}}$

and let $r^2_{Y\hat{Y}}$, the square of the correlation coefficient between $Y$ and $\hat{Y}$, be given by:

$r^2_{Y\hat{Y}} = \dfrac{(\sum y\hat{y})^2}{\sum y^2\sum\hat{y}^2}$

Show that: (i) $r^2_{Y\hat{Y}} = R^2$; (ii) $r_{Y\hat{Y}} = r_{YX}$.

2. TESTING THE SIGNIFICANCE OF OLS PARAMETERS

To test the significance of the OLS parameter estimators we need the following:

- the variances of the parameter estimators;
- an unbiased estimator of $\sigma^2$;
- the assumption of normality of the distribution of the error term.

We have already derived that:

$\mathrm{var}(\hat{\beta}) = \dfrac{\hat{\sigma}^2}{\sum x^2}$

$\mathrm{var}(\hat{\alpha}) = \hat{\sigma}^2\dfrac{\sum X^2}{n\sum x^2}$

$\hat{\sigma}^2 = \dfrac{\sum e^2}{n-2} = \dfrac{RSS}{n-2}$

For the purpose of estimating the parameters the assumption of normality is not used, but we use this assumption to test the significance of the parameter estimators, because the testing methods and procedures are based on the normality assumption of the disturbance term. Hence, before we discuss the various testing methods it is important to see whether the parameters are normally distributed or not.

We have already assumed that the error term is normally distributed with mean zero and variance $\sigma^2$, i.e. $U_i \sim N(0, \sigma^2)$. Similarly, we also proved that $Y_i \sim N(\alpha + \beta X_i,\ \sigma^2)$. Now, we want to show the following:

1. $\hat{\beta} \sim N\left(\beta,\ \dfrac{\sigma^2}{\sum x^2}\right)$

2. $\hat{\alpha} \sim N\left(\alpha,\ \sigma^2\dfrac{\sum X^2}{n\sum x^2}\right)$

To show whether $\hat{\alpha}$ and $\hat{\beta}$ are normally distributed or not, we need to make use of one property of the normal distribution: "any linear function of a normally distributed variable is itself normally distributed."

$\hat{\beta} = \sum k_iY_i = k_1Y_1 + k_2Y_2 + \dots + k_nY_n$

$\hat{\alpha} = \sum w_iY_i = w_1Y_1 + w_2Y_2 + \dots + w_nY_n$

Since $\hat{\alpha}$ and $\hat{\beta}$ are linear in Y, it follows that
$\hat{\beta} \sim N\left(\beta,\ \dfrac{\sigma^2}{\sum x^2}\right)$; $\hat{\alpha} \sim N\left(\alpha,\ \sigma^2\dfrac{\sum X^2}{n\sum x^2}\right)$

The OLS estimates $\hat{\alpha}$ and $\hat{\beta}$ are obtained from a sample of observations on Y and X. Since sampling errors are inevitable in all estimates, it is necessary to apply tests of significance in order to measure the size of the error and determine the degree of confidence in the validity of these estimates. This can be done by using various tests. The most common ones are:

i) the standard error test;
ii) Student's t-test;
iii) the confidence interval.

All of these testing procedures reach the same conclusion. Let us now see these testing methods one by one.

i) Standard error test

This test helps us decide whether the estimates $\hat{\alpha}$ and $\hat{\beta}$ are significantly different from zero, i.e. whether the sample from which they have been estimated might have come from a population whose true parameters are zero ($\alpha = 0$ and/or $\beta = 0$).

Formally we test the null hypothesis $H_0: \beta_i = 0$ against the alternative hypothesis $H_1: \beta_i \ne 0$.

The standard error test may be outlined as follows.

First: compute the standard errors of the parameters:

$SE(\hat{\beta}) = \sqrt{\mathrm{var}(\hat{\beta})}$

$SE(\hat{\alpha}) = \sqrt{\mathrm{var}(\hat{\alpha})}$

Second: compare the standard errors with the numerical values of $\hat{\alpha}$ and $\hat{\beta}$.

Decision rule:

If $SE(\hat{\beta}_i) > \frac{1}{2}\hat{\beta}_i$, accept the null hypothesis and reject the alternative hypothesis. We conclude that $\hat{\beta}_i$ is statistically insignificant.

If $SE(\hat{\beta}_i) < \frac{1}{2}\hat{\beta}_i$, reject the null hypothesis and accept the alternative hypothesis. We conclude that $\hat{\beta}_i$ is statistically significant.

The acceptance or rejection of the null hypothesis has a definite economic meaning. Namely, the acceptance of the null hypothesis $\beta = 0$ (the slope parameter is zero) implies that the explanatory variable to which this estimate relates does not in fact influence the dependent variable Y and should not be included in the function, since the conducted test provided evidence that changes in X leave Y unaffected. In other words, acceptance of H0 implies that the relationship between Y and X is in fact $Y = \alpha + (0)X = \alpha$, i.e. there is no relationship between X and Y.

Numerical example: Suppose that from a sample of size n = 30 we estimate the following supply function:

$Q = 120 + 0.6p + e_i$
SE:   (1.7)  (0.025)

Test the significance of the slope parameter at the 5% level of significance using the standard error test.

$SE(\hat{\beta}) = 0.025$, $\hat{\beta} = 0.6$, $\frac{1}{2}\hat{\beta} = 0.3$

This implies that $SE(\hat{\beta}_i) < \frac{1}{2}\hat{\beta}_i$. The implication is that $\hat{\beta}$ is statistically significant at the 5% level of significance.
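The supply-function example above can be coded directly as the standard error decision rule:

```python
# The standard error test for the supply-function example:
# reject H0: beta = 0 when SE(beta_hat) < beta_hat / 2.
beta_hat = 0.6
se_beta = 0.025

significant = se_beta < beta_hat / 2   # 0.025 < 0.3, so beta_hat is significant
print("statistically significant" if significant else "statistically insignificant")
```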
Note: The standard error test is an approximate test (approximated from the z-test and t-test) and implies a two-tailed test conducted at the 5% level of significance.

ii) Student's t-test

Like the standard error test, this test is also important for testing the significance of the parameters. From your statistics course, any variable X can be transformed into t using the general formula:

$t = \dfrac{X - \mu}{s_X}$, with n−1 degrees of freedom,

where $\mu$ = value of the population mean, $s_X$ = sample estimate of the population standard deviation, $s_X = \sqrt{\dfrac{\sum(X - \bar{X})^2}{n-1}}$, and n = sample size.

We can derive the t-values of the OLS estimates:

$t_{\hat{\beta}} = \dfrac{\hat{\beta} - \beta}{SE(\hat{\beta})}$, $t_{\hat{\alpha}} = \dfrac{\hat{\alpha} - \alpha}{SE(\hat{\alpha})}$, with n−k degrees of freedom,

where SE = standard error and k = number of parameters in the model. Since we have two parameters in simple linear regression with intercept different from zero, our degrees of freedom are n−2. Like the standard error test, we formally test the hypothesis $H_0: \beta_i = 0$ against the alternative $H_1: \beta_i \ne 0$ for the slope parameter, and $H_0: \alpha = 0$ against the alternative $H_1: \alpha \ne 0$ for the intercept.

To undertake the above test we follow the following steps.

Step 1: Compute t*, which is called the computed value of t, by taking the value of $\beta$ in the null hypothesis. In our case $\beta = 0$, so t* becomes:

$t^* = \dfrac{\hat{\beta} - 0}{SE(\hat{\beta})} = \dfrac{\hat{\beta}}{SE(\hat{\beta})}$

Step 2: Choose a level of significance. The level of significance is the probability of making a 'wrong' decision, i.e. the probability of rejecting the hypothesis when it is actually true, or the probability of committing a type I error. It is customary in econometric research to choose the 5% or the 1% level of significance. This means that in making our decision we allow (tolerate) five times out of a hundred to be 'wrong', i.e. to reject the hypothesis when it is actually true.

Step 3: Check whether it is a one-tailed or a two-tailed test. If the inequality sign in the alternative hypothesis is $\ne$, then it implies a two-tailed test: divide the chosen level of significance by two and determine the critical region or critical value of t, called $t_c$.
But if the inequality sign is either > or <, then it indicates a one-tailed test and there is no need to divide the chosen level of significance by two to obtain the critical value of t from the t-table.

Example: If we have $H_0: \beta_i = 0$ against $H_1: \beta_i \ne 0$, then this is a two-tailed test. If the level of significance is 5%, divide it by two to obtain the critical value of t from the t-table.

Step 4: Obtain the critical value of t, called $t_c$, at $\frac{\alpha}{2}$ and n−2 degrees of freedom for a two-tailed test.

Step 5: Compare t* (the computed value of t) and $t_c$ (the critical value of t):

If $|t^*| > t_c$, reject H0 and accept H1. The conclusion is that $\hat{\beta}$ is statistically significant.

If $|t^*| < t_c$, accept H0 and reject H1. The conclusion is that $\hat{\beta}$ is statistically insignificant.

Numerical example: Suppose that from a sample of size n = 20 we estimate the following consumption function (with income as the explanatory variable):

$C = 100 + 0.70\,Y + e$
     (75.5)  (0.21)

The values in the brackets are standard errors. We want to test the null hypothesis $H_0: \beta_i = 0$ against the alternative $H_1: \beta_i \ne 0$ using the t-test at the 5% level of significance.

a. The t-value for the test statistic is:

$t^* = \dfrac{\hat{\beta} - 0}{SE(\hat{\beta})} = \dfrac{0.70}{0.21} = 3.3$

b. Since the alternative hypothesis H1 is stated with the inequality sign $\ne$, it is a two-tailed test; hence we divide $\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$ to obtain the critical value of t at $\frac{\alpha}{2} = 0.025$ and 18 degrees of freedom (df), i.e. n−2 = 20−2. From the t-table, $t_c$ at the 0.025 level of significance and 18 df is 2.10.

c. Since t* = 3.3 and $t_c$ = 2.10, t* > $t_c$. It implies that $\hat{\beta}$ is statistically significant.

iii) Confidence interval

Rejection of the null hypothesis doesn't mean that our estimates $\hat{\alpha}$ and $\hat{\beta}$ are the correct estimates of the true population parameters $\alpha$ and $\beta$. It simply means that our estimate comes from a sample drawn from a population whose parameter is different from zero.
In order to define how close the estimate is to the true parameter, we must construct a confidence interval for the true parameter; in other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain "degree of confidence". In this respect we say that, with a given probability, the population parameter will lie within the defined confidence interval (confidence limits).

We choose a probability in advance and refer to it as the confidence level (confidence coefficient). It is customary in econometrics to choose the 95% confidence level. This means that in repeated sampling the confidence limits, computed from the sample, would include the true population parameter in 95% of the cases. In the other 5% of the cases the population parameter will fall outside the confidence interval.

In a two-tailed test at the $\alpha$ level of significance, the probability of obtaining a t-value beyond either $-t_c$ or $t_c$ is $\frac{\alpha}{2}$ at n−2 degrees of freedom. The probability of obtaining a value of $t = \dfrac{\hat{\beta} - \beta}{SE(\hat{\beta})}$ lying between $-t_c$ and $t_c$ at n−2 degrees of freedom is $1 - \frac{\alpha}{2} - \frac{\alpha}{2}$, i.e. $1 - \alpha$. That is:

$\Pr(-t_c < t^* < t_c) = 1 - \alpha$ .........................................(2.57)

but $t^* = \dfrac{\hat{\beta} - \beta}{SE(\hat{\beta})}$ .........................................(2.58)

Substituting (2.58) into (2.57) we obtain the following expression:

$\Pr\left(-t_c < \dfrac{\hat{\beta} - \beta}{SE(\hat{\beta})} < t_c\right) = 1 - \alpha$ .........................................(2.59)

$\Pr\left(-SE(\hat{\beta})t_c < \hat{\beta} - \beta < SE(\hat{\beta})t_c\right) = 1 - \alpha$, by multiplying by $SE(\hat{\beta})$

$\Pr\left(-\hat{\beta} - SE(\hat{\beta})t_c < -\beta < -\hat{\beta} + SE(\hat{\beta})t_c\right) = 1 - \alpha$, by subtracting $\hat{\beta}$

$\Pr\left(\hat{\beta} + SE(\hat{\beta})t_c > \beta > \hat{\beta} - SE(\hat{\beta})t_c\right) = 1 - \alpha$, by multiplying by −1

$\Pr\left(\hat{\beta} - SE(\hat{\beta})t_c < \beta < \hat{\beta} + SE(\hat{\beta})t_c\right) = 1 - \alpha$, by interchanging

The limits within which the true $\beta$ lies at the $(1-\alpha)$ degree of confidence are:

$\left[\hat{\beta} - SE(\hat{\beta})t_c,\ \hat{\beta} + SE(\hat{\beta})t_c\right]$,

where $t_c$ is the critical value of t at the $\frac{\alpha}{2}$ level of significance and n−2 degrees of freedom.

The test procedure is outlined as follows:

$H_0: \beta = 0$
$H_1: \beta \ne 0$

Decision rule: If the hypothesized value of $\beta$ in the null hypothesis is within the confidence interval, accept H0 and reject H1. The implication is that $\hat{\beta}$ is statistically insignificant; while if the hypothesized va
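The consumption-function example and the interval $\hat{\beta} \pm t_c\,SE(\hat{\beta})$ derived above can be sketched together; $t_c = 2.101$ is the t-table value at $\frac{\alpha}{2} = 0.025$ and 18 degrees of freedom (scipy.stats.t.ppf(0.975, 18) would give the same figure):

```python
# Two-tailed t-test and 95% confidence interval for the
# consumption-function estimates (beta_hat = 0.70, SE = 0.21, n = 20).
beta_hat = 0.70
se_beta = 0.21
tc = 2.101                              # critical t at 0.025, df = 18

t_star = beta_hat / se_beta             # H0: beta = 0, so t* = beta_hat / SE
reject_H0 = abs(t_star) > tc            # two-tailed decision rule

lower = beta_hat - tc * se_beta         # 95% confidence limits, eq. (2.59)
upper = beta_hat + tc * se_beta
contains_zero = lower <= 0.0 <= upper   # is the H0 value inside the interval?

print(round(t_star, 2), reject_H0)
print(round(lower, 3), round(upper, 3), contains_zero)
```

Both routes agree: t* ≈ 3.33 exceeds the critical value, and zero lies outside the interval [0.259, 1.141], so H0 is rejected and $\hat{\beta}$ is statistically significant.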