Classical Linear Regression Model (CLRM): Overview

Summary

This document provides an overview of the Classical Linear Regression Model (CLRM), covering simple regression, interpretation of OLS estimates, and regression with dummy variables. It explains the theory and application of regression models, focusing on how to understand relationships between variables. The document also references relevant course materials and resources.

Full Transcript


Classical Linear Regression Model (CLRM): Overview. Koen Inghelbrecht (Ghent University), Research Methods in Finance. 1 / 73

Review: Data Handling. What are our research questions? Which kind of data do we need to answer our research questions? Where do we find the data? Which database do we need? Do we have to transform the raw data (create variables)? What do our variables look like (graphs, summary statistics, correlations)? What are our initial conclusions based on analyzing the variables (causality)? Next step: look for formal relationships between different variables using regression analysis. 2 / 73

Overview: Agenda Topic 2. What is a regression model? Simple regression: theory, interpretation of OLS estimates, nonlinear regression models. Classical linear regression model (CLRM): assumptions, properties of the OLS estimator, precision and standard errors. Multiple regression: interpretation of OLS estimates. Regression with dummy variables. 3 / 73

Course Material Topic 2. Required reading: Textbook Brooks (2019), Chapter 3: 3.1 What is a Regression Model?; 3.2 Regression versus Correlation; 3.3 Simple Regression; 3.4 Some Further Terminology; 3.5 The Assumptions Underlying the Model; 3.6 Properties of the OLS Estimator; 3.7 Precision and Standard Errors. Textbook Brooks (2019), Chapter 4: 4.1 Generalizing the Simple Model; 4.6 Qualitative Variables. Background reading: Textbook Koop (Analysis of Financial Data): Chapters 4, 5, 6, 7. Textbook Koop (Introduction to Econometrics): Chapters 3, 4, 5. 4 / 73

Wooclap: Q&A + Multiple Choice Questions. URL: app.wooclap.com/RMF2 5 / 73

Regression Model: What is a Regression Model?
We use a regression model to help understand the relationships between variables. Regression, in general: describing and evaluating the relationship between a given variable and one or more other variables. More specifically: an attempt to explain movements in a variable by reference to movements in one or more other variables. 6 / 73

Regression Model: What is a Regression Model? Denote the dependent variable by y and the independent variable(s) by x1, x2, ..., xk, where there are k independent variables. Simple regression is the situation where y depends on only one x variable, i.e. k = 1. Regression is different from correlation: if we say y and x are correlated, it means that we are treating y and x in a completely symmetrical way. In regression, we treat the dependent variable (y) and the independent variable(s) (the x's) very differently. The y variable is assumed to be random or "stochastic" in some way, i.e. to have a probability distribution. The x variables are, however, assumed to have fixed ("non-stochastic") values in repeated samples. 7 / 73

Simple Regression: Introduction. Used to help understand the relationship between two variables. Simple regression finds the best fitting line through the points in the XY-plot, i.e. the line that best captures the relationship between the two variables. [Figure: XY scatter plot with fitted line] Question: what do we mean by "best fitting" line? 8 / 73

Simple Regression: Theory. Finding a line of best fit: we can use the general equation for a straight line to get the line that best "fits" the data: y = α + βx, with α the intercept and β the slope of the line. However, this equation (y = α + βx) is completely deterministic (i.e. without any uncertainty). Is this realistic? No. So what we do is add a random disturbance term, u, into the equation: yt = α + βxt + ut, where t = 1, 2, 3, ...
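The stochastic model yt = α + βxt + ut can be made concrete with a tiny simulation. The course's PC exercises use R; the sketch below uses Python purely for illustration, with made-up parameter values (α = 2, β = 0.5):

```python
import random

random.seed(42)

alpha, beta = 2.0, 0.5   # hypothetical "true" parameters (unknown in practice)
T = 50                   # number of observations

# The x values are treated as fixed ("non-stochastic") in repeated samples
x = [float(t) for t in range(1, T + 1)]

# y is stochastic only through the random disturbance u
u = [random.gauss(0.0, 1.0) for _ in range(T)]
y = [alpha + beta * x[t] + u[t] for t in range(T)]

# Each observation deviates from the deterministic line alpha + beta * x
# by exactly its disturbance u[t]
```

Rerunning with a new seed gives a new sample of y values from the same fixed x's, which is the "repeated samples" idea used throughout the slides.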
9 / 73

Simple Regression: Theory. Why do we include a disturbance term? The disturbance term can capture a number of features: (1) We always leave out some determinants of yt (i.e. important variables which affect yt may be omitted). (2) Even if the straight-line relationship were true, we would never get all points on an XY-plot lying precisely on it, due to errors in the measurement of yt that cannot be modeled. (3) Random outside influences on yt which we cannot model. (4) The true relationship is probably more complicated; a straight line may just be an approximation. Due to 1, 2, 3 and 4 we add a disturbance term. 10 / 73

Simple Regression Model. yt = α + βxt + ut, with ut the disturbance term. What we know: yt and xt. What we do not know: α, β or ut. Regression analysis uses data (yt and xt) to make a guess or estimate of what α and β are. Notation: α̂ and β̂ are the estimates of α and β. 11 / 73

Distinction Between Disturbance Terms and Residuals. True regression line (population regression function): yt = α + βxt + ut, so ut = yt − α − βxt is the disturbance term. Estimated regression line (sample regression function): yt = α̂ + β̂xt + ût, so ût = yt − α̂ − β̂xt is the residual term. Question: why is the estimated β̂ different from the true β? (Think of a population of 1000 firms versus a sample of, say, 70 firms.) 12 / 73

The Population versus the Sample. Population = the total collection of all objects or people to be studied. Example: when predicting the outcome of an election, the population of interest is the entire electorate. Sample = a selection of just some items from the population. Random sample = a sample in which each individual item in the population is equally likely to be drawn. Population regression function (PRF) = a description of the model that is thought to be generating the actual data and the true relationship between the variables (i.e. the true values of α and β). Sample regression function (SRF) = the model used to infer likely values of the PRF. 13 / 73

How do we choose α̂ and β̂? Consider the following reasoning: (1) We want to find a best fitting line through the XY-plot. (2) With more than two points it is not possible to find a line that fits perfectly through all points. (3) Hence, find the line which makes the residuals as small as possible. (4) What do we mean by "as small as possible"? The line that minimizes the sum of squared residuals. (5) Hence, we obtain the "ordinary least squares" or OLS estimator. [Figure: fitted line with an observation yt, its fitted value ŷt and the residual ût] 14 / 73

Derivation of OLS Estimator. (1) We have data on t = 1, ..., T time periods, which we call yt and xt. (2) Any line we fit (choice of α̂ and β̂) will yield residuals ût. (3) Residual sum of squares: RSS = Σ_{t=1}^{T} ût². (4) The OLS estimator chooses α̂ and β̂ to minimize the RSS. Solution: β̂ = Σ_{t=1}^{T} (yt − ȳ)(xt − x̄) / Σ_{t=1}^{T} (xt − x̄)², and α̂ = ȳ − β̂x̄. 15 / 73

Derivation of OLS Estimator. 16 / 73

Jargon of Regression. yt = dependent variable; xt = explanatory (or independent) variable; α and β are coefficients; α̂ and β̂ are OLS estimates of the coefficients. "Run a regression of y on x." Estimator or estimate? Estimators = the formulae used to calculate the coefficients. Estimates = the actual numerical values obtained for the coefficients via those formulae. 17 / 73

Questions?
URL: app.wooclap.com/RMF2 18 / 73

Interpretation of OLS Estimates. Interpretation of α̂: in yt = α̂ + β̂xt + ût, α̂ is the estimated value of yt if xt = 0. This is often not of interest. Example: xi = lot size, yi = house price; α̂ = estimated value of a house with lot size 0. 19 / 73

Accuracy of Intercept Estimate. Care needs to be exercised when considering the intercept estimate, particularly if there are no or few observations close to the y-axis. [Figures: scatter plots (wage versus profession, house price versus lot size) with few observations near the y-axis] 20 / 73

Interpretation of OLS Estimates. yt = α̂ + β̂xt + ût. Interpretation of β̂: (1) β̂ is the estimate of the marginal effect of xt on yt. (2) Using the regression model: dy/dx = β̂. (3) A measure of how much yt tends to change when you change xt. (4) "If xt changes by 1 unit then yt tends to change by β̂ units", where "units" refers to what the variables are measured in (e.g. $, £, %, hectares, meters, etc.). Example: β̂ = 0.000842, with yi = executive compensation (millions of $) as dependent variable and xi = profits (millions of $) as explanatory variable (using data on N = 70 companies). Important: interpretation? 21 / 73

Simple Regression: An Example. Suppose that we have the following data on the excess returns on a fund manager's portfolio ("fund XXX") together with the excess returns on a market index:

Year t   Excess return fund XXX (= rXXX,t − rf,t)   Excess return market index (= rm,t − rf,t)
  1                    17.8                                       13.7
  2                    39.0                                       23.2
  3                    12.8                                        6.9
  4                    24.2                                       16.8
  5                    17.2                                       12.3

We have some intuition that the beta (in the CAPM framework) on this fund is positive, and we therefore want to find whether there appears to be a relationship between x and y given the data that we have. The first stage would be to form a scatter plot of the two variables.
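Plugging the five observations into the OLS formulae by hand is easy to mirror in code. A minimal Python sketch (the course itself uses R) that computes the estimates from the data above:

```python
# Five observations from the table: excess returns on fund XXX (y)
# and on the market index (x)
y = [17.8, 39.0, 12.8, 24.2, 17.2]
x = [13.7, 23.2, 6.9, 16.8, 12.3]

T = len(y)
x_bar = sum(x) / T
y_bar = sum(y) / T

# OLS formulae: beta_hat = sum (y_t - y_bar)(x_t - x_bar) / sum (x_t - x_bar)^2
beta_hat = (sum((y[t] - y_bar) * (x[t] - x_bar) for t in range(T))
            / sum((x[t] - x_bar) ** 2 for t in range(T)))
alpha_hat = y_bar - beta_hat * x_bar

print(round(alpha_hat, 2), round(beta_hat, 2))  # -1.74 1.64
```

These are the same values the slides report for the fitted line of this CAPM example.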
22 / 73

Simple Regression: An Example. [Figure: scatter diagram of the excess return on fund XXX against the excess return on the market portfolio] 23 / 73

In the CAPM example used above, plugging the 5 observations into the OLS formulae given above would lead to the estimates: α̂ = −1.74 and β̂ = 1.64. We would write the fitted line as: ŷt = −1.74 + 1.64xt. Question: if an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be? Solution: the expected value of y is −1.74 + 1.64 × (value of x), so plug x = 20 into the equation to get the expected value for y: ŷt = −1.74 + 1.64 × 20 = 31.06. 24 / 73

Time for a break! (10 min) 31 / 73

Example To Do. Ufora: Content/Topic 2/Datasets/capm2.gdt. Frequency: monthly data (end-of-the-month). Period: January 2002 to February 2018. Source: Refinitiv Datastream. Variables: Stock Price Index S&P500 (SANDP); Stock Price Ford (FORD); Stock Price General Electric (GE); Stock Price Microsoft (MICROSOFT); Stock Price Oracle (ORACLE); US Risk-free Rate (3-Month Treasury Bill, Yearly Basis, in %) (USTB3M). Variables (Topic 1): RF, RSANDP, ERSANDP, RFORD, ERFORD. 25 / 73

Simple Regression: Model. Goal: explain movements in the excess return of Ford by reference to movements in the excess return of the S&P500. The capital asset pricing model (CAPM) can be written as: E(Ri) = Rf + βi [E(Rm) − Rf]. The regression equation takes the form: (RFord − Rf)t = α + β(RS&P500 − Rf)t + ut. 26 / 73

Simple Regression: Output Ford. 27 / 73

Simple Regression: Output Microsoft. 28 / 73

Multiple Choice Question (Wooclap). Question: Which of the following statements is correct? (1) Microsoft has a lower exposure with respect to the market than Ford. (2) Microsoft has an equal exposure with respect to the market as Ford. (3) Microsoft has a higher exposure with respect to the market than Ford. (4) I don't know. URL: app.wooclap.com/RMF1 29 / 73

Questions? URL: app.wooclap.com/RMF2 30 / 73

Nonlinear Regression Models. So far we have run a regression of yt on xt: yt = α + βxt + ut. In order to use OLS, we need a model which is linear in the parameters (α and β). Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed, etc. The model does not necessarily have to be linear in the variables (y and x). 32 / 73

Nonlinear Regression Models. Regression equation expressed in 'double logarithmic form': ln Yt = α + β ln Xt + ut. Then, let yt = ln Yt and xt = ln Xt: yt = α + βxt + ut. Here, the coefficient β can be interpreted as an elasticity: β = dyt/dxt = d ln Yt / d ln Xt = (dYt/Yt) / (dXt/Xt). Elasticities are useful as they are unit-free. β can be interpreted as 'a rise in X of 1% will lead on average, everything else being equal, to a rise in Y of β%'. 33 / 73

Nonlinear Regression Models. Nonlinear regression: regression of yt (or ln(yt) or yt²) on xt² (or 1/xt or ln(xt) or xt³, etc.), e.g. yt = α + βxt² + ut. Question: how might you know if the relationship is nonlinear? Answer: by consulting financial theory or using theoretical insights, or through careful examination of XY-plots or residual plots, or via hypothesis testing procedures. 34 / 73

Nonlinear Regression Models. yt = α + βxt² + ut. [Figure 4.2: A Quadratic Relationship Between X and Y] 35 / 73

Questions?
URL: app.wooclap.com/RMF2 36 / 73

Classical Linear Regression Model (CLRM): Assumptions. The model used so far is known as the classical (normal) linear regression model (CLRM). We observe data for xt, but since yt also depends on ut, we must be specific about how the ut are generated. We usually make the following set of assumptions about the ut's (the unobservable error terms):

Technical notation → Interpretation
1. E(ut) = 0 → Errors have zero mean
2. Var(ut) = σ² → Variance of the errors is constant
3. Cov(ui, uj) = 0 → Errors are statistically independent
4. Cov(ut, xt) = 0 → No relationship between error and x
5. ut is normally distributed → To make inferences about parameters

37 / 73

Classical Linear Regression Model (CLRM): Properties. If the assumptions of the CLRM hold, then: (1) The OLS estimator is unbiased: E(β̂) = β. The expected value of the OLS estimator is equal to the thing being estimated. On average (in repeated sampling) the OLS estimate will be precisely equal to the value we want to estimate: no systematic over- or underestimation of the true coefficients. Note: we cannot expect that β̂ will be exactly equal to β. (2) The OLS estimator is efficient (relative to other estimators). An estimator is said to be efficient relative to other (unbiased) estimators if it has the smallest variance. This implies that the OLS estimator estimates β most accurately. 38 / 73

Efficient Estimator. If the estimator is efficient, we are minimizing the probability that it is a long way off from the true value of β.
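Unbiasedness is a statement about repeated sampling: across many samples (new disturbances, same fixed x values), the average OLS slope estimate equals the true β. A small Monte Carlo sketch in Python (illustrative only; the values α = 1, β = 2 and the normal errors are assumptions of this simulation, not from the course data):

```python
import random

random.seed(0)

alpha, beta = 1.0, 2.0                 # "true" parameters chosen for the simulation
x = [float(t) for t in range(1, 21)]   # fixed x values, as the CLRM assumes
x_bar = sum(x) / len(x)

def ols_slope():
    """Draw one sample (new disturbances, same x's) and return beta_hat."""
    y = [alpha + beta * xi + random.gauss(0.0, 1.0) for xi in x]
    y_bar = sum(y) / len(y)
    num = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

# In repeated sampling the average estimate is very close to beta = 2:
# no systematic over- or underestimation
estimates = [ols_slope() for _ in range(5000)]
mean_estimate = sum(estimates) / len(estimates)
```

Any single estimate differs from 2 because of the disturbances, but the mean across the 5000 replications sits essentially on top of the true value, which is exactly what E(β̂) = β says.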
[Figure: sampling distributions of an efficient versus a less efficient estimator of β] 39 / 73

Classical Linear Regression Model (CLRM): Properties. If the assumptions of the CLRM hold, then: (1) The OLS estimator is unbiased. (2) The OLS estimator is efficient. (3) The OLS estimator has the following normal distribution: β̂ ∼ N(β, σ² / Σ(xt − x̄)²), or β̂ ∼ N(β, σ²_β̂). This is used for hypothesis testing (i.e. the t-test). (4) The OLS estimators are the best linear unbiased estimators (BLUE): "best" means 'has the smallest variance' (i.e. efficient); "linear" means 'the OLS estimator is a linear function of the random variable y'. This result is referred to as the 'Gauss-Markov Theorem'. Why are these 4 properties important? Which one is most important? Relaxing one of the assumptions will have an impact on these 'desirable' properties (= pitfalls in econometrics). 40 / 73

Consistency versus Unbiasedness. Consistent: the OLS estimators α̂ and β̂ are consistent. That is, the estimates will converge to their true values as the sample size increases to infinity. We need the assumptions E(xt ut) = 0 (no omitted variable bias) and Var(ut) = σ² < ∞ to prove this. Consistency implies that lim (T→∞) Pr[|β̂ − β| > ϱ] = 0 for all ϱ > 0. Unbiased: the OLS estimators α̂ and β̂ are unbiased. That is, E(α̂) = α and E(β̂) = β. Thus on average the estimated values will be equal to the true values. To prove this also requires the assumption that E(ut) = 0. Unbiasedness is a stronger condition than consistency (it holds for both small and large samples). 41 / 73

Unbiasedness versus Efficiency. The OLS estimator has the smallest variance (i.e. is efficient) among the class of linear unbiased estimators. It is possible to find another estimator with a lower variance than the OLS estimator, but it would not be both linear and unbiased. Hence, there is a trade-off between bias and variance.
[Figure: distributions of an unbiased versus a biased estimator of β] Bias is considered a more serious problem than variance; that is why the OLS estimator is at the core of econometric model-building. 42 / 73

Questions? URL: app.wooclap.com/RMF2 43 / 73

Precision and Standard Errors. α̂ and β̂ are only estimates of α and β, and are specific to the sample used in their estimation. Key question: how accurate/precise are these estimates? Statistical procedures allow us to formally address this question. 44 / 73

Precision of OLS Estimates. What we need is some measure of the accuracy, reliability or precision of the estimators (α̂ and β̂). The precision of an estimate is given by its standard error:

SE(α̂) = s √( Σxt² / (T Σ(xt − x̄)²) )
SE(β̂) = s √( 1 / Σ(xt − x̄)² )

with s the estimated standard deviation of the residuals, i.e. s = √( Σût² / (T − 2) ), where Σût² is the residual sum of squares (RSS) and T is the sample size. 45 / 73

What Factors Affect Precision of OLS Estimates? More precise estimates are obtained if: (1) there is a larger number of data points; (2) there is less scattering (i.e. less variability in the residuals; s is lower); (3) there is more variability in x. Example: the next 4 figures all contain artificially generated data with α = 0, β = 1. 46 / 73

Data Set     β̂      SE(β̂)   90% Conf. Interval   95% Conf. Interval   99% Conf. Interval
Figure 5.1   0.91    0.89    [-0.92, 2.75]        [-1.57, 3.39]        [-3.64, 5.47]
Figure 5.2   1.04    0.17    [0.75, 1.32]         [0.70, 1.38]         [0.59, 1.49]
Figure 5.3   1.00    0.01    [0.99, 1.01]         [0.99, 1.02]         [0.98, 1.03]
Figure 5.4   1.52    1.73    [-1.33, 4.36]        [-1.88, 4.91]        [-2.98, 6.02]

[Figure 5.1: Very Small Sample Size]
[Figure 5.2: Large Sample Size, Large Error Variance] 47 / 73

[Figure 5.3: Large Sample Size, Small Error Variance] [Figure 5.4: Limited Range of X Values] 48 / 73

What Factors Affect Precision of OLS Estimates? SE(α̂) = s √( Σxt² / (T Σ(xt − x̄)²) ). The term Σxt² appears in SE(α̂). The reason is that Σxt² measures how far the points are away from the y-axis. How should we interpret this term? 49 / 73

Questions? URL: app.wooclap.com/RMF2 50 / 73

Simple Regression Model (overview). yt = α + βxt + ut, with ut the error term. What we know: xt and yt. What we do not know: α, β or ut. Regression analysis uses data (xt and yt) to make a guess or estimate of what α and β are. Notation: α̂ and β̂ are the estimates of α and β. 51 / 73

Multiple Regression. But what if our dependent (y) variable depends on more than one independent variable? Example: data on N = 546 houses sold in Windsor, Canada. Dependent variable: yi = sales price of house. Explanatory (independent) variables: x1i = lot size of property (in square feet); x2i = number of bedrooms; x3i = number of bathrooms; x4i = number of storeys (excluding basement). Similarly, stock returns might depend on several factors. 52 / 73

OLS Estimation. Multiple regression model: yt = β1 + β2 x2t + β3 x3t + ... + βk xkt + ut, t = 1, 2, ..., T. OLS estimates: β̂1, β̂2, β̂3, ..., β̂k. Minimize the residual sum of squares: min RSS = Σ ût² = Σ (yt − β̂1 − β̂2 x2t − β̂3 x3t − ... − β̂k xkt)². The solution to the minimization problem is a mess; Excel/R will calculate the OLS estimates. 53 / 73

Interpretation of OLS Estimates. Mathematical intuition: total versus partial derivative. Simple regression: dY/dX = β. Multiple regression: ∂Y/∂Xj = βj. Verbal intuition: βj is the marginal effect of Xj on Y, ceteris paribus; βj is the effect of a small change in the jth explanatory variable on the dependent variable, holding all the other explanatory variables constant (or eliminating the effect of all other explanatory variables). Why is the ceteris paribus condition important? 54 / 73

Example: Explaining House Prices.

            Coeff.   St. Err.   t-Stat    P-val.    Lower 95%   Upper 95%
Intercept   -4010    3603       -1.113    0.266     -11087      3068
Size        5.429    0.369      14.703    2.E-41    4.704       6.155
Bedrooms    2825     1215       2.325     0.020     438.3       5211
Bathrooms   17105    1734       9.862     3.E-21    13698       20512
Storeys     7635     1008       7.574     1.E-13    5655        9615

Fitted regression line: ŷ = −4010 + 5.429x1 + 2825x2 + 17105x3 + 7635x4. 55 / 73

Interpretation of OLS Estimates. Since β̂1 = 5.43: an extra square foot of lot size will tend to add $5.43 onto the price of a house, ceteris paribus. For houses with the same number of bedrooms, bathrooms and storeys, an extra square foot of lot size will tend to add $5.43 onto the price of a house. Since β̂2 = 2,824.61: adding one bedroom to your house will tend to increase its value by $2,824.61, ceteris paribus. If we consider houses with comparable lot sizes and numbers of bathrooms and storeys, then those with an extra bedroom tend to be worth $2,824.61 more.
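The ceteris paribus reading of these coefficients can be checked mechanically: changing one regressor in the fitted line while holding the others fixed moves the prediction by exactly that coefficient. A short Python sketch using the fitted house-price line from the slide (the example house itself is made up):

```python
# Coefficients of the fitted regression line from the slide:
# y_hat = -4010 + 5.429*x1 + 2825*x2 + 17105*x3 + 7635*x4
intercept = -4010.0
b_size, b_bed, b_bath, b_storeys = 5.429, 2825.0, 17105.0, 7635.0

def fitted_price(size, bedrooms, bathrooms, storeys):
    """Fitted house price given lot size (sq ft) and room counts."""
    return (intercept + b_size * size + b_bed * bedrooms
            + b_bath * bathrooms + b_storeys * storeys)

# Hypothetical house: 5000 sq ft lot, 3 bedrooms, 2 bathrooms, 2 storeys
base = fitted_price(5000, 3, 2, 2)

# Holding everything else constant, one extra bedroom changes the fitted
# price by the bedroom coefficient ($2,825): the partial effect
extra_bedroom = fitted_price(5000, 4, 2, 2) - base
```

The difference `extra_bedroom` equals the bedroom coefficient, which is precisely the "houses with comparable lot sizes, bathrooms and storeys" comparison in the text.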
56 / 73

Multiple Regression: Model. Goal: explain movements in the excess return of Ford by reference to movements in the excess return of the S&P500 and in the risk-free rate. The regression equation takes the form: (RFord − Rf)t = β1 + β2 (RS&P500 − Rf)t + β3 Rf,t + ut. 57 / 73

Multiple Regression: Output Ford. 58 / 73

Multiple Choice Question (Wooclap). Question: How would you interpret the estimate of the coefficient with respect to the risk-free rate (based on insights from present value models)? (1) A higher risk-free rate implies the discount rate decreases, which has a negative effect on the stock price and hence the stock return of Ford. (2) A higher risk-free rate implies the discount rate decreases, which has a positive effect on the stock price and hence the stock return of Ford. (3) A higher risk-free rate implies the discount rate increases, which has a negative effect on the stock price and hence the stock return of Ford. (4) A higher risk-free rate implies the discount rate increases, which has a positive effect on the stock price and hence the stock return of Ford. URL: app.wooclap.com/RMF2 59 / 73

Regression with Qualitative/Dummy Variables. Definition: a dummy variable is either 0 or 1. It is used to turn qualitative (yes/no) data into 1/0. Important concepts: dummy variable trap; intercept dummies; slope dummy variables. [Handwritten example: Salary_i = β1 + β2 AGE_i + β3 SEX_i + β4 (AGE_i × SEX_i) + u_i, combining an intercept dummy (SEX_i) with a slope dummy (AGE_i × SEX_i)] 60 / 73

Questions? URL: www.wooclap.com/RMF2 61 / 73

Self-Study. Three types of self-study: Multiple Choice Questions (Wooclap); Self-study Questions (Textbook); PC Exercises with R(Studio). Solutions will be made available.
62 / 73

Multiple Choice Question (Wooclap). Question: Which of the following statements is correct concerning the conditions required for OLS to be a usable estimation technique? (1) The model must be linear in the parameters. (2) The model must be linear in the variables. (3) The model must be linear in the variables and the parameters. (4) The model must be linear in the residuals. (5) None of the answers is correct. URL: app.wooclap.com/RMF2 63 / 73

Multiple Choice Question (Wooclap). Question: Which of these is NOT a reason for adding a disturbance term to a regression model yt = α + βxt + ut? (1) Some determinants of the dependent variable may be omitted from the model. (2) Some determinants of the dependent variable may be unobservable. (3) Some determinants of the independent variable may be omitted from the model. (4) There may be errors in the way that the dependent variable is measured which cannot be modelled. (5) None of the answers is correct. URL: app.wooclap.com/RMF2 64 / 73

Multiple Choice Question (Wooclap). Question: What is the most appropriate interpretation of the assumption concerning the regression disturbance terms? (1) The errors are nonlinearly independent of one another. (2) The errors are linearly dependent on one another. (3) The covariance of the errors is constant and finite over all its values. (4) The errors are linearly independent of one another. (5) None of the answers is correct. URL: app.wooclap.com/RMF2 65 / 73

Multiple Choice Question (Wooclap). Question: Which one of the following is NOT an assumption of the classical linear regression model? (1) The explanatory variables are uncorrelated with the error terms. (2) The disturbance terms have zero mean. (3) The dependent variable is not correlated with the disturbance terms. (4) The disturbance terms are independent of one another. (5) None of the answers is correct.
URL: app.wooclap.com/RMF2 66 / 73

Multiple Choice Question (Wooclap). Question: Consider a bivariate regression model with coefficient standard errors calculated using the usual formulae. Which of the following statements is/are correct regarding the standard error estimator for the slope coefficient? (i) It varies positively with the square root of the residual variance (s). (ii) It varies positively with the spread of x about its mean value. (iii) It varies positively with the spread of x about zero. (iv) It varies positively with the sample size T. (1) (i) only. (2) (i) and (iv) only. (3) (i), (ii) and (iv) only. (4) (i), (ii), (iii) and (iv). (5) None of the answers is correct. URL: app.wooclap.com/RMF2 67 / 73

Self-study Questions (Textbook). 1. (a) Why does OLS estimation involve taking vertical deviations of the points to the line rather than horizontal distances? (b) Why are the vertical distances squared before being added together? (c) Why are the squares of the vertical distances taken rather than the absolute values? 2. Explain, with the use of equations, the difference between the sample regression function and the population regression function. 3. What is an estimator? Is the OLS estimator superior to all other estimators? Why or why not? 4. What five assumptions are usually made about the unobservable error terms in the classical linear regression model (CLRM)? Briefly explain the meaning of each. Why are these assumptions made? 5. Which of the following models can be estimated (following a suitable rearrangement if necessary) using ordinary least squares (OLS), where X, y, Z are variables and α, β, γ are parameters to be estimated? (Hint: the models need to be linear in the parameters.) [Models (3.39)-(3.43) not reproduced in this transcript.] 6. The capital asset pricing model (CAPM) can be written as ... 68 / 73

PC Exercise: Data. Ufora: Content/Topic 2/Datasets/equity.xls. Cross-sectional data on N = 309 firms who sold new shares in the year 1996 in the US. Variables (in millions of US dollars, except for SEO): Value = total (market) value of all shares (Y); Debt = long-term debt held by the firm; Sales = total sales of the firm; Income = net income of the firm; Assets = book value of the assets of the firm; SEO = dummy variable (SEO = 1 if SEO and 0 if IPO). 69 / 73

PC Exercise: Questions. 1. Run simple regressions of the variable Value on, respectively, Debt, Sales, Income and Assets. Interpret the OLS estimates for the constant and slope coefficient. 2. Express the variables in 1000s of US dollars (instead of in millions of US dollars) and re-estimate the simple regressions. Does the transformation of the variables have an effect on your coefficient estimates? 3. Run a multiple regression of the variable Value on Debt, Sales, Income and Assets. Interpret the OLS estimates for the constant and slope coefficients. Is the interpretation different from that for the simple regressions? Explain the "partial effect" and how it is related to the "ceteris paribus" condition. 4. Extend the multiple regression by including the dummy variable SEO.
Interpret the estimated coefficient for the dummy variable. Are there reasons to suspect that IPOs are undervalued? 70 / 73

Key Concepts. Regression model; disturbance term; population; sample; linear model; nonlinear model; simple regression; multiple regression; OLS estimator; OLS estimate; constant; slope; elasticities; RSS; marginal effect; ceteris paribus; assumptions CLRM; consistency; efficiency; unbiasedness; precision; standard errors; dummy variables; dummy variable trap; intercept dummy; slope dummy variable. 71 / 73

Summary. Simple regression quantifies the effect of an explanatory variable, x, on a dependent variable, y. The relationship between y and x is assumed to take the form y = α + βx, where α is the intercept and β the slope of a straight line. This is called the regression line. The regression line is the best fitting line through an XY graph. No line will ever fit perfectly through all the points in an XY graph. The distance between each point and the line is called a residual. The ordinary least squares (OLS) estimator is the one which minimizes the sum of squared residuals. OLS provides estimates of α and β which are labelled α̂ and β̂. 72 / 73

Summary. Regression coefficients should be interpreted as marginal effects (i.e. as measures of the effect on y of a small change in x). The precision of OLS estimates depends on the number of data points, the variability of the explanatory variable and the variability of the errors. Regression lines do not have to be linear. To carry out nonlinear regression, merely replace y and/or x in the regression model by a suitable nonlinear transformation (e.g. ln(y) or x²). The multiple regression model is very similar to the simple regression model; the chapter emphasized only the differences between the two. The interpretation of regression coefficients is subject to ceteris paribus conditions. For instance, βj measures the marginal effect of xj on y, holding the other explanatory variables constant.
Dummy variables can take on a value of either 0 or 1. They are often used with qualitative data.
73 / 73
Classical Linear Regression Model (CLRM): Hypothesis Testing Koen Inghelbrecht (Ghent University) Research Methods in Finance
1 / 81
Review
Classical Linear Regression Model (CLRM): Overview
What is a regression model?
Simple versus Multiple Regression
Interpretation of OLS Estimates
Classical Linear Regression Model (CLRM): Assumptions + Properties
Precision and standard errors
Next step: Use standard errors for hypothesis testing
2 / 81
Overview
Agenda Topic 3
Statistical Inference
Hypothesis Testing
Probability Distribution of OLS Estimators
Test of Significance, Confidence Interval, t-ratio
Goodness of Fit Statistics: R² and adjusted R²
3 / 81
Course Material
Course Material
Required reading:
Textbook Brooks (2019): Chapter 3
3.8 An Introduction to Statistical Inference
3.9 A Special Type of Hypothesis Test
3.10 An Example of a Simple t-test of a Theory
3.11 Can UK Unit Trust Managers Beat the Market?
3.13 The Exact Significance Level
Textbook Brooks (2019): Chapter 4
4.7 Goodness of Fit Statistics
4.8 Hedonic Pricing Models
Background reading:
Textbook Koop (Analysis of Financial Data): Chapters 4, 5, 6, 7
Textbook Koop (Introduction to Econometrics): Chapters 3, 4, 5
4 / 81
Wooclap
Wooclap: Q&A + Multiple Choice Questions URL: app.wooclap.com/RMF3
5 / 81
Statistical Inference Introduction
An Introduction to Statistical Inference
We want to make inferences about the likely population values from the regression parameters
Example: Suppose we have the following regression results (standard errors in parentheses):
ŷt = 20.3 + 0.5091xt
       (14.38) (0.2561)
β̂ = 0.5091 is a single (point) estimate of the unknown population parameter, β. How "reliable" is this estimate?
The reliability of the point estimate is measured by the coefficient's standard error
6 / 81
Statistical Inference Hypothesis Testing
Hypothesis Testing: Some Concepts
We can use the information in the sample to make inferences about the population
We will always have two hypotheses that go together, the null hypothesis (denoted H0) and the alternative hypothesis (denoted H1).
Null hypothesis = statement or statistical hypothesis actually being tested
Alternative hypothesis = remaining outcomes of interest
For example, suppose given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation:
H0: β = 0.5
H1: β ≠ 0.5
This would be known as a two-sided test
7 / 81
Statistical Inference Hypothesis Testing
Hypothesis Testing: Some Concepts
Sometimes we may have some prior information that, for example, we would expect β > 0.5 rather than β < 0.5. In this case, we would do a one-sided test:
H0: β = 0.5 and H1: β > 0.5
or
H0: β = 0.5 and H1: β < 0.5
There are two ways to conduct a hypothesis test:
1 via the test of significance approach
2 via the confidence interval approach
We need a statistical decision rule for formal testing of such hypotheses
8 / 81
Statistical Inference Probability Distribution
Probability Distribution of OLS Estimators
We assume that ut ∼ N(0, σ²)
Since the least squares estimators are linear combinations of the random variables, i.e. β̂ = ∑ wt yt, and a weighted sum of normal random variables is also normally distributed:
α̂ ∼ N(α, Var(α̂))
β̂ ∼ N(β, Var(β̂))
What if the errors are not normally distributed? Will the parameter estimates still be normally distributed?
Yes, if the other assumptions of the CLRM hold, and the sample size is sufficiently large
9 / 81
Statistical Inference Probability Distribution
Probability Distribution of OLS Estimators
Standard normal variates can be constructed from α̂ and β̂:
(α̂ − α) / √var(α̂) ∼ N(0, 1)
test statistic = (β̂ − β) / √var(β̂) ∼ N(0, 1)
But var(α̂) and var(β̂) are unknown, so
(α̂ − α) / SE(α̂) ∼ t(T−2) with SE(α̂) = s √( ∑xt² / (T ∑(xt − x̄)²) )
(β̂ − β) / SE(β̂) ∼ t(T−2) with SE(β̂) = s √( 1 / ∑(xt − x̄)² )
The standard error of the coefficient is a measure of how confident one is in the estimate of the coefficient
10 / 81
Statistical Inference t-distribution
A Note on the t and the Normal Distribution
You should all be familiar with the normal distribution and its characteristic "bell" shape. We can scale a normal variate to have zero mean and unit variance by subtracting its mean and dividing by its standard deviation.
There is, however, a specific relationship between the t- and the standard normal distribution. Both are symmetrical and centered on zero. The t-distribution has another parameter, its degrees of freedom. We will always know this (for the time being, it is the number of observations minus 2).
11 / 81
Statistical Inference t-distribution
What Does the t-Distribution Look Like?
12 / 81
Statistical Inference t-distribution
Comparing the t and the Normal Distribution
In the limit, a t-distribution with an infinite number of degrees of freedom is a standard normal, i.e. t(∞) = N(0, 1)
Percentiles from distributions = Critical values (see Statistical tables):
Significance level      N(0, 1)   t(40)   t(4)
50%                     0         0       0
5%                      1.64      1.68    2.13
2.5% (5% in total)      1.96      2.02    2.78
0.5%                    2.57      2.70    4.60
The reason for using the t-distribution rather than the standard normal is that we had to estimate σ², the variance of the disturbances.
Note: Critical values for the t-distribution are larger in absolute value as there is increased uncertainty due to the fact that the error variance must be estimated.
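The table above can be checked numerically. The course's PC exercises use R, but the following standard-library Python sketch does the same job: the Student-t quantile is obtained by integrating the t density with Simpson's rule and inverting the CDF by bisection (the helper functions `t_pdf`, `t_cdf` and `t_ppf` are ours, not from any library; a statistics package such as `scipy.stats.t.ppf` would normally be used instead).

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    # Student-t density with df degrees of freedom
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, n=2000):
    # P(T <= x) via symmetry and composite Simpson's rule on [0, |x|]; n must be even
    if x < 0:
        return 1.0 - t_cdf(-x, df, n)
    if x == 0:
        return 0.5
    h = x / n
    s = t_pdf(0.0, df) + t_pdf(x, df)
    s += 4.0 * sum(t_pdf((2 * k - 1) * h, df) for k in range(1, n // 2 + 1))
    s += 2.0 * sum(t_pdf(2 * k * h, df) for k in range(1, n // 2))
    return 0.5 + s * h / 3.0

def t_ppf(p, df):
    # upper-tail quantile (p > 0.5) by bisection on the CDF
    lo, hi = 0.0, 50.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 2.5%-per-tail (5% in total) critical values, matching the slide's table
print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96
print(round(t_ppf(0.975, 40), 2))             # 2.02
print(round(t_ppf(0.975, 4), 2))              # 2.78
```

Note how the t critical value shrinks towards the normal one as the degrees of freedom grow, exactly as the slide's table shows.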
13 / 81
Statistical Inference t-distribution
Critical Values of Student's t-distribution
[Table of critical values, tabulated separately for 1-sided and 2-sided tests]
14 / 81
Statistical Inference Questions
Questions? (5 min) URL: app.wooclap.com/RMF3
15 / 81
Statistical Inference Test of Significance
Test of Significance Approach
Assume the regression equation is given by yt = α + βxt + ut for t = 1, 2, ..., T, and that we test H0: β = β∗ against H1: β ≠ β∗.
Steps involved in doing a test of significance:
1 Estimate α̂, β̂ and SE(α̂), SE(β̂) in the usual way
2 Calculate the test statistic: test statistic = (β̂ − β∗) / SE(β̂), where β∗ is the value of β under the null hypothesis.
3 We need some tabulated distribution with which to compare the estimated test statistics. Test statistics derived in this way can be shown to follow a t-distribution with T−2 degrees of freedom.
4 We need to choose a "significance level", often denoted α (also called the size of the test). It is conventional to use a significance level of 5% (but 10% and 1% are also commonly used).
16 / 81
Statistical Inference Test of Significance
Test of Significance Approach
5 Given a significance level, we can determine a rejection region and non-rejection region.
[Figure: for a 2-sided test of H0: β = β∗ against H1: β ≠ β∗ at the 5% level, the non-rejection region for (β̂ − β∗)/SE(β̂) lies between −1.96 and 1.96; reject if the test statistic exceeds 1.96 in absolute value. A lower SE yields a higher test statistic.]
17 / 81
Statistical Inference Test of Significance
Test of Significance Approach
[Figure: rejection region for a 1-sided test (upper tail), H0: β = β∗ vs. H1: β > β∗; reject if the test statistic exceeds 1.645.]
18 / 81
Statistical Inference Test of Significance
Test of Significance Approach
[Figure: rejection region for a 1-sided test (lower tail).]
19 / 81
Statistical Inference Test of Significance
Test of Significance Approach
(Steps 1–5 as before.)
6 Use the t-tables to obtain a critical value or values with which to compare the test statistic
7 Finally perform the test. If the test statistic lies in the rejection region then reject the null hypothesis (H0), else do not reject H0.
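The steps above can be sketched numerically in a few lines (a Python illustration; the course's PC sessions use R). The estimates come from the example regression ŷt = 20.3 + 0.5091xt shown earlier, testing the earlier hypothesis H0: β = 0.5; the critical value assumes T = 22 observations, as in the worked example later in the deck.

```python
# Test of significance for H0: beta = 0.5 vs. H1: beta != 0.5 (two-sided, 5% level)
beta_hat = 0.5091    # OLS slope estimate from the example regression
se_beta = 0.2561     # its standard error
beta_star = 0.5      # hypothesised value under H0

# Step 2: the test statistic
t_stat = (beta_hat - beta_star) / se_beta
# Steps 4-6: 5% significance level; t critical value with T-2 = 20 d.f. (assumes T = 22)
t_crit = 2.086

# Step 7: perform the test
print(round(t_stat, 3))        # 0.036
print(abs(t_stat) > t_crit)    # False -> do not reject H0
```

The estimate 0.5091 is so close to the hypothesised 0.5, relative to its standard error, that the null is nowhere near rejection.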
Remarks:
If the sample is sufficiently large, any null hypothesis can be rejected for a fixed (e.g. 5%) size of test. Why? The SE gets smaller, so the test statistic gets larger.
Say: 'The null hypothesis is not rejected'; do not say 'The null hypothesis is accepted'.
20 / 81
Statistical Inference Confidence Interval Approach
Confidence Interval Approach
Uncertainty about the precision of the estimate can be summarized in a "confidence interval"
Point estimate versus interval estimate
β̂ = point estimate
Confidence interval = interval estimate
An example of its usage: We estimate a parameter, say to be 0.93, and a "95% confidence interval" to be (0.77, 1.09). This means that we are 95% confident that the interval contains the true (but unknown) value of β.
Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed.
Note: Test of Significance and Confidence Interval approaches always give the same answer
21 / 81
Statistical Inference Confidence Interval Approach
Confidence Interval Approach
1 Calculate α̂, β̂ and SE(α̂), SE(β̂) as before.
2 Choose a significance level, α (again the convention is 5%). This is equivalent to choosing a (1−α)×100% confidence interval, i.e. 5% significance level = 95% confidence interval
3 Use the t-tables to find the appropriate critical value, which will again have T−2 degrees of freedom.
4 The confidence interval is given by (β̂ − tcrit × SE(β̂), β̂ + tcrit × SE(β̂))
5 Perform the test: If the hypothesised value of β (β∗) lies outside the confidence interval, then reject the null hypothesis that β = β∗, otherwise do not reject the null.
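Steps 1–5 can likewise be sketched in Python (an illustration using the example regression's β̂ = 0.5091 and SE(β̂) = 0.2561; the t(20) critical value 2.086 assumes T = 22):

```python
# 95% confidence interval for beta, and the implied test of H0: beta = 1
beta_hat, se_beta = 0.5091, 0.2561
t_crit = 2.086  # t(20), 5% two-sided critical value (T = 22)

lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta
print(round(lower, 4), round(upper, 4))   # -0.0251 1.0433

beta_star = 1.0
reject = not (lower <= beta_star <= upper)
print(reject)                             # False -> 1 lies inside, do not reject H0
```

Because the interval and the test use the same critical value, this decision necessarily agrees with the test of significance approach.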
22 / 81
Statistical Inference Confidence Interval Approach
Confidence Interval Approach
Example: Confidence intervals for the data sets in the 4 figures that contain artificially generated data with α = 0, β = 1
[Figures (XY scatter plots): Figure 5.1: Very Small Sample Size; Figure 5.2: Large Sample Size, Large Error Variance; Figure 5.3: Large Sample Size, Small Error Variance; Figure 5.4: Limited Range of X Values]
Data Set     β̂     SE(β̂)   90% Confid. Interval   95% Confid. Interval   99% Confid. Interval
Figure 5.1   0.91   0.89    [-0.92, 2.75]          [-1.57, 3.39]          [-3.64, 5.47]
Figure 5.2   1.04   0.17    [0.75, 1.32]           [0.70, 1.38]           [0.59, 1.49]
Figure 5.3   1.00   0.01    [0.99, 1.01]           [0.99, 1.02]           [0.98, 1.03]
Figure 5.4   1.52   1.73    [-1.33, 4.36]          [-1.88, 4.91]          [-2.98, 6.02]
Interpret results (remember the factors influencing the accuracy of OLS estimates)
23 / 81
Statistical Inference Example
Example
Consider again the following regression results (standard errors in parentheses):
ŷt = 20.3 + 0.5091xt , T = 22
       (14.38) (0.2561)
Using both the test of significance and confidence interval approaches, test the hypothesis that β = 1 against a two-sided alternative (H0: β = 1, H1: β ≠ 1).
The first step is to obtain the critical value: We want tcrit = t20;5%
24 / 81
Statistical Inference Example
Determining the Rejection Region
25 / 81
Statistical Inference Example
Performing the Test
The hypotheses are: H0: β = 1 vs. H1: β ≠ 1
Test of significance approach:
test stat = (β̂ − β∗) / SE(β̂) = (0.5091 − 1) / 0.2561 = −1.917
Do not reject H0 since the test statistic lies within the non-rejection region
Confidence interval approach:
Find tcrit = t20;5% = ±2.086
β̂ ± tcrit · SE(β̂) = 0.5091 ± 2.086 · 0.2561 = (−0.0251, 1.0433)
Do not reject H0 since 1 lies within the confidence interval
26 / 81
Statistical Inference Other Hypothesis Testing
Other Hypothesis
What if we wanted to test H0: β = 0 or H0: β = 2?
Note that we can test these with the confidence interval approach
For interest (!), test:
H0: β = 0 vs. H1: β ≠ 0
H0: β = 2 vs. H1: β ≠ 2
27 / 81
Statistical Inference Size of the Test
Changing the Size of the Test
But note that we looked at only a 5% size of test. In marginal cases (e.g. H0: β = 1), we may get a completely different answer if we use a different size of test (here: do not reject at 5%, reject at 10%). This is where the test of significance approach is better than a confidence interval.
For example, say we wanted to use a 10% size of test. Consider again the following regression results:
ŷt = 20.3 + 0.5091xt , T = 22
       (14.38) (0.2561)
Using the test of significance approach:
test stat = (β̂ − β∗) / SE(β̂) = (0.5091 − 1) / 0.2561 = −1.917
as above. The only thing that changes is the critical t-value.
28 / 81
Statistical Inference Size of the Test
Changing the Size of the Test: New Rejection Region
Conclusion: t20;10% = 1.725. So now, as the test statistic lies in the rejection region, we would reject H0. (Note: a rejection at a lower significance level is always the more convincing result.)
29 / 81
Statistical Inference t-ratio
A Special Type of Hypothesis Test: The t-ratio
Recall that the formula for a test of significance approach to hypothesis testing using a t-test was:
test statistic = (β̂ − β∗) / SE(β̂)
If the test is H0: β = 0 against H1: β ≠ 0, i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test.
Since β∗ = 0, test stat = β̂ / SE(β̂)
The ratio of the coefficient to its SE is known as the t-ratio or t-statistic
30 / 81
Statistical Inference t-ratio
The t-ratio: An Example
Suppose that we have the following parameter estimates, standard errors and t-ratios for an intercept and slope respectively:
              α̂      β̂
Coefficient   1.10   −19.88
SE            1.35   1.98
t-ratio       0.81   −10.04
Compare this with tcrit with 15 − 2 = 13 d.f.: 2.16 for a 5% test (2.5% in each tail); 3.01 for a 1% test.
Do we reject H0: α = 0? (No) H0: β = 0?
(Yes)
31 / 81
Statistical Inference t-ratio
What does the t-ratio tell us?
If we reject H0, we say that the result is significant. If the coefficient is not "significant", then it means that the variable is not helping to explain variations in y. Variables that are not significant are sometimes removed from the regression model.
In practice there are good statistical reasons for always having a constant even if it is not significant. Look at what happens if no intercept is included:
[Figure: scatter of yt against xt with the regression line forced through the origin]
32 / 81
Statistical Inference Terminology
Some More Terminology
If we reject the null hypothesis at the 5% level, we say that the result of the test is statistically significant.
Jargon: "The coefficient on xt is significantly different from zero." "xt has statistically significant explanatory power for yt." "The (null) hypothesis that β = 0 can be rejected at the 5% significance level."
Note that a statistically significant result may be of no practical significance. E.g. if a shipment of cans of beans is expected to weigh 450g per tin, but the actual mean weight of some tins is 449g, the result may be highly statistically significant but presumably nobody would care about 1g of beans.
Advice: Take into account both statistical and economic significance!
33 / 81
Statistical Inference p-value
The Exact Significance Level or p-value
This is equivalent to choosing an infinite number of critical t-values from tables. It gives us the marginal significance level where we would be indifferent between rejecting and not rejecting the null hypothesis.
If the test statistic is large in absolute value, the p-value will be small, and vice versa.
The p-value gives the plausibility of the null hypothesis.
Example: test statistic = 1.47 (distributed as a t(62)), p-value = 0.12
Interpretation:
Do we reject at the 5% level? ........... No (0.12 > 0.05)
Do we reject at the 10% level? ......... No (0.12 > 0.10)
Do we reject at the 20% level? ......... Yes (0.12 < 0.20)
34 / 81
Statistical Inference Questions
Questions? (5 min) URL: app.wooclap.com/RMF3
35 / 81
Break
Time for a break! (10 min)
36 / 81
Statistical Inference R
Example
Ufora: Content/Topic 3/Datasets/capm3.gdt
Frequency: Monthly data (end-of-the-month)
Period: January 2002 to February 2018
Source: Refinitiv Datastream
Variables:
Stock Price Index S&P500 (SANDP)
Stock Price Ford (FORD)
Stock Price General Electric (GE)
Stock Price Microsoft (MICROSOFT)
Stock Price Oracle (ORACLE)
US Risk-free Rate (3-Month Treasury Bill, Yearly Basis, in %) (USTB3M)
Variables (Topic 1/2): RF, RSANDP, ERSANDP, RFORD, ERFORD, RMICROSOFT, ERMICROSOFT
37 / 81
Statistical Inference R
Hypothesis Testing: Test of Significance Approach
Goal: Test whether the CAPM beta of Microsoft with respect to the S&P500 is equal to one. (What does a CAPM beta of 1 mean?)
The capital asset pricing model (CAPM) can be written as:
E(Ri) = Rf + βi [E(Rm) − Rf]
The regression equation takes the form:
(RMS − Rf)t = α + β (RS&P500 − Rf)t + ut
Two-sided t-test: H0: β = 1 vs. H1: β ≠ 1
38 / 81
Statistical Inference R
Hypothesis Testing: Test of Significance Approach
Regression Output for Microsoft:
39 / 81
Statistical Inference R
Hypothesis Testing: Test of Significance Approach
Steps involved in doing a test of significance:
1 Estimate coefficients and standard errors
2 Calculate the test statistic: test statistic = (β̂ − β∗) / SE(β̂) = (1.00785 − 1) / 0.0964965 = 0.08135
3 Use the t-tables to obtain a critical value (assuming a significance level of 5%)
4 Perform the test.
Test statistic does not lie in the rejection region => do not reject the null hypothesis (H0)
40 / 81
Statistical Inference R
Hypothesis Testing: Test of Significance Approach
Regression Output for Ford:
41 / 81
Statistical Inference R
Multiple Choice Question (Wooclap)
Question: Test whether the CAPM beta of Ford with respect to the S&P500 is significantly higher than 1. What is the correct answer?
1 The test statistic is equal to 4.14431 and the critical value to 1.97246, hence the beta is not significantly higher than 1
2 The test statistic is equal to 4.14431 and the critical value to 1.97246, hence the beta is significantly higher than 1
3 The test statistic is equal to 4.14431 and the critical value to 1.65287, hence the beta is not significantly higher than 1
4 The test statistic is equal to 4.14431 and the critical value to 1.65287, hence the beta is significantly higher than 1
5 I don't know
(Note: the critical value is about 1.65 because the test is one-sided.)
URL: app.wooclap.com/RMF3
42 / 81
Statistical Inference R
Hypothesis Testing: Confidence Interval Approach
Goal: Test the following hypothesis for the CAPM beta of Ford: H0: β = 1 vs. H1: β ≠ 1
Output when doing the confidence interval approach in R:
43 / 81
Statistical Inference R
Hypothesis Testing: Confidence Interval Approach
Goal: Test the following hypothesis for the CAPM beta of Microsoft: H0: β = 1 vs. H1: β ≠ 1
Output when doing the confidence interval approach in R:
44 / 81
Statistical Inference R
Hypothesis Testing: t-ratio
The capital asset pricing model (CAPM) implies that the alpha (i.e. constant) in the regression model is zero:
E(Ri) = Rf + βi [E(Rm) − Rf] versus (RFord − Rf)t = α + β (RS&P500 − Rf)t + ut
Goal: Test the following hypothesis for the CAPM alpha of Ford: H0: α = 0 vs. H1: α ≠ 0
45 / 81
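The Microsoft beta test above can be reproduced outside R with a short Python sketch. The estimates β̂ = 1.00785 and SE(β̂) = 0.0964965 come from the slide's regression output; the 1.972 critical value is an approximate two-sided 5% t value for this sample's degrees of freedom, taken as an assumption here.

```python
# Two-sided test of H0: beta = 1 for Microsoft's CAPM beta (numbers from the slide output)
beta_hat = 1.00785
se_beta = 0.0964965
t_stat = (beta_hat - 1.0) / se_beta
t_crit = 1.972          # assumed approximate t critical value, 5% two-sided

print(round(t_stat, 5))             # 0.08135
print(abs(t_stat) > t_crit)         # False -> do not reject H0: beta = 1

# The t-ratio instead tests H0: coefficient = 0, i.e. coefficient / SE:
alpha_hat, se_alpha = 1.10, 1.35    # illustrative intercept from the earlier t-ratio slide
print(round(alpha_hat / se_alpha, 2))   # 0.81
```

A Microsoft beta statistically indistinguishable from 1 means the stock moved essentially one-for-one with the market over the sample.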
