Introductory Econometrics for Finance

Chapter 1 Introduction

Introductory Econometrics for Finance © Chris Brooks 2014

The Nature and Purpose of Econometrics

What is econometrics? The literal meaning is “measurement in economics”.
Definition of financial econometrics: the application of statistical and mathematical techniques to problems in finance.

Examples of the kind of problems that may be solved by an econometrician

1. Testing whether financial markets are weak-form informationally efficient.
2. Testing whether the CAPM or the APT represents the superior model for the determination of returns on risky assets.
3. Measuring and forecasting the volatility of bond returns.
4. Explaining the determinants of bond credit ratings used by the ratings agencies.
5. Modelling long-term relationships between prices and exchange rates.
6. Determining the optimal hedge ratio for a spot position in oil.
7. Testing technical trading rules to determine which makes the most money.
8. Testing the hypothesis that earnings or dividend announcements have no effect on stock prices.
9. Testing whether spot or futures markets react more rapidly to news.
10. Forecasting the correlation between the returns to the stock indices of two countries.

What are the Special Characteristics of Financial Data?

Frequency and quantity of data: stock market prices are measured every time there is a trade or somebody posts a new quote.
Quality: recorded asset prices are usually those at which the transaction took place, so there is little scope for measurement error, but financial data are “noisy”.

Types of Data and Notation

There are three types of data which econometricians might use for analysis:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1. and 2.

The data may be quantitative (e.g. exchange rates, stock prices, number of shares outstanding), or qualitative (e.g. day of the week).

Examples of time series data:

Series                            Frequency
GNP or unemployment               monthly or quarterly
government budget deficit         annually
money supply                      weekly
value of a stock market index     as transactions occur

Time Series versus Cross-sectional Data

Examples of problems that could be tackled using a time series regression:
- How the value of a country’s stock index has varied with that country’s macroeconomic fundamentals.
- How the value of a company’s stock price has varied when it announced the value of its dividend payment.
- The effect on a country’s currency of an increase in its interest rate.

Cross-sectional data are data on one or more variables collected at a single point in time, e.g.
- A poll of usage of internet stock broking services
- A cross-section of stock returns on the New York Stock Exchange
- A sample of bond credit ratings for UK banks

Cross-sectional and Panel Data

Examples of problems that could be tackled using a cross-sectional regression:
- The relationship between company size and the return to investing in its shares.
- The relationship between a country’s GDP level and the probability that the government will default on its sovereign debt.

Panel data have the dimensions of both time series and cross-sections, e.g. the daily prices of a number of blue chip stocks over two years.

It is common to denote each observation by the letter t and the total number of observations by T for time series data, and to denote each observation by the letter i and the total number of observations by N for cross-sectional data.
Continuous and Discrete Data

Continuous data can take on any value and are not confined to specific numbers; their values are limited only by precision. For example, the rental yield on a property could be 6.2%, 6.24%, or 6.238%.

Discrete data, on the other hand, can only take on certain values, which are usually integers, for instance the number of people in a particular underground carriage or the number of shares traded during a day. They do not necessarily have to be integers (whole numbers) though, and are often defined to be count numbers. For example, until recently when they became ‘decimalised’, many financial asset prices were quoted to the nearest 1/16 or 1/32 of a dollar.

Cardinal, Ordinal and Nominal Numbers

Another way in which we could classify numbers is according to whether they are cardinal, ordinal, or nominal.

Cardinal numbers are those where the actual numerical values that a particular variable takes have meaning, and where there is an equal distance between the numerical values. Examples of cardinal numbers would be the price of a share or of a building, and the number of houses in a street.

Ordinal numbers can only be interpreted as providing a position or an ordering. Thus, for cardinal numbers, a figure of 12 implies a measure that is ‘twice as good’ as a figure of 6. On the other hand, for an ordinal scale, a figure of 12 may be viewed as ‘better’ than a figure of 6, but could not be considered twice as good. An example of ordinal numbers would be the position of a runner in a race.

Nominal numbers occur where there is no natural ordering of the values at all. Such data often arise when numerical values are arbitrarily assigned, such as telephone numbers, or when codings are assigned to qualitative data (e.g. when describing the exchange that a US stock is traded on).

Cardinal, ordinal and nominal variables may require different modelling approaches or at least different treatments, as should become evident in the subsequent chapters.

Returns in Financial Modelling

It is preferable not to work directly with asset prices, so we usually convert the raw prices into a series of returns. There are two ways to do this:

Simple returns: $R_t = \frac{p_t - p_{t-1}}{p_{t-1}} \times 100\%$
Log returns: $R_t = \ln\left(\frac{p_t}{p_{t-1}}\right) \times 100\%$

where $R_t$ denotes the return at time t, $p_t$ denotes the asset price at time t, and ln denotes the natural logarithm.

We also ignore any dividend payments, or alternatively assume that the price series have already been adjusted to account for them.

Log Returns

Log returns are also known as log price relatives, and they will be used throughout this book. There are a number of reasons for this:

1. They have the nice property that they can be interpreted as continuously compounded returns.
2. They can be added up over time, e.g. if we want a weekly return and we have calculated daily log returns:

$r_1 = \ln(p_1/p_0) = \ln p_1 - \ln p_0$
$r_2 = \ln(p_2/p_1) = \ln p_2 - \ln p_1$
$r_3 = \ln(p_3/p_2) = \ln p_3 - \ln p_2$
$r_4 = \ln(p_4/p_3) = \ln p_4 - \ln p_3$
$r_5 = \ln(p_5/p_4) = \ln p_5 - \ln p_4$

Summing, the intermediate terms cancel: $r_1 + r_2 + r_3 + r_4 + r_5 = \ln p_5 - \ln p_0 = \ln(p_5/p_0)$

A Disadvantage of using Log Returns

There is a disadvantage of using log returns. The simple return on a portfolio of assets is a weighted average of the simple returns on the individual assets:

$R_{pt} = \sum_{i=1}^{N} w_{ip} R_{it}$

But this does not work for continuously compounded returns, because the log of a sum is not the same as the sum of the logs.
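The two properties above can be checked numerically. The following is a minimal sketch in Python; the price series, the two-asset portfolio and its equal weights are hypothetical values chosen for illustration and do not come from the text.

```python
import math

# Hypothetical daily closing prices p0..p5 (illustrative values, not from the text).
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 104.0]


def simple_returns(p):
    """Simple returns in percent: (p_t - p_{t-1}) / p_{t-1} * 100."""
    return [(p[t] - p[t - 1]) / p[t - 1] * 100 for t in range(1, len(p))]


def log_returns(p):
    """Log returns in percent: ln(p_t / p_{t-1}) * 100."""
    return [math.log(p[t] / p[t - 1]) * 100 for t in range(1, len(p))]


# Additivity over time: the five daily log returns sum to the weekly log return.
sum_daily_log = sum(log_returns(prices))
weekly_log = math.log(prices[-1] / prices[0]) * 100

# Simple returns compound rather than add, so their sum misses the weekly figure.
sum_daily_simple = sum(simple_returns(prices))
weekly_simple = (prices[-1] / prices[0] - 1) * 100

# Portfolio aggregation: two hypothetical assets held with equal value weights.
weights = [0.5, 0.5]
gross = [1.10, 0.90]  # p_t / p_{t-1} for each asset
portfolio_gross = sum(w * g for w, g in zip(weights, gross))

# The weighted average of simple returns equals the portfolio simple return...
portfolio_simple = (portfolio_gross - 1) * 100
weighted_simple = sum(w * (g - 1) * 100 for w, g in zip(weights, gross))

# ...but the weighted average of log returns does not equal the portfolio log return.
portfolio_log = math.log(portfolio_gross) * 100
weighted_log = sum(w * math.log(g) * 100 for w, g in zip(weights, gross))

print(f"log over time:    {sum_daily_log:.4f} vs {weekly_log:.4f}")
print(f"simple over time: {sum_daily_simple:.4f} vs {weekly_simple:.4f}")
print(f"portfolio simple: {weighted_simple:.4f} vs {portfolio_simple:.4f}")
print(f"portfolio log:    {weighted_log:.4f} vs {portfolio_log:.4f}")
```

In short, log returns aggregate cleanly across time and simple returns aggregate cleanly across assets, which is why the choice between them depends on the task.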
Real Versus Nominal Series

The general level of prices has a tendency to rise most of the time because of inflation. We may wish to transform nominal series into real ones to adjust them for inflation. This is called deflating a series or displaying a series at constant prices.

We do this by taking the nominal series and dividing it by a price deflator:

$\text{real series}_t = \frac{\text{nominal series}_t}{\text{deflator}_t} \times 100$

(assuming that the base figure is 100).

We only deflate series that are in nominal price terms, not quantity terms.

Deflating a Series

If we wanted to convert a series into a particular year’s figures (e.g. house prices in 2010 figures), we would use:

$\text{real series}_t = \text{nominal series}_t \times \frac{\text{deflator}_{\text{reference year}}}{\text{deflator}_t}$

This is the same equation as on the previous slide, except that the deflator for the reference year replaces the assumed deflator base figure of 100. Often the consumer price index, CPI, is used as the deflator series.

Steps involved in the formulation of econometric models

1. Economic or financial theory (previous studies)
2. Formulation of an estimable theoretical model
3. Collection of data
4. Model estimation
5. Is the model statistically adequate? If no, reformulate the model; if yes, interpret the model
6. Use for analysis

Some Points to Consider when reading papers in the academic finance literature

1. Does the paper involve the development of a theoretical model or is it merely a technique looking for an application, or an exercise in data mining?
2. Are the data of “good quality”? Are they from a reliable source? Is the size of the sample sufficiently large for asymptotic theory to be invoked?
3. Have the techniques been validly applied? Have diagnostic tests been conducted for violations of any assumptions made in the estimation of the model?
4. Have the results been interpreted sensibly? Is the strength of the results exaggerated? Do the results actually address the questions posed by the authors?
5. Are the conclusions drawn appropriate given the results, or has the importance of the results of the paper been overstated?

Bayesian versus Classical Statistics

The philosophical approach to model-building used here throughout is based on ‘classical statistics’. This involves postulating a theory and then setting up a model and collecting data to test that theory. Based on the results from the model, the theory is supported or refuted.

There is, however, an entirely different approach known as Bayesian statistics. Here, the theory and model are developed together. The researcher starts with an assessment of existing knowledge or beliefs formulated as probabilities, known as priors. The priors are combined with the data into a model. The beliefs are then updated after estimating the model to form a set of posterior probabilities.

Bayesian statistics is a well established and popular approach, although less so than the classical one. Some classical researchers are uncomfortable with the Bayesian use of prior probabilities based on judgement. If the priors are very strong, a great deal of evidence from the data would be required to overturn them, so the researcher would end up with the conclusions that he/she wanted in the first place! In the classical case, by contrast, judgement is not supposed to enter the process and thus it is argued to be more objective.
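The point about strong priors can be illustrated with a small numerical sketch. The conjugate Beta-Binomial model below is a standard textbook device chosen here for illustration; it is not a model used in the slides, and the data (70 ‘up’ days out of 100) and the prior parameters are hypothetical.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a probability with a Beta(prior_a, prior_b) prior
    after observing binomial data (conjugate updating)."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


# Hypothetical data: 70 'up' days and 30 'down' days.
up_days, down_days = 70, 30

# A weak (flat) prior lets the data dominate the posterior.
weak_prior = posterior_mean(1, 1, up_days, down_days)

# A strong prior centred on 0.5 is barely moved by the same data.
strong_prior = posterior_mean(500, 500, up_days, down_days)

print(f"posterior mean with weak prior:   {weak_prior:.3f}")
print(f"posterior mean with strong prior: {strong_prior:.3f}")
```

With the flat prior the posterior mean sits close to the sample proportion of 0.7, whereas with the strong prior it stays close to 0.5, which is the concern raised by classical researchers.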
Chapter 2 Mathematical and Statistical Foundations

‘Introductory Econometrics for Finance’ © Chris Brooks 2013

Functions

A function is a mapping or relationship between an input or set of inputs and an output. We write that y, the output, is a function f of x, the input, or y = f(x).

y could be a linear function of x, where the relationship can be expressed as a straight line, or it could be non-linear, where it would be expressed graphically as a curve.

If the equation is linear, we would write the relationship as y = a + bx, where y and x are called variables and a and b are parameters. a is the intercept and b is the slope or gradient.

Straight Lines

The intercept is the point at which the line crosses the y-axis.

Example: suppose that we were modelling the relationship between a student’s average mark, y (in percent), and the number of hours studied per year, x. Suppose that the relationship can be written as a linear function y = 25 + 0.05x. The intercept, a, is 25 and the slope, b, is 0.05.

This means that with no study (x = 0), the student could expect to earn a mark of 25%. For every hour of study, the mark would on average improve by 0.05 percentage points, so another 100 hours of study would lead to a 5 percentage point increase in the mark.

[Figure: Plot of Hours Studied Against Mark Obtained]

Straight Lines

In the graph above, the slope is positive, i.e. the line slopes upwards from left to right. But in other examples the gradient could be zero or negative. For a straight line the slope is constant, i.e. the same along the whole line.

In general, we can calculate the slope of a straight line by taking any two points on the line and dividing the change in y by the change in x. Δ (Delta) denotes the change in a variable.

For example, take two points x = 100, y = 30 and x = 1000, y = 75. We can write these using coordinate notation (x, y) as (100, 30) and (1000, 75). We would calculate the slope as

$\frac{\Delta y}{\Delta x} = \frac{75 - 30}{1000 - 100} = \frac{45}{900} = 0.05$

Roots

The point at which a line crosses the x-axis is known as the root. A straight line will have one root (except for a horizontal line such as y = 4, which has no roots).

To find the root of an equation, set y to zero and rearrange: 0 = 25 + 0.05x, so the root is x = −500. In this case it does not have a sensible interpretation: the number of hours of study required to obtain a mark of zero!

Quadratic Functions

A linear function is often not sufficiently flexible to accurately describe the relationship between two series. We could use a quadratic function instead. We would write it as $y = a + bx + cx^2$, where a, b, c are the parameters that describe the shape of the function.

Quadratics have an additional parameter compared with linear functions. The linear function is a special case of a quadratic where c = 0. a still represents where the function crosses the y-axis.

As x becomes very large, the $x^2$ term will come to dominate. Thus if c is positive, the function will be ∪-shaped, while if c is negative it will be ∩-shaped.

The Roots of Quadratic Functions

A quadratic equation has two roots. The roots may be distinct (i.e., different from one another), or they may be the same (repeated roots); they may be real numbers (e.g., 1.7, −2.357, 4, etc.) or what are known as complex numbers.

The roots can be obtained either by factorising the equation (contracting it into parentheses), by ‘completing the square’, or by using the formula, which for $y = a + bx + cx^2$ is

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2c}$

(the denominator is twice the coefficient on $x^2$, which is c in the notation used here).

The Roots of Quadratic Functions (Cont’d)

- If $b^2 > 4ac$, the function will have two distinct real roots and it will cross the x-axis in two separate places.
- If $b^2 = 4ac$, the function will have two equal roots and it will touch the x-axis in only one place.
- If $b^2 < 4ac$, the function will have no real roots (only complex roots) and it will not cross the x-axis at all, so it will lie entirely above the x-axis if it is ∪-shaped or entirely below it if it is ∩-shaped.

Calculating the Roots of Quadratics - Examples

Determine the roots of the following quadratic equations:
1. $y = x^2 + x - 6$
2. $y = 9x^2 + 6x + 1$
3. $y = x^2 - 3x + 1$
4. $y = x^2 - 4x$

Calculating the Roots of Quadratics - Solutions

We solve these equations by setting them in turn to zero. We could use the quadratic formula in each case, although it is usually quicker to determine first whether they factorise.

1. $x^2 + x - 6 = 0$ factorises to (x − 2)(x + 3) = 0 and thus the roots are 2 and −3, which are the values of x that set the function to zero. In other words, the function will cross the x-axis at x = 2 and x = −3.
2. $9x^2 + 6x + 1 = 0$ factorises to (3x + 1)(3x + 1) = 0 and thus the roots are −1/3 and −1/3. This is known as repeated roots: since this is a quadratic equation there will always be two roots, but in this case they are both the same.
3. $x^2 - 3x + 1 = 0$ does not factorise and so the formula must be used with a = 1, b = −3, c = 1, and the roots are 0.38 and 2.62 to two decimal places.
4. $x^2 - 4x = 0$ factorises to x(x − 4) = 0 and so the roots are 0 and 4.
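The four solutions above can be reproduced with the quadratic formula. This is a minimal Python sketch; the function name and its argument names are illustrative choices, and the arguments are named after the power of x they multiply to avoid any ambiguity about which letter goes with which term.

```python
import cmath


def quadratic_roots(x2_coef, x_coef, const):
    """Roots of x2_coef*x**2 + x_coef*x + const = 0 via the quadratic formula.

    Returns the discriminant and both roots. cmath.sqrt is used so that a
    negative discriminant yields complex roots instead of raising an error.
    """
    disc = x_coef ** 2 - 4 * x2_coef * const
    sqrt_disc = cmath.sqrt(disc)
    root_low = (-x_coef - sqrt_disc) / (2 * x2_coef)
    root_high = (-x_coef + sqrt_disc) / (2 * x2_coef)
    return disc, root_low, root_high


# The four worked examples, as (coefficient on x^2, coefficient on x, constant).
examples = [(1, 1, -6), (9, 6, 1), (1, -3, 1), (1, -4, 0)]

for coefs in examples:
    disc, root_low, root_high = quadratic_roots(*coefs)
    print(coefs, "discriminant:", disc,
          "roots:", round(root_low.real, 2), round(root_high.real, 2))
```

The sign of the discriminant printed for each example corresponds to the three cases listed earlier: positive for two distinct real roots, zero for a repeated root.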
All of these equations have two real roots. But if we had an equation such as $y = 3x^2 - 2x + 4$, this would not factorise and would have complex roots since $b^2 - 4ac < 0$ in the quadratic formula.

Powers of Numbers or of Variables

A number or variable raised to a power is simply a way of writing repeated multiplication. So for example, raising x to the power 2 means squaring it (i.e., $x^2 = x \times x$). Raising it to the power 3 means cubing it ($x^3 = x \times x \times x$), and so on.

The number that we are raising the number or variable to is called the index, so for $x^3$, the index would be 3.

Manipulating Powers and their Indices

- Any number or variable raised to the power one is simply that number or variable, e.g., $3^1 = 3$, $x^1 = x$, and so on.
- Any number or variable raised to the power zero is one, e.g., $5^0 = 1$, $x^0 = 1$, etc., except that $0^0$ is not defined (i.e., it does not exist).
- If the index is a negative number, this means that we divide one by the number raised to the corresponding positive power, for example $x^{-3} = 1/x^3 = 1/(x \times x \times x)$.
- If we want to multiply together a given number raised to more than one power, we would add the corresponding indices together, for example $x^2 \times x^3 = x^{2+3} = x^5$.
- If we want to calculate the power of a variable raised to a power (i.e., the power of a power), we would multiply the indices together, for example $(x^2)^3 = x^{2 \times 3} = x^6$.
- If we want to divide a variable raised to a power by the same variable raised to another power, we subtract the second index from the first, for example $x^3 / x^2 = x^{3-2} = x$.
- If we want to divide a variable raised to a power by a different variable raised to the same power, the following result applies: $(x/y)^n = x^n / y^n$.
- The power of a product is equal to each component raised to that power, for example $(x \times y)^3 = x^3 \times y^3$.
- The indices for powers do not have to be integers, so $x^{1/2}$ is the notation we would use for taking the square root of x, sometimes written $\sqrt{x}$. Other, non-integer powers are also possible, but are harder to calculate by hand (e.g. $x^{0.76}$, $x^{-0.27}$, etc.). In general, $x^{1/n} = \sqrt[n]{x}$.

The Exponential Function, e

It is sometimes the case that the relationship between two variables is best described by an exponential function. For example, when a variable grows (or reduces) at a rate in proportion to its current value, we would write $y = e^x$. e is simply a number: 2.71828...

It is also useful for capturing the increase in value of an amount of money that is subject to compound interest.

The exponential function can never be negative, so when x is negative, y is close to zero but positive. It crosses the y-axis at one and the slope increases at an increasing rate from left to right.

[Figure: A Plot of the Exponential Function]

Logarithms

Logarithms were invented to simplify cumbersome calculations, since exponents can then be added or subtracted, which is easier than multiplying or dividing the original numbers.

There are at least three reasons why log transforms may be useful.
1. Taking a logarithm can often help to rescale the data so that their variance is more constant, which overcomes a common statistical problem known as heteroscedasticity.
2. Logarithmic transforms can help to make a positively skewed distribution closer to a normal distribution.
3. Taking logarithms can also be a way to make a non-linear, multiplicative relationship between variables into a linear, additive one.

How do Logs Work?

Consider the power relationship $2^3 = 8$. Using logarithms, we would write this as $\log_2 8 = 3$, or ‘the log to the base 2 of 8 is 3’. Hence we could say that a logarithm is defined as the power to which the base must be raised to obtain the given number. More generally, if $a^b = c$, then we can also write $\log_a c = b$.

If we plot a log function, y = log(x), it would cross the x-axis at one (see the following slide). It can be seen that as x increases, y increases at a slower rate, which is the opposite to an exponential function where y increases at a faster rate as x increases.

[Figure: A Graph of a Log Function]

How do Logs Work?

Natural logarithms, also known as logs to base e, are more commonly used and more useful mathematically than logs to any other base. A log to base e is known as a natural or Napierian logarithm, denoted interchangeably by ln(y) or log(y).

Taking a natural logarithm is the inverse of taking an exponential, so sometimes the exponential function is called the antilog.

The log of a number less than one will be negative, e.g. ln(0.5) ≈ −0.69. We cannot take the log of a negative number, so ln(−0.6), for example, does not exist.
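These statements about logs can be checked directly. The short Python sketch below uses only the standard `math` module; the variable names are illustrative.

```python
import math

log2_of_8 = math.log2(8)              # power to which 2 must be raised to obtain 8
ln_half = math.log(0.5)               # natural log of a number below one is negative
round_trip = math.exp(math.log(5.0))  # exp is the antilog, so this recovers 5

# The log of a negative number is not defined for real numbers, and the
# math module signals this by raising ValueError.
try:
    math.log(-0.6)
    negative_log_defined = True
except ValueError:
    negative_log_defined = False

print(log2_of_8, round(ln_half, 2), round_trip, negative_log_defined)
```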
The Laws of Logs

For variables x and y:
- $\ln(xy) = \ln(x) + \ln(y)$
- $\ln(x/y) = \ln(x) - \ln(y)$
- $\ln(y^c) = c \ln(y)$
- $\ln(1) = 0$
- $\ln(1/y) = \ln(1) - \ln(y) = -\ln(y)$
- $\ln(e^x) = e^{\ln(x)} = x$

Sigma Notation

If we wish to add together several numbers (or observations from variables), the sigma or summation operator can be very useful. Σ means ‘add up all of the following elements.’ For example, Σ(1 + 2 + 3) = 6.

In the context of adding the observations on a variable, it is helpful to add ‘limits’ to the summation. For instance, we might write $\sum_{i=1}^{4} x_i$, where the i subscript is an index, 1 is the lower limit and 4 is the upper limit of the sum. This would mean adding all of the values of x from $x_1$ to $x_4$.

Properties of the Sigma Operator

[Equations not recovered]

Pi Notation

Similar to the use of sigma to denote sums, the pi operator (Π) is used to denote repeated multiplications. For example, $\prod_{i=1}^{n} x_i$ means ‘multiply together all of the $x_i$ for each value of i between the lower and upper limits.’ It also follows that [equation not recovered].

Differential Calculus

The rate of change of one variable with respect to another is measured by a mathematical derivative. If the relationship between the two variables can be represented by a curve, the gradient of the curve will be this rate of change.

Consider a variable y that is a function f of another variable x, i.e. y = f(x): the derivative of y with respect to x is written dy/dx or sometimes f′(x). This term measures the instantaneous rate of change of y with respect to x, or in other words, the impact of an infinitesimally small change in x.

Notice the difference between the notations Δy and dy: Δy denotes a discrete (finite) change in y, whereas dy denotes an infinitesimally small one.

Differentiation: The Basics

1. The derivative of a constant is zero, e.g. if y = 10, dy/dx = 0. This is because y = 10 would be a horizontal straight line on a graph of y against x, and therefore the gradient of this function is zero.
2. The derivative of a linear function is simply its slope, e.g. if y = 3x + 2, dy/dx = 3.

But non-linear functions will have different gradients at each point along the curve. In effect, the gradient at each point is equal to the gradient of the tangent at that point. The gradient will be zero at the point where the curve changes direction from positive to negative or from negative to positive; this is known as a turning point.

[Figure: The Tangent to a Curve]

The Derivative of a Power Function or of a Sum

The derivative of a power function of x, i.e. $y = cx^n$, is given by $dy/dx = cnx^{n-1}$. For example:
- If $y = 4x^3$, $dy/dx = (4 \times 3)x^2 = 12x^2$
- If $y = 3/x = 3x^{-1}$, $dy/dx = (3 \times -1)x^{-2} = -3x^{-2} = -3/x^2$

The derivative of a sum is equal to the sum of the derivatives of the individual parts: e.g., if y = f(x) + g(x), dy/dx = f′(x) + g′(x).

The derivative of a difference is equal to the difference of the derivatives of the individual parts: e.g., if y = f(x) − g(x), dy/dx = f′(x) − g′(x).

The Derivatives of Logs and Exponentials

The derivative of the log of x is given by 1/x, i.e. d(log(x))/dx = 1/x.

The derivative of the log of a function of x is the derivative of the function divided by the function, i.e. d(log(f(x)))/dx = f′(x)/f(x). E.g., the derivative of $\log(x^3 + 2x - 1)$ is $(3x^2 + 2)/(x^3 + 2x - 1)$.

The derivative of $e^x$ is $e^x$. The derivative of $e^{f(x)}$ is given by $f'(x)e^{f(x)}$. E.g., if $y = e^{3x^2}$, $dy/dx = 6xe^{3x^2}$.

Higher Order Derivatives

It is possible to differentiate a function more than once to calculate the second order, third order, ..., nth order derivatives. The notation for the second order derivative, which is usually just termed the second derivative, is $d^2y/dx^2$.

To calculate second order derivatives, differentiate the function with respect to x and then differentiate it again. For example, suppose that we have the function $y = 4x^5 + 3x^3 + 2x + 6$. The first order derivative is

$dy/dx = 20x^4 + 9x^2 + 2$

The second order derivative is

$d^2y/dx^2 = 80x^3 + 18x$

The second order derivative can be interpreted as the gradient of the gradient of a function, i.e., the rate of change of the gradient.

How can we tell whether a particular turning point is a maximum or a minimum? The answer is that we would look at the second derivative. When a function reaches a maximum, its second derivative is negative, while it is positive for a minimum.

Maxima and Minima of Functions

Consider the quadratic function $y = 5x^2 + 3x - 6$. Since the squared term in the equation has a positive sign (i.e., it is 5 rather than, say, −5), the function will have a ∪-shape rather than an ∩-shape, and thus it will have a minimum rather than a maximum: $dy/dx = 10x + 3$, $d^2y/dx^2 = 10$.

Since the second derivative is positive, the function indeed has a minimum. To find where this minimum is located, take the first derivative, set it to zero and solve it for x. So we have 10x + 3 = 0, and x = −3/10 = −0.3.

If x = −0.3, y is found by substituting −0.3 into the function: $y = 5x^2 + 3x - 6 = 5 \times (-0.3)^2 + (3 \times -0.3) - 6 = -6.45$. Therefore, the minimum of this function is found at (−0.3, −6.45).

Partial Differentiation

In the case where y is a function of more than one variable (e.g. $y = f(x_1, x_2, \ldots, x_n)$), it may be of interest to determine the effect that changes in each of the individual x variables would have on y.

Differentiation of y with respect to only one of the variables, holding the others constant, is partial differentiation. The partial derivative of y with respect to a variable $x_1$ is usually denoted $\partial y/\partial x_1$.

All of the rules for differentiation explained above still apply, and there will be one (first order) partial derivative for each variable on the right hand side of the equation.

How to do Partial Differentiation

We calculate these partial derivatives one at a time, treating all of the other variables as if they were constants. To give an illustration, suppose $y = 3x_1^3 + 4x_1 - 2x_2^4 + 2x_2^2$. The partial derivative of y with respect to $x_1$ would be $\partial y/\partial x_1 = 9x_1^2 + 4$, while the partial derivative of y with respect to $x_2$ would be $\partial y/\partial x_2 = -8x_2^3 + 4x_2$.

The ordinary least squares (OLS) estimator gives formulae for the values of the parameters that minimise the residual sum of squares, denoted by L. The minimum of L is found by partially differentiating this function and setting the partial derivatives to zero. Therefore, partial differentiation has a key role in deriving the main approach to parameter estimation that we use in econometrics.
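The partial derivatives in the illustration above can be verified numerically. The sketch below compares the analytic expressions with central finite differences, in which one variable is nudged while the other is held constant; the evaluation point and the step size h are arbitrary choices made for illustration.

```python
def y(x1, x2):
    """The function from the illustration: 3*x1^3 + 4*x1 - 2*x2^4 + 2*x2^2."""
    return 3 * x1**3 + 4 * x1 - 2 * x2**4 + 2 * x2**2


def dy_dx1(x1, x2):
    """Analytic partial derivative with respect to x1 (x2 held constant)."""
    return 9 * x1**2 + 4


def dy_dx2(x1, x2):
    """Analytic partial derivative with respect to x2 (x1 held constant)."""
    return -8 * x2**3 + 4 * x2


def numeric_partial(f, x1, x2, wrt, h=1e-5):
    """Central finite difference: change only the chosen variable by +/- h."""
    if wrt == 1:
        return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)


# Arbitrary evaluation point (an assumption for illustration).
point = (1.5, -0.7)

analytic_1 = dy_dx1(*point)
analytic_2 = dy_dx2(*point)
numeric_1 = numeric_partial(y, *point, wrt=1)
numeric_2 = numeric_partial(y, *point, wrt=2)

print(f"dy/dx1: analytic {analytic_1:.6f}, numeric {numeric_1:.6f}")
print(f"dy/dx2: analytic {analytic_2:.6f}, numeric {numeric_2:.6f}")
```

Agreement between the two columns is a quick way to catch algebra slips when differentiating by hand.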
‘Introductory Econometrics for Finance’ © Chris Brooks 2013 35 Integration Integration is the opposite of differentiation If we integrate a function and then differentiate the result, we get back the original function Integration is used to calculate the area under a curve (between two specific points) Further details on the rules for integration are not given since the mathematical technique is not needed for any of the approaches used here. ‘Introductory Econometrics for Finance’ © Chris Brooks 2013 36 Matrices - Background Some useful terminology: – A scalar is simply a single number (although it need not be a whole number – e.g., 3, −5, 0.5 are all scalars) – A vector is a one-dimensional array of numbers (see below for examples) – A matrix is a two-dimensional collection or array of numbers. The size of a matrix is given by its numbers of rows and columns Matrices are very useful and important ways for organising sets of data together, which make manipulating and transforming them easy Matrices are widely used in econometrics and finance for solving systems of linear equations, for deriving key results, and for expressing formulae. ‘Introductory Econometrics for Finance’ © Chris Brooks 2013 37 Working with Matrices The dimensions of a matrix are quoted as R × C, which is the number of rows by the number of columns Each element in a matrix is referred to using subscripts. For example, suppose a matrix M has two rows and four columns. The element in the second row and the third column of this matrix would be denoted m23. More generally mij refers to the element in the ith row and the jth column. Thus a 2 × 4 matrix would have elements If a matrix has only one row, it is a row vector, which will be of dimension 1 × C, where C is the number of columns, e.g. 
(2.7 3.0 −1.5 0.3) ‘Introductory Econometrics for Finance’ © Chris Brooks 2013 38 Working with Matrices A matrix having only one column is a column vector, which will be of dimension R× 1, where R is the number of rows, e.g. When the number of rows and columns is equal (i.e. R = C), it would be said that the matrix is square, e.g. the 2 × 2 matrix: A matrix in which all the elements are zero is a zero matrix. ‘Introductory Econometrics for Finance’ © Chris Brooks 2013 39 Working with Matrices 2 A symmetric matrix is a special square matrix that is symmetric about the leading diagonal so that mij = mji ∀ i, j, e.g. A diagonal matrix is a square matrix which has non-zero terms on the leading diagonal and zeros everywhere else, e.g. ‘Introductory Econometrics for Finance’ © Chris Brooks 2013 40 Working with Matrices 3 A diagonal matrix with 1 in all places on the leading diagonal and zero everywhere else is known as the identity matrix, denoted by I, e.g. The identity matrix is essentially the matrix equivalent of the number one Multiplying any matrix by the identity matrix of the appropriate size results in the original matrix being left unchanged So for any matrix M, MI = IM = M In order to perform operations with matrices , they must be conformable The dimensions of matrices required for them to be conformable depend on the operation. ‘Introductory Econometrics for Finance’ © Chris Brooks 2013 41 Matrix Addition or Subtraction Addition and subtraction of matrices requires the matrices concerned to be of the same order (i.e. 
to have the same number of rows and the same number of columns as one another).
The operations are then performed element by element.

Matrix Multiplication
Multiplying or dividing a matrix by a scalar (that is, a single number) implies that every element of the matrix is multiplied or divided by that number.
More generally, for two matrices A and B of the same order and for c a scalar, the following results hold:
– A + B = B + A
– A + 0 = 0 + A = A
– cA = Ac
– c(A + B) = cA + cB
– A0 = 0A = 0

Matrix Multiplication
Multiplying two matrices together requires the number of columns of the first matrix to be equal to the number of rows of the second matrix.
Note also that the ordering of the matrices is important, so in general, AB ≠ BA.
When the matrices are multiplied together, the resulting matrix will be of size (number of rows of first matrix × number of columns of second matrix), e.g. (3 × 2) × (2 × 4) = (3 × 4).
More generally, (a × b) × (b × c) × (c × d) × (d × e) = (a × e), etc.
In general, matrices cannot be divided by one another.
– Instead, we multiply by the inverse.

Matrix Multiplication Example
The actual multiplication of the elements of the two matrices is done by multiplying along the rows of the first matrix and down the columns of the second. [worked example missing in source]

The Transpose of a Matrix
The transpose of a matrix, written A′ or $A^T$, is the matrix obtained by transposing (switching) the rows and columns of a matrix.
If A is of dimensions R × C, A′ will be C × R.

The Rank of a Matrix
The rank of a matrix A is given by the maximum number of linearly independent rows (or columns).
For example, [two example matrices missing in source]
In the first case, all rows and columns are (linearly) independent of one another, but in the second case, the second column is not independent of the first (the second column is simply twice the first).
A matrix with a rank equal to its dimension is a matrix of full rank.
A matrix that is less than of full rank is known as a short rank matrix, and is singular.
Three important results: Rank(A) = Rank(A′); Rank(AB) ≤ min(Rank(A), Rank(B)); Rank(A′A) = Rank(AA′) = Rank(A).

The Inverse of a Matrix
The inverse of a matrix A, where defined and denoted $A^{-1}$, is that matrix which, when pre-multiplied or post-multiplied by A, will result in the identity matrix, i.e. $AA^{-1} = A^{-1}A = I$.
The inverse of a matrix exists only when the matrix is square and non-singular.
Properties of the inverse of a matrix include:
– $I^{-1} = I$
– $(A^{-1})^{-1} = A$
– $(A')^{-1} = (A^{-1})'$
– $(AB)^{-1} = B^{-1}A^{-1}$

Calculating Inverse of a 2 × 2 Matrix
The inverse of a 2 × 2 non-singular matrix whose elements are
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
will be
$$\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
The expression in the denominator, (ad − bc), is the determinant of the matrix, and will be a scalar.
If the matrix is [matrix missing in source], the inverse will be [matrix missing in source].
As a check, multiply the two matrices together and it should give the identity matrix I.
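As a minimal sketch of the check just described, the Python snippet below applies the 2 × 2 inverse formula and multiplies the result back against the original matrix. The function names and the example matrix are illustrative assumptions introduced here, not values taken from the slides.

```python
def inverse_2x2(m):
    """Invert a 2 x 2 matrix [[a, b], [c, d]] using the 1/(ad - bc) formula."""
    (a, b), (c, d) = m
    det = a * d - b * c  # the determinant, a scalar
    if det == 0:
        raise ValueError("matrix is singular, so no inverse exists")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Multiply along the rows of p and down the columns of q."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical example matrix, chosen only for illustration.
A = [[2.0, 1.0], [4.0, 6.0]]
A_inv = inverse_2x2(A)
check = matmul_2x2(A, A_inv)  # should be (numerically) the identity matrix
```

A matrix with ad − bc = 0 raises an error here, which mirrors the statement that only non-singular matrices have an inverse.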
The Trace of a Matrix
The trace of a square matrix is the sum of the terms on its leading diagonal.
For example, the trace of a matrix A with 3 and 9 on its leading diagonal, written Tr(A), is 3 + 9 = 12.
Some important properties of the trace of a matrix are:
– Tr(cA) = cTr(A)
– Tr(A′) = Tr(A)
– Tr(A + B) = Tr(A) + Tr(B)
– Tr($I_N$) = N

The Eigenvalues of a Matrix
Let Π denote a p × p square matrix, c denote a p × 1 non-zero vector, and λ denote a set of scalars.
λ is called a characteristic root or set of roots of the matrix Π if it is possible to write $\Pi c = \lambda c$.
This equation can also be written as $\Pi c = \lambda I_p c$, where $I_p$ is an identity matrix, and hence $(\Pi - \lambda I_p)c = 0$.
Since c ≠ 0 by definition, then for this system to have a non-zero solution, the matrix $(\Pi - \lambda I_p)$ is required to be singular (i.e. to have a zero determinant), and thus $|\Pi - \lambda I_p| = 0$.

Calculating Eigenvalues: An Example
Let Π be the 2 × 2 matrix [matrix missing in source].
Then the characteristic equation is $|\Pi - \lambda I_p| = 0$.
This gives the solutions λ = 6 and λ = 3.
The characteristic roots are also known as eigenvalues.
The eigenvectors would be the values of c corresponding to the eigenvalues.

Portfolio Theory and Matrix Algebra - Basics
Probably the most important application of matrix algebra in finance is to solving portfolio allocation problems.
Suppose that we have a set of N stocks that are included in a portfolio P with weights $w_1, w_2, \ldots, w_N$ and suppose that their expected returns are written as $E(r_1), E(r_2), \ldots, E(r_N)$. We could write the N × 1 vectors of weights, w, and of expected returns, E(r), as
$$w = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{pmatrix}, \qquad E(r) = \begin{pmatrix} E(r_1) \\ E(r_2) \\ \vdots \\ E(r_N) \end{pmatrix}$$
The expected return on the portfolio, $E(r_P)$, can be calculated as E(r)′w.
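The product E(r)′w is simply an inner product of two N × 1 vectors. A minimal Python sketch follows; the weights and expected returns are made-up numbers for a hypothetical three-stock portfolio, and the helper name `dot` is an assumption introduced here.

```python
def dot(u, v):
    """Inner product u'v of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors are not conformable")
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical three-stock portfolio (weights sum to one).
w = [0.5, 0.3, 0.2]
expected_returns = [0.08, 0.12, 0.05]  # E(r_1), E(r_2), E(r_3), made-up values

portfolio_expected_return = dot(expected_returns, w)  # E(r)'w, a scalar
```

The conformability check reflects the (1 × N) × (N × 1) = (1 × 1) dimension rule for matrix multiplication.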
The Variance-Covariance Matrix
The variance-covariance matrix of the returns, denoted V, includes all of the variances of the components of the portfolio returns on the leading diagonal and the covariances between them as the off-diagonal elements.
The variance-covariance matrix of the returns may be written
$$V = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1N} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{N1} & \sigma_{N2} & \cdots & \sigma_{NN} \end{pmatrix}$$
For example:
– $\sigma_{11}$ is the variance of the returns on stock one, $\sigma_{22}$ is the variance of returns on stock two, etc.
– $\sigma_{12}$ is the covariance between the returns on stock one and those on stock two, etc.

Constructing the Variance-Covariance Matrix
In order to construct a variance-covariance matrix we would need to first set up a matrix, R, containing observations on the actual returns (not the expected returns) for each stock, where the mean, $\bar{r}_i$ (i = 1, ..., N), has been subtracted away from each series i. We would write
$$R = \begin{pmatrix} r_{11} & r_{21} & \cdots & r_{N1} \\ r_{12} & r_{22} & \cdots & r_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{1T} & r_{2T} & \cdots & r_{NT} \end{pmatrix}$$
The general entry, $r_{ij}$, is the jth time-series observation on the ith stock.
The variance-covariance matrix would then simply be calculated as V = (R′R)/(T − 1).

The Variance of Portfolio Returns
Suppose that we wanted to calculate the variance of returns on the portfolio P
– A scalar which we might call $V_P$
We would do this by calculating $V_P = w'Vw$.
Checking the dimension of $V_P$: w′ is (1 × N), V is (N × N) and w is (N × 1), so $V_P$ is (1 × N) × (N × N) × (N × 1), which is (1 × 1) as required.

The Correlation between Returns Series
We could define a correlation matrix of returns, C, which would be
$$C = \begin{pmatrix} 1 & C_{12} & \cdots & C_{1N} \\ C_{21} & 1 & \cdots & C_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ C_{N1} & C_{N2} & \cdots & 1 \end{pmatrix}$$
This matrix would have ones on the leading diagonal and the off-diagonal elements would give the correlations between each pair of returns.
Note that the correlation matrix will always be symmetrical about the leading diagonal.
Using the correlation matrix, the portfolio variance is $V_P = w'SCSw$, where S is a diagonal matrix containing the standard deviations of the returns on each stock in the portfolio.

Selecting Weights for the Minimum Variance Portfolio
Although in theory the optimal portfolio on the efficient frontier is better, a variance-minimising portfolio often performs well out-of-sample.
The portfolio weights w that minimise the portfolio variance, $V_P$, are written
$$\min_w \; w'Vw$$
We also need to be slightly careful to impose at least the restriction that all of the wealth has to be invested (weights sum to one).
This restriction is written as $w' \cdot 1_N = 1$, where $1_N$ is a column vector of ones of length N.
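Before turning to the solution, the pieces introduced so far can be checked numerically for any candidate weight vector. The Python sketch below demeans a small matrix of returns, forms V = R′R/(T − 1), evaluates $V_P = w'Vw$ and verifies the restriction that the weights sum to one. The return data, the weights and the function names are hypothetical, chosen only to illustrate the matrix operations.

```python
def demean_columns(returns):
    """Subtract each stock's mean return from its column (rows are time periods)."""
    T, N = len(returns), len(returns[0])
    means = [sum(row[i] for row in returns) / T for i in range(N)]
    return [[row[i] - means[i] for i in range(N)] for row in returns]


def covariance_matrix(returns):
    """V = R'R / (T - 1), with R the T x N matrix of demeaned returns."""
    R = demean_columns(returns)
    T, N = len(R), len(R[0])
    return [[sum(R[t][i] * R[t][j] for t in range(T)) / (T - 1)
             for j in range(N)] for i in range(N)]


def portfolio_variance(w, V):
    """V_P = w'Vw: a (1 x N)(N x N)(N x 1) product, i.e. a scalar."""
    N = len(w)
    return sum(w[i] * V[i][j] * w[j] for i in range(N) for j in range(N))


# Made-up returns (in %) on N = 2 stocks over T = 4 periods.
returns = [[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [1.0, 1.0]]
w = [0.6, 0.4]

V = covariance_matrix(returns)
V_P = portfolio_variance(w, V)
weights_sum_to_one = abs(sum(w) - 1.0) < 1e-12  # the restriction w'1_N = 1
```

The sketch only evaluates the variance for given weights; it does not search for the minimising weights.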
The minimisation problem can be solved to
$$w_{MVP} = \frac{V^{-1} 1_N}{1_N' V^{-1} 1_N}$$
where MVP stands for minimum variance portfolio.

Selecting Optimal Portfolio Weights
In order to trace out the mean-variance efficient frontier, we would repeatedly solve this minimisation problem but in each case set the portfolio’s expected return equal to a different target value, $\bar{R}$. We would write this as
$$\min_w \; w'Vw \quad \text{subject to} \quad w' \cdot 1_N = 1, \quad w'E(r) = \bar{R}$$
This is sometimes called the Markowitz portfolio allocation problem
– It can be solved analytically so we can derive an exact solution
But it is often the case that we want to place additional constraints on the optimisation, e.g.
– Restrict the weights so that none are greater than 10% of overall wealth
– Restrict them to all be positive (i.e. long positions only with no short selling)
In such cases the Markowitz portfolio allocation problem cannot be solved analytically and thus a numerical procedure must be used.

Selecting Optimal Portfolio Weights
If the procedure above is followed repeatedly for different return targets, it will trace out the efficient frontier.
In order to find the tangency point where the efficient frontier touches the capital market line, we need to solve the following problem
$$\max_w \; \frac{w'E(r) - r_f}{(w'Vw)^{1/2}} \quad \text{subject to} \quad w' \cdot 1_N = 1$$
where $r_f$ is the risk-free rate. If no additional constraints are required on weights, this can be solved as
$$w = \frac{V^{-1}[E(r) - r_f \cdot 1_N]}{1_N' V^{-1}[E(r) - r_f \cdot 1_N]}$$
Note that it is also possible to write the Markowitz problem where we select the portfolio weights that maximise the expected portfolio return subject to a target maximum variance level.

Chapter 3
A brief overview of the classical linear regression model

Regression
Regression is probably the single most important tool at the econometrician’s disposal.
But what is regression analysis?
It is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable) and one or more other variables (usually known as the independent variable(s)).

Some Notation
Denote the dependent variable by y and the independent variable(s) by $x_1, x_2, \ldots, x_k$ where there are k independent variables.
Some alternative names for the y and x variables:
y                       x
dependent variable      independent variables
regressand              regressors
effect variable         causal variables
explained variable      explanatory variable
Note that there can be many x variables but we will limit ourselves to the case where there is only one x variable to start with. In our set-up, there is only one y variable.

Regression is different from Correlation
If we say y and x are correlated, it means that we are treating y and x in a completely symmetrical way.
In regression, we treat the dependent variable (y) and the independent variable(s) (x’s) very differently. The y variable is assumed to be random or “stochastic” in some way, i.e. to have a probability distribution. The x variables are, however, assumed to have fixed (“non-stochastic”) values in repeated samples.

Simple Regression
For simplicity, say k = 1. This is the situation where y depends on only one x variable.
Examples of the kind of relationship that may be of interest include:
– How asset returns vary with their level of market risk
– Measuring the long-term relationship between stock prices and dividends.
– Constructing an optimal hedge ratio

Simple Regression: An Example
Suppose that we have the following data on the excess returns on a fund manager’s portfolio (“fund XXX”) together with the excess returns on a market index:
Year, t    Excess return                  Excess return on market index
           = $r_{XXX,t} - r_{ft}$         = $r_{mt} - r_{ft}$
1          17.8                           13.7
2          39.0                           23.2
3          12.8                           6.9
4          24.2                           16.8
5          17.2                           12.3
We have some intuition that the beta on this fund is positive, and we therefore want to find whether there appears to be a relationship between x and y given the data that we have. The first stage would be to form a scatter plot of the two variables.

Graph (Scatter Diagram)
[Figure: scatter diagram of excess return on fund XXX (vertical axis) against excess return on market portfolio (horizontal axis)]

Finding a Line of Best Fit
We can use the general equation for a straight line, y = a + bx, to get the line that best “fits” the data.
However, this equation (y = a + bx) is completely deterministic.
Is this realistic? No. So what we do is to add a random disturbance term, u, into the equation:
$y_t = \alpha + \beta x_t + u_t$, where t = 1, 2, 3, 4, 5

Why do we include a Disturbance term?
The disturbance term can capture a number of features:
– We always leave out some determinants of $y_t$
– There may be errors in the measurement of $y_t$ that cannot be modelled.
– Random outside influences on $y_t$ which we cannot model

Determining the Regression Coefficients
So how do we determine what α and β are?
Choose α and β so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible).
[Figure: data points plotted with y on the vertical axis and x on the horizontal axis, with a fitted line]

Ordinary Least Squares
The most common method used to fit a line to the data is known as OLS (ordinary least squares).
What we actually do is take each distance and square it (i.e. take the area of each of the squares in the diagram) and minimise the total sum of the squares (hence least squares).
Tightening up the notation, let
– $y_t$ denote the actual data point t
– $\hat{y}_t$ denote the fitted value from the regression line
– $\hat{u}_t$ denote the residual, $y_t - \hat{y}_t$

Actual and Fitted Value
[Figure: for an observation $x_i$, the actual value $y_i$, the fitted value $\hat{y}_i$ and the residual $\hat{u}_i$]

How OLS Works
So minimise $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise $\sum_{t=1}^{5} \hat{u}_t^2$. This is known as the residual sum of squares.
But what was $\hat{u}_t$? It was the difference between the actual point and the line, $y_t - \hat{y}_t$.
So minimising $\sum_t (y_t - \hat{y}_t)^2$ is equivalent to minimising $\sum_t \hat{u}_t^2$ with respect to $\hat{\alpha}$ and $\hat{\beta}$.

Deriving the OLS Estimator
But $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, so let
$$L = \sum_t (y_t - \hat{y}_t)^2 = \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$$
Want to minimise L with respect to (w.r.t.) $\hat{\alpha}$ and $\hat{\beta}$, so differentiate L w.r.t. $\hat{\alpha}$ and $\hat{\beta}$:
$$\frac{\partial L}{\partial \hat{\alpha}} = -2 \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (1)$$
$$\frac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \quad (2)$$
From (1), $\sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \Leftrightarrow \sum_t y_t - T\hat{\alpha} - \hat{\beta} \sum_t x_t = 0$
But $\sum_t y_t = T\bar{y}$ and $\sum_t x_t = T\bar{x}$.
Deriving the OLS Estimator (cont’d)
So we can write $T\bar{y} - T\hat{\alpha} - T\hat{\beta}\bar{x} = 0$ or $\bar{y} - \hat{\alpha} - \hat{\beta}\bar{x} = 0$ (3)
From (2), $\sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$ (4)
From (3), $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ (5)
Substitute into (4) for $\hat{\alpha}$ from (5),
$$\sum_t x_t (y_t - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta} x_t) = 0$$
$$\sum_t x_t y_t - \bar{y} \sum_t x_t + \hat{\beta}\bar{x} \sum_t x_t - \hat{\beta} \sum_t x_t^2 = 0$$
$$\sum_t x_t y_t - T\bar{y}\bar{x} + \hat{\beta} T \bar{x}^2 - \hat{\beta} \sum_t x_t^2 = 0$$

Deriving the OLS Estimator (cont’d)
Rearranging for $\hat{\beta}$,
$$\hat{\beta} \left( T\bar{x}^2 - \sum_t x_t^2 \right) = T\bar{y}\bar{x} - \sum_t x_t y_t$$
So overall we have
$$\hat{\beta} = \frac{\sum_t x_t y_t - T\bar{x}\bar{y}}{\sum_t x_t^2 - T\bar{x}^2} \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$
This method of finding the optimum is known as ordinary least squares.

What do We Use $\hat{\alpha}$ and $\hat{\beta}$ For?
In the CAPM example used above, plugging the 5 observations into the formulae given above would lead to the estimates $\hat{\alpha} = -1.74$ and $\hat{\beta} = 1.64$. We would write the fitted line as:
$\hat{y}_t = -1.74 + 1.64 x_t$
Question: If an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be?
Solution: We can say that the expected value of y = “−1.74 + 1.64 × value of x”, so plug x = 20 into the equation to get the expected value for y:
$\hat{y}_i = -1.74 + 1.64 \times 20 = 31.06$

Accuracy of Intercept Estimate
Care needs to be exercised when considering the intercept estimate, particularly if there are no or few observations close to the y-axis:
[Figure: y plotted against x, with the origin marked]

The Population and the Sample
The population is the total collection of all objects or people to be studied, for example,
Interested in                             Population of interest
predicting outcome of an election         the entire electorate
A sample is a selection of just some items from the population.
A random sample is a sample in which each individual item in the population is equally likely to be drawn.

The DGP and the PRF
The population regression function (PRF) is a description of the model that is thought to be generating the actual data and the true relationship between the variables (i.e. the true values of α and β).
The PRF is $y_t = \alpha + \beta x_t + u_t$
The SRF (sample regression function) is $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, and we also know that $\hat{u}_t = y_t - \hat{y}_t$.
We use the SRF to infer likely values of the PRF.
We also want to know how “good” our estimates of α and β are.

Linearity
In order to use OLS, we need a model which is linear in the parameters (α and β). It does not necessarily have to be linear in the variables (y and x).
Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed etc.
Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model
$$Y_t = e^{\alpha} X_t^{\beta} e^{u_t} \Leftrightarrow \ln Y_t = \alpha + \beta \ln X_t + u_t$$
Then let $y_t = \ln Y_t$ and $x_t = \ln X_t$:
$$y_t = \alpha + \beta x_t + u_t$$

Linear and Non-linear Models
This is known as the exponential regression model. Here, the coefficients can be interpreted as elasticities.
Similarly, if theory suggests that y and x should be inversely related:
$$y_t = \alpha + \frac{\beta}{x_t} + u_t$$
then the regression can be estimated using OLS by substituting $z_t = \frac{1}{x_t}$
But some models are intrinsically non-linear, e.g.
$$y_t = \alpha + x_t^{\beta} + u_t$$

Estimator or Estimate?
Estimators are the formulae used to calculate the coefficients.
Estimates are the actual numerical values for the coefficients.
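The distinction can be made concrete with a short Python sketch: the function below plays the role of the estimator (it encodes the OLS formulae), while the numbers it returns for the fund XXX data are the estimates. The function name `ols_estimator` is an assumption introduced here for illustration.

```python
def ols_estimator(x, y):
    """The estimator: the OLS formulae for the intercept and slope."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    sum_xy = sum(xt * yt for xt, yt in zip(x, y))
    sum_xx = sum(xt * xt for xt in x)
    beta_hat = (sum_xy - T * x_bar * y_bar) / (sum_xx - T * x_bar ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Excess returns on fund XXX (y) and on the market index (x) from the example.
y = [17.8, 39.0, 12.8, 24.2, 17.2]
x = [13.7, 23.2, 6.9, 16.8, 12.3]

# The estimates: the numbers the estimator produces for this sample.
alpha_hat, beta_hat = ols_estimator(x, y)
print(round(alpha_hat, 2), round(beta_hat, 2))  # -1.74 1.64
```

Feeding a different sample into the same function would give different estimates, while the estimator itself stays unchanged.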
The Assumptions Underlying the Classical Linear Regression Model (CLRM)
The model which we have used is known as the classical linear regression model.
We observe data for $x_t$, but since $y_t$ also depends on $u_t$, we must be specific about how the $u_t$ are generated.
We usually make the following set of assumptions about the $u_t$’s (the unobservable error terms), given as technical notation followed by its interpretation:
1. $E(u_t) = 0$: The errors have zero mean
2. $Var(u_t) = \sigma^2$: The variance of the errors is constant and finite over all values of $x_t$
3. $Cov(u_i, u_j) = 0$ for i ≠ j: The errors are statistically independent of one another
4. $Cov(u_t, x_t) = 0$: No relationship between the error and corresponding x variate

The Assumptions Underlying the CLRM Again
An alternative assumption to 4., which is slightly stronger, is that the $x_t$’s are non-stochastic or fixed in repeated samples.
A fifth assumption is required if we want to make inferences about the population parameters (the actual α and β) from the sample parameters ($\hat{\alpha}$ and $\hat{\beta}$).
Additional Assumption
5. $u_t$ is normally distributed

Properties of the OLS Estimator
If assumptions 1. through 4. hold, then the estimators $\hat{\alpha}$ and $\hat{\beta}$ determined by OLS are known as Best Linear Unbiased Estimators (BLUE).
What does the acronym stand for?
– “Estimator” - $\hat{\beta}$ is an estimator of the true value of β.
– “Linear” - $\hat{\beta}$ is a linear estimator
– “Unbiased” - On average, the actual values of $\hat{\alpha}$ and $\hat{\beta}$ will be equal to the true values.
– “Best” - means that the OLS estimator $\hat{\beta}$ has minimum variance among the class of linear unbiased estimators. The Gauss-Markov theorem proves that the OLS estimator is best.

Consistency/Unbiasedness/Efficiency
Consistent
The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are consistent.
That is, the estimates will converge to their true values as the sample size increases to infinity. Need the assumptions $E(x_t u_t) = 0$ and $Var(u_t) = \sigma^2 < \infty$ to prove this. Consistency implies that
$$\lim_{T \to \infty} \Pr\left[ |\hat{\beta} - \beta| > \delta \right] = 0 \quad \forall\, \delta > 0$$
Unbiased
The least squares estimates $\hat{\alpha}$ and $\hat{\beta}$ are unbiased. That is, $E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$.
Thus on average the estimated value will be equal to the true values. To prove this also requires the assumption that $E(u_t) = 0$. Unbiasedness is a stronger condition than consistency.
Efficiency
An estimator $\hat{\beta}$ of parameter β is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of β.

Precision and Standard Errors
Any set of regression estimates of $\hat{\alpha}$ and $\hat{\beta}$ are specific to the sample used in their estimation.
Recall that the estimators of α and β from the sample parameters ($\hat{\alpha}$ and $\hat{\beta}$) are given by
$$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2} \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$
What we need is some measure of the reliability or precision of the estimators ($\hat{\alpha}$ and $\hat{\beta}$). The precision of the estimate is given by its standard error. Given assumptions 1 - 4 above, then the standard errors can be shown to be given by
$$SE(\hat{\alpha}) = s \sqrt{\frac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}} = s \sqrt{\frac{\sum x_t^2}{T \sum x_t^2 - T^2 \bar{x}^2}}$$
$$SE(\hat{\beta}) = s \sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} = s \sqrt{\frac{1}{\sum x_t^2 - T \bar{x}^2}}$$
where s is the estimated standard deviation of the residuals.

Estimating the Variance of the Disturbance Term
The variance of the random variable $u_t$ is given by $Var(u_t) = E[(u_t - E(u_t))^2]$, which reduces to $Var(u_t) = E(u_t^2)$ since $E(u_t) = 0$.
We could estimate this using the average of $u_t^2$:
$$s^2 = \frac{1}{T} \sum u_t^2$$
Unfortunately this is not workable since $u_t$ is not observable. We can use the sample counterpart to $u_t$, which is $\hat{u}_t$:
$$s^2 = \frac{1}{T} \sum \hat{u}_t^2$$
But this estimator is a biased estimator of σ².
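To make the sample counterpart concrete, here is a minimal Python sketch that fits the line to the fund XXX data used earlier, forms the residuals and computes $(1/T)\sum \hat{u}_t^2$. The helper name `fit_line` and the variable names are assumptions introduced here for illustration.

```python
def fit_line(x, y):
    """OLS intercept and slope for a regression of y on x and a constant."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    beta_hat = (sum(a * b for a, b in zip(x, y)) - T * x_bar * y_bar) / (
        sum(a * a for a in x) - T * x_bar ** 2)
    return y_bar - beta_hat * x_bar, beta_hat


# Fund XXX excess returns (y) and market excess returns (x) from the earlier example.
y = [17.8, 39.0, 12.8, 24.2, 17.2]
x = [13.7, 23.2, 6.9, 16.8, 12.3]

alpha_hat, beta_hat = fit_line(x, y)
residuals = [yt - (alpha_hat + beta_hat * xt) for xt, yt in zip(x, y)]  # u-hat_t
rss = sum(u * u for u in residuals)  # residual sum of squares
s2_naive = rss / len(x)              # (1/T) * sum of squared residuals
```

Because the residuals satisfy the two first-order conditions (1) and (2), only T − 2 of them are free to vary, which gives some intuition for why dividing by T is biased.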
Estimating the Variance of the Disturbance Term (cont’d)
An unbiased estimator of σ² is given by $s^2 = \frac{\sum \hat{u}_t^2}{T - 2}$, so that
$$s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}}$$
where $\sum \hat{u}_t^2$ is the residual sum of squares and T is the sample size.
Some Comments on the Standard Error Estimators
1. Both $SE(\hat{\alpha})$ and $SE(\hat{\beta})$ depend on s² (or s). The greater the variance s², then the more dispersed the errors are about their mean value and therefore the more dispersed y will be about its mean value.
2. The sum of the squares of x about their mean appears in both formulae. The larger the sum of squares, the smaller the coefficient variances.

Some Comments on the Standard Error Estimators
Consider what happens if $\sum (x_t - \bar{x})^2$ is small or large:
[Figure: scatter plots of y against x]

Some Comments on the Standard Error Estimators (cont’d)
3. The larger the sample size, T, the smaller will be the coefficient variances. T appears explicitly in $SE(\hat{\alpha})$ and implicitly in $SE(\hat{\beta})$. T appears implicitly since the sum $\sum (x_t - \bar{x})^2$ is from t = 1 to T.
4. The term $\sum x_t^2$ appears in the $SE(\hat{\alpha})$. The reason is that $\sum x_t^2$ measures how far the points are away from the y-axis.

Example: How to Calculate the Parameters and Standard Errors
Assume we have the following data calculated from a regression of y on a single variable x and a constant over 22 observations.
Data: $\sum x_t y_t = 830102$, T = 22, $\bar{x} = 416.5$, $\bar{y} = 86.65$, $\sum x_t^2 = 3919654$, RSS = 130.6
Calculations:
$$\hat{\beta} = \frac{830102 - (22 \times 416.5 \times 86.65)}{3919654 - 22 \times (416.5)^2} = 0.35$$
$$\hat{\alpha} = 86.65 - 0.35 \times 416.5 = -59.12$$
We write $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$:
$\hat{y}_t = -59.12 + 0.35 x_t$

Example (cont’d)
SE(regression), $s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}} = \sqrt{\frac{130.6}{20}} = 2.55$
$$SE(\hat{\alpha}) = 2.55 \times \sqrt{\frac{3919654}{22 \times (3919654 - 22 \times 416.5^2)}} = 3.35$$
$$SE(\hat{\beta}) = 2.55 \times \sqrt{\frac{1}{3919654 - 22 \times 416.5^2}} = 0.0079$$
We now write the results as (standard errors in parentheses)
$$\hat{y}_t = \underset{(3.35)}{-59.12} + \underset{(0.0079)}{0.35}\, x_t$$

An Introduction to Statistical Inference
We want to make inferences about the likely population values from the regression parameters.
Example: Suppose we have the following regression results:
$$\hat{y}_t = \underset{(14.38)}{20.3} + \underset{(0.2561)}{0.5091}\, x_t$$
$\hat{\beta} = 0.5091$ is a single (point) estimate of the unknown population parameter, β. How “reliable” is this estimate?
The reliability of the point estimate is measured by the coefficient’s standard error.

Hypothesis Testing: Some Concepts
We can use the information in the sample to make inferences about the population.
We will always have two hypotheses that go together, the null hypothesis (denoted H0) and the alternative hypothesis (denoted H1).
The null hypothesis is the statement or the statistical hypothesis that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest.
For example, suppose given the regression results above, we are interested in the hypothesis that the true value of β is in fact 0.5. We would use the notation
H0: β = 0.5
H1: β ≠ 0.5
This would be known as a two-sided test.

One-Sided Hypothesis Tests
Sometimes we may have some prior information that, for example, we would expect β > 0.5 rather than β < 0.5.
In this case, we would do a one-sided test:
H0: β = 0.5
H1: β > 0.5
or we could have had
H0: β = 0.5
H1: β < 0.5
There are two ways to conduct a hypothesis test: via the test of significance approach or via the confidence interval approach.

The Probability Distribution of the Least Squares Estimators
We assume that $u_t \sim N(0, \sigma^2)$.
The least squares estimators are linear combinations of the random variables, i.e. $\hat{\beta} = \sum w_t y_t$.
The weighted sum of normal random variables is also normally distributed, so
$\hat{\alpha} \sim N(\alpha, Var(\hat{\alpha}))$
$\hat{\beta} \sim N(\beta, Var(\hat{\beta}))$
What if the errors are not normally distributed? Will the parameter estimates still be normally distributed?
Yes, if the other assumptions of the CLRM hold, and the sample size is sufficiently large.

The Probability Distribution of the Least Squares Estimators (cont’d)
Standard normal variates can be constructed from $\hat{\alpha}$ and $\hat{\beta}$:
$$\frac{\hat{\alpha} - \alpha}{\sqrt{Var(\hat{\alpha})}} \sim N(0,1) \quad \text{and} \quad \frac{\hat{\beta} - \beta}{\sqrt{Var(\hat{\beta})}} \sim N(0,1)$$
But $Var(\hat{\alpha})$ and $Var(\hat{\beta})$ are unknown, so
$$\frac{\hat{\alpha} - \alpha}{SE(\hat{\alpha})} \sim t_{T-2} \quad \text{and} \quad \frac{\hat{\beta} - \beta}{SE(\hat{\beta})} \sim t_{T-2}$$

Testing Hypotheses: The Test of Significance Approach
Assume the regression equation is given by $y_t = \alpha + \beta x_t + u_t$ for t = 1, 2, ..., T.
The steps involved in doing a test of significance are:
1. Estimate $\hat{\alpha}$, $\hat{\beta}$ and $SE(\hat{\alpha})$, $SE(\hat{\beta})$ in the usual way
2. Calculate the test statistic. This is given by the formula
$$\text{test statistic} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})}$$
where β* is the value of β under the null hypothesis.

The Test of Significance Approach (cont’d)
3. We need some tabulated distribution with which to compare the estimated test statistics. Test statistics derived in this way can be shown to follow a t-distribution with T-2 degrees of freedom.
As the number of degrees of freedom increases, we need to be less cautious in our approach since we can be more sure that our results are robust.
4. We need to choose a “significance level”, often denoted α. This is also sometimes called the size of the test and it determines the region where we will reject or not reject the null hypothesis that we are testing. It is conventional to use a significance level of 5%.
Intuitive explanation is that we would only expect a result as extreme as this or more extreme 5% of the time as a consequence of chance alone.
Conventional to use a 5% size of test, but 10% and 1% are also commonly used.

Determining the Rejection Region for a Test of Significance
5. Given a significance level, we can determine a rejection region and non-rejection region. For a 2-sided test:
[Figure: density f(x) with a 2.5% rejection region in each tail and a 95% non-rejection region in the centre]

The Rejection Region for a 1-Sided Test (Upper Tail)
[Figure: density f(x) with a 95% non-rejection region and a 5% rejection region in the upper tail]

The Rejection Region for a 1-Sided Test (Lower Tail)
[Figure: density f(x) with a 95% non-rejection region and a 5% rejection region in the lower tail]

The Test of Significance Approach: Drawing Conclusions
6. Use the t-tables to obtain a critical value or values with which to compare the test statistic.
7. Finally perform the test. If the test statistic lies in the rejection region then reject the null hypothesis (H0), else do not reject H0.

A Note on the t and the Normal Distribution
You should all be familiar with the normal distribution and its characteristic “bell” shape.
We can scale a normal variate to have zero mean and unit variance by subtracting its mean and dividing by its standard deviation.
There is, however, a specific relationship between the t- and the standard normal distribution. Both are symmetrical and centred on zero. The t-distribution has another parameter, its degrees of freedom. We will always know this (for the time being from the number of observations − 2).

What Does the t-Distribution Look Like?
[Figure: densities of the normal distribution and the t-distribution]

Comparing the t and the Normal Distribution
In the limit, a t-distribution with an infinite number of degrees of freedom is a standard normal, i.e. t(∞) = N(0,1).
Examples from statistical tables:
Significance level    N(0,1)    t(40)    t(4)
50%                   0         0        0
5%                    1.64      1.68     2.13
2.5%                  1.96      2.02     2.78
0.5%                  2.57      2.70     4.60
The reason for using the t-distribution rather than the standard normal is that we had to estimate σ², the variance of the disturbances.

The Confidence Interval Approach to Hypothesis Testing
An example of its usage: We estimate a parameter, say, to be 0.93, and a “95% confidence interval” to be (0.77, 1.09). This means that we are 95% confident that the interval contains the true (but unknown) value of β.
Confidence intervals are almost invariably two-sided, although in theory a one-sided interval can be constructed.

How to Carry out a Hypothesis Test Using Confidence Intervals
1. Calculate $\hat{\alpha}$, $\hat{\beta}$ and $SE(\hat{\alpha})$, $SE(\hat{\beta})$ as before.
2. Choose a significance level, α (again the convention is 5%). This is equivalent to choosing a (1 − α) × 100% confidence interval, i.e. 5% significance level = 95% confidence interval
3. Use the t-tables to find the appropriate critical value, which will again have T-2 degrees of freedom.
4. The confidence interval is given by $(\hat{\beta} - t_{crit} \times SE(\hat{\beta}),\ \hat{\beta} + t_{crit} \times SE(\hat{\beta}))$
5.
Perform the test: If the hypothesised value of β (β*) lies outside the confidence interval, then reject the null hypothesis that β = β*, otherwise do not reject the null.

Confidence Intervals Versus Tests of Significance
Note that the Test of Significance and Confidence Interval approaches always give the same answer.
Under the test of significance approach, we would not reject H0 that β = β* if the test statistic lies within the non-rejection region, i.e. if
$$-t_{crit} \le \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} \le +t_{crit}$$
Rearranging, we would not reject if
$$-t_{crit} \times SE(\hat{\beta}) \le \hat{\beta} - \beta^* \le +t_{crit} \times SE(\hat{\beta})$$
$$\hat{\beta} - t_{crit} \times SE(\hat{\beta}) \le \beta^* \le \hat{\beta} + t_{crit} \times SE(\hat{\beta})$$
But this is just the rule under the confidence interval approach.

Constructing Tests of Significance and Confidence Intervals: An Example
Using the regression results above (standard errors in parentheses),
$$\hat{y}_t = \underset{(14.38)}{20.3} + \underset{(0.2561)}{0.5091}\, x_t, \quad T = 22$$
Using both the test of significance and confidence interval approaches, test the hypothesis that β = 1 against a two-sided alternative.
The first step is to obtain the critical value. We want $t_{crit} = t_{20;5\%}$

Determining the Rejection Region
[Figure: density f(x) with 2.5% rejection regions below −2.086 and above +2.086]

Performing the Test
The hypotheses are:
H0: β = 1
H1: β ≠ 1
Test of significance approach:
$$\text{test stat} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} = \frac{0.5091 - 1}{0.2561} = -1.917$$
Do not reject H0 since the test stat lies within the non-rejection region.
Confidence interval approach:
$$\hat{\beta} \pm t_{crit} \times SE(\hat{\beta}) = 0.5091 \pm 2.086 \times 0.2561 = (-0.0251, 1.0433)$$
Since 1 lies within the confidence interval, do not reject H0.

Testing other Hypotheses
What if we wanted to test H0: β = 0 or H0: β = 2?
Note that we can test these with the confidence interval approach.

For interest (!), test
$H_0: \beta = 0$ vs. $H_1: \beta \ne 0$
$H_0: \beta = 2$ vs. $H_1: \beta \ne 2$

Changing the Size of the Test

But note that we looked at only a 5% size of test. In marginal cases (e.g. $H_0: \beta = 1$), we may get a completely different answer if we use a different size of test. This is where the test of significance approach is better than a confidence interval.

For example, say we wanted to use a 10% size of test. Using the test of significance approach,

$$\text{test stat} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} = \frac{0.5091 - 1}{0.2561} = -1.917$$

as above. The only thing that changes is the critical t-value.

Changing the Size of the Test: The New Rejection Regions

[Figure: density f(x) with a 5% rejection region in each tail, bounded by -1.725 and +1.725]

Changing the Size of the Test: The Conclusion

$t_{20;10\%} = 1.725$. So now, as the test statistic lies in the rejection region, we would reject $H_0$.

Caution should therefore be used when placing emphasis on or making decisions in marginal cases (i.e. in cases where we only just reject or not reject).

Some More Terminology

If we reject the null hypothesis at the 5% level, we say that the result of the test is statistically significant.

Note that a statistically significant result may be of no practical significance. E.g. if a shipment of cans of beans is expected to weigh 450g per tin, but the actual mean weight of some tins is 449g, the result may be highly statistically significant but presumably nobody would care about 1g of beans.

The Errors That We Can Make Using Hypothesis Tests

We usually reject $H_0$ if the test statistic is statistically significant at a chosen significance level.
There are two possible errors we could make:
1. Rejecting $H_0$ when it was really true. This is called a type I error.
2. Not rejecting $H_0$ when it was in fact false. This is called a type II error.

                                      Reality
Result of test                        H0 is true                H0 is false
Significant (reject H0)               Type I error = $\alpha$   ✓
Insignificant (do not reject H0)      ✓                         Type II error = $\beta$

The Trade-off Between Type I and Type II Errors

The probability of a type I error is just $\alpha$, the significance level or size of test we chose. To see this, recall what we said significance at the 5% level meant: it is only 5% likely that a result as extreme as this, or more extreme, could have occurred purely by chance.

Note that there is no chance for a free lunch here! What happens if we reduce the size of the test (e.g. from a 5% test to a 1% test)? We reduce the chances of making a type I error... but we also reduce the probability that we will reject the null hypothesis at all, so we increase the probability of a type II error:

Reduce size of test → more strict criterion for rejection → reject null hypothesis less often → less likely to falsely reject (type I error), but more likely to incorrectly not reject (type II error)

So there is always a trade-off between type I and type II errors when choosing a significance level. The only way we can reduce the chances of both is to increase the sample size.

A Special Type of Hypothesis Test: The t-ratio

Recall that the formula for a test of significance approach to hypothesis testing using a t-test was

$$\text{test statistic} = \frac{\hat{\beta}_i - \beta_i^*}{SE(\hat{\beta}_i)}$$

If the test is
$H_0: \beta_i = 0$
$H_1: \beta_i \ne 0$
i.e. a test that the population coefficient is zero against a two-sided alternative, this is known as a t-ratio test:

$$\text{Since } \beta_i^* = 0, \quad \text{test stat} = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$$

The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
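
As a rough illustration, the worked example from earlier in this section ($\hat{\beta} = 0.5091$, $SE(\hat{\beta}) = 0.2561$, $T = 22$) can be reproduced in a few lines of Python. This is only a sketch: the function names are hypothetical, and the critical values 2.086 and 1.725 are copied from the t-table values quoted in the slides rather than computed. It checks that the test of significance and the confidence interval approach agree for $\beta^* = 0, 1, 2$, and that the conclusion for $\beta^* = 1$ changes when the size of the test moves from 5% to 10%.

```python
# Sketch only: names are illustrative, and the two-sided critical values for
# 20 degrees of freedom are taken from the slides' t-tables, not computed.
T_CRIT_20DF = {0.05: 2.086, 0.10: 1.725}


def t_statistic(beta_hat, beta_star, se):
    """(beta_hat - beta_star) / SE(beta_hat); this is the t-ratio when beta_star = 0."""
    return (beta_hat - beta_star) / se


def confidence_interval(beta_hat, se, t_crit):
    """Two-sided interval beta_hat -/+ t_crit * SE(beta_hat)."""
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width


def reject_by_test_stat(beta_hat, beta_star, se, t_crit):
    """Test of significance approach: reject if |test stat| exceeds t_crit."""
    return abs(t_statistic(beta_hat, beta_star, se)) > t_crit


def reject_by_interval(beta_hat, beta_star, se, t_crit):
    """Confidence interval approach: reject if beta_star lies outside the interval."""
    lower, upper = confidence_interval(beta_hat, se, t_crit)
    return not (lower <= beta_star <= upper)


beta_hat, se = 0.5091, 0.2561  # slope estimate and standard error from the example

for size, t_crit in T_CRIT_20DF.items():
    for beta_star in (0.0, 1.0, 2.0):
        by_stat = reject_by_test_stat(beta_hat, beta_star, se, t_crit)
        by_ci = reject_by_interval(beta_hat, beta_star, se, t_crit)
        # The two approaches are algebraically equivalent, so they must agree.
        assert by_stat == by_ci
        decision = "reject" if by_stat else "do not reject"
        print(f"size={size:.0%}  H0: beta={beta_star:g}  "
              f"stat={t_statistic(beta_hat, beta_star, se):.3f}  {decision}")
```

Hard-coding the critical values keeps the sketch dependency-free; with a statistics library available, they would normally be obtained from the t-distribution's quantile function instead.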

The t-ratio: An Example

Suppose that we have the following parameter estimates, standard errors and t-ratios for an intercept and slope respectively.

               Intercept    Slope
Coefficient    1.10         -4.40
SE             1.35         0.96
t-ratio        0.81         -4.63

Compare this with a $t_{crit}$ with 15-3 = 12 d.f. (2½% in each tail for a 5% test):
= 2.179 at 5%
= 3.055 at 1%

Do we reject $H_0: \beta_1 = 0$? (No)
Do we reject $H_0: \beta_2 = 0$? (Yes)

What Does the t-ratio tell us?

If we reject $H_0$, we say that the result is significant. If the coefficient is not "significant" (e.g. the intercept coefficient in the last regression above), then it means that the variable is not helping to explain variations in y. Variables that are not significant are usually removed from the regression model.

In practice there are good statistical reasons for always having a constant even if it is not significant. Look at what happens if no intercept is included:

[Figure: plot of $y_t$ against $x_t$]

An Example of the Use of a Simple t-test to Test a Theory in Finance

Testing for the presence and significance of abnormal returns ("Jensen's alpha" - Jensen, 1968).

The Data: Annual returns on the portfolios of 115 mutual funds from 1945-1964.

The model:

$$R_{jt} - R_{ft} = \alpha_j + \beta_j (R_{mt} - R_{ft}) + u_{jt} \quad \text{for } j = 1, \ldots, 115$$

We are interested in the significance of $\alpha_j$. The null hypothesis is $H_0: \alpha_j = 0$.

[Figure: Frequency Distribution of t-ratios of Mutual Fund Alphas (gross of transactions costs). Source: Jensen (1968). Reprinted with the permission of Blackwell publishers.]

[Figure: Frequency Distribution of t-ratios of Mutual Fund Alphas (net of transactions costs). Source: Jensen (1968). Reprinted with the permission of Blackwell publishers.]
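
Jensen's model is a bivariate regression of a fund's excess return on the market's excess return, so the t-ratio on $\alpha_j$ can be obtained from the usual bivariate OLS formulas with $T-2$ degrees of freedom. The sketch below is one possible way to compute it, not Jensen's own procedure: the function name `jensen_regression` is hypothetical, and the excess-return figures are made up purely for illustration.

```python
import math


def jensen_regression(fund_excess, market_excess):
    """Bivariate OLS of fund excess returns on market excess returns.

    Returns (alpha_hat, beta_hat, se_alpha, se_beta, t_ratio_alpha).
    """
    T = len(fund_excess)
    x_bar = sum(market_excess) / T
    y_bar = sum(fund_excess) / T

    # Sums of squares and cross-products in deviations from the means.
    s_xx = sum((x - x_bar) ** 2 for x in market_excess)
    s_xy = sum((x - x_bar) * (y - y_bar)
               for x, y in zip(market_excess, fund_excess))

    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar

    # Estimated disturbance variance: residual sum of squares over T - 2.
    residuals = [y - alpha_hat - beta_hat * x
                 for x, y in zip(market_excess, fund_excess)]
    s2 = sum(u ** 2 for u in residuals) / (T - 2)

    se_beta = math.sqrt(s2 / s_xx)
    se_alpha = math.sqrt(s2 * sum(x ** 2 for x in market_excess) / (T * s_xx))

    # t-ratio for H0: alpha = 0 is the estimate divided by its standard error.
    return alpha_hat, beta_hat, se_alpha, se_beta, alpha_hat / se_alpha


# Hypothetical excess returns (in percent) for one fund; NOT Jensen's data.
market_excess = [-2.0, -1.0, 0.0, 1.0, 2.0]
fund_excess = [-0.9, -2.7, 2.5, -0.3, 3.9]

alpha_hat, beta_hat, se_alpha, se_beta, t_alpha = jensen_regression(
    fund_excess, market_excess)
print(f"alpha = {alpha_hat:.3f} (SE {se_alpha:.3f}), t-ratio = {t_alpha:.3f}")
print(f"beta  = {beta_hat:.3f} (SE {se_beta:.3f})")
```

The resulting t-ratio would then be compared with a critical value from the t-distribution with $T-2$ degrees of freedom, exactly as in the t-ratio test described above.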

Can UK Unit Trust Managers "Beat the Market"?

We now perform a variant on Jensen's test in the context of the UK market, considering monthly returns on 76 equity unit trusts. The data cover the period January 1979 – May 2000 (257 observations for each fund).

Some summary statistics for the funds are:

                                           Mean    Minimum    Maximum    Median
Average monthly return, 1979-2000          1.0%    0.6%       1.4%       1.0%
Standard deviation of returns over time    5.1%    4.3%       6.9%       5.0%

Jensen Regression Results for UK Unit Trust Returns, January 1979-May 2000

$$R_{jt} - R_{ft} = \alpha_j + \beta_j (R_{mt} - R_{ft}) + \epsilon_{jt}$$

Can UK Unit Trust Managers "Beat the Market"?: Results

Estimates of           Mean      Minimum    Maximum    Median
$\alpha$               -0.02%    -0.54%     0.33%      -0.03%
$\beta$                0.91      0.56       1.09       0.91
t-ratio on $\alpha$    -0.07     -2.44      3.11       -0.25

In fact, gross of transactions costs, 9 funds of the sample of 76 were able to significantly out-perform the market by providing a significant positive alpha, while 7 funds yielded significant negative alphas.

The Overreaction Hypothesis and the UK Stock Market

Motivation

Two studies by DeBondt and Thaler (1985, 1987) showed that stocks which experience a poor performance over a 3 to 5 year period tend to outperform stocks which had previously performed
