Chapter 5  Sample Moments
Financial Risk Manager Exam Part I: Quantitative Analysis

Learning Objectives

After completing this reading you should be able to:

• Estimate the mean, variance, and standard deviation using sample data.
• Explain the difference between a population moment and a sample moment.
• Distinguish between an estimator and an estimate.
• Describe the bias of an estimator and explain what the bias measures.
• Explain what is meant by the statement that the mean estimator is BLUE.
• Describe the consistency of an estimator and explain the usefulness of this concept.
• Explain how the Law of Large Numbers (LLN) and Central Limit Theorem (CLT) apply to the sample mean.
• Estimate and interpret the skewness and kurtosis of a random variable.
• Use sample data to estimate quantiles, including the median.
• Estimate the mean of two variables and apply the CLT.
• Estimate the covariance and correlation between two random variables.
• Explain how coskewness and cokurtosis are related to skewness and kurtosis.

This chapter describes how sample moments are used to estimate unknown population moments. In particular, this chapter pays special attention to the estimation of the mean, because when data are generated from independent and identically distributed (iid) random variables, the mean estimator has several desirable properties:

• It is (on average) equal to the population mean.
• As the number of observations grows, the sample mean becomes arbitrarily close to the population mean.
• The distribution of the sample mean can be approximated using a standard normal distribution.

This final property is widely used to test hypotheses about population parameters using observed data.

Data can also be used to estimate higher-order moments such as variance, skewness, and kurtosis. The first four (standardized) moments (i.e., mean, variance, skewness, and kurtosis) are widely used in finance and risk management to describe the key features of data sets.

Quantiles provide an alternative method to describe the distribution of a data set. Quantile measures are particularly useful in applications to financial data because they are robust to extreme outliers. Finally, this chapter shows how univariate sample moments extend to more than one data series.

5.1 THE FIRST TWO MOMENTS

Estimating the Mean

When working with random variables, we are interested in the value of population parameters such as the mean (μ) or the variance (σ²). However, these values are not observable, and so data are used to estimate them. The population mean is estimated using the sample mean (i.e., the average) of the data. The mean estimator is defined as:

    \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i    (5.1)

A mean estimate μ̂ is obtained from Equation 5.1 by replacing the random variables (i.e., Xᵢ) with their observed values (i.e., xᵢ). The hat notation (ˆ) is used to distinguish an estimator or a sample estimate (in this case μ̂) from the unknown population parameter (in this case μ) of interest. Note that X̄ is another common symbol for the sample mean.

The mean estimator is a function that transforms the data into an estimate of the population mean. More generally, an estimator is a mathematical procedure that calculates an estimate based on an observed data set. In contrast, an estimate is the value produced by an application of the estimator to data.¹

The mean estimator is a function of random variables, and so it is also a random variable. Its properties can be examined using the tools developed in the previous chapters. The expectation of the mean estimator is

    E[\hat{\mu}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu    (5.2)

The expected value of the mean estimator is therefore the same as the population mean.² While the primary focus is on the case where the Xᵢ are iid, this result shows that the expectation of the mean estimator is μ whenever the mean is constant (i.e., E[Xᵢ] = μ for all i). This property is useful in applications to time series that are not always iid but have a constant mean.

The bias of an estimator is defined as:

    \operatorname{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta    (5.3)

where θ is the true (or population) value of the parameter that we are estimating (e.g., μ or σ²). The bias measures the difference between the expected value of the estimator and the population value being estimated. Applying this definition to the mean:

    \operatorname{Bias}(\hat{\mu}) = E[\hat{\mu}] - \mu = \mu - \mu = 0

Because the mean estimator's bias is zero, it is unbiased.

The variance of the mean estimator can also be computed using the standard properties of random variables.³ Recall that the general formula for the variance of a sum is the sum of the variances plus any covariances between the random variables:

    V[\hat{\mu}] = V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2} V\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\left(\sum_{i=1}^{n} V[X_i] + \text{covariances}\right)    (5.4)

Since the Xᵢ are iid, these variables are uncorrelated and so have zero covariance. Thus:

    V[\hat{\mu}] = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] = \frac{1}{n^2}(n\sigma^2) = \sigma^2/n    (5.5)

The variance of the mean estimator depends on two values: the variance of the data (i.e., σ²) and the number of observations (i.e., n). The variance in the data is noise that obscures the mean; the more variable the data, the harder it is to estimate the mean of that data. The variance of the mean estimator also decreases as the number of observations increases, and so larger samples produce estimates of the mean that tend to be closer to the population mean. This occurs because there is more information in the sample when the number of data points is larger, increasing the accuracy of the estimated mean.

Estimating the Variance and Standard Deviation

Recall that the variance of a random variable is defined as:

    \sigma^2 = V[X] = E[(X - E[X])^2]

Given a set of n iid random variables Xᵢ, the sample estimator of the variance is

    \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat{\mu})^2    (5.6)

A sample variance estimate (i.e., the value of σ̂²) is obtained from Equation 5.6 by replacing the Xᵢ with the observed values xᵢ.

Note that the sample variance estimator and the definition of the population variance have a strong resemblance. Replacing the expectation operator E[·] with the averaging operator n⁻¹Σᵢ₌₁ⁿ[·] transforms the expression for the population moment into the moment estimator. The sample average is known as the sample analog to the expectation operator. It is a powerful tool that is used throughout statistics, econometrics, and this chapter to transform population moments into estimators.⁴

Unlike the mean, the sample variance is a biased estimator. It can be shown that:

    E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 = \sigma^2 - \frac{\sigma^2}{n}    (5.7)

The sample variance is therefore biased, with:

    \operatorname{Bias}(\hat{\sigma}^2) = E[\hat{\sigma}^2] - \sigma^2 = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\sigma^2/n

Note that this bias is small when n is large.⁵ The bias arises because the sample variance depends on the estimator of the mean. Estimation of the mean consumes a degree of freedom, and the sample mean tends to resemble the sample of data a little too well (especially for smaller sample sizes). This slight "overfitting" of the observed data produces a slight underestimation of the population variance.

Because the bias is known, an unbiased estimator for the variance can be constructed as:

    s^2 = \frac{n}{n-1}\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \hat{\mu})^2    (5.8)

Note that this procedure divides by n − 1, rather than by n, and the resulting estimator is unbiased since E[s²] = σ².

To see this principle in action, suppose that multiple samples are drawn from an N(0,1) distribution, each sample has a size of 20, and a sample variance σ̂² is calculated for each sample. The results are shown in Figure 5.1. As expected, the average value converges to 0.95 as the number of samples of size 20 increases, because:

    E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 = \frac{19}{20}(1) = 0.95

Also shown is the unbiased variance estimator s², which is simply the biased estimator multiplied by n/(n − 1).

Figure 5.1 The figure shows the average variance, calculated using the sample variance and the unbiased estimator, for an increasing number of samples of size 20 (x-axis: number of trials; y-axis: average variance calculated). The values of the biased and unbiased estimators converge toward 0.95 and 1, respectively, as the number of samples increases.

Footnotes
¹ This definition only covers point estimators, that is, estimators whose value is a single point (a scalar or a finite-dimensional vector). This is the most widely used class of estimators.
² This result only depends on the linearity of the expectation operator, so that the expectation of the sum is the sum of the expectations.
³ In this chapter, the random variables Xᵢ are assumed to be independent and identically distributed, so that E[Xᵢ] = μ and V[Xᵢ] = σ² for all i. Note that the variance of the sample mean estimator is distinct from the variance of the underlying data (i.e., X₁, X₂, …, Xₙ): the variance of the sample mean depends on the sample size n, whereas the variance of the data does not. The two are commonly confused when first encountered because the variance of the sample mean depends on the variance of the data (i.e., σ²). See the box, Standard Errors versus Standard Deviations, for more discussion of the distinction between the variance of an estimator and the variance of the data used in the estimator.
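The estimators in Equations 5.1, 5.6, and 5.8 are easy to check numerically. The following sketch is ours (not from the text) and uses a hypothetical toy sample; it implements the three formulas directly in plain Python:

```python
# Sketch of the mean and variance estimators (Equations 5.1, 5.6, and 5.8).
# Illustrative only; the toy data are hypothetical.

def mean_hat(x):
    """Sample mean (Equation 5.1)."""
    return sum(x) / len(x)

def var_biased(x):
    """Sample variance with divisor n (Equation 5.6); biased by -sigma^2/n."""
    m = mean_hat(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

def var_unbiased(x):
    """Sample variance with divisor n - 1 (Equation 5.8); unbiased."""
    n = len(x)
    return var_biased(x) * n / (n - 1)

data = [1.0, 2.0, 3.0, 4.0]   # toy sample, n = 4
print(mean_hat(data))          # 2.5
print(var_biased(data))        # 1.25
print(var_unbiased(data))      # 1.25 * 4/3, approximately 1.6667
```

For comparison, the standard library's `statistics.pvariance` and `statistics.variance` implement the divisor-n and divisor-(n − 1) estimators, respectively.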
⁴ Recall that the population variance can be equivalently defined as E[X²] − E[X]². The sample analog can be applied here, and an alternative estimator of the sample variance is σ̂² = n⁻¹Σᵢ₌₁ⁿ xᵢ² − (n⁻¹Σⱼ₌₁ⁿ xⱼ)². This form is numerically identical to Equation 5.6.
⁵ This is an example of a finite-sample bias, and σ̂² is asymptotically unbiased because lim_{n→∞} Bias(σ̂²) = 0.

STANDARD ERRORS VERSUS STANDARD DEVIATIONS

The variance of the sample mean depends on the variance of the data. The standard deviation of the mean estimator is

    \sqrt{\sigma^2/n} = \sigma/\sqrt{n}

and thus it depends on the standard deviation of the data.

The standard deviation of the mean (or of any other estimator) is known as a standard error. Standard errors and standard deviations are similar, although the terms are not used interchangeably. Standard deviation refers to the uncertainty in a random variable or data set, whereas standard error refers to the uncertainty of an estimator. This distinction is important because the standard error of an estimator declines as the sample size increases, whereas the standard deviation does not change with the sample size.

Standard errors are important in hypothesis testing and when performing Monte Carlo simulations. In both of these applications, the standard error provides an indication of the accuracy of the estimate.

SCALING OF THE MEAN AND STANDARD DEVIATION

The most familiar and straightforward way to calculate returns is to take the difference between the current and previous price and divide it by the previous price. But returns can also be measured using the difference between log prices (i.e., the natural logarithm of the prices). Log returns are convenient, as a two-period return is just the sum of two consecutive one-period returns:

    \ln P_2 - \ln P_0 = (\ln P_2 - \ln P_1) + (\ln P_1 - \ln P_0) = R_2 + R_1

where Pᵢ is the price of the asset in period i and the return Rᵢ is defined using consecutive prices. Note that log returns can be validly summed over time, whereas simple returns cannot. The convention followed here is that the first price occurs at time 0, so that the first return, which is measured using the price at the end of period one, is R₁. The return over n periods is then:

    R_1 + R_2 + \cdots + R_n = \sum_{i=1}^{n} R_i

If returns are iid with mean E[Rᵢ] = μ and variance V[Rᵢ] = σ², then the mean and variance of the n-period return are nμ and nσ², respectively. The mean follows directly from the expectation of sums of random variables, because the expectation of the sum is the sum of the expectations. The variance relies crucially on the iid property, so that the covariance between the returns on a given asset over time is 0. In practice, financial returns are not iid but can be uncorrelated, which is sufficient for this relationship to hold. Finally, the standard deviation is the square root of the variance, and so the n-period standard deviation scales with √n.

The difference in the bias of these two estimators might suggest that s² is a better estimator of the population variance than σ̂². However, this is not necessarily the case, because σ̂² has a smaller variance than s² (i.e., V[σ̂²] < V[s²]). Financial statistics typically involve large data sets, and so the choice between σ̂² and s² makes little difference in practice. The convention is to prefer σ̂² when the sample size is moderately large (i.e., n ≥ 30).

The sample standard deviation is estimated using the square root of the sample variance (i.e., σ̂ = √σ̂² or s = √s²). The square root is a nonlinear function, and so both estimators of the standard deviation are biased. However, this bias diminishes as n becomes large and is typically small in large financial data sets.

Presenting the Mean and Standard Deviation

Means and standard deviations are the most widely reported statistics. Their popularity is due to several factors:

• The mean and standard deviation are often sufficient to describe the data (e.g., if the data are normally distributed or are generated by some other one- or two-parameter distribution).
• These two statistics provide guidance about the likely range of values that can be observed.
• The mean and standard deviation are in the same units as the data, and so can be easily compared. For example, if the data are percentage returns of a financial asset, the mean and standard deviation are also measured in percentage returns. This is not true for other statistics, such as the variance.

One challenge when using asset price data is the choice of sampling frequency. Most assets are priced at least once per day, and many assets have prices that are continuously available throughout the trading day (e.g., equities, some sovereign bonds, and futures). Other return series, such as hedge fund returns, are only available at lower frequencies (e.g., once per month or even once per quarter or year). This can create challenges when describing the mean or standard deviation of financial returns. For example, sampling over one day could give an asset's average return (i.e., μ̂_Daily) as 0.1%, whereas sampling over one week (i.e., μ̂_Weekly) could indicate an average return of 0.485%.

In practice, it is preferred to report the annualized mean and standard deviation, regardless of the sampling frequency. Annualized sample means are scaled by a constant that measures the number of sample periods in a year. For example, a monthly mean is multiplied by 12 to produce an annualized mean, and a weekly mean is multiplied by 52 to produce an annualized version.
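The standard-error formula and the annualization conventions above can be combined into a short sketch. This is our own illustration; the monthly mean and volatility figures are hypothetical:

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def annualize(mean, std, periods_per_year):
    """Scale a per-period mean by k and a per-period std by sqrt(k)."""
    return mean * periods_per_year, std * math.sqrt(periods_per_year)

# A hypothetical monthly mean of 0.8% and monthly volatility of 4%:
ann_mean, ann_std = annualize(0.008, 0.04, 12)
print(ann_mean)                    # approximately 0.096  (9.6% per year)
print(ann_std)                     # 0.04 * sqrt(12), approximately 0.1386

# The standard error of the mean shrinks as the sample grows:
print(standard_error(0.04, 100))   # approximately 0.004
print(standard_error(0.04, 400))   # approximately 0.002 (4x the data halves the error)
```

The factor-of-two drop between the last two lines illustrates why quadrupling the sample size halves the standard error, while the standard deviation of the data itself (0.04) is unchanged.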
Similarly, a weekly are percentage returns of a financial asset, the mean and mean is multiplied by 52 to produce an annualized version, and 66 Financial Risk Manager Exam Part I: Quantitative Analysis 00006613-00000002_CH05_p063-082.indd 66 26/07/22 8:38 PM a daily mean is multiplied by the number of trading days per returns). The annual versions of the means are quite close to one year (e.g., 252 or 260). Note that the scaling convention for daily another, as are the annualized standard deviations. statistics varies by country (and possibly by the year) because the number of trading days differs across markets. 5.2 HIGHER MOMENTS Returning to the previous example with the samples from one day and one week: Beyond the mean and standard deviation, two higher order moments are also commonly measured in financial data: skew- n Annual = 252m m n Daily = 52m n Weekly ness and kurtosis. = (252)(0.001) = (52)(0.00485) Recall that the skewness is a standardized version of the third = 2.52% central moment, and so it is unit- and scale-free. The population Meanwhile, standard deviations use the square root of the same value of the skewness is defined as: scale factors because standard deviations scale with the square E[(X - E[X])3] m3 root of the time interval. For example, the standard deviation Skewness(X) = = , (5.9) s3 3 E[(X - E[X])2] 2 computed from daily data is multiplied by 2252 to produce an annualized standard deviation. where m3 is the third central moment and s is the standard deviation. The third moment raises deviations from the mean to the third Estimating the Mean and Variance power. This has the effect of amplifying larger shocks relative Using Data to smaller ones (compared to the variance, which only squares As an example, consider a set of four data series extracted from deviations). 
Because the third power also preserves the sign of the Federal Reserve Economic Data (FRED) database of the the deviations, an asymmetric random variable will not have a Federal Reserve Bank of St. Louis:6 third moment equal to 0. When large deviations tend to come from the left tail, the distribution is negatively skewed. If they 1. The ICE BoAML US Corp Master Total Return Index Value, are more likely to come from the right tail, then the distribution which measures the return on a diversified portfolio of cor- is positively skewed. porate bonds; Kurtosis is similarly defined using the fourth moment standard- 2. The Russell 2000 Index, a small-cap equity index that mea- ized by the squared variance. It is also unit- and scale-free: sures the average performance of 2,000 firms; E[(X - E[X])4] m4 3. The Gold Fixing Price in the London bullion market, the Kurtosis(X) = k = = (5.10) E[(X - E[X])2]2 s4 benchmark price for gold; and The kurtosis uses a higher power than the variance, and so it is 4. The spot price of West Texas Intermediate Crude Oil, the more sensitive to large observations (i.e., outliers). It discards leading price series for US crude oil. sign information (because a negative deviation from the mean All data in this set are from the period between February 1, 1989 becomes positive when raised to the fourth power), and so it and December 28, 2018. These prices are sampled daily, weekly measures the relative likelihood of seeing an observation from (on Thursdays), and at the end of the month.7 Returns are the tail of the distribution. computed using log returns, defined as the difference between The kurtosis is usually compared to the kurtosis of a normal dis- consecutive log prices (i.e., ln Pt - ln Pt - 1 ). tribution, which is 3. Random variables with a kurtosis greater Table 5.1 reports the means and standard deviations for these than 3 are often called heavy-tailed/fat-tailed, because the fre- four series. 
Note how the values change across the three sam- quency of large deviations is higher than that of a random vari- pling intervals. The bottom rows of each panel contain the annu- able with a normal distribution.8 alized versions of each statistic, which largely remove the effect The estimators of the skewness and the kurtosis both use the of the sampling frequency. The scale factors applied are 245 principle of the sample analog to the expectation operator. The (for daily returns), 52 (for weekly returns), and 12 (for monthly estimators for the skewness and kurtosis are n3 m n4 m 6 and 4 (5.11) See Federal Reserve Bank of St. Louis. Federal Reserve Economic n s 3 n s Data: FRED: St. Louis Fed. Retrieved from https://fred.stlouisfed.org/ These data are freely available. 8 Many software packages report excess kurtosis, which is the kurtosis 7 minus 3 (i.e., k – 3), and so care is needed when using reported kurtosis Thursday-to-Thursday prices are preferred when computing returns because public holidays are more likely to occur on a Friday. to assess tail heaviness. Chapter 5 Sample Moments 67 00006613-00000002_CH05_p063-082.indd 67 26/07/22 8:38 PM Table 5.1 Sample Statistics of the Returns of Four Representative Financial Assets. The Returns Are Computed from Daily, Weekly (Using Thursday Prices), and End-of-Month Prices. Each Panel Reports the Mean, Standard Deviation, Skewness, and Kurtosis. The Annualized Mean and Standard Deviation Are Reported and Are Based on Scale Factors of 245, 52, and 12 for Daily, Weekly, and Monthly Data, Respectively. The Row Labeled n Reports the Number of Observations. The Final Row in the Top Panel Reports the Average Number of Observations per Year Daily Data Stocks Bonds Crude Gold Mean 0.044% 0.027% 0.045% 0.021% Std. Dev. 1.297% 0.301% 2.497% 1.002% Skewness - 0.202 - 0.321 0.092 - 0.095 Kurtosis 6.39 2.62 12.78 6.56 Ann. Mean 10.7% 6.7% 10.9% 5.2% Ann. Std. Dev. 
20.3% 4.7% 39.1% 15.7% n 7326 7326 7326 7326 n/Year 245 245 245 245 Weely Data Stocks Bonds Crude Gold Mean 0.205% 0.128% 0.191% 0.096% Std. Dev. 2.747% 0.652% 5.119% 2.157% Skewness - 0.676 - 0.396 - 0.074 0.211 Kurtosis 8.66 2.58 2.81 4.83 Ann. Mean 10.6% 6.7% 9.9% 5.0% Ann. Std. Dev. 19.8% 4.7% 36.9% 15.6% n 1561 1561 1561 1561 Monthly Data Stocks Bonds Crude Gold Mean 0.866% 0.559% 0.709% 0.427% Std. Dev. 5.287% 1.462% 9.375% 4.460% Skewness - 0.445 - 0.748 0.277 0.147 Kurtosis 0.92 4.73 2.24 1.35 Ann. Mean 10.4% 6.7% 8.5% 5.1% Ann. Std. Dev. 18.3% 5.1% 32.5% 15.4% n 359 359 359 359 The third and fourth central moments are estimated by: Table 5.1 also reports the estimated skewness and kurtosis for n ia the four assets and three sampling horizons. These moments are 1 n n3 = m n )3 (xi - m (5.12) scale-free, and so there is no need to annualize them. =1 n 4 = a (xi - m 1 n Skewness, while being unit-free, does not follow a simple scaling m n )4 (5.13) ni = 1 law across sampling frequencies. Financial data are heteroskedastic 68 Financial Risk Manager Exam Part I: Quantitative Analysis 00006613-00000002_CH05_p063-082.indd 68 26/07/22 8:38 PM HISTOGRAMS AND KERNEL DENSITY PLOTS Histograms are simple graphical tools used to represent the common kernel density estimators use a weight function that frequency distribution of a data series. Histograms divide the declines as the distance between two points increases, thus pro- range of the data into m bins and then tabulate the number ducing a smooth curve. of values that fall within the bounds of each bin. Kernel den- Figure 5.2 shows two examples of histograms and kernel sity plots are smoothed versions of histogram plots. densities. The left panel shows the daily returns of the Russell A density plot differs from a histogram in two ways. First, rather 2000 Index, while the right panel shows its monthly returns. 
than using discrete bins with well-defined edges, a density plot Because there are over 7,000 daily returns, the density plot computes the number of observations that are close to any is only slightly smoother than the histogram. In the monthly point on the x-axis. In effect, a density plot uses as many bins as example, there are only 383 observations and so the differ- there are data points in the sample. Second, a density plot uses ence between the two is more pronounced. Kernel density a weighted count where the weight depends on the distance plots are generally preferred over histograms when visualiz- between the point on the x-axis and the observed value. Most ing the distribution of a dataset. Figure 5.2 Histograms and densities for daily (left panel) and monthly (right panel) Russell 2000 Index returns. (i.e., the volatility of return changes over time) and the differences in monthly, or even quarterly) have a distribution that is closer to the volatility dynamics across assets produce a different scaling of that of a normal than returns sampled at higher frequencies. skewness.9 Gold has a positive skewness, especially at longer hori- This is because large short-term changes are diluted over longer zons, indicating that positive surprises are more likely than negative horizons by periods of relative calmness. surprises. The skewness of crude oil returns is negative when returns Figure 5.3 contains density plots of the four financial assets in are sampled at the daily horizon but becomes positive when sam- the previous example. Each plot compares the density of the pling over longer horizons. Bond and stock market returns are usu- assets’ returns to that of a normal distribution with the same ally negatively skewed at all horizons since large negative returns mean and variance. These four plots highlight some common occur more frequently than comparably large positive returns. features that apply across asset classes. 
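The sample analogs in Equations 5.11 through 5.13 translate directly into code. The sketch below is our own (the function name and toy data are arbitrary); it returns the skewness and the raw (non-excess) kurtosis:

```python
def sample_moments(x):
    """Skewness and kurtosis estimators (Equations 5.11-5.13)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((xi - m) ** 2 for xi in x) / n   # biased variance (Equation 5.6)
    m3 = sum((xi - m) ** 3 for xi in x) / n   # third central moment (Equation 5.12)
    m4 = sum((xi - m) ** 4 for xi in x) / n   # fourth central moment (Equation 5.13)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                        # raw kurtosis; excess kurtosis = kurt - 3
    return skew, kurt

# A symmetric toy sample has zero skewness:
print(sample_moments([-2.0, -1.0, 0.0, 1.0, 2.0]))   # (0.0, 1.7)
```

For comparison, `scipy.stats.skew` and `scipy.stats.kurtosis` compute the same sample analogs, with kurtosis reported as excess kurtosis (κ − 3) by default, as many packages do.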
The most striking fea- All four asset return series have excess kurtosis, although ture of all four plots is the heavy tails. Note that the extreme bonds have fewer extreme observations than the other three tails (i.e., the values outside {2s) of the empirical densities are asset classes. Kurtosis declines at longer horizons, and it is above those of the normal. The more obvious hint that these a stylized fact that returns sampled at low frequencies (i.e., distributions are heavy-tailed is the sharp peak in the center of each density. Because both the empirical density and the nor- 9 mal density match the sample variance, the contribution of the Time-varying volatility is a pervasive property of financial asset returns. This topic is covered in Chapter 13. heavy tails to the variance must be offset by an increase in mass Chapter 5 Sample Moments 69 00006613-00000002_CH05_p063-082.indd 69 26/07/22 8:38 PM Small-Cap Stocks Investment-Grade Bonds Gold West Texas Intermediate Crude Figure 5.3 Density plots of daily returns data from small-cap stocks, corporate bonds, gold, and crude oil. The solid line in each plot is a data-based density estimate of each asset’s returns. The dashed line in each panel is the density of a normal random variable with the same mean and variance as the corresponding asset returns. The vertical lines indicate {2s. near the center. At the same time, regions near {1s lose mass Linear estimators of the mean can be expressed as: n = a wiXi, to the center and the tail to preserve the variance. The negative n m skewness is also evident in stock, bond, and crude oil returns. i=1 All three densities appear to be leaning to the right, which is where wi are weights that do not depend on Xi. In the sample evidence of a long left-tail (although the fat tails themselves are mean estimator, for example, wi = 1/n. hard to see with figures of this size). The unbiasedness of the mean estimator was shown earlier in this chapter. 
Showing that the mean is BLUE is also straightfor- 5.3 THE BLUE MEAN ESTIMATOR ward, but it is quite tedious and so is left as an exercise.11 The mean estimator is the Best Linear Unbiased Estimator BLUE is a desirable property for an estimator, because it estab- (BLUE) of the population mean when the data are iid.10 In this lishes that the estimator is the best estimator (in the sense of context, best indicates that the mean estimator has the lowest 11 To show that this claim is true, consider another linear estimator that variance of any linear unbiased estimator (LUE). that g w g ∼ uses a set of weights of the form w i = wi + di. Unbiasedness requires ∼ i = 1, and so di = 0. The remaining steps only require com- 10 BLUE only requires that the mean and variance are the same for all puting the variance, which is equal to the variance of the sample mean observations. These conditions are satisfied when random variables are iid. estimator plus a term that is always positive. 70 Financial Risk Manager Exam Part I: Quantitative Analysis 00006613-00000002_CH05_p063-082.indd 70 26/07/22 8:38 PM having the smallest variance) among all linear and unbiased estima- s2 = 1. The figures on the right use data generated by a Pois- tors. It does not, however, imply that there are no superior esti- son with shape parameter equal to 3. Both distributions are mators to the sample mean. It only implies that these estimators right-skewed, and the Poisson is a discrete distribution. must either be biased or nonlinear. Maximum likelihood estimators The top panels show the PDF and the PMF of the simulated data of the mean are generally more accurate than the sample mean, used in the illustration. The middle panel shows the distribution although they are usually nonlinear and often biased in finite of the sample mean using simulated data. The number of simu- samples. Maximum likelihood estimation will be discussed in detail lated data points varies from 10 to 640 in multiples of 4. 
This ratio in Chapter 6. between the sample sizes ensures that the standard deviation halves each time the sample size increases. The dashed line in the 5.4 LARGE SAMPLE BEHAVIOR center of the plots shows the population mean. These finite-sample OF THE MEAN distributions of the mean estimators are calculated using data simu- lated from the assumed distribution. The sample mean is computed The mean estimator is always unbiased, and the variance of from each simulated sample, and 10,000 independent samples are the mean estimator takes a simple form when the data are iid. constructed. The plots in the center row show the empirical density Two moments, however, are usually not enough to completely of the estimated sample means for each sample size. describe the distribution of the mean estimator. If data are iid Two features are evident from the LLN plots. First, the density normally distributed, then the mean estimator is also normally of the sample mean is not a normal. In sample means computed distributed.12 However, it is generally not possible to establish from the simulated log-normal data, the distribution of the the exact distribution of the mean based on a finite sample sample mean is right-skewed, especially when n is small. Sec- of n observations. Instead, modern econometrics exploits the ond, the distribution becomes narrower and more concentrated behavior of the mean in large samples (formally, when n S ∞ ) to around the population mean as the sample size increases. The approximate the distribution of the mean in finite samples. collapse of the distribution around the population mean is evi- The Law of Large Numbers (LLN) establishes the large sample dence that the LLN applies to these estimators. behavior of the mean estimator and provides conditions where The LLN also applies to other sample moments. For example, an average converges to its expectation. 
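The variance comparison behind the BLUE property can be checked numerically. For iid data, a linear estimator Σwᵢ Xᵢ has variance σ²Σwᵢ², so among all weight vectors that sum to one, the equal weights wᵢ = 1/n give the smallest variance. A sketch of our own (the alternative weights are chosen arbitrarily for illustration):

```python
def lue_variance(weights, sigma2=1.0):
    """Variance of a linear estimator sum(w_i * X_i) for iid data with
    variance sigma2: V = sigma2 * sum(w_i^2). Unbiasedness requires
    that the weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-12, "weights must sum to 1"
    return sigma2 * sum(w * w for w in weights)

n = 4
equal = [1.0 / n] * n              # the sample mean: w_i = 1/n
tilted = [0.4, 0.3, 0.2, 0.1]      # another unbiased choice (sums to 1)

print(lue_variance(equal))         # 0.25  (= sigma^2 / n)
print(lue_variance(tilted))        # approximately 0.30, strictly larger
```

Any perturbation of the equal weights that keeps them summing to one adds a positive term to Σwᵢ², which is the content of footnote 11.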
There are many LLNs, when the data are iid and E[X2i ] is finite, then the LLN ensures that but the simplest for iid random variables is the Kolmogorov a.s. n 2 S s2, and so the sample variance estimator is also consistent.14 s Strong Law of Large Numbers. This LLN states that if {Xi} is a sequence of iid random variables with m K E[Xi], then: Consistency is an important property of an estimator, although it is not enough to understand the estimator’s distribution. For n ia 1 n a.s. nn = m Xi S m example, the distribution of m n - m is not easy to study because =1 it collapses to 0 as n S ∞. a.s. The symbol S means converges almost surely.13 This property The solution is to rescale the difference between the estimate ensures that the probability of m n n being far from m converges to and population value by 2n. This is the required value to stabi- 0 as n S ∞. lize the distribution because When an LLN applies to an estimator, the estimator is said to be s2 consistent. Consistency requires that an estimator is asymptoti- V[m n] = n cally unbiased, and so any finite sample bias must diminish as n so that n 2 has this property). A consistent estimator, how- increases (e.g., s ever, has another key feature: as the sample size n grows larger, the V[2nm n ] = nV[m n] = s variance of the estimator converges to zero (i.e., V[m n n] S 0). This ensures that the chance of observing a large deviation of a sample The Centrl Limit Theorem (CLT) states that the sampling distri- estimate from the population value is negligible in large samples. bution of the mean for any random sample of observations will: Figure 5.4 illustrates the LLN using simulated data. The figures tend towards the normal distribution, and on the left all use log-normal data with parameters m = 0 and have a mean equal to the population mean as the sam- ple size tends to infinity. 12 This result only depends on the property that the sum of normal ran- dom variables is also normally distributed. 
¹³ Almost sure convergence is a technical concept from probability theory. An estimator that converges almost surely must converge for any sequence of values that can be produced by the random variables X_i, i = 1, 2, ….

¹⁴ Formally, the LLN can be applied to two averages, n^{-1}\sum x_i^2 \overset{a.s.}{\to} E[X^2] and n^{-1}\sum x_i \overset{a.s.}{\to} E[X]. The sample variance estimator can be equivalently expressed using these two averages.

Chapter 5 Sample Moments 71

[Figure 5.4 spans this page. Panels: Log-Normal (0,1) PDF; Poisson (3) PMF; Law of Large Numbers; Central Limit Theorem.]

Figure 5.4 These figures demonstrate the consistency of the sample mean estimator using the Law of Large Numbers (middle panels) and with respect to the normal distribution when scaled (bottom panels). The figures are generated using simulated data. The left panels use simulated data from a log-normal distribution with parameters μ = 0, σ² = 1. The right panels use simulated data from a discrete distribution, the Poisson with shape parameter equal to 3. The top panels show the PDF of the log-normal and the PMF of the Poisson.

72 Financial Risk Manager Exam Part I: Quantitative Analysis

Note that the CLT can also be applied to the rescaled difference under some additional assumptions.

The simplest CLT is known as the Lindeberg-Lévy CLT. This CLT states that if {X_i} are iid, then

\sqrt{n}(\hat{\mu} - \mu) \overset{d}{\longrightarrow} N(0, \sigma^2),

where μ = E[X_i] and σ² = V[X_i]. The symbol \overset{d}{\to} denotes convergence in distribution.

This CLT requires a further assumption in addition to those required for the LLN: that the variance σ² is finite (as the LLN only requires that the mean is finite). This CLT can be alternatively expressed as:

\sqrt{n}\left(\frac{\hat{\mu} - \mu}{\sigma}\right) = \frac{\hat{\mu} - \mu}{\sigma/\sqrt{n}} \overset{d}{\longrightarrow} N(0, 1)

This version makes it clear that the mean, when scaled by its standard error (i.e., σ/√n), is asymptotically standard normally distributed.

CLTs extend LLNs and provide an approximation to the distribution of the sample mean estimator. Furthermore, they do not require knowledge of the distribution of the random variables generating the data. In fact, only independence and some moment conditions are required for the CLT to apply to a sample mean estimator.

The CLT is an asymptotic result and so technically only holds asymptotically (i.e., in the limit). In practice, the CLT is used as an approximation in finite samples so that the distribution of the sample mean is approximated by:

\hat{\mu} \sim N\left(\mu, \frac{\sigma^2}{n}\right)

This expression for the mean shows that (in large samples) the distribution of the sample mean estimator (\hat{\mu}) is centered on the population mean (μ), and the variance of the sample average declines as n grows.

The bottom panels of Figure 5.4 show the empirical distribution for 10,000 simulated values of the sample mean. The CLT says that the sample mean is normally distributed in large samples, so that \hat{\mu} \sim N(\mu, \sigma^2/n). This value is standardized by subtracting the mean and dividing by the standard error of the mean, so that:

Z = \frac{\hat{\mu} - \mu}{\sigma/\sqrt{n}}

is a standard normal random variable. The simulated values show whether the CLT provides an accurate approximation to the (unknown) true distribution of the sample mean for different sample sizes n. Determining when n is large enough is the fundamental question when applying a CLT. The shaded areas in the bottom panels are the PDF of a standard normal. For data generated from the log-normal (i.e., the left panel), the CLT is not an accurate approximation when n = 10 because of the evident right-skewness (due to the skewness in the random data sampled from the log-normal). When n = 40, however, the skew has substantially disappeared and the approximation by a standard normal distribution appears to be accurate. Note that skewness in the distribution of the sample mean is a finite-sample property, and when n = 160, the distribution is extremely close to the standard normal. This shows that the sample mean estimator is asymptotically normally distributed even if the underlying data follow some other (non-normal) distribution.

In the application of the CLT to the mean of data generated from Poisson random variables (i.e., the right panel), the CLT appears to be accurate for all sample sizes.

5.5 THE MEDIAN AND OTHER QUANTILES

The median, like the mean, measures the central tendency of a distribution. Specifically, it is the 50% quantile of the distribution and the point where the probabilities of observing a value above or below it are equal (i.e., 50%). When a distribution is symmetric, the median is in the center of the distribution and is the same as the mean. When distributions are asymmetric, the median is larger (smaller) than the mean if the distribution is left-skewed (right-skewed).

Estimation of the median from a data sample is simple. First, the data are sorted from smallest to largest. When the sample size is odd, the value in position (n + 1)/2 of the sorted list is used to estimate the median:

\text{median}(x) = x_{(n+1)/2}    (5.14)

When the sample size is even, the median is estimated using the average of the two central points of the sorted list:

\text{median}(x) = \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)    (5.15)

Two other commonly reported quantiles are the 25% and 75% quantiles. These are estimated using the same method as the median. More generally, the α-quantile is estimated from the sorted data using the value in location αn. When αn is not an integer value, then the usual practice is to take the average of the points immediately above and below αn.¹⁵

These two quantile estimates (\hat{q}_{25} and \hat{q}_{75}) are frequently used together to estimate the inter-quartile range:

\text{IQR} = \hat{q}_{75} - \hat{q}_{25}

The IQR is a measure of dispersion and so is an alternative to the standard deviation. Other quantile measures can be constructed to measure the extent to which the series is asymmetric or to estimate the heaviness of the tails.
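The median and quantile rules in equations (5.14) and (5.15), together with the averaging convention for non-integer αn, can be sketched as follows (an illustrative implementation, not code from the text):

```python
def median_est(data):
    """Sample median following equations (5.14) and (5.15)."""
    x = sorted(data)
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]            # x_{(n+1)/2} in 1-based indexing
    return 0.5 * (x[n // 2 - 1] + x[n // 2])  # average of the two central points

def quantile_est(data, a):
    """alpha-quantile: the value in (1-based) location a*n of the sorted
    sample, averaging the neighboring points when a*n is not an integer
    (assumes a * len(data) >= 1)."""
    x = sorted(data)
    pos = a * len(x)
    i = int(pos)
    if pos == i:
        return x[i - 1]
    return 0.5 * (x[i - 1] + x[i])

def iqr_est(data):
    """Inter-quartile range IQR = q75 - q25."""
    return quantile_est(data, 0.75) - quantile_est(data, 0.25)
```

With data = [1, 2, 3, 4, 5], the median is 3, while the 25% and 75% quantile locations (1.25 and 3.75) fall between order statistics and are averaged to 1.5 and 3.5, giving an IQR of 2.0.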
¹⁵ There is a wide range of methods to interpolate quantiles when αn is not an integer, and different methods are used across the range of common statistical software.

Two features make quantiles attractive. First, quantiles have the same units as the underlying data, and so they are easy to interpret. For example, the 25% quantile for a sample of asset returns estimates the point where there is a 25% probability of observing a smaller return (and a 75% probability of observing a larger return). Meanwhile, the interquartile range estimates a central interval where there is a 50% probability of observing a return.

The second attractive feature is robustness to outliers (i.e., values very far from the mean). For example, suppose that some observed data {x_1, …, x_n} are contaminated with an outlier. Both the median and IQR are unaffected by the presence of an outlier. However, this is not true of the mean estimator, which gives weight 1/n to the outlier. If the outlier is far from the other (valid) observations, this distortion can be large. The variance, because it squares the difference between the outlier and the mean, is even more sensitive to outliers.

Table 5.2 contains the median, quantile, and IQR estimates for the four-asset example discussed in this chapter. The median return is above the mean return for the negatively skewed assets (i.e., bonds, stocks, and crude oil). Note that the median scales similarly to the mean, and the weekly median is about five times larger than the daily value. The interquartile range also agrees with the standard deviation estimates; crude is the riskiest of the assets, followed by stocks and gold, while bonds are the safest (assuming that the risk of each asset can be measured by the dispersion of its returns).

Table 5.2 Quantile Estimates for the Four Asset Classes Using Data Sampled Daily, Weekly, and Monthly. The Median Is the 50% Quantile. The Inter-Quartile Range (IQR) Is the Difference between the 75% Quantile q(.75) and the 25% Quantile q(.25)

Daily Data
            Bonds      Stocks     Gold       Crude
Median      0.034%     0.103%     0.014%     0.058%
q(.25)     -0.140%    -0.534%    -0.445%    -1.21%
q(.75)      0.207%     0.660%     0.496%     1.30%
IQR         0.347%     1.19%      0.941%     2.51%

Weekly Data
            Bonds      Stocks     Gold       Crude
Median      0.159%     0.459%     0.044%     0.224%
q(.25)     -0.244%    -1.12%     -1.01%     -2.64%
q(.75)      0.542%     1.69%      1.24%      3.03%
IQR         0.786%     2.81%      2.24%      5.67%

Monthly Data
            Bonds      Stocks     Gold       Crude
Median      0.640%     1.56%      0.265%     0.839%
q(.25)     -0.220%    -2.47%     -2.29%     -5.14%
q(.75)      1.42%      4.12%      2.84%      6.72%
IQR         1.64%      6.59%      5.14%      11.9%

5.6 MULTIVARIATE MOMENTS

The sample analog can be extended from univariate statistics (i.e., involving a single random variable) to multivariate statistics (i.e., involving two or more random variables).

The extension of the mean is straightforward. The sample mean of two series is just the collection of the two univariate sample means. However, extending the sample variance estimator to multivariate datasets requires estimating the variance of each series and the covariance between each pair. The higher-order moments, such as skewness and kurtosis, can be similarly defined in a multivariate setting.

Covariance and Correlation

Recall that covariance measures the linear dependence between two random variables and is defined as:

\sigma_{XY} \equiv \text{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])]

The sample covariance estimator uses the sample analog to the expectation operator:

\hat{\sigma}_{XY} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y),    (5.16)

where \hat{\mu}_X is the sample mean of X and \hat{\mu}_Y is the sample mean of Y. Like the sample variance estimator, the sample covariance estimator is biased toward zero. Dividing by n - 1, rather than n, produces an unbiased estimate of the covariance.

Correlation is the standardized version of the covariance and is usually preferred because it does not depend on the scale of X or Y. The correlation is estimated by dividing the sample covariance by the product of the sample standard deviations of each variable:

\hat{\rho}_{XY} = \frac{\hat{\sigma}_{XY}}{\sqrt{\hat{\sigma}_X^2}\sqrt{\hat{\sigma}_Y^2}} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y}    (5.17)

The sample correlation estimator is biased in finite samples, even if unbiased estimators are used to estimate the variances and covariance. This occurs for the same reason that the sample estimator of the standard deviation is always biased: Unbiasedness is not preserved through nonlinear transformations (because E[g(Z)] ≠ g(E[Z]) for nonlinear functions g).

Table 5.3 contains the estimated correlations between the returns in the four-asset example. These correlations are estimated using returns computed from daily, weekly, and monthly data. Note that correlations are unit-free by construction. For example, the estimate of the correlation between gold and stocks is small (i.e., between -2 and 4%). In contrast, the sample correlation between the Russell 1000 (a large-cap stock index) and the Russell 2000 (a small-cap stock index) is 87%, 88%, and 85% when measured using daily, weekly, or monthly data, respectively. This feature is useful when constructing diversified portfolios, because the correlation between the assets held appears in the variance of the portfolio.

The highest correlation (between gold and crude oil) ranges between 13% and 19% across the different sampling frequencies. Stocks and bonds have a negative correlation when measured using daily data and a positive correlation when measured using monthly data.
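A direct transcription of equations (5.16) and (5.17) can be sketched as follows (an illustrative implementation, not code from the text; the n - 1 option mirrors the bias correction noted above):

```python
import math

def sample_cov(x, y, unbiased=False):
    """Sample covariance, equation (5.16); dividing by n - 1 instead of n
    removes the finite-sample bias toward zero."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    s = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return s / (n - 1) if unbiased else s / n

def sample_corr(x, y):
    """Sample correlation, equation (5.17).  The 1/n factors cancel, so the
    biased and unbiased covariance estimators imply the same correlation."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))
```

For a perfectly linear relationship such as y = 2x, the estimated correlation is 1 regardless of the divisor, illustrating that correlation is scale-free.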
Correlations measured at different frequencies do not have to be the same, and here the monthly correlations are generally larger than the daily correlations. This is because short-term returns are often driven by short-term issues that are specific to an individual asset (e.g., liquidity), whereas longer-term returns are more sensitive to macroeconomic changes. These are all relatively small correlations, and cross-asset class correlations are generally smaller than correlations within an asset class.

Table 5.3 Sample Correlations between Bond, Stock, Gold, and Crude Oil Returns. The Correlations Are Measured Using Data Sampled Daily, Weekly, and Monthly

Daily Data
          Bonds     Stocks    Gold      Crude
Bonds        –      -16.4%     3.0%     -9.9%
Stocks    -16.4%       –      -2.1%     12.5%
Gold        3.0%    -2.1%       –       13.0%
Crude      -9.9%    12.5%     13.0%       –

Weekly Data
          Bonds     Stocks    Gold      Crude
Bonds        –       -2.4%     2.3%     -7.7%
Stocks     -2.4%       –       4.0%     13.2%
Gold        2.3%     4.0%       –       15.2%
Crude      -7.7%    13.2%     15.2%       –

Monthly Data
          Bonds     Stocks    Gold      Crude
Bonds        –       15.1%    19.6%     -0.2%
Stocks     15.1%       –      -0.2%     11.6%
Gold       19.6%    -0.2%       –       18.9%
Crude      -0.2%    11.6%     18.9%       –

Sample Mean of Two Variables

Estimating the means of two random variables is no different from estimating the mean of each separately, so that:

\hat{\mu}_X = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad \hat{\mu}_Y = \frac{1}{n}\sum_{i=1}^{n} y_i

When data are iid, the CLT applies to each estimator. However, it is more useful to consider the joint behavior of the two mean estimators by treating them as a bivariate statistic. The CLT can be applied by stacking the two mean estimators into a vector:

\hat{\mu} = \begin{bmatrix} \hat{\mu}_X \\ \hat{\mu}_Y \end{bmatrix}    (5.18)

This 2-by-1 vector is asymptotically normally distributed if the multivariate random variable Z = [X, Y] is iid.¹⁶

The CLT for the vector depends on the 2-by-2 covariance matrix for the data. A covariance matrix collects the two variances (one for X and one for Y) and the covariance between X and Y into a matrix:

\begin{bmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{bmatrix}    (5.19)

The elements along the leading diagonal of the covariance matrix are the variances of each series, and those on the off-diagonal are the covariance between the pair.

The CLT for a pair of mean estimators is virtually identical to that of a single mean estimator, where the scalar variance is replaced by the covariance matrix. The CLT for bivariate iid data series states that:

\sqrt{n}\begin{bmatrix} \hat{\mu}_X - \mu_X \\ \hat{\mu}_Y - \mu_Y \end{bmatrix} \overset{d}{\longrightarrow} N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{bmatrix}\right)    (5.20)

so that the scaled difference between the vector of means is asymptotically multivariate normally distributed. In practice, this CLT is applied by treating the mean estimators as a bivariate normal random variable:

\begin{bmatrix} \hat{\mu}_X \\ \hat{\mu}_Y \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \begin{bmatrix} \sigma_X^2/n & \sigma_{XY}/n \\ \sigma_{XY}/n & \sigma_Y^2/n \end{bmatrix}\right)

¹⁶ This assumes that each component has a finite variance.

Table 5.4 Estimated Moments of the Monthly Returns on the Russell 2000 and the BoAML Total Return Index During the Period between 1987 and 2018. The Means, Variance, and Covariance Are All Annualized. The Correlation Is Scale-Free

Moment      \hat{\mu}_S    \hat{\sigma}_S^2    \hat{\mu}_B    \hat{\sigma}_B^2    \hat{\sigma}_{SB}    \hat{\rho}
Estimate    10.4           335.4               6.71           25.6                14.0                 0.151

Table 5.4 presents the annualized estimates of the means, variances, covariance, and correlation for the monthly returns on the Russell 2000 and the BoAML Total Return Index (respectively symbolized by S and B). Note that the returns on the Russell 2000 are much more volatile than the returns on the corporate bond index. The correlation between the returns on these two indices (i.e., 0.151) is positive but relatively small.

The CLT depends on the population values for the two variances (\sigma_S^2 and \sigma_B^2) and the covariance between the two (\sigma_{SB} = \rho\sigma_S\sigma_B). To operationalize the CLT, the population values are replaced with estimates computed using the sample variance of each series and the sample covariance between the two series. Applying the bivariate CLT using the numbers in Table 5.4:

\sqrt{n}\begin{bmatrix} \hat{\mu}_S - \mu_S \\ \hat{\mu}_B - \mu_B \end{bmatrix} \overset{d}{\longrightarrow} N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 335.4 & 14.0 \\ 14.0 & 25.6 \end{bmatrix}\right)

Recall that in practice, the definition is applied as if the mean estimators follow the limiting distribution, and so the estimates are treated as if they are normally distributed:

\begin{bmatrix} \hat{\mu}_S \\ \hat{\mu}_B \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_S \\ \mu_B \end{bmatrix}, \begin{bmatrix} 0.934 & 0.039 \\ 0.039 & 0.071 \end{bmatrix}\right),

where the covariance matrix is equal to the previous covariance matrix divided by n (which is 359 in this example).

The most important insight from the bivariate CLT is that correlation in the data produces a correlation between the sample means. Moreover, the correlation between the means is identical to the correlation between the data series.

Coskewness and Cokurtosis

Like variance, skewness and kurtosis can be extended to pairs of random variables. When computing cross pth moments, there are p - 1 different measures. Applying the principle to the first four moments there are:

No cross means,
One cross variance (covariance),
Two measures of cross-skewness (coskewness), and
Three cross-kurtoses (cokurtosis).

The two coskewness measures are

s(X,X,Y) = \frac{E[(X - E[X])^2(Y - E[Y])]}{\sigma_X^2 \sigma_Y}

and

s(X,Y,Y) = \frac{E[(X - E[X])(Y - E[Y])^2]}{\sigma_X \sigma_Y^2}    (5.21)

The coskewness standardizes the cross-third moments by the variance of one of the variables and the standard deviation of the other, and so is scale- and unit-free.

These measures both capture the likelihood of the data taking a large directional value whenever the other variable is large in magnitude. When there is no sensitivity of the direction of one variable to the magnitude of the other, the two coskewnesses are 0. For example, the coskewness in a bivariate normal is always 0, even when the correlation is different from 0. Written using the same notation as equation (5.21), the univariate skewness measures are s(X,X,X) and s(Y,Y,Y).

Coskewnesses are estimated by applying the sample analog to the expectation operator to the definition of coskewness. For example:

\hat{s}(X,X,Y) = \frac{n^{-1}\sum_{i=1}^{n}(x_i - \hat{\mu}_X)^2(y_i - \hat{\mu}_Y)}{\hat{\sigma}_X^2 \hat{\sigma}_Y}

Table 5.5 contains the skewness and coskewness for the four assets (i.e., bonds, stocks, crude oil, and gold) using data sampled monthly. The estimates of the coskewness are mostly negative. The bond-stock and crude oil-stock pairs have the largest coskewness, although neither relationship is symmetric. For example, s(X,X,Y) in the bond-stock pair indicates that stock returns tend to be negative when bond volatility is high. However, s(X,Y,Y) is close to 0 so that the sign of bond returns does not appear to be strongly linked to the volatility of stock returns.

Table 5.5 Skewness and Coskewness in the Group of Four Asset Classes Estimated Using Monthly Data. The Far Left and Far Right Columns Contain Skewness Measures of the Variables Labeled X and Y, Respectively. The Middle Two Columns Report the Estimated Coskewness

X       Y        s(X,X,X)   s(X,X,Y)   s(X,Y,Y)   s(Y,Y,Y)
Bonds   Crude    -0.179     -0.008      0.040      0.104
Bonds   Gold     -0.179     -0.018     -0.032     -0.101
Bonds   Stocks   -0.179     -0.082      0.012     -0.365
Crude   Gold      0.104     -0.010     -0.098     -0.101
Crude   Stocks    0.104     -0.064     -0.127     -0.365
Gold    Stocks   -0.101     -0.005      0.014     -0.365
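The pattern behind covariance, coskewness, and cokurtosis (a cross moment standardized so that it is scale-free) can be captured in one helper. The function below is an illustrative sketch, not an estimator defined in the text: it computes the sample analog of E[(X - μ_X)^p (Y - μ_Y)^q] divided by σ_X^p σ_Y^q. Setting (p, q) = (1, 1) recovers the correlation, (2, 1) and (1, 2) give the two coskewnesses, and (3, 1), (2, 2), (1, 3) give the three cokurtoses.

```python
import math

def std_cross_moment(x, y, p, q):
    """Standardized (p, q) cross moment: the sample analog of
    E[(X - mu_X)^p (Y - mu_Y)^q] / (sigma_X^p * sigma_Y^q)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    cross = sum((a - mx) ** p * (b - my) ** q for a, b in zip(x, y)) / n
    return cross / (sx ** p * sy ** q)
```

As a sanity check, std_cross_moment(x, x, 1, 1) is always 1 (a series is perfectly correlated with itself), and std_cross_moment(x, x, 2, 2) reproduces the univariate sample kurtosis.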
series (i.e., the strength of the relationship between the volatil- Figure 5.5 contains plots of the cokurtoses for normal data as a ity of X and the volatility of Y ). If both series tend to be large in function of the correlation between the variables (i.e., r), magnitude at the same time, then the (2,2) cokurtosis is large. which is always between -1 and 1. The symmetric cokurtosis The other two kurtosis measures, (3,1) and (1,3), capture the k(X,X,Y,Y ) ranges between 1 and 3: it is 1 when returns are agreement of the return signs when the power 3 return is large. uncorrelated (and therefore independent, because the two The three measures of cokurtosis are defined as: ra