FRM 2023 Level 1 Book 2 - Reading 12 PDF
Summary
This document is a reading on Fundamentals of Probability for GARP FRM Part I Quantitative Analysis. It covers important probability theory terms and concepts, including independent and mutually exclusive events, discrete and conditional probabilities, and Bayes' rule. It also covers basic probability concepts such as random variables, and explores events and event spaces.
The following is a review of the Quantitative Analysis principles designed to address the learning objectives set forth by GARP®. Cross-reference to GARP FRM Part I Quantitative Analysis, Chapter 1.

READING 12
FUNDAMENTALS OF PROBABILITY
Study Session 4

EXAM FOCUS
This reading covers important terms and concepts associated with probability theory. Specifically, we will examine the difference between independent and mutually exclusive events, discrete probability functions, and the difference between unconditional and conditional probabilities. Bayes' rule is also examined as a way to update a given set of prior probabilities. For the exam, be able to calculate conditional probabilities, joint probabilities, and probabilities based on a probability function. Also, understand when and how to apply Bayes' formula.

MODULE 12.1: BASICS OF PROBABILITY
When an outcome is unknown, such as the outcome (realization) of the flip of a coin or the high temperature tomorrow in Dubai, we refer to it as a random variable. We can describe a random variable with the probabilities of its possible outcomes. For the flip of a fair coin, we refer to the probability of heads as P(heads), which is 50%. We can think of a probability as the likelihood that an outcome will occur. If we flip a fair coin 100 times, we expect that on average it will be heads 50 times. A probability equal to 0 for an outcome means that the outcome will not happen. A probability equal to 1 for an outcome means it will happen with certainty. Probabilities cannot be less than 0 or greater than 1.

The probability that a random variable will have a specific outcome, given that some other outcome has occurred, is referred to as a conditional probability. The probability that A will occur, given that B has occurred, is written as P(A | B). For example, the probability that a day's high temperature in Seattle will be between 70 and 80 degrees is an unconditional probability (i.e., marginal probability). The probability that the high temperature will be between 70 and 80 degrees, given that the sky is cloudy that day, is a conditional probability. The probability that both A and B will occur is written P(AB) and referred to as the joint probability of A and B (both occurring).

Events and Event Spaces
LO 12.a: Describe an event and an event space.
An event is a single outcome or a combination of outcomes for a random variable. Consider a random variable that is the result of rolling a fair six-sided die. The outcomes with positive probability (those that may happen) are the integers 1, 2, 3, 4, 5, and 6. For the event x = 3, we can write P(3) = 1/6 = 16.7%. Other possible events include getting a 3 or 4, P(3 or 4) = 2/6 = 33.3%, and getting an even number, P(x is even) = P(x = 2, 4, or 6) = 3/6 = 50%. The probability that the realization of this random variable is equal to one of the possible outcomes (x = 1, 2, 3, 4, 5, or 6) is 100%.

The event space for a random variable is the set of all possible outcomes and combinations of outcomes. Consider a flip of a fair coin. The event space is heads, tails, heads and tails, and neither heads nor tails. P(heads) and P(tails) are both 50%. The probability of both heads and tails is zero, as is the probability of neither heads nor tails.

PROFESSOR'S NOTE
The notation P(A ∪ B) is sometimes used to mean the probability of A or B, and the notation P(A ∩ B) is sometimes used to mean the probability of A and B.
Independent and Mutually Exclusive Events
LO 12.b: Describe independent events and mutually exclusive events.
Two events are independent events if knowing the outcome of one does not affect the probability of the other. When two events are independent, the following two probability relationships must hold:
1. P(A) × P(B) = P(AB). The probability that both A and B will happen is the product of their unconditional probabilities.
2. P(A | B) = P(A). The conditional probability of A given that B occurs is simply the unconditional probability of A occurring. This means B occurring does not change the probability of A.
Consider flipping a coin twice. Getting heads on the first flip does not change the probability of getting heads on the second flip. The two events are independent. In this case, the joint probability of getting heads on both flips is simply the product of their unconditional probabilities. Given that the probability of getting heads is 50%, the probability of getting heads on two flips in a row is 0.5 × 0.5 = 25%. If A1, A2, …, An are independent events, their joint probability P(A1 and A2 … and An) is equal to P(A1) × P(A2) × … × P(An).

Two events are mutually exclusive events if they cannot both happen. Consider the possible outcomes of one roll of a die. The events "x = an even number" and "x = 3" are mutually exclusive; they cannot both happen on the same roll. In general, P(A or B) = P(A) + P(B) − P(AB). We must subtract the probability of both A and B happening to avoid counting those outcomes twice. If the probability that one stock will rise tomorrow, P(A), is 60% and the probability that another stock will rise tomorrow, P(B), is 55%, we cannot calculate the probability that at least one of them will rise tomorrow as 60% + 55% = 115%. We must subtract the joint probability that both stocks will rise to get P(A or B). When events A and B are mutually exclusive, P(AB) is zero, so P(A or B) is simply P(A) + P(B).

Conditionally Independent Events
LO 12.c: Explain the difference between independent events and conditionally independent events.
Two conditional probabilities, P(A | C) and P(B | C), may be independent or dependent regardless of whether the unconditional probabilities, P(A) and P(B), are independent or not. When two events are conditionally independent events, P(A | C) × P(B | C) = P(AB | C). Consider Event A, "scores above average on an exam," and Event B, "is taller than average." For a population of grade school students, these events may not be independent, as taller students are older on average and likely in a higher grade. Taller students may well do better on a given exam than shorter (younger) students. If we add the conditioning Event C, "age equals 8," we may find that height and exam scores are independent; that is, P(A | C) and P(B | C) are independent while P(A) and P(B) are not.

MODULE QUIZ 12.1
1. For the roll of a fair six-sided die, how many of the following are classified as events?
The outcome is 3.
The outcome is an even number.
The outcome is not 2, 3, 4, 5, or 6.
A. One.
B. Two.
C. Three.
D. None.
2. Which of the following equalities does not imply that the events A and B are independent?
A. P(AB) = P(A) × P(B).
B. P(A or B) = P(A) + P(B) – P(AB).
C. P(A | B) = P(A).
D. P(AB) / P(B) = P(A).
3. Two independent events:
A. must be conditionally independent.
B. cannot be conditionally independent.
C. may be conditionally independent or not conditionally independent.
D. are conditionally independent only if they are mutually exclusive events.
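To see the two independence conditions and the addition rule at work numerically, here is a short Python sketch (an illustration added for this review, not part of the GARP text) using the two-coin-flip numbers above; the variable names are ours.

```python
# Two-coin-flip example: A = heads on first flip, B = heads on second flip
p_a = 0.5            # P(A)
p_b = 0.5            # P(B)
p_ab = 0.25          # P(AB), joint probability of heads on both flips

# Independence condition 1: P(A) * P(B) = P(AB)
print(p_a * p_b == p_ab)            # True

# Independence condition 2: P(A | B) = P(AB) / P(B) equals P(A)
print(p_ab / p_b == p_a)            # True

# Addition rule: P(A or B) = P(A) + P(B) - P(AB)
print(p_a + p_b - p_ab)             # 0.75, probability of at least one head
```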
MODULE 12.2: CONDITIONAL, UNCONDITIONAL, AND JOINT PROBABILITIES

Discrete Probability Function
LO 12.d: Calculate the probability of an event for a discrete probability function.
A discrete probability function is one for which there are a finite number of possible outcomes. The probability function gives us the probability of each possible outcome. Consider a random variable for which the possible outcomes are x = 1, 2, 3, or 4, with a probability function of x/10, so that P(x) = x/10. The probability of an outcome of 3 is 3/10 = 30%. The probability of an outcome of either 2 or 4 is 2/10 + 4/10 = 60%. This function qualifies as a probability function because the probability of getting one of the possible outcomes is 1/10 + 2/10 + 3/10 + 4/10 = 10/10 = 100%.

Conditional and Unconditional Probabilities
LO 12.e: Define and calculate a conditional probability.
LO 12.f: Distinguish between conditional and unconditional probabilities.
Sometimes we are interested in the probability of an event, given that some other event has occurred. As mentioned earlier, we refer to this as a conditional probability, P(A | B). Consider conditional probabilities that an employee at Acme, Inc., earns more than $40,000 per year, P(40+), conditioned on the highest level of education an employee has attained. Employees fall into one of three education levels: no degree (ND), bachelor's degree (BD), and higher-than-bachelor's degree (HBD). If 60% of the employees have no degree, 30% of the employees have attained only a bachelor's degree, and 10% have attained a higher degree, we write P(ND) = 60%, P(BD) = 30%, and P(HBD) = 10%. Note that the three levels of education attainment are mutually exclusive; an employee can only be in one of the three categories of educational attainment. Note also that the three categories are exhaustive; the categories cover all the possible levels of educational attainment. We can write this as P(ND or BD or HBD) = 100%.

Given a conditional probability and the unconditional probability of the conditioning event, we can calculate the joint probability of both events using P(AB) = P(A | B) × P(B). Assume that for Acme, 10% of the employees with no degree, 70% of the employees with only a bachelor's degree, and 100% of employees with a degree beyond a bachelor's degree earn more than $40,000 per year. That is, P(40+ | ND) = 10%, P(40+ | BD) = 70%, and P(40+ | HBD) = 100%. Using these conditional probabilities, along with the unconditional probabilities P(ND) = 60%, P(BD) = 30%, and P(HBD) = 10%, we can calculate the joint probabilities:
P(40+ and ND) = 10% × 60% = 6%
P(40+ and BD) = 70% × 30% = 21%
P(40+ and HBD) = 100% × 10% = 10%
We can use these probabilities to illustrate the total probability rule, which states that if the conditioning events Bi are mutually exclusive and exhaustive, then:
P(A) = P(A | B1)P(B1) + P(A | B2)P(B2) + … + P(A | Bn)P(Bn)
This is the sum of the joint probabilities. For Acme, we have P(40+) = 6% + 21% + 10% = 37%; that is, 37% of the employees earn more than $40,000 per year.
Rearranging P(AB) = P(A | B) × P(B), we get:
P(A | B) = P(AB) / P(B)
That is, we can calculate a conditional probability from the joint probability of two events and the unconditional probability of the conditioning event. As an example, the conditional probability P(40+ | BD) is:
P(40+ | BD) = P(40+ and BD) / P(BD) = 21% / 30% = 70%

Bayes' Rule
LO 12.g: Explain and apply Bayes' rule.
Bayes' rule allows us to use information about the outcome of one event to improve our estimates of the unconditional probability of another event.
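Before turning to Bayes' rule, the joint probabilities and the total probability rule for the Acme example can be reproduced in a few lines of Python. This is an illustrative sketch added for this review; the dictionary keys simply mirror the ND/BD/HBD labels in the text.

```python
# Unconditional probabilities of each education level
p_level = {"ND": 0.60, "BD": 0.30, "HBD": 0.10}
# Conditional probabilities of earning $40,000+ given each level
p_40k_given = {"ND": 0.10, "BD": 0.70, "HBD": 1.00}

# Joint probabilities: P(40+ and level) = P(40+ | level) * P(level)
p_joint = {lvl: p_40k_given[lvl] * p_level[lvl] for lvl in p_level}
print(p_joint)                         # ≈ {'ND': 0.06, 'BD': 0.21, 'HBD': 0.10}

# Total probability rule: P(40+) is the sum of the joint probabilities
print(sum(p_joint.values()))           # ≈ 0.37

# Conditional probability recovered from joint and marginal: P(40+ | BD)
print(p_joint["BD"] / p_level["BD"])   # ≈ 0.70
```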
From our rules of probability, we know that P(A | B) × P(B) = P(AB) and that P(B | A) × P(A) = P(AB), so we can write P(A | B) × P(B) = P(B | A) × P(A). Rearranging these terms, we can arrive at Bayes' rule:
P(A | B) = [P(B | A) × P(A)] / P(B)
Given the unconditional probabilities of A and B and the conditional probability of B given A, we can calculate the conditional probability of A given B. The following example illustrates the use of Bayes' rule and provides some intuition about what this formula is telling us.

EXAMPLE: Bayes' formula
There is a 60% probability the economy will outperform, and if it does, there is a 70% probability a stock will go up and a 30% probability the stock will go down. There is a 40% probability the economy will underperform, and if it does, there is a 20% probability the stock in question will increase in value (have gains) and an 80% probability it will not. Given that the stock increased in value, calculate the probability that the economy outperformed.
Answer:
Multiplying the probabilities along each branch gives the probabilities of each of the four outcome pairs:
P(outperform and gains) = 0.6 × 0.7 = 42%
P(outperform and no gains) = 0.6 × 0.3 = 18%
P(underperform and gains) = 0.4 × 0.2 = 8%
P(underperform and no gains) = 0.4 × 0.8 = 32%
Note that these sum to 1. Given that the stock has gains, what is our updated probability of an outperforming economy? We sum the probability of stock gains in both states (outperform and underperform) to get 42% + 8% = 50%. Given that the stock has gains, the probability that the economy has outperformed is:
P(outperform | gains) = P(outperform and gains) / P(gains) = 42% / 50% = 84%
The numerator for the calculation of the updated probability P(A | B) using Bayes' formula in the example is the joint probability of outperform and gains. This is calculated as P(gains | outperform) × P(outperform) (i.e., 0.7 × 0.6 = 0.42). The denominator is the unconditional probability of gains, P(gains | outperform) × P(outperform) + P(gains | underperform) × P(underperform) (i.e., 0.42 + 0.08 = 0.50).

EXAMPLE: Probability concepts and relationships
A shipment of 1,000 cars has been unloaded into a parking area. The cars have the following features:
There are 600 blue (B) cars. Of the blue cars, 150 have driver assist (DA) technology.
There are 400 red (R) cars. Of the red cars, 200 have DA technology.
Given these facts, calculate the following:
1. Unconditional probabilities: P(B) and P(R)
2. Conditional probabilities: P(DA | B) and P(DA | R)
3. Joint probabilities: P(B and DA) and P(R and DA)
4. Total probability rule: P(DA)
5. Bayes' rule: P(B | DA)
Answer:
Unconditional probabilities: P(B) = 600/1,000 = 60%; P(R) = 400/1,000 = 40%
Conditional probabilities: P(DA | B) = 150/600 = 25%; P(DA | R) = 200/400 = 50%
Joint probabilities: P(B and DA) = P(DA | B)P(B) = 25%(60%) = 15%; 15%(1,000) = 150 of the cars are blue with driver assist. P(R and DA) = P(DA | R)P(R) = 50%(40%) = 20%; 20%(1,000) = 200 of the cars are red with driver assist.
Total probability rule: P(DA) = P(DA | B)P(B) + P(DA | R)P(R) = 25%(60%) + 50%(40%) = 35%; 35%(1,000) = 350 of the cars have driver assist.
Bayes' rule: P(B | DA) = P(B and DA)/P(DA) = 15%/35% = 42.9%; 350 cars have driver assist, and of those cars, 150 are blue: 150/350 = 0.42857 = 42.9%.
Independence: Now, assume we add to our information that 40% of the blue cars (240) are convertibles and 40% of the red cars (160) are convertibles, so that 400 of the cars are convertibles. In this case, P(B | C) = 240/400 = 60% = P(B) and P(R | C) = 160/400 = 40% = P(R). This meets the requirement for independence that P(A | B) = P(A). The fact that a car chosen at random is a convertible gives us no additional information about whether a car is blue or red.
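The Bayes' rule update in the economy/stock example can also be expressed as a small function. The sketch below is illustrative only; the function name and variable names are ours, not from the reading.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_outperform = 0.60                  # P(economy outperforms)
p_gains_given_out = 0.70             # P(stock gains | outperform)
p_gains_given_under = 0.20           # P(stock gains | underperform)

# Unconditional P(gains) via the total probability rule
p_gains = (p_gains_given_out * p_outperform
           + p_gains_given_under * (1 - p_outperform))

print(p_gains)                                          # ≈ 0.50
print(bayes(p_gains_given_out, p_outperform, p_gains))  # ≈ 0.84
```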
MODULE QUIZ 12.2
1. The probability function for the outcome of one roll of a six-sided die is given as P(x) = x/21. What is P(x > 4)?
A. 16.6%.
B. 23.8%.
C. 33.3%.
D. 52.4%.
2. The relationship between the probability that both Event A and Event B will occur and the conditional probability of Event A given that Event B occurs is:
3. The probability that shares of Acme will increase in value over the next month is 50% and the probability that shares of Acme and shares of Best will both increase in value over the next month is 40%. The probability that Best shares will increase in value, given that Acme shares increase in value over the next month, is closest to:
A. 20%.
B. 40%.
C. 80%.
D. 90%.

KEY CONCEPTS
LO 12.a
An event is one of the possible outcomes or a subset of the possible outcomes of a random event, such as the flip of a coin. The event space is all the subsets of possible outcomes and the empty set (none of the possible outcomes).
LO 12.b
Two events are independent if either of the following conditions holds:
P(A) × P(B) = P(AB)
P(A | B) = P(A)
Two events are mutually exclusive if the joint probability, P(AB) = 0 (i.e., both cannot occur). When two events are mutually exclusive, P(A or B) = P(A) + P(B).
LO 12.c
If two events conditional on a third event are independent, we say they are conditionally independent. For example, if P(AB | C) = P(A | C)P(B | C), then A and B are conditionally independent. Two events may be independent but conditionally dependent, or vice versa.
LO 12.d
A probability function describes the probability for each possible outcome for a discrete probability distribution. For example, P(x) = x/25, defined over the outcomes {1, 2, 3, 4, 5}.
LO 12.e
The joint probability of two events, P(AB), is the probability that they will both occur: P(AB) = P(A | B) × P(B). This relationship can be rearranged to define the conditional probability of A given B as follows:
P(A | B) = P(AB) / P(B)
LO 12.f
An unconditional probability (i.e., marginal probability) is the probability of an event occurring. A conditional probability, P(A | B), is the probability of an Event A occurring given that Event B has occurred.
LO 12.g
Bayes' rule is:
P(A | B) = [P(B | A) × P(A)] / P(B)
This formula allows us to update the unconditional probability, P(A), based on the fact that B has occurred. P(AB) can be calculated as P(B | A)P(A).

ANSWER KEY FOR MODULE QUIZZES
Module Quiz 12.1
1. C All of the outcomes and combinations specified are included in the event space for the random variable. (LO 12.a)
2. B P(A or B) = P(A) + P(B) – P(AB) holds for both independent and dependent events. The other equalities are only true for independent events. (LO 12.b)
3. C Two independent events may be conditionally independent or not conditionally independent. (LO 12.c)
Module Quiz 12.2
1. D The probability of x > 4 is the probability of an outcome of 5 or 6 (5/21 + 6/21 = 52.4%). (LO 12.d)
2. A The (joint) probability that both A and B will occur is equal to the conditional probability of Event A given that Event B has occurred, multiplied by the unconditional probability of Event B. (LO 12.e)
3. C Bayes' formula tells us that:
P(B | A) = P(AB) / P(A)
Applying that to the information given, we can write: 40% / 50% = 80% (LO 12.g)

The following is a review of the Quantitative Analysis principles designed to address the learning objectives set forth by GARP®. Cross-reference to GARP FRM Part I Quantitative Analysis, Chapter 2.

READING 13
RANDOM VARIABLES
Study Session 4

EXAM FOCUS
This reading addresses the concepts of expected value, variance, skewness, and kurtosis.
The characteristics and calculations of these measures will be discussed. For the exam, be able to distinguish among a probability mass function, a cumulative distribution function, and a probability density function. Also, be able to compute expected value, and be able to identify the four common population moments of a statistical distribution.

MODULE 13.1: PROBABILITY MASS FUNCTIONS, CUMULATIVE DISTRIBUTION FUNCTIONS, AND EXPECTED VALUES

Random Variables and Probability Functions
LO 13.a: Describe and distinguish a probability mass function from a cumulative distribution function, and explain the relationship between these two.
A discrete random variable is one that can take on only a countable number of possible outcomes. A discrete random variable that can take on only two possible values, zero and one, is referred to as a Bernoulli random variable. We can model the outcome of a coin flip as a Bernoulli random variable where heads = 1 and tails = 0. The number of days in June that will have a temperature greater than 70 degrees is also a discrete random variable. The possible outcomes are the integers from 0 to 30.
A continuous random variable has an uncountable number of possible outcomes. The amount of rainfall that will fall in June is an example of a continuous random variable. There are an infinite number of possible outcomes because for any two values (e.g., 6.95 inches and 6.94 inches), we can find a number between them [e.g., (6.95 + 6.94) / 2 = 6.945]. Because there are an infinite number of possible outcomes, the probability of any single value is zero. For continuous random variables, we measure probabilities only over some positive interval (e.g., the probability that rainfall in June will be between 6.94 and 6.95 inches).
A probability mass function (PMF), f(x) = P(X = x), gives us the probability that the outcome of a discrete random variable, X, will be equal to a given number, x. For a Bernoulli random variable for which P(x = 1) = p, the PMF is f(x) = p^x(1 − p)^(1−x). This yields P(x = 1) = p and P(x = 0) = 1 − p.
A second example of a PMF is f(x) = 1/6, which is the probability that one roll of a six-sided die will take on one of the possible outcomes one through six. Each of the possible outcomes has the same probability of occurring (1/6 = 16.67%).
A third example is the PMF f(x) = x/10 for a random variable that can take on values of 1, 2, 3, or 4. For example, P(x = 3) = f(3) = 3/10 = 30%.
For all of these PMFs, the sum of the probabilities of all of the possible outcomes is 100%, a requirement for a PMF.
A cumulative distribution function (CDF) gives us the probability that a random variable will take on a value less than or equal to x [i.e., F(x) = P(X ≤ x)]. For a Bernoulli random variable with possible outcomes of zero and one, the CDF is:
F(x) = 0 for x < 0
F(x) = 1 − p for 0 ≤ x < 1
F(x) = 1 for x ≥ 1
While the PMF for this Bernoulli variable is defined only for X = 0 or 1, the corresponding CDF is defined for all real numbers. For example, P(X < 0.1456) = F(0.1456) = 1 − p.
For the roll of a six-sided die, the CDF is F(x) = x/6, so that the probability of a roll of 3 or less is F(3) = 3/6 = 50%. This illustrates an important relationship between a PMF and its corresponding CDF; the probability of an outcome less than or equal to x is simply the sum of the probabilities of all the possible outcomes less than or equal to x. For the roll of a six-sided die, F(3) = f(1) + f(2) + f(3) = 1/6 + 1/6 + 1/6 = 3/6 = 50%.
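The PMF-to-CDF relationship for the die can be verified by summing PMF values directly; the following Python snippet is an added illustration, not part of the reading.

```python
from fractions import Fraction

# PMF of a fair six-sided die: f(x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    # F(x) = sum of f(k) for all outcomes k <= x
    return sum(p for outcome, p in pmf.items() if outcome <= x)

print(cdf(3))   # 1/2, matching F(3) = 3/6 = 50% in the text
```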
Expectations
LO 13.b: Understand and apply the concept of a mathematical expectation of a random variable.
The expected value is the weighted average of the possible outcomes of a random variable, where the weights are the probabilities that the outcomes will occur. The mathematical representation for the expected value of random variable X is:
E(X) = ΣP(xi)xi = P(x1)x1 + P(x2)x2 + … + P(xn)xn
Here, E is referred to as the expectations operator and is used to indicate the computation of a probability-weighted average. The symbol x1 represents the first observed value (observation) for random variable X; x2 is the second observation, and so on, through the nth observation.
The concept of expected value may be demonstrated using probabilities associated with a coin toss. On the flip of one coin, the occurrence of the event "heads" may be used to assign the value of one to a random variable. Alternatively, the event "tails" means the random variable equals zero. Statistically, we would formally write the following:
if heads, then X = 1
if tails, then X = 0
For a fair coin, P(heads) = P(X = 1) = 0.5, and P(tails) = P(X = 0) = 0.5. The expected value can be computed as follows:
E(X) = ΣP(xi)xi = 0.5(1) + 0.5(0) = 0.5
In any individual flip of a coin, X cannot assume a value of 0.5. Over the long term, however, the average of all the outcomes is expected to be 0.5. Similarly, the expected value of the roll of a fair die, where X = number that faces up on the die, is determined to be:
E(X) = ΣP(xi)xi = (1/6)(1) + (1/6)(2) + (1/6)(3) + (1/6)(4) + (1/6)(5) + (1/6)(6)
E(X) = 3.5
We can never roll a 3.5 on a die, but over the long term, 3.5 should be the average value of all outcomes. The expected value is, statistically speaking, our best guess of the outcome of a random variable. While a 3.5 will never appear when a die is rolled, the average amount by which our guess differs from the actual outcomes is minimized when we use the expected value calculated this way.
Note that the probabilities of the outcomes for a coin flip (0 or 1) and the probabilities of the outcomes for the roll of a die are equal for all of the possible outcomes in both cases. When outcomes are equally likely, the expected value is simply the mean (average) of the outcomes:
E(X) = (x1 + x2 + … + xn) / n
When we estimate the expected value of a random variable based on n observations, we use the mean of the observed values as our estimate of the mean of the underlying probability distribution. In terms of a probability model, we are assuming that the outcomes are equally likely, that is, each has a probability of 1/n. Multiplying each outcome by 1/n and then summing them produces the same expected value as dividing the sum of the outcomes by n.
In other cases, the probabilities of the outcomes are not equal, and we calculate the expected value as the weighted sum of the outcomes, where the weights are the probabilities of each outcome. The following example illustrates such a case.

EXAMPLE: Expected earnings per share (EPS)
The probability distribution of EPS for Ron's Stores is given in the following figure. Calculate the expected earnings per share.
EPS Probability Distribution
Probability 0.10: EPS = £1.80
Probability 0.20: EPS = £1.60
Probability 0.40: EPS = £1.20
Probability 0.30: EPS = £1.00
Answer:
The expected EPS is simply a weighted average of each possible EPS, where the weights are the probabilities of each possible outcome.
E(EPS) = 0.10(1.80) + 0.20(1.60) + 0.40(1.20) + 0.30(1.00) = £1.28
The following are two useful properties of expected values:
1. If c is any constant, then: E(cX) = cE(X)
2. If X and Y are any random variables, then: E(X + Y) = E(X) + E(Y)
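The expected-value calculation in the EPS example is a direct probability-weighted sum; the short sketch below (added for illustration) reproduces it.

```python
# (probability, EPS in GBP) pairs from the Ron's Stores example
eps_dist = [(0.10, 1.80), (0.20, 1.60), (0.40, 1.20), (0.30, 1.00)]

expected_eps = sum(p * eps for p, eps in eps_dist)   # E(EPS) = sum of P(x) * x
print(round(expected_eps, 2))                        # 1.28
```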
MODULE QUIZ 13.1
1. The probability mass function (PMF) for a discrete random variable that can take on the values 1, 2, 3, 4, or 5 is P(X = x) = x/15. The value of the cumulative distribution function (CDF) of 4, F(4), is equal to:
A. 26.7%.
B. 40.0%.
C. 66.7%.
D. 75.0%.
2. An analyst has estimated the following probabilities for gross domestic product growth next year:
P(4%) = 10%, P(3%) = 30%, P(2%) = 40%, P(1%) = 20%
Based on these estimates, the expected value of GDP growth next year is:
A. 2.0%.
B. 2.3%.
C. 2.5%.
D. 2.8%.

MODULE 13.2: MEAN, VARIANCE, SKEWNESS, AND KURTOSIS
LO 13.c: Describe the four common population moments.
The population moments most often used are the mean, variance, skewness, and kurtosis. The first moment, the mean of a random variable, is its expected value, E(X), which we discussed previously. The mean can be represented by the Greek letter µ (mu).
The other three moments are central moments because the functions involve the random variable minus its mean, X − µ. Subtracting the mean produces functions that are unaffected by the location of the mean. These moments give us information about the shape of a probability distribution around its mean.

PROFESSOR'S NOTE
Since central moments are measured relative to the mean, the first central moment equals zero and is, therefore, not typically used.

The second central moment of a random variable is its variance, σ². Variance is defined as:
σ² = E{[X − E(X)]²} = E[(X − µ)²]
Squaring the deviations from the mean ensures that σ² is positive. Variance gives us information about how widely dispersed the values of the random variable are around the mean. We often use the square root of variance, σ (the standard deviation), as a measure of dispersion because it has the same units as the random variable. If our distribution is for percentage rates of return, the standard deviation is also measured in terms of percentage returns.
The third central moment of a distribution is:
E{[X − E(X)]³} = E[(X − µ)³]
Skewness, a measure of a distribution's symmetry, is the standardized third moment. We standardize it by dividing it by the standard deviation cubed. Because we both subtract the mean and divide by the standard deviation cubed, skewness is unaffected by differences in the mean or in the variance of the random variable. This allows us to compare the skewness of two different distributions directly. A distribution with skew = 0 is perfectly symmetric.
The fourth central moment of a distribution is:
E{[X − E(X)]⁴} = E[(X − µ)⁴]
Kurtosis is the standardized fourth moment. Kurtosis is a measure of the shape of a distribution, in particular the total probability in the tails of the distribution relative to the probability in the rest of the distribution. The higher the kurtosis, the greater the probability in the tails of the distribution. We sometimes refer to distributions with high kurtosis as fat-tailed distributions.
The following figures illustrate the concepts of skewness and kurtosis for a probability distribution.
Figure 13.1: Skewness
Figure 13.2: Kurtosis
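For a discrete distribution, all four moments can be computed directly from the PMF. The sketch below is an added illustration, not from the reading; it uses the P(x) = x/10 probability function introduced in Reading 12, with skewness and kurtosis following the standardized definitions above.

```python
# PMF: P(x) = x/10 for x = 1, 2, 3, 4
pmf = {x: x / 10 for x in range(1, 5)}

mean = sum(p * x for x, p in pmf.items())                    # first moment
var = sum(p * (x - mean) ** 2 for x, p in pmf.items())       # second central moment
std = var ** 0.5
skew = sum(p * (x - mean) ** 3 for x, p in pmf.items()) / std ** 3
kurt = sum(p * (x - mean) ** 4 for x, p in pmf.items()) / std ** 4

print(mean, var, skew, kurt)   # ≈ 3.0, 1.0, -0.6, 2.2
```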
MODULE QUIZ 13.2
1. For two financial securities with distributions of returns that differ only in their kurtosis, the one with the higher kurtosis will have:
A. a wider dispersion of returns around the mean.
B. a greater probability of extreme positive and negative returns.
C. less peaked distribution of returns.
D. a more uniform distribution.

MODULE 13.3: PROBABILITY DENSITY FUNCTIONS, QUANTILES, AND LINEAR TRANSFORMATIONS

Probability Density Functions
LO 13.d: Explain the differences between a probability mass function and a probability density function.
Recall that we used a PMF to describe the probabilities of the possible outcomes for a discrete random variable. A simple example is P(X = x) = f(x) = x/10 for the possible outcomes 1, 2, 3, and 4. The PMF tells us the probability of each of those possible outcomes, P(X = 4) = 4/10 = 40%.
Recall that a continuous random variable can take on any of an infinite number of possible outcomes, so that the probability of any single outcome is zero. We describe a continuous distribution function with a probability density function (PDF), rather than a PMF. A PDF allows us to calculate the probability of an outcome between two values (over an interval). This probability is the area under the PDF over the interval. Mathematically, we take the integral of the PDF over an interval to calculate the probability that the random variable will take on a value in that interval.

Quantile Functions
LO 13.e: Characterize the quantile function and quantile-based estimators.
The quantile function, Q(a), is the inverse of the CDF. Recall that a CDF gives us the probability that a random variable will be less than or equal to some value X = x. The interpretation of the CDF is the same for discrete and continuous random variables. Consider a CDF that gives us a probability of 30% that a continuous random variable takes on values less than 2 [i.e., P(X < 2) = F(2) = 30%]. The quantile function, Q(30%), for this distribution would return the value 2; 30% of the outcomes are expected to be less than 2.
A common use of quantiles is to report the results of standardized tests. Consider a student with a score of 122 on an exam. If the student's quantile score is 74%, this indicates that the student's score of 122 was higher than 74% of those who took the test. The quantile function, Q(74%), would return the student's score of 122.
Two quantile measures are of particular interest to us here. One is the value of the quantile function for 50%. This is termed the median of the distribution. On average, 50% of the variable's outcomes will be below the median and 50% of the variable's outcomes will be above the median. For a symmetric distribution (skew = 0), the mean and median will be equal. For a distribution with positive (right) skew, the median will be less than the mean, but will be greater than the mean for distributions with negative (left) skew.
The second quantile measure of interest here is the interquartile range (IQR). The interquartile range is the upper and lower value of the outcomes of a random variable that include the middle 50% of its probability distribution. The lower value is Q(25%) and the upper value is Q(75%). The lower value is the value that we expect 25% of the outcomes to be less than, and the upper value is the value that we expect 75% of the values to be less than. Like standard deviation, the interquartile range is a measure of the variability of a random variable. Compared to a given distribution, the outcomes of a distribution with a lower interquartile range are more concentrated around the mean, just as they are for a distribution with a lower standard deviation.

Linear Transformations of Random Variables
LO 13.f: Explain the effect of a linear transformation of a random variable on the mean, variance, standard deviation, skewness, kurtosis, median, and interquartile range.
A linear transformation of a random variable, X, takes the form Y = a + bX, where a and b are constants. The constant a shifts the location of the random variable, X, and b rescales the values of X. The relationships between the moments of the distribution of X and the moments of the distribution of Y, a linear transformation of X, are as follows (a numerical check appears after this list):
The mean of Y can be calculated as E(Y) = a + bE(X); both the location and the scale are affected.
The variance of Y can be calculated as Var(Y) = b²Var(X); while a shifts the location of the distribution, it does not affect the dispersion around the mean, which is rescaled by b.
The standard deviation of Y is simply |b|σX.
With b > 0 (an increasing transformation), the skew is unaffected, skew Y = skew X.
With b < 0 (a decreasing transformation), the magnitude of the skew is unaffected, but the sign is changed, skew Y = −skew X.
A linear transformation of X does not affect kurtosis, kurtosis Y = kurtosis X.
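These relationships can be checked numerically by applying a transformation to a discrete distribution and recomputing the moments. The Python sketch below is an added illustration; the values a = 2 and b = −3 are arbitrary choices, not from the reading.

```python
# X takes values 1..4 with P(x) = x/10 (mean 3, variance 1, skew -0.6)
pmf_x = {x: x / 10 for x in range(1, 5)}
a, b = 2.0, -3.0                      # arbitrary linear transformation Y = a + bX

pmf_y = {a + b * x: p for x, p in pmf_x.items()}

def moments(pmf):
    m = sum(p * v for v, p in pmf.items())
    var = sum(p * (v - m) ** 2 for v, p in pmf.items())
    skew = sum(p * (v - m) ** 3 for v, p in pmf.items()) / var ** 1.5
    return m, var, skew

print(moments(pmf_x))   # ≈ (3.0, 1.0, -0.6)
print(moments(pmf_y))   # ≈ (-7.0, 9.0, 0.6): mean a + b*E(X), variance b**2 * Var(X), skew flips sign
```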
MODULE QUIZ 13.3
1. Which of the following regarding a probability density function (PDF) is correct? A PDF:
A. provides the probability of each of the possible outcomes of a random variable.
B. can provide the same information as a cumulative distribution function (CDF).
C. describes the probabilities for any random variable.
D. only applies to a discrete probability distribution.
2. For the quantile function, Q(x):
A. the CDF function F[Q(23%)] = 23%.
B. Q(23%) will identify the largest 23% of all possible outcomes.
C. Q(50%) is the interquartile range.
D. x can only take on integer values.
3. For a random variable, X, the variance of Y = a + bX is:

KEY CONCEPTS
LO 13.a
A probability mass function (PMF), f(x), gives us the probability that a discrete random variable will take on the value x. A cumulative distribution function (CDF), F(x), gives us the probability that a random variable X will take on a value less than or equal to x.
LO 13.b
The expected value of a discrete random variable is the probability-weighted average of the possible outcomes (i.e., the mean of the distribution).
LO 13.c
Four commonly used moments of a random variable are its mean, variance (standard deviation), skewness, and kurtosis. The mean is the expected value of the random variable, variance is a measure of dispersion, skewness is a measure of symmetry, and kurtosis is a measure of the proportion of the outcomes in the tails of the distribution.
LO 13.d
A PMF provides the probability that a discrete random variable will take on a given value. A PDF provides the probability that the outcome for a continuous random variable will be within a given interval.
LO 13.e
A quantile is the percentage of outcomes less than a given outcome. A quantile function, Q(x%), provides the value of an outcome which is greater than x% of all possible outcomes. Q(50%) is the median of a distribution. 50% of the outcomes are greater than the median and 50% of the outcomes are less than the median. The interquartile range is an interval that includes the central 50% of all possible outcomes.
LO 13.f
For a variable Y = a + bX (a linear transformation of X): the mean of Y is E(Y) = a + bE(X); the variance of Y is b²Var(X); the skew of Y = skew X for b > 0, and skew Y = –skew X for b < 0; and the kurtosis of Y = kurtosis X.

ANSWER KEY FOR MODULE QUIZZES
Module Quiz 13.1
1. C F(4) is the probability that the random variable will take on a value of 4 or less. We can calculate P(X ≤ 4) as 1/15 + 2/15 + 3/15 + 4/15 = 66.7%, or by subtracting 5/15, P(X = 5), from 100% to get 66.7%. (LO 13.a)
2. B The expected value is computed as: (4)(10%) + (3)(30%) + (2)(40%) + (1)(20%) = 2.3%. (LO 13.b)
Module Quiz 13.2
1. B High kurtosis indicates that the probability in the tails (extreme outcomes) is greater (i.e., the distribution will have fatter tails). (LO 13.c)
Module Quiz 13.3
1. B A PDF evaluated between minus infinity and a given value gives the probability of an outcome less than the given value; the same information is provided by a CDF. A PDF provides probabilities only for a continuous random variable. The probability that a continuous random variable will take on a given value is zero. (LO 13.d)
2. A Q(23%) gives us a value that is greater than 23% of all outcomes, and the CDF for that value is the probability of an outcome less than that value (i.e., 23%). (LO 13.e)
3. C The variance of Y is b²σ²X, where σ²X is the variance of X. (LO 13.f)

The following is a review of the Quantitative Analysis principles designed to address the learning objectives set forth by GARP®. Cross-reference to GARP FRM Part I Quantitative Analysis, Chapter 3.

READING 14
COMMON UNIVARIATE RANDOM VARIABLES
Study Session 4

EXAM FOCUS
This reading explores the following common probability distributions: uniform, Bernoulli, binomial, Poisson, normal, lognormal, chi-squared, Student's t-, F-, exponential, and beta. You will learn the properties, parameters, and common occurrences of these distributions. For the exam, focus most of your attention on the binomial, normal, and Student's t-distributions. Also, know how to standardize a normally distributed random variable, how to use a z-table, and how to construct confidence intervals.
LO 14.a: Distinguish the key properties and identify the common occurrences of the following distributions: uniform distribution, Bernoulli distribution, binomial distribution, Poisson distribution, normal distribution, lognormal distribution, chi-squared distribution, Student's t and F-distributions.

MODULE 14.1: UNIFORM, BERNOULLI, BINOMIAL, AND POISSON DISTRIBUTIONS

The Uniform Distribution
The continuous uniform distribution is defined over a range that spans between some lower limit, a, and some upper limit, b, which serve as the parameters of the distribution. Outcomes can only occur between a and b, and because we are dealing with a continuous distribution, even if a < x < b, P(X = x) = 0. Formally, the properties of a continuous uniform distribution may be described as follows. For all a ≤ x1 < x2 ≤ b (i.e., for all x1 and x2 between the boundaries a and b):
P(X < a or X > b) = 0 (i.e., the probability of X outside the boundaries is zero).
P(x1 ≤ X ≤ x2) = (x2 − x1) / (b − a). This defines the probability of outcomes between x1 and x2.
Don't miss how simple this is just because the notation is so mathematical. For a continuous uniform distribution, the probability of outcomes in a range that is one-half the whole range is 50%. The probability of outcomes in a range that is one-quarter of the possible range is 25%.

EXAMPLE: Continuous uniform distribution
X is uniformly distributed between 2 and 12. Calculate the probability that X will be between 4 and 8.
Answer:
P(4 ≤ X ≤ 8) = (8 − 4) / (12 − 2) = 40%
The following figure illustrates this continuous uniform distribution. Note that the area bounded by 4 and 8 is 40% of the total probability between 2 and 12 (which is 100%).
Continuous Uniform Distribution
The cumulative distribution function (CDF) is linear over the variable's range. The CDF for the distribution in the previous example, P(X < x), is shown in Figure 14.1.
Figure 14.1: CDF for a Continuous Uniform Variable
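For the continuous uniform distribution, interval probabilities and the CDF are simple ratios; the sketch below (added for illustration, not from the reading) reproduces the X uniformly distributed between 2 and 12 example.

```python
a, b = 2.0, 12.0    # lower and upper limits of the uniform distribution

def uniform_interval_prob(x1, x2):
    # P(x1 <= X <= x2) = (x2 - x1) / (b - a), for a <= x1 < x2 <= b
    return (x2 - x1) / (b - a)

def uniform_cdf(x):
    # F(x) = (x - a) / (b - a) on [a, b]; the CDF is linear over the range
    return min(max((x - a) / (b - a), 0.0), 1.0)

print(uniform_interval_prob(4, 8))   # 0.4, matching the example
print(uniform_cdf(8))                # 0.6
```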
The probability density function (PDF) for a continuous uniform distribution is expressed as:
f(x) = 1 / (b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise
The mean and variance, respectively, of a uniform distribution are:
mean = (a + b) / 2
variance = (b − a)² / 12

The Bernoulli Distribution
A Bernoulli random variable only has two possible outcomes. The outcomes can be defined as either a success or a failure. A success, which occurs with probability p, may be denoted with the value 1, and a failure, which occurs with probability 1 − p, may be denoted with the value 0. Bernoulli distributed random variables are commonly used for assessing the probability of binary outcomes, such as the probability that a firm will default on its debt over some interval.
For a Bernoulli random variable for which P(x = 1) = p, the probability mass function is f(x) = p^x(1 − p)^(1−x). This yields P(x = 1) = p and P(x = 0) = 1 − p. For a Bernoulli random variable, µx = p and the variance is given by Var(X) = p(1 − p). Note that the variance is low for values of p close to 1 or 0, and the maximum variance is for p = 0.5.
For a Bernoulli random variable with possible outcomes 0 and 1, the CDF is:
F(x) = 0 for x < 0
F(x) = 1 − p for 0 ≤ x < 1
F(x) = 1 for x ≥ 1
Note that while the probability mass function (PMF) for this Bernoulli variable is defined only for X = 0 or 1, the corresponding CDF is defined for all real numbers.

The Binomial Distribution
A binomial random variable may be defined as the number of successes in a given number of Bernoulli trials, whereby the outcome can be either success or failure. The probability of success, p, is constant for each trial and the trials are independent. Under these conditions, the binomial probability function defines the probability of exactly x successes in n trials. It can be expressed using the following formula:
p(x) = P(X = x) = (number of ways to choose x from n)p^x(1 − p)^(n−x)
where:
number of ways to choose x from n = n! / [(n − x)!x!]
So, the probability of exactly x successes in n trials is:
p(x) = {n! / [(n − x)!x!]} p^x(1 − p)^(n−x)

EXAMPLE: Binomial probability
Assuming a binomial distribution, compute the probability of drawing three black beans from a bowl of black and white beans if the probability of selecting a black bean in any given attempt is 0.6. You will draw five beans from the bowl.
Answer:
P(X = 3) = [5! / (2!3!)](0.6)³(0.4)² = 10 × 0.216 × 0.16 = 0.3456 ≈ 34.6%
Some intuition about these results may help you remember the calculations. Consider that a (very large) bowl of black and white beans has 60% black beans and each time you select a bean, you replace it in the bowl before drawing again. We want to know the probability of selecting exactly three black beans in five draws, as in the previous example.
One way this might happen is BBBWW. Because the draws are independent, the probability of this is easy to calculate. The probability of drawing a black bean is 60%, and the probability of drawing a white bean is 1 − 60% = 40%. Therefore, the probability of selecting BBBWW, in order, is 0.6 × 0.6 × 0.6 × 0.4 × 0.4 = 3.456%. This is the p³(1 − p)² from the formula and p is 60%, the probability of selecting a black bean on any single draw from the bowl.
BBBWW is not, however, the only way to choose exactly three black beans in five trials. Another possibility is BBWWB, and a third is BWWBB. Each of these will have exactly the same probability of occurring as our initial outcome, BBBWW. That's why we need to answer the question of how many ways (different orders) there are for us to choose three black beans in five draws. Using the formula, there are 5! / (2!3!) = 10 ways; 10 × 3.456% = 34.56%, the answer we computed previously.
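The binomial calculation in the bean example maps directly to the formula above; the following Python sketch is an added illustration using the standard library's math.comb for the number of ways to choose x from n.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = [n! / ((n - x)! x!)] * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability of exactly 3 black beans in 5 draws with p = 0.6
print(binomial_pmf(3, 5, 0.6))   # ≈ 0.3456, i.e., 34.56%
```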
For a given series of n trials, the expected number of successes, or E(X), is given by the following formula:
expected value of X = E(X) = np
The intuition is straightforward; if we perform n trials and the probability of success on each trial is p, we expect np successes. The variance of a binomial random variable is given by:
variance of X = np(1 − p)

EXAMPLE: Expected value of a binomial random variable
Based on empirical data, the probability that the Dow Jones Industrial Average (DJIA) will increase on any given day has been determined to equal 0.67. Assuming the only other outcome is that it decreases, we can state p(UP) = 0.67 and p(DOWN) = 0.33. Further, assume that movements in the DJIA are independent (i.e., an increase in one day is independent of what happened on another day). Using the information provided, compute the expected value of the number of up days in a five-day period.
Answer:
Using binomial terminology, we define success as UP, so p = 0.67. Note that the definition of success is critical to any binomial problem.
E(X | n = 5, p = 0.67) = (5)(0.67) = 3.35
Recall that the " | " symbol means given. Hence, the preceding statement is read as: the expected value of X, given that n = 5 and the probability of success = 67%, is 3.35.
Using the equation for the variance of a binomial distribution, we find the variance of X to be:
Var(X) = np(1 − p) = 5(0.67)(0.33) = 1.106
We should note that because the binomial distribution is a discrete distribution, the result X = 3.35 is not possible. However, if we were to record the results of many five-day periods, the average number of up days (successes) would converge to 3.35.
Binomial distributions are used extensively in the investment world where outcomes are typically seen as successes or failures. In general, if the price of a security goes up, it is viewed as a success. If the price of a security goes down, it is a failure. In this context, binomial distributions are often used to create models to aid in the process of asset valuation.

PROFESSOR'S NOTE
We will examine binomial trees for stock option valuation in Book 4.

The Poisson Distribution
The Poisson distribution is a discrete probability distribution with a number of real-world applications. For example, the number of defects per batch in a production process or the number of 911 calls per hour are discrete random variables that follow a Poisson distribution. While the Poisson random variable X refers to the number of successes per unit, the parameter lambda (λ) refers to the average or expected number of successes per unit. The mathematical expression for the Poisson distribution for obtaining X successes, given that λ successes are expected, is:
P(X = x) = (λ^x × e^−λ) / x!
An interesting feature of the Poisson distribution is that both its mean and variance are equal to the parameter, λ.

EXAMPLE: Using the Poisson distribution (1)
On average, the 911 emergency switchboards receive 0.1 incoming calls per second. Assuming the arrival of calls follows a Poisson distribution, what is the probability that in a given minute exactly 5.0 phone calls will be received?
Answer:
We first need to convert the seconds into minutes. Note that λ, the expected number of calls per minute, is (0.1)(60) = 6.0. Hence:
P(X = 5) = (6⁵ × e⁻⁶) / 5! = 0.1606
This means that, given the average of 0.1 incoming calls per second, there is a 16.06% chance there will be five incoming phone calls in a minute.
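The Poisson probability in the 911-call example is a one-line computation; the sketch below is an added illustration using the formula given above.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = lam**x * e**(-lam) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

lam = 0.1 * 60               # expected calls per minute = 6.0
print(poisson_pmf(5, lam))   # ≈ 0.1606, about a 16% chance of exactly 5 calls
```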
EXAMPLE: Using the Poisson distribution (2)
Assume there is a 0.01 probability of a patient experiencing severe weight loss as a side effect from taking a recently approved drug used to treat heart disease. What is the probability that out of 200 such procedures conducted on different patients, five patients will develop this complication? Assume that the number of patients developing the complication from the procedure is Poisson distributed.
Answer:
Let λ = expected number of patients developing the complication from the procedure = np = (200)(0.01) = 2
P(X = 5) = (2⁵ × e⁻²) / 5! = 0.036
This means that given a complication rate of 0.01, there is a 3.6% probability that 5 out of every 200 patients will experience severe weight loss from taking the drug.

MODULE QUIZ 14.1
1. If 5% of the cars coming off the assembly line have some defect in them, what is the probability that out of three cars chosen at random, exactly one car will be defective? Assume that the number of defective cars has a Poisson distribution.
A. 0.129.
B. 0.135.
C. 0.151.
D. 0.174.
2. A recent study indicated that 60% of all businesses have a web page. Assuming a binomial probability distribution, what is the probability that exactly four businesses will have a web page in a random sample of six businesses?
A. 0.138.
B. 0.276.
C. 0.311.
D. 0.324.
3. What is the probability of an outcome being between 15 and 25 for a random variable that follows a continuous uniform distribution within the range of 12 to 28?
A. 0.509.
B. 0.625.
C. 1.000.
D. 1.600.

MODULE 14.2: NORMAL AND LOGNORMAL DISTRIBUTIONS

The Normal Distribution
The normal distribution is important for many reasons. Many of the random variables that are relevant to finance and other professional disciplines follow a normal distribution. In the area of investment and portfolio management, the normal distribution plays a central role in portfolio theory. The PDF for the normal distribution is:
f(x) = [1 / (σ√(2π))] × e^[−(x − µ)² / (2σ²)]
The normal distribution has the following key properties:
It is completely described by its mean, µ, and variance, σ², stated as X ~ N(µ, σ²). In words, this says, "X is normally distributed with mean µ and variance σ²."
Skewness = 0, meaning the normal distribution is symmetric about its mean, so that P(X ≤ µ) = P(µ ≤ X) = 0.5, and mean = median = mode.
Kurtosis = 3; this is a measure of how the distribution is spread out with an emphasis on the tails of the distribution. Excess kurtosis is measured relative to 3, the kurtosis of the normal distribution.
A linear combination of normally distributed independent random variables is also normally distributed.
The probabilities of outcomes further above and below the mean get smaller and smaller but do not go to zero (the tails get very thin but extend infinitely).
Many of these properties are evident from examining the graph of a normal distribution's PDF as illustrated in Figure 14.2.
Figure 14.2: Normal Distribution PDF
A confidence interval is a range of values around the expected outcome within which we expect the actual outcome to be some specified percentage of the time. A 95% confidence interval is a range that we expect the random variable to be in 95% of the time. For a normal distribution, this interval is based on the expected value (sometimes called a point estimate) of the random variable and on its variability, which we measure with standard deviation. Confidence intervals for a normal distribution are illustrated in Figure 14.3.
For any normally distributed random variable, 68% of the outcomes are within one standard deviation of the expected value (mean), and approximately 95% of the outcomes are within two standard deviations of the expected value.
Figure 14.3: Confidence Intervals for a Normal Distribution
In practice, we will not know the actual values for the mean and standard deviation of the distribution, but will have estimated them as X̄ and s. The three confidence intervals of most interest are given by the following:
The 90% confidence interval for X is X̄ − 1.65s to X̄ + 1.65s.
The 95% confidence interval for X is X̄ − 1.96s to X̄ + 1.96s.
The 99% confidence interval for X is X̄ − 2.58s to X̄ + 2.58s.

EXAMPLE: Confidence intervals
The average return of a mutual fund is 10.5% per year and the standard deviation of annual returns is 18%. If returns are approximately normal, what is the 95% confidence interval for the mutual fund return next year?
Answer:
Here µ and σ are 10.5% and 18%, respectively. Thus, the 95% confidence interval for the return, R, is:
10.5 ± 1.96(18) = −24.78% to 45.78%
Symbolically, this result can be expressed as:
P(−24.78 < R < 45.78) = 0.95 or 95%
The interpretation is that the annual return is expected to be within this interval 95% of the time, or 95 out of 100 years.

The standard normal distribution
A standard normal distribution (i.e., z-distribution) is a normal distribution that has been standardized so it has a mean of zero and a standard deviation of 1 [i.e., N(0,1)]. To standardize an observation from a given normal distribution, the z-value of the observation must be calculated. The z-value represents the number of standard deviations a given observation is from the population mean. Standardization is the process of converting an observed value for a random variable to its z-value. The following formula is used to standardize a random variable:
z = (x − µ) / σ

PROFESSOR'S NOTE
The term z-value will be used for a standardized observation in this reading. The terms z-score and z-statistic are also commonly used.

EXAMPLE: Standardizing a random variable (calculating z-values)
Assume the annual earnings per share (EPS) for a population of firms are normally distributed with a mean of $6 and a standard deviation of $2. What are the z-values for EPS of $2 and $8?
Answer:
If EPS = x = $8, then z = (x − µ) / σ = ($8 − $6) / $2 = +1
If EPS = x = $2, then z = (x − µ) / σ = ($2 − $6) / $2 = –2
Here, z = +1 indicates that an EPS of $8 is one standard deviation above the mean, and z = −2 means that an EPS of $2 is two standard deviations below the mean.

Calculating probabilities using z-values
Now we will show how to use standardized values (z-values) and a table of probabilities for Z to determine probabilities. A portion of a table of the CDF for a standard normal distribution is shown in Figure 14.4. We will refer to this table as the z-table, as it contains values generated using the cumulative distribution function for a standard normal distribution, denoted by F(Z). Thus, the values in the z-table are the probabilities of observing a z-value that is less than a given value, z [i.e., P(Z < z)]. The numbers in the first column are z-values that have only one decimal place. The columns to the right supply probabilities for z-values with two decimal places.
Note that the z-table in Figure 14.4 only provides probabilities for positive z-values. This is not a problem because we know from the symmetry of the standard normal distribution that F(−Z) = 1 − F(Z).
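Standard normal CDF values such as those in the z-table can be reproduced with the error function from the Python standard library. The sketch below is an added illustration, not part of the reading; it standardizes an observation and then evaluates F(z) = 0.5[1 + erf(z/√2)].

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    # F(z) for the standard normal distribution, via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_value(x: float, mu: float, sigma: float) -> float:
    return (x - mu) / sigma        # number of standard deviations from the mean

print(round(std_normal_cdf(1.66), 4))     # ≈ 0.9515, as in Figure 14.4
z = z_value(9.70, 6.0, 2.0)               # EPS example below: z = 1.85
print(round(1 - std_normal_cdf(z), 4))    # ≈ 0.0322, P(EPS > $9.70)
```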
The tables in the back of many texts provide probabilities for negative z-values, but we will work with only the positive portion of the table because this may be all you get on the exam. In Figure 14.4, we can find the probability that a standard normal random variable will be less than 1.66, for example. The table value is 95.15%. The probability that the random variable will be less than −1.66 is simply 1 − 0.9515 = 0.0485 = 4.85%, which is also the probability that the variable will be greater than +1.66.
Figure 14.4: Cumulative Probabilities for a Standard Normal Distribution

PROFESSOR'S NOTE
When you use the standard normal probabilities, you have formulated the problem in terms of standard deviations from the mean. Consider a security with returns that are approximately normal, an expected return of 10%, and standard deviation of returns of 12%. The probability of returns greater than 30% is calculated based on the number of standard deviations that 30% is above the expected return of 10%. In this case, 30% is 20% above the expected return of 10%, which is 20 / 12 = 1.67 standard deviations above the mean. We look up the probability of returns less than 1.67 standard deviations above the mean (0.9525 or 95.25% from Figure 14.4) and calculate the probability of returns more than 1.67 standard deviations above the mean as 1 − 0.9525 = 4.75%.

EXAMPLE: Using the z-table (1)
Considering again EPS distributed with µ = $6 and σ = $2, what is the probability that EPS will be $9.70 or more?
Answer:
Here we want to know P(EPS > $9.70), which is the area under the curve to the right of the z-value corresponding to EPS = $9.70 (see the distribution that follows). The z-value for EPS = $9.70 is:
z = ($9.70 − $6) / $2 = 1.85
That is, $9.70 is 1.85 standard deviations above the mean EPS value of $6.
From the z-table, we have F(1.85) = 0.9678, but this is P(EPS ≤ 9.70). We want P(EPS > 9.70), which is 1 − P(EPS ≤ 9.70).
P(EPS > 9.70) = 1 − 0.9678 = 0.0322, or 3.2%
P(EPS > $9.70)

EXAMPLE: Using the z-table (2)
Using the distribution of EPS with µ = $6 and σ = $2 again, what percent of the observed EPS values are likely to be less than $4.10?
Answer:
As shown graphically in the distribution that follows, we want to know P(EPS < $4.10). This requires a two-step approach like the one taken in the preceding example. First, the corresponding z-value must be determined as follows:
z = ($4.10 − $6) / $2 = −0.95
So, $4.10 is 0.95 standard deviations below the mean of $6.00. Now, from the z-table for negative values in the back of this book, we find that F(−0.95) = 0.1711, or 17.11%.
Finding a Left-Tail Probability
The z-table gives us the probability that the outcome will be more than 0.95 standard deviations below the mean.

The Lognormal Distribution
The lognormal distribution is generated by the function e^x, where x is normally distributed. Because the natural logarithm, ln, of e^x is x, the logarithms of lognormally distributed random variables are normally distributed, thus the name. The PDF for the lognormal distribution is:
f(x) = [1 / (xσ√(2π))] × e^[−(ln x − µ)² / (2σ²)], for x > 0
Figure 14.5 illustrates the differences between a normal distribution and a lognormal distribution.
Figure 14.5: Normal vs. Lognormal Distributions
In Figure 14.5, we can see the following:
The lognormal distribution is skewed to the right.
The lognormal distribution is bounded from below by zero so that it is useful for modeling asset prices that never take negative values.
If we used a normal distribution of returns to model asset prices over time, we would admit the possibility of returns less than –100%, which would admit the possibility of asset prices less than zero. Using a lognormal distribution to model price relatives avoids this problem. A price relative is just the end-of-period price of the asset divided by the beginning price (S1/S0) and is equal to (1 + the holding period return). To get the end-of-period asset price, we can simply multiply the price relative by the beginning-of-period asset price. Because a lognormal distribution takes a minimum value of zero, end-of-period asset prices cannot be less than zero. A price relative of zero corresponds to a holding period return of –100% (i.e., the asset price has gone to zero).

MODULE QUIZ 14.2
1. The probability that a normal random variable will be more than two standard deviations above its mean is:
A. 0.0217.
B. 0.0228.
C. 0.4772.
D. 0.9772.
2. Which of the following random variables is least likely to be modeled appropriately by a lognormal distribution?
A. The size of silver particles in a photographic solution.
B. The number of hours a housefly will live.
C. The return on a financial security.
D. The weight of a meteor entering the earth's atmosphere.

MODULE 14.3: STUDENT'S T, CHI-SQUARED, AND F-DISTRIBUTIONS

Student's t-Distribution
Student's t-distribution is similar to a normal distribution, but has fatter tails (i.e., a greater proportion of the outcomes are in the tails of the distribution). It is the appropriate distribution to use when constructing confidence intervals based on small samples (n < 30) from a population with unknown variance and a normal, or approximately normal, distribution. It may also be appropriate to use the t-distribution when the population variance is unknown and the sample size is large enough that the central limit theorem will assure that the sampling distribution is approximately normal.
Student's t-distribution has the following properties:
It is symmetrical.
It is defined by a single parameter, the degrees of freedom (df), where the degrees of freedom are equal to the number of sample observations minus 1, n − 1, for sample means.
It has a greater probability in the tails (fatter tails) than the normal distribution.
As the degrees of freedom (the sample size) get larger, the shape of the t-distribution more closely approaches a standard normal distribution.
The degrees of freedom for tests based on sample means are n − 1 because, given the mean, only n − 1 observations can be unique.
The table in Figure 14.6 contains one-tailed critical values for the t-distribution at the 0.05 and 0.025 levels of significance with various degrees of freedom (df). Note that, unlike the z-table, the t-values are contained within the table and the probabilities are located at the column headings.
Figure 14.6: Table of Critical t-Values
Figure 14.7 illustrates the shapes of the t-distribution associated with different degrees of freedom. The tendency is for the t-distribution to look more and more like the normal distribution as the degrees of freedom increase. Practically speaking, the greater the degrees of freedom, the greater the percentage of observations near the center of the distribution and the lower the percentage of observations in the tails, which are thinner as degrees of freedom increase. This means that confidence intervals for a random variable that follows a t-distribution must be wider than those for a normal distribution, for a given confidence level.
Figure 14.7: t-Distributions for Different Degrees of Freedom (df)
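One way to see how the t-distribution approaches the standard normal as degrees of freedom grow is to compare one-tailed critical values across df. The sketch below assumes the SciPy library is available (scipy.stats.t.ppf is the inverse CDF); it is an added illustration, and the commented values are approximate.

```python
from scipy.stats import norm, t

# One-tailed 5% critical values: the t-value approaches the normal value as df increases
for df in (5, 20, 120):
    print(df, round(t.ppf(0.95, df), 3))   # ≈ 2.015, 1.725, 1.658
print(round(norm.ppf(0.95), 3))            # ≈ 1.645, the standard normal critical value
```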
The Chi-Squared Distribution
Hypothesis tests concerning population parameters and models of random variables that are always positive are often based on a chi-squared distribution, denoted χ². The chi-squared distribution is asymmetrical, bounded below by zero, and approaches the normal distribution in shape as the degrees of freedom increase.
Figure 14.8: Chi-Squared Distribution
The chi-squared test statistic, χ², with n − 1 degrees of freedom, is computed as:
χ² = (n − 1)s² / σ₀²
where:
n = sample size
s² = sample variance
σ₀² = hypothesized value for the population variance
The chi-squared test compares the test statistic to a critical chi-squared value at a given level of significance to determine whether to reject or fail to reject a null hypothesis.

The F-Distribution
Hypotheses concerning the equality of the variances of two populations are tested with an F-distributed test statistic. An F-distributed test statistic is used when the populations from which the samples are drawn are normally distributed and the samples are independent. The test statistic for the F-test is the ratio of the sample variances. The F-statistic is computed as:
F = s₁² / s₂²
where:
s₁² = variance of the sample of n1 observations drawn from Population 1
s₂² = variance of the sample of n2 observations drawn from Population 2
An F-distribution is presented in Figure 14.9. As indicated, the F-distribution is right-skewed and is truncated at zero on the left-hand side. The shape of the F-distribution is determined by two separate degrees of freedom, the numerator degrees of freedom, df1, and the denominator degrees of freedom, df2. Note that n1 − 1 and n2 − 1 are the degrees of freedom used to identify the appropriate critical value from the F-table (provided in the Appendix). Some additional properties of the F-distribution include the following:
The F-distribution approaches the normal distribution as the number of observations increases (just as with the t-distribution and chi-squared distribution).
A random variable's t-value squared (t²) with n − 1 degrees of freedom is F-distributed with 1 degree of freedom in the numerator and n − 1 degrees of freedom in the denominator.
There exists a relationship between the F- and chi-squared distributions such that: χ²(df1) / df1 = F(df1, df2) as the # of observations in the denominator → ∞
Figure 14.9: F-Distribution

The Exponential Distribution
The exponential distribution is often used to model waiting times, such as how long it takes an employee to serve a customer or the time it takes a company to default. The PDF for this distribution is as follows:
f(x) = (1/β)e^(−x/β), for x ≥ 0
In the previous function, the scale parameter, β, is greater than zero and is the reciprocal of the rate parameter λ (i.e., λ = 1/β). The rate parameter measures the rate at which events occur. In the context of waiting for a company to default, the rate parameter is known as the hazard rate and indicates the rate at which default will arrive. Figure 14.10 displays the PDF of the exponential distribution assuming different values for the rate parameter.
Figure 14.10: Exponential PDF
The exponential distribution is able to assess the time it takes a company to default. However, what if we want to evaluate the total number of defaults over a specific period? As it turns out, the number of defaults up to a certain period, Nt, follows a Poisson distribution with a rate parameter equal to t / β.
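As a rough illustration of this link, the following simulation sketch (not from the reading; it assumes NumPy is available and uses an arbitrary scale of β = 2 over a horizon of t = 10) draws exponential waiting times and counts how many events arrive by time t; the mean and variance of the simulated counts should both be close to t / β, consistent with a Poisson distribution.

```python
# Illustrative simulation (assumptions: NumPy available; beta = 2.0 and t = 10.0
# chosen arbitrarily). Waiting times between events are exponential with scale beta;
# the count of events arriving by time t should then behave like a Poisson random
# variable with mean t / beta, so its simulated mean and variance should be close.
import numpy as np

rng = np.random.default_rng(seed=42)
beta, horizon, trials = 2.0, 10.0, 20_000

counts = np.empty(trials)
for i in range(trials):
    arrival, n_events = 0.0, 0
    while True:
        arrival += rng.exponential(scale=beta)   # next exponential waiting time
        if arrival > horizon:
            break
        n_events += 1
    counts[i] = n_events

print(f"t / beta           = {horizon / beta:.3f}")
print(f"simulated mean     = {counts.mean():.3f}")
print(f"simulated variance = {counts.var():.3f}")   # Poisson: variance equals the mean
```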
We can further examine the relationship between the exponential and Poisson distributions by considering the mean and variance of both distributions. Recall that the mean and variance of a Poisson-distributed random variable are both equal to λ. As it turns out, the mean of the exponential distribution is equal to 1 / λ, and the variance is equal to 1 / λ².

The Beta Distribution
The beta distribution can be used for modeling default probabilities and recovery rates. As a result, it is used in some credit risk models such as CreditMetrics®, which will be discussed in the FRM Part II curriculum. The mass of the beta distribution is located on the interval between zero and one. As you can see from Figure 14.11, this distribution can be symmetric or skewed depending on the values of its shape parameters (α and β).
Figure 14.11: Beta PDF

Mixture Distributions
LO 14.b: Describe a mixture distribution and explain the creation and characteristics of mixture distributions.
The distributions discussed in this reading, as well as other distributions, can be combined to create unique PDFs. It may be helpful to create a new distribution if the underlying data you are working with does not fit a predetermined distribution. In this case, a newly created distribution may assist with explaining the relevant data.
To illustrate a mixture distribution, suppose that the returns of a stock follow a normal distribution with low volatility 75% of the time and high volatility 25% of the time. Here, we have two normal distributions with the same mean, but different risk levels. To create a mixture distribution from these scenarios, we randomly choose either the low or high volatility distribution, placing a 75% probability on selecting the low volatility distribution. We then generate a random return from the selected distribution. By repeating this process many times, we will create a probability distribution that reflects both levels of volatility. A short simulation sketch of this two-component mixture appears at the end of this discussion.
Mixture distributions contain elements of both parametric and nonparametric distributions. The distributions used as inputs (i.e., the component distributions) are parametric, while the weights of each distribution within the mixture are nonparametric. The more component distributions used as inputs, the more closely the mixture distribution will follow the actual data. However, more component distributions will make it difficult to draw conclusions, given that the newly created distribution will be very specific to the data.
By mixing distributions, it is easy to see how we can alter the skewness and kurtosis of the component distributions. Skewness can be changed by combining distributions with different means, and kurtosis can be changed by combining distributions with different variances. Also, by combining distributions that have significantly different means, we can create a mixture distribution with multiple modes (e.g., a bimodal distribution).
Creating a more robust distribution is clearly beneficial to risk managers. Different levels of skew and/or kurtosis can reveal extreme events that were previously difficult to identify. By creating these mixture distributions, we can improve risk models by incorporating the potential for low-frequency, high-severity events.
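The sketch below (illustrative only; it assumes NumPy is available and uses volatilities of 10% and 30%, which are not from the reading) implements the two-step choose-then-draw procedure described above for the 75%/25% normal mixture and estimates the resulting excess kurtosis, showing how mixing components with different variances fattens the tails.

```python
# Illustrative sketch (assumptions: NumPy available; volatilities of 10% and 30%
# are made-up values). Step 1 picks the component for each draw with probability
# 0.75 / 0.25; step 2 draws the return from the selected normal component.
import numpy as np

rng = np.random.default_rng(seed=7)
n = 100_000
low_vol, high_vol, p_low = 0.10, 0.30, 0.75

use_low = rng.random(n) < p_low                       # nonparametric weights
returns = np.where(use_low,
                   rng.normal(0.0, low_vol, n),       # parametric low-volatility component
                   rng.normal(0.0, high_vol, n))      # parametric high-volatility component

# Same means, different variances: skewness stays near zero, but kurtosis rises
# above the normal value of 3 (i.e., positive excess kurtosis / fatter tails).
excess_kurt = ((returns - returns.mean()) ** 4).mean() / returns.std() ** 4 - 3
print(f"sample excess kurtosis: {excess_kurt:.2f}")
```

Drawing from both components and selecting with np.where wastes some random draws, but it keeps the choose-then-draw logic of the text explicit.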
MODULE QUIZ 14.3
1. The t-distribution is the appropriate distribution to use when constructing confidence intervals based on:
A. large samples from populations with known variance that are nonnormal.
B. large samples from populations with known variance that are at least approximately normal.
C. small samples from populations with known variance that are at least approximately normal.
D. small samples from populations with unknown variance that are at least approximately normal.
2. Which of the following statements about F- and chi-squared distributions is least accurate? Both distributions:
A. are asymmetrical.
B. are bound by zero on the left.
C. are defined by degrees of freedom.
D. have means that are less than their standard deviations.

KEY CONCEPTS
LO 14.a
A continuous uniform distribution is one where the probability of X occurring in a possible range is the length of the range relative to the total of all possible values. Letting a and b be the lower and upper limit of the uniform distribution, respectively, then for a ≤ x1 ≤ x2 ≤ b:
P(x1 ≤ X ≤ x2) = (x2 − x1) / (b − a)
The binomial distribution is a discrete probability distribution for a random variable, X, that has one of two possible outcomes: success or failure. The probability of a specific number of successes, x, in n independent binomial trials is:
P(X = x) = [n! / (x!(n − x)!)] × p^x (1 − p)^(n − x)
where p = the probability of success in a given trial
The Poisson random variable, X, refers to a specific number of successes per unit. The probability for obtaining X successes, given a Poisson distribution with parameter λ, is:
P(X = x) = (λ^x × e^(−λ)) / x!
The normal probability distribution has the following characteristics:
The normal curve is symmetrical and bell-shaped with a single peak at the exact center of the distribution.
Mean = median = mode, and all are in the exact center of the distribution.
The normal distribution can be completely defined by its mean and standard deviation because the skew is always 0 and kurtosis is always 3.
A standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. A normal random variable, x, can be normalized (changed to a standard normal, z) with the transformation z = (x – mean of x) / standard deviation of x.
A lognormal distribution exists for random variable Y, when Y = e^X, and X is normally distributed.
The t-distribution is similar, but not identical, to the normal distribution in shape; it is defined by the degrees of freedom and has fatter tails. The t-distribution is used to construct confidence intervals for the population mean when the population variance is not known. Degrees of freedom for the t-distribution are equal to n − 1; Student's t-distribution is closer to the normal distribution when df is greater, and confidence intervals are narrower when df is greater.
The chi-squared distribution is asymmetrical, bounded below by zero, and approaches the normal distribution in shape as the degrees of freedom increase.
The F-distribution is right-skewed and is truncated at zero on the left-hand side. The shape of the F-distribution is determined by two separate degrees of freedom.
LO 14.b
Mixture distributions combine the concepts of parametric and nonparametric distributions. The component distributions used as inputs are parametric, while the weights of each distribution within the mixture are based on historical data, which is nonparametric.

ANSWER KEY FOR MODULE QUIZZES
Module Quiz 14.1
1. A The probability of a defective car (p) is 0.05; hence, the probability of a nondefective car (q) = 1 − 0.05 = 0.95. Assuming a Poisson distribution: λ = np = (3)(0.05) = 0.15. Then, (LO 14.a)
2. C Success = having a web page: [6! / (4!(6 − 4)!)](0.6)^4 (0.4)^(6 − 4) = 15(0.1296)(0.16) = 0.311 (LO 14.a)
3. B Since a = 12 and b = 28: (LO 14.a)
Module Quiz 14.2
1. B 1 − F(2) = 1 − 0.9772 = 0.0228 (LO 14.a)
2. C A lognormally distributed random variable cannot take on values less than zero. The return on a financial security can be negative. The other choices refer to variables that cannot be less than zero. (LO 14.a)
Module Quiz 14.3
1. D The t-distribution is the appropriate distribution to use when constructing confidence intervals based on small samples from populations with unknown variance that are either normal or approximately normal. (LO 14.a)
2. D There is no consistent relationship between the mean and standard deviation of the chi-squared or F-distributions. (LO 14.a)
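For readers who want to verify the arithmetic in the answer keys above, the following sketch (assuming SciPy is available) reproduces the key figures; the Poisson lines simply list the first few outcome probabilities for λ = 0.15, since the original question is not shown here.

```python
# Illustrative check (assuming SciPy is available) of the figures in the answer keys.
from scipy import stats

# Module Quiz 14.2, Q1: P(Z > 2) for a standard normal variable
print(1 - stats.norm.cdf(2))            # approximately 0.0228

# Module Quiz 14.1, Q2: binomial probability of 4 successes in 6 trials, p = 0.6
print(stats.binom.pmf(4, 6, 0.6))       # approximately 0.311

# Module Quiz 14.1, Q1: Poisson probabilities for lambda = 0.15 (question not shown here)
for k in range(3):
    print(k, stats.poisson.pmf(k, 0.15))
```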