ST1111 Probability Models Lecture Notes
C. Scarrott & E. Holian
Summary
These lecture notes cover fundamental concepts in probability and statistics. They explore different interpretations of probability, including classical, frequentist, and subjective approaches. The notes then introduce random variables, categorizing them as discrete or continuous and reviewing various probability rules.
1 Introduction

1.1 Class Discussion - Probability and Statistics

What does the term "probable" mean to you? How would you describe what the term "statistics" means? Feel free to jot down some notes as we discuss...

1.2 Interpreting Probabilities

We all know the answer to the question "What is the probability of getting a head if I toss a fair coin?", but there is also a probability within the statement "I have a 50% chance of getting an A in this course." There are many approaches to interpreting probabilities; see the Wikipedia page on "Probability Interpretations". Of relevance in this course:

1. classical probability - each outcome has the same chance of occurring, so the probability of an event is simply the proportion of outcomes in the event;
2. frequentist probability - the probability of an event is defined by the long-term relative frequency with which it occurs; and
3. subjective probability - a probability statement represents a "degree of belief" that the event will occur, so it is a personalised/subjective judgment.

There are other interpretations, many of which relate to concepts in philosophy/logic, each with their own features and drawbacks. Mostly these interpretations follow certain basic rules, or axioms of probability.

Irish Lotto Example

Consider the numbers drawn in the Irish Lotto (since 2015) and how this relates to these concepts. There are forty-seven balls labelled 1 through 47, so the possible outcomes are {1, 2, ..., 46, 47}. The balls are thoroughly mixed in the Lotto machine before each one comes out; 6 balls are drawn without replacement, followed by a 7th bonus ball. We can model this with each new ball selected having the same probability of coming out from those remaining. If we consider the first ball selected, we assign each outcome the same probability of 1/47. This is a classical probability statement.

Here are the results of all the balls selected in the draws between 5th September 2015 and Saturday 26th September 2020:

[Figure: two bar charts over ball numbers 1-47 - the upper plot shows the frequency with which each ball number was drawn, the lower the proportion of times each was drawn.]

The upper plot shows the frequency with which each ball number is drawn, and the lower plot shows the proportion of times each ball number is drawn. Notice how these observed proportions are close to the value of 1/47 that you would expect them to converge to in the long term, consistent with the "frequentist probability" interpretation, for the first ball selected. As to someone's "subjective probability" of their favourite number coming out first, that is their personal belief. I think it is still 1/47!
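To see the frequentist interpretation in action, here is a minimal R sketch (the seed and the 2000 simulated draws are illustrative choices, not the actual draw history) simulating the first ball of many lotto draws and comparing the observed proportions to 1/47:

set.seed(1)
n_draws = 2000                                       # illustrative number of draws
first_ball = sample(1:47, n_draws, replace = TRUE)   # each ball equally likely

props = table(factor(first_ball, levels = 1:47)) / n_draws
round(props, 3)   # observed proportions per ball number
1 / 47            # the classical probability they should approach: 0.0213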
How likely is it that...

- you will win the lottery this week?
- in a room of 23 people, at least two people share the same birthday?
- an insured driver will make a claim?
- the next outcome in a time series will be higher than the previous outcome?
- the evidence presented at a jury trial supports a guilty verdict if the accused was actually innocent?

"When it is not in our power to determine what is true, we must ascertain that which is most probable" - René Descartes

In order to learn from data and to make predictions, we need to:

- take into account randomness;
- measure how probable events are;
- assign a probabilistic model to real-life phenomena.

Let's start at the beginning...

1.3 Random variables

An experiment is a process that leads to an observation or measurement, an outcome. The set of all possible outcomes of an experiment is referred to as the sample space, denoted by S. Each element in the sample space is referred to as a sample point. An event can be a single sample point or a collection of sample points.

Random variable: X is a function on S with values in the real numbers, X : S → R. It transforms the outcome of an experiment into a number.

Example: Toss a coin 10 times. The sample space S = {HTH...HT, ...} consists of all configurations of H and T. Let the random variable X = number of heads, so X : S → {0, 1, ..., 10} for this example.

Examples:
- Roll a die, observe the uppermost face: 1, 2, 3, 4, 5, 6
- Roll 2 dice, sum of the 2 faces: 2, 3, ..., 12
- Do you live in halls: Yes, No
- Number of heads when a coin is tossed 10 times: 0, 1, 2, 3, 4, ..., 10
- Number of cars using a drive-through 6pm-8pm: 0, 1, 2, ...
- Number of emails received in a day: 0, 1, 2, ...
- Mark awarded on a multiple choice question with negative marking: -1, 0, 2

These are all discrete outcomes, as the sample points are all isolated values. There can be a finite set of values or a countably infinite set of values. A random variable may also be continuous, where the outcomes are uncountably infinite. Examples:
- rate of return of a stock
- time to failure
- time spent on a mobile phone

Defn: If S is a sample space with a probability measure and X is a real-valued function defined over the elements of S, then X is called a random variable. A random variable may be categorised as either discrete or continuous.

1.4 Revision - Probability Rules

Appendix: Material you should be familiar with.

Outcomes (or events) with probability 0 cannot occur and those with probability 1 always occur. Suppose the events A and B are non-overlapping; they can be described as disjoint or mutually exclusive, so that the intersection between them is A ∩ B = ∅ and P(A ∩ B) = 0.

1.4.1 Axioms of Probability

The axioms of probability define a function P that assigns real numbers, probabilities, to events:

1. For any event A, 0 ≤ P(A) ≤ 1.
2. If S is the sample space then P(S) = 1.
3. If A_1, A_2, ... is a sequence of mutually exclusive events then

   P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).

Here, the infinite union

   ⋃_{j=1}^∞ A_j = A_1 ∪ A_2 ∪ ···

is the set of elements that are in at least one of A_1, A_2, ... Note that Axiom 3 works with a finite union of sets too, since any number of them can be the empty set, ∅, which is mutually exclusive with everything, so we also have

   P(⋃_{i=1}^n A_i) = ∑_{i=1}^n P(A_i).

It is common to use the special case P(A ∪ B) = P(A) + P(B) at post-primary level.

Probability space: (S, A, P), where S is the sample space, A the set of all events, and P the probability function.
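As a quick empirical illustration of Axiom 3 (a sketch, not part of the original notes - the events and seed are illustrative): for the disjoint die events A = {1, 2} and B = {5, 6}, simulated relative frequencies satisfy P(A ∪ B) ≈ P(A) + P(B):

set.seed(2)
rolls = sample(1:6, 100000, replace = TRUE)    # simulate a fair die

pA  = mean(rolls %in% c(1, 2))                 # P(A), A = {1, 2}
pB  = mean(rolls %in% c(5, 6))                 # P(B), B = {5, 6}, disjoint from A
pAB = mean(rolls %in% c(1, 2, 5, 6))           # P(A union B)

c(pA + pB, pAB)   # both close to 4/6, illustrating additivity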
1.4.2 Other useful properties

1. Complementary events rule: P(A′) = 1 − P(A), where A′ is referred to as the complement of event A, and P(A′) is the probability of event A not occurring.

2. Additive rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B), where P(A ∪ B) = P(A or B) is the probability of events A or B occurring.

3. Additive rule for mutually exclusive events (Axiom 3): Events A and B are said to be mutually exclusive (disjoint) if P(A ∩ B) = 0. In this case the additive rule reduces to the special case for mutually exclusive events: P(A ∪ B) = P(A) + P(B).

4. Conditional probability: P(B|A) = P(A ∩ B)/P(A), where P(B|A) = P(B given A) is the probability of event B occurring given that A occurs.

5. Multiplication rule: P(A ∩ B) = P(A)P(B|A), where P(A ∩ B) = P(A and B) is the probability of events A and B occurring.

6. Independence: Events A and B are said to be independent events if P(B|A) = P(B). In this case the multiplication rule reduces to the special case for independent events: P(A ∩ B) = P(A)P(B).

7. Partition rule: P(A) = P(A ∩ B) + P(A ∩ B′).

8. Total probability rule: Extending 7, if a sample space S is broken up into a set of disjoint partitions B_1, B_2, ..., B_k, where B_i ∩ B_j = ∅ for i ≠ j, that is, S = ⋃_{i=1}^k B_i, then for any event A ⊆ S,

   P(A) = ∑_{i=1}^k P(A ∩ B_i) = ∑_{i=1}^k P(B_i)P(A|B_i).

9. De Morgan's laws: P((A ∪ B)′) = P(A′ ∩ B′) and P((A ∩ B)′) = P(A′ ∪ B′).

10. Additive rule for three or more events: Extending 2, let A, B, C be any three events; then

    P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C).

    In general, let A_1, A_2, A_3, ..., A_n be any n events; then

    P(⋃_{i=1}^n A_i) = ∑_i P(A_i) − ∑_{i<j} P(A_i ∩ A_j) + ∑_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ··· + (−1)^{n−1} P(A_1 ∩ ··· ∩ A_n).

[...]

Probability density function: For a continuous random variable X, the function f(x) such that

   P(x₁ < X ≤ x₂) = ∫_{x₁}^{x₂} f(x) dx

is the probability density function. The summation used for a discrete random variable is replaced by an integral for the continuous random variable, and these probabilities are areas under the density curve:

   ∑_{x=x₁}^{x₂} P(X = x)   is replaced by   ∫_{x₁}^{x₂} f(x) dx.

Examples: lower tail P(X ≤ x), upper tail P(X > x) and interval P(x₁ < X ≤ x₂) probability areas sketched below.

[Figure: three density curves with shaded lower-tail, upper-tail and interval probability areas.]

Example 5: The continuous random variable X is defined by the density function f(x) = 1/6 for 0 ≤ x ≤ 6.

(c) Find the probabilities: (i) P(X ≤ 2) (ii) P(X ≤ 2.4) (iii) P(X > 1) (iv) P(0.5 < X ≤ 4.5)
(d) Determine the cumulative distribution function F(x).
(e) Use the cumulative distribution function to find: (i) P(X ≤ 2) (ii) P(X ≤ 2.4) (iii) P(X > 1) (iv) P(0.5 < X ≤ 4.5)
(f) Sketch the cumulative distribution function.
(g) If F(x) = 0.5, find the value of x. Interpret the meaning of this feature.

Example 6: The continuous random variable X is defined by the density function f(x) = cx² for 0 ≤ x ≤ 1.

[Figure: plot of the density f(x) = 3x² on 0 ≤ x ≤ 1.]

(a) Find the value of c.
(b) Plot the density function f(x).
(c) Find the probabilities: (i) P(X ≤ 0.5) (ii) P(0.5 < X ≤ 0.8) (iii) P(X > 0.2)
(d) Determine the cumulative distribution function F(x).
(e) Use the cumulative distribution function to find: (i) P(X ≤ 0.5) (ii) P(0.5 < X ≤ 0.8) (iii) P(X > 0.2)

Example 7: The lifetime in hours of a certain product is a continuous random variable X defined by the density function

   f(x) = 100/x²  for x ≥ 100.

[Figure: plot of the density f(x) = 100/x² for x from 100 to 500.]

(a) Find the probability that: (i) an item will function for less than 150 hours; (ii) an item will function for at least 200 hours.
(b) Determine the cumulative distribution function F(x).
(c) Use the cumulative distribution function to find the same probabilities.
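A quick numerical check for Example 7 (a sketch using base R's integrate(); the closed form F(x) = 1 − 100/x follows by integrating the given density from 100 to x):

fx = function(x) 100 / x^2                       # density for x >= 100

integrate(fx, lower = 100, upper = 150)$value    # (a)(i)  P(X < 150)  = 1/3
integrate(fx, lower = 200, upper = Inf)$value    # (a)(ii) P(X >= 200) = 0.5

Fx = function(x) 1 - 100 / x                     # cumulative distribution function
c(Fx(150), 1 - Fx(200))                          # (c) the same probabilities via F(x)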
Example 8: The lifetime in hours that a computer functions before breaking down is a continuous random variable X defined by the density function

   f(x) = (1/100) e^{−x/100}  for x ≥ 0.

[Figure: plot of the density f(x) = (1/100)e^{−x/100} for x from 0 to 500.]

(a) Find the probability that: (i) a computer will function for less than 100 hours; (ii) a computer will function for at least 150 hours.
(b) Determine the cumulative distribution function F(x).
(c) Use the cumulative distribution function to find the same probabilities.

Try out this R code to produce plots of the density function and the cumulative distribution function:

For Example 5:

library(ggplot2)

fx = function(x) {1 / 6}
Fx = function(x) {x / 6}
ggplot(data.frame(x = c(0, 6)), aes(x = x)) + stat_function(fun = fx) + ylab("f(x)")
ggplot(data.frame(x = c(0, 6)), aes(x = x)) + stat_function(fun = Fx) + ylab("F(x)")

For Example 6:

fx = function(x) {3 * x^2}
Fx = function(x) {x^3}
ggplot(data.frame(x = c(0, 1)), aes(x = x)) + stat_function(fun = fx) + ylab("f(x)")
ggplot(data.frame(x = c(0, 1)), aes(x = x)) + stat_function(fun = Fx) + ylab("F(x)")

Example 9: The cumulative probability distribution for a random variable X is defined as

   F(x) = (1/4)(3x − x³ + 2)  for −1 ≤ x ≤ 1,

with F(x) = 0 for x < −1 and F(x) = 1 for x > 1.

[Figure: plot of F(x) rising from 0 at x = −1 to 1 at x = 1.]

(a) Verify F(1) = P(X ≤ 1) = 1 and F(−1) = P(X ≤ −1) = 0.
(b) Find the probabilities: (i) P(X ≤ 0) (ii) P(−0.5 < X ≤ 0) (iii) P(X > 0.5)
(c) Determine the density function f(x).

Example 10: The probability density for a random variable X is defined as

   f(x) = x        for 0 ≤ x ≤ 1,
   f(x) = 2 − x    for 1 < x ≤ 2,
   f(x) = 0        elsewhere.

(a) Sketch the function f(x).
(b) Verify this function is a probability density function, i.e. ∫_{all x} f(x) dx = 1.
(c) Find the probabilities: (i) P(X ≤ 0.5) (ii) P(1.2 < X ≤ 1.5) (iii) P(X > 0.5)
(d) Determine the cumulative distribution function F(x).
(e) Sketch F(x).
(f) Use F(x) to find the same probabilities.
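A numerical sketch for Example 10 (the integrate() calls check part (b) and the probabilities in (c); the vectorised definition of fx is an illustrative choice):

fx = function(x) ifelse(x <= 1, x, 2 - x) * (x > 0 & x <= 2)   # triangular density

integrate(fx, 0, 2)$value           # (b) total area = 1
integrate(fx, 0, 0.5)$value         # (c)(i)   P(X <= 0.5)        = 0.125
integrate(fx, 1.2, 1.5)$value       # (c)(ii)  P(1.2 < X <= 1.5)  = 0.195
1 - integrate(fx, 0, 0.5)$value     # (c)(iii) P(X > 0.5)         = 0.875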
3 Describing features of interest of random variables

3.1 Expectation

Class discussion: What do you expect to happen? How much do you expect to win? What do we mean by "on average"?

The expectation is one of the fundamental concepts in probability. The expected value of a random variable gives the population mean, which is a measure of the centre of the distribution. By taking the expected value of various functions of a random variable, we can measure many features of its distribution, e.g. spread and correlation.

In mechanical terms, the expected value is the "centre of mass" of the probability density or mass function: the value of f(x) is the mass and the value x is the position. The expected value of X, E(X), is the balance point, where the torques produced by the mass on each side are equal and opposite, so no rotation occurs. This does not mean that the area on each side is 0.5 - the position matters: mass further away produces more torque.

[Figure: a density curve balanced on a fulcrum at E(X).]

Defn: The expected value of a random variable X is the sum of the product of each outcome by its probability, summing over all possible outcomes of X, i.e. for a discrete random variable X, where f(x) is the value of its probability distribution at x, the expected value of X is

   E(X) = ∑_{all x} x f(x).

For a continuous random variable, where f(x) is the value of its probability density at x, the expected value of X is

   E(X) = ∫_{all x} x f(x) dx.

E(X) is often also given the notation µ_X, or simply µ when referring to the one random variable.

These definitions give expectation as a "weighted average" of the possible values. This is true, but some intuitive idea of expectation is also helpful.

Expectation is not necessarily what you expect. Consider tossing a fair coin. If it is heads you lose 10€. If it is tails you win 10€. What do you expect to win? Mathematically, if X is the amount you win then

   E(X) = −10 × 1/2 + 10 × 1/2 = 0.

So the expected value of X is not even one of the possible outcomes! It makes more sense in the context of the next point.

Expectation is the "long term average". If you repeat this experiment independently many times, each time you either lose 10€ or win 10€, so you will get a long sequence of −10€s and 10€s. The average of this long sequence converges to E(X) = 0 as the number of trials grows. Over time, the proportions of winning and losing both tend to 1/2, so each outcome is weighted in the same way as in the above definition.

[Figure: running average of winnings over 3000 trials, converging towards 0.]

Example 1: Find the expected value for the random variable X defined as the value on the uppermost face when throwing a die.

   µ = E(X) = ∑_{all x} x f(x) = ∑_{x=1}^6 x(1/6) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5

Interpret this value!

Example 2: Find the expected value for the random variable X defined as the value on the uppermost face when throwing the unfair die discussed previously. (Before calculating it, do you think it would be a different value to the fair die? If so, different in what way?)

   µ = E(X) = ∑_{all x} x f(x) = 1(1/8) + 2(1/8) + 3(1/8) + 4(3/8) + 5(1/8) + 6(1/8) = 29/8 = 3.625

Your turn! Multiple choice score. Let X = the mark you get on a multiple choice question, in which X ∈ {−1, 0, 2} as defined previously. Find µ = E(X).

Example 5: Find the expected value for the continuous random variable X described in Example 5, i.e. defined by the density function f(x) = 1/6 for 0 ≤ x ≤ 6.

   E(X) = ∫_{all x} x f(x) dx = ∫_0^6 x(1/6) dx = (1/12)x² |_0^6 = (1/12)(6² − 0²) = 3

Example 6: Find the expected value for the continuous random variable X described in Example 6, i.e. defined by the density function f(x) = 3x² for 0 ≤ x ≤ 1.

   E(X) = ∫_{all x} x f(x) dx = ∫_0^1 x(3x²) dx = ∫_0^1 3x³ dx = (3/4)x⁴ |_0^1 = (3/4)(1⁴ − 0⁴) = 3/4

Example 8: The lifetime in hours that a computer functions before breaking down is a continuous random variable X defined by the density function f(x) = (1/100)e^{−x/100} for x ≥ 0.

   E(X) = ∫_{all x} x f(x) dx = ∫_0^∞ x(1/100)e^{−x/100} dx = −e^{−x/100}(x + 100) |_0^∞ = 100

(This is a challenging integration exercise - don't worry if you don't follow it yet!)

Example 10: The probability density for a random variable X is defined as f(x) = x for 0 ≤ x ≤ 1, f(x) = 2 − x for 1 < x ≤ 2, and 0 elsewhere.

   E(X) = ∫_{all x} x f(x) dx
        = ∫_0^1 x(x) dx + ∫_1^2 x(2 − x) dx
        = ∫_0^1 x² dx + ∫_1^2 (2x − x²) dx
        = [x³/3]_0^1 + [x² − x³/3]_1^2
        = 1/3 + (2² − 2³/3) − (1 − 1/3)
        = 1
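The same expectations can be checked numerically in R, in the style of the earlier code (a sketch, not from the notes):

# Discrete: the fair die of Example 1
x  = 1:6
fx = rep(1/6, 6)
sum(x * fx)                                        # 3.5

# Continuous: Example 6, f(x) = 3x^2 on [0, 1]
integrate(function(x) x * 3 * x^2, 0, 1)$value     # 0.75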
3.1.1 Expected value of a function of a random variable

If g(X) is some function of the random variable X, then for a discrete random variable X, where f(x) is the value of its probability distribution at x, the expected value of g(X) is

   E(g(X)) = ∑_{all x} g(x) f(x).

For a continuous random variable, where f(x) is the value of its probability density at x, the expected value of g(X) is

   E(g(X)) = ∫_{all x} g(x) f(x) dx.

Example: Say a player, for a 5 euro bet, gambles on winning the amount on the uppermost face on throwing a die. Let the winnings be the random variable X and the gain or loss from the bet be the random variable Y. Then Y = X − 5. Find E(Y). Assuming it is a fair die:

   E(Y) = E(X − 5) = ∑_{all x} (x − 5) f(x) = ∑_{x=1}^6 (x − 5)(1/6)
        = (1 − 5)(1/6) + (2 − 5)(1/6) + (3 − 5)(1/6) + (4 − 5)(1/6) + (5 − 5)(1/6) + (6 − 5)(1/6)
        = −9/6 = −1.5

Example: Say a player, for a 5 euro bet, gambles on winning DOUBLE the amount on the uppermost face on throwing a die. Let the winnings be the random variable X and the gain or loss from the bet be the random variable Y. Then Y = 2X − 5. Find E(Y).

   E(Y) = E(2X − 5) = ∑_{all x} (2x − 5) f(x) = ∑_{x=1}^6 (2x − 5)(1/6)
        = (2 − 5)(1/6) + (4 − 5)(1/6) + (6 − 5)(1/6) + (8 − 5)(1/6) + (10 − 5)(1/6) + (12 − 5)(1/6)
        = 12/6 = 2

Which game would you choose to play?!

Example: For the continuous random variable X described in Example 6, i.e. defined by the density function f(x) = 3x² for 0 ≤ x ≤ 1, find E[(1/3)X − 1].

   E[(1/3)X − 1] = ∫_{all x} ((1/3)x − 1) f(x) dx = ∫_0^1 ((1/3)x − 1) 3x² dx
                 = ∫_0^1 (x³ − 3x²) dx
                 = [x⁴/4 − x³]_0^1
                 = 1/4 − 1 − (0) = −3/4

If Y = aX + b, where a and b are fixed constants, show that E(Y) = aE(X) + b.

In the discrete case: letting g(X) = aX + b, and f(x) the probability distribution function, then

   E(aX + b) = ∑_{all x} (ax + b) f(x)
             = a ∑_{all x} x f(x) + b ∑_{all x} f(x)
             = aE(X) + b(1) = aE(X) + b.

In the continuous case: letting g(X) = aX + b, and f(x) the density function, then

   E(aX + b) = ∫_{−∞}^∞ (ax + b) f(x) dx
             = a ∫_{−∞}^∞ x f(x) dx + b ∫_{−∞}^∞ f(x) dx
             = aE(X) + b(1) = aE(X) + b.

Following from this we see the results E(aX) = aE(X) and E(b) = b. Give an interpretation of the last result!

Reworking the previous gambling applications: for the fair die it is known that E(X) = 3.5.

   If Y = X − 5:  E(Y) = E(X) − 5 = 3.5 − 5 = −1.5
   If Y = 2X − 5: E(Y) = 2E(X) − 5 = 2(3.5) − 5 = 2

Example: For the continuous random variable X described in Example 6, i.e. defined by the density function f(x) = 3x² for 0 ≤ x ≤ 1, where E(X) = 3/4, find E[(1/3)X − 1].

   E[(1/3)X − 1] = (1/3)E(X) − 1 = (1/3)(3/4) − 1 = −3/4

Example: For the fair die find E(X²).

   E(X²) = ∑_{all x} x² f(x) = ∑_{x=1}^6 x²(1/6)
         = (1)(1/6) + (4)(1/6) + (9)(1/6) + (16)(1/6) + (25)(1/6) + (36)(1/6)
         = 91/6 = 15.1667

Example: For the continuous random variable X described in Example 6, i.e. defined by the density function f(x) = 3x² for 0 ≤ x ≤ 1, find E(X²).

   E(X²) = ∫_{all x} x² f(x) dx = ∫_0^1 x²(3x²) dx = ∫_0^1 3x⁴ dx = [(3/5)x⁵]_0^1 = 3/5

You will see more about functions of random variables later!
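As a sketch, both gambling expectations computed in R directly from the definition of E(g(X)):

x  = 1:6
fx = rep(1/6, 6)        # fair die

sum((x - 5) * fx)       # E(X - 5)  = -1.5
sum((2 * x - 5) * fx)   # E(2X - 5) =  2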
3.2 Variance - a measure of spread, dispersion, variability

Why measure spread? Imagine a loaded die where the outcomes 1 and 6 are more likely, compared to Example 1, the fair die:

   Loaded die:  x:        1    2    3    4    5    6
                P(X = x): 1/4  1/8  1/8  1/8  1/8  1/4

   Fair die:    x:        1    2    3    4    5    6
                P(X = x): 1/6  1/6  1/6  1/6  1/6  1/6

   Loaded: µ = E(X) = 1(1/4) + 2(1/8) + 3(1/8) + 4(1/8) + 5(1/8) + 6(1/4) = 3.5
   Fair:   µ = E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5

[Figure: side-by-side probability mass plots of the loaded and fair dice, both with µ = E(X) = 3.5.]

The expected value for the loaded and fair dice is the same! So you are missing a key piece of information for describing the two populations.

What we need to look at is the distance of each outcome from the mean - this is the deviation of the outcome from the mean - and how likely each outcome is to occur:

   Deviation = x − µ

The deviations (the same for both dice):

   x − µ:    1 − 3.5   2 − 3.5   3 − 3.5   4 − 3.5   5 − 3.5   6 − 3.5
             = −2.5    = −1.5    = −0.5    = 0.5     = 1.5     = 2.5
   Loaded:   P(X = x): 1/4  1/8  1/8  1/8  1/8  1/4
   Fair:     P(X = x): 1/6  1/6  1/6  1/6  1/6  1/6

We would like a measure of the likelihood of all the deviations across all outcomes of the random variable, but before adding across we square the deviations, so that we take the magnitudes irrespective of direction, i.e. ignoring the ± sign:

   Squared deviation = (x − µ)²

The squared deviations (the same for both dice):

   (x − µ)²: 6.25  2.25  0.25  0.25  2.25  6.25
   Loaded:   P(X = x): 1/4  1/8  1/8  1/8  1/8  1/4
   Fair:     P(X = x): 1/6  1/6  1/6  1/6  1/6  1/6

Now taking account of the probability with which each squared deviation occurs:

   Loaded: E[(X − µ)²] = ∑_{all x} (x − µ)² P(X = x)
                       = 6.25(1/4) + 2.25(1/8) + 0.25(1/8) + 0.25(1/8) + 2.25(1/8) + 6.25(1/4) = 3.75
   Fair:   E[(X − µ)²] = ∑_{all x} (x − µ)² P(X = x)
                       = 6.25(1/6) + 2.25(1/6) + 0.25(1/6) + 0.25(1/6) + 2.25(1/6) + 6.25(1/6) ≈ 2.9167

and bringing it back to the original scale to determine the standard deviation:

   Loaded: √3.75 ≈ 1.9365        Fair: √2.9167 ≈ 1.7078

Comparing the standard deviations, the loaded die, where the outcomes 1 and 6 are more likely to occur, has a higher standard deviation than the fair die!

Defn: The variance of a random variable X is the sum of the product of the squared deviation of an outcome from the mean multiplied by its probability, summing over all possible outcomes of X, i.e. for a discrete random variable X, where f(x) is the value of its probability mass at x, the variance of X is

   Var(X) = ∑_{all x} (x − µ)² f(x).

For a continuous random variable, where f(x) is the value of its probability density at x, the variance of X is

   Var(X) = ∫_{all x} (x − µ)² f(x) dx.

Var(X) is often also given the notation σ²_X, or simply σ² when referring to the one random variable.

Defn: The standard deviation of a random variable X is the square root of the variance, i.e. for a discrete random variable X

   SD(X) = √( ∑_{all x} (x − µ)² f(x) ),

and for a continuous random variable

   SD(X) = √( ∫_{all x} (x − µ)² f(x) dx ).

SD(X) is often also given the notation σ_X, or simply σ when referring to the one random variable. SD(X) = √Var(X) = σ and Var(X) = (SD(X))² = σ².

Your turn! Multiple choice score. Let X = the mark you get on a multiple choice question, in which X ∈ {−1, 0, 2} as defined previously. Find σ² = Var(X).

Example 5: Find the variance of the continuous random variable X described in Example 5, i.e. defined by the density function f(x) = 1/6 for 0 ≤ x ≤ 6. Recall µ = E(X) = 3.

   σ² = Var(X) = ∫_{all x} (x − µ)² f(x) dx = ∫_0^6 (x − 3)²(1/6) dx
      = (1/6) ∫_0^6 (x² − 6x + 9) dx
      = (1/6)[x³/3 − 3x² + 9x]_0^6
      = (1/6)(6³/3 − 3(6²) + 9(6)) − (1/6)(0)
      = 12 − 18 + 9 = 3

   σ = SD(X) = √3
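A sketch of the earlier loaded-die calculation in R, straight from the definition of the variance:

x  = 1:6
fx = c(1/4, 1/8, 1/8, 1/8, 1/8, 1/4)   # loaded die: 1 and 6 more likely

mu     = sum(x * fx)                   # 3.5
sigma2 = sum((x - mu)^2 * fx)          # variance: 3.75
sqrt(sigma2)                           # standard deviation: 1.9365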
Example 6: Find the variance of the continuous random variable X described in Example 6, i.e. defined by the density function f(x) = 3x² for 0 ≤ x ≤ 1. Recall µ = E(X) = 3/4 = 0.75.

   σ² = Var(X) = ∫_{all x} (x − µ)² f(x) dx = ∫_0^1 (x − 0.75)² 3x² dx
      = ∫_0^1 (x² − 1.5x + 0.5625) 3x² dx
      = ∫_0^1 (3x⁴ − 4.5x³ + 1.6875x²) dx
      = [0.6x⁵ − 1.125x⁴ + 0.5625x³]_0^1
      = 0.6(1) − 1.125(1) + 0.5625(1) − (0)
      = 0.0375

   σ = SD(X) = √0.0375

Example 10: The probability density for a random variable X is defined as f(x) = x for 0 ≤ x ≤ 1, f(x) = 2 − x for 1 < x ≤ 2, and 0 elsewhere. Recall µ = E(X) = 1.

   σ² = Var(X) = ∫_{all x} (x − µ)² f(x) dx = ∫_0^1 (x − 1)² x dx + ∫_1^2 (x − 1)²(2 − x) dx
      = ∫_0^1 (x² − 2x + 1)x dx + ∫_1^2 (x² − 2x + 1)(2 − x) dx
      = ∫_0^1 (x³ − 2x² + x) dx + ∫_1^2 (−x³ + 4x² − 5x + 2) dx
      = [x⁴/4 − 2x³/3 + x²/2]_0^1 + [−x⁴/4 + 4x³/3 − 5x²/2 + 2x]_1^2
      = 1/6

   σ = SD(X) = √(1/6)

Show that Var(X) = E(X²) − E(X)².

In the discrete case: letting µ = E(X) and f(x) the probability distribution function, then

   Var(X) = E[(X − µ)²] = ∑_{all x} (x − µ)² f(x)
          = ∑_{all x} (x² − 2µx + µ²) f(x)
          = ∑_{all x} x² f(x) − 2µ ∑_{all x} x f(x) + µ² ∑_{all x} f(x)
          = E(X²) − 2µE(X) + µ²(1)
          = E(X²) − 2µ² + µ²
          = E(X²) − µ² = E(X²) − E(X)².

In the continuous case: letting µ = E(X) and f(x) the density function, then

   Var(X) = E[(X − µ)²] = ∫_{−∞}^∞ (x − µ)² f(x) dx
          = ∫_{−∞}^∞ (x² − 2µx + µ²) f(x) dx
          = ∫_{−∞}^∞ x² f(x) dx − 2µ ∫_{−∞}^∞ x f(x) dx + µ² ∫_{−∞}^∞ f(x) dx
          = E(X²) − 2µE(X) + µ²(1)
          = E(X²) − 2µ² + µ²
          = E(X²) − µ² = E(X²) − E(X)².

If Y = aX + b, where a and b are fixed constants, show that Var(Y) = a²Var(X).

   Var(aX + b) = E[(aX + b)²] − (E[aX + b])²
               = E[a²X² + 2abX + b²] − (E[aX + b])²
               = a²E[X²] + 2abE[X] + b² − (aE[X] + b)²
               = a²E[X²] + 2abE[X] + b² − a²(E[X])² − 2abE[X] − b²
               = a²E[X²] − a²(E[X])²
               = a²(E[X²] − (E[X])²) = a²Var(X).

Following from this we see the results Var(aX) = a²Var(X) and Var(b) = 0, and for the standard deviation SD(aX) = |a| SD(X) and SD(b) = 0. Interpret these results!
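A simulation sketch illustrating Var(aX + b) = a²Var(X) for the fair die with a = 2, b = −5 (the seed and sample size are illustrative; var() computes the sample variance, which is close to the population value for large samples):

set.seed(3)
x = sample(1:6, 100000, replace = TRUE)   # simulated fair-die rolls
y = 2 * x - 5                             # Y = 2X - 5

var(y)          # close to 4 * Var(X) = 11.667
4 * var(x)      # a^2 Var(X) with a = 2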
Rework the examples to find Var(X) using Var(X) = E(X²) − (E(X))².

Example 1: A fair die. Recall: E(X) = 3.5, E(X²) = 91/6 = 15.1667.

   σ² = Var(X) = E(X²) − E(X)² = 15.1667 − 3.5² = 2.9167

Example: A loaded die whose outcomes 1 and 6 are more likely to occur. Recall: E(X) = 3.5.

   E(X²) = ∑_{all x} x² f(x) = 1(1/4) + 4(1/8) + 9(1/8) + 16(1/8) + 25(1/8) + 36(1/4) = 16

   σ² = Var(X) = E(X²) − E(X)² = 16 − 3.5² = 3.75

Example 2: For the loaded die where outcome 4 is three times more likely to occur, we found E(X) = 3.625.

   E(X²) = ∑_{all x} x² f(x) = 1(1/8) + 4(1/8) + 9(1/8) + 16(3/8) + 25(1/8) + 36(1/8) = 15.375

   Var(X) = E(X²) − E(X)² = 15.375 − 3.625² = 2.2344

Example 5: Find the variance of the continuous random variable X described in Example 5, i.e. defined by the density function f(x) = 1/6 for 0 ≤ x ≤ 6. Recall E(X) = 3.

   E(X²) = ∫_{all x} x² f(x) dx = ∫_0^6 x²(1/6) dx = (1/18)x³ |_0^6 = (1/18)(6³ − 0³) = 12

   Var(X) = E(X²) − E(X)² = 12 − 3² = 3

Example 6: Find the variance of the continuous random variable X described in Example 6, i.e. defined by the density function f(x) = 3x² for 0 ≤ x ≤ 1. Recall E(X) = 3/4 = 0.75.

   E(X²) = ∫_{all x} x² f(x) dx = ∫_0^1 x²(3x²) dx = ∫_0^1 3x⁴ dx = (3/5)x⁵ |_0^1 = (3/5)(1⁵ − 0⁵) = 3/5 = 0.6

   Var(X) = E(X²) − E(X)² = 3/5 − (3/4)² = 0.6 − 0.5625 = 0.0375

Example 8: The lifetime in hours that a computer functions before breaking down is a continuous random variable X defined by the density function f(x) = (1/100)e^{−x/100} for x ≥ 0. Recall: E(X) = 100.

   E(X²) = ∫_{all x} x² f(x) dx = ∫_0^∞ x²(1/100)e^{−x/100} dx = −e^{−x/100}(x² + 200x + 20000) |_0^∞ = 20000

(This is a challenging integration exercise - don't worry if you don't follow it yet!)

   σ² = Var(X) = E(X²) − E(X)² = 20000 − 100² = 10000
   σ = SD(X) = √Var(X) = √10000 = 100

Example 10: The probability density for a random variable X is defined as f(x) = x for 0 ≤ x ≤ 1, f(x) = 2 − x for 1 < x ≤ 2, and 0 elsewhere. Recall: E(X) = 1.

   E(X²) = ∫_{all x} x² f(x) dx
         = ∫_0^1 x²(x) dx + ∫_1^2 x²(2 − x) dx
         = ∫_0^1 x³ dx + ∫_1^2 (2x² − x³) dx
         = [x⁴/4]_0^1 + [2x³/3 − x⁴/4]_1^2
         = 1/4 + (2(2)³/3 − (2)⁴/4) − (2/3 − 1/4)
         = 1/4 + 11/12 = 7/6

   σ² = Var(X) = E(X²) − E(X)² = 7/6 − 1² = 1/6
   σ = SD(X) = √Var(X) = √(1/6) = 0.4082

3.3 Skewness

A numeric measurement of skewness for a probability distribution is

   γ₁ = E[(X − µ)³] / σ³.

[Figure: three densities - symmetric, skewness parameter 0; left-skewed, skewness parameter −1.14; right-skewed, skewness parameter 1.41.]

Symmetric: γ₁ = 0. Left-skewed: γ₁ < 0. Right-skewed: γ₁ > 0.

Example 1: Find the skewness parameter for the random variable X defined as the value on the uppermost face when throwing a die. Recall µ = E(X) = 3.5 and σ = SD(X) = √2.9167 ≈ 1.7078.

   E[(X − µ)³] = ∑_{all x} (x − µ)³ f(x)
               = (1 − 3.5)³(1/6) + (2 − 3.5)³(1/6) + (3 − 3.5)³(1/6) + (4 − 3.5)³(1/6) + (5 − 3.5)³(1/6) + (6 − 3.5)³(1/6)
               = 0/6 = 0

   Skewness parameter = E[(X − µ)³]/σ³ = 0/1.7078³ = 0

Example 6: Find the skewness parameter of the continuous random variable X described in Example 6, i.e. defined by the density function f(x) = 3x² for 0 ≤ x ≤ 1. Recall E(X) = 3/4 = 0.75 and σ = SD(X) = √0.0375 ≈ 0.1936.

   E[(X − µ)³] = ∫_{all x} (x − µ)³ f(x) dx = ∫_0^1 (x − 0.75)³(3x²) dx
               = ∫_0^1 (3x⁵ − 6.75x⁴ + 5.0625x³ − 1.26563x²) dx
               = [(3/6)x⁶ − (6.75/5)x⁵ + (5.0625/4)x⁴ − (1.26563/3)x³]_0^1
               = 3/6 − 6.75/5 + 5.0625/4 − 1.26563/3
               = −0.00625167

   Skewness parameter = E[(X − µ)³]/σ³ = −0.00625167/0.1936³ ≈ −0.86

[Figure: the density f(x) = 3x² on [0, 1], which is left-skewed.]

3.4 Kurtosis

A numeric measurement of kurtosis for a probability distribution is

   E[(X − µ)⁴] / σ⁴.

Kurtosis is a measure of whether a distribution is heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or a lack of outliers.

[Figure: a normal density with kurtosis parameter 3, and three further densities with kurtosis parameters 1.8, 4.62 and 6.]

Note: The excess kurtosis is a shifted version of the kurtosis parameter: Excess Kurtosis = Kurtosis − 3, so that the standardised normal distribution has excess kurtosis 0.
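Both moments can be checked numerically; a sketch for Example 6 using integrate():

fx = function(x) 3 * x^2                                      # Example 6 density on [0, 1]
mu = integrate(function(x) x * fx(x), 0, 1)$value             # 0.75
s2 = integrate(function(x) (x - mu)^2 * fx(x), 0, 1)$value    # 0.0375

m3 = integrate(function(x) (x - mu)^3 * fx(x), 0, 1)$value
m4 = integrate(function(x) (x - mu)^4 * fx(x), 0, 1)$value

m3 / s2^1.5   # skewness, about -0.86
m4 / s2^2     # kurtosis, about 3.1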
3.5 Moments

The expressions E(X) and E(X²) are two members of a family of features,

   E(X), E(X²), E(X³), ..., E(X^r), ...

referred to as moments of the distribution of a random variable; in particular E(X^r) is the rth moment about the origin.

The expressions E(X − µ) and Var(X) = E((X − µ)²) are two members of a family of features,

   E(X − µ), E((X − µ)²), E((X − µ)³), ..., E((X − µ)^r), ...

referred to as moments of the distribution of a random variable; in particular E((X − µ)^r) is the rth moment about the mean.

3.5.1 Moments about the origin

The rth moment about the origin of a random variable X is the expected value of X^r, E(X^r), where r = 0, 1, 2, ..., i.e.

   for a discrete random variable,   E(X^r) = ∑_{all x} x^r f(x);
   for a continuous random variable, E(X^r) = ∫_{all x} x^r f(x) dx.

Evaluating at r = 0 gives E(X⁰) = E(1) = 1. Evaluating at r = 1 gives E(X¹) = E(X), the mean, the expectation of the random variable. Evaluating at r = 2 gives E(X²). Evaluating at r = 3 gives E(X³), etc.

3.5.2 Moments about the mean

The rth moment about the mean of a random variable X is the expected value of (X − µ)^r, E((X − µ)^r), where r = 0, 1, 2, ..., i.e.

   for a discrete random variable,   E((X − µ)^r) = ∑_{all x} (x − µ)^r f(x);
   for a continuous random variable, E((X − µ)^r) = ∫_{all x} (x − µ)^r f(x) dx.

Evaluating at r = 0 gives E(1) = 1. Evaluating at r = 1 gives E(X) − µ = 0. Evaluating at r = 2 gives E((X − µ)²) = Var(X) = σ². Evaluating at r = 3 gives E((X − µ)³), which can be used to determine the skewness measurement E((X − µ)³)/σ³. Evaluating at r = 4 gives E((X − µ)⁴), which can be used to determine the kurtosis measurement E((X − µ)⁴)/σ⁴.

Sometimes these are referred to as the mean corrected moments.

3.5.3 Features written in the form of moments about the origin

   mean:     µ = E(X)
   variance: σ² = Var(X) = E((X − µ)²) = E(X²) − E(X)²
   skewness: skewness(X) = E[(X − µ)³]/σ³ = (E(X³) − 3µE(X²) + 3µ²E(X) − µ³)/σ³
   etc.

3.5.4 Moment Generating Functions

Defn: The moment-generating function of a random variable X, where it exists, is given by M_X(t) = E(e^{tX}).

   A discrete random variable has   M_X(t) = E(e^{tX}) = ∑_{all x} e^{tx} f(x);
   a continuous random variable has M_X(t) = E(e^{tX}) = ∫_{all x} e^{tx} f(x) dx.

To see how this generates the moments, use the series expansion:

   e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + ··· + t^r x^r/r! + ···

   E[e^{tX}] = E[1 + tX + (t²/2!)X² + (t³/3!)X³ + ··· + (t^r/r!)X^r + ···]
             = 1 + tE(X) + (t²/2!)E(X²) + (t³/3!)E(X³) + ··· + (t^r/r!)E(X^r) + ···

Note that the moments about the origin can be determined by finding the rth derivative of the moment generating function evaluated at t = 0:

   E[X^r] = d^r M_X(t)/dt^r |_{t=0}.

For the 1st derivative, E(X) = dM_X(t)/dt |_{t=0} = M′_X(0). For the 2nd derivative, E(X²) = d²M_X(t)/dt² |_{t=0} = M″_X(0). So µ = E(X) = M′_X(0), and Var(X) = E(X²) − E(X)² = M″_X(0) − (M′_X(0))².

Example: The random variable X denotes the lifetime of a laptop battery from full charge to the next recharge. The moment generating function for this random variable is M_X(t) = (1 − 25t)^{−1}. Use the moment generating function to find the first two moments about the origin, E(X) and E(X²), and hence show that the mean lifetime µ is 25, the variance in lifetimes σ² is 625, and the standard deviation in lifetimes is 25.
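A sketch of the battery example in R, using base R's symbolic differentiation D() to evaluate the first two derivatives of M_X(t) at t = 0:

M = expression((1 - 25 * t)^-1)      # moment generating function

dM  = D(M, "t")                      # first derivative
d2M = D(dM, "t")                     # second derivative

EX  = eval(dM,  list(t = 0))         # E(X)   = 25
EX2 = eval(d2M, list(t = 0))         # E(X^2) = 1250
c(mean = EX, var = EX2 - EX^2, sd = sqrt(EX2 - EX^2))   # 25, 625, 25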
3.6 Quantiles, Percentiles, Quartiles

Defn: The p × 100th percentile of a distribution is the value of X = x at which P(X ≤ x) = p (or, in terms of the cumulative distribution, F(x) = p). Quantiles are a generalisation of this concept, where the pth quantile of a distribution is the value of X = x at which P(X ≤ x) = p.

[Figure: three densities with the 0.25, 0.5, 0.75 and 0.9 quantiles marked.]

Particular points of interest:

- The median, η: the 50th percentile = the 0.5 quantile, P(X ≤ x) = 0.5.
- The 1st quartile, Q1: the 25th percentile = the 0.25 quantile, P(X ≤ x) = 0.25, i.e. a quarter.
- The 3rd quartile, Q3: the 75th percentile = the 0.75 quantile, P(X ≤ x) = 0.75, i.e. three quarters.

3.7 Comparison of mean, median and mode

- The mean/average: µ = E(X).
- The median: η, the value of X for which P(X ≤ x) = 0.5.
- The mode: the value of X for which f(x) is at its maximum.

In general(!), the skewness produces an ordering of these features:

   Left skewed:  mean < median < mode
   Symmetrical:  mean = median = mode
   Right skewed: mean > median > mode

[Figure: left-skewed, symmetric and right-skewed densities with the mean, median and mode marked.]
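A sketch of finding quantiles numerically: for Example 6 the cumulative distribution function is F(x) = x³, so the median solves x³ = 0.5, and uniroot() recovers it:

Fx = function(x) x^3                              # CDF of Example 6 on [0, 1]

uniroot(function(x) Fx(x) - 0.5,  c(0, 1))$root   # median   = 0.5^(1/3), about 0.794
uniroot(function(x) Fx(x) - 0.25, c(0, 1))$root   # 1st quartile Q1
uniroot(function(x) Fx(x) - 0.75, c(0, 1))$root   # 3rd quartile Q3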
4 Probability Models for Discrete Random Variables

- Discrete uniform distribution
- Bernoulli distribution
- Binomial distribution
- Geometric distribution
- Negative binomial distribution
- Hypergeometric distribution
- Poisson distribution
- Multinomial distribution
- Multivariate hypergeometric distribution

4.1 Discrete Uniform Distribution

All k discrete points/outcomes of a random variable X have equal probability of occurring. Let the k outcomes be x₁, x₂, x₃, ..., x_k in increasing order of value. The probability distribution function is

   f(x) = P(X = x) = 1/k  for x ∈ {x₁, x₂, ..., x_k},  and 0 otherwise.

Expectation and variance:

   E(X) = ∑_{i=1}^k x_i (1/k) = (1/k) ∑_{i=1}^k x_i

   Var(X) = (1/k) ∑_{i=1}^k x_i² − ( (1/k) ∑_{i=1}^k x_i )²

Example: A fair die, k = 6:

   P(X = x) = 1/6  for x ∈ {1, 2, 3, 4, 5, 6}.

Example: Deal or No Deal. From the definition above, note that the k discrete points do not have to be equally spaced!! Let X = the amount in the player's selected box at the start of the game, k = 22:

   P(X = x) = 1/22 for x ∈ {0.01, 0.10, 0.50, 1, 5, 10, 50, 100, 250, 500, 750, 1000, 3000, 5000, 10000, 15000, 20000, 35000, 50000, 75000, 100000, 250000}.

Be the banker! Find the expected value of X (what offer would you make for the box?!).

   E(X) = (1/k) ∑_{i=1}^k x_i = (1/22)(0.01 + 0.1 + 0.5 + 1 + 5 + 10 + ··· + 250000)

Some R script might help here...

x = c(0.01, 0.1, 0.5, 1, 5, 10, 50, 100, 250, 500, 750, 1000, 3000, 5000,
      10000, 15000, 20000, 35000, 50000, 75000, 100000, 250000)
fx = rep(1 / 22, 22)

sum(fx)           # check: 1
mu = sum(x * fx)
mu                # 25712.12

4.2 Bernoulli Distribution

If an experiment/trial has two outcomes, success or failure, and their probabilities are p and 1 − p respectively, then the number of successes in the trial, 1 or 0, has a Bernoulli distribution.

Example: Toss a coin, let success = a head, and define X ∈ {0, 1} with the probability distribution function

   f(x) = p      for x = 1 (success/head),
   f(x) = 1 − p  for x = 0 (failure/tail).

If the coin is a fair coin:

   f(x) = 0.5  for x = 1 (success/head),
   f(x) = 0.5  for x = 0 (failure/tail).

Example: Select an item for inspection from a manufacturing line and observe whether it is defective or not defective. Let success = a defective item, and say the probability an item is defective is 0.12. Define X ∈ {0, 1} with the probability distribution function

   f(x) = p = 0.12      for x = 1 (success/defective),
   f(x) = 1 − p = 0.88  for x = 0 (failure/non-defective).

4.3 Binomial Distribution

Let X = the number of successes in a sample of n independent Bernoulli trials, each with probability of success p.

Examples:
- Number of reds in 15 spins of a roulette wheel
- Number of defective items in a batch of 5 items
- Number correct on a 20-question multiple choice exam
- Number of customers who purchase, out of 100 customers who enter a store

When does a binomial distribution apply?
- A sample of n individual trials.
- Each trial has two possible outcomes - a success and a failure (a Bernoulli trial).
- The outcome observed for one trial is independent of the outcomes of any other trial.
- Each trial has the same probability of a success - let the probability of a success be p.

The probability function is

   f(x) = P(X = x) = (n choose x) p^x (1 − p)^{n−x}  for x = 0, 1, 2, ..., n.

Written as X ~ Binomial(n, p), where the symbol "~" is read as the word "follows".

Expectation and variance:

   E(X) = np
   Var(X) = np(1 − p)

Moment-generating function: M_X(t) = [1 + p(e^t − 1)]^n.

Task: Derive/discuss the form of the distribution function f(x) described above.
Task: Show ∑_{x=0}^n f(x) = 1.
Task: Prove E(X) = np.
Task: Show M_X(t) = [1 + p(e^t − 1)]^n.
Task: Use the moment-generating function to prove E(X) = np and Var(X) = np(1 − p).

Tip: You may find the following useful:

   Binomial theorem expansion: (a + b)^n = ∑_{x=0}^n (n choose x) a^x b^{n−x}.

   Note the special case 2^n = (1 + 1)^n = ∑_{x=0}^n (n choose x) 1^x 1^{n−x} = ∑_{x=0}^n (n choose x).

   Also the property: (n choose r) = n!/(r!(n − r)!) = [n(n−1)!]/([r(r−1)!](n − r)!) = (n/r)(n−1 choose r−1).
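A numeric sketch of the first three tasks for n = 5, p = 0.12, using the built-in dbinom():

n = 5; p = 0.12
x = 0:n
fx = dbinom(x, n, p)              # binomial probabilities (n choose x) p^x (1-p)^(n-x)

sum(fx)                           # 1, as required
sum(x * fx)                       # E(X) = np = 0.6
sum(x^2 * fx) - sum(x * fx)^2     # Var(X) = np(1-p) = 0.528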
Example, Binomial: Let the random variable X be the number of defective items in a batch of 5 manufactured on a production line. Say the probability an item is defective is 0.12. Comment on the suitability of the binomial probability distribution for this random variable.

- A sample of n = 5 manufactured items.
- Two possible outcomes - success or failure - defective or non-defective (a Bernoulli trial).
- Not told, so we must *assume* the outcomes of items are independent, e.g. the probability that the second item is defective does not depend on whether the first item was defective or not. Is this realistic? It depends on the sampling - discuss.
- Each item has the same probability of being defective (a success), p = P(defective) = 0.12.

Note: the success outcome is the outcome of interest, but not necessarily the "good" one.

In shorthand, X ~ Binomial(5, 0.12) with distribution function

   f(x) = P(X = x) = (5 choose x) 0.12^x (1 − 0.12)^{5−x}.

Calculate the following:

a) the probability exactly 3 items out of the 5 are defective;
b) the probability all 5 items are defective;
c) the probability that no items are defective;
d) the probability that no more than 3 items are defective;
e) the probability that at least 3 items are defective;
f) the probability at least one item is defective;
g) the expected value of X.

Solution:

a) P(X = 3) = (5 choose 3) 0.12³ (1 − 0.12)² =
b) P(X = 5) = (5 choose 5) 0.12⁵ (1 − 0.12)⁰ =
c) P(X = 0) = (5 choose 0) 0.12⁰ (1 − 0.12)⁵ =
d) P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) =
e)
f) P(X ≥ 1) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) =
   or P(X ≥ 1) = 1 − P(X < 1) = 1 − P(X = 0) =
g) E(X) = np = 5(0.12) =

Use your calculator... look for the nCr button. Or some R script might help here:

# combination rule:
choose(5, 3)
# 10

# probability P(X = 3)
choose(5, 3) * 0.12^3 * (1 - 0.12)^2
# 0.01338163

# or
dbinom(3, 5, 0.12)
# 0.01338163

4.4 Geometric Distribution

Suppose our random variable X = the number of independent Bernoulli trials, with probability of success p, until the first success occurs.

When does a geometric distribution apply?
- Each trial has two possible outcomes - a success and a failure (a Bernoulli trial).
- The outcome of each trial is independent of any other trial.
- Each trial has the same probability of a success - let the probability of a success be p.
- Count the number of trials up to and including the first success.

The probability distribution function for X ~ Geom(p):

   f(x) = P(X = x) = p(1 − p)^{x−1}  for x = 1, 2, 3, ... and 0 < p ≤ 1.

Sometimes the random variable is defined as the number of failures before the first success; see both definitions on Wikipedia.

It is a commonly used model for the waiting time between rare events (that occur independently and at a constant rate), where the time periods are in blocks. The events must be sufficiently rare to occur at most once per block. For example, we often make "annualised risk" statements about climate or natural disasters where the time block is a year, like "this is a 1 in 100 year flooding event".

Expectation and variance:

   E(X) = 1/p
   Var(X) = (1 − p)/p²

Notice that the return period (the average waiting time until an event occurs), E(X), is the reciprocal of the occurrence probability, 1/p, so that an event with probability 0.01 occurs on average once in every 100 years. In real-life applications the geometric distribution is often combined with a model for the "level" of the event, e.g. the insurance/financial loss amount, flood volume or earthquake magnitude. The return level is then associated with a particular return period.

Moment-generating function:

   M_X(t) = pe^t / (1 − (1 − p)e^t)  for t < −ln(1 − p).

Task: Derive/discuss the form of the distribution function f(x) described above.
Task: Show ∑_{x=1}^∞ f(x) = 1.
Task: Prove E(X) = 1/p.
Task: Show M_X(t) = pe^t / (1 − (1 − p)e^t) for t < −ln(1 − p).
Task: Use the moment-generating function to prove E(X) = 1/p and Var(X) = (1 − p)/p².

Tip: You may find the following useful. Recall: a + ar + ar² + ar³ + ··· is a geometric series; for |r| < 1 the infinite sum is a + ar + ar² + ar³ + ··· = a/(1 − r).
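A sketch of geometric probabilities in R. Note that R's dgeom() uses the "number of failures before the first success" definition, so with X = the trial on which the first success occurs, P(X = x) = dgeom(x − 1, p):

p = 0.12
x = 4                        # first success on the 4th trial

p * (1 - p)^(x - 1)          # from the definition: 0.0818
dgeom(x - 1, p)              # same value via R's failures-count version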
Example, Geometric: Let the random variable X be the number of items selected from a batch of manufactured product on a production line at which the first defective item is selected. Say the probability an item is defective is 0.12.

a) Write down the probability distribution. State any assumptions you make.
b) What is the probability the first defective item will be the fourth item selected from the batch?
c) What is the probability the first defective item will be the tenth item selected from the batch?
d) Calculate the expected value of this variable and interpret its meaning.

Solution:

a) X ~ Geom(0.12). f(x)? Extra assumption needed?
b) P(X = 4) = (1 − 0.12)³ × 0.12 = 0.0818
c) P(X = 10) = (1 − 0.12)⁹ × 0.12 = 0.038
d) E(X) = 1/p = 1/0.12 = 8.33. On average, in repeated experiments, you would expect the first defective item to be selected on the 8⅓rd trial.

Memoryless Property

The geometric distribution is the only discrete probability model with the memoryless property, where for any M and x,

   P(X > M + x | X > M) = P(X > x).

It is akin to the exponential distribution, which is the only continuous random variable with this property. Using P(X > x) = (1 − p)^x for x = 1, 2, ..., we have

   P(X > M + x | X > M) = P(X > M + x ∩ X > M) / P(X > M)
                        = P(X > M + x) / P(X > M)
                        = (1 − p)^{M+x} / (1 − p)^M
                        = (1 − p)^x = P(X > x).

So if you have already waited for more than 10 years, then the probability of waiting more than 30 years in total is the same as the probability of waiting a further 20 years, so P(X > 30 | X > 10) = P(X > 20).

4.5 Negative Binomial Distribution

Let X be the number of independent trials until the kth success occurs, where each trial has probability of success p.

When does a negative binomial distribution apply?
- Each trial has two possible outcomes - a success and a failure (a Bernoulli trial).
- The outcome observed for one trial is independent of the outcomes of any other trial.
- Each trial has the same probability of a success - let the probability of a success be p.

The probability distribution function is

   f(x) = P(X = x) = (x − 1 choose k − 1) p^k (1 − p)^{x−k}  for x = k, k + 1, k + 2, ...

Notice that the sample space depends on k, similar to the binomial. In shorthand, write X ~ NBinomial(k, p).

Expectation and variance:

   E(X) = k/p
   Var(X) = (k/p)(1/p − 1)

Moment-generating function:

   M_X(t) = ( pe^t / (1 − (1 − p)e^t) )^k  for t < −ln(1 − p).

The geometric distribution is a special case of the negative binomial distribution with k = 1: X ~ Geom(p) ≡ NBinomial(k = 1, p).

Task: Derive/discuss the form of the distribution function f(x) described above.
Task: How else are the geometric and negative binomial related to each other?

Example, Negative Binomial: Let X be the number of items selected from a batch of manufactured product on a production line at which the third defective item occurs. Say the probability an item is defective is 0.12.

a) Write down the probability distribution. State any assumptions made.
b) What is the probability the third defective item will be the third item selected from the batch?
c) What is the probability the third defective item will be the tenth item selected from the batch?
d) Calculate the expected value of this variable and interpret its meaning.
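A sketch in R. As with dgeom(), R's dnbinom() counts failures before the kth success, so P(X = x trials) = dnbinom(x − k, size = k, prob = p):

k = 3; p = 0.12

# P(third defective on the 10th item): 7 failures before the 3rd success
choose(10 - 1, k - 1) * p^k * (1 - p)^(10 - k)   # from the definition: 0.0254
dnbinom(10 - k, size = k, prob = p)              # same value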
Solution:

a) X ~ NBinomial(k = 3, p = 0.12) for x ∈ {3, 4, 5, ...}. f(x)? Any extra assumptions?
b) P(X = 3) = (3 − 1 choose 3 − 1) 0.12³ (1 − 0.12)^{3−3} = 0.12³ = 0.0017
c) P(X = 10) = (10 − 1 choose 3 − 1) 0.12³ (1 − 0.12)^{10−3} = (9 choose 2) 0.12³ (1 − 0.12)⁷ = 0.0254
d) E(X) = 3/0.12 = 25. On average, in repeated experiments, you would expect the third defective item to occur on the 25th trial. Does this make sense given the result from the previous geometric example?

4.6 Hypergeometric Distribution

Let the random variable X = the number of successes out of n trials drawn from a total of N in which there are a total of M successes. This is sampling without replacement, so the probability of success on the ith trial is conditional on the outcomes of previous trials.

When does a hypergeometric distribution apply?
- A fixed population of size N.
- The population contains two types of individuals: M successes and N − M failures.
- A sample of size n is selected from the population without replacement.
- If x is the number of successes observed in the n trials, then n − x is the number of failures observed in the n trials.
- The trials are dependent, as the outcome of each one depends on what was drawn previously, due to the sampling without replacement. The probability of success is simply the proportion of successes left at each selection, as each individual outcome is equally likely to be selected.

The probability distribution function is

   f(x) = P(X = x) = (M choose x)(N − M choose n − x) / (N choose n)  for x = 0, 1, 2, ..., min(n, M).

Notice the restriction on the sample space. Why? In shorthand, write X ~ Hypergeometric(N, M, n).

Expectation and variance:

   E(X) = nM/N = n(M/N)

   Var(X) = nM(N − M)(N − n) / (N²(N − 1)) = n (M/N) ((N − M)/N) ((N − n)/(N − 1))

Task: Derive/discuss the form of the distribution function f(x) described above.
Task: How are the binomial and hypergeometric related to each other? Does E(X) make sense?
Task: Discuss what happens as N → ∞ and the sample drawn is small (N > 10n), i.e. an infinite urn and a small sample!

Example: An urn contains 30 black marbles and 20 red marbles. A sample of 10 marbles is selected at random from the urn. Marbles are not replaced before the next marble is selected. Let X be the number of black marbles in the sample of 10 marbles selected.

a) Write down the probability distribution, f(x).
b) What is the probability that exactly 3 black marbles are selected?
c) What is the probability that all 10 marbles selected are black?
d) What is the probability that no marbles selected are black?
e) What is the probability that no more than 2 black marbles are selected?
f) What is the expected value of this random variable? Interpret the meaning of this value.
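A sketch of the urn example in R using dhyper(x, m, n, k), where m = the number of successes in the population (black marbles), n = the number of failures (red marbles), and k = the sample size:

dhyper(3,  m = 30, n = 20, k = 10)                  # b) P(X = 3)
dhyper(10, m = 30, n = 20, k = 10)                  # c) P(X = 10)
dhyper(0,  m = 30, n = 20, k = 10)                  # d) P(X = 0)
sum(dhyper(0:2, m = 30, n = 20, k = 10))            # e) P(X <= 2)
sum(0:10 * dhyper(0:10, m = 30, n = 20, k = 10))    # f) E(X) = 6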
Solution:

a) f(x) = P(X = x) = (30 choose x)(20 choose 10 − x) / (50 choose 10) for x = 0, 1, 2, ..., 10, i.e. X ~ Hypergeometric(N = 50, M = 30, n = 10).

b) P(X = 3) = (30 choose 3)(20 choose 7) / (50 choose 10)

c) P(X = 10) = (30 choose 10)(20 choose 0) / (50 choose 10)

d) P(X = 0) = (30 choose 0)(20 choose 10) / (50 choose 10)

e) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
            = (30 choose 0)(20 choose 10)/(50 choose 10) + (30 choose 1)(20 choose 9)/(50 choose 10) + (30 choose 2)(20 choose 8)/(50 choose 10)

f) E(X) = 10(30/50) = 6. In repeating the experiment, the number of black marbles in the sample of 10 marbles selected would be 6 on average. Does this make sense?

Example: A batch of 24 items is known to have 4 defective items. An inspector checks 6 items of the batch. What is the probability distribution for the number of defective items in the 6 inspected? What is the probability that none inspected will be defective?

   f(x) = P(X = x) = (4 choose x)(20 choose 6 − x) / (24 choose 6)  for x = 0, 1, 2, 3, 4,

i.e. X ~ Hypergeometric(N = 24, M = 4, n = 6), and

   f(0) = P(X = 0) = (4 choose 0)(20 choose 6) / (24 choose 6) = 0.288.

4.7 Poisson Distribution

Let the random variable X be the number of randomly occurring and independent events in an interval, where the assumed average rate of events in that interval is λ. Examples:

- Number of customers arriving in 20 minutes
- Number of buses arriving at a stop in 30 minutes
- Number of times "trick or treat"ers ring your doorbell per hour on Halloween night

When does a Poisson distribution apply?
- Counting the number of randomly occurring events.
- A "unit interval" is specified - e.g. in t