Normal Distribution PDF & CDF (5th Seminar)
Document Details
Uploaded by ComfortingAestheticism
University of Debrecen Faculty of Medicine
Summary
This document provides notes on continuous random variables, probability density function (PDF), cumulative distribution function (CDF), and normal distribution. It includes definitions, characteristics, and examples.
Full Transcript
5th seminar (Week 6)
▪ Continuous random variables. Probability density function and cumulative distribution function
▪ Normal (Gaussian) distribution: definition, normal density and distribution function
▪ Standard normal distribution
▪ Examples

Continuous random variables
❖ Density function (PDF). Formally, if ξ is a continuous random variable, then it has a probability density function f(x) whose integral between two limits a and b gives the probability that ξ takes a value between a and b:
$$P(a \le \xi \le b) = \int_a^b f(x)\,dx$$
Graphical interpretation: the probability is the area under the density curve between the two limits, for example
$$P(60 \le \xi \le 80) = \int_{60}^{80} f(x)\,dx$$
Characteristics:
▪ the total area under the curve is 1
▪ the probability of any specific value is 0!
[Figure: density function plotted against the random variable, with the area between a and b shaded, P(60 ≤ ξ ≤ 80)]

Continuous random variables II.
❖ Cumulative distribution function (CDF). The cumulative distribution function F(x) of a random variable ξ at x is the probability that the random variable takes a value less than or equal to x.
[Figure: F(x) (cumulative frequency) plotted against x from 0 to 100; the median is 50, where F(x) = 0.5; F(60) = 0.7577 and F(80) = 0.9786]
Values read from the graph:
▪ P(ξ ≤ 60) = 0.7577
▪ P(ξ ≤ 80) = 0.9786
▪ P(60 ≤ ξ ≤ 80) = 0.9786 − 0.7577 = 0.2209
▪ P(ξ = 60) = 0 !!!!

Normal distribution
❖ Definition. Characteristics.
The normal density is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
▪ The graph of the normal distribution produces the familiar bell-shaped curve.
▪ The two parameters of the distribution are the mean (μ) and the standard deviation (σ).
▪ μ identifies the position of the center (peak) of the distribution along the x-axis, while σ determines the degree of flatness or peakedness (height and width) of the graph of the distribution.
▪ Because of the characteristics of these two parameters, μ is often referred to as a location parameter and σ is often referred to as a shape parameter (more precisely, a scale parameter).
▪ Importance: many biologically relevant parameters (weight, height, blood pressure) can be modeled through or approximated by the normal distribution.
Characteristics:
▪ it has only one peak
▪ it is symmetrical about its mean (μ)
▪ the random variable takes on values between −∞ and +∞
▪ the maximum of the curve is at x = μ
▪ mean, median and mode are all equal

Normal distribution II.
❖ Normal density- and distribution function.
Density function f(x):
▪ a different normal distribution is specified for each different value of μ and σ
▪ different values of μ shift the graph of the distribution along the x-axis
▪ a large standard deviation creates a bell that is short and wide, while a small standard deviation creates a tall and narrow curve
Distribution function F(x):
▪ the distribution function takes a value of 0.5 at the mean
▪ the distribution function approaches 1 as x approaches infinity
[Figure: distribution function F(x) plotted against x, passing through 0.5 at x = μ]

Standard normal distribution
❖ Definition. The normal distribution is really a family of distributions in which one member is distinguished from another on the basis of the values of μ and σ. The most important member of this family is the standard normal distribution, or unit normal distribution as it is sometimes called because it has a mean of 0 and a standard deviation of 1.
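The properties of the normal density and distribution function listed above can be explored numerically. The following is a minimal Python sketch, not part of the seminar material: the function names (normal_pdf, normal_cdf, area_under_pdf) and the parameter values μ = 50, σ = 10 are hypothetical, chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Normal distribution function F(x) = P(xi <= x), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Approximate the integral of f between a and b with the midpoint rule."""
    width = (b - a) / steps
    total = sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return total * width

# Hypothetical parameters, for illustration only.
mu, sigma = 50.0, 10.0

# The distribution function takes the value 0.5 at the mean.
print(normal_cdf(mu, mu, sigma))  # 0.5

# The density is symmetric about the mean.
print(normal_pdf(mu - 10, mu, sigma) == normal_pdf(mu + 10, mu, sigma))  # True

# A smaller standard deviation gives a taller (and narrower) peak.
print(normal_pdf(mu, mu, 5.0) > normal_pdf(mu, mu, 10.0))  # True

# P(a <= xi <= b): the area under the density agrees with F(b) - F(a).
by_area = area_under_pdf(40, 60, mu, sigma)
by_cdf = normal_cdf(60, mu, sigma) - normal_cdf(40, mu, sigma)
print(abs(by_area - by_cdf) < 1e-6)  # True
```

The midpoint rule is used only to make the "area under the curve" idea concrete; in practice the distribution function (or a table of it) is what one evaluates.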
Any normal distribution can be converted to the standard normal distribution by the standardization process (z-transformation):
$$z_i = \frac{x_i - \mu}{\sigma}$$
Subtract the expected value (μ) of the original normal distribution from the measured x_i values, then divide this difference by the standard deviation (σ) of the original distribution. The resulting z values follow the standard normal distribution.

Standard normal distribution II.
❖ Standard normal density- and distribution function.
[Figure: standard normal density function φ(x) and distribution function; the horizontal axis is in units of standard deviation]
▪ If we erect perpendiculars at a distance of 1 standard deviation from the mean in both directions, the area enclosed by these perpendiculars, the x-axis and the curve will be approximately 68 percent of the total area.
▪ If we extend these lateral boundaries to a distance of two standard deviations on either side of the mean, approximately 95 percent of the area will be enclosed.
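The z-transformation and the 68 and 95 percent areas can be checked with Python's standard library. This is a sketch under stated assumptions: statistics.NormalDist is the standard-library class (Python 3.8 or later), while the helper names standardize and area_within and the values μ = 100, σ = 15, x = 130 are hypothetical and not taken from the seminar.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """z-transformation: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma

std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal density between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(round(area_within(1), 4))  # 0.6827, approximately 68 percent
print(round(area_within(2), 4))  # 0.9545, approximately 95 percent

# Hypothetical normal distribution, for illustration only.
mu, sigma, x = 100.0, 15.0, 130.0
z = standardize(x, mu, sigma)
print(z)  # 2.0

# Standardizing does not change the probability: P(X <= x) equals P(Z <= z).
original = NormalDist(mu, sigma).cdf(x)
standardized = std_normal.cdf(z)
print(abs(original - standardized) < 1e-12)  # True
```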
Example
In a population brain volume is distributed according to the normal distribution with a mean value of 1400 cm³ and an SD of 125 cm³.
a) What is the probability that a randomly chosen individual will have a brain volume smaller than 1450 cm³?
Solution a)
The following probability has to be found: $P(x < 1450)$.
This probability is given by the area under the curve of the probability density function (PDF) of the normal distribution specified in the question (mean and SD of 1400 and 125, respectively) between −∞ and 1450.
[Figure: normal density of brain volume (cm³), with the area to the left of 1450 shaded]
This distribution has to be standardized using the following equation:
$$z = \frac{x - \mu}{\sigma} = \frac{1450 - 1400}{125} = 0.4$$
Therefore the problem can be converted to:
$$P(x < 1450) = P\left(z < \frac{1450 - 1400}{125}\right) = P(z < 0.4)$$
This probability corresponds to the shaded area under the curve of the PDF of the standard normal distribution between −∞ and 0.4. It can be found in the table of the cumulative distribution function of the standard normal distribution:
$$P(z < 0.4) = \Phi(0.4) = 0.6554$$
[Figure: standard normal density, with the area to the left of 0.4 shaded (0.6554)]
Conclusion: the probability that a randomly chosen individual will have a brain volume less than 1450 cm³ is 0.6554.

Example (continued)
b) What is the probability that a randomly chosen individual will have a brain volume larger than 1450 cm³?
Solution b)
The total area under the curve of any PDF between −∞ and +∞ is one.
In other words, the areas of the shaded and white regions in the graph add up to one:
$$P(x < 1450) + P(x > 1450) = 1$$
$$P(z < 0.4) + P(z > 0.4) = 1$$
Rearranging the above equations gives:
$$P(z > 0.4) = 1 - P(z < 0.4) = 1 - \Phi(0.4) = 1 - 0.6554 = 0.3446$$
[Figure: standard normal density; the shaded area to the left of 0.4 is 0.6554, the white area to the right is 0.3446]
Conclusion: the probability that a randomly chosen individual will have a brain volume larger than 1450 cm³ is 0.3446.

Example (continued)
c) What is the probability that a randomly chosen individual will have a brain volume between 1300 cm³ and 1525 cm³?
Solution c)
The probability we are looking for is $P(1300 < x < 1525)$. It is identical to the black area in the left-side graph, which is in turn the difference between the grey-shaded area and the hatched area on the right:
$$P(1300 < x < 1525) = P(x < 1525) - P(x < 1300)$$
[Figure: two normal density plots over brain volume (cm³); left: area between 1300 and 1525 in black; right: area below 1525 grey-shaded, area below 1300 hatched]
Solution c), continued
Now the brain volumes of 1300 cm³ and 1525 cm³ have to be standardized:
$$z_1 = \frac{1300 - 1400}{125} = -0.8 \qquad z_2 = \frac{1525 - 1400}{125} = 1$$
The probability in question can be written as:
$$P(-0.8 < z < 1) = P(z < 1) - P(z < -0.8)$$
Turning to the table of the standard normal distribution:
$$P(-0.8 < z < 1) = \Phi(1) - \Phi(-0.8) = 0.8413 - 0.2119 = 0.6294$$
Conclusion: the probability that the brain volume of a randomly chosen individual is between 1300 cm³ and 1525 cm³ is 0.6294.

Example (continued)
d) What is the value below which the brain volume of 20% of the population falls?
Solution d)
The area to the left of the unknown brain volume (designated by X in the graph) under the PDF of the given normal distribution is 0.2, since 20% of the population has a smaller brain volume than this unknown value:
$$P(z < X_z) = 0.2$$
where $X_z$ is the standardized z value corresponding to X.
[Figure: normal density of brain volume (cm³), with the area to the left of X shaded (0.2)]
From the table of the standard normal distribution, find the $X_z$ value at which the cumulative distribution function of the standard normal distribution assumes a value of 0.2. This turns out to be approximately −0.84.
Now we can write the standardization equation and solve for the unknown brain volume x:
$$-0.84 = \frac{x - 1400}{125} \quad\Rightarrow\quad x = -0.84 \cdot 125 + 1400 = 1295$$
Conclusion: the brain volume below which 20% of the population falls is 1295 cm³.

Example 2.
The time spent watching TV per week by middle-school students has a normal distribution with a mean of 20.5 hours and a standard deviation of 5.5 hours. Find the percent
a) who watch TV less than 25 hours per week
b) who watch more than 30 hours per week
c) who watch between 25 and 30 hours per week.
d) Find the 90th percentile for the time spent watching TV!
Solution a)
The probability that the time (t) spent watching TV is less than 25 hours per week, together with the corresponding standardized z value, can be written as follows:
$$P(t < 25) = P\left(z < \frac{25 - 20.5}{5.5}\right) \approx P(z < 0.82)$$
By consulting the table of the standard normal distribution we find that:
$$P(z < 0.82) = 0.7939$$
Conclusion: the probability that a middle-school student watches TV less than 25 hours per week is 0.7939, i.e. 79.39% of the students.
Solution b) (more than 30 hours per week)
Since the total area under the curve of the normal PDF is unity, we can write:
$$P(t > 30) = 1 - P(t < 30)$$
We want to find the probability P(t > 30), but after standardization the table gives only P(t < 30). So let us calculate the standardized z value and look it up in the table of the standard normal distribution:
$$P(t > 30) = 1 - P(t < 30) = 1 - P\left(z < \frac{30 - 20.5}{5.5}\right) \approx 1 - P(z < 1.73) = 1 - 0.9582 = 0.0418$$
Conclusion: the probability that a randomly chosen middle-school student watches TV more than 30 hours per week is 0.0418, i.e. 4.18% of the students.

Solution c) (between 25 and 30 hours per week)
$$P(25 < t < 30) = P(t < 30) - P(t < 25) = 0.9582 - 0.7939 = 0.1643$$
Conclusion: the probability that a middle-school student watches TV between 25 and 30 hours per week is 0.1643, i.e. 16.43% of the students.
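The table-based results for the TV-watching example (less than 25 hours, more than 30 hours, between 25 and 30 hours) can be reproduced in a few lines. This is a sketch, not the seminar's method: the helper table_phi is a hypothetical function that imitates a printed standard normal table by rounding z to two decimals and the probability to four; statistics.NormalDist is from the Python standard library.

```python
from statistics import NormalDist

tv = NormalDist(mu=20.5, sigma=5.5)   # hours of TV per week, from the example
std_normal = NormalDist()             # mean 0, standard deviation 1

def table_phi(x, dist):
    """Imitate a table lookup: standardize x, round z to 2 decimals, round Phi(z) to 4."""
    z = round((x - dist.mean) / dist.stdev, 2)
    return round(std_normal.cdf(z), 4)

p_less_25 = table_phi(25, tv)          # z = 0.82
p_less_30 = table_phi(30, tv)          # z = 1.73
p_more_30 = 1 - p_less_30              # complement rule
p_between = p_less_30 - p_less_25      # difference of two CDF values

print(p_less_25)            # 0.7939
print(p_less_30)            # 0.9582
print(round(p_more_30, 4))  # 0.0418
print(round(p_between, 4))  # 0.1643

# Without rounding z, the result differs slightly from the table value.
print(abs(tv.cdf(25) - p_less_25) < 1e-3)  # True
```

Rounding z to two decimals is what makes the hand calculation match the printed table; evaluating the distribution function directly gives a value that differs only in the later decimals.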
Solution d) (90th percentile of the time spent watching TV)
We have to find the time T for which the following equation holds:
$$P(t < T) = 0.9$$
Let us standardize T:
$$z_T = \frac{T - 20.5}{5.5}$$
From the table, the 90th percentile of the standard normal distribution is approximately 1.28. Substituting this value into the equation above:
$$1.28 = \frac{T - 20.5}{5.5} \quad\Rightarrow\quad T = 1.28 \cdot 5.5 + 20.5 = 27.54$$
Conclusion: the 90th percentile of the time spent watching TV is 27.54 hours.

Example 3.
In a population the hemoglobin (Hb) level of adult men follows a normal distribution with a mean of 155 g/l and a standard deviation of 20 g/l.
a) Find the probability that a randomly chosen man will have an Hb level between 138 and 152 g/l.
b) What is the Hb level that 15% of the population exceeds?
c) What is the probability that an individual will have an Hb level of exactly 170 g/l?
Solution
a) $$P(138 < x < 152) = P(x < 152) - P(x < 138) = P\left(z < \frac{152 - 155}{20}\right) - P\left(z < \frac{138 - 155}{20}\right) = P(z < -0.15) - P(z < -0.85)$$
$$P(z < -0.15) - P(z < -0.85) = 0.4404 - 0.1977 = 0.2427$$
b) We have to find the 85th percentile of the standard normal distribution, i.e. the value $X_z$ at which the cumulative distribution function of the standard normal distribution assumes a value of 0.85. This turns out to be approximately 1.04:
$$P(z < X_z) = 0.85 \quad\Rightarrow\quad X_z \approx 1.04$$
Substituting this value into the standardization formula:
$$1.04 = \frac{x - 155}{20} \quad\Rightarrow\quad x = 1.04 \cdot 20 + 155 = 175.8$$
Conclusion: 15% of the adult men have an Hb level greater than 175.8 g/l.
c) P(x = 170) = 0, since the Hb level is a continuous random variable.
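The hemoglobin example can be cross-checked without a table. The sketch below uses statistics.NormalDist from the Python standard library; the variable names are hypothetical. Because inv_cdf does not round z to two decimals, the percentile comes out slightly below the table-based 175.8 g/l.

```python
from statistics import NormalDist

hb = NormalDist(mu=155.0, sigma=20.0)  # Hb level of adult men in g/l, from the example

# a) P(138 < x < 152) as a difference of two CDF values.
p_between = hb.cdf(152) - hb.cdf(138)
print(round(p_between, 4))  # 0.2427

# b) The level exceeded by 15% of the population is the 85th percentile.
x_85 = hb.inv_cdf(0.85)
print(round(x_85, 1))  # 175.7 (the table value z = 1.04 gives 175.8)

# c) A single point has zero probability: the interval from 170 to 170 has no area.
p_exact = hb.cdf(170) - hb.cdf(170)
print(p_exact)  # 0.0
```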