Normal Distributions 2021 PDF

Uploaded by GoldChupacabra

Summary

This document explains normal distributions, including their characteristics and how to calculate z-scores, probabilities, and percentiles. It provides examples of applying these concepts to real-world scenarios, such as calculating time spent playing outdoors by primary school-age children.

Full Transcript

NORMAL DISTRIBUTIONS

Learning Outcomes:
- State the main characteristics of Normal distributions.
- Calculate and interpret z-scores.
- Use the Empirical Rules to estimate probabilities or proportions when it is appropriate.
- Calculate probabilities and percentiles of a given Normal distribution.

A data set is said to be Normally distributed if its theoretical density curve can be described by the following function (also known as the Normal curve, or bell curve):

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^{2} \right], \quad \text{for all real } x $$

The graph of the function is shown below:

[Figure: graph of the Normal curve f(x)]

Characteristics of the Normal curve:
- In theory it extends from −∞ to ∞. This means the graph continues without bound to the left and to the right.
- It asymptotically approaches 0 as x approaches −∞ or ∞. That is, as the graph extends without bound to the right and to the left, it gets closer and closer to the horizontal axis (height 0) without ever touching it.
- It has mean μ and standard deviation σ.
- It is completely determined by its mean μ and standard deviation σ. That is, once the mean and standard deviation are specified, there is only one Normal curve.
- It is symmetric about the mean μ. This means that the mean of the distribution is equal to the median: half of the distribution lies to the left of the mean, and half to the right.
- The total area under the curve is 1.

A Normal distribution with mean 0 and standard deviation 1 is called the Standard Normal distribution. While many real-life distributions can be approximated by the Normal distribution, none fits it exactly. (Why not?)
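As a quick numerical check of the density formula and of two of the listed characteristics (symmetry about μ and total area 1), here is a minimal Python sketch. It assumes Python 3.8+ for the standard library's `statistics.NormalDist`; the helper names `normal_pdf` and `area_under_curve` are hypothetical. Because the curve extends to ±∞, the area is only approximated, using a midpoint rule over μ ± 10σ, an assumed cutoff beyond which the remaining area is negligible.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu: float, sigma: float,
                     width_sds: float = 10.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the area over mu +/- width_sds*sigma.

    Assumed cutoff: +/-10 SDs; the area outside this window is negligible.
    """
    a = mu - width_sds * sigma
    h = 2 * width_sds * sigma / steps
    return sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps)) * h

mu, sigma = 1.70, 0.45  # Example 1 parameters, reused here for illustration

# The hand-coded formula agrees with the standard library's density
assert math.isclose(normal_pdf(2.0, mu, sigma), NormalDist(mu, sigma).pdf(2.0))
# Symmetry about the mean: f(mu - d) equals f(mu + d)
assert math.isclose(normal_pdf(mu - 0.3, mu, sigma), normal_pdf(mu + 0.3, mu, sigma))
# Total area under the curve should be (very close to) 1
print(round(area_under_curve(mu, sigma), 6))
```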
The Empirical Rules listed below can be used as a rough guide in calculating Normal distribution probabilities/proportions:
- Approximately 68% of the area under the Normal curve lies within 1 SD of the mean.
- Approximately 95% of the area under the Normal curve lies within 2 SDs of the mean.
- Approximately 99.7% (i.e., almost all) of the area under the Normal curve lies within 3 SDs of the mean.

These are often used to quickly approximate the proportion of data lying within 1, 2 or 3 SDs of the mean when the data are at least approximately Normally distributed.

Calculating Normal Probabilities/Proportions and Finding Intervals or Percentiles:

Example 1: Suppose that the time spent playing outdoors among primary-school-age children in a certain population is Normally distributed with mean 1.70 hr/day and standard deviation 0.45 hr/day.
a. What is the median of the distribution?
b. Fill in the blanks: Almost all primary-school-age children in this population spend between _____ and _____ hrs/day playing outdoors. (Use the Empirical Rule.)
c. Calculate the interquartile range (IQR) of the distribution, and interpret what the number means in the context of the problem.
d. Calculate the 80th percentile of the distribution, and interpret what the number means in the context of the problem.

Solution:
a. Since the Normal distribution is symmetric, the median is equal to the mean, which is 1.70 hrs/day. Note that the median is also called the 50th percentile (or the 2nd quartile, Q2). Fifty percent of the primary-school-age children in this population spend less than 1.70 hrs/day playing outdoors, and the other 50% spend more.
b. According to the Empirical Rule, almost all (about 99.73%) observations in a Normal distribution lie within 3 standard deviations of the mean. Since the time spent playing outdoors is Normal with mean μ = 1.70 and standard deviation σ = 0.45, nearly all primary-school-age children in this population spend between 1.70 − 3(0.45) = 0.35 and 1.70 + 3(0.45) = 3.05 hrs/day playing outdoors.
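The Empirical Rule percentages are rounded versions of exact Normal areas. The minimal Python sketch below, assuming Python 3.8+ for the standard library's `statistics.NormalDist`, computes those exact areas (about 0.6827, 0.9545 and 0.9973) and reproduces parts (a) and (b) of Example 1; the variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal: mean 0, standard deviation 1

# Exact area within k SDs of the mean, to compare with the Empirical Rule figures
for k in (1, 2, 3):
    area = Z.cdf(k) - Z.cdf(-k)
    print(f"within {k} SD: {area:.4f}")

# Example 1: outdoor play time ~ Normal(mean 1.70, SD 0.45) hr/day
play = NormalDist(mu=1.70, sigma=0.45)
print("median:", play.median)  # equals the mean, by symmetry

# Part (b): mean +/- 3 SDs covers almost all children
low = play.mean - 3 * play.stdev
high = play.mean + 3 * play.stdev
print(f"almost all between {low:.2f} and {high:.2f} hr/day")
print(f"exact share in that interval: {play.cdf(high) - play.cdf(low):.4f}")
```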
Note: the Empirical Rules should be used only if the distribution in question is (at least approximately) Normal.

c. To calculate the IQR we need Q1, the first quartile (the 25th percentile), and Q3, the third quartile (the 75th percentile). To find the first quartile, we first find the z-score such that 25% of the Standard Normal distribution lies to the left of it. From the Normal table we obtain z = −0.67 (see diagram). Note that the left-tail area for z = −0.67 is 0.2514, not exactly the desired 0.25, because the z table lists z-scores only to 2 decimal places. Since z = −0.67, Q1 lies 0.67 standard deviation below the mean. So Q1 = 1.70 − 0.67(0.45) = 1.40.
Similarly, we find the third quartile by first finding the z-score in the Normal table such that 75% of the distribution lies to the left of it. From the z table we get z = 0.67. Since z = 0.67, the 75th percentile of the distribution lies 0.67 standard deviation above the mean. So Q3 = 1.70 + 0.67(0.45) = 2.00.
Now we can calculate the IQR as Q3 − Q1 = 2.00 − 1.40 = 0.60. This means that the middle half of primary-school-age children in this population spend between 1.40 and 2.00 hrs/day playing outdoors, a range of 0.60 hrs/day.
d. To find the 80th percentile of the distribution, we first find the z-score corresponding to the 80th percentile of the Standard Normal (z) distribution. From the Normal table we get z = 0.84 (see diagram). Since z = 0.84, the 80th percentile lies 0.84 standard deviation above the mean, at 1.70 + 0.84(0.45) = 2.08. This means that about 80% of primary-school-age children in this population spend less than 2.08 hrs/day playing outdoors (and about 20% spend more than 2.08 hrs/day).
Alternatively, we can use the formula X = μ + zσ = 1.70 + 0.84(0.45) = 2.08.

Example 2: According to reports, starting salaries for college graduates in a certain city are Normally distributed with mean $61,000 and standard deviation $4,000.
a. What percentage of college graduates have starting salaries greater than $70,000?
b. If you select a college graduate at random, what is the probability that his/her starting salary is between $60,000 and $66,000?
c. How much must a graduate earn to be in the top 6%? That is, find the salary that is larger than 94% of the other salaries, i.e., the 94th percentile of the distribution.

Solution:
a. First, find the z-score, i.e., how many standard deviations $70,000 is above or below the mean of $61,000:

$$ z = \frac{70{,}000 - 61{,}000}{4{,}000} = 2.25 $$

Now use the Standard Normal probability table (z table) to find the area under the curve to the right of 2.25, since a probability or percentage is represented by area under the curve (see diagram). The table gives us 1 − 0.9878 = 0.0122. This is our answer: about 1.22% of college graduates have starting salaries greater than $70,000.
b. First, let's find the z-scores corresponding to $60,000 and $66,000:

$$ z = \frac{60{,}000 - 61{,}000}{4{,}000} = -0.25 \quad \text{and} \quad z = \frac{66{,}000 - 61{,}000}{4{,}000} = 1.25 $$

Now we can think of the question as follows: what is the probability that a randomly selected observation from a Normally distributed population will lie between 0.25 standard deviation below and 1.25 standard deviations above the mean? From the Normal probability table we find the desired area to be 0.8944 − 0.4013 = 0.4931 (see diagram). So about 49% of new graduates have starting salaries between $60,000 and $66,000.
c. Here we want to find a reverse Normal probability, or percentile: we are given the probability/percentage (6%), and we want the z-score corresponding to it, that is, a z-score such that the area under the Normal curve to the left of it is 0.94. From the Normal probability table, the z-score corresponding to the desired probability is 1.55. That is, a graduate must earn at least 1.55 standard deviations above the mean salary, or 61,000 + 1.55(4,000) = $67,200. So the 94th percentile of the salary distribution is $67,200. Note that the right-tail area we obtain from the table is 1 − 0.9394 = 0.0606, not exactly 6%.
This is because the z table lists z-scores only to 2 decimal places; the exact z-score lies somewhere between 1.55 and 1.56.

[STANDARD NORMAL TABLE: cumulative area to the left of NEGATIVE z-values]

[STANDARD NORMAL TABLE: cumulative area to the left of POSITIVE z-values]
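The table lookups in Examples 1 and 2 can be cross-checked in software. This minimal Python sketch is not part of the original notes; it assumes Python 3.8+ for `statistics.NormalDist`, whose `cdf` method gives the cumulative area shown in the tables and whose `inv_cdf` method plays the role of a reverse table lookup. The helper `table_z` is hypothetical: it simply rounds z to 2 decimal places to mimic the printed z table.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal, used like the z table

def table_z(p: float) -> float:
    """Hypothetical helper: z with left-tail area p, rounded to 2 decimals like a z table."""
    return round(Z.inv_cdf(p), 2)

# Example 1 (c), (d): outdoor play time ~ Normal(1.70, 0.45) hr/day, using X = mu + z*sigma
mu, sigma = 1.70, 0.45
q1 = mu + table_z(0.25) * sigma
q3 = mu + table_z(0.75) * sigma
p80 = mu + table_z(0.80) * sigma
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {q3 - q1:.2f}, P80 = {p80:.2f}")

# Example 2: starting salary ~ Normal(61,000, 4,000) dollars
salary = NormalDist(61_000, 4_000)
print(f"(a) P(X > 70,000) = {1 - salary.cdf(70_000):.4f}")
print(f"(b) P(60,000 < X < 66,000) = {salary.cdf(66_000) - salary.cdf(60_000):.4f}")

# (c) Reverse lookup for the 94th percentile, first with the rounded table z
z94 = table_z(0.94)
print(f"(c) table z = {z94}, salary = {61_000 + z94 * 4_000:,.0f}")
print(f"    right tail at table z: {1 - Z.cdf(z94):.4f}")
# Without rounding z, the exact 94th percentile is slightly higher
print(f"    exact z = {Z.inv_cdf(0.94):.4f}, salary = {salary.inv_cdf(0.94):,.0f}")
```

With z rounded to 2 decimals, the printed values should agree with the hand calculations above. The last lines show that the right-tail area at z = 1.55 is about 0.0606 and that the unrounded 94th percentile lies slightly above $67,200, since the exact z falls between 1.55 and 1.56.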
