Summary

These lecture notes cover biostatistics topics focusing on descriptive statistics, including measures of central tendency (mode, median, mean) and measures of variation (range, variance, standard deviation, coefficient of variation).

Full Transcript

SDCE 303: INTRODUCTION TO BIOSTATISTICS
LECTURE NOTES 4
Descriptive statistics: Measures of Central Tendency & Dispersion
JOETABS

Objectives
1. To explain the common averages:
   The Mode
   The Median
   The Mean
2. To explain common measures of variation:
   Range
   Variance
   Standard deviation
   Coefficient of variation

Measures of Central tendency
❑ A measure of central tendency is a single value that summarises a set of data by locating the centre of the values, i.e. it indicates where the middle of the data is.
The three most commonly used measures of central tendency are:
i. Mode
ii. Median
iii. Mean

Mode
The mode is the value of the observation that appears most frequently.
Data may have two modes. In this case we say the data are bimodal.
Data with more than two modes are referred to as multimodal.
Some data have no mode at all: if all values occur equally often (for example, if all values are different), there is no mode.
❑ Properties of the mode:
The mode is not unique.

❑ Mode from ungrouped data
Ex 1. Find the mode of the following values: 8, 9, 14, 9, 8, 7, 8, 10, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11
Soln: It is always good to organize data in an array:
6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14
Since 8 occurs five times, a frequency larger than that of any other number, the mode for the data set is 8.
❑ Bimodal
Ex 2. 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14, 14, 14
Ans. Mode = 8 and 14
❑ No mode
Ex 3. 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 14, 14

❑ Mode from a simple frequency distribution
For discrete distributions, the mode is the value with the greatest frequency.
Ex. Table of marks of 18 students in a class
   Score             30   24   12   18   22   20   14   15
   No. of students    1    4    4    3    1    2    2    1
Soln: Mode = 24 and 12 (bimodal)

❑ Mode from the Histogram
Draw the histogram from the frequency distribution.
Identify the modal class (the tallest bar) and read the mode from the x-axis below the point where the lines drawn across the modal bar intersect, as shown in the histogram.
Where the modal class is the first or last group, take the mean of its lower and upper boundaries.

❑ Using a formula to determine the mode for grouped data
Mode = L + [D1/(D1 + D2)] x C
1. Determine the modal class (the class with the largest frequency)
2. L = lower boundary of the modal class
3. Calculate D1 = difference between the largest frequency and the frequency immediately before it
4. Calculate D2 = difference between the largest frequency and the frequency immediately after it
5. C = modal class width (from class boundaries)

Example
   Age      No
   20-25     2
   25-30    14
   30-35    29
   35-40    43
   40-45    33
   45-50     9
Modal class = 35-40
L = 35
D1 = 43 - 29 = 14
D2 = 43 - 33 = 10
C = 5
Mode = 35 + [14/(14 + 10)] x 5 = 37.92
Mode ≈ 38

Median
❑ The median is the middle observation of the values after they have been ordered from the smallest to the largest, or from the largest to the smallest. It is the value that divides the set of observations into two equal parts, such that half of the data are before it and the other half are after it.
❑ Determining the median of a set of values
Re-arrange the numbers in ascending order.
If there is an even number of data in the array, the median is the average of the two middle numbers.
If there is an odd number of data in the array, the median is the middle number, i.e. the (n+1)/2th item.

Example: suppose you want to find the median for the following sets of data:
a. 21, 23, 26, 68, 69, 70, 73, 23, 24
b. 74, 66, 69, 68, 73, 70
Ex 1. First, we arrange the data in an ordered array: 21, 23, 23, 24, 26, 68, 69, 70, 73
Since n = 9 is an odd number, the median is the (9+1)/2th value = 5th value.
Count to the fifth value. The median = 26
Ex 2. Ordered array: 66, 68, 69, 70, 73, 74
Since there is an even number of data, the median is the average of the middle two numbers, 69 and 70: (69 + 70)/2 = 139/2 = 69.5
Median = 69.5

❑ Median from frequency table
If n is odd, then the median is the value that corresponds to the (∑f+1)/2th item, i.e. the total frequency plus 1, all divided by 2.
Ex 1. Find the median age of the data.
   Age    Frequency (f)    CF
   15          3            3
   16          6            9
   17          7           16
   18         10           26
   19          2           28
   20          1           29
Solution
The total frequency (∑f) is odd = 29
The median is the ½(∑f+1)th item = (29+1)/2 = 15th observation or item
Add the frequencies, starting from 3, until the cumulative frequency reaches the 15th item
The 15th item is 17

❑ Median from frequency table …
If n is even, then the median is the mean of the two middle terms, i.e. the (∑f)/2th item plus the next item, all divided by 2.
Ex 2.
   Marks    Freq (f)    CF
   0           2         2
   1           6         8
   2           4        12
   3           4        16
   4           6        22
   5           2        24
   6           2        26
   ∑f = 26
Solution
▪ The total frequency is even = 26
▪ Half the total frequency is 26/2 = 13
▪ The median lies between the 13th and 14th items or observations
▪ The 13th and 14th observations are 3 and 3
▪ Therefore the median score is (3+3)/2 = 6/2 = 3

The median from cumulative curve
The cumulative curve is used to estimate the median value from grouped data.
From the cumulative curve, the median is the value on the x-axis corresponding to half the total frequency, i.e. (∑f)/2.
Draw the cumulative curve using either the absolute cumulative or the % cumulative values.
Using the absolute values, the median is found by tracing half the total frequency across to the curve and down to the x-axis.
Using the % values, the median is found by tracing the 50% mark across to the curve and down to the x-axis.
[Figure: cumulative curve. y-axis: cumulative frequency (% or absolute values); x-axis: variable; the 50% mark is traced to the median on the x-axis.]

❑ The properties of the median
Uniqueness. For a given set of data there is one and only one median.
It is not affected by extremely large or small values.
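The mode and median rules above can be checked with a short script. This is only a sketch in Python: the function names (`modes`, `grouped_mode`, `median_from_table`) are illustrative choices, not part of the lecture notes, and the "no mode" convention (every distinct value equally frequent) is an assumption taken from Ex 3.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] for 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # Assumption: if all distinct values are equally frequent, report no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners


def grouped_mode(lower, width, f_before, f_modal, f_after):
    """Mode = L + D1 / (D1 + D2) x C for the modal class of grouped data."""
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    return lower + d1 / (d1 + d2) * width


def median_from_table(table):
    """Median of a frequency table given as (value, frequency) pairs."""
    rows = sorted(table)
    n = sum(f for _, f in rows)

    def item(k):
        # Walk the cumulative frequencies until the k-th item is reached.
        cumulative = 0
        for value, f in rows:
            cumulative += f
            if k <= cumulative:
                return value
        raise ValueError("position outside the table")

    if n % 2 == 1:
        return item((n + 1) // 2)
    return (item(n // 2) + item(n // 2 + 1)) / 2


ages = [(15, 3), (16, 6), (17, 7), (18, 10), (19, 2), (20, 1)]
marks = [(0, 2), (1, 6), (2, 4), (3, 4), (4, 6), (5, 2), (6, 2)]

print(modes([8, 9, 14, 9, 8, 7, 8, 10, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]))
print(round(grouped_mode(35, 5, 29, 43, 33), 2))
print(median_from_table(ages), median_from_table(marks))
```

The grouped-mode call uses the modal class 35-40 from the age example (L = 35, C = 5, neighbouring frequencies 29 and 33).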
Generally, the median provides a better measure of location than the mean when there are some extremely large or small observations.

Mean
❑ The mean is the sum of all the values divided by the total number of values in a set of data.
Mean = (the sum of all the values) / (the number of values)
Population mean:
$$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N}$$
The population mean is usually unknown, so we use the sample mean to estimate or approximate it.
Any measurable characteristic of a population is called a parameter. The mean of a population is a parameter.

❑ Sample mean
If the population is large and a sample is taken, then we use the sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where n is the number of items in the sample.
Any measure based on sample data is called a statistic. The mean of the sample is a statistic. A statistic is a characteristic of the sample.
Ex. The Hb levels of 6 pregnant women are: 12, 12.5, 11, 13, 12.5, 8
Mean = (12 + 12.5 + 11 + 13 + 12.5 + 8)/6 = 69/6
Mean = 11.5 g

Mean of a frequency distribution
❑ Mean of ungrouped data
Mean = ∑fx / ∑f, where ∑ is sum, f is frequency and x represents the individual values
❑ Mean for grouped data
Mean = ∑fx / ∑f, where x is the class mid-point value
For grouped data, we cannot know the actual values, so we only estimate the mean.
Take the mid-point (mid-value) of each interval.

Ex. Determine the mean age
   Ages     f     Class mid-point (x)    fx
   10-19    11         14.5             159.5
   20-29    10         24.5             245
   30-39     5         34.5             172.5
   40-49     2         44.5              89
   50-59     1         54.5              54.5
   60-69     1         64.5              64.5
   ∑f = 30                         ∑fx = 785
Solution
Mean = ∑fx / ∑f = 785/30 = 26.17 ≈ 26

FORMS OF THE MEAN
A. Arithmetic Mean
B. Geometric Mean
C. Harmonic Mean
D. Trimmed Mean

Arithmetic Mean
The arithmetic mean is the most commonly used average. It is generally referred to as the average or simply the mean. It is defined as the value obtained by dividing the sum of the values by their number.
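The two mean calculations in this part of the notes (the Hb levels and the grouped ages) can be reproduced with a few lines of Python. This is a minimal sketch: `grouped_mean` is a hypothetical helper name, and it assumes, as the notes do, that every value in a class sits at the class mid-point.

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def grouped_mean(classes):
    """Estimate the mean of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Each class is represented by its mid-point x = (lower + upper) / 2,
    so the result is sum(f * x) / sum(f).
    """
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f


hb_levels = [12, 12.5, 11, 13, 12.5, 8]
age_classes = [(10, 19, 11), (20, 29, 10), (30, 39, 5),
               (40, 49, 2), (50, 59, 1), (60, 69, 1)]

print(mean(hb_levels))                      # 11.5
print(round(grouped_mean(age_classes), 2))  # 26.17
```

For the age table the fx products sum to 785, so the estimate is 785/30, about 26 years.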
The arithmetic mean is denoted $\bar{X}$ (read as X-bar). Therefore, the mean of the values X1, X2, X3, ..., Xn is denoted by $\bar{X}$. The formula for the arithmetic mean, or simply the mean, is:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X}{n}$$

For Example:
1. The arithmetic mean of the values 5, 8, 10, 12 and 17 is
   $\bar{X}$ = (5 + 8 + 10 + 12 + 17)/5 = 52/5 = 10.4
2. The mean wage of 5 employees is ¢1000. If the wages of four employees are ¢800, ¢1200, ¢1300 and ¢900, find the wage of the fifth employee.
   Here, n = 5 and $\bar{X}$ = 1000
   Sum of wages of 5 employees: ∑X = n$\bar{X}$ = 5(1000) = ¢5000
   Sum of wages of 4 employees = 800 + 1200 + 1300 + 900 = ¢4200
   Wage of the fifth employee = 5000 – 4200 = ¢800

Arithmetic Mean for Grouped Data
The formula above is used when the number of values is small. If the number of values is large, they are grouped into a frequency distribution. When the data are arranged in a frequency distribution, all the values falling in a class are assumed to be equal to the class mark or midpoint.
If X1, X2, ..., Xk are the class marks with f1, f2, ..., fk as the corresponding class frequencies, the sum of the values in the first class is f1X1, in the second class f2X2, and so on; the sum of the values in the kth class is fkXk. Hence, the sum of the values in all the k classes is
f1X1 + f2X2 + ... + fkXk = ∑fX
The total number of values is the sum of the class frequencies:
f1 + f2 + ... + fk = ∑f
The mean is therefore
$$\bar{X} = \frac{\sum fX}{\sum f}$$

Find the mean weight of 120 students at a university from the following frequency distribution.
Table 11
   Weight (lb)    Class Mark (X)    Frequency (f)    fX
   110 – 119          114.5               1           114.5
   120 – 129          124.5               4           498
   130 – 139          134.5              17          2286.5
   140 – 149          144.5              28          4046
   150 – 159          154.5              25          3862.5
   160 – 169          164.5              18          2961
   170 – 179          174.5              13          2268.5
   180 – 189          184.5               6          1107
   190 – 199          194.5               5           972.5
   200 – 209          204.5               2           409
   210 – 219          214.5               1           214.5
   n = ∑f = 120                               ∑fX = 18740
Mean = ∑fX / ∑f = 18740/120 = 156.17 lb

Weighted Arithmetic Mean
When the values are not of equal importance, we assign them certain numerical values to express their relative importance. These numerical values are called weights. If X1, X2, ..., Xk have weights W1, W2, ..., Wk, then the weighted arithmetic mean, or weighted mean, denoted $\bar{X}_w$, is calculated by the following formula:
$$\bar{X}_w = \frac{W_1 X_1 + W_2 X_2 + \dots + W_k X_k}{W_1 + W_2 + \dots + W_k} = \frac{\sum WX}{\sum W}$$

For Example: The marks obtained by a student in English, Urdu and Statistics were 70, 76 and 82 respectively. Find the appropriate average if weights of 5, 4 and 3 are assigned to these subjects.
We use the weighted mean, the weights attached to the marks being 5, 4 and 3. Thus
$$\bar{X}_w = \frac{5(70) + 4(76) + 3(82)}{5 + 4 + 3} = \frac{900}{12} = 75$$

For a population with N data values, the mean is $\mu = \frac{\sum x}{N}$
For a sample with n data values, the mean is $\bar{x} = \frac{\sum x}{n}$
In general, the mean is what we typically think of as the average.

   Mark obtained (x)    Frequency (f)    fx
    0                       10             0
    1                       20            20
    2                       40            80
    3                       50           150
    4                       30           120
    5                       30           150
    6                       20           120
    7                       20           140
    8                       10            80
    9                       10            90
   10                       10           100
   n = 250                          ∑fx = 1050
Solution
Mean = ∑fx / n = 1050/250 = 4.2
Median: n = 250 is even, so the median is the mean of the 125th and 126th items. The cumulative frequencies are 10, 30, 70, 120, 150, ..., so both items fall on the mark 4. Median = 4
Mode = 3

FIND THE MEAN, MEDIAN AND MODE
   No. of children (x)    No. of families (f)    Total no. of children (fx)
   0                          12                      0
   1                          15                     15
   2                           5                     10
   3                           2                      6
   4                           1                      4
   n = 35                                      ∑fx = 35

Usage
The arithmetic mean: The arithmetic mean is best used when the sum of the values is significant.
For example, consider your grade in your statistics class. If you were to get 85 on the first test, 95 on the second test and 90 on the third test, your average grade would be 90.

PROPERTIES OF THE MEAN
It is influenced by every score. If a score is changed, the mean value changes.
E.g. 3, 4, 2, 4, 7: mean = 4
     3, 4, 7, 4, 7: mean = 5
The mean is a function of the sum (aggregate) of the scores. This means that the number of observations multiplied by the mean gives the sum of the scores.
E.g. 3, 4, 2, 4, 7: mean = 4, so the sum is 4(5) = 20
     3, 4, 7, 4, 7: mean = 5, so the sum is 5(5) = 25
If the mean is subtracted from each individual score and the differences are summed, the result is zero (0).
E.g. 4, 2, 3, 6, 5: mean = 4
     4-4 = 0, 2-4 = -2, 3-4 = -1, 6-4 = 2, 5-4 = 1
     0 - 2 - 1 + 2 + 1 = 0
If the same value is added to or subtracted from every number in a set of scores, the mean goes up or down by that value.
EXERCISE E.g. 4, 2, 3, 6, 5: mean = 4. Add 2 to all the numbers and find the mean.
If each score is multiplied or divided by the same value, the mean is multiplied or divided by that value.
E.g. 4, 2, 3, 6, 5: mean = 4. Multiply each score by 2: mean = 8

USES OF THE MEAN
Useful for statistical work, e.g. correlation, ANOVA etc.
Useful for normally distributed data.
It is useful when an average is needed.
It provides a direction of performance when compared with other measures of location.
When the mean > median, the distribution is skewed to the right (positive skewness); when the mean < median, the distribution is skewed to the left (negative skewness).
It serves as a standard of performance with which individual scores can be compared.

Relations Between the Measures of Central Tendency
In symmetrical distributions, the median and mean are equal.
For normal distributions, mean = median = mode.
In positively skewed distributions, the mean is greater than the median.
In negatively skewed distributions, the mean is smaller than the median.

Measures of Shape: Symmetric or skewed
   Left-Skewed:    Mean < Median < Mode
   Symmetric:      Mean = Median = Mode
   Right-Skewed:   Mode < Median < Mean

Example: CLASS WORK (10 MARKS)
Ages of a random sample of 40 pro football players:
21, 21, 22, 22, 22, 22, 23, 23, 23, 23
24, 24, 24, 25, 25, 25, 25, 25, 25, 25
26, 26, 26, 26, 27, 27, 28, 28, 28, 29
29, 29, 29, 29, 30, 31, 31, 32, 33, 37
Note that the sum of the data is 1050. Compute the a. mean, b. median, c. mode of this data. PLEASE SHOW YOUR WORK!!
Solution:
The mean is 1050/40 = 26.25
The median is in the 20.5th position, i.e. midway between the 20th value (25) and the 21st value (26), and so the median is 25.5.
The mode is the most common value, which is 25 (it occurs 7 times in the list).

For the distribution of marks of 250 students given earlier (n = 250, ∑fx = 1050):
Mean = 4.2
Median = 4
Mode = 3

DISTRIBUTION OF DATA
LEARNING OBJECTIVES
At the end of this lesson, the student should be able to describe the general shape of a distribution in terms of its number of modes, skewness and variation.

Number of Modes
One way to describe the shape of a distribution is by its number of peaks, or modes.
Uniform distribution: has no mode because all data values have the same frequency.
Any peak is considered a mode, even if all peaks do not have the same height.
A distribution with a single peak is called a single-peaked, or unimodal, distribution.
A distribution with two peaks, even if they are not the same size, is a bimodal distribution.
What is the following distribution?

Symmetry or Skewness
A distribution is symmetric if its left half is a mirror image of its right half.
A symmetric distribution with a single peak and a bell shape is known as a normal distribution.
❑ A distribution is left-skewed (or negatively skewed) if the values are more spread out on the left, meaning that some low values are likely to be outliers.
❑ A distribution is right-skewed (or positively skewed) if the values are more spread out on the right. It has a tail pulled toward the right.

What is the relationship between mean, median and mode for a normal distribution?
Find the mean, median and mode of: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7
The mean is 4. The median is 4. The mode is 4.

What is the relationship between mean, median and mode of a left-skewed distribution?
Find the mean, median and mode of: 0, 5, 10, 20, 40, 45, 45, 50, 50, 50, 60, 60, 60, 60, 60, 60, 70, 70, 70, 70, 70, 70, 70, 70
The mean is 51.5. The median is 60. The mode is 70.

What is the relationship between mean, median and mode of a right-skewed distribution?
Find the mean, median and mode of: 20, 20, 20, 20, 20, 20, 20, 20, 30, 30, 30, 30, 30, 30, 45, 45, 45, 50, 50, 60, 70, 90
The mean is 36.1. The median is 30. The mode is 20.

For each of the following situations, state whether you expect the distribution to be symmetric, left-skewed or right-skewed.
House prices in the United States.
Weight in a sample of 30-year-old men.
The heights of all players in the NBA.

Which is a better measure of "average" (or of the center of the distribution) for a skewed distribution: the median or the mean? Why?
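The three data sets above can be summarised programmatically. The sketch below uses Python's standard `statistics` module; the `shape` function and its labels are illustrative, and comparing the mean with the median is only the rule of thumb stated in these notes, not a formal test of skewness.

```python
from statistics import mean, median, multimode


def shape(values, tol=1e-9):
    """Return (mean rounded to 1 d.p., median, modes, shape label)."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tol:
        label = "symmetric"
    elif m > md:
        label = "right-skewed"   # mean pulled toward the long right tail
    else:
        label = "left-skewed"    # mean pulled toward the long left tail
    return round(m, 1), md, multimode(values), label


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]
left = [0, 5, 10, 20, 40, 45, 45, 50, 50, 50] + [60] * 6 + [70] * 8
right = [20] * 8 + [30] * 6 + [45] * 3 + [50, 50, 60, 70, 90]

for data in (symmetric, left, right):
    print(shape(data))
```

In the left-skewed set the mean (51.5) falls below the median (60), and in the right-skewed set the mean (36.1) lies above the median (30), which matches the relations listed in the notes.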
Copyright © 2009 Pearson Education, Inc.

Variation
Variation describes how widely data are spread out about the center of a data set.
❑ How would you expect the variation to differ between times in a 5K city run and a 5K run in a state meet?

To summarize: the general shape of a distribution can be discussed using:
1. The number of modes
2. Symmetry or skewness
3. Variation

Dispersion ….
❑ Note:
1. If all the values are the same → there is no dispersion.
2. If the values are different → there is dispersion:
   a) If the values are close to each other → the amount of dispersion is small.
   b) If the values are widely scattered → the dispersion is greater.

Measures of Dispersion ….
❑ Measures of dispersion include:
1. Range
2. Variance
3. Standard deviation
4. Coefficient of variation (C.V)
5. Z-Deviation (Normal Distribution)

❑ The range
This is the difference between the largest and smallest values in a set of data.
Range = Largest value - Smallest value
The range of a grouped frequency distribution is the upper class boundary of the highest interval minus the lower class boundary of the lowest interval.
Ex. Ages of mothers at two ANC centres
Hospital A: 18, 20, 21, 21, 23, 23. Range is (23 - 18) = 5
Hospital B: 17, 17, 18, 20, 25, 29. Range is (29 - 17) = 12
There is more spread in the ages of mothers at Hospital B than at Hospital A.
The range is affected by extreme values, e.g. 20, 20, 21, 22, 60. The 60-year-old mother has distorted the spread.

❑ The Variance
The variance measures dispersion relative to the scatter of the values about the mean.
The standard deviation and variance are measures of how much each individual observation (x) in the data set deviates from the mean.
The sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the sample mean.

❑ Population variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where $\mu$ is the population mean.

The Standard Deviation
The standard deviation is the square root of the variance: SD = √Variance
a) Sample standard deviation: $S = \sqrt{S^2}$
b) Population standard deviation: $\sigma = \sqrt{\sigma^2}$
❑ The larger the standard deviation of a sample, the more variable are the observations. Therefore:
If the standard deviation is large, the mean is a poor representation of the data set.
If the SD is small, then the mean is more representative of all the observations.
The standard deviation is easier to interpret than the variance because it is expressed in the same unit of measurement as the observations.

❑ Steps for computing the variance and standard deviation
1. Calculate the mean
2. Calculate (x - mean) for each observation
3. Square each (x - mean)
4. Find the sum of the results from step 3
5. Divide the sum of squared deviations in step 4 by (n - 1) to obtain the variance
6. Find the square root of the variance in step 5 to obtain the standard deviation

Ex. Find the variance and standard deviation of the following set: 2, 3, 5, 6, 8
Solution: Mean = ∑x/n = 24/5 = 4.8
   x     x - mean    (x - mean)²
   2       -2.8         7.84
   3       -1.8         3.24
   5        0.2         0.04
   6        1.2         1.44
   8        3.2        10.24
                 Sum = 22.80
s² = ∑(x - mean)² / (n - 1) = 22.80/4 = 5.7
SD = √s² = √5.7 = 2.39
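The six steps for the variance and standard deviation translate directly into code. The following is a minimal Python sketch with illustrative function names. Note that 22.80/5 = 4.56 is the population variance of this data set, while the sample variance is 22.80/4 = 5.7, so the sample standard deviation is √5.7 ≈ 2.39.

```python
from math import sqrt


def squared_deviations(values):
    """Steps 1-4: mean, deviations, squares, and their sum."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def sample_variance(values):
    """Step 5: divide the sum of squared deviations by n - 1."""
    return squared_deviations(values) / (len(values) - 1)


def population_variance(values):
    """Population version: divide by N instead of n - 1."""
    return squared_deviations(values) / len(values)


data = [2, 3, 5, 6, 8]
s2 = sample_variance(data)
print(round(s2, 2))                         # sample variance
print(round(sqrt(s2), 2))                   # step 6: sample standard deviation
print(round(population_variance(data), 2))  # population variance, for comparison
```

Dividing by n - 1 instead of n makes the sample variance slightly larger, and the difference shrinks as the sample size grows.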
Variance and standard deviation for a frequency distribution
   Parity       1   2   3   4   5
   Frequency    3   4   8   2   3
Solution
   x     f     fx    x - mean    (x - mean)²    f(x - mean)²
   1     3      3      -1.9         3.61           10.83
   2     4      8      -0.9         0.81            3.24
   3     8     24       0.1         0.01            0.08
   4     2      8       1.1         1.21            2.42
   5     3     15       2.1         4.41           13.23
   ∑f = 20   ∑fx = 58                    ∑f(x - mean)² = 29.80
Mean = ∑fx / ∑f = 58/20 = 2.9
S² = ∑f(x - mean)² / (∑f - 1) = 29.80/19 = 1.57
SD = √(∑f(x - mean)² / (∑f - 1)) = √(29.80/19) = √1.57 = 1.25

The Coefficient of Variation (C.V)
❑ The standard deviation is useful as a measure of variation within a given set of data.
❑ The C.V is a measure we use to compare the dispersion in two sets of data; it is independent of the unit of measurement.
$$C.V = \frac{S}{\bar{X}} (100)$$
where $\bar{X}$ is the sample mean and S is the sample standard deviation.

Ex. Suppose two samples of human males yield the following data:
                         Sample 1         Sample 2
   Age                   25-year-olds     11-year-olds
   Mean weight           145 pounds       80 pounds
   Standard deviation    10 pounds        10 pounds
We wish to know which is more variable: the weight of the 25-year-olds or of the 11-year-olds?
Solution:
C.V (Sample 1) = (10/145) x 100 = 6.9 %
C.V (Sample 2) = (10/80) x 100 = 12.5 %
✓ The weight of the 11-year-olds (Sample 2) is more variable.

Normal Distribution
Normal Distribution: a very important statistical data distribution pattern occurring in many natural phenomena, such as height, blood pressure, grades, IQ, baby birth weights, etc.
Normal Curve: when the normal distribution is graphed as a histogram, it creates a bell-shaped curve known as a normal curve.
It is based on probability! You'll see!

FINDING Z-SCORES & NORMAL DISTRIBUTION
The z Distribution
Normal Distribution Curve: What is this curve all about?
The curve is bell-shaped.
The graph falls off evenly on either side of the mean (it is symmetrical).
50% of the distribution lies to the left of the mean (below) and 50% lies to the right of the mean (above).
The spread of the normal distribution is controlled by the standard deviation.
The mean and the median are the same in a normal distribution.
(The mode is the same as well.)

Features of Standard Normal Curve
The mean is the center.
68% of the area is within one S.D. of the mean.
95% of the area is within two S.D.s.
99.7% of the area is within three S.D.s.
As each tail extends away from the mean, the curve approaches zero height but never reaches it at either end.
For each of these problems we will need pull-out Table IV in the back of the text.

What is a Z-Score?
Z-scores give us a method of converting, proportionally, a study sample to the whole population.
A z-score is the exact number of standard deviations that the given value lies away from the mean of a NORMAL CURVE.
The table always gives the area to the left of the z-score!

Z-SCORES & LOCATION IN A DISTRIBUTION
Standardization means putting scores on a test into a form that you can use to compare across tests. These scores become known as "standardized" scores.
The purpose of z-scores, or standard scores, is to identify and describe the exact location of every score in a distribution.
A z-score is the number of standard deviations a particular score is from the mean. (This is exactly what we've been doing for the last however many minutes!)

z-SCORES
The sign tells whether the score is located above (+) or below (-) the mean.
The number (magnitude) tells the distance between the score and the mean in terms of the number of standard deviations.

USING Z-SCORES TO STANDARDIZE A DISTRIBUTION
The shape doesn't change (think of it as re-labeling).
The mean is always 0.
The SD is always 1.
Why is the fact that the mean is 0 and the SD is 1 useful?
A standardized distribution is composed of scores that have been transformed to create predetermined values for μ and σ.
Standardized distributions are used to make dissimilar distributions comparable.

z-score formula
$$z = \frac{x - \mu}{\sigma}$$
where x represents an element of the data set, the mean is represented by $\mu$ and the standard deviation by $\sigma$.

Analyzing the data
Suppose exam scores among college students are normally distributed with a mean of 500 and a standard deviation of 100.
If a student scores 700, what would be her z-score?
$$z = \frac{700 - 500}{100} = 2$$
Her z-score would be 2, which means her score is two standard deviations above the mean.

Analyzing the data
A set of math test scores has a mean of 70 and a standard deviation of 8. A set of English test scores has a mean of 74 and a standard deviation of 16. For which test would a score of 78 have a higher standing?
To solve: find the z-score for each test.
math z-score = (78 - 70)/8 = 1
English z-score = (78 - 74)/16 = 0.25
The math score would have the higher standing, since it is 1 standard deviation above the mean while the English score is only 0.25 standard deviation above the mean.

Analyzing the data
What will be the miles per gallon for a Toyota Camry when the average mpg is 23, it has a z value of 1.5 and a standard deviation of 2?
Using the formula for z-scores, $z = \frac{x - \mu}{\sigma}$:
1.5 = (x - 23)/2
3 = x - 23
x = 26
The Toyota Camry would be expected to get 26 mpg.

Analyzing the data
A group of data with normal distribution has a mean of 45. If one element of the data is 60, will the z-score be positive or negative?

Practice examples:
For each of the following examples, look for the words "normally distributed" in a question before using Table IV to solve it.
Don't forget: Table IV always gives the area to the left of the z-score!

1. Find probabilities from zi
1.1 Using published tables
Most stats books have z-score tables which allow you to find Prob(z < zi).
Or sometimes they list Prob(0 < z < zi).
e.g. Prob(z > 1.36)
1. Draw the curve
2. Work out what value in the tables will help you.
3. Compute the desired probability by manipulating the value from the tables.
e.g. Prob(z < 1.36)
1. Draw the curve
2. Work out what value in the tables will help you.
3. Compute the desired probability by manipulating the value from the tables.

Table A: Standard Normal Probabilities
Each entry in the body of the table is the area under the standard normal curve to the left of z.
   z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
  -3.4   .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
  -3.3   .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
  -3.2   .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
  -3.1   .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
  -3.0   .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
  -2.9   .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
  -2.8   .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
  -2.7   .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
  -2.6   .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
  -2.5   .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
  -2.4   .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
  -2.3   .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
  -2.2   .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
  -2.1   .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
  -2.0   .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
  -1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
  -1.8   .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
  -1.7   .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
  -1.6   .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
  -1.5   .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
  -1.4   .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
  -1.3   .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
  -1.2   .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
  -1.1   .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
  -1.0   .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
  -0.9   .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
  -0.8   .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
  -0.7   .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
  -0.6   .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
  -0.5   .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
  -0.4   .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
  -0.3   .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
  -0.2   .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
  -0.1   .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
  -0.0   .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641

pz_gt_zi calculates the probability that z is greater than zi, e.g. pz_gt_zi(-2.897) will result in the following output:
   Prob(z > zi) for a given zi
   ZI          PROB
   -2.89700    .99812
which says that 99.812% of z lie above –2.897.

TRY (CLASS EXERCISE)
Ex (7) – This week gas prices followed a normal distribution curve and averaged $3.71 per gallon, with a standard deviation of 3 cents.
(a) What percent of stations charge at least $3.77?
(b) What percent will charge less than $3.20?
(c) What percent will charge between $3.30 and $4.9 per gallon?
(d) If I sampled 30 gas stations, how many would charge between $4.9 and $3.65 per gallon?

Finding Probabilities
The shaded area under the curve is equal to the probability of the specific event occurring.
Ex (4) – A shoe manufacturer collected data regarding men's shoe sizes and found that the distribution of sizes exactly fits a normal curve. The mean shoe size is 11 and the standard deviation is 1.5.
(a) What is the probability of randomly selecting a man with a shoe size smaller than 9.5?
(b) If I surveyed 40 men, how many would be expected to wear a size smaller than 9.5?
How did we get that answer:
$$Z = \frac{x_i - \mu}{\sigma} = \frac{9.5 - 11}{1.5} = -1$$
This is how many SDs the value is from the mean.
-1.00 is a z-score (number of S.D.s from the mean) that refers to the area to the left of that position.
Find it in Table IV: the entry for z = -1.00 is .1587.
We want the area to the left of that position on the curve, so this is the answer. Table IV gives us the area to the left of the z-score.
(a) Probability = .1587
(b) .1587(40) = 6.3 ≈ 6 men

Ex (5) – Gas mileage of vehicles follows a normal curve. A Ford Escape claims to get 25 mpg highway, with a standard deviation of 1.6 mpg. A Ford Escape is selected at random.
(a) What is the probability that it will get more than 28 mpg?
(b) If I sampled 250 Ford Escapes, how many would I expect to get more than 28 mpg?
How did we get that answer:
$$Z = \frac{x_i - \mu}{\sigma} = \frac{28 - 25}{1.6} = 1.875$$
1.875 is a z-score (number of S.D.s from the mean) that refers to the area to the left of that position.
The area to the left of z = 1.875 is .9696.
(a) We want the area to the right of that position, thus 1 - .9696 = .0304
(b) .0304(250) = 7.6 ≈ 8 vehicles

Reference
Daniel, W. W. & Cross, C. L. Biostatistics: A Foundation for Analysis in the Health Sciences.
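The z-score calculations and table lookups in the worked examples can be cross-checked without a printed table. The sketch below is an illustration only: `z_score` and `area_left` are hypothetical names, and the left-tail area is computed from the identity Φ(z) = ½[1 + erf(z/√2)] using Python's standard `math.erf`, not read from Table IV.

```python
from math import erf, sqrt


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Shoe sizes: mean 11, SD 1.5, size smaller than 9.5
z_shoe = z_score(9.5, 11, 1.5)
p_shoe = area_left(z_shoe)
print(z_shoe, round(p_shoe, 4), round(40 * p_shoe, 1))

# Ford Escape: mean 25 mpg, SD 1.6, more than 28 mpg (area to the right)
z_ford = z_score(28, 25, 1.6)
p_ford = 1 - area_left(z_ford)
print(z_ford, round(p_ford, 4), round(250 * p_ford, 1))

# The two practice cases: Prob(z < 1.36) and Prob(z > 1.36)
print(round(area_left(1.36), 4), round(1 - area_left(1.36), 4))
```

An area to the right is always obtained as 1 minus the area to the left, which is the same manipulation done by hand with a left-tail table.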
