Normal Distributions, Z-Scores, and the Unit Normal Table (z table)

Summary

These notes cover normal distributions, z-scores, and the unit normal table (z-table). They describe how to interpret a single score within a distribution and how to compare scores from different distributions.

Full Transcript

Normal Distributions, the Standard Normal Distribution, Z Scores, and the Unit Normal Table (z table)

Putting Today's Material in Context

What we've done before:
- Described an entire set of scores using measures of central tendency and variability.

What we're doing now:
- Learning more about normal distributions.
- Learning about the relationships between the mean, the standard deviation, and normal distributions.
- Interpreting a single score (or a chunk of scores) within a distribution using z scores.

Note: we are still talking about descriptive statistics.

It turns out that a lot of data (populations and samples) are normally distributed. Today, we'll dig into what that means and why it matters so much for statistics.

[Figure: distribution of race finish times for ~11,000 participants; x-axis: finish time in minutes (60 to 150).]

BIG PICTURE: In this course we will do lots of fun things with normally distributed data. First, though, we need to learn about important properties of normal distributions.

Why are many variables normally distributed?
- Each score in a distribution is influenced by a number of factors, and each factor is random. Runner time is influenced by: age, gender, experience, fitness level, state of mind, weather conditions.
- The combination of these random factors makes the scores that are closest in value to the center of the distribution most common.
- And the opposite is also true: extreme scores (i.e., scores far from the mean) are unlikely.
- Question: Why are extreme scores unlikely?
- Answer: Very few people have ALL of the random factors "going their way" or "going against them." An unusual race score: lots of experience, high fitness level, in perfect state of mind, excellent weather.
- When a score is not a middle score, there are equal odds that random factors cause the score to be either above or below the mean, making the curve symmetrical. The left and right halves are mirror images.
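The argument above (many independent random factors add up to a bell-shaped, symmetrical distribution) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the number of factors, their uniform range, and the function names are hypothetical and are not taken from the race data.

```python
import random
import statistics


def simulate_scores(n_scores=10_000, n_factors=12, seed=0):
    """Build each score as the sum of several independent random factors.

    Assumption: every factor is uniform on [-1, 1]; real factors such as
    fitness or weather would not be this tidy.
    """
    rng = random.Random(seed)
    return [
        sum(rng.uniform(-1, 1) for _ in range(n_factors))
        for _ in range(n_scores)
    ]


def share_within(scores, k):
    """Proportion of scores within k standard deviations of the mean."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return sum(abs(x - mean) <= k * sd for x in scores) / len(scores)


def share_above_mean(scores):
    """Proportion of scores above the mean (about half if symmetrical)."""
    mean = statistics.fmean(scores)
    return sum(x > mean for x in scores) / len(scores)


scores = simulate_scores()
print("share above the mean:", round(share_above_mean(scores), 3))
print("share within 1 SD:   ", round(share_within(scores, 1), 3))
print("share within 2 SD:   ", round(share_within(scores, 2), 3))
```

With these settings, roughly half of the simulated scores should land above the mean, and most should cluster near the center, with extreme scores being rare.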
More key properties of normal distributions

[Figure: normal curve; y-axis: frequency. 50% (i.e., .5000) of scores are located BELOW the mean and 50% (i.e., .5000) of scores are located ABOVE the mean.]

LAMAL: 4-digit numbers

- Theoretical, but many variables are approximately normally distributed.
- Bell-shaped: mean = median = mode (and all are located at the 50th percentile).
- AUC (area under the curve) = 1 (1.0 = .5000 of scores above the mean + .5000 of scores below the mean).
- Symmetrical (equal chance that random factors increase or decrease scores from the mean of the distribution).
- The mean can equal any value (20, -5, 0.45, etc.).
- The standard deviation can be any positive number (2, 1.33, 0.07, etc.).
- The curve has no upper or lower limits (i.e., the tails are asymptotic: they don't touch the x-axis).
- Scores closer to the mean are more likely than scores further from the mean.
- In a SPECIAL distribution, the Standard Normal Distribution, the mean is 0 and the standard deviation is 1 (to be continued...).

OBSERVATION: In a normal distribution, the mean can be ANY positive or negative number, and the standard deviation can be any positive number.

PROBLEM: As statisticians, one of our main activities is to find the probability (likelihood) that a particular score or sample mean will occur. With an infinite number of normal distributions, that task is very, very hard.

SOLUTION: Statisticians found the area under ONE NORMAL CURVE, called "THE STANDARD NORMAL DISTRIBUTION" (SND), and figured out the exact proportions of scores that are located within each segment of the curve. We use "the standard normal transformation" (or z transformation) to convert raw scores into z scores.

TRANSLATION: We can take any set of normally distributed raw scores and convert them into z scores. When raw data are normally distributed, the proportions of scores in each area of the curve of the transformed data will be the same as the proportions in the SND.

Z score: $z = \frac{X - \bar{x}}{s}$

DETAILS:
1. The standard normal distribution (or z distribution) has a mean ($\bar{x}$) of 0 and a standard deviation (s) of 1.
2. Scores on the x-axis in the standard normal distribution are called z scores.
3. We will practice converting raw scores (x) into z scores (using the z transformation) so that we can interpret the meaning of a single score (or a chunk of scores) within a distribution. (Is the score typical? Is it extreme?)

Q: What are all these four-digit numbers?

Z Score Formula

Sample: $z = \frac{X - \bar{x}}{s}$, where z is the z score, X is the raw score, $\bar{x}$ is the sample mean, and s is the sample standard deviation.

Population: $z = \frac{X - \mu}{\sigma}$, where z is the z score, X is the raw score, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.

Definition of a z score: how many standard deviations away from the mean a particular raw score falls, expressed in standard deviation units.

Big picture: You are not trying to change the meaning of the raw score; you are converting each raw score into a different format ("a z score").

SIDE NOTE about working with data that are not normally distributed: If your raw data are skewed, they will stay skewed even after you convert the raw scores. We call these "standard scores" instead of z scores.

Keep in mind: The standard normal distribution is used to determine the probability of a certain outcome in relation to all other outcomes. After we learn more about the properties of normal distributions and z scores, we'll use the standard normal distribution to answer these types of questions:
- What percentage of scores lies at or above a z score of 2.3?
- What proportion of scores falls between a z of -2.4 and a z of 0.70? Solve this problem two different ways.
- If the mean on a test was 82, the standard deviation was 5, and you knew that your z score was 2.30, how do you figure out your raw score on the test?

With our big goal in mind, now let's learn more about normal distributions…

Last lecture we learned about the standard deviation (s): it represents the average amount that scores in a distribution deviate from the mean ($\bar{x}$).

[Figure: normal distributions of anxiety scores; x-axis: anxiety score (x), 0 to 100.]

But the standard deviation is even more informative for data that are NORMALLY DISTRIBUTED.

Let's replace the three normal distributions with a single normal distribution to explain the next few concepts.

First notice the x-axis on this distribution. It displays raw scores (0 – 24) and also standard deviations from the mean (-3 to 3). This example: a normal distribution displaying the number of feet that students moved from a podium when making a class presentation.

Now let's bring those proportions back…

For normal distributions with any mean and any variance, we can make the following three statements:

The Empirical Rule (aka: the 68, 95, 99.7 rule)
- Approx. 68% of all scores lie within 1 sd of the mean.
- Approx. 95% of all scores lie within 2 sd of the mean.
- Approx. 99.7% of all scores lie within 3 sd of the mean.

In this example, ~95% of students moved between 4 and 20 feet from the podium. It would be extremely unusual/unlikely to see a score of 25 in this data set.

Let's add up the proportion of scores in a normal distribution that fall within 2 standard deviations of the mean (aka between 4 and 20 feet): .3413 + .3413 + .1359 + .1359 = .9544

We are VERY interested in the tails!

RELEVANCE: Scientists compare their experimental data to statistical norms (proportions in the normal curve) to determine if they have 'unusual' scores.

Properties of z scores
- The mean of a set of z scores will always equal 0.
- The SD of a set of z scores will always equal 1.
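The two properties of z scores stated above can be checked numerically. The sketch below applies the sample formula z = (X - mean) / s to a small data set; the data values and the function name are hypothetical, chosen only for illustration.

```python
import statistics


def z_scores(raw_scores):
    """Convert raw scores to z scores using the sample formula z = (X - mean) / s."""
    mean = statistics.fmean(raw_scores)
    s = statistics.stdev(raw_scores)  # sample SD (n - 1 in the denominator)
    return [(x - mean) / s for x in raw_scores]


# Hypothetical raw scores (e.g., feet moved from the podium); not course data.
raw = [2, 7, 9, 12, 12, 15, 20]
z = z_scores(raw)

print("z scores:", [round(value, 2) for value in z])
print("mean of z scores is 0:", abs(statistics.fmean(z)) < 1e-9)
print("SD of z scores is 1:  ", abs(statistics.stdev(z) - 1) < 1e-9)
```

Any other data set with at least two different values could be substituted for `raw`; up to floating-point rounding, the mean of the z scores stays at 0 and their SD stays at 1.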
Let's see why… Assume $\bar{x}$ = 50, s = 10.

- When you standardize a set of scores, the mean is converted to 0 because the value that represents the mean (e.g., 50), minus itself (50), will equal zero.
- And the standard deviation becomes 1 because any value that is 1 SD above the mean will become 1 when you convert it to a z score.

[Figure: frequency distribution with the x-axis shown as raw scores (X), as deviations (X − $\bar{x}$), and as z scores, $z = \frac{X - \bar{x}}{s}$.]

A visual explanation of why the standard deviation of a set of z scores equals 1. Below is a distribution of raw scores.

[Figure: distributions of raw scores with the matching z score axis. With mean (raw score) = 8 and s = 2, the raw scores 8, 10, 12 correspond to z scores 0, 1, 2: $z = \frac{10 - 8}{2} = 1$. With mean (raw score) = 16 and s = 3, the raw scores 16, 19, 22 correspond to z scores 0, 1, 2: $z = \frac{19 - 16}{3} = 1$.]

The x value (raw value) that is one standard deviation above the mean is always going to be equal to a z score of 1.

When you standardize any distribution (regardless of whether it is a normal or skewed distribution of scores), what properties are affected?
- Mean? YES (the mean becomes 0; the value changes)
- Standard deviation? YES (the sd becomes 1; the value changes)
- Skewness? NO (distribution will stay the same shape)
- Kurtosis? NO (distribution will stay the same shape)

THM: Just by knowing the MEAN, the STANDARD DEVIATION, and that this variable is NORMALLY DISTRIBUTED, we know A LOT about this population (or sample).

4 key characteristics of the standard deviation
1. SD is always positive.
2. SD describes quantitative data (i.e., in this class, we'll calculate the SD only when we work with ratio or interval data).
3. Since the SD represents the average distance from the mean, the SD is most often reported WITH the mean.
4. SD is affected by the value of EACH SCORE in a distribution.

OYO: READ PRIVITERA!! (ch 3, ch 4, ch 6)

A z score is a value on the x-axis of a standard normal distribution. The numerical value of a z score specifies the distance or the number of standard deviations that a value is above or below the mean.
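As a rough illustration of the take-home message above (knowing the mean, the standard deviation, and that a variable is normally distributed tells us a lot), the sketch below recovers the Empirical Rule proportions using Python's standard-library `statistics.NormalDist` in place of the printed z table. The mean of 12 feet and SD of 4 feet are assumptions inferred from the podium example (raw scores 0 to 24 spanning -3 to +3 SD); the function name is illustrative.

```python
from statistics import NormalDist


def proportion_within(mean, sd, k):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


# Assumed values for the podium example: mean = 12 feet, SD = 4 feet.
mean, sd = 12, 4

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    share = proportion_within(mean, sd, k)
    print(f"within {k} SD ({low} to {high} feet): {share:.4f}")
```

The exact proportion within 2 SD is about .9545; the .9544 obtained by adding z table entries differs only because each table entry is rounded to four decimal places.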
The standard normal distribution, or z distribution, is a normal distribution with a mean equal to 0 and a standard deviation equal to 1. The standard normal distribution is distributed in z score units along the x-axis, where $z = \frac{X - \bar{x}}{s}$.

IMPORTANT UPDATE: EXAM 1 material

Statistics and Research Methods – Morling text study pointers for Exam 1

For the most part, the exams in this statistics course will be centered on material that is directly covered in lecture. Exam 1 is an exception. Specifically, Exam 1 will include questions from the Morling textbook reading that I DO NOT COVER IN LECTURE. To help guide your studying, here are some broad areas from the Morling textbook that you should focus on while you are studying for Exam 1. Please start early. Please work with a buddy.

Chapter 1
- Know the definitions of and differences between applied, basic, and translational research.

Chapter 2 – pgs 23–26, pgs 36–53
- Know the importance of a comparison group.
- Know the limitations of authority figures and personal experience.
- Know the terms availability heuristic, present/present bias, confirmation bias.

Chapter 3 – pgs 56–59; Chapter 5 – pgs 117–124
- What is the difference between a conceptual and an operational variable?

Chapter 6 – pgs 153–165
- Know how to make a good survey, and what makes a bad survey (what to keep in mind when developing a survey).
- Know how to write good questions (for a survey or an experiment).
- Key terms: open-ended questions, forced-choice questions, leading questions, double-barreled questions, negative wording, question order, acquiescence, fence sitting, desirable responding, faking bad (not Breaking Bad – could not help myself…).

Reaching each learner: Because students learn in different ways, we will use three different examples to practice working with raw data, means, standard deviations, z scores, and distributions.
1. Exam scores
2. Race times (swimmers)
3. Text messaging (Worksheet 5)

Example 1 – exam scores
- Calculating z scores and using the z table
- Determining probabilities and percentages
- Working 'backwards' from z scores to raw scores

"I got a 76% on the exam!"

You just learned that you scored a 76% on a Spanish exam.

Q: What else would you want to know?
- the class mean ($\bar{x}$)
- the standard deviation (s)

Okay, let's have a look at the class mean ($\bar{x}$) and standard deviation (s) to determine if a 76% is worth a celebration…

Let's transform!

[Figure: distribution of Spanish grades; y-axis: frequency; $\bar{x}$ = 70, s = 3, with a score of 76 marked.]

$z = \frac{X - \bar{x}}{s} = \frac{76 - 70}{3} = \frac{6}{3} = 2$

In this data set, a score of 76 is 2 standard deviations above the mean (z = +2.00).

Question: How does your exam score compare to the rest of the class?

.1359 + .3413 + .3413 + .1359 + .0215 + .0013 = .9772, which leaves .0228 at or above 76.

Answer: BUENO! You scored equal to or higher than 97.72% of the class.

Answer: You could also say that only 2.28% of the class scored equal to or higher than you did.

Next: Let's learn how to use the Unit Normal Table (i.e., the z table) using this example.

$z = \frac{76 - 70}{3} = +2.0$

A score of 76 is equal to or higher than 97.72% of the other scores (.9772). Or, only 2.28% of scores are equal to or higher than a score of 76 (.0228).

Step 1: Locate the z score (2.00) in column A.
Step 2: Find the proportion in column C (.0228), which is the proportion at or above the z score; or use column B (.4772) and add .5000 to get the proportion at or below the z score (.9772).

NEXT: Converting from a z score (z) to a raw score (x)

On the next exam in your Spanish class, the mean on the exam is 92 and the standard deviation is 2.3. Your prof tells you that you earned a z score of -0.33. What is your exam score?

Solution:

Step 1: Always draw the distribution!

[Figure: distribution of Spanish grades; y-axis: frequency; $\bar{x}$ = 92, s = 2.3, with the unknown score x marked just below the mean.]

Step 2: Let's do some simple math.

$z = \frac{X - \bar{x}}{s}$
$-0.33 = \frac{x - 92}{2.3}$
$(2.3)(-0.33) = x - 92$
$-0.759 = x - 92$
$92 - 0.759 = x$
$x = 91.24\%$

Bueno! You got an A!

OYO: Below are the data (mean, s) generated by two other sections of Spanish class.
For each class, compute the z transformation and use the z table:
- Calculate the z score for an exam grade of 76 in each class.
- State the proportion of scores that are higher than 76 (use the z table; this should be in the form .xxxx).
- State the percentage of scores that are lower than 76.
- Which class would you rather get a 76% in, and why?

$z = \frac{X - \bar{x}}{s}$

                        Section 1              Section 2    Section 3
s =                     3                      9            1.7
x̄ =                     70                     72           83
z =                     (76 − 70)/3 = +2.0
Higher (proportion) =   .0228
Lower (percent) =       97.72%

What's the benefit of transforming raw scores to z scores?
- It allows us to compare raw scores from different distributions.
- Example 1: comparing SAT (800) and ACT (36) scores
- Example 2: comparing athletic race times from different decades

Example 2 – race times (swimmers)

Let's practice converting a different pair of raw scores into standard scores… Two Olympic medalists swam in different eras.

Phelps vs. Spitz: 200 Meter Butterfly

                      Michael Phelps (2008)   Mark Spitz (1972)
Gold medals           8                       7
Race mean (x̄)         117.8 sec.              129.5 sec.
Stand. dev. (s)       3.0 sec.                5.3 sec.
Swimmer's time (x)    112.0 sec.              120.7 sec.

Who was more dominant in their era? To help answer this, we'll look at these data within their respective distributions, convert the raw scores into z scores, and then compare.

$z = \frac{X - \bar{x}}{s}$

Phelps: $z = \frac{112.0 - 117.8}{3.0} = -1.93$
Spitz: $z = \frac{120.7 - 129.5}{5.3} = -1.66$

Phelps z score = -1.93; Spitz z score = -1.66. Phelps has it!
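The Phelps vs. Spitz comparison above can be reproduced in a few lines. This is a minimal sketch using the race means, standard deviations, and swim times given in the example; the function and variable names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that raw score x lies from the mean."""
    return (x - mean) / sd


# Values from the 200 meter butterfly example (times in seconds).
phelps_z = z_score(112.0, mean=117.8, sd=3.0)
spitz_z = z_score(120.7, mean=129.5, sd=5.3)

print(f"Phelps z = {phelps_z:.2f}")
print(f"Spitz  z = {spitz_z:.2f}")

# In a race, a lower (more negative) z score means a faster time relative
# to the swimmer's own era, so the smaller z score is the more dominant one.
more_dominant = "Phelps" if phelps_z < spitz_z else "Spitz"
print("More dominant in his era:", more_dominant)
```

Because both times are expressed in standard deviation units, the comparison holds even though the two races had different means and different spreads.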
[Figure: standard normal curve; y-axis: frequency; x-axis: z scores from -3 to 3, with Phelps's z score (-1.93) and Spitz's z score (-1.66) marked.]

How to express this comparison in words:
- Spitz's swim time was 1.66 standard deviations below the mean (z = -1.66) in 1972.
- Phelps's swim time was 1.93 standard deviations below the mean (z = -1.93) in 2008.
- Phelps's z score is more extreme (more unusual) than Spitz's z score.

NEXT: We can also check out these scores in the z table.

Phelps z score = -1.93; Spitz z score = -1.66

Statistically speaking, only 2.68% of all swimmers are likely to have a faster time than Phelps, whereas 4.85% of all swimmers are likely to have a faster time than Spitz.
- Phelps was faster than 97.32% of swimmers in 2008.
- Spitz was faster than 95.15% of swimmers in 1972.

But overall, these champs are crazy good!

Example 3 – text messaging (optional Worksheet 5 – z scores)

Gretchen, one of the college seniors who participated in the texting study, texts her parent 1 time per day. Gretchen wants to know how her texting behavior compares to other college seniors. She asks the experimenter (you!) if her score is typical or if, instead, her rate of texting is somewhat unusual.

LOOKING AHEAD to EXAM 2

To answer Gretchen, you need to:
1. Calculate how far Gretchen's score is from the mean of the data set.
2. Calculate the standard deviation of the data set.
3. Convert Gretchen's raw score into a z score.
4. Find the proportion that corresponds to Gretchen's z score using the z table.
5. Answer the question asked (percent, proportion, raw score?).

OYO: You will work on Gretchen's question (and other questions about our texting data) in Worksheet 5.
- find a friend
- work together
- compare your answers
- come chat with me about them!
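The five steps listed above for answering Gretchen's question can be sketched as follows. The texting data here are hypothetical (the real values are in Worksheet 5), the standard-library `NormalDist` stands in for the printed z table, and the last step assumes the data are approximately normally distributed.

```python
from statistics import NormalDist, fmean, stdev

# Hypothetical texts-per-day data for college seniors; NOT the Worksheet 5 data.
texts_per_day = [1, 3, 4, 5, 5, 6, 7, 9]
gretchen = 1  # Gretchen texts her parent 1 time per day

# Step 1: how far is Gretchen's score from the mean?
mean = fmean(texts_per_day)
deviation = gretchen - mean

# Step 2: standard deviation of the data set (sample formula, n - 1).
s = stdev(texts_per_day)

# Step 3: convert Gretchen's raw score into a z score.
z = deviation / s

# Step 4: proportion at or below her z score (replaces the z table lookup;
# assumes the scores are approximately normally distributed).
proportion_at_or_below = NormalDist().cdf(z)

# Step 5: answer the question asked, here as a percentage.
print(f"mean = {mean:.2f}, s = {s:.2f}, z = {z:.2f}")
print(f"about {proportion_at_or_below * 100:.1f}% of seniors text this little or less")
```

With the real worksheet data, only the `texts_per_day` list would change; the steps stay the same.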
Summary
- Z scores enable us to interpret a single score within a distribution (how many SDs above or below the mean the score falls).
- Z scores help us compare scores from different distributions.
- In a normal distribution, we know the precise probability of getting scores above or below a particular z score.

Notes for HW 2
- Simplify expressions using PEMDAS.
- Include your units of measurement in your answers.
- DRAW the distribution for z score problems.
- Make sure you are rounding properly (use all 4 decimal places when reporting z table proportions).
- Make sure you are reporting proportions if the question asks for proportions and percentages if the question asks for percentages.
- Don't skip any problems (or you cannot submit corrections).
- Be sure to select the appropriate pages on which each question appears when uploading your PDF to Gradescope.

It's time to let you loose on some z score problems!

When solving z problems, ask yourself:
A. What am I solving for: proportion? percentage? at least? equal to or more? x value? cut-off score?
B. What information have I been given: z score? mean, x, sd? proportion? percent?
C. DRAW THE DISTRIBUTION!
D. In the z table: use column A? use column B? use column C?

Z Score Practice Problems

For each problem, be sure to DRAW the distribution and put in the appropriate values. Then figure out what column(s) of the z table you need to answer the question. Make sure that if the question asks for a proportion, you provide a proportion and not a percentage (and vice versa).

1. What percentage of scores lies at or above a z score of 3.5?
   Worked answer: for z = 3.5 the z table gives .0002, so .02% of scores are at or above a z score of 3.5.
2. What z score cuts off the top 5% of the distribution? NOTE: to solve this problem, you have to take the average of the two z scores whose proportions bracket the value you need in the z table. You'll know what I mean when you get to this part.
   Worked answer: use the z table to look up .4500 in column B. The table has .4495 at z = 1.64 and .4505 at z = 1.65. This is a special case where you interpolate: z ≈ 1.645.
3. What percentage of scores lies between a z of -2.0 and a z of 1.0?
   Worked answer: use the z table to look up both z scores: .4772 + .3413 = .8185, so 81.85%.
4. 73% of the population would score at or above what z score? This question is a little tricky. Notice that it asks you about the majority of the people. As for all these questions… draw the distribution!
   Worked answer: 1 − .7300 = .2700 is in the tail below the cut-off; the closest z table entry is at z = 0.61, so z = -0.61.
5. What proportion of scores falls between a z of 1.20 and a z of 2.00? Solve this problem two different ways.
6. If the mean on a test was 75, the standard deviation was 8, and you knew that your z score was 1.55, how do you figure out your raw score on the test?
7. Imagine that the average number of hours per day that newborns sleep during their first week of life is 17, with a standard deviation of 3. What percentage of newborns sleeps 12 hours or more?

The next several slides describe how to use different calculations to solve z-score related questions. You will see similar questions on Exam 1. You'll calculate:
- z scores (e.g., z = 1.34)
- proportions (e.g., p = .3966)
- raw score values (e.g., x = 21.62)
- percentages (e.g., 39.66%)

Just to help you see your END GOAL before we get started, below are a couple of examples of the types of questions I will ask you at the end of this cluster of slides.
- What percentage of scores lies at or above a z score of 2.3?
- What proportion of scores falls between a z of -2.4 and a z of 0.70? Solve this problem two different ways.
- If the mean on a test was 82, the standard deviation was 5, and you knew that your z score was 2.30, how do you figure out your raw score on the test?
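The END GOAL questions above can be checked without the printed table. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for the z table; this is a swapped-in tool rather than the course's method, and the helper names are illustrative.

```python
from statistics import NormalDist

snd = NormalDist()  # standard normal distribution: mean 0, SD 1


def proportion_at_or_above(z):
    """Proportion of scores at or above z (the tail beyond z)."""
    return 1 - snd.cdf(z)


def proportion_between(z_low, z_high):
    """Proportion of scores between two z scores."""
    return snd.cdf(z_high) - snd.cdf(z_low)


def z_cutting_off_top(p):
    """z score that has proportion p of the distribution at or above it."""
    return snd.inv_cdf(1 - p)


def raw_score(z, mean, sd):
    """Work 'backwards' from a z score to a raw score: x = mean + z * sd."""
    return mean + z * sd


print(f"at or above z = 2.3:       {proportion_at_or_above(2.3) * 100:.2f}%")
print(f"between z = -2.4 and 0.70: {proportion_between(-2.4, 0.70):.4f}")
print(f"raw score for z = 2.30:    {raw_score(2.30, mean=82, sd=5):.1f}")
print(f"z cutting off the top 5%:  {z_cutting_off_top(0.05):.3f}")
```

Results computed this way can differ from table-based answers in the fourth decimal place, because the printed table rounds both the z scores and the proportions.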
