PSYC 204 Introduction to Psychological Statistics - Week 1 Lecture Notes (2024-09-03) PDF
McGill University
2024
Dr. Jens Kreitewolf
Summary
This document is lecture notes for an undergraduate psychology class. It contains introductory information and examples about various psychological statistics topics. The lecture notes cover the concept of descriptive statistics, defining and explaining different types of measurements and variables. The notes provide an overview of statistical methods and present various examples demonstrating the real-world applications of these methods.
Full Transcript
PSYC 204 Introduction to Psychological Statistics
Week 1: Introduction to statistics (Ch. 1)
2024-09-03
Dr. Jens Kreitewolf

What is this lecture about?
Key content from Chapter 1
1. What are statistics?
2. Different types of measurements and variables
3. Different research methods
4. Statistical notation

What are statistics?
Book says: “set of mathematical procedures for organizing, summarizing, and interpreting information”

Statistics: What is it good for?
You need statistics...
– to pass the exam
– for your research project/thesis
– to assess the quality of scientific studies
– if you want to work in research
– for what else?

Some real-life examples
Statistics are used to predict important outcomes and make decisions about many things in your everyday life:
– This week’s weather
– Political elections
– How likely you are to wreck your car (car insurance price)
Can you think of other examples?
Statistics give you the power to use the past to predict the future*!
*with error

Some Basic Terms
Population: The entire group of individuals is called the population.
Sample: A smaller group selected from the population.
[Diagram: PSYC 204 students (population); PSYC 204 students in Discussion Group 1 (sample)]
Variable: A characteristic or condition that can change or take on different values.
Data (pl.): The measurements obtained in a research study are called the data.

Descriptive Statistics
Descriptive statistics are methods for organizing and summarizing data.
A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.

Example: Time Zone Data (from previous semester)
What time zone are you in?
244/404 students filled out the survey.
These data are just a sample from the population of students in the class.
3% were in UTC +8. Is that a parameter or a statistic?
[Diagram: whole PSYC 204 class = parameter; PSYC 204 students who answered the time zone survey = statistic]
What information would I need to get the population parameter?

Inferential Statistics
Inferential statistics are methods for using sample data to make general conclusions (inferences) about populations.

Sampling Error
The discrepancy between a sample statistic and its population parameter is called sampling error.
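The relationship between a parameter, a statistic, and sampling error can be sketched in a few lines of Python. This is only an illustration: the population scores are made up (they are not the class survey data), and the variable names are hypothetical.

```python
import random
from statistics import mean

# Hypothetical population of N = 10 scores (made-up numbers, not course data).
population = [2, 4, 4, 5, 6, 7, 7, 8, 9, 10]
parameter = mean(population)           # descriptive value for the population

random.seed(204)                       # fixed seed so the sample is reproducible
sample = random.sample(population, 4)  # a smaller group selected from the population
statistic = mean(sample)               # descriptive value for the sample

# Sampling error: the discrepancy between the sample statistic
# and the corresponding population parameter.
sampling_error = statistic - parameter
print("parameter:", parameter)
print("statistic:", statistic)
print("sampling error:", sampling_error)
```

Drawing a different sample (a different seed) would typically give a different statistic and therefore a different sampling error, while the parameter stays fixed.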
Descriptive & Inferential Statistics: Comparison
Descriptive statistics
– Summarize data
– Organize data
– Simplify data
Familiar examples: tables, graphs, averages
Inferential statistics
– Study samples to make generalizations about the population
– Interpret experimental data
Common terminology: “margin of error”, “statistically significant”

Learning Check 1
A researcher is interested in the effect of amount of sleep on high school students’ exam scores. A group of 75 high school boys agree to participate in the study. The boys are _____.
A. A statistic
B. A variable
C. A parameter
D. A sample

Types of Variables
Discrete variable
– Has separate, indivisible categories
– No values can exist between two neighboring categories
Continuous variable
– Has an infinite number of possible values between any two observed values
– Is divisible into an infinite number of fractional parts
Whether a variable is discrete or continuous also depends on how we measure it…

Real Limits of Continuous Variables
Because continuous variables can be subdivided again and again, we need limits: Real limits are the boundaries of each interval representing scores measured on a continuous number line.
Example: What whole number would 149.6 or 150.3 be assigned?
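The real-limits example can be checked with a short Python sketch. It assumes the usual convention that real limits sit half a measurement unit below and above a score; the function name `real_limits` and the choice to treat the lower limit as inclusive are assumptions made for this illustration.

```python
def real_limits(score, unit=1.0):
    """Return (lower, upper) real limits: half a measurement unit on either side."""
    half = unit / 2
    return (score - half, score + half)

lower, upper = real_limits(150)  # interval represented by the whole number 150

for measurement in (149.6, 150.3):
    # A measurement is assigned to 150 if it falls between the real limits.
    inside = lower <= measurement < upper
    print(measurement, "falls in the interval for 150:", inside)
```

Under these assumptions, both 149.6 and 150.3 fall between 149.5 and 150.5, so both would be recorded as 150.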
Measuring variables and scales
– Measurement assigns individuals or events to categories
– The categories can be names such as introvert/extrovert or employed/unemployed (qualitative data)
– They can be numerical values such as 68 inches or 175 pounds (quantitative data)
– The categories used to measure a variable make up a scale of measurement
– Relationships between the categories determine different types of scales

Measuring variables and scales
1. A nominal scale is an unordered set of categories identified only by name. Nominal measurements only permit you to determine whether two individuals are the same or different.
2. An ordinal scale is an ordered set of categories. Ordinal measurements tell you the direction of difference between two individuals.
3. An interval scale is an ordered series of equal-sized categories. Interval measurements identify the direction and magnitude of a difference. The zero point is located arbitrarily on an interval scale. The ZERO doesn’t mean NONE of the quantity. What is an example of this?
4. A ratio scale is an interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements.

Measuring variables and scales
Nominal:   = ≠
Ordinal:   = ≠, < >
Interval:  = ≠, < >, + -
Ratio:     = ≠, < >, + -, * /
Can you think of other examples?

Learning Check 2
A study assesses the optimal size (number of members) for study groups. The variable size of group is _____.
A. Discrete and interval
B. Continuous and ordinal
C. Discrete and ratio
D. Continuous and interval

Research Methods
Descriptive research (individual variables)
– One (or more) variables measured per individual
– Statistics describe the observed variable
– May use categorical and/or numerical variables
– Not concerned with relationships between variables
Example: Fluffiness of cats

Research Methods
The correlational method
– One group of participants
– Measurement of two variables for each participant
– The goal is to describe type and magnitude of the relationship
– Patterns in the data reveal relationships
– Nonexperimental method of study
Example: Fluffiness and Grooming

The correlational method
– Can demonstrate the existence of a relationship between two variables
– Does not provide an explanation for the relationship
– Most importantly, does not demonstrate a cause-and-effect relationship between the two variables

Research Methods
The experimental method
– Goal is to demonstrate a cause-and-effect relationship between two variables
– Manipulation: The level of one variable is determined by the experimenter
– Control rules out influence of other variables
[Diagram: Group 1 (control), Group 2 (treatment)]

Research Methods
Methods of control
– Random assignment of subjects
– Matching of subjects
– Holding the level of some potentially influential variables constant
Control condition
– Individuals do not receive the experimental treatment
– They either receive no treatment or they receive a neutral, placebo treatment
– Purpose: to provide a baseline for comparison with the experimental condition
Experimental condition
– Individuals do receive the experimental treatment

Research Methods
Important Terminology:
Independent variable (IV): the variable that is manipulated by the researcher
Dependent variable (DV): the one that is observed to assess the effect of treatment
[Diagram: Group 1 (control) and Group 2 (treatment), with the labels "IV:" and "DV: Fluffiness"]

Other Types of Studies
Other types of research studies, known as non-experimental or quasi-experimental, are similar to experiments because they also compare groups of scores.
These studies do not use a manipulated variable to differentiate the groups. Instead, the variable that differentiates the groups is usually a pre-existing participant variable (such as male/female) or a time variable (such as before/after).
Because these studies do not use the manipulation and control of true experiments, they cannot demonstrate cause-and-effect relationships. As a result, they are similar to correlational research because they simply demonstrate and describe relationships.

Learning Check 3
Researchers observed that students’ exam scores were higher the more sleep they had the night before. This study is _____.
A. Descriptive
B. Experimental comparison of groups
C. Non-experimental group comparison
D. Correlational

Learning Check 4
Stephens, Atkins, and Kingston (2009) found that participants were able to tolerate more pain when they shouted their favorite swear words over and over than when they shouted neutral words. For this study, what is the independent variable?
a. The amount of pain tolerated
b. The participants who shouted swear words
c. The participants who shouted neutral words
d. The kind of word shouted by the participants

Statistical Notation
Statistics uses operations and notations you have already learned (Appendix A has a Mathematical Review).
The individual measurements or scores obtained for a research participant will be identified by the letter X (or X and Y if there are multiple scores for each individual).
The number of scores in a data set will be identified by N for a population or n for a sample.

Statistical Notation
Summing a set of values is a common operation in statistics and has its own notation. The Greek letter sigma, Σ, will be used to stand for "the sum of." For example, ΣX identifies the sum of the X scores.

Order of Operations
1. All calculations within parentheses (brackets) are done first.
2. Squaring or raising to other exponents is done second.
3. Multiplying and dividing are done third, and should be completed in order from left to right.
4. Summation with the Σ notation is done next.
5. Any additional adding and subtracting is done last and should be completed in order from left to right.

Learning Check 5
ΣX² + 47 instructs you to _____.
A. Square each score and add 47 to it, then sum those numbers
B. Square each score, add up the squared scores, then add 47 to that sum
C. Add 47 to each score, square the result, and sum those numbers
D. Add up the scores, square that sum, and add 47 to it

Thank you for your attention!

PSYC 204 Introduction to Psychological Statistics
Week 2: Frequency distributions (Ch. 2)
2024-09-05
Dr. Jens Kreitewolf

What is this lecture about?
Key content from Chapter 2
In the first section of this course (until Chapter 6), we will be learning about how to describe and summarize data.
In Chapter 2, we focus on summarizing data with frequency distributions:
– How to organize data into a frequency distribution table
– How to interpret the table
– How to organize data into frequency distribution graphs
– How to interpret the graphs

Getting started after data collection
After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.
This is the goal of Descriptive Statistics.
One method for simplifying and organizing data is to construct a frequency distribution.

Frequency Distribution Tables
A frequency distribution table is an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement. It presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.

Frequency Distribution Tables
In the X column, all values are listed (usually from the highest to lowest) without skipping any.
For the frequency column, tallies are determined for each value (how often each X value occurs in the data set). These tallies are the frequencies for each X value.
The sum of the frequencies should equal N.

Frequency Distribution Tables
A frequency distribution table consists of at least two columns: one listing categories on the scale of measurement (X) and another for frequency (f).
ΣX = ?
N = ?
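As a rough sketch of how such a table can be built, the Python snippet below tallies a small set of scores, lists every X value from highest to lowest without skipping any, and then recovers N and ΣX from the table. The scores are illustrative numbers chosen for this sketch, and the variable names are hypothetical.

```python
from collections import Counter

scores = [5, 4, 4, 3, 3, 3, 2, 2, 2, 1]  # illustrative scores

freq = Counter(scores)  # tally how often each X value occurs

# List every X value from highest to lowest, including values with f = 0.
table = [(x, freq.get(x, 0)) for x in range(max(scores), min(scores) - 1, -1)]

N = sum(f for _, f in table)          # the frequencies sum to N
sum_X = sum(x * f for x, f in table)  # ΣX: each X counted f times

print("X  f")
for x, f in table:
    print(f"{x}  {f}")
print("N =", N)
print("ΣX =", sum_X)
```

Note that ΣX is obtained by multiplying each X value by its frequency before summing; simply adding up the X column would count each value only once.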
More columns…
A third column can be used for the proportion (p) for each category: p = f/N. The sum of the p column should equal 1.00. The proportion is also referred to as relative frequency.
A fourth column can display the percentage of the distribution corresponding to each X value. The percentage is found by multiplying p by 100. The sum of the percentage column is 100%.

X      f     proportion   %
5      1     0.1          10
4      2     0.2          20
3      3     0.3          30
2      3     0.3          30
1      1     0.1          10
Sum    10    1            100

Even more columns…
Two more columns can be added to the frequency distribution table: One is for cumulative frequency (cf) and the other is for cumulative percentage (c%).
These columns help you to find the relative location of individual scores within a distribution (percentiles or percentile ranks).
The percentile rank for a particular X value is the percentage of individuals with scores equal to or less than that X value (when an X value is described by its rank, it is called a percentile).

Percentiles
X      f     cumu f   proportion   %     Cumu %   Cumu %
5      1     1        0.1          10    10       100
4      2     3        0.2          20    30       90
3      3     6        0.3          30    60       70
2      3     9        0.3          30    90       40
1      1     10       0.1          10    100      10
Sum    10    29       1            100
In this table, cumu f and the first Cumu % column accumulate from the highest score down; the second Cumu % column accumulates from the lowest score up.
The percentile rank for a particular X value is the percentage of individuals with scores equal to or less than that X value.
What is the percentile rank for X = 3?
When an X value is described by its rank, it is called a percentile.
What percentile is X = 3?

Age Frequency Distributions
To make a frequency table you have to:
1. decide the bins
2. decide upper and lower limits of the bins (this is related to the real limits we discussed in chapter 1)
3. put your data into the bins

Age      Rounded
25       25
20       20
22.4     22
22       22
21       21
23.25    23
20       20
22.1     22
20       20
22       22
21.25    21
21.7     22

X      Frequency (f)   fX
25     1               25
24     0               0
23     1               23
22     5               110
21     2               42
20     3               60
Σf = 12 = N            ΣfX = 260

Grouped Frequency Example
When you have a big range of values from the data, you have to choose how to group them. If you wanted ten or fewer categories here, what interval would you want to group by?
Data: 86, 11, 60, 37, 89, 53, 77, 72, 22, 84, 68, 95, 82, 9, 43, 32, 15, 56, 78, 94
To make a frequency table you have to:
1. decide the bins
2. decide upper and lower limits of the bins (this is related to the real limits we discussed in the last lecture)
3. put your data into the bins

X         f
91-105    2
76-90     6
61-75     2
46-60     3
31-45     3
16-30     1
1-15      3
Σf = 20, N = 20

This example shows numbers grouped by 15, a simple number that creates fewer than 10 “class intervals”; the highest number in each bin is a multiple of 15. But 20, 10 or other groupings could have been fine if they were good for summarizing the data.

Decisions in statistics
In the first example, I ordered the frequency table from high to low. You could order it from low to high; this is the default display option for numeric and continuous variables in many statistical software programs.
There isn’t always one right answer; you have to decide what makes sense for your data.
Same with the grouping of numbers: You get to decide!
When you summarize data you lose information. Be aware of this and avoid making decisions that mislead others!

Learning Check 1
Use the frequency distribution table to determine how many subjects were in the study.
X    f
5    2
4    4
3    1
2    0
1    3
A. 10
B. 15
C. 33
D. Impossible to determine

Learning Check 2
A grouped frequency distribution table has categories 0–9, 10–19, 20–29, and 30–39. What is the width of the interval 20–29?
A. 9 points
B. 9.5 points
C. 10 points
D. 10.5 points

Frequency Distribution Graphs
Interval or Ratio Variables: Histogram, Polygon
Nominal or Ordinal Variables: Bar Graph, Pie Chart

Frequency Distribution: Histogram
– Requires numeric scores (interval or ratio)
– Represent all scores on X-axis from minimum through maximum observed values
– Include all scores with frequency of zero
– Draw bars above each score (interval)
– Height of bar corresponds to frequency
– Width of bar corresponds to score real limits (or one-half score unit above/below discrete scores)

Grouped Frequency Distribution: Histogram

How to build a Histogram
A standard histogram can be made into an informal histogram (“block” histogram).
Block Histogram
Create a bar of the correct height by drawing a stack of blocks. Each block represents one individual. Therefore, block histograms show the frequency count in each bar.
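The block histogram idea can be sketched as text output in Python. This is a minimal illustration under stated assumptions: the scores are made up, single-digit values are assumed so the columns line up, and each `#` stands for one individual.

```python
from collections import Counter

scores = [1, 2, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up scores for illustration
freq = Counter(scores)

# Represent every score from the minimum through the maximum, including f = 0.
values = list(range(min(scores), max(scores) + 1))
tallest = max(freq.values())

rows = []
for level in range(tallest, 0, -1):
    # Draw a block in this row only for scores whose frequency reaches this level.
    rows.append(" ".join("#" if freq.get(x, 0) >= level else " " for x in values))
rows.append(" ".join(str(x) for x in values))  # X-axis labels

print("\n".join(rows))
```

Counting the blocks in a column gives the frequency of that score directly, which is the point of the block form.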
Frequency Distribution: Polygon
– List all numeric scores on the X-axis
– Include those with a frequency of f = 0
– Draw a dot above the center of each interval
– Height of dot corresponds to frequency
– Connect the dots with a continuous line
– Close the polygon with lines to the Y = 0 point
– Can also be used with grouped frequency distribution data

Graphs for Qualitative Data
For non-numerical scores (qualitative data), you can use a bar graph
– Similar to a histogram
– Spaces between adjacent bars indicate discrete categories
…or a pie chart
– each class of the qualitative variable is represented by a slice (sector of a circle)
– size of each slice (sector) is proportional to the class relative frequency

Shape
A graph shows the shape of the distribution.
A distribution is symmetrical if the left side of the graph is (roughly) a mirror image of the right side.
One example of a symmetrical distribution is the bell-shaped normal distribution.
On the other hand, distributions are skewed when scores pile up on one side of the distribution, leaving a "tail" of a few extreme values on the other side.
Smooth Curve: Histogram example
A regular histogram does not have the curved line; this one shows a “smoothed curve” style graph overlaid on the histogram.

Smooth Curve: Population Distributions
When a population is small, scores for each member are used to construct a frequency distribution graph such as a histogram and bar graph.
When a population is large, graphs based on relative frequencies are used.
Normal distribution
– Symmetric with greatest frequency in the middle
– Common data structure for many variables

Shapes for Frequency Distributions

Stem and Leaf Displays
A simple alternative to a grouped frequency distribution table or graph
Each score is separated into two parts: a stem and a leaf
– The first digit (or digits) is called the stem
– The last digit is called the leaf
Example: X = 85 would be separated into a stem of 8 and a leaf of 5
Every individual score can be identified

Learning Check 3
What is the shape of this distribution?
A. symmetrical
B. negatively skewed
C. positively skewed
D. discrete

Learning Check 4
For the scores shown in the stem and leaf display, what is the lowest score in the distribution?
9 | 374
8 | 945
7 | 7042
6 | 68
5 | 14
A. 7
B. 15
C. 50
D. 51

Thank you for your attention!

PSYC 204 Introduction to Psychological Statistics
Week 2: Central Tendency (Ch. 3)
2024-09-10
Dr. Jens Kreitewolf

Recap
Decisions (on how to present your data) matter

We still want to summarize data
Last lectures: How to think about variables, summarize, and visualize data
Now: How to use numbers to summarize data

Central Tendency
By identifying the "average score," central tendency allows researchers to summarize or condense a large set of data into a single value.
It serves as a descriptive statistic because it allows researchers to describe or present a set of data in a very simplified, concise form.
We use measures of central tendency to compare sets of data or groups on some outcome (second half of this course focuses on that! -> inferential statistics)

Central Tendency
Central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores.
The goal of central tendency is to identify the single value that is the best representative for the entire set of data.

Standard measures of central tendency
No single measure of central tendency always produces a good, representative value.
Three commonly used techniques for measuring central tendency:
– the mean
– the median
– the mode

The Mean
The mean is the most commonly used measure of central tendency.
Computation of the mean requires scores that are numerical values measured on an interval or ratio scale.
The Mean: Notation
The mean is the sum of all the scores divided by the number of scores in the data.
              Book           Math
Sample:       M = ΣX / n     $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Population:   μ = ΣX / N     $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Rearranging the population formula: ΣX = μ * N and N = ΣX / μ

The Mean as Balance Point
The mean is the balance point of the distribution because the sum of the distances below the mean is exactly equal to the sum of the distances above the mean.
[Figure labels: H.S., B.A./B.S., Ph.D.]

Calculating the mean
Let’s say we have the following sample data: 10, 9, 9, 8, 8, 8, 8, 6
M = ΣX / n = (10 + 9 + 9 + 8 + 8 + 8 + 8 + 6) / 8 = 66 / 8 = 8.25

Calculating the mean from a frequency distribution table
Make a simple frequency table for these data: 10, 9, 9, 8, 8, 8, 8, 6
Quiz Score (X)   f    fX
10               1    10
9                2    18
8                4    32
7                0    0
6                1    6
                 n = 8    ΣX = 66
M = ΣX / n = 66 / 8 = 8.25

Learning Check 1
A sample of n = 12 scores has a mean of $\bar{x}$ = 8. What is the value of ΣX for this sample?
A. ΣX = 1.5
B. ΣX = 4
C. ΣX = 20
D. ΣX = 96

Weighted Mean: Example
Sometimes you want to calculate a weighted mean to take into account different group sizes.
Three steps:
1. Determine the combined sum of all the scores
2. Determine the combined number of scores
3. Divide the sum of scores by the total number of scores
M = (ΣX1 + ΣX2) / (n1 + n2)

Weighted Mean: Example
Say you have two groups of participants and their score on a depression inventory (this example is in Mindtap)
Mean1 = 16, N = 6: ΣX = M * N = 16 * 6 = 96
Mean2 = 12, N = 10: ΣX = M * N = 12 * 10 = 120
Weighted Mean = (ΣX1 + ΣX2) / (N1 + N2)

Characteristics of the Mean
Changing the value of a score in the data changes the mean.
Introducing a new score or removing a score changes the mean (unless the score added or removed is exactly equal to the mean).
The mean is sensitive to inclusion and exclusion criteria for participants. That means that the decisions researchers make about who to include in their analyses can drastically influence the summary statistics. When you read about research, keep an eye out for inclusion and exclusion criteria.

Changing the Mean: Example
If a constant value is added to every score in a distribution, then the same constant value is added to the mean.
Also, if every score is multiplied by a constant value, then the mean is also multiplied by the same constant value.
        X    X+3   X*3
        2    5     6
        4    7     12
        7    10    21
        3    6     9
        4    7     12
Mean    4    7     12

When the Mean Won’t Work
We want to use the mean to summarize the data, but sometimes it doesn’t work very well, particularly when a distribution contains a few extreme scores (or is very skewed): the mean will be pulled toward the extremes (displaced toward the tail).
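The weighted-mean steps and the properties of the mean described above can be checked with a short Python sketch. The group means and sizes come from the depression-inventory example and the five scores from the "Changing the Mean" table; the extreme score of 40 is a made-up value added only to show how the mean is pulled toward the tail.

```python
from statistics import mean

# Weighted mean: recover each group's sum, combine sums and sizes, then divide.
m1, n1 = 16, 6
m2, n2 = 12, 10
sum1 = m1 * n1  # ΣX1 = 16 * 6
sum2 = m2 * n2  # ΣX2 = 12 * 10
weighted_mean = (sum1 + sum2) / (n1 + n2)
print("weighted mean:", weighted_mean)

# Adding or multiplying by a constant changes the mean by the same constant.
scores = [2, 4, 7, 3, 4]
print("mean of X:", mean(scores))
print("mean of X+3:", mean([x + 3 for x in scores]))
print("mean of X*3:", mean([x * 3 for x in scores]))

# A single extreme score (hypothetical value) drags the mean toward the tail.
with_extreme = scores + [40]
print("mean with extreme score:", mean(with_extreme))
```

The weighted mean lands closer to the mean of the larger group, which is what weighting by group size is meant to achieve; simply averaging 16 and 12 would ignore the difference in group sizes.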
PSYC204 – Introduction to Psychological Statistics 23 When the Mean Won’t Work We want to use the mean to summarize the data, but sometimes it doesn’t work very well, particularly when… …a distribution contains a few extreme scores (or is very skewed), the mean will be pulled toward the extremes (displaced toward the tail). PSYC204 – Introduction to Psychological Statistics 24 The Median If the scores in a distribution are listed in order from smallest to largest, the median is defined as the midpoint of the list. The median divides the scores so that 50% of the scores in the distribution have values that are equal to or less than the median. Computation of the median requires scores that can be placed in rank order (smallest to largest) and are measured on an ordinal, interval, or ratio scale. PSYC204 – Introduction to Psychological Statistics 25 The Median Usually, the median can be found by a simple counting procedure: 1. With an odd number of scores, list the values in order, and the median is the middle score in the list. !"# (the number in the th position) $ 3 5 8 10 11 “Middle” score is 8 so m = 8 PSYC204 – Introduction to Psychological Statistics 26 The Median Usually, the median can be found by a simple counting procedure: 2. With an even number of scores, list the values in order, and the median is half-way between the middle two scores. ! ! (the average of th and + 1 th numbers) $ $ 1 1 4 5 7 9 m = (4 + 5) / 2 = 4.5 PSYC204 – Introduction to Psychological Statistics 27 The precise median for a continuous variable If the scores are measurements of a continuous variable, it is possible to find the median by first placing the scores in a frequency distribution histogram with each score represented by a box in the graph. PSYC204 – Introduction to Psychological Statistics 28 The precise median for a continuous variable Then, draw a vertical line through the distribution so that exactly half the boxes are on each side of the line. 
The median is defined by the location of the line. You can also use interpolation to solve for the precise median.

Pros of the median
One advantage of the median, over the mean, is that it is less affected by extreme scores. Thus, the median tends to stay in the "center" of the distribution even when there are a few extreme scores or when the distribution is very skewed. In these situations, the median serves as a good alternative to the mean.

The Median, the Mean, and the Middle
The mean is the balance point of a distribution.
  Defined by distances
  Not necessarily at the exact center of the scores
The median is the midpoint of a distribution.
  Defined by number of scores
  Often is not the balance point of the scores
Both measure central tendency, using two different concepts of the “middle”.

The Mode
The mode is defined as the most frequently occurring category or score in the distribution.
In a frequency distribution graph, the mode is the category or score corresponding to the peak or high point of the distribution.
The mode can be determined for data measured on any scale of measurement: nominal, ordinal, interval, or ratio.

Bimodal Distributions
It is possible for a distribution to have more than one mode. A distribution with two modes is called bimodal. (Note that a distribution can have only one mean and only one median.)
In addition, the term "mode" is often used to describe a peak in a distribution that is not really the highest point. A distribution may have a major mode at the highest peak and a minor mode at a secondary peak in a different location.
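The counting procedure for the median and the definition of the mode can be tried out with Python's standard `statistics` module. This is only a sketch: the first two lists are the ones from the median slides, while `with_outlier` and `two_peaks` are made-up scores added here for illustration.

```python
from statistics import mean, median, multimode

odd_scores = [3, 5, 8, 10, 11]       # odd n: the middle score is the median
even_scores = [1, 1, 4, 5, 7, 9]     # even n: average of the two middle scores

median_odd = median(odd_scores)
median_even = median(even_scores)

# Hypothetical data: replace the largest score with an extreme value.
with_outlier = [3, 5, 8, 10, 110]
mean_outlier = mean(with_outlier)      # pulled toward the extreme score
median_outlier = median(with_outlier)  # stays at the middle score

# multimode returns every most-frequent value, so it can report two modes.
one_mode = multimode(even_scores)
two_peaks = multimode([2, 2, 3, 5, 5, 6])  # hypothetical bimodal scores

print(median_odd, median_even, mean_outlier, median_outlier, one_mode, two_peaks)
```

The outlier example shows the point made under "Pros of the median": one extreme score moves the mean a long way but leaves the median where it was.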
Mode and maps
https://www.netcredit.com/blog/most-common-name-country/

Central Tendency and the Shape of the Distribution
Because the mean, the median, and the mode are all measuring central tendency, the three measures are often systematically related to each other. In a symmetrical distribution, for example, the mean and median will always be equal.

Learning Check 2
A distribution of scores shows the mean = 31 and the median = 43. This distribution is probably _____.
A. positively skewed
B. negatively skewed
C. bimodal
D. open-ended

Skewed Distributions
The mean, influenced by extreme scores, is found far toward the long tail (positive or negative).
The median, in order to divide the scores in half, is found toward the long tail, but not as far as the mean.
The mode is found near the short tail.
If mean – median > 0, the distribution is positively skewed.
If mean – median < 0, the distribution is negatively skewed.

Thank you for your attention!

PSYC 204 Introduction to Psychological Statistics
Week 3: Variability (Ch. 4)
2024-09-12
Dr. Jens Kreitewolf
Not Dr. but Bellete Lu

Where we have been and where we are heading
We are still thinking about how to summarize data:
1. Distribution tables and visuals (Ch. 2)
2. Measures of central tendency (Ch. 3)
3. Today: How to describe how spread out the data are, measure the variability (Ch.
4)

Variability
The goal for variability is to obtain a measure of how spread out the scores are in a distribution.
A measure of variability usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores.
REALLY important, because when the mean stands alone, it doesn’t tell you much about the data.
[Figure: six-point response scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Somewhat Disagree, 4 = Somewhat Agree, 5 = Agree, 6 = Strongly Agree]

Central Tendency and Variability
Central tendency describes the central point of the distribution. Variability describes how the scores are scattered around that central point.
Together, central tendency and variability are the two primary values that are used to describe a distribution of scores.
Variability serves both as a descriptive measure and as an important component of most inferential statistics.
As a descriptive statistic, variability measures the degree to which the scores are spread out or clustered together in a distribution.

Variability to understand a population
In the context of inferential statistics, variability provides a measure of how accurately any individual score or sample represents the entire population.
When the population variability is small, all of the scores are clustered close together and any individual score or sample will necessarily provide a good representation of the entire set.
On the other hand, when variability is large and scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.

Variability can mask group differences
When groups have less variability, intervention effects are easier to detect and quantify.
Also, data with less variability make it easier to predict scores, because scores are more consistent.

Quantifying Variability
Variability can be measured with
  The range
  The standard deviation/variance
In both cases, variability is determined by measuring distance.

The Range
The distance covered by the scores in a distribution, from smallest value to largest value.
For continuous data, real limits are used:
range = upper real limit for Xmax − lower real limit for Xmin
Based on two scores, not all data.
An imprecise, unreliable measure of variability.

Defining Variance and Standard Deviation
The most common and most important measure of variability is the standard deviation.
  A measure of the standard, or average, distance from the mean
  Describes whether the scores are clustered closely around the mean or are widely scattered
  Calculation differs for populations and samples
Variance is a necessary companion concept to standard deviation, but not the same concept.
  Captures the average squared distance from the mean
The standard deviation and variance are in different metrics, but both quantify how spread out the data are.

The Standard Deviation of a Population
1. Compute the deviation (distance from the mean) for each score.
2. Square each deviation.
3.
Compute the mean of the squared deviations: sum the squared deviations (sum of squares, SS) and then divide by N. The resulting value is called the variance or mean square and measures the average squared distance from the mean.
4. Finally, take the square root of the variance to obtain the standard deviation.

Variance and Standard Deviation
Variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
Standard deviation: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$
Step 0: Collect data!
Step 1: Find mean!
Step 2: Compute deviation between each score and the mean! (Previous lecture: the mean as balance point.)
Step 3: Square all the deviations!
Step 4: Sum all the squared deviations (Sum of Squares)!
Step 5: Divide by N to get the average squared deviation (Variance)!
Step 6: Take the square root (Standard Deviation)!
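Steps 1 to 6 can be followed line by line in code. The sketch below assumes the five scores 2, 4, 7, 3, 4 used in the lecture's calculation table and treats them as a complete population; the variable names are illustrative only. The last two lines cross-check the hand calculation against Python's built-in population formulas.

```python
from math import sqrt
from statistics import pstdev, pvariance

scores = [2, 4, 7, 3, 4]                  # Step 0: collect data
N = len(scores)

mu = sum(scores) / N                      # Step 1: find the mean
deviations = [x - mu for x in scores]     # Step 2: deviation of each score from the mean
squared = [d ** 2 for d in deviations]    # Step 3: square all the deviations
SS = sum(squared)                         # Step 4: sum of squares
variance = SS / N                         # Step 5: divide by N (population variance)
sd = sqrt(variance)                       # Step 6: square root (population SD)

# Cross-check with the standard library's population formulas.
check_variance = pvariance(scores)
check_sd = pstdev(scores)

print(mu, deviations, SS, variance, round(sd, 2))
```

The deviations always sum to zero (the mean is the balance point), which is why they are squared before being averaged.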
Calculating Variance and SD Table
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$, with N = 5

 x_i    μ    x_i − μ   (x_i − μ)²
  2     4      −2         4
  4     4       0         0
  7     4       3         9
  3     4      −1         1
  4     4       0         0
  Σ             0        14

Population Variance = SS/N = 14/5 = 2.8
Population SD = Sqrt(Variance) = 1.67

Computational formulas
Instead of using the table, you can also use these formulas; they are a bit faster. See the calculation video for an example of how to use these formulas.
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
$\text{Variance} = \frac{SS}{N}$
$\text{S.D.} = \sqrt{\frac{SS}{N}}$

Population vs Sample
You (almost) never have the whole population; we were just doing that to get used to calculations.
N = number of observations in the whole population
n = number of observations in the sample
When we take a sample, the estimate of the variability will differ from the population value because of sampling error (it will tend to be lower than it should be).

Sampling Error
The discrepancy between a sample statistic and its population parameter is called sampling error.
[Figure labels: PSYC 204 class; members of a single group]

Sampling Error – Variance (and Standard Deviation)
We would systematically underestimate the population variance (and SD) based on the sample information.
To correct for this error, we use (n − 1) instead of n in the formula for the sample variance:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Calculating Variance and SD Table
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

 x_i    x̄    x_i − x̄   (x_i − x̄)²
  2     4      −2         4
  4     4       0         0
  7     4       3         9
  3     4      −1         1
  4     4       0         0
  Σ             0        14

Sample Variance = SS/(n − 1) = 14/4 = 3.5
Sample SD = Sqrt(Variance) = 1.87

Learning Check 1
A sample of four scores has SS = 24. What is the variance?
A. The variance is 6.
B. The variance is 7.
C. The variance is 8.
D. The variance is 12.

Degrees of Freedom
(n − 1) in the formula represents the degrees of freedom.
Degrees of freedom (df) are the number of observations that are free to vary when calculating an estimate from a sample.
When we estimate the mean from a sample, it becomes fixed and puts a restriction on the data we can have.
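Why dividing by n underestimates the population variance can be seen by listing every possible sample. The sketch below is an illustration under stated assumptions, not part of the lecture: it treats the five table scores as a whole population, draws every ordered sample of size n = 2 with replacement, and averages the two versions of the variance over all of those samples.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [2, 4, 7, 3, 4]        # assumed here to be the entire population
sigma2 = pvariance(population)      # population variance, SS / N

n = 2                               # illustrative sample size
# All 25 ordered samples of size 2, sampling with replacement.
samples = list(product(population, repeat=n))

# Average over all samples of SS / n (population formula applied to a sample).
avg_dividing_by_n = mean(pvariance(s) for s in samples)
# Average over all samples of SS / (n - 1) (sample formula).
avg_dividing_by_n_minus_1 = mean(variance(s) for s in samples)

# Sample variance of the table data itself: SS / (n - 1) = 14 / 4.
table_sample_variance = variance(population)

print(sigma2, avg_dividing_by_n, avg_dividing_by_n_minus_1, table_sample_variance)
```

Under these assumptions the average of the (n − 1) version matches the population variance of 2.8, while the average of the n version comes out at half of it, which is the systematic underestimate the correction removes.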
Degrees of Freedom Example
Suppose we have a sample with 3 people for some variable, X, and we calculate the mean, and it's 5.
Once you have the first two numbers in the sample, the 3rd number can only be one value; otherwise you won’t get a mean of 5. The last number is NOT free to vary, so df = 2 in this example.

Degrees of Freedom and Bias
Degrees of freedom (df) are the number of observations that are free to vary when calculating an estimate from a sample.
If we use n instead of (n − 1) when calculating the variance from a sample, it will be a little off from the true value in the population (i.e., biased).
Using (n − 1) makes the sample variance an unbiased estimate of the population variance across repeated sampling (we will discuss more later); the sample standard deviation is its square root.
We do not need to prove this correction; we just need to trust that it yields an unbiased estimate.

Bias
An estimate from a sample is unbiased if, across all possible samples, the average of the estimates equals the true population value.
An estimate is biased if on average (across all samples) the estimate is over or under the true population value.
If you use the population formula for a sample, your value of the variance will be biased.

Learning Check 2
A population has μ = 6 and σ = 2. Each score is multiplied by 10. What are the mean and standard deviation of the resulting distribution?
A. μ = 60 and σ = 2
B. μ = 6 and σ = 20
C. μ = 60 and σ = 20
D. μ = 6 and σ = 5

Characteristics of the Standard Deviation
If a constant is added to every score in a distribution, the standard deviation will not be changed.
If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location. The center of the distribution (the mean) changes, but the standard deviation remains the same.
If each score is multiplied by a constant, the standard deviation will be multiplied by the same constant.
Multiplying by a constant will multiply the distance between scores, and because the standard deviation is a measure of distance, it will also be multiplied.

The Mean and Standard Deviation as Descriptive Statistics
If you are given numerical values for the mean and the standard deviation, you should be able to construct a visual image (or a sketch) of the distribution of scores.
As a general rule, about 70% of the scores will be within one standard deviation of the mean, and about 95% of the scores will be within a distance of two standard deviations of the mean.

Thank you for your attention!

PSYC 204 Introduction to Psychological Statistics
Week 4: Z-Scores (Ch. 5)
2024-09-17
Dr. Jens Kreitewolf

Recap: Notations

                     Sample   Population
Mean                   x̄         μ
Standard deviation     s         σ
Variance               s²        σ²

Recap: Formulas
Mean: sample $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Standard deviation: sample $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$, population $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$
Variance: sample $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, population $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

Recap: Mean & Standard deviation
Consider the following sample of mid-term grades of 5 randomly selected students.

Student #   Grade
    1         67
    2         72
    3         76
    4         76
    5         84

Let’s calculate mean and standard deviation!
How do you interpret these values?
How do changes in mean and SD affect the distribution shape?
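For the five mid-term grades in the recap, the mean and the sample standard deviation can be worked out with the sample formulas (dividing by n − 1). The sketch below uses Python's `statistics` module, whose `variance` and `stdev` functions apply the sample formulas; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

grades = [67, 72, 76, 76, 84]     # the five sampled mid-term grades

x_bar = mean(grades)              # sample mean
SS = sum((x - x_bar) ** 2 for x in grades)   # sum of squared deviations
s2 = variance(grades)             # sample variance, SS / (n - 1)
s = stdev(grades)                 # sample standard deviation, sqrt(s2)

print(x_bar, SS, s2, round(s, 2))
```

One way to read the result: the grades center on 75, and a typical grade sits roughly 6 points away from that center.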
Roadmap
We are getting to the point where we can think about how to use stats to predict things!
[Figure: roadmap boxes: Variables; Frequency Tables and Graphs; Central Tendency; Variability; Z-scores and Probability]

Z-Scores and Location
By itself, a raw score or X value provides very little information about how that particular score compares with other values in the distribution.
How does my score compare to the other scores? Is it above the average?
Score = 26
So, what if I tell you that you got a 26 on the progress test?

Z-Scores and Location
Sample z-score: $z_i = \frac{x_i - \bar{x}}{s}$
Population z-score: $z_i = \frac{x_i - \mu}{\sigma}$
If we transform the raw data into z-scores, we get two nice properties:
The sign of the z-score (+ or –) identifies whether the X value is located above the mean (positive) or below the mean (negative).
The numerical value of the z-score corresponds to the number of standard deviations between X and the mean of the distribution.
Thus, a score that is located two standard deviations above the mean will have a z-score of +2.00.

Transforming Back and Forth Between X and Z
From X to Z: $z_i = \frac{x_i - \mu}{\sigma}$
From Z to X: $x_i = \mu + z_i \sigma$

Learning Check 1
A z-score of z = +1.00 indicates a position in a distribution _____.
A. above the mean by 1 point
B. above the mean by a distance equal to 1 standard deviation
C. below the mean by 1 point
D.
below the mean by a distance equal to 1 standard deviation

Learning Check 2
For a population with µ = 50 and σ = 10, what is the X value corresponding to z = 0.4?
A. 50.4
B. 10
C. 54
D. 10.4

Calculation Example
1. Calculate z-scores for the number of cups of coffee drunk today.
2. Place them in a distribution and interpret them.
$z_i = \frac{x_i - \mu}{\sigma}$
The average number of cups of coffee per day is µ = 1.75 with a standard deviation of σ = 0.85 cups (I made these up; not real population values).

Visualizing Z-Scores
Remember, z = 0 is in the center (at the mean), and the extreme tails correspond to z-scores of approximately –2.00 on the left and +2.00 on the right.
Although more extreme z-score values are possible, most of the distribution is contained between z = –2.00 and z = +2.00.

Z-Scores and Locations
The fact that z-scores identify exact locations within a distribution means that z-scores can be used as descriptive statistics and as inferential statistics.
As descriptive statistics, z-scores describe exactly where each individual is located.
As inferential statistics, z-scores determine whether a specific sample is representative of its population or is extreme and unrepresentative.

Z-Scores and Probability
We can use z-scores to start to think about the probability of a given observation.
How probable do you think a score between 120 and 125 is in this distribution?
How probable is a score around the mean?
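The two conversion formulas, from X to z and from z back to X, can be written as a pair of small functions. This is a minimal sketch: the function names `z_from_x` and `x_from_z` are illustrative, µ = 1.75 and σ = 0.85 are the made-up coffee values from the slide, and the list `cups` is hypothetical data invented here because the slide's raw scores are not in the transcript.

```python
def z_from_x(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def x_from_z(z, mu, sigma):
    """Raw score located z standard deviations from the mean."""
    return mu + z * sigma


mu, sigma = 1.75, 0.85        # made-up population values from the slide
cups = [0, 1, 2, 4]           # hypothetical numbers of cups, for illustration only

z_scores = [round(z_from_x(x, mu, sigma), 2) for x in cups]

# Going the other way: with mu = 50 and sigma = 10, z = 0.4 is 4 points above the mean.
x_value = x_from_z(0.4, 50, 10)

print(z_scores, x_value)
```

A negative z-score marks a value below the mean and a positive one a value above it, so someone who drank 4 cups would be well into the right tail of this made-up distribution.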
Z-Scores and Locations
If I tell you that my test score was half a standard deviation above the mean, what was my z-score?
If my friend tells me that they transformed their test score into a z-score and it was −2.00, how far away from the mean were they?
What if they made a mistake and it was really a +2.00 z-score; how did they do?
Z-scores connect to distribution tables and percentile ranks.

Z-Scores as a Standardized Distribution
When an entire distribution of X values is transformed into z-scores, the resulting distribution of z-scores will always have a mean of zero and a standard deviation of one.
The transformation does not change the shape of the original distribution and it does not change the location of any individual score relative to others in the distribution.
If my data were skewed, they will still be skewed…
The advantage of standardizing distributions is that two (or more) different distributions can be compared in the same metric.
For example, one distribution has μ = 100 and σ = 10, and another distribution has μ = 40 and σ = 6.
When these distributions are transformed to z-scores, both will have μ = 0 and σ = 1.
You can compare numbers in the same unit from different distributions!
Because z-score distributions all have the same mean and standard deviation, individual scores from different distributions can be directly compared.
A z-score of +1.00 specifies the same location in all z-score distributions.

Z-Scores and Samples
Populations are the most common context for computing z-scores.
It is also possible to compute z-scores for samples.
The definition of a z-score is the same for either a sample or a population, and the formulas are also the same except that the sample mean and standard deviation are used in place of the population mean and standard deviation.
Sample z-score: $z_i = \frac{x_i - \bar{x}}{s}$
Population z-score: $z_i = \frac{x_i - \mu}{\sigma}$
Using z-scores to standardize a sample also has the same effect as standardizing a population.
Specifically, the mean of the z-scores will be zero and the standard deviation of the z-scores will be equal to 1.00, provided the standard deviation is computed using the sample formula (dividing by n − 1 instead of n).

Why do we care about z-scores?
Z-scores are useful because they help us start thinking about how probable certain (intervals of) scores or samples are.
The probability of certain outcomes or samples is at the foundation of using statistical tests to answer research questions.

Z-Scores for making comparisons
All z-scores are comparable to each other.
Scores from different distributions can be converted to z-scores.
z-scores (standardized scores) allow the direct comparison of scores from two different distributions because they have been converted to the same scale.

Learning Check 3
Last week, Andy had exams in chemistry and in Spanish. On the chemistry exam (µ = 30; σ = 5), Andy had a score of X = 45. On the Spanish exam (µ = 60; σ = 6), Andy had a score of X = 65. For which class should Andy expect the better grade?
A. chemistry
B.
Spanish

Other standardized distributions based on z-scores
You can actually standardize a distribution in many ways, creating whichever new mean and SD you’d like.
Example: The IQ score is a standard score with a mean of 100 and an SD of 15.
To create a standardized distribution, you first select the mean and standard deviation that you would like for the new distribution.
Then, z-scores are used to identify each individual's position in the original distribution and to compute the individual's position in the new distribution.
Suppose that you want to standardize a distribution so that the new mean is µ = 50 and the new standard deviation is σ = 10.
An individual with z = –1.00 in the original distribution would be assigned a score of X = 40 (below µ by one standard deviation) in the standardized distribution.
Repeating this process for each individual score allows you to transform an entire distribution into a new, standardized distribution.
This is tedious and not a focus in this course.
What if I want a new scale with a mean of 50 and an SD of 10?
[Figure: axis of the new scale with tick marks at 30, 40, 50, 60, 70]

Learning Check 4
A score of X = 59 comes from a distribution with µ = 63 and σ = 8. This distribution is standardized to a new distribution with µ = 50 and σ = 10. What is the new value of the original score?
A. 59
B. 45
C. 46
D. 55

Thank you for your attention!

PSYC 204 Introduction to Psychological Statistics
Week 4: Probabilities (Ch. 6)
2024-09-19
Dr. Jens Kreitewolf

Recap: Z-Scores
We have seen that we can transform single scores or an entire distribution of scores into z-scores:
$z_i = \frac{x_i - \mu}{\sigma}$
This transform does not change the shape of the distribution.
But it changes the mean and standard deviation: $\mu_z = 0$, $\sigma_z = 1$.

What is going on in chapter 6?
We have learned how to summarize data in three ways:
1. In tables and graphs
2. Via measures of central tendency
3. Via measures of variability
We have learned how to put data into a distribution, and some properties of distributions:
1. How to describe distribution shape
2. Starting to think about how distribution shape relates to the extremity of scores
3. Starting to think about how scores are more or less likely given their position in a distribution
4. How to describe a score’s location in a distribution using a z-score
In this lecture we are going to learn how to calculate the probability of a score and how probability relates to inferential statistics.

Probability Basics
The probability of any specific outcome is determined by a ratio comparing the frequency of occurrence for that outcome relative to the total number of possible outcomes.

Ways to express probability
4 of 52 cards are Aces.
4/52 = 2/26 (the fraction)
.0769 (the proportion)
8% (the percentage, rounded)

Learning Check 1
A deck of 52 cards contains 12 royalty cards. If you randomly select a card from the deck, what is the probability of obtaining a royalty card?
A. p = 1/52
B. p = 12/52
C. p = 3/52
D. p = 4/52

Why is this relevant to statistics?

Probability
Researchers rely on probability to determine the relative likelihood for specific samples.
When we get a sample from a population, we start to think, what is the probability of getting this sample (or more extreme samples)?
We cannot predict exactly which value(s) will be obtained for a sample, but it is possible to determine which outcomes have high probability and which ones have low probability.

Probabilities of scores
Before we get to samples, let’s think about the probability of scores.
What is the probability of getting a score within one SD of the mean? What about within 3 SDs?

The Empirical Rule (rule of thumb)
This rule applies to data sets with frequency distributions that are (roughly) mound-shaped and symmetric:
Approximately 68% of the measurements will fall within one standard deviation of the mean.
  Population: $(\mu - \sigma, \mu + \sigma)$
  Sample: $(\bar{x} - s, \bar{x} + s)$
Approximately 95% of the measurements will fall within two standard deviations of the mean.
  Population: $(\mu - 2\sigma, \mu + 2\sigma)$
  Sample: $(\bar{x} - 2s, \bar{x} + 2s)$
Approximately 99.7% of the measurements will fall within three standard deviations of the mean.
  Population: $(\mu - 3\sigma, \mu + 3\sigma)$
  Sample: $(\bar{x} - 3s, \bar{x} + 3s)$

Learning Check 2
In a (roughly) mound-shaped and symmetric distribution, what is the percentage of measurements that lie between the mean and 2 SDs above the mean?
A. about 95%
B. about 34%
C. about 47.5%
D.
about 16%

The Empirical Rule
applies to mound-shaped, symmetric distributions of data (for which the mean, median, and mode are all about the same)
provides a very good approximation of the data distribution even if the distribution of the data is slightly skewed or asymmetric

The Empirical Rule – Another Example
Suppose we know that the distribution of midterm scores in a Statistics class is relatively symmetric and mound-shaped, with $\bar{x} = 75$ and $s^2 = 25$.
a) Give an interval that captures approximately 95% of the observations.
b) What can you say about the proportion of the grades that fall between 60 and 90?
c) What can you say about the proportion of measurements that are less than 65?

The Empirical Rule – Z-Scores
Since z-scores have a mean of zero and a standard deviation of one ($\mu_z = 0$, $\sigma_z = 1$), applying the empirical rule to z-scored symmetric and mound-shaped distributions yields the following properties:
Approximately 68% of z-scores fall within the interval (−1, 1).
Approximately 95% of z-scores fall within the interval (−2, 2).
Approximately 99.7% of z-scores fall within the interval (−3, 3).
|z| > 3 is considered unusual, a possible outlier.

Probability and the Normal Distribution
We can draw a line through the distribution and estimate probabilities of scores above or below the line…
The exact location of the line can be specified by a z-score.
The line divides the distribution into two sections. The larger section is called the body and the smaller section is called the tail.

A few drawing examples
A few z-scores:
  Drawing them
  Shade in percent higher and lower
A few raw scores with a certain population mean and SD:
  Convert each to a z-score
  Shade in percent higher and lower

Example
So, say I give you a Z-score and I ask you what