Anthony M. Graziano and Michael L. Raulin

Summary

This chapter provides an overview of basic statistical concepts, strategies for organizing and describing data, an introduction to the logic of statistical decision making, and a discussion of inferential statistics. Individual differences, descriptive statistics, and inferential statistics are explained in detail, with specific examples and measures.

Full Transcript

statIstIcal analysIs Of data The union of the mathematician with the poet, fervor with measure, passion with correctness, this surely is the ideal. —William James (1842–1910), Collected Essays, 1920 IndIvIdual dIfferences OrganIzIng data Frequency Distributions Nominal and Ordered Data Score Data Graphical Representation of Data descrIptIve statIstIcs Measures of Central Tendency The Cost of Neglect 1: Lies, Damn Lies, and Statistics Measures of Variability Understanding the Concept 1: Degrees of Freedom Measures of Relationship Pearson Product-Moment Correlation Other Correlations Regression Reliability Indices Standard Scores statIstIcal Inference Populations and Samples The Null Hypothesis Statistical Decisions and Alpha Levels Type I and Type II Errors InferentIal statIstIcs Testing for Mean Differences The t-Test Analysis of Variance The Power of a Statistical Test Effect Size ethIcal prIncIples summary puttIng It IntO practIce exercIses Web resOurce materIal 01 Statistical Coverage 02 Degrees of Freedom 03 Why the Variance Works 04 Pearson Product-Moment Correlation Procedures 05 Spearman Rank-Order Correlation Procedures 06 Linear Regression 07 Reliability Indices 08 The Standard Normal Distribution 09 Independent Samples t-Test Computational Procedures 10 Correlated t-Test Computational Procedures 11 ANOVA Computational Procedures 12 Effect Size 13 Statistical Theory 14 PASW for Windows Tutorial 15 PASW for Windows Inferential Statistical Analyses 16 PASW for Windows Descriptive Statistics 16 Study Guide/Lab Manual 17 Links to Related Internet Sites From Chapter 5 of Research Methods: A Process of Inquiry, Eighth Edition. Anthony M. Graziano and Michael L. Raulin. Copyright © 2013 by Pearson Education, Inc. All rights reserved. 121 STATISTICAL ANALySIS OF DATA After deciding how to measure variables in a research project, the next step is to determine how to analyze the data statistically. statistical procedures are powerful tools with two broad purposes: describing the results of a study (descriptive statistics), and helping us understand the meaning of those results (inferential statistics). Without the use of statistics, we would learn little from most studies. Statistical procedures and research design are closely related. It is early in the research process—in the procedures-design phase—that we make decisions about which statistical procedures we are going to use. That decision is an integral part of the research design, and must be made long before any data are collected. This chapter provides ■ ■ ■ ■ 01 an overview of basic statistical concepts, strategies for organizing and describing data, an introduction to the logic of statistical decision making, and a discussion of inferential statistics. We have included more detailed coverage of statistical concepts and procedures on the Student Resource Website. IndIvIdual dIfferences No two participants or groups will respond in exactly the same manner, and statistical procedures depend on that variability among participants. Suppose, for example, that a researcher predicts that people who receive memory training will perform better on a memory task than those who do not receive such training. The researcher assigns participants to one of two conditions: (1) memory training, or (2) no training. The dependent measure is a memory test that yields scores from 0 to 100. Table 1 presents hypothetical data for this memory study. 
Note that the groups differ in their mean (average) scores, but there is also considerable variability of scores within each group. The scores in Group A range from 66 to 98, and those in Group B range from 56 to 94. The variation within each group shows that there are individual differences in memory skills. Some people, with or without training, remember well; others remember very little; most people fall somewhere in between. All organismic variables studied in psychology show individual differences. Therefore, in the memory study, we cannot be sure if memory training is the reason for the observed group differences. It is possible that participants in the training group had better memory initially and would have performed better regardless of the training. Here is an important point for you to remember: most of the variables manipulated in psychology make only small differences in how people perform compared with the individual differences that already exist among people. Statistics help researchers to decide whether group differences on dependent measures are due to research manipulations or are the result of existing individual differences. Research studies generate data (the scores from each person on each of the study’s measures). Some of those data sets are large and unwieldy, and we need ways to organize them. Descriptive and inferential statistics provide this organization and complement each other. descriptive statistics summarize, simplify, and describe such large sets of measurements. Inferential statistics help us to interpret what the data mean. In the study on memory training, the means (a descriptive statistic) of the two groups are different; as predicted, the trained group shows a higher mean score 122 STATISTICAL ANALySIS OF DATA table 1 examples of descriptive statistics These hypothetical data are from 22 participants in a memory study, half of whom received memory training, and the other half of whom did not. We ordered the scores in this table from highest to lowest for easier comparison. statIstIc median mode mean grOup a (traIned) grOup b (nOn-traIned) 98 93 90 89 87 87 84 81 78 71 66 87 87 84 94 88 82 77 75 74 72 72 67 61 56 74 72 74.36 than the non-trained group. The researcher wants to know whether that difference in means is large enough to conclude that it is due to more than chance variation among participants. That is, is the difference between the groups so large that it probably did not occur by chance, but rather is a real effect? Inferential statistics help to answer such questions. Quick-Check Review 1: Individual Differences 1. What is another term for the differences among people? 2. Define “descriptive statistics” and “inferential statistics.” OrganIzIng data This section introduces two groups of descriptive procedures: (1) frequency distributions, and (2) graphical representations of data. We illustrate these procedures with the hypothetical data in Table 2, which represent responses from 24 participants, aged 18 and above, selected at random from the population of a moderate-sized city. The researchers are interested in variables that may relate to voting patterns, and they gathered data from each participant on (1) age, (2) income, (3) number of times voted in the last 5 years, (4) gender, and (5) political affiliation (coded as Democrat, Republican, or other). 
123 STATISTICAL ANALySIS OF DATA table 2 sample data from 24 participants persOn 1 age IncOme ($) number Of tImes vOted In last 5 years gender pOlItIcal affIlIatIOn 28 32,000 6 M R 2 46 50,000 4 M D 3 33 44,000 0 F D 4 40 45,000 5 M R 5 21 30,000 1 M R 6 26 35,000 0 F O 7 39 42,000 6 M O 8 23 34,000 0 F D 9 20 27,000 1 M O 10 26 31,000 2 M R 11 29 39,000 6 F R 12 24 34,000 2 M D 13 34 44,000 2 M O 14 35 45,000 3 M O 15 52 46,000 8 M O 16 31 39,000 4 F D 17 30 43,000 6 M R 18 45 47,000 7 F D 19 18 28,000 0 M O 20 29 44,000 7 M R 21 26 38,000 6 F D 22 23 37,000 3 M O 23 47 48,000 7 M D 24 53 51,000 8 M D Note: R, Republican; D, Democrat; O, other. What type of data does each of these variables generate? Consider the variables age, income, and the number of times the participant voted. Each of these measures has the property of magnitude; 34 is older than 25, $35,000 is more than $28,000, and so on. All three measures have the property of equal intervals; the difference in age between 25 and 20 is the same as the difference between 38 and 33. The variables also have a true zero point; a person whose income is zero does not earn anything; a person who has voted zero times in the last 5 years has not voted in that time. These variables are measured on ratio scales and therefore generate score data. The other two variables, gender and political affiliation, are measured on nominal scales producing nominal or categorical data; there is no meaningful way of ordering the categories in a nominal scale. 124 STATISTICAL ANALySIS OF DATA frequency distributions nominal and Ordered data. For most nominal and ordered data, statistical simplification involves computing frequencies: the number of participants who fall into each category. We organize the frequencies into frequency distributions, which show the frequency in each category. Table 3 shows the frequency distribution of gender for the data from Table 2. In any frequency distribution, when we sum across all categories, the total should equal the total number of participants. It is helpful to convert frequencies to percentages by dividing the frequency in each cell by the total number of participants and multiplying each of these proportions by 100, as was done in Table 3. cross-tabulation is a useful way to categorize participants based on more than one variable at the same time, such as categorizing participants based on gender and political affiliation. Crosstabulation can help the researcher to see relationships between nominal measures. In this example, there are two levels of the variable gender (male and female) and three levels of the variable political affiliation (Democrat, Republican, and other), giving a total of six (2 × 3) possible joint categories. We arranged the data in a 2 × 3 matrix in Table 4, in which the numbers in the matrix are the frequency of people in each of the joint categories. For example, the first cell represents the number of male Democrats. Note that the sum of all the frequencies in the six cells equals the total number of participants. Also, note that the row and column totals represent the univariate (one-variable) frequency distribution for the political affiliation and gender variables, respectively. For example, the column totals in Table 4 of 17 males and 7 females represent the frequency distribution for the single variable of gender and, not surprisingly, are the same numbers that appear in Table 3. score data. The simplest way to organize a set of score data is to create a frequency distribution. 
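To make the tallying concrete, here is a minimal Python sketch (ours, not the book's; the chapter's own analyses use PASW/SPSS) that reproduces the counts in Tables 3, 4, and 5. The lists are re-typed by hand from Table 2, and the variable names are our own.

```python
from collections import Counter

# Values re-typed by hand from Table 2 (persons 1-24, in order).
votes  = [6, 4, 0, 5, 1, 0, 6, 0, 1, 2, 6, 2, 2, 3, 8, 4, 6, 7, 0, 7, 6, 3, 7, 8]
gender = list("MMFMMFMFMMFMMMMFMFMMFMMM")      # M = male, F = female
party  = list("RDDRROODORRDOOODRDORDODD")      # D = Democrat, R = Republican, O = other

# Table 3: frequency (and percentage) of each gender.
gender_freq = Counter(gender)
for g, f in gender_freq.items():
    print(f"{g}: frequency = {f}, percentage = {100 * f / len(gender):.0f}%")

# Table 4: cross-tabulation of gender by political affiliation.
crosstab = Counter(zip(gender, party))
for g in "MF":
    row = {p: crosstab[(g, p)] for p in "DRO"}
    print(g, row, "row total =", sum(row.values()))

# Table 5: frequency distribution of the voting variable (score data).
vote_freq = Counter(votes)
for score in sorted(vote_freq, reverse=True):
    print(f"times voted = {score}: frequency = {vote_freq[score]}")
```

Summing any of the frequency columns recovers all 24 participants, a quick check that nothing was dropped in the tally.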
It is difficult to visualize all 24 scores at a glance for the voting variable shown in Table 2. Some of the participants have not voted at all during that time, and two participants voted eight times, but where do the rest of the participants tend to fall? A frequency distribution organizes the data to answer such questions at a glance. There may be no participants for some of the scores, in which case the frequency would be zero. Table 5 shows the frequency distribution for this voting variable. table 3 frequency of males and females in sample in table 2 males females tOtal frequency 17 7 24 percentage 71 29 100 table 4 cross-tabulation by gender and political affiliation males females tOtal democrats 4 5 9 republicans 6 1 7 Other 7 1 8 totals 17 7 24 125 STATISTICAL ANALySIS OF DATA table 5 frequency distribution of voting behavior in last 5 years number Of tImes vOted frequency 8 2 7 3 6 5 5 1 4 2 3 2 2 3 1 2 0 4 If there are many possible scores, then the frequency distribution will be long and almost as difficult to read as the original data. In this situation, we use a grouped frequency distribution, which reduces the table to a more manageable size by grouping the scores into intervals. A grouped frequency distribution is required with a continuous variable, in which there are theoretically an infinite number of possible scores between the lowest and the highest score. Table 6 shows a grouped frequency distribution for the continuous variable of income, which ranges from $27,000 to $51,000. Grouping salary into $2,000 intervals yields 13 intervals. table 6 grouped frequency distribution for Income 126 Interval number annual IncOme ($) frequency 1 50,000–51,999 2 2 48,000–49,999 1 3 46,000–47,999 2 4 44,000–45,999 5 5 42,000–43,999 2 6 40,000–41,999 0 7 38,000–39,999 3 8 36,000–37,999 1 9 34,000–35,999 3 1 10 32,000–33,999 11 30,000–31,999 2 12 28,000–29,999 1 13 26,000–27,999 1 STATISTICAL ANALySIS OF DATA graphical representation of data A Chinese proverb states, “one picture is worth a thousand words” (Bartlett, 1980), and this is especially true with statistical information. graphs can clarify a data set by presenting the data visually. Most people find graphic representations easier to understand than other statistical procedures. Graphs and tables are excellent supplements to statistical analyses. We can represent frequency or grouped frequency distributions graphically by using either a histogram or a frequency polygon. Figure 1 shows a histogram and a frequency polygon representing the voting data summarized in Table 5. (We generated these graphs in just a few seconds using the PASW for Windows data analysis program.) Both the histogram and the frequency polygon represent data on a two-dimensional graph, in which the horizontal axis (x-axis or abscissa) represents the range of scores for the variable and the vertical axis (y-axis or ordinate) represents the frequency of the scores. In a histogram, the frequency of a score is represented by the height of a bar above that score, as shown in Figure 1(a). In the frequency polygon, the frequency is represented by the height of a point above each score on the abscissa. Connecting the adjacent points, as shown in Figure 1(b), completes the frequency polygon. To aid in the interpretation of histograms and frequency polygons, it is important to label both axes carefully. It is possible to display two or more frequency distributions on the same graph so that one can compare the distributions. 
Each distribution is graphed independently with different colors or different types of lines to distinguish one distribution from the other. Figure 2 shows the distribution for the voting variable, graphed separately for males and females. When group size is small, a frequency polygon or histogram is usually jagged, but will have an overall shape, like those graphed in Figures 1 and 2. As group size increases, the frequency polygon looks more like a smooth curve. We often describe data by drawing smooth curves, even though such curves are seen only when the group sizes are extremely large. Figure 3 shows several smooth-curve drawings of various distribution shapes frequently found in psychology. Figure 3(a) shows a common shape for a symmetric distribution: a bell-shaped curve. Most of the participants are near the middle of the distribution. In symmetric distributions, (b) 5 4 4 3 Count Count (a) 5 2 3 2 1 0 1 0 1 2 3 4 5 6 7 # times voted past 5 years 8 0 5 1 2 3 4 6 7 # times voted past 5 years 8 histograms and frequency polygons. Graphing the distribution of scores with either a histogram or a frequency polygon helps the researcher to visualize the data. fIgure 1 127 STATISTICAL ANALySIS OF DATA 3.0 2.5 Count 2.0 1.5 1.0 0.5 0.0 0 1 2 3 4 5 6 # times voted past 5 years 7 8 fIgure 2 comparing two distributions. Graphing frequency data from two or more groups on the same histogram or frequency polygon gives a visual representation of how the groups compare. the right and left sides of the distribution are mirror images. Distributions with this bell-shape are normal distributions. Many variables in psychology form normal distributions, including measures of most human characteristics, such as height, weight, and intelligence. In skewed distributions, the scores pile up on one end of the distribution [see Figures 3(b) and 3(c)]. The tail of the curve indicates the direction of the skew. In Figure 3(b), the curve is positively skewed, with most of the scores piled up near the bottom (the tail points toward the high or positive end of the scale). Figure 3(c) is negatively skewed. We might see a negatively skewed distribution on an easy classroom test, on which almost everyone does well and only a few people do poorly. In addition to the shape of the curve, we also describe distributions in terms of the location of the middle of the distribution on the x-axis, which is called the central tendency of the distribution. We can also quantify the horizontal spread of the distribution, which is called the variability of the distribution. The visual display of quantitative information (Tufte, 2001) is an excellent book on the graphical presentation of data. Quick-Check Review 2: Organizing Data 1. What are frequency distributions? With what kind of data can we use frequency distributions? 2. Define “cross-tabulation.” 3. What is the difference between frequency and grouped frequency distributions? 4. What type of variable requires a grouped frequency distribution? 5. What are the basic shapes of distributions found in psychology? 128 STATISTICAL ANALySIS OF DATA (a) Symmetric (b) Skewed Positively (c) Skewed Negatively fIgure 3 symmetric and skewed distributions. Many measures yield the classic bell-shaped distribution shown in (a). When scores bunch up at either the bottom (b) or top (c) of the distribution, the distributions are skewed. descrIptIve statIstIcs Descriptive statistics serve two purposes. 
The first is to describe data with just one or two numbers, which makes it easier to compare groups. The second is to provide a basis for later analyses using inferential statistics. This section covers measures of central tendency, variability, and relationship and introduces the concept of standard scores. measures of central tendency measures of central tendency describe the typical or average score. They indicate the center of the distribution, where most of the scores cluster. Table 7 summarizes three measures of central tendency: mode, median, and mean. The mode is the most frequently occurring score in the distribution. In the example shown in Table 1, the modes are 87 and 72 for Groups A and B, respectively. In a frequency distribution like the one in Table 5, we can determine the mode by finding the largest number in the frequency column and noting the score with that frequency. In Table 5, the mode is 6. A distribution may have more than one mode. If there are two, then the distribution is bimodal; if there are three, it is trimodal. The mode has the advantage of being easy to compute, but it has the disadvantage of being unstable, which means that it can be affected by a change in only a few scores. We can use the mode with all scales of measurement. table 7 measures of central tendency mode Most frequently occurring score in a distribution median Middle score in a distribution; the score at the 50th percentile mean Arithmetic average of the scores in a distribution; computed by summing the scores and dividing by the number of scores 129 STATISTICAL ANALySIS OF DATA A second measure of central tendency is the median—the middle score in a distribution. The median is also the 50th percentile, which means that half the scores fall below the median. We can easily compute the median if there are few scores and they are ordered from lowest to highest. With an odd number of scores, the median is the (N + 1)/2 score, in which N is the number of scores. In Table 1, there are 11 scores. Therefore, the sixth score [(11 + 1)/2] will be the median. The sixth score in a group of 11 scores will be exactly in the middle, with 5 scores above it and 5 scores below it. When there is an even number of scores, there will be two middle scores; the median is the average of the two middle scores. In Table 1, the median for Group A is 87; in Group B, it is 74. The median can be appropriately used with ordered and score data, but not with nominal data. (To the student: See if you can figure out why the median is not appropriate for nominal data.) The most commonly used measure of central tendency is the mean—the arithmetic average of all of the scores. We compute the mean by summing the scores and dividing by the number of scores as follows: Mean = X = aX N (1) The term X (read “X bar”) is the notation for the mean. The term a X (read “sigma X”) is summation notation and simply means to add all the scores. The mean is appropriate only with score data. (To the student: Why is this so?) Researchers frequently use the mean and the median to describe the average score. The median gives a better indication of what the typical score is if there are a few unusually high or low scores in the distribution, as discussed in Cost of Neglect 1. The mean, on the other hand, is more useful in other statistical procedures, such as inferential statistics. 
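These three measures are straightforward to compute. The short sketch below is only an illustration, assuming Python's statistics module rather than the PASW software the chapter uses; the scores are those of Table 1.

```python
import statistics

# Group A (trained) and Group B (non-trained) scores from Table 1.
group_a = [98, 93, 90, 89, 87, 87, 84, 81, 78, 71, 66]
group_b = [94, 88, 82, 77, 75, 74, 72, 72, 67, 61, 56]

for name, scores in [("Group A", group_a), ("Group B", group_b)]:
    mode = statistics.mode(scores)        # most frequent score
    median = statistics.median(scores)    # middle score (50th percentile)
    mean = statistics.mean(scores)        # arithmetic average
    print(f"{name}: mode = {mode}, median = {median}, mean = {mean:.2f}")

# Expected output (matching the summary rows of Table 1):
# Group A: mode = 87, median = 87, mean = 84.00
# Group B: mode = 72, median = 74, mean = 74.36
```

Notice that Group A's mean (84) sits below its median and mode (87), a small example of how the measures can disagree; the box that follows pushes that kind of disagreement to an extreme.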
the cOst Of neglect 1: lIes, damn lIes, and statIstIcs Mark Twain talked about “lies, damn lies, and statistics,” and his attitude strikes a responsive chord in many people. However, statistics are not inherently deceptive. Nevertheless, if you do not understand statistics, it is easy for someone to deceive you by selecting those statistics that make their case and ignoring the ones that contradict it. Let’s play with some numbers to show how easy this is. Imagine a five-person company in which everyone, including the owner, makes an annual salary of $40,000. The mean, median, and mode are all $40,000. Now suppose that business picks up dramatically and profits soar. The owner of the company decides to take all the additional profit, giving none of it to the employees. So now four people make $40,000, and the owner makes $340,000. Table 8 illustrates these data. 130 table 8 average Incomes example fIrst year ($) secOnd year ($) change (%) 40,000 40,000 0 40,000 40,000 0 40,000 40,000 0 40,000 40,000 0 40,000 340,000 750 The mode (most frequent salary) is still $40,000, and the median (the middle salary) is still $40,000. However, the mean is now $100,000 ($500,000/5). The third column of the table reflects the percentage change in the salary for each employee. Both the mode and the median for these salary increases is 0%, but the owner STATISTICAL ANALySIS OF DATA received 750%. If you compute the mean salary increase, you get 150%, which hardly reflects the typical situation in this company. Because of all this new business, the owner wants to hire new people to continue the growth. To entice new people, the owner offers a starting salary of $30,000, but tells prospective employees that there is plenty of room for advancement, noting that the mean salary is $100,000 and that the average percentage increase in salary in the past year was 150%. Those statistics are all accurate, but do they lie? The answer is actually no. The owner may be lying by presenting a misleading selection of statistics, but the statistics themselves are true. Statistics don’t lie to people who (1) have all the relevant statistics and (2) know how to interpret them. If you apply for a job and are told the mean income for the company, you should ask what the median income is. Almost every company has a few critical people who earn more than most of the rest of the company’s employees, so the median will give you a better idea than the mean what the typical salary is. If you are told that the mean salary increase last year was 150%, you might want to ask if that was across the board (meaning everyone got a 150% increase). If not, you might ask what the median increase was. For this hypothetical company, the median increase was 0%. If you insist on having all the statistics, you cannot be lied to unless the statistics were deliberately falsified. In that case, it is not the statistics that are lying, but rather the statistician who falsified the statistics. measures of variability In addition to measures of central tendency, it is important to determine the variability of scores. We illustrate the concept of variability in Figure 4, which shows two distributions with identical means. However, curve A is narrower; that is, the scores are bunched closer together. They are less variable than the scores of curve B. For example, suppose that you compared the ages of people who attend county fairs with the ages of those who attend pop music concerts. 
you would probably find that those attending county fairs range from infants to people over 90, whereas pop concert attendees are mostly in their teens and twenties, with few young children or people over 30. Clearly, there is far more age variability at a county fair than at a typical pop concert. Variability is an important concept and an easy one to understand. Participants differ from one another on many variables. For some variables, there are large differences among participants; Frequency Curve A Curve B 0 5 10 15 20 Score two distributions with the same mean but different variances. Although both of these distributions have the same mean, they differ in their variability. fIgure 4 131 STATISTICAL ANALySIS OF DATA table 9 measures of variability range Distance from the lowest to the highest score in a distribution; may be specified by either giving both the lowest and highest scores or by subtracting the lowest from the highest score and reporting this value average deviation Arithmetic average of the distance that each score is from the mean variance Essentially the average squared distance from the mean; the variance is computed by summing the squared distances from the mean and dividing by the degrees of freedom (equal to the number of scores minus 1) standard deviation The square root of the variance for other variables, the differences are small. There may be many reasons why scores vary among participants, but you need not worry about the reasons at this point. The important ideas to remember are that scores vary and that we can quantify the degree of variability. Individuals differ from one another on many factors, and these differences affect their responses to stimuli. This variability among participants (thought of as natural variability) often masks the effects of the psychological variables under study. Most research designs and statistical procedures were developed to control or minimize the effects of the natural variability of scores. Table 9 summarizes the measures of variability. The simplest measure of variability is the range, the distance from the lowest to the highest score. The range is easy to compute, but is unstable because it depends on only two scores (the highest and lowest), and therefore a single unusually high or low score can dramatically affect the range. For example, the scores for curve A in Figure 4 range from 3 to 17 (a range of 14), and the scores for curve B range from 1 to 19 (a range of 18). However, if one more score were added to curve A (a score of 21), the ranges for curves A and B would be equal. Note, however, that even with the addition of this one deviant score, the scores are more tightly clustered (less variable) in curve A than in curve B. A better measure of variability is the variance. The variance utilizes all of the scores, instead of just the lowest and highest scores. Furthermore, it has statistical properties that make it useful in inferential statistics. To begin our discussion of variance, suppose that you have a set of scores, and you have calculated the mean of this set. Now suppose that you ask a reasonable question about variability: On average, how much do the scores in this set differ from the mean of the set? It is a simple matter to find this value; just subtract the mean from each score (called the deviation), add up these deviations (ignoring the + and − signs), and find their average by dividing the sum of the deviations by the number of scores (called the average deviation). 
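This deviation logic can be mirrored in a few lines of code. The sketch below is ours and uses the five scores worked through in Table 10 (shown shortly); it also anticipates the variance and standard deviation defined in the next few paragraphs.

```python
import math

# The five scores used in the worked example of Table 10.
scores = [10, 7, 8, 5, 10]
n = len(scores)
mean = sum(scores) / n                                    # 8.0

deviations = [x - mean for x in scores]                   # signed deviations sum to zero
average_deviation = sum(abs(d) for d in deviations) / n   # 1.6

sum_of_squares = sum(d ** 2 for d in deviations)          # SS = 18.0
variance = sum_of_squares / (n - 1)                       # SS / df = 4.5
standard_deviation = math.sqrt(variance)                  # about 2.12

print(mean, average_deviation, variance, round(standard_deviation, 2))
```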
The scores in Table 10 differ from the mean by an average of 1.6 units. The plus or minus sign is ignored when adding the deviations because, if the sign is not ignored, the average deviation from the mean will always be zero, no matter how variable the scores. The average deviation is included here to help explain the concept of deviation. We never use it in statistical analyses, because it lacks the statistical qualities that would make it useful. Instead, the variance and standard deviation are used, both of which use the same concept of variability of scores from the mean. (Be sure you understand what deviation scores are; if it is not clear, go over that material again.) 132 STATISTICAL ANALySIS OF DATA table 10 computing measures of variability Compute the average deviation, variance, and standard deviation for the following data. 1. start by setting up three columns labeled X, | X - X |, and 1 X - X 2 2. tOtals X |X - X| 10 2.0 7 1.0 8 0.0 0.00 5 3.0 9.00 10 2.0 4.00 40 8.0 18.00 1X - X22 4.00 1.00 2. compute the mean. X = 40 aX = = 8.0 N 5 3. compute the average deviation. [note that | X - X | means to take the absolute value of the difference, which means that you should ignore the direction of the difference in computing this value.] Average Deviation = 8.0 a|X - X| = = 1.60 N 5 4. compute the variance. s2 = 2 18.00 a 1X - X2 = = 4.50 N - 1 5 - 1 5. compute the standard deviation. s = 2s 2 = 24.50 = 2.12 We calculate the variance by squaring the deviations of the scores from the mean to make them all positive. Therefore, the variance is essentially the average squared deviation of each score from the mean. The notation s2 refers to variance. The equation for variance is s2 = 02 2 SS 1Sum of Squares2 a (X - X) = df 1Degrees of Freedom2 N - 1 (2) The variance equals the sum of the squared differences of each score from the mean (called the sum of squares) divided by the number of scores (N) minus 1 (called the degrees of freedom). The degrees of freedom is an important concept in statistics, referring to the number of scores that are free to vary. Understanding the Concept 1 explains the idea of degrees of freedom. We provide a more detailed discussion of degrees of freedom on the Student Resource Website. 133 STATISTICAL ANALySIS OF DATA understandIng the cOncept 1: degrees Of freedOm “Degrees of freedom,” a basic statistical concept used in many statistical computations, refers to the number of scores that are free to vary. Suppose that someone asks you to pick any three numbers. There are no restrictions, and the numbers are completely free to vary. In standard terminology, there would be three degrees of freedom; that is, three numbers are free to vary. Suppose someone now asks you to choose any three numbers, but they must total 15; in this case, there is one restriction on the numbers. Because of the restriction, you will lose some of the freedom to vary the numbers that you choose. If you choose the numbers 8 and 11 as the first two numbers, the third number must be −4. Two numbers are free to vary, but one is not. In standard terminology, there are two degrees of freedom. In comparison to the first example, in which there were no restrictions and all the numbers were free to vary, we have lost one degree of freedom. 03 Now suppose that you are to choose three scores in which (1) the total must be 15, and (2) the first score must be 7. Note that there are two restrictions placed on this set of scores. 
Consequently, two degrees of freedom have been lost, leaving only one degree of freedom. The only score that can vary freely is the second score. In statistics, the restrictions imposed on data are not arbitrary as they were in these examples. Instead, they are determined by the demands of the statistical procedures. For example, many statistical procedures require that we estimate values, such as the population mean. These estimates constitute restrictions. The more such restrictions there are, the more degrees of freedom we lose. In the computation of the variance, one such restriction is imposed and, consequently, the degrees of freedom are reduced by one. Hence, the denominator is N − 1. To use equation (2) 1. compute the mean; 2. subtract the mean from each score and square this difference; 3. sum the squared differences to calculate the sum of squares; [the sum of squares (SS) is short for “the sum of squared deviations from the mean.”] 4. divide the sum of squares (SS) by the degrees of freedom (N – 1) to obtain the variance. Table 10 shows this computation. The variance is an excellent measure of variability and is used in many inferential statistics. The Student Resource Website explains why the variance, which is not an intuitive measure of variability for most students, is used so frequently in inferential statistics. Note that the variance is expressed in squared units because we squared the deviation scores before summing them. In contrast, the mean is expressed in the original units of the variable. We can easily transform the variance back into the same units as the original scores by computing the standard deviation. The standard deviation (written s) is equal to the square root of the variance. (Note that the variance and standard deviation can be used only with score data.) s = 2s 2 = 2Variance (3) measures of relationship At times, we want to quantify the strength of the relationship between two variables, which indicates the degree to which the two scores tend to covary (vary together). The best way to index the relationship between variables is with a correlation coefficient, also referred to as a correlation. There are different correlation coefficients for different types of data. 134 STATISTICAL ANALySIS OF DATA 60 50 Age (in years) 04 pearson product-moment correlation. The pearson product-moment correlation is the most widely used correlation, but it is appropriate only for score data. The Pearson product-moment correlation can range from −1.00 to +1.00. A correlation of +1.00 means that the two variables are perfectly related in a positive direction; as one variable increases, the other variable also increases by a predictable amount. A correlation of −1.00 represents a perfect negative relationship; as one variable increases, the other decreases by a predictable amount. A correlation of zero means that there is no relationship between the variables. The size of the correlation coefficient indicates the strength of the relationship. For example, a correlation of .55 indicates a stronger relationship than a correlation of .25, and a correlation of −.85 indicates an even stronger relationship. Remember, the sign of the correlation indicates only the direction of the relationship and not its strength. The standard notation for correlation is r. Thus, the correlations above would be noted as r = .55, r = .25, and r = −.85. The Pearson productmoment correlation is covered in more detail on the Student Resource Website. 
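Computing the Pearson correlation from its definition takes only a few lines. The sketch below is ours rather than the book's (the book leaves the computational procedure to the Student Resource Website and PASW); the helper name pearson_r is our own, and the age and income values are re-typed from Table 2.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))      # sum of cross-products
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Age and income for the 24 participants, re-typed from Table 2.
age = [28, 46, 33, 40, 21, 26, 39, 23, 20, 26, 29, 24,
       34, 35, 52, 31, 30, 45, 18, 29, 26, 23, 47, 53]
income = [32000, 50000, 44000, 45000, 30000, 35000, 42000, 34000,
          27000, 31000, 39000, 34000, 44000, 45000, 46000, 39000,
          43000, 47000, 28000, 44000, 38000, 37000, 48000, 51000]

print(round(pearson_r(age, income), 2))
```

For these data the result is a strong positive correlation (roughly r = .88), consistent with the upward-sloping pattern discussed next for the scatter plot in Figure 5.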
The Pearson product-moment correlation is an index of the degree of linear relationship between two variables. What this means is best illustrated by examining a scatter plot, which is a graphic technique used to represent the relationship between two variables. To construct one, we label the standard x- and y-axes with the names of the two variables. We have created a scatter plot in Figure 5 for the relationship between age and income using data from Table 2. As indicated in the figure, participant 1 is 28 years old and earns $32,000 a year. The point that represents participant 1 is directly above $32,000 on the x-axis and directly across from 28 on the y-axis. To complete the scatter plot, we plot each person’s set of scores in the same way. The pattern of scores in the scatter plot is informative. For example, the people with the highest incomes are all older; younger people tend to have lower incomes. We could draw a straight line through the middle of the dots from the lower left to upper right of the graph, with most of the dots falling close to that line. This is a good example of a linear relationship because the points in this scatter plot cluster around a straight line. It is a positive correlation because incomes are higher for older participants. 40 Participant 1 30 20 $25,000 $30,000 $35,000 $40,000 $45,000 $50,000 Annual Income scatter plot for age and salary. We construct a scatter plot by graphing each person’s data point, which is determined by the two scores for that person. fIgure 5 135 STATISTICAL ANALySIS OF DATA It is not a perfect correlation. In a perfect correlation (r = 1.00), all the dots form a straight line, as seen in Figure 6(a). The scatter plots in Figure 6 illustrate several types of relationships. Figure 6(b) illustrates a strong negative correlation (r = −.92). Note that the points cluster close to a straight line. Figure 6(c) illustrates a zero correlation (r = .00). Figure 6(d) illustrates a nonlinear relationship, in which the correlation coefficient does not represent the data well. In fact, in this case, the correlation (r = −.03) is misleading. The near-zero correlation suggests there is no relationship between the variables, but the scatter plot indicates there is a relationship; it is just not a linear (straight-line) relationship. This is one reason why it is advisable to create a scatter plot to see how the scores cluster, instead of relying on a single number (the correlation coefficient) to summarize the relationship between variables. With modern computer packages, it takes just a few seconds to create a scatter plot. Other correlations. If either or both variables are measured on an ordinal scale and neither variable is nominal, the appropriate coefficient is the spearman rank-order correlation. If both of the variables produce nominal data, the appropriate coefficient is phi. We interpret the Spearman correlation like the product-moment correlation: a correlation of −1.00 is a perfect negative relationship; a correlation of +1.00 is a perfect positive relationship; a correlation of zero means that no linear relationship exists. We interpret Phi a bit differently because Y (a) Y 6 6 3 3 (b) r = 1.00 r = –.92 X Y (c) X Y 6 6 3 3 (d) r = –.03 r = .00 X 3 6 9 X scatter plots and regression lines. you cannot always tell what the relationship between two variables is like from the correlation. A scatter plot allows you to see the relationship, including such complex relationships as the one shown in (d) (see text). 
fIgure 6 136 STATISTICAL ANALySIS OF DATA 05 there really is no direction to a correlation between nominal variables, which have no natural ordering. Therefore, we interpret only the size of the correlation, again with a correlation of 1.00 indicating a perfect relationship and zero indicating no relationship. The Spearman rank-order correlation is covered in more detail on the Student Resource Website. 06 regression. Correlation coefficients quantify the degree and direction of relationship between variables. Finding such relationships is a major goal of science. Another goal is to make predictions about events. The correlation coefficient is an important part of this, because a strong relationship between two variables provides information that will help to predict one variable by knowing the values of the other. For example, if there is a correlation between test scores and later job performance, then we have information that may help us to predict future job performance. “regression” refers to the prediction of the value of one variable from the value of another. We typically assume a linear or straight-line relationship. Nonlinear regression is possible, but the applicable procedures are well beyond the level of this text. you may have noticed that we drew a line in each of the scatter plots in Figure 6. This line is the linear regression line for predicting the variable Y from the variable X. In Figures 6(a) and 6(b) the points cluster close to the line, suggesting a strong linear relationship. When the correlation is zero, as in Figure 6(c), the line is horizontal. In Figure 6(d), the regression line, like the correlation, is misleading in that it does not reflect the data well. Statistical analysis packages can compute a regression line easily for any data set. However, you should always request a scatter plot so that you can see how well the data fit a straight-line function. The Student Resource Website covers both the theoretical aspects of regression and the computational procedures. 07 reliability Indices. Correlation coefficients quantify test-retest and interrater reliability. Since these reliability indices are correlations, they behave like any other correlation. They range from a −1.00 to a +1.00, although negative correlations for reliability are unlikely unless something is seriously amiss, such as raters using different rating scales. A correlation of +1.00 indicates perfect reliability, and a correlation of 0.00 indicates no reliability. The internal consistency reliability index, called coefficient alpha, is also a correlation coefficient, although a much more complicated one than those covered in this chapter. Coefficient alpha is an index of the degree of intercorrelation among the items in a measure. The more highly correlated the items are with one another, the higher the coefficient alpha. The Student Resource Website covers the computation and interpretation of the various reliability indices. standard scores The standard score (written Z; also called the Z-score) is a transformation frequently used in research. We compute the standard score by subtracting the mean from the score and dividing the difference by the standard deviation, as shown in equation (4). The standard score is a relative score because it tells how a participant scored relative to the rest of the participants. If the participant scores above the mean, the standard score is positive; if the participant scores below the mean, the standard score is negative. 
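As a quick illustration of this standardization (written out as equation (4) just below), here is a minimal sketch; the helper name z_score is ours, and the scores are Group A's from Table 1.

```python
import statistics

def z_score(x, scores):
    """Standard score: how many standard deviations x lies above or below the mean."""
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)      # sample standard deviation (n - 1 in the denominator)
    return (x - mean) / s

group_a = [98, 93, 90, 89, 87, 87, 84, 81, 78, 71, 66]   # Group A scores from Table 1
print(round(z_score(98, group_a), 2))    # highest scorer, about +1.48
print(round(z_score(66, group_a), 2))    # lowest scorer, about -1.90
```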
The size of the standard score indicates how far from the mean the participant scored. Z = X - X s (4) 137 STATISTICAL ANALySIS OF DATA 08 Many tests convert the standard score to avoid negative numbers and decimals. For example, the standard score on an intelligence test is converted into an IQ by multiplying the standard score by 15, adding 100, and rounding to the nearest whole number, producing an IQ distribution with a mean of 100 and a standard deviation of 15. The SAT uses a similar conversion to give it a mean of 500 and a standard deviation of 100 for each subtest. If the distribution is approximately normal, we can easily convert the standard score into a percentile rank. A person’s percentile rank tells what percent of the group scored below the person. The details of this transformation are included on the Student Resource Website. Quick-Check Review 3: Descriptive Statistics 1. 2. 3. 4. 5. 6. 7. 8. What are the three measures of central tendency? What are deviation scores? What are the chief measures of variability? Define “correlation” and “regression.” How are these used? What is a standard score, and how is it computed? Why is the variance a better measure of variability than the range? Why is the mean a better measure of central tendency than the mode? How does a correlation differ from other descriptive statistics? statIstIcal Inference Using statistics to describe data is the first step in analyzing the results of a research study. The rest of the analysis concerns not so much the specific participants tested, but what the data indicate about a larger group. This section will cover the logic of this process, and the next section will cover the statistical procedures. populations and samples It is seldom possible to observe whole populations, unless the population is extremely small, like an endangered species that lives in only one specific place. Instead, we select and observe relatively small samples from relatively large populations. The researcher uses a sample as if it adequately represents the population. In human research, a population is the larger group to which all the people of interest belong, and a sample is a subset of that population. For example, a researcher might select a sample of 100 high school students from the school’s total population of 1,000 high school students. Here’s an important distinction to keep in mind: we carry out research projects on samples; but we generalize the results to populations. It is the population that we are interested in, not the sample. We want to draw conclusions about a population based on samples from that population, but how accurate are the samples? How confident can we be in generalizing results from samples to the population? The use of inferential statistics helps us to solve that issue. No two samples drawn from the same population will be exactly alike. Most samples are reasonably representative of the population from which they are drawn, but sometimes samples are 138 STATISTICAL ANALySIS OF DATA unrepresentative, even though the researcher carried out the sampling procedure flawlessly. This normal variation among different samples drawn from the same population is called sampling error. This term does not suggest a mistake, but refers to the natural variability among samples due to chance. 
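Sampling error is easy to see in a small simulation. The sketch below is ours, not the text's: it builds one hypothetical "population" of reaction times and draws several random samples from it, so the only source of variation among the sample means is chance.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# A hypothetical population of 10,000 reaction times (seconds), mean about 0.26, SD about 0.05.
population = [random.gauss(0.26, 0.05) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw several random samples of 25 and compare their means.
for i in range(5):
    sample = random.sample(population, 25)
    print(f"sample {i + 1}: mean = {statistics.mean(sample):.3f}")

print(f"population mean = {population_mean:.3f}")
```

The sample means hover near the population mean but are never identical to it or to one another; that chance scatter is exactly what inferential statistics must take into account.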
Because samples are not perfectly representative of the population from which they are drawn, we cannot be sure that conclusions drawn from samples will apply to the entire population; the best we can do is to calculate probabilities that our inferences are valid. probability provides a numerical indication of how likely it is that a given event, as predicted by our findings, will occur. Probability is a critical concept in inferential statistics, in which the goal is to help us to differentiate between “chance” patterns (like differences and relationships) that are due to sampling error, and “real” patterns that are due to other factors (like our research variables). Suppose that you are comparing men and women in a study of gender differences in reaction time by recording how quickly participants press a button in response to an auditory signal. The mean reaction times are 0.278 seconds for men and 0.254 seconds for women. The sample means are different, but not very different. However, you are not primarily interested in the samples, but in the population of men compared with the population of women. The aim is to draw conclusions about characteristics of the populations from the results of the samples. Does the observed difference in mean reaction time in these samples suggest that a similar difference exists between the populations, or could the observed difference be a result of sampling error? Suppose that men and women have the same reaction time. In that case, samples drawn from the populations should have approximately equal mean reaction times. Are the mean reaction times of 0.278 seconds and 0.254 seconds approximately equal? Are they close enough to infer that the population means, which are unknown, are equal? Answering these questions involves testing the null hypothesis. the null hypothesis The null hypothesis states that there is no statistical difference between the population means. Null is from the Latin nullus, meaning “not any.” If the observed sample means were very different, we would reject the null hypothesis (i.e., of no difference) and conclude that the population means are not equal. The question is, “How different is very different?” The use of inferential statistics gives a probabilistic answer to this question. statistical decisions and alpha levels Researchers use inferential statistics to compute the probability of obtaining the observed data if the null hypothesis is true. If this probability is large, then the null hypothesis is likely to be true. Alternatively, if this probability is small, the null hypothesis is likely to be false. In that case, the researcher would say that the results are statistically significant. To say a finding is statistically significant means that it is unlikely that the finding is due to chance alone. An arbitrary cutoff point called the alpha level (written ~) is used for making this decision.i Traditionally, researchers set alpha to a small value, such as 0.05 or 0.01. Referring back to the example will help to clarify these important concepts. Let’s walk through the steps. iAlpha, as used here, is entirely different from the reliability index of coefficient alpha. It is a historical accident that the first letter of the Greek alphabet was used for these two different statistical concepts. 139 STATISTICAL ANALySIS OF DATA 1. The researcher is interested in the reaction time of men and women. 2. The null hypothesis states that the mean reaction times of the two populations do not differ. 3. 
The inferential statistical procedure evaluates the size of the observed mean difference between the samples. 4. If the sample means are so different that it is unlikely that the samples could have come from populations with equal means, then we reject the null hypothesis and we infer that the population means must be different. type I and type II errors The alpha level guides our decisions about the null hypothesis. When the probability is greater than the alpha level, we retain the null hypothesis; when the probability is equal to or less than the alpha level, we reject the null hypothesis. Of course, there is always the chance that a researcher’s decision will be wrong. For example, a researcher might reject the null hypothesis and conclude that the population means are not equal when they actually are equal. In this case, the researcher has made a type I error. (In a Type I error, we conclude that there is a statistically significant difference when there actually is not.) The probability of this error is equal to the alpha level that the researcher selects. The alpha level is the proportion of Type I errors that we can expect to make if we repeat the study many times. If an alpha of .05 is used, Type I errors will occur 5% of the time. If the alpha is .01, Type I errors will occur 1% of the time. If alpha is the level of Type I error and the researcher decides what alpha to use, why not set alpha to zero to avoid all Type I errors? The reason is that there is another possible error, known as a Type II error. A type II error occurs when we fail to reject the null hypothesis when it is false; that is, when we conclude the population means are equal when they are not. (In a Type II error, we conclude there is no significant difference when there actually is.) The term “beta” (b) refers to the probability of making a Type II error. Researchers want to avoid both errors, but because they can never be sure of the real state of nature, there is always the chance for error in the decision. Decreasing the Type I error rate, without doing anything else, will automatically increase the Type II error rate. Therefore, we must balance these two errors against one another. The researcher handles this by selecting a Type I error rate and adjusting the sample size to control the Type II error rate. The procedures for determining the sample size to get a specific Type II error rate are beyond the scope of this text, but we will discuss the principles shortly in our discussion of power. Table 11 summarizes the definitions of Type I and Type II errors. table 11 type I and type II errors researcher’s decIsIOn true state Of nature reject the null hypOthesIs retaIn the null hypOthesIsa null hypothesis is true Type I error Correct decision null hypothesis is false Correct decision Type II error aTechnically, we never actually accept the null hypothesis. Instead, we “retain” or “fail to reject” the null hypothesis. The interested student should consult an introductory statistics textbook for the reasoning behind this subtle distinction. 140 STATISTICAL ANALySIS OF DATA Quick-Check Review 4: Statistical Inference 1. Distinguish populations from samples. 2. What is sampling error? 3. Define “alpha level,” “Type I error,” and “Type II error.” InferentIal statIstIcs We use inferential statistics to draw inferences about populations based on samples drawn from those populations. 
This section introduces the most common inferential statistics and several statistical concepts, including statistical power and effect size. testing for mean differences Researchers use inferential statistics most frequently to evaluate mean differences between groups. The two most widely used procedures are the t-test and the analysis of variance. 09 10 11 the t-test. The most commonly used procedure for comparing two groups is the t-test. The null hypothesis for the t-test states that there is no difference between the two population means; that is, the observed difference between the sample means is due only to sampling error. The general procedure in inferential statistics is to compute the test statistic (t in this case) and the probability (p-value) of obtaining this value of the test statistic if the null hypothesis is true. If the p-value is less than alpha, we reject the null hypothesis and conclude that the population means are different. There are two different types of t-tests. Which one you should use will depend on the research design. The procedures for computing these t-tests are included on the Student Resource Website. analysis of variance. We use an analysis of variance (anOva) to test for mean differences between two or more groups. The term “analysis of variance” is confusing because the test actually compares the group means, but it compares these means by computing and comparing different population variance estimates. Explaining how this is accomplished is beyond the scope of this chapter. ANOVA is one of the most flexible analysis procedures used in psychology. There are many variations. Which variation of ANOVA you should use will depend on the research design. The Student Resource Website shows how to compute these inferential statistics either manually or by using the PASW statistical software program. the power of a statistical test The term “power” or “statistical power” refers to a statistical procedure’s sensitivity to mean differences. Power is the capability of correctly rejecting the null hypothesis when it is false. It is in the 141 STATISTICAL ANALySIS OF DATA researcher’s interest to increase the power of the tests (i.e., the tests’ ability to recognize when the means differ). Note that if you increase power, you are decreasing the Type II error level. The primary way to increase power is to increase sample size. The sample size needed to achieve a specified level of power can be computed based on pilot data as part of the proceduresdesign phase, a process called power analysis (Cohen, 1992). Most graduate-level statistics textbooks cover this procedure. Power is not a function of the statistical procedure or sample size alone, but also depends on the precision of the research design. To understand power, you need to know that any improvement in a research design that increases sensitivity will increase power. This includes sampling more precisely, using more precise measures, using better standardization of procedures, and controlling individual differences through the choice of a research design. effect size The primary way to increase power, which is the same as decreasing Type II error, is to increase the sample size. In theory, you can increase the sample size of a study to a level that will detect even the most trivial difference between two populations. you might think that such action would be desirable because the researcher wants to find statistically significant differences in the research, but consider the following. 
Suppose research on obesity compares an experimental group that attends a weight-reduction program with a no-treatment control group. After 6 months, the treated group has lost a mean of 2.2 pounds and the control group gained 0.2 pounds, and this difference is statistically significant. However, the effect of that weight-reduction program is so small as to be trivial. Although it is statistically significant, is that small weight loss after 6 months of intense effort of any personal significance? Most dieters would say no. It has become a standard procedure for researchers to go a step beyond statistical significance testing and to compute the effect size of an experimental manipulation. This is a measure of the size of the difference between the group means expressed in standard deviation units, as shown in equation (5). For example, an effect size of 0.5 says the mean difference is 0.5 times the standard deviation. Cohen (1992) proposed a scale for the size of the effect (small, 0.2; moderate, 0.5; or large, 0.8) to guide researchers in interpreting the calculated effect. Effect Size = 12 142 X1 - X2 Average s (5) Now, a little test. Think this through: suppose that research on three different treatments to reduce phobic anxiety shows that all three are effective and the results are statistically significant. The study also reports effect sizes of 0.1, 0.2, and 0.8. What does that information suggest about the three treatments? As effect size increases—that is, as the difference between populations means increases— power increases. The reason for the increase in power is that it is easier to detect large differences in population means than small differences. Think of it this way: it would be easier to find a shovel in a haystack than a needle, because the shovel is larger and therefore easier to detect. The same is true of effect sizes. If an independent variable has a small effect, it is like looking for a needle in a haystack, but large effect sizes stand out more readily and therefore are easier to detect statistically. The details of computing the effect size are covered on the Student Resource Website. STATISTICAL ANALySIS OF DATA 13, 14, 15, 16 The Student Resource Website also has a series of tutorials on statistical theory, using PASW for Windows, and computing descriptive and inferential statistics using PASW for Windows. These resources give you maximum flexibility in deciding how much statistical detail you want to cover. Quick-Check Review 5: Inferential Statistics 1. Which statistical tests evaluate mean differences? Under what conditions should each of these statistical tests be used? 2. What is power analysis and why is it important? 3. What is effect size? What important information does this measure provide? ethIcal prIncIples We noted in the earlier Cost of Neglect box that it is possible to deceive people by selecting those statistics that support one’s position and ignoring contradictory information. Scientists have a particularly important ethical obligation to present their findings and the statistics that summarize those findings in a manner that accurately reflects the data. Selecting data or using statistical procedures that deliberately emphasize some aspects while de-emphasizing other aspects in order to distort the results and mislead people is dishonest and unethical. It is called cherry-picking the data and, like data fabrication and plagiarism, is grossly unethical scientific behavior. 
It falls into the ethical responsibilities category of honestly and accurately obtaining and reporting data and authorship. Cherry-picking is not limited to scientific data but occurs throughout society.
