Psychological Measurement: Differences, Consistency, and Test Scores PDF
Document Details

Uploaded by ZippyHeliotrope9386
Háskóli Íslands
Summary
This document, the third chapter of a book, delves into psychological measurement, focusing on differences, consistency, and the interpretation of test scores. It discusses the importance of variability in understanding individual differences and how these concepts underpin psychometric principles and the analysis of test results, with a focus on fundamental statistical concepts.
Full Transcript
# 3 DIFFERENCES, CONSISTENCY, AND THE MEANING OF TEST SCORES

This chapter covers three key building blocks of psychological measurement: differences (variability), consistency (covariability), and the interpretation of test scores. These issues are fundamental to measurement theory, test evaluation, and test use. Many of the concepts covered in this chapter are statistical in nature, and some will be familiar to many of you. These concepts are crucial for a deep and coherent understanding of psychometrics and the meaning of psychological test scores.

The current chapter integrates these three key building blocks. It begins by discussing variability: the degree of differences within a set of test scores or among the values of a psychological attribute. We will first discuss the importance of this concept and then cover the procedures for quantifying variability. Next, the chapter describes the concept of consistency or covariability: the degree to which variability in one set of scores corresponds with (or is consistent with) variability in another set of scores. We again discuss the importance of the concept and then cover the procedures for quantifying such covariability. Finally, the chapter turns to the interpretation of test scores, describing the procedures that test users and test takers use to interpret those scores. As we will see, these procedures are firmly based on the concept of variability.

# THE NATURE OF VARIABILITY

As mentioned previously, psychological measurement rests on the assumption that people differ (or might differ) in their behavior or other psychological characteristics. This assumption is sometimes explicit, as in research that explores the source and meaning of psychological differences among people. However, this assumption is sometimes implicit.
For example, a test user might wish to understand a single individual, as in making a diagnosis regarding mental disability. Even in such "single-case" situations, the measurement process rests on the assumption that differences exist among people and that a diagnostic measure is capable of detecting those differences.

There are at least two kinds of differences that behavioral scientists attempt to measure. **Interindividual variability** refers to differences that exist between people. For example, when high school students take the SAT, they do not all get the same score. The differences among the students' SAT scores represent interindividual differences. Similarly, when a researcher conducts an experiment and measures a dependent variable (DV), the participants do not all have the same score on the DV. The differences among the participants' scores are interindividual differences.

This latter fact is sometimes misunderstood or not fully appreciated. This is because, in an experiment, some of the interindividual differences are between people in the same experimental group, and some are between people who are in different experimental groups. In such studies, researchers often focus on the differences between groups; however, it is important to realize that such "between-group" differences are fundamentally differences between people. In this way, even strict, heavily controlled experimental research hinges on the measurement and analysis of interindividual differences.

The other kind of difference that behavioral scientists attempt to measure is **intraindividual variability**. These are differences that emerge in one person over time or under different circumstances. For example, intraindividual differences might be seen if we recorded changes in a psychiatric patient's symptom level over the course of therapeutic treatment.

Our ability to create, evaluate, and ultimately use any measure in psychology requires that psychological differences exist and can be quantified. This chapter primarily focuses on interindividual variability, which is consistent with many applications of psychological measurement.

# IMPORTANCE OF INDIVIDUAL DIFFERENCES

It would be nearly impossible to overemphasize the importance of individual differences in psychology. Psychology is about variability in the behavior of individuals. Indeed, the behavioral sciences are largely about understanding differences among people, including differences among people in different groups or differences among people in different conditions. The measurement of those differences is a necessary component of those sciences.

As has been emphasized, variability is at the heart of research and the application of research in the behavioral sciences. In a research context, behavioral scientists often strive to understand important differences among people (including differences between groups of people). When psychologists and other behavioral scientists study phenomena such as aggression, intelligence, psychopathy, happiness, marital satisfaction, or academic aptitude, they are attempting to identify and understand the causes and consequences of differences between people. Why are some people more aggressive than others? Are differences in intelligence associated with differences in biological traits? Is variability in parental marital satisfaction related to variability in children's self-esteem? Do differences in medication dosage affect differences in patients' levels of depressive affect? All such questions begin with the assumption that people differ in important ways and that these differences can be measured.

In an applied context, behavioral scientists often assume that psychological characteristics can and do vary.
Employers attempt to detect variability in characteristics such as conscientiousness, integrity, and intelligence to improve their hiring efficacy. College admissions committees attempt to detect variability in academic aptitude to improve their admission choices. Clinicians attempt to detect variability in various psychological disorders to identify which clients might benefit from which therapeutic interventions.

Individual differences are also fundamental to psychological measurement. As described earlier, measurement is based on the simple but crucial assumption that psychological differences exist and can be detected through well-designed measurement processes. The existence and detection of individual differences lie at the heart of test construction and test evaluation. More specifically, as we shall see, psychometric concepts such as reliability and validity are entirely dependent on the ability to quantify the differences among people.

As noted in *Chapter 1*, individual differences are often viewed as a concern only for those who construct and use traditional psychological tests. For example, the study of individual differences is often seen as relevant only to researchers who study personality, intelligence, or achievement. This traditional, commonly held view limits the importance or meaning of "individual differences" to only some areas of behavioral science. As such, this view limits the relevance of psychometrics to only a few areas of behavioral science.

Although this view is long-standing and common, it is simply incorrect. In fact, all research in psychology and all scientific applications of psychology depend on the ability to measure individual differences. Research in experimental psychology, for instance, involves exposing people to different experiences and then measuring the effects of these experiences on their behavior.
For example, a clinical experimentalist might want to test the efficacy of a new medication for the treatment of depression. To do this, she randomly assigns some depressed individuals to receive a new medication and randomly assigns other depressed individuals to receive placebo pills. She then measures all the individuals' levels of depressive affect after 2 months of taking their respective "treatments." Of course, the researcher likely expects to find that the differences between the individuals' depression levels are highly related to the type of "treatment" they received: individuals who received the new medication are expected to have lower levels of depression than individuals who received the placebos. In this way, the experimental psychologist is trying to show that individual differences in a psychological response (i.e., depressive affect) are, in part, related to the types of medication the participants took.

Likewise, in all cases, the scientific application of psychology requires at a minimum that individual differences be measured. For example, in clinical settings, diagnosis of psychological pathology rests on a clinician's ability to measure the pathology. This requires that the clinician can show how an individual with the pathology differs from those who do not exhibit the pathology. If the clinician is committed to the science of psychology, then they will also try to determine whether there is a change in client behavior over time and, if there is a change, try to establish whether the change might be attributable to therapy.

It is important to realize that any domain of scientific psychology (experimental or nonexperimental, basic or applied) depends on the existence and quantification of individual differences.

To quantify psychological differences, we begin by assuming that scores on a psychological test or measure will (or at least can) vary from person to person or from time to time.
When taken from a group of people or at different points in time from the same individuals, a set of test scores is called a **distribution of scores**. The differences among the scores within a distribution are often called variability. A key element in most behavioral research is to quantify precisely the amount of variability within a distribution of scores.

# VARIABILITY AND DISTRIBUTIONS OF SCORES

To understand many key concepts in psychometrics, we must first understand some basic statistical concepts. In particular, we must understand how the concepts of variance and covariance are computed and how they are related to each other. Variance is a statistical way of quantifying variability or individual differences in a distribution or set of scores. Covariance is a way of quantifying the connection between variability in one set of scores and variability in another set of scores.

Many fundamental concepts in psychological measurement emerge from the detection and description of distributions of test scores. When a group of people take a psychological test, each person obtains a score. Usually, these scores differ from each other, with some people obtaining high scores, some obtaining low scores, and some scoring in between. The group's set of scores is a distribution of scores. *Table 3.1* presents a small example in which six people take an IQ test. As you can see, this small distribution of six scores reflects individual differences: the individuals' scores range from a high of 130 to a low of 90.

One key goal of statistics is to describe a distribution of scores in a meaningful way. As summarized in *Table 3.2*, at least three kinds of information can be used to do this. Many of you are probably already familiar with concepts such as central tendency, variability, and shape. These concepts set the stage for a discussion of psychometric concepts such as reliability and validity.
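
As a rough check on the statistics reported for the six IQ scores in *Table 3.1*, the following sketch computes the indexes listed in *Table 3.2* in plain Python. The function names and the `second_test` scores are illustrative assumptions, not part of the chapter. The formulas divide by N, as in Table 3.2, rather than by N - 1.

```python
import math


def mean(scores):
    """Arithmetic mean: sum of the scores divided by N."""
    return sum(scores) / len(scores)


def variance(scores):
    """Variance with N in the denominator, as in Table 3.2."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))


def skew(scores):
    """Sum of cubed deviations divided by N times s cubed."""
    m = mean(scores)
    s = standard_deviation(scores)
    return sum((x - m) ** 3 for x in scores) / (len(scores) * s ** 3)


def covariance(xs, ys):
    """Mean cross-product of deviations for paired scores."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (standard_deviation(xs) * standard_deviation(ys))


# IQ scores of the six people in Table 3.1.
iq_scores = [110, 120, 100, 90, 130, 110]

# Hypothetical scores for the same six people on a second test,
# invented here only to illustrate covariance and correlation.
second_test = [105, 115, 110, 95, 120, 115]

print(f"mean = {mean(iq_scores):.2f}")
print(f"variance = {variance(iq_scores):.2f}")
print(f"standard deviation = {standard_deviation(iq_scores):.2f}")
print(f"skew = {skew(iq_scores):.2f}")
print(f"covariance = {covariance(iq_scores, second_test):.2f}")
print(f"correlation = {correlation(iq_scores, second_test):.2f}")
```

For the Table 3.1 scores, the mean, variance, and standard deviation should match the values in the table note (110, 166.67, and 12.91). The covariance and correlation depend entirely on the made-up second set of scores and carry no substantive meaning.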

**Table 3.1 Example for Describing a Distribution of Scores**

| Person | IQ ($X$) | Deviation $(X-\bar{X})$ | Squared Deviation $(X-\bar{X})^2$ |
| :----- | -------: | ----------------------: | --------------------------------: |
| 1 | 110 | 0 | 0 |
| 2 | 120 | 10 | 100 |
| 3 | 100 | -10 | 100 |
| 4 | 90 | -20 | 400 |
| 5 | 130 | 20 | 400 |
| 6 | 110 | 0 | 0 |

Note: Sum $\sum X = 660$, sum of squares $\sum (X-\bar{X})^2 = 1{,}000$, mean $\bar{X} = 110$, variance $s^2 = 166.67$, standard deviation $s = 12.91$.

**Table 3.2 Describing and Quantifying Distributions of Test Scores: Conceptual Issues and Statistical Indexes**

| Conceptual Issue | Statistical Index | Equation |
| :--------------- | :---------------: | :------- |
| Central tendency (the average person's test score): Which single score best represents the set of scores? | Mean $(\bar{X})$ | $\frac{\sum X}{N}$ |
| Variability (differences among people's test scores): To what degree do the scores differ from each other? | Variance $(s^2)$ | $\frac{\sum (X-\bar{X})^2}{N}$ |
| Shape of a distribution of test scores: To what degree is the set of scores evenly/symmetrically distributed around the mean? | Skew | $\frac{\sum (X-\bar{X})^3}{N s^3}$ |
| Covariability (consistency between two sets of scores from the same people): To what degree are the differences in one set of scores consistent with the differences in another set of scores? | Covariance $(C_{XY})$ | $\frac{\sum (X-\bar{X})(Y-\bar{Y})}{N}$ |
| | Correlation $(r_{XY})$ | $\frac{C_{XY}}{s_X s_Y}$ |

# Central Tendency

Perhaps the most basic facet of a distribution of scores is **central tendency**: What is the "typical" score in the distribution, or what is the score that is most representative of the entire distribution? Several statistical values can be used to reflect central tendency (e.g., median and mode), but the mean (or average) is the most common.
The arithmetic mean $(\bar{X})$ represents the "typical" score in a distribution of scores. Many of you are probably familiar with the equation for the mean:

$$
\text{Mean} = \bar{X} = \frac{\sum X}{N}
$$

In this equation and those that follow, each individual's score is represented by an "X." Those of you who are familiar with summation notation will recall that the sigma ($\Sigma$) symbol tells us to sum the X values. In addition, N represents the total number of people in the group (or, more generally, the total number of X values in the distribution). For the data in *Table 3.1*, the mean IQ is 110:

$$
\bar{X} = \frac{110+120+100+90+130+110}{6} = \frac{660}{6} = 110
$$

Thus, the "average person" in the group has an IQ of 110.

Although the mean of a distribution can be useful, we are more interested in quantifying the degree to which the people in a group differ from each other. One way to do this is to quantify the degree to which each person's score deviates (i.e., differs) from the group mean. We turn to that next.

# Variability

As has been (and will continue to be) emphasized, measurement rests on the concept of variability. If our measures are to be useful, then they need to be sensitive to psychological variability (i.e., they must reflect the differences in people's standing on a psychological attribute). Thus, we must be able to quantify precisely the amount of variability within a distribution of test scores.

Although several statistical values can be used to quantify the variability within a distribution of scores, we will focus on two: the variance and its close relative, the standard deviation. These are the most commonly used indexes of variability in behavioral research, and they lie at the heart of psychometric theory in particular. The variance and the standard d