Chapter 4: Criterion Measurement - Paul E. Levy (2017)

CHAPTER 4
Criterion Measurement

CHAPTER OUTLINE
- Defining Criteria and Their Properties
- Ultimate Criterion
- Actual Criterion
- Criteria for the Criteria
- The Criterion Problem
- Multiple Versus Composite Criteria
- PRACTITIONER FORUM: Deirdre J. Knapp
- Dynamic Criteria
- Distinctions Among Performance Criteria
- Objective Criteria
- Subjective Criteria
- Contextual Performance
- I/O TODAY: Performance and Disability
- Summary

LEARNING OBJECTIVES
This chapter should help you understand:
- What the criterion problem is
- How criteria are defined in I/O psychology
- What the criteria for the criteria are
- The difference between the ultimate criterion and the actual criterion
- How to differentiate between criterion contamination and criterion deficiency
- The important issues that revolve around multiple criteria
- Different ways to weight criteria to arrive at a composite criterion
- What dynamic criteria are and how they can affect HR functioning
- The differences between objective and subjective criteria
- The potential role of contextual performance (including counterproductive behavior) in criterion development
- The complex relationships between task behavior, contextual behavior, and counterproductive work behavior

As noted in the last chapter, job analysis provides the foundation on which everything in the field of industrial psychology is built. In this chapter, I will talk about the first aspect of this “everything”: criteria. Criteria are quite important to industrial psychology because they reflect organizational or individual performance, and in a competitive market, companies are driven by performance and profits. Thus, I/O psychologists are often hired to help organizations develop criteria, as well as to implement HR processes that are directly linked to those criteria. Criteria are typically dependent variables that provide an indication of success or performance. For instance, when your instructor grades your paper, he uses criteria.
When your boss evaluates your performance and decides on whether or not to give you a raise, she uses criteria. This chapter will present the nuts and bolts of criteria —what they are, how they are developed, how they are evaluated, and what they are used for—along with a discussion of the different types. DEFINING CRITERIA AND THEIR PROPERTIES Every time you evaluate something or someone, you are using criteria— though you’re probably not aware of doing so. We even use criteria to evaluate teams such as a work team or a baseball team (for a more complete discussion of team criteria, see LePine, Piccolo, Jackson, Mathieu, & Saul, 2008). When, for the first time, you walked into your dorm room as a freshman or your office cubicle as a new employee, you very likely wondered whether you and the students in that dorm room or the coworkers in the nearby cubicle would get along. In other words, you wondered whether you would like those people. When you did finally decide that Kristen was a pretty cool roommate, you may have made that decision based on her personality (funny, serious, kind, trustworthy), intelligence (smart about academic things, smart about everyday things), hobbies (enjoys sports, music, movies), hang-ups (dislikes people who are stuck on themselves and people who don’t clean up after themselves), and so on. You made your decision based on all of these criteria. Consciously or unconsciously, you defined various criteria that led you to the overall evaluation that Kristen was a cool roommate. A CLOSER LOOK Measuring job performance can be a tricky business. In what kind of work situations do you think measuring job performance is especially difficult? Of course, we don’t always make the same decisions. For instance, you may have another roommate, Sarah, who doesn’t think Kristen is cool at all. Often, when we disagree in this way, it is because we have different criteria. 
Different judgments may be involved in determining whether Sarah thinks someone is cool versus whether you think someone is cool. For example, Sarah may think it’s important to be involved in community service, which Kristen isn’t. In other words, you and Sarah come to different conclusions about Kristen because you are using different criteria. At other times, however, we disagree about things not because our criteria differ but because we evaluate people and things differently on the basis of those criteria. When movie critics disagree about a movie, the disagreement is often based not on different criteria but on how the same criteria are evaluated. For instance, one may think the movie is good because the characters are believable, and the other may think the movie is bad because the characters aren’t at all believable. Character believability is an important criterion for both reviewers, but their perceptions about the extent to which the movie scores high on this criterion differ, resulting in different evaluations of the film. In I/O psychology, criteria are defined as evaluative standards that can be used as yardsticks for measuring employees’ success or failure. We rely on them not only for appraising employees’ performance but also for evaluating our training program, validating our selection battery, and making layoff and promotion decisions. Therefore, if we have faulty criteria, these HR functions would be flawed as well because they depend directly on the adequacy of the criteria. For instance, it seems reasonable to fire an employee because she is not performing at an acceptable level on the performance criterion. However, if that performance criterion is not a good measure of job performance, there is the potential for making a huge mistake by firing someone who may actually be a good worker. In such cases, we leave ourselves vulnerable to lawsuits (we’ll talk about legal issues in the workplace in Chapters 5 and 7). 
Because key organizational decisions will be made directly on the basis of criteria, organizations need to be sure that these yardsticks are measuring the right thing and measuring it well. criteria Evaluative standards that can be used as yardsticks for measuring an employee’s success or failure. TECHNICAL TIP In Chapter 2, we talked a bit about interrater reliability, which is the consistency with which multiple judges evaluate the same thing. This is an example of low interrater reliability because the judges—two movie critics, in this case—don’t see the movie similarly. In general, the criterion in which I/O psychologists are most interested is performance, which can be defined as actual on-the-job behaviors that are relevant to the organization’s goals. Performance is the reason that organizations are interested in hiring competent employees. performance Actual on-the-job behaviors that are relevant to the organization’s goals. Throughout this chapter, I will use the terms criterion, performance, and performance criterion interchangeably. Ultimate Criterion The ultimate criterion encompasses all aspects of performance that define success on the job. R. L. Thorndike (1949) discussed this very notion in his classic text Personnel Selection, arguing that the ultimate criterion is very complex and not even accessible to us. What he meant is that the ultimate criterion is something we can shoot for, but in actuality we can never completely define and measure every aspect of performance. Doing so would require time and details that simply are not available. As an example, let’s say we’re interested in defining the ultimate criterion for an administrative assistant’s job. 
Toward that end, we might generate the following list:
- Typing speed
- Typing quality
- Filing efficiency
- Interactions with clients
- Interactions with coworkers
- Creativity
- Written communication
- Oral communication
- Procedural adherence
- Computer skills
- Interactions with supervisors
- Punctuality
- Initiative

ultimate criterion: A theoretical construct encompassing all performance aspects that define success on the job.

This appears to be a fairly complete list, but when I asked two administrative assistants in our psychology department office if they had any to add, they came up with quite a few more that I hadn’t thought of, including organizational skills. My guess is that if I gave this list to additional people, they would come up with still other dimensions or subcriteria. The point here is that merely listing all the criteria that define success on the job is difficult enough; measuring all of them would be virtually impossible. Thus, the ultimate criterion is a theoretical construct that we develop as a guide or a goal to shoot for in measuring job success.

FIGURE 4.1 Criterion Properties

Actual Criterion

The actual criterion is our best real-world representation of the ultimate criterion, and we develop it to reflect or overlap with the ultimate criterion as much as possible. Figure 4.1 illustrates this concept with respect to the administrative assistant job that we discussed earlier. Because we cannot possibly measure every facet of performance that we think is involved in the administrative assistant job and probably could not list everything that would indicate someone’s success on the job, we consider only those elements that seem most important and that are most easily measured. Again, as applied psychologists, we need to keep practical issues in mind; we do this by thinking about the time, effort, and cost associated with measuring the different elements of the criteria involved because these are of great importance to the company with which we are working.
In a perfect world, everything that is listed as being a part of the ultimate criterion would be included in the actual criterion, but typically this isn’t possible. Thus, our actual criterion includes only those elements of the ultimate criterion that we intend to measure. actual criterion Our best real-world representative of the ultimate criterion, which we develop to reflect or overlap with the ultimate criterion as much as possible. Criteria for the Criteria In this section, we focus on what makes a good criterion. There are many lists of dimensions along which criteria can be evaluated. One of the earliest of these lists features 15 dimensions (Blum & Naylor, 1968). In a classic book, John Bernardin and Dick Beatty (1984) delineate more than 25 characteristics that have been examined as yardsticks for measuring the effectiveness of criteria. For our purposes here, I will highlight what I think are the five most fundamental of these criteria for the criteria (see Table 4.1 for a summary). 
TABLE 4.1 Criteria for the Criteria

Dimension | Definition | Example/Explanation
Relevance | The extent to which the actual criterion measure is related to the ultimate criterion | A measure that seems to capture the major elements of the ultimate criterion; that is, it represents job performance very well
Reliability | The extent to which the actual criterion measure is stable or consistent | A measure that does not give drastically different results when used for the same employees at close time intervals
Sensitivity | The extent to which the actual criterion measure can discriminate among effective and ineffective employees | A measure that consistently identifies performance differences among employees; that is, everyone is not rated the same
Practicality | The degree to which the actual criterion can and will be used by those whose job it is to use it for making important decisions | A measure that can be completed in a reasonable amount of time by a supervisor and does not require excess paperwork
Fairness | The extent to which the actual criterion measure is perceived by employees to be just and reasonable | A measure that does not result in men always being rated higher than women, and vice versa

Relevance

The crucial requirement for any criterion is relevance, or the degree to which the actual criterion is related to the ultimate criterion. Relevance reflects the degree of correlation or overlap between the actual and ultimate criteria. In a more technical, statistical sense, relevance is the percentage of variance in the ultimate criterion that can be accounted for by the actual criterion. In other words, variance in this context is an index of how much people differ in their ultimate criterion scores. We would like to see as much overlap between the ultimate criterion and the actual criterion as is practical to achieve. Look at Figure 4.1 and note the nine dimensions in the intersection of the ultimate criterion and the actual criterion. These constitute relevance.
The overlap itself thus represents the portion of the ultimate criterion that is tapped by our actual criterion. We want this portion of the Venn diagram to be as large as possible, indicating that our actual criterion mirrors the ultimate criterion fairly well. Relevance in this sense is analogous to validity. TECHNICAL TIP Recall that in Chapter 2, construct validity was defined as the extent to which a test measures the underlying construct that it is intended to measure. Two conditions can limit the relevance of a criterion. The first is criterion deficiency, which refers to dimensions in the ultimate measure that are not part of the actual measure. A criterion is deficient if a major source of variance in the ultimate criterion is not included in the actual criterion. We want this part of the Venn diagram to be very small. Figure 4.1 shows the criterion deficiency as a list of dimensions that are in the ultimate criterion but are not part of our actual measure—namely, filing efficiency, written communication, oral communication, and creativity. Our criterion is deficient to the extent that these dimensions are not tapped by our criterion measure but are important to the criterion construct. If these particular dimensions are not terribly important to the criterion construct, then our criterion is not very deficient; but, if they are important, then we may have a problem with criterion deficiency. criterion deficiency A condition in which dimensions in the ultimate measure are not part of or are not captured by the actual measure. The second condition that can limit the relevance of a criterion is criterion contamination, which refers to those things measured by the actual criterion that are not part of the ultimate criterion. This is the part of the actual criterion variance that is not part of the ultimate criterion variance. Criterion contamination occurs in two ways. 
First, it can be caused simply by random measurement error: No matter how carefully we develop a precise and accurate measurement, there is always some error in that measurement— unreliability. This measurement error is random in the sense that it tends to even itself out over time, but any single performance measurement may not reflect true performance. For instance, we might measure performance every six months for two years and find that there are some differences in our four performance measurements, even though true performance (i.e., the ultimate criterion) may not have changed. criterion contamination A condition in which things measured by the actual criterion are not part of the ultimate criterion. The second cause of criterion contamination, bias, is more troubling given its systematic rather than random nature. Bias is most prevalent when the criteria of interest are judgments made by individuals. It is not unusual for raters to allow their biases to color their performance ratings. For instance, bias would result in criterion contamination if an evaluator, when rating administrative assistants on the dimension of organizational skills, gave a higher rating to those she liked or to those who were more attractive, regardless of actual performance levels. (Two additional examples, age and status, are listed in Figure 4.1.) Sometimes this is done on purpose, while at other times it is unconscious; we will discuss both instances of bias in more detail in the next chapter. Criterion contamination occurs when the measure of the actual criterion reflects things other than what it should measure according to the ultimate criterion. When we are talking about criteria, a criterion is contaminated if it measures something other than what it is intended to measure, such as likability or attractiveness. Contamination is often a problem in the context of performance ratings, which we will discuss at length in the next chapter. 
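The relationship among relevance, deficiency, and contamination can be sketched with simple set operations. The example below is only an illustration: it uses the administrative assistant dimensions listed earlier, the four dimensions the text identifies as deficient, and the two bias examples (age and status) that the text says appear in Figure 4.1. Treating each dimension as a discrete set member is a simplifying assumption; in the statistical sense described above, these properties are proportions of variance, and random measurement error cannot be represented this way at all.

```python
# Dimensions of the ultimate criterion for the administrative assistant job,
# taken from the list given earlier in the chapter.
ultimate = {
    "typing speed", "typing quality", "filing efficiency",
    "interactions with clients", "interactions with coworkers",
    "creativity", "written communication", "oral communication",
    "procedural adherence", "computer skills",
    "interactions with supervisors", "punctuality", "initiative",
}

# Dimensions the text identifies as missing from the actual measure.
not_measured = {
    "filing efficiency", "written communication",
    "oral communication", "creativity",
}

# Bias examples the text says are listed in Figure 4.1. Modeling them as
# set members is an assumption made for illustration only.
contaminants = {"age", "status"}

# Assumed actual criterion: everything the measure picks up, wanted or not.
actual = (ultimate - not_measured) | contaminants

relevance = ultimate & actual        # overlap of ultimate and actual
deficiency = ultimate - actual       # in the ultimate, missed by the actual
contamination = actual - ultimate    # in the actual, not in the ultimate

print(len(relevance), "relevant dimensions")
print("Deficiency:", sorted(deficiency))
print("Contamination:", sorted(contamination))
```

Under these assumptions the overlap contains nine dimensions, matching the nine that the text places in the intersection of Figure 4.1.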
Reliability Although unreliability was mentioned as a potential cause of criterion contamination, its opposite—reliability—is important enough to be listed as a criterion itself. As discussed in Chapter 2, reliability refers to the stability or consistency of a measure. Indeed, an unreliable criterion is not very useful. For instance, suppose that we measured an administrative assistant’s typing speed on six successive occasions and found that he typed at 50, 110, 40, 100, 60, and 45 words per minute. Such variability in the criterion measure would lead me to conclude that this particular measure is unreliable. When a measure is unreliable, we can’t confidently use it in making important decisions. Another way to conceptualize the situation is to ask if this administrative assistant is a fast typist, as the 110 and 100 words per minute might indicate, or a below-average typist, as the 40 and 45 words per minute might indicate. Because the criterion is unreliable, we simply don’t know—and the criterion is thus not useful. TECHNICAL TIP In Chapter 2, we defined reliability as consistency or stability in measurement, but unreliability is always present to some degree because we can’t measure these kinds of things perfectly. A CLOSER LOOK Your company has only one opening available for a new technical support staff member. On paper, all the candidates in this photo meet the criteria for the job. Can you think of other characteristics individuals might use in choosing which candidate to hire? Sensitivity A criterion must be sensitive enough to discriminate among effective and ineffective employees. If everyone is evaluated similarly according to the criterion, then it is not very useful. Suppose Goodyear wants to make a series of promotion decisions based on a performance criterion consisting of the number of tire valves produced by each employee in an eight-hour shift. 
At first glance, this seems a reasonable criterion; but on closer examination, we find that all 100 employees produce between 22 and 24 tire valves per eight-hour shift. In short, there doesn’t appear to be much variability in the performance of these 100 employees on this criterion. The criterion isn’t sensitive enough to discriminate between effective and ineffective performers, so Goodyear could not reasonably make promotion decisions based on it. (After all, would it make sense to promote those employees who produce 24 tire valves over those who produce 23?) Yet there are probably other criteria in this situation that could distinguish among the employees. Can you think of any? The first one that comes to my mind is quality. Although all 100 employees produce about the same number of tire valves, there may be some serious differences in the quality of these valves. A second criterion, and one that may be even more sensitive, is the amount of waste each employee leaves behind after producing tire valves. Quite a few other alternative measures would work as well. Every situation is different, of course—but the bottom line is that I/O psychologists work hard to develop sensitive criteria so that confidence can be placed in the criteria on which decisions are based. Practicality I have alluded to the importance of practical issues throughout the book thus far, but here the concept of practicality is tied more directly to organizational functioning. Practicality refers to the extent to which a criterion can and will be used by individuals making important decisions. Consider the following example. Let’s say I spend an extraordinary amount of time conducting a thorough job analysis using the Common-Metric Questionnaire. Then I develop a set of relevant and reliable criteria that have hardly any criterion deficiency or contamination. 
But there is one problem: The organization’s employees choose not to use the criteria on the grounds that they are too difficult to measure and too abstract to be useful. My error here was that I neglected to consider the usefulness or practicality of the measures and failed to work with the organization’s management and other employees in the development process. Indeed, criteria must always be relatively available, easy to obtain, and acceptable to those who want to use them for personnel decisions. Fairness The last element in my list of criteria for the criteria is fairness—in other words, the extent to which the employees perceive the criteria to be just and reasonable. Whereas relevance (along with the interplay of deficiency and contamination), reliability, and sensitivity are rather technical concepts, fairness and practicality have more to do with the human element involved in the use of criteria. Here, the issue is whether the employees who are being evaluated along certain criterion dimensions think that they are getting a fair shake. A criterion that is viewed by the employees as unfair, inappropriate, or unreasonable will not be well received by these employees. This point, of course, has widespread implications for organizational functioning and interpersonal dynamics. For instance, it has been found that employees’ perceptions of justice are linked to important organizational criteria such as turnover intentions and customer satisfaction ratings (Simons & Roberson, 2003). THE CRITERION PROBLEM You might remember from your introductory psychology class that human behavior is determined by multiple causes, such that it is impossible to point to a particular behavior and demonstrate beyond a shadow of a doubt that the behavior was determined by any one particular cause. Similarly, it is impossible to point to any one performance criterion and argue that it is the perfect measure of performance; performance is simply more complicated than that. 
In other words, performance (the criterion in which organizations, employees, managers, and I/O psychologists are most often interested) usually includes more than one dimension; some performance criteria are suitable for one type of organizational decision, while other criteria are suitable for a different type of organizational decision. Thus, no one performance criterion fits the bill for all organizational purposes. Given these difficulties, the measurement of performance criteria can be quite complicated. We do not have performance criteria entirely figured out any more than we have human behavior entirely figured out, but, just as we can often predict, understand, and explain human behavior, we can develop, use, and understand criteria in the world of work. However, I will use this opportunity related to our discussion of criteria to present two of the most important elements of the criterion problem. Multiple Versus Composite Criteria Most I/O psychologists believe that performance is multifaceted—in other words, that it is made up of more than one dimension. Look back at the list of criteria I presented for the job of administrative assistant; it should be clear that there are many related, as well as potentially unrelated, dimensions of performance for this job. Belief in a multiple-factor model (Campbell, Gasser, & Oswald, 1996)—the view that job performance is composed of multiple criteria—has been common among I/O psychologists for years. However, many experts in the area have noted that studies of criteria-related issues and, especially, examinations of multiple-factor models have been minimal at best (Austin & Villanova, 1992; Campbell, 1990; Campbell, McCloy, Oppler, & Sager, 1993). Campbell, one of the leaders in this area, has developed an eight-factor model that he suggests should account for performance variance in every job listed in the Dictionary of Occupational Titles, or DOT (Campbell, 1990), and now the O*NET. 
He asserts that although not every dimension will be relevant for every job, every job can be described adequately in terms of a subset of these eight dimensions. (These are listed in Table 4.2, along with definitions and examples.) In addition, he maintains that three of these dimensions (job-specific task proficiency, demonstrating effort, and maintaining personal discipline) are necessary components of every job. There appears to be some empirical support for the dimensions he suggests (Rojon, McDowall, & Saunders, 2015).

TABLE 4.2 Campbell’s Taxonomy of Performance

Performance Factor | Definition | Example
Job-specific task proficiency | Degree to which individual can perform core tasks central to a particular job | An administrative assistant who does word processing
Non–job-specific task proficiency | Degree to which individual can perform tasks that are not specific to a particular job | A plumber who handles phone inquiries and sets up appointments to provide estimates for work
Written and oral communication tasks | Proficiency with which one can write and speak | A military officer who gives a formal talk to recruits
Demonstrating effort | Consistency and persistence in an individual’s work effort | A college professor who goes into the office to continue working on her book even though the university is closed due to a winter storm
Maintaining personal discipline | Avoidance of negative behaviors such as alcohol abuse, substance abuse, and rule infractions | An off-duty nurse who avoids excessive alcohol in the event that he is needed in an emergency
Facilitating peer and team performance | Helping and supporting peers with job problems | A journalist who helps a coworker writing a story meet the deadline
Supervision | All the behaviors associated with managing or supervising other employees | A retail store manager who models appropriate behavior, provides fair and timely feedback, and is available to handle the questions of her subordinates
Management/administration | All the behaviors associated with management that are independent of supervision | A plant manager who gets additional resources, sets department goals and objectives, and controls expenditures

Source: Information from Campbell, 1990.

Returning to our administrative assistant example, let’s assume that the job has five criteria that are reflective of performance and that each is important and relatively unrelated to the others. Now, when it comes time to hire, promote, fire, lay off, or increase salaries, which criteria will be used to make these decisions? You see, things would be much easier if performance were one-dimensional, because then we could make all these decisions based on the one criterion without any confusion. For instance, if punctuality were the only dimension of performance that mattered, we could easily make hiring and firing decisions based on that one performance dimension. However, because performance is multidimensional, we have to make some tough decisions about which criteria to use or how to combine them for the purposes of HR decisions. This is a part of the criterion problem: Performance is best represented by multiple criteria, but organizations need to make decisions based on one score, number, or combination of these multiple criteria.

There are at least two alternatives for dealing with this criterion issue. Both focus on creating a composite criterion, which is a weighted combination of the multiple criteria, resulting in one index of performance. First, if the five criteria are scored on the same scale, they can be combined using equal weighting. Here, they are simply added up to compute one number that represents the performance of each individual. For instance, if each employee is rated on each dimension along a 1–5 point scale, we could simply add these up and promote the person based on the highest computed score. But on what grounds do we base this decision?
Does it seem appropriate to take these five dimensions, which are largely unrelated, and just add them up and call it performance? Doing so would seem to take us back to the problem that Campbell pointed out—we often mistakenly assume that performance is a single construct (Campbell et al., 1993) when, in fact, it is not. composite criterion A weighted combination of multiple criteria that results in a single index of performance. A second alternative composite criterion can also be created through unequal weighting, whereby some procedure is employed to weight the criteria differentially. For instance, it might make sense to employ only those criterion dimensions that seem especially relevant for the particular HR decision that needs to be made and to weight them according to their importance so that the overall weights sum to 1 (see Table 4.3). Note that the way in which the information is weighted is a value judgment made by the organization and that this decision should be based on the organization’s goals and on research conducted by the organization (Campbell, McHenry, & Wise, 1990). Furthermore, the weighting scheme may vary across HR decisions, such that different weights may be employed for making a promotion decision than for making a termination decision about employees. An example is provided in Table 4.3, which demonstrates how different dimensions might be used depending on the HR decision in question. In this example, the decision to terminate a particular employee would be based largely on such dimensions as punctuality and performance quality, somewhat on computer performance and organizational performance, and not at all on interpersonal behaviors and leadership behaviors. On the other hand, leadership behaviors and interpersonal behaviors would be much more important for deciding to promote this person into a supervisory position, whereas punctuality would be much less important. 
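To make the two weighting approaches concrete, the sketch below computes an equal-weight sum and two unequally weighted composites using the weights given in Table 4.3. The two employees and their 1–5 ratings are hypothetical values invented for illustration, as are the function and variable names; nothing here is a prescribed scoring procedure.

```python
# Weights from Table 4.3 for the administrative assistant job.
PROMOTION_WEIGHTS = {
    "punctuality": 0.00,
    "performance quality": 0.20,
    "interpersonal behaviors": 0.20,
    "leadership behaviors": 0.30,
    "computer performance": 0.15,
    "organizational performance": 0.15,
}
TERMINATION_WEIGHTS = {
    "punctuality": 0.30,
    "performance quality": 0.40,
    "interpersonal behaviors": 0.00,
    "leadership behaviors": 0.00,
    "computer performance": 0.20,
    "organizational performance": 0.10,
}


def equal_weight_sum(ratings):
    """Equal weighting: simply add up the ratings on every dimension."""
    return sum(ratings.values())


def weighted_composite(ratings, weights):
    """Unequal weighting: weight each rating, with weights summing to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[dim] * ratings[dim] for dim in weights)


# Hypothetical ratings on a 1-5 scale (not taken from the text).
employee_a = {
    "punctuality": 5, "performance quality": 4,
    "interpersonal behaviors": 2, "leadership behaviors": 2,
    "computer performance": 4, "organizational performance": 3,
}
employee_b = {
    "punctuality": 2, "performance quality": 4,
    "interpersonal behaviors": 5, "leadership behaviors": 5,
    "computer performance": 2, "organizational performance": 2,
}

for name, ratings in (("A", employee_a), ("B", employee_b)):
    print(
        name,
        equal_weight_sum(ratings),
        round(weighted_composite(ratings, PROMOTION_WEIGHTS), 2),
        round(weighted_composite(ratings, TERMINATION_WEIGHTS), 2),
    )
```

With these made-up ratings, both employees have the same equal-weight total, yet employee B scores higher on the promotion composite and employee A scores higher on the termination composite. The choice of weights, which the text describes as a value judgment by the organization, can therefore change who looks best for a given decision.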
TABLE 4.3 Combining Criterion Dimensions for the Job of Administrative Assistant

Criterion Dimension | Criterion Weights for Promotion to Supervisor | Criterion Weights for Termination Decision
Punctuality | .00 | .30
Performance quality | .20 | .40
Interpersonal behaviors | .20 | .00
Leadership behaviors | .30 | .00
Computer performance | .15 | .20
Organizational performance | .15 | .10

Still, unequal weighting does not remove the concern that we have now created an index that is somewhat illogical. Let’s say that we combine one’s score on organizational performance with one’s score on productivity. What is this index? What should we call it? What is the construct that it really measures? Finally, haven’t we lost the very thing that we wanted to capture with our measures, the multidimensionality of performance? You can see that combining multiple criterion dimensions is a tricky business.

Given these complexities and concerns, you might think it best to keep the criteria separate; but, of course, this causes problems for making HR decisions because you need to make that one hire/don’t hire, retain/fire, promote/don’t promote decision based on some number or score. A classic article suggests that the issue becomes one of purpose (Schmidt & Kaplan, 1971). Its authors argue that if psychological understanding is the primary goal, then multiple criteria should be examined in light of predictor variables without combining them into a composite. On the other hand, if the primary goal is decision making with an economic focus, the I/O psychologist should combine the criteria into one measure and use that as an index of economic success.
However, the authors also point out that most applied psychologists should use their weighted composite to make hire/don’t hire decisions while at the same time using the multiple dimensions to examine how they relate to various predictors, thereby enhancing the psychological understanding of the performance process and adhering to the scientist/practitioner’s ideal, whereby both psychological understanding and applied decision making are important.

PRACTITIONER FORUM
Deirdre J. Knapp
PhD, 1984, Bowling Green State University
Vice President, Research and Consulting Operations
Human Resources Research Organization (HumRRO)

Early in my career, I was very lucky to play an important role in seminal research on the criterion domain. This work involved dozens of industrial/organizational psychologists across four organizations and included many of the most renowned academics and practitioners in this field (see discussion of Campbell’s work earlier in this chapter). The project was exciting because a wide array of carefully developed criterion measures were administered across the globe to thousands of soldiers in a variety of army occupations (I was in charge of that part of the project!). The measures included written job knowledge tests; hands-on tests of critical job tasks; peer and supervisor ratings of team, technical, and leadership skills; and ratings of behaviors related to indiscipline and getting along with others. Performance indicators were also retrieved from soldiers’ military records. This was a historic opportunity to learn about what it means to “perform.”

Considerable effort was made to use the data collected in this project to “model” performance. That is, considering all the scores from all the criterion measures, what performance factors surfaced? The answer has since been corroborated in research on civilian occupations.
Six performance factors are relevant to most entry-level jobs:

Job-specific task proficiency
Non–job-specific task proficiency
Written and oral communication
Demonstrating effort
Maintaining personal discipline
Facilitating peer and team performance

Supervisory/leadership and management/administration factors emerge at higher-level jobs. Most organizations are not able to include all the measures used by the army in this project, nor is it always necessary to examine all possible performance outcomes. Organizations conducting validation studies should first think about what performance factors they want to predict, pick selection tests that would be expected to predict those performance factors, and develop one or more criterion measures that assess the performance factors of interest.

I worked on this project first as an employee of the U.S. Army Research Institute for the Behavioral and Social Sciences and then as an employee at HumRRO (it was a 10-year project). I have continued to use the lessons I learned in much of my work. For example, I’ve worked with quite a few clients who certify or license professionals in various occupations (e.g., veterinary surgeons, law office managers, physical therapists) and who need valid and fair assessments of job skills. I’ve also continued to work with the U.S. Army to develop, validate, and implement improved selection procedures.

Applying Your Knowledge
1. How does the army research project outlined here relate to the criterion problem discussed earlier in this chapter?
2. What is the benefit to researchers and practitioners of having this taxonomy of performance?
3. What attempts were made to ensure that the six criteria that emerged from this study were both valid and reliable?

Dynamic Criteria

The issue of dynamic criteria, measures reflecting performance levels that change over time, has been the focus of much debate in I/O psychology (Austin & Villanova, 1992; Steele-Johnson, Osburn, & Pieper, 2000).
This notion may seem unimportant, but let’s consider a situation in which an employee performs well for the first 9 months on the job but by the end of 18 months is performing at unacceptable levels. The problem becomes clearer if we consider a selection context in which measures are developed to predict an individual’s performance on the job. The measure, say a measure of intelligence, might predict performance early on in the employee’s career (in fact, we believe that it will, which is why we hired the employee based on an intelligence test to begin with), but not over the long haul. In other words, the validity with which our measure predicts job performance deteriorates over time. This, of course, is a big problem for those trying to staff an organization with people who are most likely to be successful on the job. We will address the selection process in greater detail in Chapters 6 and 7, but, for now, just note that the concept of dynamic criteria raises some important issues.

dynamic criteria
Measures reflecting performance levels that change over time.

Some I/O psychologists argued that criteria were not really dynamic at all (Barrett, Caldwell, & Alexander, 1985), but this argument was largely refuted by some more recent empirical work (Deadrick, Bennett, & Russell, 1997; Deadrick & Madigan, 1990; Hofmann, Jacobs, & Baratta, 1993) demonstrating that the relative performance of individuals changes over time. In short, the best worker for the first 6 months on the job is sometimes not among the best workers for the next 6 months on the job. One study demonstrated that National Hockey League players who had become captains of their team (and thus had a new leadership position) also demonstrated significantly better productivity as operationalized by an index of points based on goals and assists (Day, Sin, & Chen, 2004).
In other words, performance levels of players (relative levels) changed and appeared to change as a function of new leadership responsibilities. This study was among the first to try to explain dynamic criteria from a motivational/leadership perspective. A more recent study of an entire cohort of European medical students found an interesting relationship between personality variables and GPA scores (Lievens, Ones, & Dilchert, 2009). The results suggested that personality predicts success later in the curriculum much better than it predicts success in classes taken earlier in the curriculum (which tend to be on a more basic level). The studies cited above certainly suggest that part of “the criterion problem” is reflected in performance changes over time. Predicting performance is all the more difficult when the performance criterion changes. Keep this point in mind when we discuss the various predictors used in the selection of employees in Chapter 6.

DISTINCTIONS AMONG PERFORMANCE CRITERIA

Among the many different criteria relevant for various jobs are two traditional types of criteria, which I discuss next. After that, I discuss a third criterion type that has received a great deal of attention from I/O psychologists and organizations in the last 20 years, namely, contextual performance.

Objective Criteria

Objective criteria are taken from organizational records; often based on counting, they are not supposed to involve any subjective judgments or evaluations. They have traditionally been viewed as the “cleanest” criterion measures because they tend not to necessitate making a judgment call about an employee’s performance. Objective criteria are also sometimes called hard or nonjudgmental criteria, reflecting the belief that they are somehow more solid and believable than other types. Table 4.4 presents some examples of often-used objective criteria.
objective criteria
Performance measures that are based on counting rather than on subjective judgments or evaluations; sometimes called hard or nonjudgmental criteria.

TABLE 4.4 Examples of Objective Criteria and Measurement

Criteria                      Measurement Examples
Absence                       Number of days absent per year; number of incidents of absence per year
Lateness                      Number of minutes late per month; number of incidents of lateness per month
Turnover                      Percentage of employees who leave the organization over a 12-month period
Accidents                     Number of accidents per year; number of workers’ compensation claims per year
Grievances                    Number of grievances filed per year; number of grievance cases lost per year
Productivity                  Number of products produced per day; sales volume in dollars per month
Counterproductive behaviors   Dollars lost in theft per year; number of unlikely events (e.g., arson) per year without a reasonable explanation

Some types of objective criteria are relevant for many different jobs. (These will be discussed in more detail in Chapter 11 when we examine common outcomes of job-related attitudes.) For instance, absenteeism rate is often deemed an important criterion by organizations. Related criteria include turnover rates and lateness rates, which are also used by organizations to evaluate employees and organizational efficiency. Turnover is interesting because, although it generally costs organizations a great deal of money, it can be beneficial to the extent that it involves the replacement of low performers with potentially better performers.

Probably the most typical objective criterion is productivity, which is usually measured in terms of the number of acceptable products produced in a given time period. This criterion is commonly used in manufacturing plants and assembly lines, where individuals or groups of employees can be evaluated according to how many cars are produced, how many computers are assembled, and so on.
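Because objective criteria are based on counting, several of the Table 4.4 measures can be sketched as simple computations over organizational records. The example below is a minimal illustration only: the record layout (`AbsenceSpell`), the function names, the employee labels, and the choice of average headcount as the turnover denominator are all assumptions, not something the chapter specifies.

```python
from dataclasses import dataclass


@dataclass
class AbsenceSpell:
    """One incident of absence (hypothetical record layout)."""
    employee: str
    days: int  # consecutive working days missed in this incident


def absence_measures(spells, employee):
    """Count both Table 4.4 absence measures for one employee."""
    own = [s for s in spells if s.employee == employee]
    return {
        "days_absent": sum(s.days for s in own),  # number of days absent
        "incidents": len(own),                    # number of incidents of absence
    }


def turnover_rate(leavers, average_headcount):
    """Percentage of employees who left over the period.

    Dividing by average headcount is an assumption made for this sketch.
    """
    return 100.0 * leavers / average_headcount


# Made-up records for one year: employee A has a single 5-day absence,
# employee B has five separate 1-day absences.
spells = [AbsenceSpell("A", 5)] + [AbsenceSpell("B", 1) for _ in range(5)]

print(absence_measures(spells, "A"))
print(absence_measures(spells, "B"))
print(turnover_rate(leavers=12, average_headcount=150))
```

In this made-up data, both employees miss five days, yet one has a single incident and the other has five. The two absence measures listed in Table 4.4 would therefore rank them differently, which is one reason the choice of measurement matters even for "hard" criteria.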
Nonmanagerial jobs lend themselves to objective criteria, but managerial and other higher-level jobs do not. One reason is that most managerial jobs are not based solely on a 40-hour workweek during which the manager punches a time clock; another is that these jobs are not directly linked to a particular item that was produced. Nor are objective criteria appropriate for evaluating college professors, physicians, real estate agents, attorneys, and restaurant managers—with perhaps two exceptions: grievances and counterproductive behavior. The number of grievances is a measure of the number of times others have filed complaints about a particular manager. Counterproductive behavior is a term that refers to actions, such as theft and sabotage, that reflect the intent to harm the organization in some way. You may have noted that when I defined objective criteria, I said that these criteria are “not supposed” to require any judgment by others. Indeed, we describe these criteria as objective, and for the most part they are, but they’re not completely objective. For instance, a manager can record an employee’s absence as being excused, he can record it as being unexcused, or he can fail to record it at all. On average, objective criteria tend to involve less judgment than do subjective criteria (which we will turn to next). But we need to realize that subjectivity can come into play in virtually all measures of performance. Furthermore, objective criteria can be limited by unexpected factors or situational constraints. For instance, a production employee may not produce at her usual level because her machine has been malfunctioning frequently, thus slowing down her production. The objective criterion would not likely take this into account, but a supervisor could consider it in rating her performance. Performance ratings are typically considered subjective measures of performance, but the subjectivity in this instance is probably a good thing. 
Subjective Criteria

Subjective criteria are performance measures that are based on the judgments or evaluations of others rather than on counting. Most published studies on criteria deal with subjective criteria, also called soft or judgmental criteria. Typical subjective criteria include ratings or rankings of employees by other employees, such as supervisors, coworkers, and subordinates. Of course, these kinds of measures are much more likely to be affected by the biases, attitudes, and beliefs of the raters than are the more objective criteria. On the other hand, they can be an excellent alternative to objective criteria for jobs in which objective criteria don’t exist or are insufficient (as in the evaluations of managers).

A recent meta-analysis examined the role played by the type of criterion used (subjective versus objective) in assessing the relationship between information sharing among team members and team performance (Mesmer-Magnus & DeChurch, 2009). The researchers found a stronger relationship between information sharing and performance when performance was defined by a behavioral (subjective) measure, such as how much effort was put forth; they argued that this is because the behaviors were more under the control of the team members than were results-oriented (objective) measures, such as sales or profit. As this study suggested, research results can vary as a function of the type of criterion used. As you will see in the next chapter, we tend to rely largely on subjective criteria, although we may work very hard to make them as objective as possible.

subjective criteria
Performance measures that are based on the judgments or evaluations of others rather than on objective measures such as counting; sometimes called soft or judgmental criteria.

TECHNICAL TIP
When we talked about correlations in Chapter 2, we said the correlation coefficient was an index of magnitude or strength of relationship.
In this example, you can see that magnitude differences can be driven by various factors, such as elements of measurement.

Contextual Performance

Important work in the area of contextual performance has attempted to expand the criterion domain to include more than just the traditional task performance criteria (Borman, 2004). Task performance encompasses the work-related activities performed by employees that contribute to the technical core (what is the actual product, what gets done on the job) of the organization (Borman & Motowidlo, 1997), while contextual performance encompasses the activities performed by employees that help to maintain the broader organizational, social, and psychological environment in which the technical core operates (Motowidlo, Borman, & Schmit, 1997). Slightly different renditions of the concept known as contextual performance are organizational citizenship behaviors (OCBs) and prosocial organizational behaviors (POBs) (Borman & Motowidlo, 1997).

task performance
The work-related activities performed by employees that contribute to the technical core of the organization.

contextual performance
Activities performed by employees that help to maintain the broader organizational, social, and psychological environment in which the technical core operates.

Up to this point, we’ve focused largely on task performance. An administrative assistant is rated high on task performance if he types quickly and answers phones efficiently, but he is rated high on contextual performance (OCBs or POBs) if he demonstrates enthusiasm, helps his coworker with a problem that she may be having, and works hard to support his organization. Perhaps we can view the basic distinction as one between what is required in the way of on-the-job behaviors (task performance) and other behaviors that are specifically of value to one’s workplace and coworkers (contextual performance).
I think of contextual behaviors as reflective of employees who go that extra mile rather than putting forth only what is required or expected of them. There are many different categorical schemes for the dimensions of contextual performance. One such scheme divides these multiple dimensions into five categories (Borman & Motowidlo, 1997). The first pertains to individuals who work with extra enthusiasm and effort to get their jobs done; the second, to those who volunteer to do things that aren’t formally part of their jobs, sometimes taking on extra responsibility in the process; the third, to those who help others with their jobs (reflecting what some researchers call sportsmanship or organizational courtesy); the fourth, to those who meet deadlines and comply with all the organization’s rules and regulations (reflecting civic virtue or conscientiousness); and the fifth, to those who support or defend the organization for which they work (as when they stay with the organization through hard times or “sell” the organization to others). Researchers have done extensive work on contextual performance, differentiating it from task performance and arguing for its inclusion in the domain of performance criteria (Hoffman, Blair, Meriac, & Woehr, 2007). They suggest three major distinctions between task performance and contextual performance. First, task activities vary a great deal across jobs (think of the job descriptions for a plumber and for a musician), but contextual behaviors tend to be similar across jobs (a plumber can help a colleague just as much as a musician can). Second, compared with contextual activities, task activities are more likely to be formally instituted by the organization as items on a job description or performance appraisal form. Third, task activities have different antecedents. 
One important study, for example, demonstrated that cognitive ability was more strongly related to sales performance (task activities) than to OCBs such as volunteerism and, similarly, that conscientiousness was more strongly related to OCBs than to sales performance (Hattrup, O’Connell, & Wingate, 1998). Researchers argue for the inclusion of contextual performance as part of the criterion space, suggesting that these criterion elements can be differentiated from task performance (Podsakoff, Whiting, Podsakoff, & Blume, 2009). Some scholars have begun to propose selection instruments that predict the likelihood of employees exhibiting OCBs (Allen, Facteau, & Facteau, 2004). Recent research has found that individuals make use of their beliefs that applicants are likely to exhibit OCBs in making hiring decisions (Podsakoff, Whiting, Podsakoff, & Mishra, 2011). Participants were shown video clips of applicants answering questions about their tendency and interest in exhibiting OCBs and the results indicated that applicants profiled as high on exhibiting OCBs were rated as better applicants, were rated as more competent, and received higher recommended salaries. Other work has focused on individual difference constructs as predictors of OCBs and for making selection decisions (Chiaburu, Oh, Berry, Li, & Gardner, 2011). This meta-analysis shows consistent and moderate relationships between the Big Five Factors of personality and OCBs. The authors argued that citizenship is especially important for organizational success in contexts that are plagued by limited resources, intense competition, and the need for teamwork. Therefore, using information that predicts citizenship provides an advantage for selection professionals. Clearly, recent research suggests that the likelihood of contributing to the social and psychological environment of the organization has the potential to play a significant role in selection decisions. 
Although this work is still in the early stages, it suggests the potential for making hiring decisions based in part on a prediction that certain individuals are more likely to exhibit OCBs than are others. One framework proposes that OCBs result in favorable relationships among coworkers, which serve as “social capital,” and that this capital then affects the organization’s performance and effectiveness (Bolino, Turnley, & Bloodgood, 2002). Some scholars have begun to view OCBs as a group-level phenomenon. A recent study of work groups found that relationship conflict and task conflict among group members had negative effects on traditional group task performance; however, task conflict such as disagreements about how best to approach the task increased group-level OCBs and relationship conflict decreased group-level OCBs (Choi & Sy, 2010). However, a recent meta-analysis of 38 studies conducted in the last 15 years that focused on group-level OCBs and performance found a positive relationship between OCBs at the group level, such as group members’ ratings of OCBs, and group performance, such as group sales performance or profit margin (Nielsen, Hrivnak, & Shaw, 2009). Furthermore, they reported that group-level OCBs improved social processes (e.g., coordination and communication) within teams. A more recent study took a slightly different approach to examining the relationship between OCBs and organizational effectiveness (Nielsen, Bachrach, Sundstrom, & Halfhill, 2012). In a study of work groups from six different organizations, researchers found OCBs exhibited by organizational work groups were significantly correlated with customers’ ratings of the group’s performance. The positive correlation indicates that groups that exhibit OCBs are seen by their customers as more effective. This interesting and important effect only held for those groups that were high on task interdependence, which means that the group members relied on each other to complete their work.
You may recall from the historical review of I/O psychology in Chapter 1 that industrial psychology was developed within a measurement framework, resulting in a rather narrow conceptualization of performance as productivity, a measurable outcome. Many now argue, however, that an expansion of the criterion space is more consistent with the modern organization. This more expansive view of performance criteria is illustrated in Figure 4.2.

FIGURE 4.2 Expansion of the Criterion Domain

It has become clear with recent research that OCBs are relevant and important across various cultures and international business contexts. For instance, researchers have developed a model of performance criteria for expatriates, employees who are temporarily working and residing in a foreign country. This model includes contextual performance dimensions such as initiative, teamwork, and interpersonal relations (Mol, Born, & van der Molen, 2005). In a Belgian sample, researchers demonstrated that traditional structures or forms of OCBs that have been identified in U.S. samples hold reliably well in this international context (Lievens & Anseel, 2004). However, a recent study of 269 Portuguese men and women working in 37 different companies found that justice-related antecedents of OCBs were perceived differently than is typically the case among North American workers (Rego & Cunha, 2010). Specifically, the authors found that interactional justice, particularly on the interpersonal dimension, was more important to the Portuguese employees than it tends to be in the United States. Furthermore, these interpersonal concerns were also more strongly related to OCBs and specific OCB dimensions like interpersonal harmony, personal initiative, and conscientiousness than were informational and distributive justice concerns.
The authors argued that these results reflect the fact that Portugal’s culture is less assertive, higher on power distance (i.e., it is accepted that power is not shared equally), and more collectivistic (i.e., a focus on the group and loyalty to the group) than that of the United States. As the work of I/O psychologists continues to become more global as a function of the increasing globalization of business and industry, we will likely see additional research with an international focus being conducted.

expatriates
Employees who are temporarily working and residing in a foreign country.

I/O TODAY
Performance and Disability

One of the trickier problems in I/O psychology is known as the diversity-validity dilemma; that is, there is sometimes a trade-off between having diversity in an organization, and maximizing performance. There are a variety of reasons why this occurs, but hiring an employee with a disability may provide a good illustration of this dilemma and show why the criterion problem is relevant to this issue. As you will learn in Chapter 7, employment law indicates that as long as someone can perform the essential components of a job, they are qualified for the job. We also cannot use certain characteristics to discriminate against a qualified worker; disability is one example of these characteristics. Therefore, an individual with a vision impairment could likely be a massage therapist, or someone in a wheelchair could work as a tour guide. However, along with disabilities come some additional accommodations the organization needs to make, as well as some extra tasks the disabled employee must complete. Take, for example, a tour guide who needs to use a wheelchair. Even though he is qualified for the job, the organization may need to make accommodations for him; perhaps they can only have him lead certain tours that don’t involve having to climb stairs.
Similarly, this individual may need to receive more training —if the bus he drives uses hand controls so he can drive without foot pedals, he may need extra training to learn how to use these hand controls. While over the long term, this person could likely perform just as well as a nondisabled individual, it may take him time to learn these new tasks, which could limit his task performance in the short term. In addition, if his coworkers don’t know how to work with a person with a disability, this can also affect his performance (see Chapter 8 for a review of how diversity training might help his coworkers learn about how to work with him). As you can see, if we evaluate this employee only on task performance, he may seem like he is underperforming. However, he still might be engaging in valuable performance—he may be a great team member, and he might have some great ideas for the organization, such as how to provide accommodations so that customers with impairments could still enjoy a tour. This example illustrates why it is important for I/O psychologists to help decision-makers understand the other values that come with a diverse workforce, such as better team performance, fewer CWBs, more OCBs, and more robust ideas. I/O psychologists must be thoughtful about defining performance—focusing on only one aspect may lead us to make decisions that reduce the diversity and flexibility in the organization. We must also think carefully and creatively about how to include and support individuals in the workplace, while still ensuring they can do the job well by using training, accommodations, and other tools. Discussion Questions 1. Imagine you are in the scenario presented above, and you do notice that your tour guide in the wheelchair has lower task performance than his colleagues. How might you change your criterion for the job? What might be your first steps for addressing his lower task performance? 2. 
Consider one type of diversity that can exist in an organization (e.g., age, race, religion, country of origin). What might be one benefit in performance that could occur if an organization included more diverse membership on this characteristic? 3. If you were working with an organization that was concerned about decrements in performance related to increased diversity, what arguments might you make to persuade them that diversity in an organization is valuable? We have been talking about OCBs as purely positive behaviors; however, some researchers argue that OCBs can actually have negative effects on organizational functioning and the employees exhibiting those behaviors (Bolino & Turnley, 2005; Bolino, Turnley, & Niehoff, 2004). So, the question is whether there are boundary conditions on the positive effects of OCBs. Can focusing too much on OCBs get in the way? The answer appears to be “yes,” according to some recent research. In a large study of data from over 3,500 employees, researchers found that time spent on task performance was a better predictor of career outcomes (promotions, salary increases, performance evaluations) than was time spent on OCBs (Bergeron, Shipp, Rosen, & Furst, 2013). Further, they fo
