Personnel Selection and Assessment: Chapter 5

Summary

This chapter focuses on personnel selection and assessment, exploring the process of selecting qualified candidates for various positions within an organization. It discusses the different stages involved, the importance of choosing appropriate selection tools, and the measurement of employee performance. The chapter also delves into Binning and Barrett's selection model.

Full Transcript


5. Personnel selection and assessment

5.1. What is this chapter about?

Selection is the process of choosing the best-qualified applicant who was recruited for a given job. However, personnel selection is not only relevant for decisions on who enters the organization, but also for decisions on the transfer, rotation, and promotion of employees. The traditional selection paradigm consists of at least three stages.

• In the first stage, the job for which individuals will be chosen is examined to determine what tasks and responsibilities will be required. This specification of the domain of job tasks is followed by the generation of hypotheses concerning the KSAOs required of individuals who must perform these tasks. The attentive reader will notice that the first stage was covered extensively in Chapter 3 on job analysis and competency modelling.

• Stage 2 consists of the development or choice of selection tools (e.g., cognitive ability tests, personality inventories) to measure those individual characteristics that are hypothesized to predict job performance. Usually, a wide range of selection tools is available, but occasionally an organization decides to develop its own method. What is almost always forgotten (yes, also by students at their final exam) is that the choice of a selection method goes hand in hand with the choice (or development) of an on-the-job performance measurement. After all, the whole point of selection is to predict whether someone will perform well in the job. It is therefore impossible to make statements about the quality of the selection procedure without subsequently looking at how the chosen applicant (now employee) performs in the job. The latter topic, performance appraisal, is discussed in detail in the next chapter. For now, it suffices to understand that the chapters on personnel selection and performance appraisal are naturally intertwined.

• In stage 3, the information gathered about the person during the selection is compared with the critical job requirements. This comparison then results in a decision about the suitability of an applicant. The conclusion may be that the applicant is unqualified, is qualified but not ranked first, will be included in a candidate reserve list, or will receive a job offer.

In the next section, we will further explain the different components of the selection paradigm using Binning and Barrett's selection model146. This chapter is mainly based on the work and writings of authorities in the field of personnel selection and assessment, such as Walter Borman, David Chan, Filip Lievens, Kevin Murphy, Robert Ployhart, Ann Marie Ryan, Paul Sackett, and Neal Schmitt147,148. I am also indebted to the late Robert Roe for his work on personnel selection, which I discovered in 1997 when I started working as a fresh graduate in a recruitment and selection agency. His methodical approach and clear descriptions instilled in me a love for the selection profession; a love that continues to this day.

5.2. Binning and Barrett's selection model

The field of personnel selection has its roots in the notion that the future job performance of a particular applicant may be predicted at the time of selection based on relatively enduring and stable characteristics of that applicant.
In their seminal article, Binning and Barrett present a conceptual framework (see Figure 5.1) that describes how psychological constructs (i.e., predictor constructs, performance constructs) and operational measures of these constructs (i.e., predictor measures, performance measures) are related in the context of personnel selection and performance measurement. In particular, the framework provides an overview of the inferences made in personnel selection, illustrating the complexities of conducting personnel selection research. Let's explain the model step by step, starting with the performance domain.

5.3. The performance domain

According to Binning and Barrett, the performance domain is "a subset of all possible behaviours that contribute to organizational goals and objectives." So, simply stated, performance is about an employee performing well in his or her job and in the organization. Inherent in this definition, and relevant to our discussion, is that performance is multidimensional. That is, there is no one single performance variable, but different types of work behaviour relevant to organizations in most contexts. Campbell and his colleagues149 proposed a 'theory of job performance' that hypothesizes that this construct comprises eight major dimensions, each of which consists of several more-specific features as well. The eight general factors, listed below, are meant to describe the universe of things people do (not the results of their actions) across all jobs, though one or more of the factors may be missing or irrelevant in any given job.

1. Job-specific task proficiency behaviours are those technical or core tasks that are central to the job. These define the substantive content of what gets done by, for instance, computer programmers, college professors, carpenters, and so on.
2. Task proficiency of a non-job-specific nature includes those things that people across jobs are often required to perform in organizations. For example, in many workplaces, all individuals are required to help keep the workplace clean and to be watchful for unsafe conditions.
3. Communication is defined as the ability of a person to clearly express him/herself both in writing and orally, regardless of the content of the communication.
4. Demonstrating effort pertains to the observation that people in most jobs are required to commit themselves to the performance of work tasks, exert a high degree of effort, and persist in that effort.
5. Maintaining personal discipline is also a requirement in most jobs, in that employees are required to arrive at work on time, avoid substance abuse problems, and abide by company rules and policies.
6. Facilitating peer and team performance includes behaviours related to helping colleagues with work-related and personal problems, serving as a role model, and promoting participation of colleagues in the organization's work.
7. Supervisory/leadership behaviour is directed at influencing the behaviour of subordinates. Thus, this factor includes various leadership behaviours, as well as coaching and empowerment.
8. Management/administrative tasks include those tasks that help to manage, report, or define what an organization does without direct interaction with subordinates, as would be the case for supervisory tasks.

Rotundo and Sackett150 narrowed down the number of performance dimensions and suggested that job performance can be described by three broad performance components: task performance, contextual performance, and counterproductive performance.
• Task performance refers to activities that are formally recognized as part of the job and that contribute to the organization's technical core (e.g., production, service, sales). Task performance includes Campbell et al.'s job-specific and non-job-specific task proficiency behaviours. The notion of contributing to the technical core is an important feature that helps distinguish this performance component from the others. An example of task performance is a truck driver getting cargo from point A to B on time and without accidents.

• Contextual performance (also referred to as extra-role behaviour) refers to the voluntary performance of task activities that are not formally part of the job and to helping and cooperating with others in the organization to get tasks accomplished. Contextual activities are important because they contribute to the social climate and psychological atmosphere of an organization: they are the 'social glue' that holds an organization together. Generally, five broad clusters of contextual or extra-role behaviour are distinguished: altruism, conscientiousness, sportsmanship, civic virtue, and courtesy. Note that performance is always about behaviours (what employees do), and not about personality (employees' traits).

o Altruism refers to helping other members of the organization perform their duties. Examples: volunteering to help new employees, helping colleagues who are overworked, assisting employees who were absent, guiding employees in performing difficult tasks.
o Conscientiousness is discretionary behaviour that goes far beyond the organization's minimum required levels of attendance, punctuality, and housekeeping. Examples: not taking extra breaks, always arriving on time.
o Sportsmanship refers to maintaining a positive attitude and the willingness to sacrifice one's own interests for the greater good of the organization. High sportsmanship is characterized by a great willingness to endure the inevitable inconveniences and obligations of work without complaining. By contrast, employees low in sportsmanship always feel that there is something wrong with the way the organization operates.
o Civic virtue refers to constructive involvement in the organization's political process and contributing to it by freely and openly expressing opinions, attending meetings, discussing matters of concern to the organization with colleagues, and reading organizational communications such as e-mails for the good of the organization.
o Courtesy refers to behaviours and gestures made to avoid problems for colleagues and supervisors. Leaving the copier or printer in good condition for other workers' use is an example of courtesy at work.

• Counterproductive performance (also referred to as counterproductive work behaviour) refers to any intentional behaviour on the part of an employee viewed by the organization as contrary to its legitimate interests. While accidental actions may also cause harm, they are not part of counterproductive performance. Counterproductive performance includes a broad variety of phenomena.
Employee absenteeism, abusive supervision, aggression, blackmail, bribery, bullying, destruction of property, discrimination, drug use, extortion, fraud, harassment, industrial espionage, interpersonal violence, kickbacks, lying, sabotage, sexual harassment, social loafing at work, social undermining, tardiness, theft, tyranny, violations of confidentiality, violence, and withdrawal behaviours have all been the subject of applied psychological research and are included in the counterproductive performance construct space.

5.4. Performance measures

The next step is to translate the performance domain into measurable criteria. We ask the question: once someone is hired, how can we check whether he or she is indeed performing well? Choosing good criteria is not an easy task, and a selection programme built on a simple but inadequate and error-contaminated criterion cannot help but fail. The selection procedure can therefore never be more valid than the chosen criterion. First, we need to establish an unambiguous measure of success in the job. This seems easier than it is. After all, job success is not a dichotomous concept. Employees succeed to varying degrees: some succeed with flying colours and exceed expectations, others perform adequately but no more than that, and still others underperform or do not meet the requirements at all. Another difficulty is that it is often not clear what employees need to know and be able to do to succeed in their jobs, and that success is partly determined by (accidental) environmental factors over which employees have little or no influence. For example, the energy crisis has caused sales of solar panels and heat pumps to skyrocket. So, sales figures are rising regardless of the performance of sales representatives. Choosing criteria therefore always involves arbitrary judgement to some extent. Determining what we mean by success is a policy decision. That decision may be formalised in a set of objectives, left informal as a general goal, or known only implicitly. The problem becomes even more complicated when we consider that performance criteria can change over time.

Criterion measures, or criteria, are classified into two broad groups. The first group includes objective criterion measures. With the emergence of HR information systems (HRIS), it has become much easier to store and link objective data. One type of objective data stored in HRIS concerns personnel data, such as data on staff turnover, transfers, training time, promotions, and salary growth. Other data refer to the productivity of employees, such as the number of units produced per unit of time, the number of units sold, salary based on commission or piece rate (see Chapter 8), and so on. Still other data refer more to the quality of outcomes, such as the cost of failed work, amount of waste, errors of codification or classification, goods sold back, number of customer complaints, number of days absent, number of times late for work, sick leave, and length or frequency of unauthorised rest breaks. It is important to note that objective measures, as nice as they may seem, have serious drawbacks.

• First, they tell us little about the job requirements or required competencies. Consider, for example, the job of a secretary.
If we were to focus only on the number of words per minute a secretary can type and the number of mistakes made, we would not consider aspects of the job that are more difficult to measure but that are also part of successful job performance for secretaries, such as answering phone calls in a friendly manner, taking initiative to solve problems independently, meeting deadlines, and being able to look up information quickly. In this case, objective criteria fall short, and we would do well to also rely on the information that emerges from job analyses (see Chapter 3). If we take as the criterion for a medical representative the number of leaflets delivered (an objective measure), this is obviously not a sufficient criterion either (a medical representative must do much more than distribute leaflets). In other words, relying solely on typed words per minute, mistakes made, leaflets delivered, etc. gives a distorted and incomplete picture of the person's performance. These measures are therefore deficient: they fall short of encompassing the entire performance domain.

• Second, criterion measures are often contaminated. This means that the criterion measure contains factors beyond the control of the person being assessed. Recall our sales representative of solar panels and heat pumps. Because of the energy crisis, demand is rising, and this cannot be attributed to the representative's qualities. Another example is a police officer working in a deprived neighbourhood making more arrests than her colleague working in an affluent neighbourhood.

In addition to the group of objective criterion measures, there are subjective criterion measures. They are called subjective because they involve judgements of how staff members perform. Such data are usually collected semi-annually or annually for performance appraisals (see Chapter 6). It is evident that such assessments should relate to the competencies that are crucial for the job. Research shows that objective and subjective measures are not interchangeable. Rather, they complement each other. For instance, objective and subjective measures generally show a moderately positive correlation (between .20 and .40). Therefore, it is strongly recommended to collect both objective and subjective criterion data.

5.5. Predictor constructs

Let us now return to the selection part of Binning and Barrett's model. In fact, we now turn back in time, because selection obviously takes place before hiring and before the newly hired employee can start performing. The left part of the model chronologically precedes the right part of the model. Put differently, selection involves a set of predictive hypotheses about how someone will perform in the future (=on the job). The assumption is that certain KSAOs (=predictor constructs) are related to the criterion domain (see Box 5.1 for more information on the term 'construct'). For example, the predictive hypothesis of firm A is that the performance of salespeople depends mainly on their level of extraversion (i.e., 'extraversion leads to more sales'), and it therefore gives preference to applicants who score high on this trait. Firm B, on the other hand, hypothesizes that salespersons' agreeableness is the best predictor of performance (i.e., 'agreeableness leads to customer loyalty') and therefore focuses on applicants' level of agreeableness.
This example shows that the choice to focus on a particular predictor construct (e.g., extraversion versus agreeableness) depends on the choice of the criterion domain and on the meaning of 'performing well' in an organization (e.g., quantity versus quality). From Chapter 3, we already know that job analysis and competency modelling can provide important information about what a job entails and what characteristics distinguish high-performing from low-performing employees. In this section, we discuss the main predictor constructs in the context of personnel selection. In the subsequent section we focus more specifically on the methods that can be used to assess the predictor constructs.

Clarification box 5.1

Constructs are abstract summaries of some regularity in nature, and are related to or connected with concrete, observable entities or events. Gravity provides a good example of a construct: when apples fall to earth, the construct gravity is used to explain and predict their behaviour. It is impossible to see gravity itself; all one sees is the falling apple (unless you're Thomas A. Anderson, aka Neo/the One). Nevertheless, it is perfectly logical to measure gravity and to develop theories that use the construct gravity. It certainly seems more sensible to deal with the abstract force we call gravity than to develop theories and methods that apply only to falling apples. Constructs are essential to science. They represent departures from our immediate sensory experience that are necessary to form scientific laws. They allow us to generalize from an experiment involving falling apples to situations involving a variety of falling objects. Thus, once I learn about gravity, I will be able to predict a wide variety of phenomena. Constructs are not restricted to unseen forces, such as gravity. Rather, any group of similar things or events may serve to define a construct. Most categories that we use to classify and discuss everyday objects or events are in fact constructs. The colour red, poverty, beauty, quality, reading ability, etc. are all labels for constructs. Although constructs are themselves hypothetical abstractions, all constructs are related to real, observable things or events. The distinguishing feature of psychological constructs is that they are always related, directly or indirectly, to behaviour or experience.

5.5.1. Personality

Personality can be defined as a dynamic and organized set of characteristics possessed by a person that uniquely influence his or her cognitions, motivations, and behaviours in different situations. The word personality comes from the Latin persona, meaning mask. In the theatre of the ancient Latin-speaking world, masks were not used to conceal a character's identity, but rather to represent or characterize that character. Over time, personality research in selection contexts has evolved from examinations of a hodgepodge of personality traits to a clear focus on using (sub-)dimensions of the five-factor model (FFM) or the 'Big Five.' The FFM is a taxonomy of personality that classifies all human personality traits into one of five broad dimensions. The earliest version of the FFM dates to Tupes and Christal's work in the 1950s and early 1960s. The original five factors were emotional stability, surgency, culture, dependability, and agreeableness. The specifics of the FFM have evolved somewhat over the years, and the factors are now often labelled emotional stability (or neuroticism), extraversion, openness, conscientiousness, and agreeableness.
These traits are presumed to be normally distributed in a bell curve, statistically independent of each other, and largely stable across the lifespan (although the debate on this last issue is ongoing). You may find it helpful to use the acronym OCEAN when trying to remember the 'Big Five'. For completeness, let me add that there are taxonomies other than the FFM. A well-known alternative is the HEXACO model, which includes six factors: honesty-humility, emotionality, extraversion, agreeableness, conscientiousness, and openness to experience.

5.5.1.1. Openness (to experience)

Openness to experience refers to being inventive and curious versus consistent and cautious. The openness to experience dimension addresses one's range of interests and fascination with novelty. Individuals scoring high on openness are motivated to seek new experiences and are generally receptive to entertaining new and challenging facets of cultural life, as well as personal thoughts and emotions. Individuals scoring low tend to be conventional and traditional in their outlook and behaviour, prefer familiar routines to new experiences, and generally have a narrower range of interests.

5.5.1.2. Conscientiousness

Conscientiousness refers to being efficient and organized versus easy-going and careless. It reflects dependability; that is, being careful, thorough, responsible, organized, and planful. So, highly conscientious individuals are task-focused and orderly, rather than distractible and disorganized. In addition to these traits, conscientiousness incorporates volitional variables, such as being hardworking, achievement-oriented, and persevering. Highly conscientious individuals are better able than their less conscientious counterparts to motivate themselves to perform the tasks they wish to accomplish.

5.5.1.3. Extraversion

Extraversion refers to being outgoing and energetic versus solitary and reserved. The extraversion dimension captures our comfort level with relationships. Extraverts tend to enjoy human interactions and to be enthusiastic, talkative, assertive, and outgoing. They tend to be energized when around other people. Extraverts are also surgent (dominant and ambitious) and active (adventuresome and assertive). They tend to experience more positive emotions than introverts. Introverts are introspective, reserved, and reflective. They enjoy time spent alone, prefer a quiet environment, and may feel overwhelmed by too much stimulation from social gatherings. Susan Cain, in an inspiring TED talk entitled "The power of introverts", argues that introverts bring extraordinary talents and abilities to the world.

5.5.1.4. Agreeableness

Agreeableness (others have labelled it 'Likeability' or 'Friendliness') refers to being friendly and compassionate versus challenging and detached. The agreeableness dimension refers to an individual's propensity to defer to others. Agreeable individuals are cooperative (trusting of others and caring) as well as likable (good-natured, cheerful, and gentle). Individuals scoring high on agreeableness are conflict-avoidant, empathetic, forgiving, tolerant, modest, and prosocial, whereas individuals scoring low are cold and antagonistic, and tend to engage in behaviours that are selfish and, at the extreme, manipulative and deceptive.

5.5.1.5. Neuroticism

Neuroticism refers to being sensitive and nervous versus secure and confident. The neuroticism dimension taps an individual's inability to withstand stress.
Common traits associated with neuroticism include being anxious, depressed, angry, embarrassed, emotional, worried, and insecure. Individuals who score high on neuroticism are more likely to experience a variety of problems, including negative moods and physical symptoms, and have difficulties coping with negative life events. Individuals who score low on neuroticism tend to be calm and even-tempered, and are less likely to feel tense. They are less reactive to stress than their more neurotic counterparts.

5.5.2. Cognitive abilities

The concept of cognitive abilities (also: intelligence) has been defined in many ways. According to the American Psychological Association, intelligence refers to the ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, and to overcome obstacles by taking thought. Notwithstanding this definition, scholars continue to debate the exact meaning and structure of intelligence.

5.5.2.1. The Cattell-Horn-Carroll theory

The Cattell-Horn-Carroll theory of cognitive abilities (CHC theory) is widely regarded as the most influential theory in the study of human intelligence. In CHC theory, the dimensions of cognitive ability have a hierarchical structure, meaning that some have a broader scope than others. At the lowest level of the hierarchy are specific factors, or specific abilities, which are tied to a specific task or test (e.g., the ability to repeat sentences back after hearing them once) and which are the only abilities that can be measured directly. All other abilities are theoretical ('latent') entities inferred from the observed ('manifest') relations among specific abilities. Moving up the hierarchy, we find narrow abilities, which are clusters of highly correlated specific abilities. For example, auditory short-term memory storage capacity would be the latent narrow ability that explains why individuals scoring high on repeating back sentences also score high on repeating back digits, letters, and nonsense words. Broad abilities are clusters of narrow abilities that are mutually more correlated with each other than with abilities in other broad-ability clusters. For example, auditory short-term memory storage, visual short-term storage, and attention control form a cluster of narrow abilities referred to as working memory capacity. The broad abilities are described in Table 5.1. Research is ongoing, and it is to be expected that new abilities will be added while others will disappear. At the apex of the hierarchy is general intelligence (g), or general mental ability (GMA), indicating that all cognitive abilities have a common cause.

Table 5.1: Up-to-date list of broad abilities according to CHC theory

Fluid Reasoning (Gf): Broad ability to reason, form concepts, and solve problems using unfamiliar information or novel procedures.
Comprehension-Knowledge (Gc): Depth and breadth of a person's acquired knowledge, the ability to communicate one's knowledge, and the ability to reason using previously learned experiences or procedures.
Domain-Specific Knowledge (Gkn): Depth, breadth, and mastery of specialized knowledge.
Quantitative Knowledge (Gq): The ability to comprehend quantitative concepts and relationships and to manipulate numerical symbols.
Reading & Writing Ability (Grw): Depth and breadth of a person's knowledge and skills related to written language.
Working Memory Capacity (Gwm): The ability to maintain and manipulate information in active attention.
Learning Efficiency (Gl): The ability to learn, store, and consolidate new information over periods of time measured in minutes, hours, days, and years.
Retrieval Fluency (Gr): The rate and fluency at which individuals can produce and selectively and strategically retrieve verbal and nonverbal information or ideas stored in long-term memory.
Processing Speed (Gs): The ability to control attention to perform relatively simple repetitive cognitive tasks automatically, quickly, and fluently.
Reaction and Decision Speed (Gt): The speed of making very simple decisions or judgments when items are presented one at a time.
Psychomotor Speed (Gps): The ability to perform skilled physical body motor movements with precision, coordination, fluidity, or strength.
Visual Processing (Gv): The ability to make use of simulated mental imagery to solve problems by perceiving, discriminating, manipulating, and recalling non-linguistic images in the "mind's eye."
Auditory Processing (Ga): The ability to discriminate, remember, reason, and work creatively on auditory stimuli, which may consist of tones, environmental sounds, and speech units.
Olfactory Abilities (Go): The abilities to detect and process meaningful information in odours.
Tactile Abilities (Gh): The abilities to detect and process meaningful information in haptic (touch) sensations.
Kinaesthetic Abilities (Gk): The abilities to detect and process meaningful information in proprioceptive sensations.
Psychomotor Abilities (Gp): The abilities to perform physical body motor movements with precision, coordination, or strength.
Emotional Intelligence (Gei): The ability to perceive emotional expressions, understand emotional behaviour, and solve problems using emotions.

5.5.2.2. The triarchic theory of intelligence

The triarchic theory of intelligence was formulated by Robert Sternberg151. He was one of the first to criticize the dominant psychometric approach to intelligence. In his view, intelligent people are more than 'book-smart.' Intelligence also shows in how well individuals deal with environmental changes throughout their lifespan and in their level of creativity. The theory describes three types of intellectual abilities: analytic, creative, and practical. According to Sternberg, these abilities are interdependent constructs, and every person demonstrates a distinct blend of strengths in one, two, or all three triarchic ability categories:

• Analytic intelligence is used to solve problems, to analyse, evaluate, explain, and compare, and is the kind of intelligence measured by standard IQ tests.
• Creative intelligence is one's ability to use existing knowledge to create new ways to handle new problems or cope in new situations.
• Practical intelligence is one's ability to successfully interact with the everyday world. Practically intelligent people are especially adept at behaving in successful ways in their external environment. A central element in practical intelligence is tacit knowledge, which refers to action-oriented knowledge, acquired without direct help from others, that allows individuals to achieve goals they personally value. Sternberg and his colleagues152 found that tacit knowledge seems to increase with experience. For example, business managers received higher tacit knowledge scores than business graduate students, who in turn outperformed undergraduate students. Consistent with Sternberg's propositions, some scholars153 found that scores on tacit knowledge showed low correlations (below .20) with measures of analytical intelligence.
Not everyone agrees with Sternberg's views, though. For example, Linda Gottfredson has been one of the biggest opponents of Sternberg's work on the triarchic theory of intelligence. Her main argument is that Sternberg has not been able to empirically demonstrate a distinction between practical and analytical intelligence. She further argues that it is absurd to claim that traditional intelligence tests do not measure practical intelligence. After all, traditional intelligence tests show a moderate positive correlation with income (especially in middle age, when people have had the opportunity to reach their maximum career potential) and an even higher correlation with occupational prestige. Cognitive intelligence tests also predict the ability to stay out of jail and to stay alive. Surely a comfortable income, a prestigious job, and staying out of jail and staying alive are important signs of 'practical intelligence' or 'street smarts'. If it is possible to predict these practical 'successes' with traditional intelligence tests, then there is really no need for a new intelligence concept, she argues154,155. Gottfredson claims that what Sternberg calls practical intelligence is not a broad aspect of cognition at all, but simply a specific set of skills that people learn to cope with a specific environment (task-specific knowledge).

5.5.2.3. The theory of multiple intelligences

Howard Gardner argued for a notion of intelligence that includes non-cognitive abilities and formulated the theory of multiple intelligences. He claimed that standard intelligence tests typically probe only verbal-linguistic and logical-mathematical intelligences. In his view, there are at least seven other human intelligences: musical-rhythmic, visual-spatial, bodily-kinaesthetic, naturalistic, interpersonal, intrapersonal, and existential intelligence. Gardner's theory has been heavily criticized. For example, Gardner argues that there is a wide range of cognitive abilities and that these abilities are only weakly correlated. However, this claim does not appear to be consistent with the available evidence: intelligence tests and psychometrics have generally found high correlations between different aspects of intelligence, rather than the low correlations that Gardner's theory predicts, supporting the prevailing theory of general intelligence rather than multiple intelligences. Gardner's theory has also been criticized by mainstream psychology for its lack of empirical evidence and its dependence on subjective judgement156,157. The theory itself argues against the use of standardized tests to measure intelligence, and it is therefore virtually impossible to falsify.

5.5.3. Emotional intelligence

Since the mid-1990s, emotional intelligence has probably been the psychological construct that has received the greatest attention in both the popular and academic literatures. Emotional intelligence has two constituent components, intelligence and emotion, which has led to heated debates about whether emotional intelligence should be considered an ability, a personality trait, or a combination of both.

5.5.3.1. Ability-based model of emotional intelligence

According to proponents of the ability-based model of emotional intelligence, individuals vary in their ability to process information of an emotional nature and in their ability to relate emotional processing to wider cognition. This means that there is an objective standard that can be used to determine which individuals are more emotionally intelligent than others.
In this paradigm, emotional intelligence is viewed as another legitimate type of intelligence. The attentive reader will have noticed the parallels between the ability-based model of emotional intelligence and the role of emotional intelligence in CHC theory. The ability-based model claims that emotional intelligence includes four types of broad abilities158:

• Perceiving emotions: This ability refers to how accurately individuals can identify which emotion(s) others feel, typically by processing nonverbal information such as facial expressions and vocal tones. This ability also concerns how accurately individuals can decipher the emotions that they themselves are feeling.
• Using emotions: This ability concerns how well individuals capitalize on the systematic effects of emotions on cognitive activities such as creativity and risk taking. For example, emotionally intelligent traders might know that they will be risk averse when they are anxious, whereas traders with lower emotional intelligence may not be aware of this effect. This ability also concerns how effectively individuals can generate emotions 'on demand.' For example, emotionally intelligent students may imagine negative study outcomes as a method of motivating performance.
• Understanding emotions: This ability concerns how accurately individuals reason about various aspects of emotions, such as when they attach labels to emotions (e.g., using the precise term to express a feeling) and identify connections between (future) events and emotional reactions (e.g., anticipating that unfair procedures will arouse anger in employees).
• Regulating emotions: This ability concerns how well individuals can increase, maintain, or decrease the magnitude or duration of their own or others' emotions. Emotionally intelligent individuals are capable of setting goals for modifying their emotions if necessary (e.g., switching from happiness to anger), and of selecting and implementing the appropriate regulation strategy (e.g., a fiery speech versus a motivational speech by a coach during half-time).

5.5.3.2. Trait models of emotional intelligence

Trait models of emotional intelligence define it as a constellation of emotion-related self-perceptions and dispositions located at the lower levels of personality hierarchies, including assertiveness, happiness, self-esteem, and self-perceived ability to manage stress. In other words, trait models of emotional intelligence mix emotion-related self-perceptions (e.g., self-perceived ability to regulate emotions) with dispositions (e.g., self-esteem). Proponents of trait models claim that emotional intelligence can reliably be measured by self-reports.

5.5.3.3. Mixed models of emotional intelligence

Mixed models of emotional intelligence describe a conception of emotional intelligence that includes not only mental abilities related to intelligence and emotions, but also other personality dispositions, such as warmth, persistence, and motivation. Goleman, in his 1995 best-selling book "Emotional intelligence" (5 million copies in print worldwide in 30 languages), claims that this mixture of ability and trait emotional intelligence is highly predictive of leadership performance (and of many other indicators of success). Goleman's findings are heavily disputed, though, and researchers nowadays tend to agree that there is no strong evidence showing that emotional intelligence predicts leadership outcomes when accounting for personality and intelligence.

5.5.4. Work experience
Work experience is perhaps one of the most frequently encountered concepts in personnel research and practice. It is relevant for many human resource functions, such as selection, training, and career development. Most studies on work experience use time on the job, or tenure, to measure work experience. However, other studies have measured experience by counting the number of times an individual has performed a given task. These findings suggest that the construct of work experience is more complex than it seems. According to Quiñones and colleagues159, work experience varies along two dimensions: level of specificity (task, job, organization) and measurement mode (amount, time, type). As Figure 5.2 suggests, individuals can vary in their experience of performing specific tasks. They can perform a particular task a given number of times (amount). They can also vary in the types of tasks (e.g., simple/routine versus difficult/complex) that they have performed (type). Finally, they can vary in the amount of time spent working on a given task (time). It is also possible to measure an individual's experience at the job level of specificity. Individuals can differ in the total number of jobs that they have held (amount). They can also have distinct experiences by performing different types of jobs, which vary in terms of prestige, difficulty, criticality, or contribution to organizational effectiveness (type). Finally, differences in work experience can be represented by the amount of time spent in a particular job, or job tenure (time). Differences in experience can also exist at the organizational level of specificity. Individuals can vary in the number of organizations for which they have worked (amount). Organizational experience can also vary depending on the type of organization in which a person has worked (e.g., manufacturing, R&D, etc.) (type). Finally, organizational experience can vary depending on the amount of time spent in each organization (time).

The model by Tesluk and Jacobs160 extends the dimensionality of experience by proposing additional measurement modes and levels of specificity. One additional measurement mode, which the authors refer to as 'density', is intended to capture the intensity of experiences (e.g., the number of challenging situations in a particular period). A notable characteristic of high-density experiences is that they are likely to have dramatic effects on subsequent work experiences and outcomes, such as learning, motivation, or performance, in ways that can dramatically change an individual's career trajectory. For example, Inge Vervotte entered the political arena after she became known to the public as a union representative when she defended the interests of Sabena workers. Another additional measurement mode is the timing of an experience. Timing refers to when a work event occurs relative to a longer sequence of successive experiences, such as those that characterize a career. For example, feedback is most effective when given immediately following a challenging assignment. Activities targeted toward developing social and interpersonal relationships are particularly effective at the time of organizational entry. Tesluk and Jacobs also introduce additional levels of specificity. In addition to the task, job, and organizational levels of specificity described by Quiñones and colleagues, Tesluk and Jacobs' model includes the team level and occupation level of specificity.

5.5.5. Vocational interests
One set of predictor constructs that has received comparatively little attention in the selection literature is vocational interests, which are traits that reflect preferences for certain types of work activities and environments. The primary reason for considering interests in employee selection lies in the assumption that a person will be happiest and most productive when he or she is working in a job or occupation in which he or she is interested. Characteristic of vocational interests is that: (a) within individuals, interests tend to be quite stable over time, (b) interests are rooted in the work context and focus on the types of activities individuals prefer to perform and the environments in which they prefer to perform those activities, and (c) interests are thought to influence the way individuals behave at work by increasing motivation to perform the work activities they prefer and by inspiring workers to increase knowledge and skills relevant to performing those activities. John Holland's six-factor model, also known as the RIASEC model161,162, has become the best known and most widely researched vocational interests theory. According to Holland, interests can be used to categorize individuals into six types: realistic, investigative, artistic, social, enterprising, and conventional.

• Realistic: Individuals with realistic interests, or 'doers,' like practical, hands-on types of work activities. According to Holland, realistic individuals have athletic ability and prefer to work with objects, machines, tools, plants, or animals, or to be outdoors. Career possibilities include software technician, mechanical engineer, and forester.
• Investigative: Investigative types, or 'thinkers,' like scholarly, intellectual, and scientific types of work. They like to observe, learn, investigate, analyse, evaluate, or solve problems. Career possibilities include economist, mathematician, and psychologist.
• Artistic: Artistic types, or 'creators,' like work that involves creative, expressive, and unconventional activities. These individuals have artistic, innovative, or intuitional abilities and like to work in unstructured situations using their imagination and creativity. Career possibilities include architect, copywriter, and photographer.
• Social: Social types, or 'helpers,' like work that involves helping, teaching, and caring for others. These individuals like to work with others to enlighten, inform, help, train, or cure them, or are skilled with words. Career possibilities include teacher, social worker, and medical assistant.
• Enterprising: Enterprising types, or 'persuaders,' like work that involves assertive, persuasive, and leadership-oriented activities. They like to work with others, influencing, persuading, leading, or managing for organizational goals or economic gain. Career possibilities include sales representative, business manager, and lawyer/attorney.
• Conventional: Conventional types, or 'organizers,' like work that involves well-ordered and routine activities. They like to work with data, have clerical or numerical ability, carry out tasks in detail, or follow through on others' instructions. Career possibilities include tax accountant, safety inspector, and financial analyst.

In Holland's theory, the six RIASEC types are arranged in a circular ordering, with distances between types inversely proportional to the degree of similarity between them (see Figure 5.3).
Holland referred to this structure as a hexagon, although several scholars have noted that the underlying circular ordering and structure of the six RIASEC types is a circumplex. The RIASEC interest circumplex is often referred to as a two-dimensional model. As illustrated in Figure 5.3, the most well-known dimensional interpretation is a model with the dimensions Data-Ideas (contrasting E-C with I-A) and People-Things (contrasting S with R). An alternative interpretation uses the dimensions Sociability (contrasting S-E with R-I) and Conformity (contrasting C with A).

Figure 5.3: The RIASEC model

5.5.6. Person-organization fit

Person-organization (PO) fit can be broadly defined as the compatibility between individuals and organizations. Other types of fit, such as an individual's compatibility with his or her job, work group, and supervisors, have also emerged as important research domains. Fit has been operationalized using a variety of content dimensions, including skills, needs, preferences, values, personality traits, goals, and attitudes. Amy Kristof-Brown85,86 distinguishes between two types of PO fit: supplementary and complementary fit. Supplementary fit occurs when an individual supplements, embellishes, or possesses characteristics (e.g., personality, preferences) that are similar to those of other individuals in an environment. For example, a sales firm that employs mostly extraverted workers decides to offer a job to a similarly extraverted candidate. Complementary fit occurs when an individual's characteristics 'make whole' the environment or add to it what is missing. For example, the sales firm offers a job to an introverted candidate (because the firm lacks someone who is reflective). Another useful distinction made by Kristof-Brown is that between needs-supplies fit and demands-abilities fit. The former occurs when an organization satisfies individuals' needs, desires, or preferences (e.g., the organization is willing to meet the candidate's salary and benefit expectations). The latter occurs when an individual has the abilities required to meet organizational demands (e.g., the newly hired nurse has the right skill set to meet the patients' needs). From the foregoing, it is apparent that the construct of PO fit entails much more than the 'feeling of a connection.'

5.5.7. Physical and psychomotor abilities

Though the changing nature of work suggests intellectual abilities are becoming increasingly important for many jobs, physical and psychomotor abilities have been and will remain valuable. Unlike cognitive abilities, physical and psychomotor abilities do not seem to show a comparable hierarchical structure, although there is some evidence for at least a weak general factor in measures of psychomotor performance. Put differently, there is little relationship among these abilities: a high score on one is no assurance of a high score on others. The most comprehensive studies of the domain of human psychomotor abilities have been conducted by Fleishman and his colleagues. Fleishman's taxonomy of psychomotor abilities is shown in Table 5.2.

Table 5.2: Fleishman's taxonomy of psychomotor abilities

Arm-hand steadiness: The ability to keep your hand and arm steady while moving your arm or while holding your arm and hand in one position.
Control precision: The ability to adjust the controls of a machine or a vehicle quickly and repeatedly to exact positions.
Finger dexterity: The ability to make precisely coordinated movements of the fingers of one or both hands to grasp, manipulate, or assemble very small objects.
Manual dexterity: The ability to quickly move your hand, your hand together with your arm, or your two hands to grasp, manipulate, or assemble objects.
Multi-limb coordination: The ability to coordinate two or more limbs (for example, two arms, two legs, or one leg and one arm) while sitting, standing, or lying down.
Rate control: The ability to time your movements or the movement of a piece of equipment in anticipation of changes in the speed and/or direction of a moving object or scene.
Reaction time: The ability to quickly respond (with the hand, finger, or foot) to a signal (sound, light, picture) when it appears.
Response orientation: The ability to choose quickly between two or more movements in response to two or more different signals (sounds, lights, pictures).
Speed of limb movement: The ability to quickly move the arms and legs.
Wrist-finger speed: The ability to make fast, simple, repeated movements of the fingers, hands, and wrists.

The psychomotor abilities shown in Table 5.2 are manifest largely in terms of speed, control, and precision of movement. Fleishman also identified nine physical abilities that have to do with strength and physical proficiency. These are described in Table 5.3.

Table 5.3: Fleishman's nine basic physical abilities

Strength factors
Dynamic strength: Ability to exert muscular force repeatedly or continuously over time.
Trunk strength: Ability to exert muscular strength using the trunk (particularly abdominal) muscles.
Static strength: Ability to exert force against external objects.
Explosive strength: Ability to expend a maximum of energy in one or a series of explosive acts.

Flexibility factors
Extent flexibility: Ability to move the trunk and back muscles as far as possible.
Dynamic flexibility: Ability to make rapid, repeated flexing movements.

Other factors
Body coordination: Ability to coordinate the simultaneous actions of different parts of the body.
Balance: Ability to maintain equilibrium despite forces pulling off balance.
Stamina: Ability to continue maximum effort requiring prolonged effort over time.

5.6. Predictor measures

The final component of Binning and Barrett's framework relates to the different ways in which the predictor constructs can be operationalized or measured, hence predictor measures. Whereas there is a limited set of predictor constructs (though we did not discuss all of them), there are countless ways of measuring them, some more creative than others, and unfortunately, not all equally reliable and/or valid. Before we start describing commonly used predictor measures, it is important to discuss the areas in which these measures may differ from each other. Predictor measures differ in terms of their underlying assumptions, the temporal perspective used, their psychometric properties, and the extent to which they are accepted by applicants, among other things. Let's take a closer look at some of these features.

Sign versus sample method

The sign method is based on the notion that there is a relationship between a latent person characteristic and job performance. 'Latent' means that these characteristics cannot be directly observed. Personality would be a good example, but other characteristics, such as vocational interests or emotional intelligence, cannot be directly observed either.
The link between latent characteristics and job performance is of a theoretical nature and can be made more explicit by means of a hypothesis (e.g., 'Extraverted people make better salespersons'). During selection, measurements are made of these latent characteristics. As these characteristics can be linked to future job performance by hypotheses, the measurements can be taken as expected job performance, that is, as indicators (signs) of later job performance. The sample method is characterized by the fact that parts of the job content and context are directly represented in the selection instrument. This can be done, for example, by means of work samples (e.g., asking an applicant to give a sales pitch), as these can be seen as a replica of (parts of) the job. The sample method does not refer to latent characteristics. The prediction is based on statistical generalization: from the performance in the work sample (or other sample instrument, for that matter) an extrapolation is made into the future towards performance in the job.

Temporal perspective

Predictor measures differ from each other based on the temporal perspective they adopt. Some measures ask about the past (e.g., behavioural interview), others about the present (e.g., role play), and still others ask applicants to imagine how they would react in the future work situation (e.g., situational judgment test). Note that this difference is not just a methodological variation that only psychometricians care about. The difference in approach says something about the underlying assumptions of these instruments, namely the assumption that past, current, or imagined behaviour is the best predictor of future behaviour. It is recommended to use a mix of instruments in which all time perspectives are represented.

Psychometric properties

Predictor measures differ from each other in terms of measurement quality. Different quality indicators exist. We discuss the two most important ones: reliability and validity. We conclude by introducing a heavily debated topic in employee selection: the fact that certain groups of applicants systematically obtain lower test scores than other groups, which may lead to adverse impact.

Reliability

Neither physical measurements nor psychological tests are completely consistent. If some attribute of a person is measured twice, the two scores are likely to differ. For example, an applicant taking two different forms of a test of GMA might obtain an IQ of 110 on one test and 114 on the other. Similarly, an applicant taking the same intelligence test at intervals of several weeks may also achieve slightly different scores. Thus, test scores and other measures generally show some inconsistency. In understanding the factors that affect the consistency of scores, it is useful to ask, "Why do scores vary at all?" The fact that applicants obtain different scores may indicate true differences in their cognitive ability level (=true scores). Cognitive ability is the attribute that we try to measure, and it contributes to the consistency of scores. However, it is also likely that some of the differences in scores are caused by factors unrelated to cognitive ability (=measurement error). For example, applicants who are tired or distracted by their neighbour's groaning perform less well than if they had taken the test rested and undisturbed. Tiredness and distractedness are features of the individual or the situation that can affect scores but have nothing to do with the attribute being measured. These factors contribute to inconsistency of scores. The idea that scores reflect the influence of these two sorts of factors can be expressed by means of a simple equation:

Observed score = True score + Error of measurement

The goal of estimating reliability is to determine how much of the variability in scores is due to errors of measurement and how much is due to variability in true scores.
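To make this equation concrete, here is a minimal simulation sketch (in Python with numpy; the sample size, means, and standard deviations are illustrative assumptions, not figures from this chapter). It shows that when observed scores are generated as true score plus random error, the correlation between two administrations of the same test approximates the share of observed-score variance that is due to true scores:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_applicants = 10_000

# Observed score = True score + Error of measurement
true_scores = rng.normal(loc=100, scale=15, size=n_applicants)  # the stable attribute
admin_1 = true_scores + rng.normal(scale=5, size=n_applicants)  # first administration
admin_2 = true_scores + rng.normal(scale=5, size=n_applicants)  # second administration

# Test-retest reliability: correlate the two sets of observed scores
test_retest_r = np.corrcoef(admin_1, admin_2)[0, 1]

# Classical test theory: reliability = true-score variance / observed-score variance
variance_ratio = true_scores.var() / admin_1.var()

print(f"test-retest r:  {test_retest_r:.2f}")   # both values come out near .90
print(f"var(T)/var(X):  {variance_ratio:.2f}")  # = 225 / (225 + 25) in expectation
```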
Reliability can be assessed in different ways: (1) test-retest reliability for stability, (2) inter-item reliability for internal consistency, or (3) interrater reliability and parallel forms for equivalence. Box 5.2 contains detailed information on the different methods of measuring reliability.

Clarification box 5.2

A measurement scale's stability is the extent to which scores are consistent from one administration to the next. It involves administering a measure to a group of applicants, re-administering that same measure to the same group at some later time, and correlating the first set of scores with the second. The correlation between scores on the first and second administration is used to estimate the reliability of the instrument. The higher the correlation coefficient, the more stable the measure. The estimation of reliability here focuses on the instrument's susceptibility to extraneous factors from one administration to the next. There are several reservations about using the test-retest correlation as a measure of reliability. First, the characteristic or attribute that is being measured may change between the first and the second administration. Consider, for example, an applicant taking a Microsoft Word test just after graduation and then again after 6 months of work experience as an intern. Because of the experience gained in the meantime, we expect her to perform better the second time. Second, the experience of taking the test itself can change a person's true score; this is referred to as reactivity. It is quite conceivable that our applicant will focus more on word processing after the Microsoft Word test, with particular attention to the questions that appeared on the test, thus changing her true word-processing skills. Third, one must be concerned with carry-over effects, particularly if the interval between the first and second administration is short. At the second administration, applicants may remember their original answers, which could affect their answers the second time around. In addition to the theoretical problems inherent in the test-retest method, there is a practical limitation to this method of estimating reliability: it requires two test administrations. Since testing can be time consuming and expensive, retesting solely for the purpose of estimating reliability may be impractical.

Internal consistency is applied not to single items but to groups of items that are thought to measure different aspects of the same construct. An instrument may be said to be internally consistent, or homogeneous, to the extent that all its subparts measure the same characteristic. The internal consistency method involves administering a test to a group of individuals, computing the correlations among all items, computing the average of those correlations, and estimating reliability from the result. Coefficient alpha represents the most widely used and most general form of internal consistency estimate. If we think of each test item as an observation of behaviour, internal consistency estimates suggest that reliability is a function of (1) the number of observations that one makes and (2) the extent to which each item represents an observation of the same thing observed by the other test items. If every item on the test measures essentially the same thing as all other items and if the number of items is large, internal consistency methods suggest that the test will be reliable.
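Coefficient alpha itself is easy to compute: with k items, alpha = k/(k-1) x (1 - sum of the item variances / variance of the total score). Below is a minimal sketch in Python with numpy; the five-item, six-applicant data matrix is a made-up illustration, not data from the chapter:

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Coefficient alpha for a respondents-by-items matrix of scores."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative data: six applicants answering a five-item scale (1-5 ratings)
scores = np.array([
    [4, 4, 5, 4, 4],
    [2, 3, 2, 2, 3],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 2, 3],
    [1, 2, 1, 2, 1],
    [4, 5, 4, 4, 4],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # about .97: these items clearly hang together
```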
If we think of each test item as an observation of behaviour, internal consistency estimates suggest that reliability is a function of (1) the number of observations that one makes and (2) the extent to which each item represents an observation of the same thing observed by the other test items. If every item on the test measures essentially the same thing as all other items and if the number of items is large, internal consistency methods suggest that the test will be reliable. It is important to note that tests that are reliable are not necessarily valid! Table 5.4 presents a five-item test of GMA that is highly reliable but not valid (i.e., the test does not provide accurate predictions of job performance).

Equivalence can be addressed in two ways: (1) having different raters use the same scale on the same applicants at the same time (i.e., interrater reliability) or (2) administering two parallel forms of the same scale to the same sample successively and calculating the correlation between the two parallel forms (i.e., alternate form reliability). Although the alternate forms method avoids many of the problems inherent in the test-retest method, there are still many drawbacks to this technique. For example, since two separate administrations are required, the alternate forms method can be as expensive and impractical as the test-retest method. In addition, it may be difficult, if not impossible, to guarantee that two alternate forms of a measure are, in fact, equivalent.

Table 5.4: A test of GMA that would be highly reliable but not at all valid.
1. In what month were you born? ___________
2. What is your mother’s first name? ___________
3. 1 + 1 = ___________
4. How many days are there in a week? ___________
5. Which of the following is a triangle? (a) (b) (c) [three shapes shown as images]

Validity

Two of the principal problems in psychological measurement are determining whether a test measures what it is supposed to measure and determining whether that test can be used in making accurate decisions. Suppose that an HR professional devises a test and claims that it measures sales skills and can be used to predict success as a salesperson. These claims would not carry much weight unless they were supported by evidence. Hence, the HR professional must present data to show that the claims are accurate, or valid. For example, if test scores are correlated with sales numbers or with customer satisfaction scores, it is reasonable to conclude that the test is a valid predictor of sales success. If test scores are related to other measures of sales skills, or if the test provides a reasonable sample of tasks that measure sales skills, then the test probably does indeed measure what it purports to measure. There are four ways of defining validity, also called the four faces of validity: content validity, construct validity, predictive validity, and concurrent validity. Each of these types of validity is briefly described below. Finally, we discuss two other types of validity: incremental validity and face validity.

 Content validity (also known as logical validity) considers whether a measure has included all the relevant, and excluded all the irrelevant, issues in terms of its content. From a psychometric perspective this means the extent to which the measure adequately samples all possible questions that exist.
For example, a measure of emotional intelligence may lack content validity if it only assesses the ability to perceive emotions but fails to measure the other emotional intelligence abilities. Another example: the Belgian State Security Service uses knowledge questions for its selection of employees (e.g., ‘What is the name of the terrorist group that infiltrated the Olympic village in 1972, killed two members of the Israeli Olympic team, and took nine others hostage?’ (a) Black September, (b) Abu Sayyaf, (c) Shining Path, (d) Abu Abbas). [State Security insists that this question be kept secret; so, don’t tell anyone.] Although a knowledge test like this might provide a valid measure of a person’s knowledge of contemporary international politics, it is unlikely to provide a valid measure of a person’s ability to do the job or his/her motivation to do the job. There is no exact, statistical measure of content validity. It is usually assessed by (1) a critical review by a panel of recognized subject matter experts for clarity and completeness, (2) a comparison with the literature, or (3) both. The difficulty for test developers is that there is no definitive list of ‘correct content’ and that, therefore, it is impossible to sample the content of a construct and establish total content validity. It can also be difficult to ensure that the measure includes all the components of a construct.

 Construct validity refers to the extent to which a scale measures what it purports to measure. Modern validity theory views construct validity as the overarching form of validity encompassing the other forms of validity (i.e., content validity, criterion validity). A more restrictive view of construct validity is to focus on a measure’s structural properties (= structural validity). A common method of studying structural validity involves the mathematical technique known as factor analysis. Factor analysis tells us, among other things, whether the measure is unifactorial or multifactorial (e.g., emotional intelligence having more than one dimension), and whether the items are good representatives of the hypothetical constructs (i.e., factors). Another approach is to correlate scores on the scale in question with scores on other scales that theoretically should be related (= convergent validity) or unrelated (= divergent validity). For example, a new measure of job satisfaction will have convergent validity if it correlates strongly with measures of, for example, intrinsic work motivation, work engagement, and flow at work. The measure will have divergent validity when it does not correlate (highly) with theoretically unrelated constructs such as financial literacy or conscientiousness. A third common method of studying construct validity involves experimental manipulation of the construct that is to be measured. For example, a scale designed to measure anxiety should show higher scores for subjects in an experiment who are led to expect shocks than for subjects who fill out an innocuous questionnaire. On the other hand, if a study has nothing to do with anxiety, the control group and the experimental group would not be expected to receive different scores on the anxiety scale. A combination of experiments in which the construct of interest is manipulated and experiments in which that construct is not manipulated provides a powerful method for assessing construct validity.
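To give a flavour of how the structural side of construct validity can be examined, the sketch below simulates item responses in Python and inspects the eigenvalues of the inter-item correlation matrix, a common first step in factor analysis; all data and numbers are hypothetical. Roughly speaking, one dominant eigenvalue suggests a unifactorial measure, whereas several large eigenvalues point to a multifactorial one.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical responses of 200 applicants to six items; items 1-3 and
# items 4-6 are constructed to reflect two different underlying dimensions
factor1 = rng.normal(size=(200, 1))
factor2 = rng.normal(size=(200, 1))
items = np.hstack([
    factor1 + 0.5 * rng.normal(size=(200, 3)),
    factor2 + 0.5 * rng.normal(size=(200, 3)),
])

corr = np.corrcoef(items, rowvar=False)       # 6 x 6 inter-item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted from largest to smallest
print(eigenvalues.round(2))                   # two eigenvalues well above 1 -> two factors

A full factor analysis would go on to estimate item loadings, but even this simple eigenvalue check already reveals that the six items do not form a single dimension.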
 Criterion-related validity involves comparing the scale of interest with a criterion measure that has been established as valid. There are two types of criterion validity: (1) predictive validity and (2) concurrent validity. The goal of a predictive validity study is to determine the correlation between test scores, which are obtained before making hiring decisions, and criterion scores, which are obtained after making hiring decisions. A selection tool has predictive validity when there is a high correlation between scores on the selection test obtained before hiring and scores on the criterion measure obtained after hiring. Let’s clarify with an example. When scores on the Wechsler Adult Intelligence Scale are used in the selection of applicants for the military, the expectation, based on theory and available meta-analytical evidence (see below), is that applicants with high intelligence scores will be successful in the military bootcamp and may also be successful as soldiers. The intelligence test shows predictive validity if we indeed observe that applicants with high intelligence scores are also the employees who receive high ratings of training performance and job performance. An important caveat is that we can only make statements about the predictive validity of a selection instrument if the criterion measure is valid! The example in Table 5.5 illustrates this. The second column shows the GMA scores of eight individuals who applied for the military. We assume for this example that all eight applicants were recruited (because the army has a severe shortage of new recruits and has lowered its standards). Lieutenants X and Y are tasked with evaluating the recruits’ performance after one month of bootcamp. X takes her assignment seriously and rigorously assesses the performance of all new recruits on a scale of 1-5. Lieutenant Y does not feel like doing extra administration and gives everyone a passing grade except recruit F, with whom he sometimes goes out on weekends. Therefore, recruit F receives the maximum score. The correlation coefficients for the two cases differ dramatically (r = 0.63 versus r = -0.44). If all lieutenants acted like Y, we would be in danger of advising the military to recruit mostly ‘non-intelligent’ applicants. This example may seem far-fetched but is based on actual events: as part of a validation study, my co-workers and I found a negative correlation between intelligence test scores and job performance ratings of newly recruited soldiers. Fortunately, we discovered in time that the performance measures lacked validity and that nothing was wrong with the intelligence test.

Table 5.5: Validity (correlation) for different criterion measures.
              GMA    Criterion X    Criterion Y
Person A      100    3              4
Person B       90    2.5            4
Person C      110    2              4
Person D      115    2              4
Person E      125    4              4
Person F       95    2.5            5
Person G      130    4.5            4
Person H      120    3              4
Correlation          0.63           -0.44
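The correlations in Table 5.5 are easy to verify; the following short Python check reproduces them from the numbers in the table.

import numpy as np

gma = np.array([100, 90, 110, 115, 125, 95, 130, 120])  # Persons A-H
criterion_x = np.array([3, 2.5, 2, 2, 4, 2.5, 4.5, 3])  # Lieutenant X's ratings
criterion_y = np.array([4, 4, 4, 4, 4, 5, 4, 4])        # Lieutenant Y's ratings

print(round(np.corrcoef(gma, criterion_x)[0, 1], 2))  # 0.63
print(round(np.corrcoef(gma, criterion_y)[0, 1], 2))  # -0.44

Note how a single friendly rating for recruit F is enough to flip the sign of the validity coefficient, which is exactly why invalid criterion measures are so dangerous.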
The advantage of the predictive validity approach is that it provides a simple and direct measure of the relationship between scores on predictor measures and performance on the criterion for the population of applicants in general. If predictor measures were used to select applicants, the correlation between predictor measures and performance measures (which are collected at some later time) would indicate the validity of the predictor measures in the population of people with high test scores (those selected using the test), rather than the validity of the measures in the population in general. In most decision situations, however, the goal is to select those most likely to succeed out of the total population of applicants. To estimate the validity of a measure for this type of decision, an estimate of the correlation between test scores and performance scores must be obtained for all applicants. Most practical methods for estimating the validity of decisions involve correlating test scores and criterion scores in some preselected population (e.g., present workers) and therefore fall short of the ideal predictive validity approach.

The practical alternative to a predictive validity strategy is simply to obtain both test scores and criterion scores in some intact, preselected population and to compute the correlation between the two. Since many research designs of this type call for obtaining test scores and criterion scores at roughly the same time, they are known as concurrent validation strategies. However, the delay between obtaining test scores and obtaining criterion scores is not really the most fundamental difference between predictive and concurrent validity. The most fundamental difference is that a predictive validity coefficient is obtained in a random sample of the population about whom decisions must be made, whereas a concurrent validity coefficient is generally obtained in a preselected sample that may be systematically different from the population in general. For example, to estimate the validity of a situational judgment test, a researcher might correlate test scores from present workers with their most recent job evaluations. What argues against the use of concurrent validity strategies is the fact that the preselected sample of workers is likely to differ from the applicant population in systematic ways. For example, the sample of workers differs from the applicant population because, on average, the workers are better qualified (otherwise they would not have been hired!). Although the predictive validity approach is generally preferred over the concurrent validity approach, in applied settings concurrent validation is much more common for pragmatic reasons: it is not necessary to select randomly or to allow a significant time lag between testing and criterion measurement to obtain concurrent validity coefficients. Also, although test theory suggests that concurrent validity coefficients seriously underestimate the population validity, concurrent validities are in fact often similar in size to predictive validities.

Research designs that use the correlation between predictor measures and criterion measures in a highly selective population to estimate the validity coefficient for making decisions in the population in general give rise to several statistical problems. By far the most serious problem in concurrent designs is the range restriction that occurs when people are selected according to their test scores (only those with high scores are selected). Range restriction refers to the fact that some data are missing for estimating the validity coefficient, namely the job performance ratings that rejected applicants would have received had they been hired. There may also be range restriction in the criterion, such as when poorly performing employees are laid off. Range restriction serves to reduce the correlation between test scores and criterion measures (see Figure 5.4). Fortunately, there are formulas to correct for the impact of range restriction, but discussion of these formulas is beyond the scope of this course.
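To make the effect of range restriction concrete, the sketch below simulates, in Python, test and criterion scores that correlate about .50 in the full applicant pool and then recomputes the correlation among only the top 30% of scorers; all numbers are hypothetical.

import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Simulate an applicant pool in which the criterion correlates about .50 with the test
test = rng.normal(100, 15, size=n)
criterion = 0.5 * (test - 100) / 15 + rng.normal(0, np.sqrt(1 - 0.5 ** 2), size=n)

full_r = np.corrcoef(test, criterion)[0, 1]

# Pretend only the top 30% on the test were hired, so criterion data
# exist only for them (the range-restricted, concurrent-style sample)
cutoff = np.quantile(test, 0.70)
hired = test >= cutoff
restricted_r = np.corrcoef(test[hired], criterion[hired])[0, 1]

print(round(full_r, 2), round(restricted_r, 2))  # the restricted r is noticeably smaller

The restricted correlation is what a concurrent validation study would observe; the correction formulas mentioned above essentially reverse this attenuation.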
Figure 5.4: Effects of range restriction in the predictor on the correlation between test scores and criterion measures.

 Incremental validity refers to the extent to which a predictor measure increases the predictive ability beyond that provided by existing predictor measures. Or, in technical terms, the extent to which the new predictor instrument explains variance in the outcome variable on top of the variance already explained by the current instruments. Using a selection interview, a cognitive ability test, and a personality inventory, we can already predict reasonably well whether applicants will perform well in their jobs. The question arises whether it is still worth adding an additional test to our test battery, e.g., a situational judgment test. If we can make even better predictions with the situational judgment test than without it, the extra test has incremental validity. Note that this type of validity depends not only on the predictor measure in question, but also on the measures that make up the base set. Hierarchical multiple regression is the technique most frequently used to assess the amount of variability a predictor explains. This is often done by fitting a model to the data without the predictor measure of interest, and then adding the focal predictor measure and fitting a new model. The two models are then compared (by comparing their R-squared statistics); a significant change in R-squared indicates that the new predictor measure does have incremental validity, or additional predictive power. A numerical sketch of this procedure follows at the end of this section.

 Face validity refers to the extent to which applicants perceive the content of the predictor measure to be related to the content of the job. In other words, does the predictor measure appear reasonable to those who take it? Face validity is not necessarily a characteristic of the predictor measure itself. There is no ‘face validity coefficient’ that can be estimated, for example. It rather represents an interaction between what the test asks the applicant to do and the applicant’s understanding of what the test is designed to measure. A test is most likely to be judged face valid if the tasks that the applicants are required to perform are in some way related to their understanding of what the test is designed to measure. Questions about one's professional past are generally considered more relevant than questions about one's childhood. A role play in which applicants are asked to sell a random product is more relevant for a sales position than for an administrative position. There are good reasons to be concerned about applicants’ perceptions of predictor measures. Many tests and inventories ask applicants to perform to the best of their ability or to provide candid answers to potentially ambiguous or embarrassing questions. If applicants perceive the test as trivial or irrelevant, they may not take the test seriously and may not provide responses that reflect the attributes that the test is designed to measure. This might severely limit the validity and usefulness of the measure. Another important reason is that applicants use the information during recruitment and selection to form a picture of what it will be like to work for the organization140. The use of seemingly irrelevant tests or questions may lead applicants to conclude that the selection procedure is unfair163 and that they are better off not continuing their application with that company. Consider, for example, the use of bizarre interview questions such as "How would you describe the colour yellow to a blind person?" Whereas such questions could in theory be relevant for measuring creativity or stress tolerance (although other methods are better), it is highly questionable whether applicants understand and appreciate these questions accordingly. Thus, a negative selection experience, partly caused by seemingly irrelevant tests, may not only diminish measurement quality but can also cause good applicants to drop out and to spread negative word-of-mouth among potential applicants.
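As announced above, here is a minimal numerical sketch of the hierarchical-regression logic behind incremental validity, again in Python with simulated data; the predictor names and effect sizes are hypothetical.

import numpy as np

rng = np.random.default_rng(7)
n = 500

# Simulated standardized predictor scores for 500 applicants
cognitive = rng.normal(size=n)
conscientiousness = rng.normal(size=n)
sjt = 0.5 * cognitive + rng.normal(size=n)  # situational judgment test, overlapping with cognitive ability
performance = 0.5 * cognitive + 0.3 * conscientiousness + 0.2 * sjt + rng.normal(size=n)

def r_squared(predictors, outcome):
    # Ordinary least squares with an intercept; returns the model's R-squared
    X = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ beta
    return 1 - residuals.var() / outcome.var()

base = r_squared(np.column_stack([cognitive, conscientiousness]), performance)
extended = r_squared(np.column_stack([cognitive, conscientiousness, sjt]), performance)

# The change in R-squared is the incremental validity of the situational judgment test
print(round(base, 3), round(extended, 3), round(extended - base, 3))

In practice one would also test whether the change in R-squared is statistically significant (e.g., with an F test), but the logic is exactly this comparison of two nested models.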
Adverse impact

Predictor measures are evaluated not only in terms of their reliability and validity, but also in terms of the size and magnitude of any subgroup differences (differences in scores between majority and minority groups in terms of gender, race, and so on) they generate. The size of the subgroup differences determines whether a predictor measure produces adverse impact: a significantly different selection ratio in hiring, promotion, or other work-related decisions, such that members of minority groups are disadvantaged, leaving those groups underrepresented in the organization. This issue is a hot topic in the United States and is becoming increasingly important in Europe as well.

The following section describes some of the most commonly used predictor measures and their characteristics. Unlike predictor constructs, there is no limit on the number of predictor measures. New measures are still being developed and innovative ways of measuring constructs are being devised. Of course, not all attempts are equally successful.

5.6.1. Biodata

Biographical data, or biodata, include information about an individual’s background and life history (e.g., civil status, education, and previous employment), ranging from objectively determined dates – date of first job, time in last job, years of higher education – to subjective preferences (though some rule the latter out as invalid biodata on the grounds that biodata must be objective and verifiable). The diversity of constructs assessed by biodata is such that there is no common definition. Biodata enthusiasts argue that the ‘best predictor of future performance is past performance,’ and that biodata is one of the best routes for understanding and improving the prediction of work performance. Biodata is typically obtained from an application form. The aim is to design an application form that collects only data known and shown to predict specific work-related performance. What sets biodata apart from the more informal use of application forms, references, or CVs is the fact that biodata questions are scored. Scoring is based on how strongly an item relates to the criterion, either empirically (e.g., as derived from previous samples) or theoretically (e.g., as informed by a job analysis). Some sample biodata items are shown in Table 5.6.

Table 5.6: Sample biodata items.
Predictor construct: Biodata item
Cognitive ability:
 What grade did you get for your last math exam?
Personality:
 How often did you participate in social activities during your last year at high school?
 How many museums did you visit this year?
Vocational interests:
 When you were at college, how often did you read technology magazines?
 How much did you like technology class when you were in high school?
Work and life values:
 When you were in high school, how important was it to you to be seen by others as successful?
 As a college student, how interested were you in politics and public affairs?
Biodata inventories typically include both verifiable and non-verifiable items. Verifiable items, such as demographic or background information, are uncontrollable (there is nothing one can do to alter one’s place of birth or ethnicity) and more intrusive, compared with the more controllable, non-verifiable items assessing attitudes and behaviours, such as “Do you think people should drink less alcohol?” The advantage of verifiable items is that they yield very consistent responses, even across different jobs, and are less susceptible to faking than non-verifiable items (although people also lie about things that are verifiable). Biodata is a valid predictor of an applicant’s suitability. Because of the heterogeneity of biodata studies and the different samples used, validities differ significantly across studies (from the low to mid-0.20s up to the 0.50s). Meta-analytic studies show that the validity coefficient lies in the 0.20-0.35 range. Studies have also provided evidence for the incremental validity of biodata over established personality and cognitive ability measures. These studies are important because of the known overlap between these measures and biodata; they show that even when personality and intelligence are taken into account, biodata scales provide additional useful information about the predicted outcome.

5.6.2. References and letters of recommendation

Another widely used method in personnel selection is the reference report or letter of recommendation, simply known as the reference. Referees are often former employers, teachers, or colleagues who are asked to provide a description of the applicant. They are expected to have sufficient knowledge of the applicant’s previous work experience and suitability for the job. References can be used to check for prior disciplinary problems, confirm details on an application form, or gain new and salient information about a possible employee. References may be obtained in a few different ways:

 The applicant may provide a letter of recommendation from each recommender, and the recommender can provide whatever information he or she chooses about the applicant.
 The applicant may be asked to provide information for one or more references, including their name, professional affiliation/title, contact information, and relationship type and duration with the applicant. The prospective employer will reach out to each reference with specific questions for recommenders to respond to, either via e-mail or a phone conversation.
 The applicant may provide the prospective employer with the names and e-mail addresses of one or more references, and each recommender will receive a link to a structured online reference form to complete.

References can be classified based on how structured or standardized they are. In a letter of recommendation, recommenders can include and exclude whatever information they choose. In addition to being overly positive, such an unstructured recommendation letter may not provide relevant job-related information or enough specifics about an applicant’s performance to make it a useful information source. Conversely, reference checks through phone/e-mail questions and online forms are much more structured. References tend to have low levels of inter-rater reliability. That is, ratings from different referees tend to be only moderately correlated with each other, with correlation coefficients ranging from 0.40 to 0.60.
These differences are not problematic as such. After all, there would be little point in using multiple referees if we expected them to provide exactly the same information. However, inter-rater agreements of 0.60 are low and mean that only 36% of the variance in applicants’ attributes is accounted for (0.60² = 0.36), leaving a substantial percentage of variance unexplained. Also, (structured) references tend to have poor validity. Meta-analytic studies report correlations with job performance in the range of 0.15-0.25. There are several explanations for these low coefficients:

 References tend to be very lenient, which produces highly skewed data. This leniency bias can be partly explained by the Pollyanna principle, which refers to the human tendency to focus on the positive and to recall more positive than negative phenomena from memory. The principle is named after the titular character of author Eleanor Porter’s children’s book Pollyanna, a cheerful and optimistic girl who always looks on the bright side. Besides the Pollyanna principle, the fact that there is no incentive for referees to be harsh, and the fact that their primary loyalty lies with the applicant and not with the organization, also explains why referees provide biased and inflated reviews of applicants.
 Referees tend to write similar references for all applicants and thereby provide more information about themselves than about the applicant. Moreover, personality factors and mood distort references significantly, affecting reliability as well as validity.
 Referees may wish to retain good employees and know that a positive reference may have the opposite effect. For the same reasons, they may choose to write very positive references for staff they are eager to see off.

The validity of references can be improved by using standardized forms, multiple referees, comparative ranking scales, and by preserving the anonymity of the referee. Still, it remains highly questionable whether, even then, referees can provide any information beyond that obtained from psychometric tests, interviews, and biodata.

5.6.3. Cognitive ability tests

An extensive body of research conducted over the last 50 years has led to the consensus that cognitive abilities manifest a hierarchical structure. In line with this, many tests have been developed to measure both GMA and specific cognitive abilities, such as numerical, spatial, verbal, and perceptual ability. Examples of cognitive ability tests include the Stanford-Binet Intelligence Scales, the Wechsler Adult Intelligence Scale, the Woodcock-Johnson Tests of Cognitive Abilities, the Multidimensional Aptitude Battery II, the Cattell Culture Fair III, and Raven’s Progressive Matrices. New cognitive ability tests are continuously being developed and launched by psychometric test developers, publishers, and consultants, such as CEB SHL, Hudson, Hay Group, CEBIR, Quintessence, Talento, Mapwave, Psychological Consultancy Ltd, Psytech International Ltd, PSI, the Myers-Briggs Company, IBM Watson Talent (formerly Kenexa), Korn Ferry, Hogan Assessments, Hogrefe, Cubiks, and many more. The usefulness of cognitive ability tests has been documented in dozens of quantitative reviews in the form of publications and technical reports, incorporating more than 1,300 (!) meta-analyses summarizing results from more than 22,000 primary studies164–167. The total sample size of job applicants and employees providing data for these validation studies is well above five million individuals.
Probably the most comprehensive meta-analyses were conducted by Hunter and Hunter168 in 1983 and 1984, with a database consisting of 515 studies and a total sample of over 38,000 individuals. This work definitively answers the question of whether cognitive ability tests are useful predictors of performance in occupational environments: yes, they are excellent predictors of training and job performance. In fact, no other predictor measure in employee selection produces validities as high, as consistently, as cognitive ability does.

Training performance. Cognitive ability tests predict learning, acquisition of job knowledge, and job training performance with outstanding validity (correlations in the .50 to .70 range). Validities for training criteria generalize across jobs, organizations, and settings. Validities are highest for GMA and specific quantitative and verbal abilities, and somewhat lower for memory (although still highly useful, with a validity of .46). Validities are moderated by job complexity: the greater the complexity of the jobs being studied, the higher the validity of cognitive ability tests in predicting training performance.

Job performance. Cognitive ability tests predict overall job performance with high validity (correlations in the .35 to .55 range). Validities for overall job performance generalize across jobs, organizations, and settings. Support for these key conclusions comes from meta-analyses of studies using narrow job groupings (e.g., mechanical repair workers, first-line supervisors, health technicians), broad job groupings (e.g., clerical jobs, law enforcement), and heterogeneous job groupings (e.g., by job complexity). Individual large-sample studies also point to the same conclusions. Validities are highest for general mental ability and quantitative abilities and somewhat lower for memory. Job complexity also moderates the validities of cognitive ability tests for predicting job performance: higher validities are found for jobs of higher complexity. Cognitive ability tests show no decline in validity as workers gain experience.

An important issue regarding the use of cognitive ability tests in personnel selection is whether they produce adverse impact. Table 5.7 summarizes race and ethnic group differences on cognitive ability based on the largest meta-analysis of the employment literature169. The measure of group differences is Cohen’s d, which expresses the difference between the means of two groups in terms of standard deviation units. In general, d values of .80 or greater are considered large effects, those around .50 are moderate, and those below .20 are small.

Table 5.7: Race and ethnic group mean score differences in general mental ability among job applicants.
Group comparison   Setting                           Job complexity             N         k    d
White-Black        Industrial                        Across complexity levels   375,307   11   1.00
                   Industrial (within-job studies)   Low                        125,654   64   .86
                                                     Moderate                   31,990    18   .72
                                                     High                       4,884     2    .63
                   Military                          Across complexity levels   245,036   1    1.46
White-Hispanic     Industrial                        Across complexity levels   313,635   14   .83
                   Military                          Across complexity levels   221,233   1    .85
k = number of studies summarized in the meta-analysis; d = standardized group mean difference; N = total sample size. Positive effect sizes indicate Whites scoring higher on average.

On average, Blacks score 1.00 and Hispanics .83 standard deviation units lower than Whites on general mental ability measures used in employee selection. Group differences on measures used in military settings are somewhat larger, especially for the White-Black comparison. One explanation for this finding could be the greater heterogeneity among military job applicants.
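To connect these d values to the notion of adverse impact introduced earlier, the following Python sketch assumes normally distributed test scores and uses the overall industrial d of 1.00 from Table 5.7 to compute the selection rates of both groups under a common cutoff; the cutoff itself is hypothetical. The resulting ratio can be compared against the 0.80 benchmark of the American ‘four-fifths rule’.

from statistics import NormalDist

d = 1.00        # standardized group mean difference taken from Table 5.7
cutoff_z = 1.0  # hypothetical cutoff: one SD above the majority-group mean

majority = NormalDist(mu=0.0, sigma=1.0)
minority = NormalDist(mu=-d, sigma=1.0)  # minority-group mean shifted down by d

# Proportion of each group scoring above the cutoff (the selection rate)
rate_majority = 1 - majority.cdf(cutoff_z)
rate_minority = 1 - minority.cdf(cutoff_z)

impact_ratio = rate_minority / rate_majority
print(round(rate_majority, 3), round(rate_minority, 3), round(impact_ratio, 2))

Under these assumptions roughly 16% of the majority group but only about 2% of the minority group clears the cutoff, an impact ratio far below 0.80; this illustrates why a d of 1.00 is so consequential when cognitive ability is used on its own.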
Cognitive ability differences between Black and White applicants to high-complexity jobs are smaller than among applicants to lower-complexity jobs, most likely because of severe self-selection as well as higher minimum requirements with regard to educational credentials. Among applicants to medium- and low-complexity jobs, White-Black and White-Hispanic differences in cognitive ability test scores are large and almost certain to result in adverse impact if cognitive ability were the only predictor used in employee selection. The situation is slightly better among applicants to high-complexity jobs, for whom group mean-score differences in cognitive ability are only moderate and thus carry slightly less severe implications for adverse impact.

But what do applicants think about cognitive ability tests? After all, applicant perceptions and reactions may be crucial for the success of personnel selection decisions. If methods are seen as intrusive, disrespectful of privacy, or unscientific (think of polygraphs, graphology, astrology, crystal balls, palmistry, and tea leaves), applicants may withdraw from the selection procedure and run to a competitor. Meta-analytical research shows that cognitive ability tests are among the most favourably rated methods170. The authors concluded that cognitive ability tests (1) are generally rated positively, (2) are perceived as the most scientifically valid method, (3) show respect for privacy, and (4) provide applicants with the opportunity to show their best. The main drawback of cognitive ability tests is that they are regarded as ‘interpersonally cold’. Especially applicants for higher positions seem to perceive cognitive ability tests rather negatively. It even goes so far that clients sometimes stop using them in a selection procedure because they damage the reputation of the company or consultancy firm. To address these issues, several consulting firms have developed cognitive ability tests with business content. The questions are packaged in an appealing, realistic context. Job relevance in such business-related skill tests is promoted by using realistic graphs, tables, or job-specific text passages in the item presentation.

5.6.4. Personality inventories

Personality inventories are standardized multiple-choice questionnaires that ask applicants to indicate their typical behaviour, thinking, and reactions toward people or situations. For each group of statements, the applicant must either indicate which statement applies most to him or her (forced choice, then scored ipsatively) or indicate how strongly each individual statement applies to him or her (on a rating scale, then scored normatively). Personality inventories are not ‘tests’ with right or wrong answers. The applicant's answers are compared with those of a norm group (for example, a representative group of managers) to determine the applicant's position on several scales. There are many personality inventories. Well-known examples are the NEO-PI, the HEXACO-PI-R, the Ten-Item Personality Inventory (TIPI) (see Figure 5.5), the Occupational Personality Questionnaire (OPQ), the California Psychological Inventory (CPI), and the Hogan Personality Inventory. The website https://ipip.ori.org/ includes over 250 measures of broad and narrow personality traits that can be used without paying a fee.

Figure 5.5: The Ten-Item Personality Inventory (TIPI)171.

Here are a number of personality traits that may or may not apply to you.
Please write a number next to each statement to indicate the extent to which you agree or disagree with that statement. You should rate the extent to which the pair of traits applies to you, even if one characteristic applies more strongly than the other.

1 = Disagree strongly
2 = Disagree moderately
3 = Disagree a little
4 = Neither agree nor disagree
5 = Agree a little
6 = Agree moderately
7 = Agree strongly

I see myself as:
1. ______ Extraverted, enthusiastic.
2. ______ Critical, quarrelsome.
3. ______ Dependable, self-disciplined.
4. ______ Anxious, easily upset.
5. ______ Open to new experiences, complex.
6. ______ Reserved, quiet.
7. ______ Sympathetic, warm.
8. ______ Disorganized, careless.
9. ______ Calm, emotionally stable.
10. ______ Conventional, uncreative.
