Personnel Selection Methods
Summary
This document explores various personnel selection methods, encompassing recruitment strategies, techniques for evaluating applications, and the identification of potential biases in the process. The content highlights the influence of different factors, including applicant characteristics (like gender, age and weight), on hiring decisions. It emphasizes the significance of fair and unbiased selection methods in achieving optimal workforce composition and productivity.
Full Transcript
4.3C: Personnel Selection

Theme 1: Foundations: Predicting, deciding, and common selection methods

Chapter 1: Old and new selection methods

Clark Hull's early work in psychology underscored that productivity differences between workers are substantial, with the best employees often being twice as productive as the worst. This highlights the critical role of HR in selecting top talent, despite some seeing HR as not directly adding value since it doesn't produce or sell goods. Techniques like the Rational Estimate method quantify this impact, showing that a top employee can contribute as much additional value annually as their salary. For example, a high-performing employee in a £50,000 job can add an extra £50,000 in value compared to a lower-performing counterpart (a worked sketch of this arithmetic appears at the end of this section). Hunter and Hunter's (1984) research on public sector savings emphasizes the financial benefit of good selection methods: Philadelphia's police force could potentially save $18 million yearly, and the US Federal Government $16 billion. However, critics argue that these benefits only increase productivity for companies that use these methods, not national productivity overall, since not every company can hire the top talent. At present, employers can use selection methods to secure a larger share of high-performing talent, and this book explores how to achieve this advantage.

RECRUITMENT

The advertisement attracts applicants, who complete and return an application form. Some applicants' references are taken up; the rest are excluded from further consideration. Applicants (As) with satisfactory references are shortlisted and invited for interview, after which the post is filled. The employer tries to attract as many As as possible, then passes them through a series of filters until the number of surviving As equals the number of vacancies.

Recruitment Sources

Employers can attract applicants through various methods: ads, public or private agencies, word of mouth, walk-ins, job fairs, and the internet. To improve recruitment effectiveness, employers should analyze these sources to identify which methods bring in reliable, long-term employees and ensure that applicant pools are diverse in gender, ethnicity, and disability. Research by Newman and Lyon (2009) suggests that certain advertisement language (e.g., "results-oriented" or "innovative") can attract applicants with traits like conscientiousness or mental ability, both linked to job performance. This targeted language can help attract quality candidates while supporting workforce diversity.

Realistic Job Previews (RJPs)

Realistic Job Previews (RJPs) provide applicants with a truthful view of a job, even if it's demanding or monotonous, like call center work. Earnest, Allen, and Landis (2011) found that RJPs modestly reduce turnover and are cost-effective. RJPs may work by boosting the employer's image as honest and transparent.

Informal Recruitment

Word-of-mouth recruitment, often through current employees, is a low-cost approach associated with lower turnover rates and slightly higher job performance (Zottoli & Wanous, 2000). However, organizations like the British Equality and Human Rights Commission critique this method, as it can limit diversity by reinforcing homogeneity within the workforce. A study by Weller et al. (2009) of nearly 3,000 individuals in Germany confirmed that informal recruitment generally leads to longer employee retention than recruitment through agencies or advertisements.
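As a rough, hypothetical illustration of the Rational Estimate arithmetic above: if the value of employees' output is assumed to be normally distributed with a standard deviation (SDy) of about half of salary, an assumption chosen here purely to reproduce the £50,000 example rather than a figure from the text, then the gap between a good (+1 SD) and a poor (-1 SD) performer equals a full salary.

```python
# A minimal sketch of the Rational Estimate arithmetic described above.
# Assumption: the value of employees' output is normally distributed with a
# standard deviation (SDy) of roughly half of salary; both figures are
# illustrative, not prescriptive.

def extra_value(salary: float, sd_y_fraction: float = 0.5,
                good_sd: float = 1.0, poor_sd: float = -1.0) -> float:
    """Annual value difference between a good and a poor performer."""
    sd_y = salary * sd_y_fraction
    return (good_sd - poor_sd) * sd_y

# A +1 SD performer vs. a -1 SD performer in a £50,000 job:
print(extra_value(50_000))  # 50000.0 -- the extra value equals the salary
```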
New Technology and Recruitment

Technology has transformed recruitment, enabling processes like advertising, application submission, sifting, and assessment to be conducted electronically, which can significantly speed up hiring. With digital tools, some companies are even making "same-day offers." Increasingly, jobs are advertised online, either on the employer's website or on recruitment platforms, and job seekers can post their profiles for employers to review. This broader reach can bring in a larger pool of applicants, which helps employers find high-quality candidates, though it also means more applications to review.

Application Sifting

The application form (or its digital equivalent) serves as the first filter in recruitment, helping narrow down applicants in a process known as "sifting." Efficient sifting saves time for HR departments, provided it remains fair and accurate. However, research suggests sifting isn't always done effectively. Machwirth, Schuler, and Moser (1996), using policy capturing analysis (a minimal regression sketch appears after the Research Agenda list below), found that HR staff often made decisions based on criteria different from what they claimed to use. While managers stated they focused on proven ability and past achievements, they frequently rejected candidates based on minor factors like a messy or poorly written application. Further, McKinney et al. (2003) studied how US campus recruiters used grade point average (GPA) in interview selection. While some recruiters logically selected candidates with high GPAs (linked to mental ability and job performance), others ignored GPA, and a third group even screened out high-GPA candidates, disregarding the link between mental ability and job performance. These strategies appeared random and were unrelated to the specific job or employer type.

Accuracy and Honesty

Many surveys report that application forms (AFs), résumés, and CVs often contain inaccurate or even false information. However, most such surveys are conducted by companies that offer verification services, with limited independent research available. Goldstein (1971) found that some nursing applicants exaggerated experience and salary, with 25% giving reasons for leaving that their previous employers disagreed with, and 17% listing false former employers. McDaniel, Douglas, and Snell (1997) found that 25–33% of professionals in marketing, accounting, management, and computing admitted to misrepresenting their experience or salary, or hiding damaging information like past terminations. Keenan (1997) surveyed British graduates, revealing that while they rarely lied about degrees, 73% admitted being dishonest about why they wanted to work for a specific employer, and 40% weren't honest about their hobbies. Electronic applications do not solve these issues, as dishonesty is just as easy online as on paper.

Research Agenda

- The accuracy of CV and AF information
- Types of information commonly misreported
- Characteristics of individuals who provide false information
- Reasons for misrepresentation
- Trends in the frequency of incorrect information
- Influence of careers advice, coaching, and self-help resources on honesty
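Policy capturing, the technique Machwirth et al. used, infers the weights sifters actually place on application cues by regressing their decisions on those cues. A minimal sketch with invented cues and ratings (any resemblance to the original study's variables is illustrative only):

```python
# A minimal policy-capturing sketch (invented data, not the study's).
# Each row describes one "paper applicant"; the sifter's ratings are
# regressed on the cues to reveal the weights actually driving decisions.
import numpy as np

# Cues: [claimed_ability, past_achievement, neatness_of_application]
cues = np.array([
    [0.9, 0.8, 0.2],
    [0.7, 0.9, 0.9],
    [0.4, 0.3, 0.8],
    [0.8, 0.7, 0.1],
    [0.5, 0.6, 0.9],
    [0.3, 0.2, 0.3],
])
ratings = np.array([0.35, 0.90, 0.60, 0.30, 0.80, 0.20])  # sifter decisions

X = np.column_stack([np.ones(len(cues)), cues])        # add an intercept
weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)  # least-squares fit
print(dict(zip(["intercept", "ability", "achievement", "neatness"],
               weights.round(2))))
# If "neatness" gets a large weight while "ability" gets little, the captured
# policy diverges from the sifter's stated policy (ability matters).
```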
Fairness and Sifting

Equal opportunities (EO) agencies in the US have developed lists of questions that should not appear on application forms to prevent discrimination. Certain questions, such as those about ethnicity, gender, and disability, are legally prohibited. Other questions, like those on driving offences, arrests, or military discharge, may unintentionally discriminate against minorities. Questions about holiday or weekend availability could discourage religious minorities from applying. Surveys (reviewed by Kethley & Terpstra, 2005) indicate that many US employers ignore or are unaware of this guidance, continuing to ask potentially problematic questions. Kethley and Terpstra's review of 312 US Federal cases shows complaints about AF questions mostly centered on sex (28%), age (25%), and race (12%), though certain questions (e.g., on military discharge or marital status) have never led to court cases. Internet recruitment may introduce new fairness issues, as not all applicants have online access.

Bias in Sifting

Studies using "paper applicant" methods, where HR staff rate equally qualified applicants who differ only by a characteristic like gender, age, or physical appearance, often reveal biases in hiring:

- Gender Bias: Davison and Burke (2000) reviewed 49 studies, finding both male and female sifters were biased against female applicants, especially when job information was limited.
- Weight Bias: Ding and Stillman (2005) found that overweight female applicants in New Zealand were more likely to be sifted out.
- Parental Bias: Correll, Benard, and Paik (2007) showed that mothers were more likely to be rejected, while fathers were sometimes favored.

While "paper applicant" studies have limitations (sifters know they're being observed), other methods reveal similar biases in real-world hiring. Bertrand and Mullainathan (2004) used an "audit" method, sending real applications to real employers using names associated with white or African American applicants. They found that for every 10 white applicants shortlisted, only 6.7 African American applicants were. Additional studies using this approach show similar biases:

- Hoque and Noon (1999) found that British employers gave longer, more helpful responses to enquiries from "Evans" (a typically white name) than from "Patel" (a South Asian name).
- McGinnity and Lunn (2011) found Irish applicants were twice as likely as African, Asian, or German applicants to be interviewed in Ireland.
- Derous, Ryan, and Nguyen (2012) found Dutch applicants with Arab-sounding names were four times less likely to get callbacks.
- Agerström et al. (2012) reported similar discrimination in Sweden, even when applicants had positive qualities (like commitment) highlighted.

Social Class

Applicants with high-status names (e.g., "Charles Bartle-Jones" vs. "Gary Rodgers"), prestigious hobbies (e.g., polo vs. darts), or private ("public") school backgrounds receive slightly more favorable responses from employers.

Pregnancy

Morgan et al. (2013) identify four biases held by some US employers against pregnant women: perceptions of lower competence, lack of commitment, inflexibility, and a need for "accommodation" (e.g., adjustments in work hours). Using an audit method, women sometimes wore a pregnancy prosthesis and used specific scripts to counter these stereotypes (e.g., "I can work whenever you need me"). These scripts somewhat reduced discrimination.

Age

Ng and Feldman (2012) list six common stereotypes about older workers: that they are less motivated, less trusting, less healthy, resistant to change, more prone to work-family imbalance, and less willing to train. These biases often lead to older applicants being filtered out early. However, Ng and Feldman found only the stereotype about training was accurate; older workers were indeed less likely to seek training.
Earlier research by Ng and Feldman (2008) showed no significant age differences in core performance, creativity, training, safety, or counterproductive behaviors. In fact, older workers often perform better in areas like attendance and going beyond job requirements, with correlations between age and avoiding absence/lateness (0.26–0.28) comparable to some selection tests. Their findings suggest older workers can be strong employees and counter common stereotypes.

Weight

Agerström and Rooth (2011) conducted an audit study in Sweden where employers received paired applications for real jobs, each with a photo of a similar person, one obese and one not. The obese applicants were less than half as likely to be invited for an interview. Later, some HR managers involved took an implicit association test, which revealed unconscious biases linking obesity with traits like being ineffective, slow, lazy, and lacking initiative. This test highlights automatic associations that HR managers may not be aware of.

Gender and 'Backlash'

In Western cultures, stereotypes often expect men to be ambitious ("agentic") and women to be caring ("communal"). Carlsson et al. (2014) tested whether applicants who didn't fit these stereotypes, caring men and assertive women, faced a "backlash" in hiring. In their study of over 5,000 applicants for 3,000 jobs in Sweden, they found no evidence of such bias.

Improving Application Sifting

Behavioral competences can improve sifting by asking applicants to describe past actions related to key job skills. For instance, applicants might recount a time they persuaded others to follow an unpopular decision to demonstrate their influencing skills. However, no research currently supports the effectiveness of this method in improving application filtering.

Weighted Application Blanks (WABs) and Biodata

Application forms can be adapted into Weighted Application Blanks (WABs) by analyzing employee data to identify characteristics that predict job success (a toy weighting sketch appears at the end of this section). For instance, a study found American female bank clerks under 25, single, living at home, and with several prior jobs were more likely to leave their position early (Robinson, 1972). Although such criteria might not be legal today, WABs can help reduce turnover. While WABs are often paper-based, they could be applied to digital applications as well. Biodata also uses personal history to aid selection but gathers this information via a separate questionnaire instead of the application form.

Training and Experience (T&E) Ratings

Previously common in the U.S., T&E ratings provided a structured way to assess applicants' training and experience, using rating systems instead of subjective judgments. Research has shown that T&E ratings predict job performance (McDaniel, Schmidt, and Hunter, 1988). While Internet-based application systems are now more common, similar validation research for these systems is still pending.

Minimum Qualifications (MQs)

Job ads often set minimum qualifications (e.g., a civil engineering degree and five years of experience) to screen applicants. While MQs help filter applications, they may unintentionally exclude some groups, such as minorities or women with career breaks. In the U.S., MQs can be legally challenged, so they require careful, evidence-based justification. Buster, Roth, and Roth (2005) discuss using expert panels and rating systems to set fair and relevant MQs, encouraging the bracketing of qualifications (e.g., considering two to four years of experience instead of arbitrarily setting three).
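A toy sketch of the WAB idea, assuming (purely for illustration) items like those in Robinson's study: each item is weighted by how strongly it differentiates stayers from early leavers, and new applicants are scored with those weights.

```python
# A toy Weighted Application Blank sketch (invented data). Each item is
# weighted by how much more often it is endorsed by employees who stayed than
# by those who left early; applicants are scored by summing those weights.
# Items mirror Robinson (1972) and, as the text notes, might not be legal now.

stayers = [  # item responses of employees who stayed (1 = yes)
    {"under_25": 0, "lives_at_home": 0, "several_prior_jobs": 0},
    {"under_25": 1, "lives_at_home": 0, "several_prior_jobs": 0},
    {"under_25": 0, "lives_at_home": 1, "several_prior_jobs": 0},
]
leavers = [  # item responses of employees who left early
    {"under_25": 1, "lives_at_home": 1, "several_prior_jobs": 1},
    {"under_25": 1, "lives_at_home": 1, "several_prior_jobs": 0},
    {"under_25": 1, "lives_at_home": 0, "several_prior_jobs": 1},
]

def endorsement_rate(group, item):
    return sum(person[item] for person in group) / len(group)

items = stayers[0].keys()
# Positive weight = associated with staying; negative = with leaving early.
weights = {i: endorsement_rate(stayers, i) - endorsement_rate(leavers, i)
           for i in items}
print({i: round(w, 2) for i, w in weights.items()})

applicant = {"under_25": 1, "lives_at_home": 0, "several_prior_jobs": 1}
score = sum(weights[i] * applicant[i] for i in items)
print(round(score, 2))  # lower scores flag higher predicted turnover risk
```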
Background Investigation

Employers may conduct background checks on applicants, covering criminal records, driving and credit history, education, employment, and even personal reputation. Background checks are increasingly popular in the U.S., used by 85% of employers by 2007, up from 51% in 1996, driven partly by cases of executive resume fraud. In Britain, background checks are common for childcare workers and government employees handling confidential information. However, little research exists on whether background checks effectively select suitable candidates and filter out unsuitable ones. Isaacson et al. compared applicants who failed background checks with those who passed, finding that those who failed scored slightly higher on a risk-taking test. In a computer simulation of manufacturing work, this group worked a bit faster but made more errors. Roberts et al. (2007) followed 930 New Zealanders and found no link between criminal convictions before age 18 and self-reported counterproductive work behaviors.

Internet Tests

Many employers now use online tests in place of paper applications to assess job knowledge and screen out those lacking essential skills or mental ability (like knowledge of Microsoft Excel). Using such tests early in the hiring process can improve candidate quality by filtering out unsuitable applicants sooner, although there's little research yet on their effectiveness. Some tests also assess personality or job fit directly, rather than relying on candidate self-descriptions.

Application Scanning Software

Software systems now scan applications and CVs for job requirement matches, streamlining the process and reducing biases tied to ethnicity, age, disability, or gender by ignoring these factors. However, biases can persist indirectly (e.g., through hobbies associated with certain demographics). Consistent but not necessarily more accurate, these systems often rely on simple keyword searches rather than complex decision-making (see the sketch below), raising concerns about their actual value. Psychologists are skeptical, questioning whether the creators of such software have adequate insights for sophisticated candidate evaluation. There's no evidence that human application sifters use complex processes that software can replicate, nor has research identified any subtle relationships in sifting that could be coded into software. Stone et al. (2013) found little research on critical areas like keyword searching in applications.
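A minimal sketch of the simple keyword matching the passage describes; the requirements and CV text are invented, and real products may differ in detail:

```python
# A minimal sketch of keyword-based CV scanning, of the simple kind the text
# describes; all requirements and CV text are invented for illustration.
import re

requirements = {"excel", "sql", "forecasting"}

def keyword_match(cv_text: str) -> float:
    """Fraction of required keywords found anywhere in the CV text."""
    words = set(re.findall(r"[a-z]+", cv_text.lower()))
    return len(requirements & words) / len(requirements)

cv = "Built forecasting models in Excel and SQL for finance stakeholders."
print(keyword_match(cv))  # 1.0 -- but a synonym like "spreadsheets" would miss
# Such matching is consistent across applicants; whether it is any more
# accurate than human sifting is exactly the open question raised above.
```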
OVERVIEW OF SELECTION METHODS

What is Assessed in Personnel Selection?

Personnel selection assesses an applicant's ability to perform the job. Job analysis provides a detailed answer, outlining essential attributes for success (explored in Chapter 3). Table 1.2 summarizes key assessment categories.

- Mental Ability: Includes general mental ability (intelligence) and specific abilities, like problem-solving, clerical skills, or mechanical comprehension. Some roles require sensory abilities (e.g., good hearing, balance, or hand-eye coordination).
- Physical Characteristics: Certain jobs require physical abilities like strength, endurance, or dexterity. Other roles may implicitly require specific height or appearance.
- Personality: Psychologists identify 5 to 30 personality traits influencing how people think, feel, and behave. For example, an extroverted person enjoys social interactions, making them easier to place in a sales role than a shy person.
- Interests, Values, and Fit: People find work fulfilling when it aligns with their values; for example, someone wanting to help others may prefer charity work over retail. "Fit" describes how well a candidate's outlook or behavior aligns with the organization's culture, whether explicitly stated or implicitly understood.
- Knowledge: Jobs require specific knowledge, from basic tasks like using a phone to advanced skills like statistical analysis. Knowledge can often be taught, so it isn't always a selection criterion, though complex knowledge may need a higher level of mental ability.
- Work Skills: Skills refer to performing tasks effectively, like driving a bus or diagnosing an illness. Some skills are selected for, while others are taught, depending on the job's demands and required abilities.
- Social Skills: These are crucial for many jobs, especially in service and teamwork roles. Skills like communication, negotiation, and leadership are increasingly valued. Hogan and colleagues suggest that compatibility with colleagues is often underrated in hiring.

Construct and Method: Arthur and Villado (2008) note that reviews often mix up the "construct" (what is measured, like personality) with the "method" (how it's measured, like a questionnaire).

Nature of Information Collected

Discussions on selection methods often focus on the effectiveness of personality questionnaires, structured interviews, and work samples but overlook the types of information these methods gather. Table 1.3 categorizes selection methods into five distinct types of information.

Self-Report Evidence: This includes information the applicant provides through forms, interviews, and questionnaires, such as personality tests, attitude measures, and biographical inventories. Self-reports can be unstructured (like open-ended interviews) or structured (like personality questionnaires). Some are "transparent," meaning applicants can easily infer what responses are desired, while others, like biodata or projective tests, are less straightforward. Self-reports offer advantages: they're cost-effective, convenient, and often respectful to applicants, as they allow individuals to describe themselves. Self-reports can be the best way to assess certain qualities, such as job satisfaction. However, self-reports have limitations: employers cannot always verify this information, and responses may be influenced by coaching or lack of self-awareness. Many resources guide applicants on answering interview questions, which can affect the authenticity of their responses.

Problem of Lack of Self-Insight: Some applicants may believe they possess qualities like leadership, popularity, or creativity and reflect this belief in their applications, personality questionnaires, or interviews. However, by objective measures like tests or others' opinions, they may not actually have these qualities. This self-insight issue, often linked to the Dunning-Kruger effect (where individuals with lower abilities overestimate their skills), is rarely studied in the selection context but highlights the need for verification from additional sources.

Self-Report and Common Method Variance: When all information on an applicant's personality and job performance comes solely from their self-reports, the data may reflect biases like optimism, pessimism, or dishonesty rather than true traits or abilities. To avoid these issues, it's best to gather information from different sources or validate the applicant's claims.
Other Report Evidence: Information about the applicant from others, like references or peer ratings, varies in reliability based on the source's expertise. General references require little expertise, while some evaluations use experts, like psychologists.

Demonstrated Evidence: This involves applicants performing tasks or tests to showcase their abilities, such as general mental ability tests or job knowledge assessments. This type of evidence is more reliable, as ability tests are difficult to fake, but collecting it is typically more costly and complex.

Recorded Evidence: Certain information, like degrees or certifications, is recorded and verifiable. However, some employers mistakenly rely on self-reported data without checking official records, risking accuracy.

Work History as Evidence: An applicant's work history can demonstrate achievement, such as holding positions like CEO or Managing Director, or earning prizes and awards. This recorded evidence often holds more weight than self- or other-reported claims. Demonstrated and recorded evidence generally has an asymmetric relationship with self-reports: if someone fails a task, it disproves any claim that they can perform it, but a claim that someone "cannot do something" doesn't negate proven evidence that they can.

Involuntary Evidence

Some evidence comes from the applicant indirectly, not from what they say or do intentionally. For example, polygraph tests assess truthfulness by measuring physiological responses, not verbal answers. Polygraphs can help assess which self-reports to trust. Other involuntary methods include graphology (inferring traits from handwriting) and drug tests. These methods rely on indicators beyond self-reporting and sometimes have a certain allure due to their perceived objectivity.

Work Performance

Selection research compares predictors, such as selection tests, with criteria like work performance indices. Performance metrics can be straightforward, such as units produced or sales achieved. Supervisor ratings are commonly used in the U.S. due to availability and perceived objectivity. However, defining a good performance criterion can be complex, as performance may not always be unidimensional. Views on successful performance may differ among supervisors, workers, management, and customers, raising questions about what truly defines effective performance and who gets to decide. Metrics like turnover, punctuality, absences, and accidents are easy to collect but sometimes hard to interpret. Lent, Aurbach & Levin (1971) found that global supervisor ratings were the most common performance metric, used in 60% of studies, as discussed further in Chapter 12.

Fair Employment Law

Employment laws prohibit discrimination against protected classes, including women, ethnic minorities, and people with disabilities. While direct discrimination is illegal, indirect discrimination, or adverse impact, is a key concern in selection. Adverse impact occurs when a selection process unintentionally favors the majority group, potentially leading to a legal presumption of discrimination. For instance, if an employer screens out applicants unemployed for over six months, it might disproportionately affect certain ethnic groups with higher unemployment rates. Employers must minimize adverse impact to avoid costly legal consequences and negative publicity, as covered in Chapter 13 (a minimal adverse-impact check is sketched below).
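One widely used screen for adverse impact is the US "four-fifths" rule: a group's selection rate below 80% of the highest group's rate is taken as evidence of adverse impact. A minimal check with illustrative numbers; note that the audit-study figure earlier (6.7 callbacks per 10) corresponds to an impact ratio of 0.67, below the threshold.

```python
# Adverse impact screen using the widely cited US "four-fifths" rule:
# a selection rate for any group below 80% of the highest group's rate
# counts as evidence of adverse impact. Numbers below are illustrative.

def selection_rate(hired: int, applied: int) -> float:
    return hired / applied

majority = selection_rate(hired=50, applied=100)  # 0.50
minority = selection_rate(hired=20, applied=60)   # 0.33

impact_ratio = minority / majority
print(round(impact_ratio, 2))                         # 0.67
print("adverse impact flagged:", impact_ratio < 0.8)  # True
```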
Current Selection Practices

Employer surveys on selection practices are common but may be unreliable, with response rates often around 20% (Piotrowski and Armstrong, 2006). Some selection practices are unethical or even illegal, and employers may be reluctant to disclose them. For example, some employers are rumored to access criminal records via former police officers or use credit checks, though these are restricted by credit agencies in the UK. Secret databases listing applicants to avoid due to union activism or safety complaints are also rumored. While many organizations forbid telephone references, Andler and Herbst (2002) report that managers may still request and provide them.

Selection in Britain

Three UK surveys, by IRS (Murphy, 2006), CIPD (2006), and Zibarras and Woods (2010), show that most employers in the UK continue to rely on interviews, references, and occasionally tests, though online testing is less common. About half use assessment centers or group exercises, while few use biodata. Two surveys didn't report response rates. Zibarras and Woods examined organizations of varying sizes and found no significant differences in selection methods by company size, though public and voluntary sectors tend to favor formal methods like application forms and structured interviews over CVs and unstructured interviews.

Selection in Europe

European countries emphasize a social negotiation approach to selection, valuing employee rights, privacy, and fairness. Salgado and Anderson (2002) found that mental ability (MA) tests are now more common in Europe than in the U.S. A large survey by Price Waterhouse Cranfield in the early 1990s (Dany & Torchy, 1994) covering 12 Western European countries highlighted some national preferences:

- Graphology was favored only in France.
- Application forms are widely used, except in the Netherlands.
- References are common across Europe but less popular in Spain, Portugal, and the Netherlands.
- Spain and the Netherlands show a preference for assessment centers and aptitude testing, which are among the most popular methods there, while these methods are less commonly used in other countries.
- Group selection methods are rare overall but are most commonly used in Spain and Portugal.

Selection Trends in Germany

Schuler et al. (2007) conducted a survey of 125 German HR managers, comparing with data from 1993. References, unstructured interviews, biographical questionnaires, and medical exams have decreased in popularity, while structured interviews and assessment interviews have gained favor. The survey also measured perceived validity and applicant acceptability. High-validity methods include structured interviews, assessment centers, group discussions, and work samples, while graphology and online personality questionnaires are seen as having low validity. Selection methods vary by position: references are used primarily for executives and rarely for clerical roles, while assessment centers are common for apprentices, trainees, and management positions. Mental ability (MA) tests are used only for apprentices.

Selection in the USA

According to Piotrowski and Armstrong (2006), a survey of Fortune 1000 companies shows that U.S. employers nearly universally use application forms, résumés, and reference checks. About half of companies conduct skills testing, and a significant minority use personality tests, biodata, and some drug testing. The survey did not inquire about interview usage.
Selection Practices in New Zealand and Australia

Surveys in New Zealand (Taylor, Keelty & McDonnell, 2002) and Australia (Di Milia, 2004) reveal selection practices similar to the UK, with interviews, references, and applications being nearly universal. Personality tests, ability tests, and assessment centers are less common but are growing in popularity.

Selection in Nigeria and Ghana

Arthur et al. (1995) reported that selection practices in Nigeria and Ghana heavily favor interviews (used by 90%) and references (46%). Paper-and-pencil tests, work samples (19%), and work simulations (11%) are used less frequently.

Ryan et al.'s (1999) Survey of 20 Countries

Ryan et al. surveyed selection practices across 20 countries, noting differences in the use of various methods. Mental ability tests were most popular in Belgium, the Netherlands, and Spain but least used in Italy and the U.S. Personality tests were widely used in Spain and minimally in Germany and the U.S. Projective tests were favored in Portugal, Spain, and South Africa, with low usage in Germany, Greece, Hong Kong, Ireland, Italy, and Singapore. Drug testing was most common in Portugal, Sweden, and the U.S., and least used in Italy, Singapore, and Spain. Ryan linked these findings to Hofstede's (2001) theory, suggesting countries high in uncertainty avoidance (like Greece and Portugal) tend to use more selection methods, especially structured interviews, to reduce unpredictability.

Huo, Huang, and Napier's (2002) Survey of 13 Countries

This study, which included countries like Australia, Canada, China, Indonesia, Taiwan, Japan, South Korea, and Mexico, found interviews to be very common, though less so in China and South Korea. Some nations, including Mexico, Taiwan, and China, partially base selection on personal connections (e.g., family, school, region). Japan's selection focuses on interpersonal compatibility, likely reflecting Japan's traditional practice of lifelong employment.

Economic Climate and Selection Methods

One hypothesis suggests that during high unemployment, employers may be less concerned with applicants' views on selection methods, potentially increasing the use of less favored methods like mental ability tests or personality questionnaires. This hypothesis could have been tested during the economic downturns between 2007 and 2015, but the opportunity seems to have been missed.

Uncertainty Avoidance

Countries high in uncertainty avoidance, like Greece and Portugal, prefer formal, predictable procedures to manage unpredictability. Low uncertainty avoidance countries, like Singapore, are more flexible with selection methods.

Reasons for Choice of Selection Method

Harris, Dworkin, and Park (1990) found that personnel managers weigh factors like fakability, offensiveness to applicants, and industry norms when selecting methods. Despite recognizing interviews as imperfect and easily faked, managers continue using them for purposes beyond assessment. Terpstra and Rozell (1997) also explored reasons managers avoid certain methods, adding further insight into selection preferences.

Useful Selection Methods

Structured interviews and mental ability tests are seen as effective by HR managers. However, some managers avoid mental ability tests due to legal concerns, and biodata is often unfamiliar. Muchinsky (2004) observed that U.S. managers commonly prioritize questions like "How long will this take?" and "How much will it cost?" over accuracy.
Swiss HR Perspectives

König, Jöri, and Knüsel (2011) used the Repertory Grid method (Box 1.2) to explore views on 11 selection methods among 40 Swiss HR practitioners. Distinctions made by practitioners ranged from practical aspects (e.g., spoken vs. written, internal vs. outsourced) to information type (self-view vs. others' view of the candidate). Topics valued by work psychologists, like validity, were mentioned by only a few HR practitioners, with just five citing validity and only four noting fakability. Surprisingly, no one mentioned legal concerns, applicant reactions, or cost, likely due to Switzerland's favorable economic conditions and relatively low legal pressures. Eleven practitioners considered a temporal perspective, linking selection methods to present vs. long-term relevance, possibly influenced by psychoanalysis. König et al. note that more research is needed to understand these perspectives.

Feedback from Applicants

Billsberry (2007) reviewed 52 accounts of selection experiences from applicants in the UK. The accounts reveal issues often hidden in employer surveys, including rudeness, unprofessional behavior, blatant lying, bias, and even harassment. Interviews were commonly cited as the preferred assessment method but often conducted poorly. Billsberry's data suggest that a comprehensive survey of applicants is necessary to gauge the extent of poor employer behavior.

Role Repertory Grid Method

The Role Repertory Grid is a tool used to delve deeper into perceptions of selection methods by comparing sets of three methods at a time (e.g., interview, reference, personality questionnaire). HR professionals identify differences between two of the methods and explain the distinction, such as "traditional vs. recent invention." This process reveals the person's own conceptual framework for selection, rather than imposing a predefined structure. This approach can uncover unique insights into how HR practitioners think about selection methods.

Chapter 2: Validity of selection methods

Reliability means consistency. For example, physical measurements like the dimensions of a chair are usually so consistent they are taken for granted. However, selection assessments are often less consistent, and at their worst, may provide little or no useful information. Several types of reliability are important in selection research:

1. Retest Reliability: Compares two sets of scores from the same individuals, taken at two different times (usually about a month apart). Examples include interview ratings, ability test scores, or personality questionnaires. If the test measures a stable trait, the results should be similar. Retest reliability is also used for work performance measures, like monthly sales or supervisor ratings, which should remain fairly stable over time.

2. Inter-rater Reliability: Compares ratings given by two assessors for the same individuals, such as during interviews or job supervision. If assessors disagree, at least one is incorrect. Inter-rater reliability should be calculated from independent ratings that have not been influenced by prior discussion, which is challenging in real-world scenarios.

3. Split-half Reliability: The test is divided in two, each half scored separately and the two halves correlated, across a large sample. If the test is too short, the halves will not correlate well. The usual way of splitting the test is to separate odd-numbered items from even-numbered ones.
4. Internal Consistency Reliability: Measures whether the questions in a psychological test all assess the same thing. For example, if a personality test has 10 questions, each targeting a different trait, the overall score would be meaningless, and the internal consistency reliability would be near zero. Similarly, if a test includes irrelevant questions, it will lack internal consistency. Poor internal consistency may also indicate that the test is too short. While earlier research used split-half reliability, modern methods rely on the alpha coefficient, which averages the reliability of all possible ways to split a test. Unlike retest reliability, which requires repeating a test with the same individuals, internal consistency reliability is calculated from a single dataset. This makes it more convenient for test publishers. However, the two methods measure different aspects of reliability and cannot replace each other. In supervisor ratings with multiple scales (e.g., assessing different aspects of performance), internal consistency reliability is often much higher than inter-rater reliability.

Key Terms

Alpha Coefficient: Measures how well each question contributes to the total score, equivalent to the average of all possible split-half reliabilities.

Standard Deviation (SD): Summarizes how scores vary. It describes both individual differences (e.g., how one person compares to others) and the overall variability in the dataset. For normal (bell-shaped) distributions, the SD allows for comparisons across different measurement systems. For example, someone 6′2′′ tall is 2 SDs above the mean, regardless of whether the measurement is in feet, meters, or another unit.

Error of Measurement: This concept estimates how much test scores might change if the test is taken again. It is calculated from the test's reliability and standard deviation (the arithmetic is sketched below). For example, an IQ test with a reliability of 0.90 has an error of measurement of about 5 points, meaning that roughly one retest in three will differ from the first score by 5 or more points. For instance, someone scoring 119 might drop to 116 on a retest, while someone scoring 118 could rise to 121. This variability highlights why psychologists avoid treating IQ scores as overly precise. Untrained individuals may misunderstand this variability, which is why proper training is essential for administering psychological tests.
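A small sketch of the reliability arithmetic above, using invented scores: the split-half correlation is stepped up to full-test length with the Spearman-Brown formula, and the error of measurement follows SEM = SD × √(1 − reliability), which yields the 5-point IQ figure in the text.

```python
import math
from statistics import correlation  # requires Python 3.10+

# Split-half reliability: correlate odd- and even-item scores, then step up
# to full-test length with the Spearman-Brown formula (scores invented).
odd_half  = [12, 15, 9, 18, 14, 11, 16, 13]
even_half = [11, 16, 10, 17, 13, 12, 15, 14]
r_half = correlation(odd_half, even_half)
split_half_reliability = 2 * r_half / (1 + r_half)
print(round(split_half_reliability, 2))

# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
# For IQ (SD = 15) with reliability 0.90 this gives ~4.7, i.e. about 5 points;
# roughly one retest in three falls more than 1 SEM from the first score.
sem = 15 * math.sqrt(1 - 0.90)
print(round(sem, 1))  # 4.7
```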
Validity

A valid selection method measures what it claims to measure and predicts something useful. Creating a valid test requires rigorous research, including testing large groups and collecting follow-up data. For example, while anyone can create a list of 20 questions about diversity, only careful study can turn that into a reliable and valid selection tool. Up to 11 types of validity exist, differing in their persuasiveness, suitability for different sample sizes, legal acceptability, and relevance to selection decisions.

Criterion Validity: This measures whether a test predicts work productivity. For example, in 1918, Link showed that the Woodworth Wells Cancellation test strongly correlated (0.63) with monthly production for munitions inspectors. Criterion validity focuses on whether high test scores predict better performance, regardless of the test's content, format, or appearance. Its two main types are:

1. Predictive Validity: Measures whether the test predicts future performance. This mirrors real-world hiring, where HR selects candidates and later evaluates their success.

2. Concurrent Validity: Measures whether the test predicts current performance. Test results and performance data are collected simultaneously, making it faster and easier than predictive validation. However, some researchers criticize this method for being less scientifically rigorous.

Problems with Concurrent Validation:

1. Missing Persons: It excludes individuals who have left, been promoted, or dismissed, leading to incomplete performance data and possibly lower validity.

2. Unrepresentative Samples: Current employees may not reflect the diversity of potential applicants, such as women or minorities, if the workforce lacks representation.

3. Direction of Cause: Employees may have adapted to the job through training or experience, which might skew results. Despite being quicker, concurrent validation is less reliable for assessing a test's true predictive power. It might seem obvious that successful managers are dominant, but they may simply have developed the ability to influence and command respect on the job; only a predictive design, showing that dominant applicants (As) go on to become good managers, demonstrates that dominance truly matters. This issue is particularly relevant for personality traits but can also apply to abilities. For example, would 10 years of numerical or financial work improve numerical ability test scores? While laypeople might say yes, test designers usually assume no.

Advantage of Using Present Employees: Current employees are less likely to fake responses on personality questionnaires (PQs) or self-reports since they already have the job and don't need to exaggerate their qualities. Chapter 7 explores predictive and concurrent validity for PQs.

Adverse Impact Considerations: When analyzing adverse impact, it's important to distinguish between applicants and current employees. For example, finding no test score differences between white and non-white current employees might seem positive but could be misleading if more low-scoring non-white applicants were excluded earlier. Roth, Buster, and Bobko (2011) highlight this issue, calling it the "Bobko-Roth fallacy."

Selective Reporting and 'Fishing Expeditions': Psychologists often use statistical significance to evaluate research, disregarding results likely to occur by chance more often than 1 in 20 times (the 5% level) and placing more confidence in those that would occur less often than 1 in 100 times (the 1% level). However, this approach can be misleading. For example, using a 16PF personality questionnaire and 10 supervisor ratings could generate 160 correlations. If 8 of these are significant at the 5% level, it doesn't necessarily indicate meaningful findings: 160 × 0.05 = 8 "significant" correlations are expected by chance alone (see the simulation sketched below). Researchers sometimes fall into "fishing expeditions," where they highlight random significant results, such as a link between 16PF dominance and politeness ratings, to support their claims. While reputable journals reject such practices, unscrupulous researchers or test publishers might still use them as evidence of validity. Additionally, some researchers omit tests or outcomes that fail to produce significant results, making their studies appear more focused and successful than they are.

Effect Size: Wiesner and Cronshaw (1988) found a correlation of 0.11 between traditional selection interviews and work performance. A correlation of 0.11 means the interview accounts for just 1% of the variation in work performance (calculated by squaring the correlation: 0.11² ≈ 0.01). This suggests that traditional interviews provide very little information about how employees will perform.
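The fishing-expedition problem is easy to demonstrate by simulation: computing 160 correlations on pure noise yields about 160 × 0.05 = 8 "significant" results. A quick sketch, using the standard large-sample approximation 1.96/√n for the 5% critical value of r:

```python
# The "fishing expedition" above, simulated: 160 correlations computed on
# pure random noise, counting how many clear the 5% significance threshold.
# Expected by chance: 160 * 0.05 = 8, matching the example in the text.
import random
from statistics import correlation  # requires Python 3.10+

random.seed(1)
n = 100                    # "employees" behind each correlation
r_crit = 1.96 / n ** 0.5   # approximate two-tailed 5% critical value for r

significant = 0
for _ in range(160):       # e.g. 16 PQ scales x 10 supervisor ratings
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    if abs(correlation(x, y)) > r_crit:
        significant += 1

print(significant)  # around 8 "significant" findings from nothing but noise
```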
The 0.30 Barrier: Critics of psychological testing argue that tests rarely achieve correlations above 0.30 with real-world outcomes like work performance. A 0.30 correlation explains less than 10% of the variance, leading some to question the usefulness of tests. In the U.S., the Equal Employment Opportunities Commission often considers correlations below 0.30 insufficient to establish validity. Even the strongest correlations in selection research (0.50–0.60) explain only 25–36% of the variance in performance. This reflects the influence of many other factors, like management, organizational climate, coworkers, and economic conditions, on work performance. For example, Romanian data by Sulea et al. (2013) show personality tests predict counterproductive work behavior (CWB) modestly (correlations of 0.14–0.24). However, "abusive supervision" (e.g., bullying, ridicule) has a much stronger correlation with CWB (0.46).

The d Statistic: The d statistic measures the difference between groups in terms of standard deviations (SDs). For instance, informal recruitment (word of mouth) vs. formal recruitment (press advertisements) shows a small effect size (d = 0.08), meaning the difference in work performance between the groups is less than one-tenth of an SD. Small effect sizes, like a correlation of 0.11 or a d of 0.08, indicate that the selection or recruitment method has minimal impact (the arithmetic behind both statistics is worked below). While this may suggest the need for better methods, modest improvements can still be worthwhile if they are achieved easily and at a low cost. For example, informal recruitment may only slightly improve performance but is simple and inexpensive to implement.
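The effect-size arithmetic used in this passage, as a worked check; the group means and SD in the d example are invented so as to reproduce d = 0.08:

```python
# Worked effect-size arithmetic from the passage above.

# Variance explained by a correlation is r squared:
for r in (0.11, 0.30, 0.50, 0.60):
    print(f"r = {r:.2f} -> {r * r:.1%} of performance variance explained")
# 0.11 -> ~1%; 0.30 -> 9%; 0.50-0.60 -> 25-36%

# The d statistic: a mean difference divided by the (pooled) SD.
def cohen_d(mean_a: float, mean_b: float, pooled_sd: float) -> float:
    return (mean_a - mean_b) / pooled_sd

# E.g. informal vs. formal recruits differing by 0.4 rating points where
# ratings have SD 5 gives d = 0.08, the small effect cited above.
print(round(cohen_d(50.4, 50.0, 5.0), 2))  # 0.08
```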
Content Validity: This measures whether a test appears plausible and relevant to experts. It involves analyzing the job, selecting appropriate questions, and designing the test. Borrowed from educational testing, content validity ensures the test items represent tasks employees need to perform, not traits they possess. For example, a firefighter test might be built as follows:

1. Creating Test Items: An expert panel, assisted by HR and psychologists, drafts questions based on tasks firefighters need to know (e.g., "Which materials produce toxic gases when burning?") or perform (e.g., "Connect fire appliance to hydrant").

2. Reviewing Importance: A second panel rates how often these tasks arise and how essential they are.

3. Finalizing the Test: Questions are rewritten into a standardized format (e.g., rating performance from 1 = fails completely to 5 = performs quickly and accurately).

Advantages: It's easily defensible, as it directly relates to the job. It doesn't require a large sample of current jobholders, unlike criterion validity. It's clear and credible to applicants.

Limitations: It works best for jobs with specific, limited tasks, and is more suited to promotion tests than initial selection. It is also subordinate to criterion validity, which confirms whether the test truly predicts job performance: content validation designs a test that should work, but criterion validation ensures it does work.

Construct Validity: This ensures the test measures something meaningful about applicants. It answers questions like, "What does this test assess?" and "What kind of people will score well?" Ideally, the answer should align with job performance. Construct validity looks deeper into what the test measures, such as abilities, personality, or skills. This is important because:

- If the test measures personality but the company already uses a personality test, the new test might add little value, even if it's called something else (e.g., emotional intelligence or sales aptitude). Exploring construct validity helps ensure the test provides unique and relevant insights for selection.
- If a two-day assessment center evaluates the same qualities as a 30-minute ability test, it's more cost-effective to use the shorter test.

Considerations for Construct Validity: If the new test primarily assesses mental ability, HR should anticipate potential adverse impacts on certain groups. If applicants challenge the selection methods in court, HR must clearly explain what the test measures and what it does not; lacking this clarity can undermine their defense. Construct validity is usually evaluated by comparing one selection method (e.g., interview ratings) with others (e.g., psychological tests). It helps reveal what a method actually measures, which might differ from its intended purpose. For instance, unstructured interviews often measure mental ability more than expected.

Convergent and Divergent Validity: Assessment centers (ACs) aim to measure multiple traits (e.g., problem-solving, influence, empathy) using various exercises like group discussions and presentations (a correlation sketch follows at the end of this section).

- Convergent Validity: High correlations between the same trait assessed through different exercises (e.g., empathy in both group discussions and presentations) indicate convergent validity.
- Discriminant Validity: Lower correlations between different traits assessed in the same exercise (e.g., empathy and problem-solving in a presentation) indicate discriminant validity. These correlations may not be zero because traits can overlap.
- Low/Zero Correlations: Different traits assessed in different exercises (e.g., problem-solving in a group discussion and empathy in a presentation) should show very low or no correlation.

These correlations confirm whether the assessment center accurately measures distinct traits as intended. In both AC research and broader selection studies, low convergent and divergent validity are common:

- Low Convergent Validity: When different methods of measuring the same trait (e.g., "influence" in a personality questionnaire vs. a group discussion) do not correlate, it suggests one method may be flawed, or something unexpected is influencing results.
- Low Divergent Validity: When tests aimed at measuring distinct traits fail to differentiate them (e.g., all group discussion scores highly correlate), it suggests the method is not effectively separating dimensions.

Method vs. Construct: The distinction between method (how something is assessed) and construct (what is being measured) is critical but often confused. Methods like interviews or ACs can evaluate many constructs (e.g., ability, personality, social skills, or knowledge). Constructs, such as mental ability or personality, are often tied to specific methods (e.g., ability tests for mental ability, personality questionnaires for personality traits). Arthur and Villado (2008) caution against treating methods as inherently valid without specifying what they are assessing. For instance, meta-analyses often report "validity" for a method like ACs without clarifying which constructs were measured, leading to inconsistent validity estimates across studies. Some methods are more versatile (e.g., interviews can assess many constructs), while others are more specific (e.g., ability tests assess only mental ability).
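A minimal sketch of the convergent/discriminant logic above, with invented assessment-centre ratings: the same trait across two exercises should correlate highly, while different traits within one exercise should correlate clearly lower.

```python
# A minimal convergent/discriminant validity check for an assessment centre
# (all ratings invented). Same trait, different exercises should correlate
# highly (convergent); different traits should correlate lower (discriminant).
from statistics import correlation  # requires Python 3.10+

ratings = {  # six assessees, each rated 1-10
    ("empathy", "group_discussion"):         [7, 4, 8, 5, 6, 3],
    ("empathy", "presentation"):             [6, 4, 7, 5, 7, 3],
    ("problem_solving", "group_discussion"): [5, 8, 4, 7, 5, 6],
}

def r(key_a, key_b):
    return round(correlation(ratings[key_a], ratings[key_b]), 2)

# Convergent: empathy measured in two different exercises.
print(r(("empathy", "group_discussion"), ("empathy", "presentation")))
# Discriminant: empathy vs. problem-solving within the same exercise;
# this should come out clearly lower if the AC separates the traits.
print(r(("empathy", "group_discussion"),
        ("problem_solving", "group_discussion")))
```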
When standard methods face issues, alternative approaches may be considered.

Cross-Validation: Cross-validation checks the validity of a test by re-evaluating it on a different sample. This is especially crucial for methods prone to capitalizing on chance, like personality questionnaires that produce multiple scores. Locke (1961) demonstrated the risks of skipping cross-validation. He found seemingly plausible patterns (e.g., students with long surnames were less impulsive, liked vodka, but didn't smoke) that disappeared entirely when tested on a second sample. This shows how unvalidated findings can mislead if not confirmed through cross-validation.

Incremental Validity: A selection test, such as a reference check, may not be very accurate on its own but can improve predictions when combined with other methods. For example, a reference might capture aspects of work performance that other tests, like a mental ability test, overlook. Two weakly correlated predictors (e.g., mental ability and references) each add to the prediction of work performance; two highly correlated predictors (e.g., mental ability and job knowledge) overlap in what they measure, so the second contributes little extra (see the sketch below). Incremental validity is crucial for building effective selection systems, ensuring tests measure different aspects of performance rather than the same traits repeatedly. This requires data on how tests interrelate, which varies significantly across methods (e.g., mental ability tests and interviews often intercorrelate, while references may not).
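A sketch of the incremental-validity logic with invented data: adding a second, weakly correlated predictor to a regression raises R² noticeably; had the two predictors overlapped heavily, the increment would shrink toward zero.

```python
# Incremental validity sketch (invented data): how much does a second
# predictor raise R-squared over the first predictor alone? The gain is
# large when the predictors are weakly correlated, small when they overlap.
import numpy as np

rng = np.random.default_rng(0)
n = 200
ability = rng.normal(size=n)
reference = rng.normal(size=n)  # nearly independent of ability here
performance = 0.5 * ability + 0.3 * reference + rng.normal(size=n)

def r_squared(predictors):
    """R-squared of a least-squares fit of performance on the predictors."""
    X = np.column_stack([np.ones(n)] + predictors)
    beta, *_ = np.linalg.lstsq(X, performance, rcond=None)
    resid = performance - X @ beta
    return 1 - resid.var() / performance.var()

r2_ability = r_squared([ability])
r2_both = r_squared([ability, reference])
print(round(r2_ability, 2), round(r2_both, 2),
      "increment:", round(r2_both - r2_ability, 2))
```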
Differential Validity: This occurs when a test predicts outcomes better for one group than another. For instance, a test might predict work performance accurately for men but not for women. Differential validity is undesirable in personnel selection, as it can make a test unusable. Historically, it was believed differential validity was not an issue: Schmidt and Hunter (1998) argued that 85 years of research showed no differential validity by race or sex. However, recent studies have revisited this concern, as discussed in Chapter 13.

Adverse Impact vs. Differential Validity: Adverse impact occurs when one group (e.g., men) scores higher on a test (e.g., physical strength) than another. Differential validity occurs when a test predicts outcomes for one group (e.g., men) better than for another (e.g., women). For example, British military research (Rayson, Holliman & Belyavin, 2000) found strength tests predicted per…

Marginal Types of Validity:

1. Face Validity: A test appears plausible because of its name or content (e.g., a "Dominance Test" with questions about dominant behavior). Face validity doesn't prove the test is actually valid but makes it more acceptable to employers and applicants.

2. Factorial Validity: Indicates how many factors a test measures (e.g., five distinct traits). While helpful, it doesn't explain what the factors are or how well they predict performance.

3. Mythical Validity: Assumptions that a test is valid due to its popularity, heavy marketing, or long-standing use. Sometimes validity claims are exaggerated because supporting research is inaccessible or misunderstood. Example: Goleman (1995) claimed top performers at Bell Laboratories had higher emotional intelligence (EI). This was widely cited as proof of EI's importance, but the original study lacked any formal EI testing, making the claim speculative: a clear case of mythical validity.

Chapter 3: Work Analysis

The Future of Work Analysis

Is Work Analysis Always Necessary? Pearlman, Schmidt, and Hunter (1980) showed that mental ability tests predict performance across many clerical jobs without requiring detailed work analysis. However, in the U.S., relying on generalizations may not satisfy legal requirements, like those of the Equal Employment Opportunities Commission. A detailed work analysis, such as the PAQ, might still be needed to legally prove a job's classification.

Is Work Analysis Becoming Obsolete? Traditional work analysis, like the exhaustive janitor rating schedule (e.g., item 132: Places a deodorant block in urinal), is being questioned. Rapid workplace changes demand general abilities like problem-solving, adaptability, teamwork, and self-direction, rather than narrowly defined, task-specific skills.

- Backward vs. Forward-Looking: Work analysis focuses on past job tasks, promoting uniformity, whereas modern organizations need agility and innovation.
- De-jobbing: Employees now engage in fluid, rapidly changing tasks without fixed job descriptions, reducing the relevance of traditional work analysis.
- Trend Shift: Emphasis is shifting from specific skills to broader traits like flexibility and resilience.

Competency Modelling (CM)

Competency modelling has become popular over the past 25 years, emphasizing observable skills or abilities needed for successful job performance. Unlike traditional work analysis, CM integrates broader organizational strategies.

Key Features of CM:

- Includes a mix of specific skills, general abilities, aptitudes (e.g., quick learning), and personality traits (e.g., resilience).
- Aims to identify top performers rather than just describe tasks.
- Aligns with business objectives and applies across all levels, with competencies evolving by grade or salary.
- Links HR systems, like selection, training, evaluation, and promotion.
- Focuses on driving change, though this could raise ethical concerns (e.g., enforcing conformity or identifying "wrong attitudes").

Challenges with CM:

- Subjectivity: Evaluating traits like "strategic vision" or "diversity awareness" is less straightforward than verifying technical skills.
- Reliability Issues: Poor inter-rater reliability can arise unless users are properly trained.
- Wish Lists: Some CM systems are criticized as vague collections of desired traits, lacking validation or clear behavioral indicators.
- Overlap with Work Analysis: Some competency models are essentially repackaged work analyses.

Conclusion: While CM offers flexibility and strategic alignment, it often sacrifices objectivity and rigor compared to traditional work analysis. Proper validation and training are critical for its success.

Chapter 7: Assessing personality by questionnaire

Defining Personality

Guilford (1959) defined personality as "any distinguishable, relatively enduring way in which one person differs from another," typically excluding mental abilities and physical attributes. Psychologists recognize up to eight models of personality (Cook, 2013):

1. Trait: 5 to 10 key traits.
2. Factor: 16 statistically derived abstractions.
3. Social Learning: Bundles of learned habits.
4. Motives: A profile of individual needs.
5. Phenomenological: How one views the world.
6. Self: How one sees oneself.
7. Psychoanalytic: A system of defenses.
8. Constitutional: Inherited neuropsychological traits.
Trait and Factor Models

Work psychologists often use trait or factor models. Traits are internal mechanisms that influence how people respond to situations, summarizing past behavior and predicting future actions. For example, assertive people have acted assertively in the past and are expected to do so in the future. Factors are similar but derived through statistical methods. Both models simplify personality into 5–20 broad characteristics common to everyone, making them practical for selection processes due to their simplicity and universal application.

Mischel's Criticism

Mischel (1968) challenged the trait and factor models, arguing that behavior is too inconsistent to support broad generalizations about personality. For example, assessing a trait like honesty, despite its popularity in the U.S., might not reliably predict consistent behavior across different situations.

Trait Limitations: The Character Education Inquiry (Hartshorne & May, 1928) showed that seven measures of honesty correlated poorly, suggesting that calling someone "honest" is only meaningful when considering the specific context: when, where, and with whom. Mischel reviewed similar inconsistencies in traits like extraversion, punctuality, curiosity, persistence, and attitude to authority, which are common in job descriptions. He advocated a habit model of personality, focusing on specific, situational habits. While more detailed, this model is less suited to broad, all-purpose assessments. Mischel also highlighted the "0.30 barrier," showing personality tests rarely predict real-world outcomes beyond a correlation of 0.30.

Methods for Measuring Personality

Personality can be assessed through various approaches:

- Self-report questionnaires (most common).
- Other reports, such as peer ratings or references.
- Demonstrations, like assessment center exercises.
- Recorded achievements, such as prior roles or qualifications.
- Involuntary methods, like graphology.

Personality Questionnaires (PQs)

PQs, often what people think of as "personality tests," are the most widely used method. They come in different formats: endorsement (yes/no) is simple and quick; rating scales provide more detailed scores; forced-choice limits faking by balancing the attractiveness of answer options.

Advantages for HR:

1. Self-contained: All information comes from the applicant.
2. Easy to use: HR staff can administer them with minimal training.
3. Cost-effective: Large groups can be tested, even online.
4. Efficient: PQs collect a large amount of information quickly. For example, completing the Eysenck PQ averages 15–16 questions per minute.
5. Comprehensive: PQs can explore thoughts, feelings, and behaviors (e.g., items 5 and 6 in Table 7.1).

PQs are accessible, fast, and practical tools for selectors, though their limitations in predictive accuracy remain a concern.

Adverse Impact: PQs are less likely than mental ability tests to create adverse impact on ethnic minorities. This has contributed to their growing popularity in U.S. HR practices.

Keying and Validation: Four main methods are used to construct and validate PQs:

1. Acceptance/Face Validity: People find the results believable. This is weak validity, as it relies on the "Barnum effect," where generic profiles feel personal.

2. Content Validity: Questions appear relevant. Example: Woodworth's 1917 Personal Data Sheet for U.S. Army recruits used psychiatric symptoms to ensure relevance.
Adverse Impact
PQs are less likely than mental ability tests to create adverse impact on ethnic minorities, which has contributed to their growing popularity in US HR practice.

Keying and Validation
Four main methods are used to construct and validate PQs:
1. Acceptance/face validity: People find the results believable. This is the weakest form of validity, as it relies on the "Barnum effect", whereby generic profiles feel personal.
2. Content validity: The questions appear relevant. Woodworth's 1917 Personal Data Sheet for US Army recruits, for example, used psychiatric symptoms to ensure relevance.
3. Empirical validity: Questions are included because they predict specific traits or behaviours. The CPI Dominance scale, for example, was keyed on answers that distinguished student leaders from followers.
4. Factorial validity: Questions are grouped by theme using factor analysis, which estimates the number of factors in the PQ. Cattell's 16PF, for example, grouped 187 questions into 16 factors, discarding unrelated items.

Interpreting PQ Scores
Raw scores mean little until compared with a relevant population (e.g. the general public, managers or students). Standardized scores such as T scores (T = 50 + 10z, so the reference-group mean is 50 and each 10 points is one standard deviation) are commonly used for interpretation.

Reliability
PQs are less reliable than mental ability tests. Retest reliability for PQ scales averages between 0.73 and 0.78. In one retest in three, scores can be expected to change by more than 0.5 SD (5 T points), and in one retest in 20, by more than 1 SD (10 T points), so PQ scores can fluctuate over fairly short periods.
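Those expected-change figures can be reconstructed, back-of-envelope, from the retest reliability itself. The sketch below assumes a reliability of about 0.75 and the T-score SD of 10; the published figures may rest on slightly different assumptions. The standard error of measurement is

\[
\mathrm{SEM} = \mathrm{SD}\sqrt{1 - r_{tt}} = 10\sqrt{1 - 0.75} = 5 \text{ T points.}
\]

If observed change is roughly normal with an SD of about one SEM, then about 32% of retests (roughly one in three) move by more than 5 points, and about 5% (one in 20) by more than 10 points, matching the figures above.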
The Five-Factor Model (FFM) of Personality
Early PQs such as the 16PF, the California Psychological Inventory and the Eysenck Personality Questionnaire assessed anywhere between 2 and 20 traits. Tupes and Christal (1961/1992) used factor analysis to identify five core personality factors, and Costa and McCrae later developed the NEO, a five-factor PQ that popularized the FFM. The "Big Five" traits are recognized across many cultures, including North America, Europe, Israel, Russia, Japan and China, suggesting a universal model of personality. The FFM is now the standard framework for analysing personality and work behaviour.

Using PQs in Selection
The standing of PQs in hiring has shifted over time. Dismissed as ineffective in the 1960s and 1970s, they have regained popularity since 1990, with many new PQs appearing. PQs aim to answer five key questions:
1. Does the applicant (A) have the right personality for the job?
2. Will A perform the job well?
3. Does A show a positive attitude, or organizational citizenship behavior (OCB)?
4. Will A engage in counterproductive work behavior (CWB)?
5. Will A fit and collaborate well within the team?

Key distinction: Question 1 compares A's traits with the general population, whereas Question 2 compares A with the traits of successful and less successful employees in similar roles (e.g. bank managers).

The A-S-A Model
The A-S-A model describes how occupations and organizations come to be staffed by similar people. Certain personalities are drawn to a field such as psychology (Attraction), some of these are chosen to enter it (Selection), and others leave because it does not suit them (Attrition).

The Perfect Profile Approach
Some employers seek an ideal personality profile for roles such as manager, salesperson or engineer, and PQ manuals cater to this by providing occupational norms. The approach has five notable limitations:
1. Small samples: The samples used to create profiles are often too small, and cross-validation is rarely done. A valid profile for cost accountants, for example, would require large, independent samples.
2. Job performance overlooked: Profiles are typically based on whoever holds the role, regardless of how well they perform it.
3. Adaptation vs prediction: Profiles may reflect how incumbents have adapted to the job rather than predict how well someone with that profile would perform.
4. Cloning effect: Selecting candidates who closely resemble current employees fosters harmony but may hinder adaptability and innovation in a changing environment.
5. Multiple success profiles: Success in some roles, such as management, can be achieved by quite different personality types, so no single "perfect" profile is adequate.

Question 2: Will They Do the Job Well?
This question asks whether PQs can predict job performance, usually measured by supervisor ratings, sales figures or training grades.

Early reviews
Studies in the 1960s and 1970s cast doubt on PQs: Lent, Aurbach and Levin (1971) found only 12% of validity coefficients were significant, and Guion and Gottier (1965) found just 10%. These findings, combined with Mischel's (1968) critique of the trait model, produced widespread scepticism; Guion and Gottier concluded that personality tests could rarely be recommended for hiring decisions.

Modern meta-analyses
Since 1990, numerous meta-analyses have re-evaluated PQs within the FFM framework, including Hough (1992, 1998) and Barrick and Mount (1991). Barrick, Mount and Judge (2001) summarized these in a "meta-meta-analysis": conscientiousness was the strongest predictor of work performance, followed by low neuroticism and extraversion, while openness and agreeableness showed near-zero correlations. Critics such as Morgeson et al. (2007) note that PQs rarely correlate with performance above 0.20, falling short even of the "0.30 barrier". Schmidt, Shaffer and Oh (2008) confirmed that correcting for range restriction does not greatly improve PQ validity, unlike the position for mental ability tests.

Occupational differences
Barrick et al.'s meta-meta-analysis found small differences across job types:
- Low neuroticism is more predictive in police work and in skilled or semi-skilled jobs.
- Extraversion predicts success in management and sales (Hough, 1998; Vinchur et al., 1998).
- Agreeableness correlates with performance in roles requiring teamwork and cooperation (Mount, Barrick & Stewart, 1998), but less so in jobs involving only client interaction, such as hotel work.
- Openness and agreeableness show stronger correlations in customer service roles (Hurtz & Donovan, 2000).

Emotional labour
Some roles require managing one's emotions, such as delivering unwelcome news (e.g. a failed vehicle inspection). Joseph and Newman (2010) tested whether emotional labour moderates PQ validity but found no support for the idea.

Conclusion: conscientiousness and certain other traits have predictive value, but PQs generally show limited correlations with job performance, particularly compared with mental ability tests. Specific roles and contexts may affect their usefulness.

Themes in Work Performance
Table 7.4 summarizes research on personality and leadership, expatriate work and entrepreneurial success.

Leadership
For some jobs the ability to lead others is critical. Judge et al. (2002) conducted a meta-analysis of leadership and the FFM, finding modest correlations between leadership and low neuroticism as well as extraversion. In civilian management, openness correlated positively with leadership, while in military and government settings conscientiousness was the stronger positive correlate (Table 7.4). Hoffman et al. (2011), also in a meta-analysis, compared distal factors such as personality, which can be selected for, with proximal factors such as interpersonal skills, which can be developed; both types proved about equally strongly linked to leadership.
This evidence bears on both the "Great Man" theory (leaders are rare, born rather than made, and predominantly male) and the "leadership as a role" theory, which holds that almost anyone can assume a leadership role. Hoffman et al. also observed that personality was more strongly linked to leadership in lower-level managers than in top managers, perhaps because success in top leadership roles often depends on external events beyond the leader's control.

Expatriate work
Mol et al. (2005) analysed 12 studies and found expatriate performance modestly linked to extraversion and conscientiousness, weakly linked to agreeableness and low neuroticism, and, surprisingly, unrelated to openness.

Entrepreneurship
Compared with managers, entrepreneurs score higher on extraversion, openness and conscientiousness, and lower on neuroticism and agreeableness (Zhao & Seibert, 2006). The conscientiousness difference lies mainly in achievement orientation rather than dependability. The overall correlation between personality and entrepreneurial status is relatively strong (0.37).

Combat effectiveness
Low neuroticism improves combat effectiveness, as shown by Hough (1998) in the USA and Salgado (1998) in Europe. The picture for conscientiousness is less clear: Mount and Barrick (1995) found a strong correlation (0.47), but Hough found weaker links, particularly when conscientiousness was split into subtraits such as achievement and dependability.

Different Measures of Work Performance
Barrick et al. (2001) distinguished three measures of work performance: supervisor ratings, training grades and objective measures such as sales (see Table 7.5). Extraversion had little connection with overall performance but did relate to training performance, possibly because extraverts enjoy new experiences, meeting people, or time away from regular work. Openness also correlated with training performance, suggesting open-minded people enjoy learning new skills. Conscientiousness correlated with all three measures, leading Barrick et al. to suggest it may be the key personality trait for motivation at work, though even its correlations rarely exceed 0.20.

Vocational Interest Questionnaires (VIQs)
VIQs, or career interest tests, are designed to help people find suitable work rather than to help employers hire. Early reviews such as Hunter and Hunter (1984) concluded VIQs were of little use in selection because they measure interest in a job, not the ability to do it. A meta-analysis by Van Iddekinge et al. (2011) supported this: VIQs correlated weakly with work performance (0.14) and turnover, though the correlation with training performance was somewhat better (0.26). They also examined "uninterests", where poor fit on a VIQ predicted worse performance or higher turnover; the correlations were of similar size, merely reversed in sign.

Incremental Validity
Conscientiousness and mental ability both predict work performance but are essentially unrelated to each other, so each provides incremental validity over the other. Schmidt and Hunter (1998) estimated that conscientiousness adds 18% incremental validity to mental ability and ranked this combination among the four most effective predictors. Salgado (1998), analysing European data, confirmed this finding and suggested that (low) neuroticism also adds incremental validity.
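The arithmetic behind that 18% figure can be reconstructed from the multiple correlation of two uncorrelated predictors, using the operational validities usually cited from Schmidt and Hunter (1998): about .51 for mental ability and .31 for conscientiousness. This is a sketch of the logic, not their exact computation:

\[
R = \sqrt{r_1^2 + r_2^2} = \sqrt{0.51^2 + 0.31^2} \approx 0.60,
\qquad
\frac{0.60 - 0.51}{0.51} \approx 0.18 .
\]

Because the two predictors are nearly uncorrelated, almost none of the conscientiousness validity is redundant with mental ability, which is why the combination gains roughly 18% over mental ability alone.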
Direction of Cause
Self-efficacy, the belief in one's ability to perform a job or learn its skills, is often linked to work success. Sceptics argue, however, that self-efficacy may reflect past success rather than predict future performance. Sitzmann and Yeo (2013) found that once past performance is taken into account, the 0.23 correlation between self-efficacy and work performance becomes non-significant, suggesting self-efficacy describes past achievement rather than forecasting future performance.

Conclusion
PQs are not highly effective at predicting job performance; ability tests may be better suited to the purpose. Despite decades of further research, the results for PQs remain unimpressive compared with the early findings of the 1960s and 1970s. Advocates reply that even low predictive validity can improve workforce performance when applied across large groups, especially since PQs are quick and inexpensive to administer.

Question 3: Has He/She Got a Good Attitude to Work?

Organizational Citizenship
Since the 1990s, interest in work performance has expanded beyond output and supervisor ratings to broader outcomes such as attitude. Employers value employees who are keen, cooperative and helpful, behaviour referred to as organizational citizenship behavior (OCB) or contextual performance. In the USA this shift is also driven by equal opportunity concerns, since ability-based measures such as written tests or training grades can create adverse impact; measuring OCB through PQs may help reduce such disparities.

Work performance can be divided into "can do" (ability and proficiency) and "will do" (motivation and attitude). Table 7.6 shows that attitude-related performance is linked to conscientiousness, as task performance is, but may have stronger ties to (low) neuroticism and agreeableness. Links to extraversion vary: it relates to attitudes towards co-workers (e.g. teamwork and interpersonal facilitation) but less to attitudes towards the job itself (e.g. dedication).

Validation Studies
The US military's PQ, the Assessment of Background and Life Experiences (ABLE), predicted "will do" criteria (effort and leadership, personal discipline, and physical fitness) better than "can do" criteria such as technical proficiency (Hough et al., 1990; McHenry et al., 1990); the uncorrected correlation for "will do" criteria was 0.16. Mount and Barrick (1995), in a meta-analysis of conscientiousness and "will do" criteria such as reliability, effort (hard work and persistence) and quality, found corrected correlations of 0.40 to 0.50, with an overall validity of 0.45 for "will do" criteria against 0.22 for "can do" criteria.

Question 4: Will He/She Behave Badly at Work?

Counterproductive Work Behavior (CWB)
PQs can be used not to select the best candidates but to screen out problematic ones, functioning more like a driving test than a competition. Anderson (1929) found that 20% of Macy's employees fell into a "problem" category for reasons such as poor health, trouble with colleagues or inability to adjust to work. CWBs cover a range of misbehaviours, including rule-breaking, theft, violence, drug use and lateness; several meta-analyses have examined them (see Table 7.7).

Violence and Aggression
Negligent hiring claims, such as the case in which Avis was sued for hiring an employee with a history of violence, have increased in the USA. Hershcovis et al. (2007) found that trait anger correlates strongly with workplace aggression, especially aggression toward people (0.43).
Negative affectivity, which is related to neuroticism, shows weaker links to aggression and overlaps with trait anger. Their definition of aggression includes behaviours such as rudeness and retaliation.

Accidents and Safety
Beus et al. (2015) linked conscientiousness and agreeableness to safer workplace behaviour and fewer accidents. Extraversion and neuroticism, by contrast, correlate with riskier behaviour and more accidents, reinforcing the value of conscientiousness.

Absence
Absenteeism is a costly problem for employers. Salgado (2002) reviewed the research but found no strong correlations between absenteeism and any of the Big Five; separating voluntary from involuntary absence remains a difficulty for research in this area.

Turnover
Turnover, often treated as counterproductive, can stem from many causes, including career advancement as well as dissatisfaction with the job. Zimmerman (2008) found that poor adjustment and impulsivity contribute to turnover.

Law-Abiding Behavior
Hough (1992) reported strong correlations between PQs and law-abiding behaviour, but these may be inflated by comparing criminals with non-criminals. Personality differences between the groups are evident, but the causal direction is unclear; convicted criminals, for example, may struggle to present themselves as honest. More convincing are studies correlating PQ scores on joining the army with subsequent absence without leave. Berry et al. (2007) noted high correlations between personality and deviance, though these may partly reflect common method variance, since both are typically self-reported.

Personality Questionnaires and Workplace Misbehavior
PQs are strongly linked to tendencies to misbehave at work, the pattern being high neuroticism, low agreeableness and low conscientiousness. These links do not necessarily translate into better prediction of workplace behaviour, however, because much of the research relies on concurrent validation and self-reported misbehaviour. More research is needed to confirm whether PQs can reliably predict future misbehaviour.

The Saga of Honesty Tests (HTs)
Employee theft is a significant problem in the USA, contributing to up to 30% of company failures. Honesty tests (HTs), also known as integrity tests, became popular in the USA after the 1988 restrictions on polygraph use; they have never been widely used in Europe. The tests include questions about attitudes toward honesty and admissions of past dishonesty (Table 7.8 gives examples of typical HT questions). HTs have been validated against polygraph results, till shortages, or low takings (assumed to indicate theft).

Ones, Viswesvaran and Schmidt (1993) conducted a meta-analysis of 665 HT studies covering 576,460 people, with several striking findings:
- HTs predict counterproductive work behaviors (CWBs) very well, with an operational validity of 0.47.
- Validity is higher for self-reported CWBs than for recorded CWBs.
- HTs are versatile, predicting not only CWBs but also workplace accidents, property damage, training performance and output.
- HTs predict general work performance surprisingly well, with an operational validity of 0.34, higher than that of FFM PQs.

Criticisms of Honesty Tests
Berry, Sackett and Wiemann (2007) raised concerns about HT validity. Many HTs include questions asking respondents to admit past dishonesty, and these admissions are then used to validate the rest of the test. This is a weak form of validation, essentially asking the same question twice.
Given this method, correlations of only 0.30 to 0.40 seem surprisingly low. Studies using behavioural measures of dishonesty are far fewer than the 665 reviewed by Ones et al. (1993): only seven studies, covering 2,500 people, measured actual theft, and these showed a much lower corrected validity of just 0.13. McDaniel, Rothstein and Whetzel (2006) ran trim-and-fill analyses to detect reporting bias in data from four unnamed US test vendors and found evidence that some datasets understated limitations in validity, casting doubt on earlier meta-analyses, including Ones et al.'s. Campion (in Morgeson et al., 2007) likewise noted that some HT publishers showed marked publication bias, avoiding or ignoring studies with negative findings.

A New Meta-Analysis
Van Iddekinge et al. (2012a) re-analysed HTs nearly 20 years after Ones et al., with three key findings:
1. HTs predict counterproductive work behavior (0.26) better than work performance (0.16).
2. Research by test publishers shows better results for work performance but worse results for CWB than independent studies, so there is no consistent pattern of publisher advantage.
3. Despite two decades of research growth, there was no great increase in usable studies. Of 30 test publishers contacted, only two cooperated fully; the rest declined, imposed conditions or failed to respond. One publisher even claimed legal restrictions over the technical reports used by Ones et al. (1993), barring Van Iddekinge et al. from using the data. As a result, much of the earlier meta-analysis could not be replicated, limiting the re-analysis.

Sackett and Schmitt's (2012) Review of Honesty Tests
Sackett and Schmitt (2012) revisited the surprising claim from Ones et al.'s (1993) meta-analysis that HTs predict general work performance better than conventional PQs. They focused on the methodologically superior studies: predictive designs, real job applicants and non-self-reported outcomes. On this basis:
- Ones et al. reported a corrected validity of 0.35 for HTs, higher than conventional PQs.
- Van Iddekinge et al. reported a much lower validity of 0.11, smaller than for PQs.
- Both analyses rested on only 20 (non-overlapping) samples with total sample sizes of 7,000-8,000; the broader dataset in Ones et al.'s meta-analysis included many less useful studies.
Sackett and Schmitt highlighted the problems of building on flawed research (small samples, inadequate criteria, poor reporting) and proposed a fresh approach: a large-scale, multi-test, multi-organization study to establish the credibility of HTs, with poor-quality research excluded from meta-analyses.

Question 5: Will the Team Work Well?
Selection research typically focuses on the individual employee, but teamwork is increasingly common, and personality plays a crucial role in it. Bell's (2007) meta-analysis of 11 studies of work team performance and the FFM examined three ways of characterizing team personality: the team average, variation within the team, and extremes.
1. Average team personality: Higher team averages for conscientiousness and agreeableness predict better performance. Agreeableness, which shows little correlation with individual performance, becomes crucial in team settings.
2. Personality variation: Greater variability in conscientiousness and openness within a team is linked to poorer performance.
3. Extremes (the bad apple effect): Teams perform better when the lowest individual scores for conscientiousness and agreeableness are higher; a single low scorer on these traits can drag down the whole group.
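Bell's three team-level composites are easy to state concretely. A minimal sketch follows; the scores are hypothetical T scores for one team, not data from the study:

```python
from statistics import mean, pstdev

def team_composites(scores):
    """The three team-level summaries examined by Bell (2007):
    the average, within-team variability, and the minimum (bad apple)."""
    return {
        "average": mean(scores),
        "variability": pstdev(scores),
        "minimum": min(scores),
    }

# Hypothetical conscientiousness T scores for a four-person team.
print(team_composites([62, 55, 48, 39]))
```

The same member scores thus yield three different team descriptions, and Bell's point is that which one predicts performance depends on the trait: averages matter for conscientiousness and agreeableness, variability hurts for conscientiousness and openness, and a low minimum on conscientiousness or agreeableness drags the team down.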
Complexities of PQ Validity
A common response to research on personality and work performance is to point to complexities that may have been overlooked or that might explain inconsistent results. Such complexities are easy to propose but hard to settle. Three examples illustrate the point.

Linearity
Le et al. (2011) argue that relationships between personality traits and work performance may be non-linear:
1. Conscientiousness: Low to average levels correlate with better performance, but very high levels can produce rigidity and lower performance. The effect may vary with job complexity: low-complexity jobs prioritize speed over accuracy, where very high conscientiousness can hinder performance, while high-complexity jobs demand accuracy and persistence, where high conscientiousness helps.
2. Neuroticism (or emotional stability): A moderate level of emotional stability is needed to avoid distraction, but further stability beyond that point adds no extra benefit.
Le et al. confirmed both hypotheses in two large samples, demonstrating the importance of context and the possibility of diminishing returns for certain traits. The usual statistical test for such an inverted-U pattern is sketched below.
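The standard way to test this kind of curvilinearity is to add a squared term to the model and check its sign. A minimal sketch with invented data (not Le et al.'s, and without the significance testing a real analysis would need):

```python
import numpy as np

# Hypothetical conscientiousness T scores and performance ratings
# shaped to peak at a moderate level of the trait.
consc = np.array([35.0, 40, 45, 50, 55, 60, 65, 70])
perf = np.array([2.1, 2.6, 3.0, 3.4, 3.6, 3.7, 3.6, 3.3])

linear = np.polyfit(consc, perf, 1)     # [slope, intercept]
quadratic = np.polyfit(consc, perf, 2)  # [squared term, slope, intercept]

# A negative squared term indicates an inverted U: performance rises
# with conscientiousness up to a point and then declines.
print("linear slope:", round(linear[0], 3))
print("squared term:", round(quadratic[0], 5))
```

A purely linear analysis of such data would report a modest positive correlation and miss the downturn at the top of the scale, which is Le et al.'s point.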
Quadrant Analysis
Some American employers historically viewed union membership as a sign of maladjustment and used PQs to screen out such "misfits" (Zickar, 2001). Parkes and Razavi (2004) examined personality and union membership using the Eysenck PQ: neither extraversion nor neuroticism alone predicted membership, but low extraversion combined with low neuroticism predicted not joining, suggesting stable introverts do not seek the reassurance or social opportunities unions provide. This is quadrant analysis, which sorts people into four groups on two personality dimensions (see Figure 7.1). It has a key limitation, however: it clashes with the normal distribution. Most people score near the middle of personality dimensions and do not fall neatly into categories such as extravert/introvert or neurotic/stable, and on retesting many will shift categories, reducing reliability.

Situational Strength
Mischel (1968) introduced the concept of situational strength to explain why personality often matters less than expected. In strong situations, external demands dictate behaviour, leaving little room for personality to influence actions: in a university lecture, for example, students are expected to sit quietly and take notes while the lecturer stands and speaks, with behaviour driven by role, not personality. In weak situations, where rules and expectations are unclear, personality plays a larger role, for example in deciding who speaks first and what they say. The key question is whether work is a strong situation (where personality has minimal impact and might be ignored in selection) or a weak one (where personality could substantially influence behaviour). Meyer, Dalal and Bonaccio (2009) used 14 items from the O*NET work analysis system, such as responsibility for others' health and safety or decision-making requirements, to assess the situational strength of various jobs. Key findings:
- High situational strength jobs include airline pilot, nuclear equipment operation technician and subway operator.
- Low situational strength jobs include hairdresser, lyricist, creative writer and personnel recruiter (despite the last one's legal compliance responsibilities).
- As predicted, the link between conscientiousness and work performance is weaker in strong situations (0.09) and stronger in weak situations (0.23).

Improving PQ Validity
PQs usually predict workplace behaviour weakly compared with mental ability tests. Efforts to improve their validity have focused on changing question formats and increasing the relevance of the traits measured.

Contextualization (Frame of Reference)
Most PQs assess personality in general, but employers need to know about behaviour at work specifically; general assessments can mislead if responses reflect behaviour outside work. Robie et al. (2000) developed a contextualized PQ that adds "at work" to every question (e.g. "I take great care with detail at work"). Contextualization raised scores, possibly because people behave more carefully at work, which means new normative data are needed. Shaffer and Postlethwaite (2012), in a meta-analysis of contextualized PQs, found a 33% increase in validity for conscientiousness and significantly higher validity for neuroticism, extraversion, openness and agreeableness. This small change of phrasing greatly improves validity. Another approach, forced-choice formats, aims to reduce faking and is discussed later.

More Than Five Scales: FFM Facets
Some analyses, such as Hough (1992), split the FFM traits into facets to capture finer-grained trends; conscientiousness, for example, can be divided into achievement and carefulness, which have different implications for work performance. Judge et al. (2013), in a meta-analysis of FFM facets, found that the six facets of each factor correlate with work performance to differing degrees, differing by 0.15 on average: the positive-emotion facet of extraversion, for example, correlates 0.20 with work performance, while excitement-seeking shows no correlation. Facets may therefore offer more predictive value than the broad factors, opening the way to customized FFM models using only the facets relevant to a particular job.

Less Than Five? The FFM as a Whole
The multiple correlation of all five FFM traits should in theory predict work performance better than any single trait. Ones et al. (2007) estimated this value at 0.27 for Barrick et al.'s meta-analysis, still below the 0.30 benchmark. Some research supports a two-factor model: plasticity (openness and extraversion) and stability (agreeableness, conscientiousness and low neuroticism).

The Dark Side of Personality
Some research focuses on traits to avoid, particularly the dark triad:
- Machiavellianism: manipulativeness, strategic lying and ends-justify-means thinking.
- Narcissism: an inflated self-view, fantasies of control and success, and entitlement.
- Psychopathy: impulsivity, thrill-seeking, low empathy and low anxiety.
Key findings: O'Boyle et al. (2012) found the dark triad generally unrelated to work performance, although managers high in narcissism performed much worse. The dark triad is linked to counterproductive work behaviors, especially narcissism and less so psychopathy. Kish-Gephart et al. (2010) found Machiavellianism correlates 0.27 with unethical decision-making.
Challenges in Measuring the Dark Triad
Dark triad traits may be hard to assess by self-report PQ because of inherent biases: narcissists lack self-awareness, Machiavellians lie strategically, and psychopaths may have little grasp of the truth. Some dark triad PQs also include questions directly tied to CWBs (e.g. arrests or troublemaking), which makes the resulting correlations circular.

Other Considerations
Research on the dark triad is extensive and overlaps with the study of personality disorders, which form part of the US psychiatric diagnostic system and are covered by disability discrimination law. Less clinical-sounding alternatives to the dark triad labels are "sceptical", "bold" and "mischievous" (Spain et al., 2014).

PQs, Law and Fairness

Legal Challenges and Popularity
PQs have attracted surprisingly little legal trouble compared with mental ability tests, partly because they create less adverse impact, which contributes to their popularity.

Gender Differences
Men tend to report being more forceful and competitive, women more caring and more anxious (Hough, Oswald & Ployhart, 2001). Gender differences pose a dilemma: in the USA, separate norm tables for men and women are prohibited by the Civil Rights Act of 1991, yet pooled norms may create adverse impact; a dominance cut-off of T = 60, for example, would exclude more women than men.

Ethnicity
Foldes, Duehr and Ones (2008) found that most personality scales show negligible differences across ethnic groups, reducing the likelihood of adverse impact. There are exceptions:
- White Americans scored higher than African Americans on sociability (d = 0.39, where d is the difference between group means in pooled SD units).
- European studies show few white/Afro differences, but some white/Asian or white/Chinese differences favour the minority, especially on conscientiousness.
- Dutch research found some immigrant groups scoring higher on neuroticism and lower on extraversion.

Age
Substantial differences occur with age: extraversion, for example, declines steadily (d = 0.88) between ages 18 and 65 (Cook et al., 2007).

Disability
The Americans with Disabilities Act (ADA) restricts health-related questions before a job offer, which affects PQs: health-related or psychiatric-sounding items (e.g. neuroticism) are often removed or rephrased, and "emotional stability" is the more acceptable label for the neuroticism dimension.

Starting Afresh
Attempts to edit out unwanted group differences after the fact generally fail. Gill and Hodgkinson (2007) instead built a new five-factor PQ by pre-screening 533 items for gender, ethnicity, age and education differences, keeping only unbiased items. The PQ is freely available for research and some commercial use.

Privacy Concerns
Some items may be too intrusive for selection, such as those about politics or religion. In Soroka vs Dayton-Hudson, items deemed intrusive led to legal action; although the case was settled out of court, it highlighted the privacy issue.

The Danger of Multi-Score PQs
Most PQs assess every trait they cover regardless of relevance; the 16PF, for example, yields 16 factor scores even if only a few are job-related. Using irrelevant scores in selection decisions invites legal challenge, and assessing weaknesses or traits such as psychopathic deviance or fantasy aggression can anger candidates and harm the employer's case in a dispute. One solution is Lewis Goldberg's public-domain scale bank, which allows job-specific PQs tailored to the work analysis, avoiding irrelevant assessment, though users must generate their own normative data.
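In outline, assembling such a job-specific PQ is a filtering step over a bank of scales. A hypothetical sketch (the scale names and items below are invented for illustration, not Goldberg's actual item pool):

```python
# Hypothetical public-domain scale bank (items invented for illustration).
SCALE_BANK = {
    "conscientiousness": ["I pay attention to details.", "I follow a schedule."],
    "dominance": ["I take charge.", "I try to lead others."],
    "fantasy_aggression": ["I imagine getting even with people."],
}

def build_job_pq(relevant_traits):
    """Keep only the scales the work analysis identified as job-relevant."""
    return {trait: SCALE_BANK[trait] for trait in relevant_traits}

# The work analysis flagged two relevant traits; the intrusive
# fantasy-aggression scale is simply never administered.
print(build_job_pq(["conscientiousness", "dominance"]))
```

The point of the design is what is left out: candidates are never scored on scales the work analysis cannot justify, which is exactly the exposure a multi-score PQ creates.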
Highhouse et al: Chapter 6: Predicting future performance

Unreliability
Unreliability in either variable lowers the observed correlation, and the effect is systematic and correctable.
- Predictor unreliability affects both research and actual decision-making, since selectors must work with the fallible predictor scores they have; it is therefore not corrected for.
- Criterion unreliability affects research findings but not individual decisions, so corrections should account for criterion unreliability only.
Two practices keep the corrections realistic:
1. Err toward overestimating criterion reliability, since assuming too low a reliability overcorrects and yields misleadingly high validities.
2. Correct only statistically significant coefficients; inflating near-zero correlations is bad practice and can be highly misleading.
Professional guidelines recommend reporting both corrected and uncorrected correlations for transparency.

Reduced Variance (Restriction of Range)
When the variance of either variable is substantially lower in the sample than in the population, the sample correlation underestimates the population validity. Restriction of range occurs when one or both variables are truncated, narrowing their variability. Scatterplots make this visible: high correlations form narrow ellipses relative to their length, low correlations form wider ones, and removing the ends of an ellipse reduces variance and flattens the correlation.
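A sketch of the two standard corrections just described, using the classical psychometric formulas (the numerical values in the example are illustrative only, not taken from the text):

```python
from math import sqrt

def correct_criterion_unreliability(r_obs, r_yy):
    """Disattenuate an observed validity for criterion unreliability only,
    as recommended above. r_yy is the criterion reliability."""
    return r_obs / sqrt(r_yy)

def correct_range_restriction(r_obs, u):
    """Thorndike's Case II correction for direct range restriction.
    u = restricted SD / unrestricted SD of the predictor."""
    U = 1.0 / u  # how much the applicant-pool SD exceeds the sample SD
    return (r_obs * U) / sqrt(1.0 + r_obs**2 * (U**2 - 1.0))

# Illustrative values: observed r = .25, criterion reliability = .64,
# and the hired sample's predictor SD is 70% of the applicant pool's.
r = correct_criterion_unreliability(0.25, 0.64)   # ~0.31
print(round(r, 2), round(correct_range_restriction(r, 0.7), 2))
```

Note the order of operations matters in practice, and overgenerous assumptions at either step compound, which is why the text advises conservative reliability estimates and correcting only coefficients that are statistically significant to begin with.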