Chapter 1
An overview of assessment: Definition and scope
CHERYL FOXCROFT AND GERT ROODT

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
› distinguish between tests, assessment measures, testing, and psychological and competency-based assessment
› name the characteristics of assessment measures
› recognise the limitations of assessment results
› explain assessment as a multidimensional process.

1.1 Introduction

We constantly have to make decisions in our everyday lives: what to study, what career to pursue, whom to choose as our life partner, which applicant to hire, how to put together an effective team to perform a specific task, and so on. This book will focus on the role of psychological assessment in providing information to guide individuals, groups, and organisations to understand aspects of their behaviour and make informed and appropriate decisions. In the process you will discover that assessment can serve many purposes. For example, assessment can help to identify strengths and weaknesses, map development or progress, inform decisions regarding suitability for a job or a field of study, identify training and education needs, or assist in making a diagnosis. Assessment can also assist in identifying intervention and therapy needs, measuring the effectiveness of an intervention programme, and in gathering research data to increase psychology's knowledge base about human behaviour or to inform policy-making. As South African society is multicultural and multilingual in nature, this book also aims to raise awareness of the impact of culture and language on assessment and to suggest ways of addressing these influences.

As we begin our journey in this chapter you will be introduced to key terms, the characteristics of assessment measures, the multidimensional nature of the assessment process, and the limitations of assessment results.

1.2 About tests, testing, and assessment

One of the first things that you will discover when you start journeying through the field of psychological assessment is that many confusing and overlapping terms are used. As you travel through the Foundation Zone of this book, you must try to understand the more important terms and think about how they are interlinked. In essence, tools are available to make it possible for us to assess (measure) human behaviour. You will soon realise that various names are used to refer to these tools, such as tests, measures, assessment measures, instruments, scales, procedures, and techniques. To ensure that the measurement is valid and reliable, a body of theory and research regarding the scientific measurement principles that are applied to the measurement of psychological characteristics has evolved over time. This subfield of psychology is known as psychometrics, and you will often hear of the psychometric properties of a measure or the term psychometric measure. Psychometrics refers to the systematic and scientific way in which psychological measures are developed and the technical measurement standards (e.g. validity and reliability) required of such measures.
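These technical standards are dealt with in detail in later chapters. Purely as a taste of what a psychometric property looks like in practice, the sketch below computes Cronbach's alpha, a commonly used internal-consistency reliability coefficient. It is a minimal illustration only: the item scores are invented and the function name is our own, not something defined in this book.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Internal-consistency reliability for a respondents-by-items score matrix.

    Classical formula: alpha = k/(k-1) * (1 - sum(item variances) / variance(total)).
    """
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses of six test-takers to four Likert-type items (1-5);
# the data are invented purely to illustrate the calculation.
scores = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```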
When we perform an assessment, we normally use assessment tools as well as other information that we obtain about a person (e.g. school performance, qualifications, life and work history, family background). Psychological assessment is a process-orientated activity aimed at gathering a wide array of information by using psychological assessment measures (tests) and information from many other sources (e.g. interviews, a person's history, collateral sources). We then evaluate and integrate all this information to reach a conclusion or make a decision. Seen from this perspective, testing (i.e. the use of tests or measures), which involves the measurement of behaviour, is one of the key elements of the much broader evaluative process known as psychological assessment.

Traditionally, psychology professionals have performed all aspects of the assessment process (test selection and administration, scoring, interpreting, and reporting/providing feedback). Furthermore, as the outputs of psychological assessment are in the form of psychological traits/constructs (such as personality and ability), the expertise to perform psychological assessment is a core competence of an appropriately registered psychology professional. However, in the modern testing era, there has been a shift in the role of the psychology professional in the assessment process, in the work and organisational context in particular (Bartram, 2003). This shift has been due to the advent of both Internet-delivered computer-based tests (see Chapter 14) and competency-based assessment (see section 2.3.5), which focuses on assessing the skills, behaviours, and attitudes/values required for effective performance in workplace or educational/training settings, with results that are directly linked to the competency language of the workplace or educational setting. These two factors have redefined who a test user is and the role of the psychology professional in assessment, given that computer-based tests require little, if any, human intervention in terms of administration and scoring, and computer-generated test interpretation and reports are often available. This results in some test users in the workplace only needing the skill to use and apply the competency-based outputs from a test, which does not require extensive psychological training. There is much controversy around the shifting role of the psychology professional and the role of non-professionals in assessment in some areas of psychology. This controversy will be debated further in Chapters 2, 8 and 14 and has even been raised in legal cases (e.g. Association of Test Publishers of South Africa and Another, Saville and Holdsworth South Africa v. The Chairperson of the Professional Board for Psychology, unreported case number 12942/09, North Gauteng High Court, 2010).

Having summarised the relationship between tests, testing, and assessment, let us now consider each aspect in more depth. However, you should be aware that it is almost impossible to define terms such as 'test', 'testing', 'assessment measure', and 'assessment'. Any attempt to provide a precise definition of 'test', or of 'testing' as a process, is likely to fail as it will tend to exclude some procedures that should be included and include others that should be excluded (ITC International Guidelines for Test-use, Version 2000, p. 8). Instead of focusing on specific definitions, we will highlight the most important characteristics of assessment measures (tests) and assessment.

1.3 Assessment measures and tests

1.3.1 Characteristics of assessment measures

Although various names are used to refer to the tools of assessment, this book will mainly use assessment measure and test. Preference is given to the term assessment measure as it has a broader connotation than the term test, which mainly refers to an objective, standardised measure that is used to gather data for a specific purpose (e.g. to determine what a person's intellectual capacity is).
The main characteristics of assessment measures are as follows:
› Assessment measures include many different procedures that can be used in psychological, occupational, and educational assessment and can be administered to individuals, groups, and organisations.
› Specific domains of functioning (e.g. intellectual ability, personality, organisational climate) are sampled by assessment measures. From these samples, inferences can be made about both normal and abnormal (dysfunctional) behaviour or functioning.
› Assessment measures are administered under carefully controlled (standardised) conditions.
› Systematic methods are applied to score or evaluate assessment protocols.
› Guidelines are available to understand and interpret the results of an assessment measure. Such guidelines may make provision for the comparison of an individual's performance to that of an appropriate norm group or to a criterion (e.g. a competency profile for a job), or may outline how to use test scores for more qualitative classification purposes (e.g. into personality types or diagnostic categories).
› Assessment measures should be supported by evidence that they are valid and reliable for the intended purpose. This evidence is usually provided in the form of a technical test manual.
› Assessment measures are usually developed in a certain context (society or culture) for a specific purpose, and the normative information used to interpret test performance is limited to the characteristics of the normative sample. Consequently, the appropriateness of an assessment measure for an individual, group, or organisation from another context, culture, or society cannot be assumed without an investigation into possible test bias (i.e. whether a measure is differentially valid for different subgroups) and without strong consideration being given to adapting and re-norming the measure. In Chapter 7 you can read about cross-cultural test adaptation. Furthermore, in the historical overview of assessment provided in Chapter 2, you will see what a thorny issue the cross-cultural transportability of measures has become, especially in the South African context.
› Assessment measures vary in terms of:
– how they are administered (e.g. group, individual, or on computer)
– whether time limits are imposed. A speed measure contains a large number of fairly easy items of a similar level of difficulty that must be completed within a set time limit, with the result that almost no one completes all the items in the specified time. In a power measure, on the other hand, time limits are not imposed so that all test-takers may complete all the items; however, the items get progressively more difficult
– how they are scored (e.g. objectively with scoring masks or more subjectively according to certain guidelines)
– how they are normed (e.g. by using a comparison group or a criterion)
– what their intended purpose is (e.g. screening versus diagnostic, competency-based testing)
– the nature of their items (e.g. verbal items, performance tasks)
– the response required from the test-taker (e.g. verbally, via pencil-and-paper, by manipulating physical objects, by pressing certain keys on a computer keyboard)
– the content areas that they tap (e.g. ability- or personality-related).

All the characteristics of measures outlined above will be amplified throughout this book. Furthermore, various ways of classifying assessment measures are available.
In Chapter 8 we discuss the classification system used in South Africa as well as the criteria used in deciding whether a measure should be classified as a psychological test or not. You will also notice that the different types of measures presented in Chapters 10 to 14 have largely been grouped according to content areas.

1.3.2 Important issues

There are two important aspects of assessment results that you should always keep in mind. Firstly, test results represent only one source of information in the assessment process. Unfortunately, because assessment measures offer the promise of objective measurement, they often take on magical proportions for assessment practitioners, who begin to value them above their professional judgement or opinion. Lezak (1987) reminds us that psychological assessment measures 'are simply a means of enhancing (refining, standardising) our observations. They can be thought of as extensions of our organs of perception … If we use them properly … they can enable us to accomplish much more with greater speed. When tests are misused as substitutes for, rather than extensions of, clinical observation, they can obscure our view of the patient' (Lezak, 1987, p. 46).

Secondly, we need to recognise the approximate nature of assessment (test) results. Why? The results obtained from assessment measures always need to be bracketed by a band of uncertainty because errors of measurement creep in during administration, scoring, and interpretation. Furthermore, the social, economic, educational, and cultural background of an individual can influence his/her performance on a measure to the extent that the results present a distorted picture of the individual (see Chapter 17). Thorndike and Hagen (1977), however, point out that it is not just in the field of psychology where assessment information may be subject to error. The teacher's perception of how well a child reads, the medical doctor's appraisal of a person's health status, the social worker's description of a home environment, and so on, all represent approximate, informed, yet somewhat subjective, and thus potentially fallible (incorrect) opinions.
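One way to make this 'band of uncertainty' concrete is the standard error of measurement from classical test theory, which is typically covered in later chapters of texts such as this one. The sketch below is a minimal illustration under assumed values (a standard deviation of 15 and a reliability of 0.91); the figures are invented and do not refer to any specific measure.

```python
import math

def score_band(observed: float, sd: float, reliability: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence band around an observed score, using the
    standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD = 15, reliability = 0.91, observed score = 108.
low, high = score_band(108, sd=15, reliability=0.91)
print(f"95% band: {low:.1f} to {high:.1f}")  # roughly 99.2 to 116.8
```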
1.4 The assessment process

The assessment process is multidimensional in nature. It entails the gathering and synthesising of information as a means of describing and understanding functioning. This can inform appropriate decision-making and intervention. The information-gathering aspect of the assessment process will be briefly considered.

'Test performance in a controlled clinic situation with one person is not a representative sample of behavior!' (Bagnato and Neisworth, 1991, p. 59). Information-gathering itself must be multidimensional. Table 1.1 highlights the varied aspects across which information could be purposefully gathered, with the purpose of the assessment determining which aspects are appropriate. By gathering a wide array of data in the assessment process, a richer and broader sampling of behaviour or functioning can be achieved. However, the assessment battery (i.e. the combination of measures used) must be tailored to an individual, group, or organisation's needs (e.g. in terms of their age, level of ability and disability, capacity, job analysis) as well as to the purpose of the assessment. You can read more about the tailoring of assessment and the selection of tests for a battery in Chapter 9.

After gathering the information, all the information must be synthesised, clustered together, and weighed up to describe and understand the functioning of an individual, group, or organisation. Based on such descriptions, predictions can be made about future functioning, decisions can be made, interventions can be planned, and progress can be mapped, among other things. It is particularly in the synthesis and integration of assessment information that much skill and professional judgement is required to identify the underlying patterns of behaviour and to make appropriate deductions. This is why you need to draw on all your knowledge from all areas of psychology, and not just from the field of assessment, when you perform a psychological assessment.

Table 1.1 Multidimensional information-gathering (sources of information and examples)
› Multiple measures: Different types of assessment measures such as norm-based and criterion-referenced tests, interviews, behavioural observation, rating scales completed by teachers or supervisors, and ecologically-based measures that describe the social or occupational context of an individual could be used.
› Multiple domains: The following could be assessed, for example: attention; motor, cognitive, language-related, non-verbal, and personality-related functioning; interest; scholastic achievement; and job performance.
› Multiple sources: Consult with other professionals, teachers, parents, extended family members, and employers.
› Multiple settings: Assessment could take place in a variety of settings (e.g. home, school, work, consulting rooms) and social arrangements (e.g. one-to-one, with peers, with parents present) to get as broad a perspective as possible of a person's functioning and the factors that influence it.
› Multiple occasions: For assessment to be relevant, valid, and accurate, patterns of functioning have to be identified over a period of time and not merely in a single assessment session.

However, it is important to recognise the limits of human wisdom when reaching opinions based on assessment information. Why? When assessment practitioners synthesise and integrate assessment information, they do so in as professional and responsible a way as possible, using all the wisdom at their disposal. At the end of this process, they formulate an informed professional opinion. Even though it is an informed opinion, it is nonetheless an opinion which may or may not be correct. The assessment practitioner may be fairly certain that the correct opinion has been arrived at, but absolute certainty is not possible. Nonetheless, increasing emphasis is being placed on the consequences of the outcomes of assessment for individuals and organisations and the responsibility of the psychology professional in this regard (see Chapter 8). For example, if a wrong selection decision is made when hiring a senior-level employee, possible negative outcomes for an organisation could be added financial risk, lowering of effectiveness and efficiency, bad publicity, and legal action (related to unfair labour practices if the employee is retrenched/fired).

CRITICAL THINKING CHALLENGE 1.1
In your own words, explain what the assessment process involves and where assessment measures fit into the picture.

As we conclude this section, another crucial aspect of the assessment process needs to be highlighted. Can you guess what it is? Various parties, often with competing motives and values, are involved in the assessment process.
The person doing the assessment, the person being assessed, and external parties such as employers, education authorities, or parents all have a stake in the outcome of the assessment. This is why it is important that the assessment practitioner performs assessment in a professional, ethical manner and that the rights, roles, and responsibilities of all the stakeholders involved are recognised and respected. This will be explored in more detail in Chapter 8, including unpacking the Ethical Rules of Conduct for Professionals Registered under the Health Professions Act, 1974 (Department of Health, 2006) in terms of assessment.

CHECK YOUR PROGRESS 1.1
1.1 Define the following terms:
› Psychometrics
› Psychological assessment
› Testing
› Competency-based assessment
› Assessment measure
› Assessment battery.

Another, more fun, way to check your progress is to see how you respond to the questions posed after reading through the following case study.

ETHICAL DILEMMA CASE STUDY 1.1
Lindiwe, a Grade 12 learner, consults a Registered Counsellor (RC) for career guidance. Based on an interview and the results of the Self-Directed Search (SDS) (see Chapter 13, section 13.2.3), the Basic Traits Inventory (section 13.2.5.3), and the Differential Aptitude Tests Form R (Chapter 10, section 10.5.1.1), it is found that Lindiwe has a strong interest in working in the legal field, has above-average verbal ability, and has the general ability to succeed at further studies. The RC advises her to pursue a legal career and enrol for a law degree at a university. Lindiwe does so and emails the RC four years later with the news that she has successfully completed her studies and has been accepted to do her articles at a reputable law firm.
Questions
(a) What would have guided the RC in deciding how to address the referral question of providing Lindiwe with career guidance?
(b) In what ways did the RC demonstrate responsible use of assessment measures in this case study?
(c) Is there evidence of irresponsible assessment practices in this case study that could have resulted in negative consequences for Lindiwe?

1.5 Conclusion

This chapter introduced you to key terms in the field of psychological assessment. You learned about the tools (assessment measures or tests) used in psychological testing and assessment, their characteristics, and the psychometric properties (measurement standards) required of them. You further learned that test results are only one source of information in an assessment process and that the conclusions reached from them are always approximate in nature. The chapter concluded with a discussion of the multidimensional nature of the assessment process and the competing values and motives of those involved in the assessment process.

Chapter 2
Psychological assessment: A brief retrospective overview
CHERYL FOXCROFT, GERT ROODT, AND FATIMA ABRAHAMS

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
› understand how assessment has evolved since ancient times
› appreciate the factors that have shaped psychological assessment in South Africa
› develop an argument about why assessment is still valued in modern society.

2.1 Introduction

At the start of our journey into the field of psychological assessment, it is important to gain a perspective of its origins. This is the focus of this chapter. Without some idea of the historical roots of the discipline of psychological assessment, the great progress made by modern assessment measures cannot be fully appreciated.
In this chapter you will also be introduced to some of the key concepts that we will be elaborating on in the Foundation Zone of this book. As you journey through the past with us, you should be on the lookout for the following:
› how difficult it was in ancient times to find an objectively verifiable way of measuring human attributes (ask yourself the reasons for this)
› how the most obvious things in the world of ancient philosophers and scientists (such as the human hand, head, and body, as well as animals) were used in an attempt to describe personal attributes
› the stepping stones that some of the ancient 'measures' provided for the development of modern psychological assessment
› the factors both within and outside of the discipline of psychology that have shaped the development of modern psychological assessment
› the factors that shaped the development and use of psychological assessment in South Africa.

2.2 A brief overview of the early origins of psychological assessment

The use of assessment measures can be traced back to ancient times. One of the first recordings of the use of an assessment procedure for selection purposes can be found in the Bible in Judges Chapter 7, verses 1 to 8. Gideon observed how his soldiers drank water from a river so he could select those who remained on the alert. Historians credit the Chinese with having a relatively sophisticated testing programme for civil servants in place more than 4 000 years ago (Kaplan and Saccuzzo, 2009). Oral examinations were administered every third year and the results were used for work evaluations and for promotion purposes. Over the years, many authors, philosophers, and scientists have explored various avenues in their attempts to assess human attributes. Let us look at a few of these.

2.2.1 Astrology

Most people are aware of the horoscopes that appear in daily newspapers and popular magazines. The positions of planets are used to formulate personal horoscopes that describe the personality characteristics of individuals and to predict what might happen in their lives. The origin of horoscopes can be traced back to ancient times, possibly as early as the fifth century BCE (McReynolds, 1986). Davey (1989) concludes that scientists, on the whole, have been scathing in their rejection of astrology as a key to understanding and describing personality characteristics. Do you agree? State your reasons.

2.2.2 Physiognomy

McReynolds (1986) credits Pythagoras with being perhaps the earliest practitioner of physiognomy, in the sixth century BCE. Later on, Aristotle also came out in support of physiognomy, which attempted to judge a person's character from the external features of the body, and especially the face, in relation to the similarity that these features had to animals. Physiognomy was based on the assumption that people who shared physical similarities with animals also shared some psychic properties with these animals. For example, a person who looked like a fox was sly, or somebody who looked like an owl was wise (Davey, 1989). What is your view on this?

CRITICAL THINKING CHALLENGE 2.1
Many application forms for employment positions or for furthering your studies require that a photograph be submitted. It is highly unlikely that selection and admission personnel use these photographs to judge personal attributes of the applicants, as physiognomists would have done. So why do you think that photographs are requested?
Try to interview someone in the Human Resources section of a company, or an admissions officer at an educational institution, to see what purpose, if any, photographs serve on application forms.

2.2.3 Humorology

In the fifth century BCE, Hippocrates, the father of medicine, developed the concept that there were four body humours or fluids (blood, yellow bile, black bile, and phlegm) (McReynolds, 1986). Galen, a physician in ancient Rome, took these ideas further by hypothesising four types of temperament (sanguine, choleric, melancholic, and phlegmatic), corresponding to the four humours (Aiken and Groth-Marnat, 2005). The problem with the humoral approach of classifying personality types into one of four categories was that it remained a hypothesis that was never objectively verified. Today the humoral theory mainly has historical significance. However, based on the views of Hippocrates and Galen, Eysenck and Eysenck (1958) embedded the four temperaments within the introversion/extroversion and the emotionally stable/emotionally unstable (neurotic) personality dimensions which they proposed. Of interest is the fact that Eysenck and Eysenck's (1958) two personality dimensions still form the basis for modern personality measures such as the Myers-Briggs Type Indicator and the 16 Personality Factor Questionnaire. You can read more about these measures in Chapter 12.

2.2.4 Phrenology

Franz Gall was the founder of phrenology, the 'science' of 'reading people's heads' (McReynolds, 1986). Phrenologists believed that the brain consisted of a number of organs that corresponded with various personality characteristics (e.g. self-esteem, cautiousness, firmness) and cognitive faculties (e.g. language, memory, calculation). By feeling the topography of a person's skull, phrenologists argued that it was possible to locate 'bumps' over specific brain areas believed to be associated with certain personality attributes (Aiken and Groth-Marnat, 2005). The fundamental assumptions underlying phrenology were later demonstrated to be invalid in research studies; consequently, no one really places any value on phrenology today.

2.2.5 Chirology – Palmistry

Bayne asserted that palm creases (unlike fingerprints) can change, and he found that certain changes appeared to be related to changes in personality. He also believed that all hand characteristics should be taken into consideration before any valid assessments could be made. However, to this day, no scientific evidence has been found that, for example, a firm handshake is a sign of honesty, or that long fingers suggest an artistic temperament (Davey, 1989).

2.2.6 Graphology

Graphology can be defined as the systematic study of handwriting. Handwriting provides graphologists with cues, called 'crystallised gestures', that can be analysed in detail. As handwriting is a type of stylistic behaviour, there is some logic to the argument that it could be seen to be an expression of personality characteristics. Graphologists hypothesise that people who keep their handwriting small are likely to be introverted, modest, and humble, and shun publicity. Large handwriting, on the other hand, shows a desire to 'think big' which, if supported by intelligence and drive, provides the ingredients for success. Upright writing is said to indicate self-reliance, poise, calm and self-composure, reserve, and a neutral attitude (Davey, 1989). Davey (1989) concluded that efforts of graphologists to establish the validity of such claims have yielded no or very few positive results.
Although there are almost no studies in which it has been found that handwriting is a valid predictor of job performance, graphology is widely used in personnel selection to this day (Simner and Goffin, 2003). It is especially used in France, but also in other countries such as Belgium, Germany, Italy, Israel, Great Britain, and the United States (US). This has prompted Murphy and Davidshofer (2005) to ask why graphology remains popular. What reasons do you think they unearthed in attempting to answer this question?

Murphy and Davidshofer (2005) concluded that there were three main reasons that fuelled the popularity of handwriting analysis in personnel selection:
› It has high face validity, meaning that to the ordinary person in the street it seems reasonable that handwriting could provide indicators of personality characteristics, just as mannerisms and facial expressions do.
› Graphologists tend to make holistic descriptions of candidates such as 'honest', 'sincere', and 'shows insight', which, because of their vagueness, are difficult to prove or disprove.
› Some of the predictions of graphologists are valid. However, Murphy and Davidshofer (2005) cite research studies which reveal that the validity of the inferences drawn by the graphologists was related more to what they gleaned from the content of an applicant's biographical essay than to the analysis of the handwriting!

Despite having found reasons why graphology continues to be used in personnel selection, Murphy and Davidshofer (2005) concluded that, all things considered, there is not sufficient evidence to support the use of graphology in employment testing and selection. Simner and Goffin (2003) concur with this and argue that the criterion-related validity of graphology is lower and more variable than that of more widely known and less expensive measures. For example, whereas the criterion-related validity of graphology varies between .09 and .16 (Simner and Goffin, 2003), the criterion-related validity of general mental testing and structured interviews in job selection has been found to be .51, and when used in combination, the validity coefficient increases to .63 (Schmidt and Hunter, 1998). Simner and Goffin (2003) thus caution that the continued use of graphology for personnel selection could prove to be costly and harmful to organisations.
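The coefficients quoted above are criterion-related validity coefficients, which are simply correlations between scores on a predictor and scores on a criterion such as rated job performance. The sketch below shows the calculation on invented data; the values and variable names are hypothetical and are not taken from the studies cited.

```python
import numpy as np

# Hypothetical selection data for eight employees: a predictor score
# (e.g. a cognitive measure) and a criterion (supervisor performance ratings).
predictor = np.array([42, 55, 61, 48, 70, 39, 66, 58])
performance = np.array([3.4, 3.1, 4.2, 3.0, 4.4, 3.3, 3.7, 4.0])

# The criterion-related validity coefficient is the Pearson correlation
# between the predictor and the criterion.
validity = np.corrcoef(predictor, performance)[0, 1]
print(f"Criterion-related validity r = {validity:.2f}")
```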
2.2.7 Summary

All the avenues explored by the early philosophers, writers, and scientists failed to provide verifiable ways of measuring human attributes. The common thread running through all these attempts (but probably not in the case of graphology) is the lack of proper scientific method and, ultimately, rigorous scientific measurement.

2.3 The development of modern psychological assessment: An international perspective

2.3.1 Early developments

Psychology has only started to prosper and grow as a science since the development of the scientific method. Underlying the scientific method is measurement. Guilford stated as long ago as 1936 that psychologists have adopted the motto of Thorndike that 'whatever exists at all, exists in some amount' and that they have also adopted the corollary that 'whatever exists in some amount, can be measured'. It was perhaps the development of objective measurement that made the greatest contribution to the development of Psychology as a science.

During the Italian Renaissance, Huarte's book was translated into English as The Tryal of Wits (1698). This book was a milestone in the history of assessment, because for the first time someone proposed a discipline of assessment, gave it a task to do, and offered some suggestions on how it might proceed. Huarte pointed out that:
› people differ from one another with regard to certain talents
› different vocations require different sets of talents
› a system should be developed to determine the specific patterns of abilities of different persons so that they can be guided into appropriate education programmes and occupations.
This system would involve the appointment of a number of examiners (triers) who would carry out certain procedures (tryals) in order to determine a person's capacity (McReynolds, 1986).

A further milestone in the development of modern psychological assessment came from the work of Thomasius, a professor of philosophy in Germany. According to McReynolds (1986), Thomasius made two main contributions to the emerging field of assessment. He was the first person to develop behavioural rating scales, and, furthermore, the ratings in his scales were primarily dependent on direct observations of the subject's behaviour. Another milestone was the coining of the term psychometrics by Wolff. This term was used throughout the eighteenth and nineteenth centuries, but was mainly applied to psychophysical measurements (McReynolds, 1986). In the twentieth century, with the shift towards the measurement of individual differences, the term was applied to a wider variety of measuring instruments, such as cognitive (mental ability) and personality-related measures.

After the foundation had been laid by experimental psychologists such as Wundt, the latter half of the nineteenth century saw some promising developments in the field of assessment linked to the work of Francis Galton, James McKeen Cattell, and Alfred Binet. One of experimental psychology's major contributions to the field of psychological assessment was the notion that assessment should be viewed in the same light as an experiment, as it required the same rigorous control. As you will discover, one of the hallmarks of modern psychological assessment is that assessment measures are administered under highly standardised conditions.

2.3.2 The early twentieth century

The twentieth century witnessed genuine progress in psychological assessment. The progress has mainly been attributed to advances in:
› theories of human behaviour that could guide the development of assessment measures
› statistical methods that aided the analysis of data obtained from measures, for example to determine their relationship to job performance and achievement, as well as to uncover the underlying dimensions being tapped by a measure
› the application of psychology in clinical, educational, military, and industrial settings.

Other than these advances, there was another important impetus that fuelled the development of modern psychological assessment measures in the twentieth century. Do you have any idea what this was? During the nineteenth century, and at the turn of the twentieth century in particular, a need arose to treat mentally disturbed and disabled people in a more humanitarian way. To achieve this, the mental disorders and deficiencies of patients had to be properly assessed and classified. Uniform procedures needed to be found to differentiate people who were mentally insane or who suffered from emotional disorders from those who were mentally disabled or suffered from an intellectual deficit.
A need therefore arose for the development of psychological assessment measures. According to Aiken and Groth-Marnat (2005), an important breakthrough in the development of modern psychological assessment measures came at the start of the twentieth century. In 1904, the French Minister of Public Instruction appointed a commission to find ways to identify mentally disabled individuals so that they could be provided with appropriate educational opportunities. One member of the French commission was Binet. Together with Simon, a French physician, Binet developed the first measure that provided a fairly practical and reliable way of measuring intelligence. The 1905 Binet-Simon Scale became the benchmark for future psychological tests. The measure was given under standardised conditions (i.e. everyone was given the same test instructions and format). Furthermore, norms were developed, albeit using a small and unrepresentative sample. More important than the adequacy of the normative sample, though, was Binet and Simon's notion that the availability of comparative scores could aid interpretation of test performance.

It is interesting to note that one of the earliest records of the misuse of intelligence testing involved the Binet-Simon Scale (Gregory, 2010). An influential American psychologist, Henry Goddard, was concerned about what he believed to be the high rate of mental retardation among immigrants entering the US. Consequently, Goddard's English translation of the Binet-Simon Scale was administered to immigrants through a translator, just after they arrived in the US. 'Thus, a test devised in French, then translated to English was, in turn, retranslated back to Yiddish, Hungarian, Italian, or Russian; administered to bewildered laborers who had just endured an Atlantic crossing; and interpreted according to the original French norms' (Gregory, 2000, p. 17). It is thus not surprising that Goddard found that the average intelligence of immigrants was low!

The Binet-Simon Scale relied heavily on the verbal skills of the test-taker and, in its early years, was available in French and English only. Consequently, its appropriateness for use with non-French or non-English test-takers, illiterates, and speech- and hearing-impaired test-takers was questioned (Gregory, 2010). This sparked the development of a number of non-verbal measures (e.g. the Seguin Form Board Test, Knox's Digit Symbol Substitution Test, the Kohs Block Design Test, and the Porteus Maze Test).

World War I further fuelled the need for psychological assessment measures. Why? Large numbers of military recruits needed to be assessed, but at that stage only individually administered tests, such as the Binet-Simon Scale, were available. So World War I highlighted the need for large-scale group testing. Furthermore, the scope of testing broadened at this time to include tests of achievement, aptitude, interest, and personality. Following World War I, with the emergence of group tests that largely used a multiple-choice format, there was widespread optimism regarding the usefulness of psychological tests (Kaplan and Saccuzzo, 2009). Samelson (1979, p. 154) points out that Cattell remarked that during the war period 'the army testing put psychology on the map of the US'. Given that over a million people were tested on the Army Alpha and Army Beta tests in the US, Cattell's observation has merit.
Furthermore, the testing of pilots in Italy and France and the testing of truck drivers for the German army, for example, suggest that the world wars put psychology on the map not only in the US but elsewhere in the world as well.

2.3.3 Measurement challenges

Although the period between the two World Wars was a boom period for the development of psychological measures, critics started pointing out the weaknesses and limitations of existing measures. Although this put test developers on the defensive, and dampened the enthusiasm of assessment practitioners, the knowledge gained from this critical look at testing inspired test developers to reach new heights. To illustrate this point, let us consider two examples, one from the field of intellectual assessment and the other from the field of personality assessment.

The criticism of intelligence scales up to this point was that they were too dependent on language and verbal skills, which reduced their appropriateness for many individuals (e.g. illiterates). To address this weakness, Wechsler included performance tests that did not require verbal responses when he published the first version of the Wechsler Intelligence Scales in 1939. Furthermore, whereas previous intelligence scales only yielded one score (namely, the intelligence quotient), the Wechsler Intelligence Scales yielded a variety of summative scores from which a more detailed analysis of an individual's pattern of performance could be made. These innovations revolutionised intelligence assessment.

The use of structured personality measures was severely criticised during the 1930s as many findings of personality tests could not be substantiated in scientific studies. However, the development of the Minnesota Multiphasic Personality Inventory (MMPI) by Hathaway and McKinley in 1943 began a new era for structured, objective personality measures. The MMPI placed an emphasis on using empirical data to determine the meaning of test results. According to Kaplan and Saccuzzo (2009), the MMPI and its revision, the MMPI-2, are the most widely used and referenced personality tests to this day.

World War II reaffirmed the value of psychological assessment. The 1940s witnessed the emergence of new test development technologies, such as the use of factor analysis to construct tests such as the 16 Personality Factor Questionnaire. During this period there was also much growth in the application of psychology and psychological testing. Psychological testing came to be seen as one of the major functions of psychologists working in applied settings. In 1954 the American Psychological Association (APA) pronounced that psychological testing was exclusively the domain of the clinical psychologist. However, the APA unfortunately also pronounced that psychologists were permitted to conduct psychotherapy only in collaboration with medical practitioners. As you can imagine, many clinical psychologists became disillusioned by the fact that they could not practise psychotherapy independently, and, although they had an important testing role to fulfil, they began to feel that they were merely technicians who were playing a subservient role to medical practitioners. Consequently, when they looked around for something to blame for their poor position, the most obvious scapegoat was psychological testing (Lewandowski and Saccuzzo, 1976). At the same time, given the intrusive nature of tests and the potential to abuse testing, widespread mistrust and suspicion of tests and testing came to the fore.
So, with both psychologists and the public becoming rapidly disillusioned with tests, many psychologists refused to use any tests, and countries such as the US, Sweden, and Denmark banned the use of tests for selection purposes in industry. It is not surprising, then, that according to Kaplan and Saccuzzo (2009) the status of psychological assessment declined sharply from the late 1950s, and this decline persisted until the 1970s.

2.3.4 The influence of multiculturalism

In the latter part of the twentieth century and during the first two decades of the twenty-first century, multiculturalism has become the norm in many countries. As a result, attempts were made to develop tests that were 'culture-free'. An example of such a measure is the Culture-free Intelligence Test (Anastasi and Urbina, 1997). However, it soon became clear that it was not possible to develop a test free of any cultural influence. Consequently, test developers focused more on 'culture-reduced' or 'culture-common' tests in which the aim was to remove as much cultural bias as possible from the test by including only behaviour that was common across cultures. For example, a number of non-verbal intelligence tests were developed (e.g. the Test of Non-verbal Intelligence, Raven's Progressive Matrices) where the focus was on novel problem-solving tasks and in which language use, which is often a stumbling block in cross-cultural tests, was minimised.

Furthermore, given that most of the available measures have been developed in the US or the United Kingdom (UK), they tend to be more appropriate for westernised English-speaking people. In response to the rapid globalisation of the world's population and the need for measures to be more culturally appropriate and available in a language in which the test-taker is proficient, the focus of psychological testing in the 1980s and 1990s shifted to cross-cultural test adaptation. Under the leadership of Ron Hambleton from the US, the International Test Commission (ITC) released their Guidelines for Adapting Educational and Psychological Tests (Hambleton, 1994, 2001). These guidelines have recently been revised (International Test Commission, 2010a) and have become the benchmark for cross-cultural test translation and adaptation around the world. They have also assisted in advocating against assessment practices where test-takers are tested in languages in which they are not proficient, sometimes using a translator who translates the test 'on the run'. In addition, many methodologies and statistical techniques (e.g. Structural Equation Modeling) have been developed to establish whether different language versions of a test are equivalent (Hambleton, Merenda, and Spielberger, 2005).

Sparked by research stemming from large-scale international comparative tests such as the Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS), the second decade of the twenty-first century has seen renewed interest in issues of bias and fairness when testing in linguistically diverse contexts. Consequently, the ITC has decided to develop guidelines for testing language minorities. You can read more about how and why measures are adapted for use in different countries and cultures in Chapter 6 and about language issues in assessment in Chapters 7 and 9.

A new trend that is emerging in the twenty-first century is to approach the development of tests that are used widely internationally (e.g. the Wechsler Intelligence Scales) from a multicultural perspective.
For example, when it comes to the Wechsler Intelligence Scales for Children (WISC), the norm through the years has been to first develop and standardise the measure for the US and thereafter to adapt it for use outside the US. However, for the development of the Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV), experts from various countries are providing input on the constructs to be tapped as well as the content of the items to minimise potential cultural bias during the initial redesign phase (Weiss, 2003). In the process, the development of the WISC-IV is setting a new benchmark for the development of internationally applicable tests.

A further recent trend in multicultural and multilingual test development is that of simultaneous multilingual test development (Solano-Flores, Turnbull and Nelson-Barber, 2002; Tanzer, 2005). This differs from the process outlined for the development of the WISC-IV, where people from different cultural and language groups provide input on the construct(s) to be tapped but the items are still developed in English before they are translated into other languages. Instead, in simultaneous multilingual test development, once the test specifications have been developed, items are written by a multilingual and multicultural panel or committee in which each member has a background in psychology (general and cross-cultural in particular), measurement, and linguistics, as well as in the specific construct that the test will measure (e.g. personality, mathematics). Chapter 7 will provide more information on this approach.

Non-Western countries are also rising to the challenge of not only adapting westernised measures for their contexts (e.g. Grazina Gintiliene and Sigita Girdzijauskiene, 2008, report that, among others, the WISC-III and the Raven's Coloured Matrices have been adapted for use in Lithuania) but also developing their own indigenous measures, which are more suited to their cultural contexts. For example, Cheung and her colleagues have developed the Chinese Personality Assessment Inventory (CPAI), which was revised in 2000 and is now known as the CPAI-2 (Cheung, Leung, Fan, Song, Zhang, and Zhang, 1996). This measure includes both indigenous (culturally relevant) and universal personality dimensions. Indigenous personality constructs were derived from classical literature, everyday descriptions of people, surveys, and previous psychological research. Thereafter, items and scales were developed according to the highest acceptable psychometric standards. You can get information on the CPAI and its development by consulting Cheung and Cheung (2003). The way in which the CPAI was developed is widely regarded as the benchmark to attain in the development of culturally relevant personality measures. How this approach is being successfully used in the South African context will be covered in Chapter 7.

Multiculturalism has not only influenced how tests are developed and adapted. With rapid globalisation leading to increasingly multicultural societies, the choice of which norm group to use when comparing an individual's performance has also become an issue. As will be outlined in Chapter 3, norms provide a basis for comparing an individual's performance on a measure with the performance of a well-defined reference group to aid in the interpretation of test performance.
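As a rough illustration of norm-based interpretation (treated fully in Chapter 3), the sketch below converts a raw score into a z-score and a percentile relative to a norm group, assuming roughly normally distributed scores. The norm-group mean and standard deviation used here are invented for illustration and do not describe any real measure.

```python
from statistics import NormalDist

def norm_referenced(raw_score: float, norm_mean: float, norm_sd: float) -> tuple[float, float]:
    """Express a raw score relative to a norm group as a z-score and a percentile,
    assuming scores in the norm group are approximately normally distributed."""
    z = (raw_score - norm_mean) / norm_sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile

# Hypothetical norm group with mean 50 and SD 10; the test-taker's raw score is 63.
z, pct = norm_referenced(raw_score=63, norm_mean=50, norm_sd=10)
print(f"z = {z:.2f}, percentile = {pct:.0f}")  # z = 1.30, about the 90th percentile
```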
Norm or reference groups can be constituted in terms of various characteristics of people (e.g. age, gender, educational level, job level, language, clinical diagnosis). The key criterion when choosing an appropriate norm group against which to compare an individual's performance is linked to the purpose of the comparison. For example, if the intention is to compare the performance with others in the same cultural group of a similar age, culturally and age-appropriate norms must be used. However, what norms does a multinational organisation use to make comparisons of its workforce across cultural and national boundaries? Bartram (2008a) argues that using locally developed national norms is not appropriate in this instance. Instead, he argues that multinational norms should be developed and used, provided that the mix of country samples is reasonable and that the samples have similar demographics.

2.3.5 Standards, training, computerised testing, and test users' roles

In an attempt to address issues of fairness and bias in test use, the need arose to develop standards for the professional practice of testing and assessment. Under the leadership of Bartram from the UK, the ITC has developed a set of International Guidelines for Test-use (Version 2000). Many countries, including South Africa, have adopted these test-user standards, which should ensure that, wherever testing and assessment are undertaken in the world, similar practice standards are evident. You can read more about this in Chapter 8.

During the 1990s, competency-based training of assessment practitioners (test users) fell under the spotlight. The British Psychological Society (BPS) took the lead internationally in developing competency standards for different levels of test users in occupational testing in the UK. Based on these competencies, competency-based training programmes have been developed and all test users have to be assessed by BPS-appointed assessors in order to ensure that a uniform standard is maintained. (You can obtain more information from the website of the Psychological Testing Centre of the BPS at www.psychtesting.org.uk). Building on the work done in the UK, Sweden, Norway, and the Netherlands, the Standing Committee on Tests and Testing (which has subsequently been renamed the Board of Assessment) of the European Federation of Psychologists' Associations (EFPA) developed standards of competence related to test use and a qualification model for Europe as a whole. A Test-user Accreditation Committee has been established to facilitate the recognition of qualifications that meet, or can be developed to meet, such standards (visit www.efpa.be for more information). Given that many countries have assessment competency models, a project is underway to see whether these national models in professional psychology can be integrated into a competence model that can be recognised internationally (Nielsen, 2012).

With assessment being widely used in recruitment, selection, and training in work and organisational settings, a standard was 'developed by ISO (the International Organisation for Standardisation) to ensure that assessment procedures and methods are used properly and ethically' (Bartram, 2008b, p. 9) and that those performing the assessment are suitably qualified.
Published in 2010, the ISO Standard for Assessment in Organisational Settings (PC 230) now provides a benchmark for providers of assessment services to demonstrate that they have the necessary skills and expertise to provide assessments that are 'fit for purpose'. In addition, the standard can be used by individual organisations to certify internal quality assurance processes, by those contracting in assessment services to ascertain the minimum requirements that need to be met by the organisation providing the services, or by professional bodies for credentialing purposes.

In addition to the focus on the competence of assessment practitioners, there has been increased emphasis on test quality in the last two decades. Evers (2012) argues that such an emphasis is not surprising given the 'importance of psychological tests for the work of psychologists, the impact tests may have on their clients, and the emphasis on quality issues in current society' (p. 137). Many countries (e.g. the UK, the Netherlands, Russia, Brazil, and South Africa) have test review systems – a mechanism used to set standards for and evaluate test quality. Chapter 8 will provide more detail on test review systems.

Advances in information technology and systems in the latter half of the twentieth century impacted significantly on psychological testing. Computerised adaptive testing became a reality, as did the use of the Internet for, for example, testing people in one country for a job in another country. Computerised testing and testing via the Internet have revolutionised all aspects of assessment and have produced their own set of ethical and legal issues. These issues require the urgent attention of test developers and users during the early part of the twenty-first century. You can read more about computerised testing and its history in Chapter 14 and about future predictions regarding assessment and test development in Chapter 18.

Linked to, but not restricted to, computer-based and Internet-delivered testing, a trend has emerged around who may use psychological measures and what their qualifications should be. Among the more important issues is confusion regarding the roles and responsibilities of people involved in the assessment process and what knowledge, qualifications, and expertise they require, which has been fuelled by the distinction that is drawn between competency-based and psychological assessment (Bartram, 2003). The block below presents a fairly simplistic description of these two types of assessment to illustrate the distinction between them.

Psychological assessment requires expertise in psychology and psychological theories to ensure that measures of cognitive, aptitude, and personality functioning are used in an ethical and fair manner, right from the choice of which tests to use through to interpretation and feedback. Furthermore, the outputs of psychological assessment are in the form of psychological traits/constructs (such as personality and ability). The expertise to perform psychological assessment is clearly embodied in an appropriately registered psychology professional.

Competency-based assessment focuses on the skills, behaviours, knowledge, and attitudes/values required for effective performance in the workplace or in educational settings (e.g. communication, problem-solving, task orientation). The assessment measures used are as directly linked as possible to the required competencies.
Indirect methods such as simulations and Assessment Centres are used to conduct competency-based assessment. As the outputs of such assessment are directly linked to the language of the workplace or educational settings, the test user does not need expertise in psychological theories to be able to apply the results of competency-based assessments. What is required, however, is that competency-based assessment be performed by people with expertise in this area of assessment (e.g. skilled in job analysis and competency-based interviews).

Bartram (2003) argues that there is reasonable international consensus regarding the distinction between psychological and competency-based assessment. Consequently, some countries (e.g. Finland) have used this to legally distinguish between assessments that only psychologists can do (psychological assessment) and those that non-psychologists can do (competency-based assessment). The advent of computer-based and Internet-delivered assessment has further meant that the test user in competency-based assessment is not involved in test selection, administration, scoring, and interpretation. Instead, the test user receives the output of the assessment in a language congruent with the competencies required in the workplace. In view of the changing nature of the definition of a test user, it is important that the roles, responsibilities, and required training of all those involved in the assessment process are clarified to ensure that the test-taker is assessed fairly and appropriately.

To draw this brief international review of the history of psychological assessment to a close, one could ask: what is the status of psychological testing and assessment a decade into the twenty-first century? Roe (2008) provided some insights at the closing of the sixth conference of the ITC that could provide some answers to this question. Roe (2008) asserts that the field of psychometrics and psychological testing is doing well as it continues to innovatively respond to issues (e.g. globalisation and increased use of the Internet); develop new test development methodologies; advance test theory, data analysis techniques, and measurement models (Alonso-Arbiol and van de Vijver, 2010); and be sensitive to the rights and reactions of test-takers. Among the challenges that Roe (2008) identifies are the fact that test scores are not easily understood by the public, that psychology professionals struggle to communicate with policymakers, and that test security and cheating on tests are growing issues. He also argues that, for testing to make a better contribution to society, psychology professionals need to conceptualise testing and assessment (from test selection to reporting) as part of the services that they offer to clients, and that the added value of tests should not be confused with the added value of test-based services.

2.4 The development of modern psychological assessment: A South African perspective

2.4.1 The early years

How did psychological assessment measures come to be used in South Africa? As South Africa was a British colony, the introduction of psychological testing here probably stems from our colonial heritage (Claassen, 1997). The historical development of modern psychological measures in South Africa followed a similar pattern to that in the US and Europe (see Section 2.3). However, what is different and important to note is the context in which this development took place.
Psychological assessment in South Africa developed in an environment characterised by the unequal distribution of resources based on racial categories (black, coloured, Indian, and white). Almost inevitably, the development of psychological assessment reflected the racially segregated society in which it evolved. So it is not surprising that Claassen (1997) asserts that ‘Testing in South Africa cannot be divorced from the country’s political, economic and social history’ (p. 297). Indeed, any account of the history of psychological assessment in South Africa needs to point out the substantial impact that apartheid policies had on test development and use (Nzimande, 1995).
Even before the Nationalist Party came into power in 1948, the earliest psychological measures were standardised only for whites and were used by the Education Department to place white pupils in special education. The early measures were either adaptations of overseas measures such as the Stanford-Binet, the South African revision of which became known as the Fick Scale, or they were developed specifically for use here, such as the South African Group Test (Wilcocks, 1931). Not only were the early measures standardised for whites only, but, driven by political ideologies, measures of intellectual ability were used in research studies to draw distinctions between races in an attempt to show the superiority of one group over another. For example, during the 1930s and 1940s, when the government was grappling with the issue of establishing ‘Bantu education’, Fick (1929) administered individual measures of motor and reasoning abilities, which had been standardised for white children only, to a large sample of black, coloured, Indian, and white school children. He found that the mean score of black children was inferior to that of Indian and coloured children, with whites’ mean scores superior to those of all other groups. He remarked at the time that factors such as inferior schools and teaching methods, along with black children’s unfamiliarity with the nature of the test tasks, could have disadvantaged their performance on the measures. However, when he extended his research in 1939, he attributed the inferior performance of black children in comparison to that of white children to innate differences, or, in his words, ‘difference in original ability’ (p. 53) between blacks and whites.
Fick’s conclusions were strongly challenged and disputed. For example, Fick’s work was severely criticised by Biesheuvel in his book African Intelligence (1943), in which an entire chapter was devoted to this issue. Biesheuvel queried the cultural appropriateness of Western-type intelligence tests for blacks and highlighted the influence of different cultural, environmental, and temperamental factors and the effects of malnutrition on intelligence. This led him to conclude that ‘under present circumstances, and by means of the usual techniques, the difference between the intellectual capacity of Africans and Europeans cannot be scientifically determined’ (Biesheuvel, 1943, p. 91).
In the early development and use of psychological measures in South Africa, some important trends can be identified which were set to continue into the next and subsequent eras of psychological assessment in South Africa. Any idea what these were?
The trends were:
› the focus on standardising measures for whites only
› the misuse of measures by administering measures standardised for one group to another group without investigating whether or not the measures might be biased and inappropriate for the other group
› the misuse of test results to reach conclusions about differences between groups without considering the impact of, inter alia, cultural, socio-economic, environmental, and educational factors on test performance.
2.4.2 The early use of assessment measures in industry
The use of psychological measures in industry gained momentum after World War II and after 1948 when the Nationalist Government came into power. As was the case internationally, psychological measures were developed in response to a societal need. According to Claassen (1997), after World War II there was an urgent need to identify the occupational suitability (especially for work on the mines) of large numbers of blacks who had received very little formal education. Among the better measures constructed was the General Adaptability Battery (GAB) (Biesheuvel, 1949, 1952), which included a practice session during which test-takers were familiarised with the concepts required to solve the test problems and were asked to complete some practice examples. The GAB was predominantly used with a preliterate black population speaking a number of dialects and languages. Because of job reservation under the apartheid regime and the better formal education opportunities available to whites, as education was segregated along racial lines, whites competed for different categories of work from blacks. The Otis Mental Ability Test, which was developed in the US and only had American norms, was often used when assessing whites in industry (Claassen, 1997).
Among the important trends in this era that would continue into subsequent eras were:
› the development of measures in response to a need that existed within a certain political dispensation
› the notion that people who are unfamiliar with the concepts in a measure should be familiarised with them before they are assessed
› the use of overseas measures and their norms without investigating whether they should be adapted/revised for use in South Africa.
2.4.3 The development of psychological assessment from the 1960s onwards
According to Claassen (1997), a large number of psychological measures were developed in the period between 1960 and 1984. The National Institute for Personnel Research (NIPR) concentrated on developing measures for industry, while the Institute for Psychological and Edumetric Research (IPER) developed measures for education and clinical practice. Both of these institutions were later incorporated into the Human Sciences Research Council (HSRC). In the racially segregated South African society of the apartheid era, it was almost inevitable that psychological measures would be developed along cultural/racial lines as there ‘was little specific need for common tests because the various groups did not compete with each other’ (Owen, 1991, p. 112). Consequently, prior to the early 1980s, Western models were used to develop similar but separate measures for the various racial and language groups (Owen, 1991). Furthermore, although a reasonable number of measures were developed for whites, considerably fewer measures were developed for blacks, coloureds, and Indians.
During the 1980s and early 1990s, once the sociopolitical situation began to change and discriminatory laws were repealed, starting with the relaxation of ‘petty apartheid’, applicants from different racial groups began competing for the same jobs, and the use of separate measures in such instances came under close scrutiny. A number of questions were raised, such as: How can you compare scores if different measures are used? How do you appoint people if different measures are used?
In an attempt to address this problem, two approaches were followed. In the first instance, measures were developed for more than one racial group, and/or norms were constructed for more than one racial group so that test performance could be interpreted in relation to an appropriate norm group. Examples of such measures are the General Scholastic Aptitude Test (GSAT), the Ability, Processing of Information, and Learning Battery (APIL-B), and the Paper and Pencil Games (PPG), which was the first measure to be available in all eleven official languages in South Africa. In the second instance, psychological measures developed and standardised on white South Africans only, as well as those imported from overseas, were used to assess other groups as well. In the absence of appropriate norms, the potentially bad habit arose of interpreting such test results ‘with caution’. Why was this a bad habit? It eased assessment practitioners’ consciences and lulled them into a sense that they were doing the best they could with the few tools at their disposal. You can read more about this issue in Chapter 8.
The major problem with this approach was that, initially, little research was done to determine the suitability of these measures for a multicultural South African environment. Research studies that investigated the performance of different groups on these measures were needed to determine whether or not the measures were biased. Despite the widespread use of psychological measures in South Africa, the first thorough study of bias took place only in 1986, when Owen (1986) investigated test and item bias using the Senior Aptitude Test, Mechanical Insight Test, and the Scholastic Proficiency Test on black, white, coloured, and Indian subjects. He found major differences between the test scores of blacks and whites and concluded that understanding and reducing the differential performance between black and white South Africans would be a major challenge. Research by Abrahams (1996), Owen (1989a, 1989b), Retief (1992), Taylor and Boeyens (1991), and Taylor and Radford (1986) showed that bias existed in other South African ability and personality measures as well. In addition to empirical investigations into test bias, Taylor (1987) published a report on the responsibilities of assessment practitioners and publishers with regard to bias and fairness of measures. Furthermore, Owen (1992a, 1992b) pointed out that comparable test performance could only be achieved between different groups in South Africa if environmentally disadvantaged test-takers were provided with sufficient training in taking a particular measure before they actually took it.
Given the widespread use (and misuse) of potentially culturally biased measures, coupled with a growing perception that measures were a means by which the Nationalist Government could exclude black South Africans from occupational and educational opportunities, what do you think happened?
A negative perception regarding the usefulness of psychological measures developed and large sections of the South African population began to reject the use of psychological measures altogether (Claassen, 1997; Foxcroft, 1997b). Issues related to the usefulness of measures will be explored further in the last section of this chapter.
Remember that in the US testing came to be seen as one of the most important functions of psychologists, and of psychologists only. During the 1970s important legislation was tabled in South Africa that restricted the use of psychological assessment measures to psychologists only. The Health Professions Act (No. 56 of 1974) defines the use of psychological measures as constituting a psychological act, which may legally be performed only by psychologists, or by certain other groups of professionals under certain circumstances. The section of the Act dealing with assessment will be explored further in Chapter 8.
Among the important trends in this era were:
› the impact of the apartheid political dispensation on the development and fair use of measures
› the need to empirically investigate test bias
› growing scepticism regarding the value of psychological measures, especially for black South Africans.
2.4.4 Psychological assessment in the democratic South Africa
2.4.4.1 Assessment in education
Since 1994 and the election of South Africa’s first democratic government, the application, control, and development of assessment measures have become contested terrain. With growing resistance to assessment measures and the ruling African National Congress’s (ANC) express purpose of focusing on issues of equity to redress past imbalances, the use of tests in industry and education in particular has been placed under the spotlight. School readiness testing, as well as the routine administration of group tests in schools, was banned in many provinces as such testing was seen as being exclusionary and perpetuating the discriminatory policies of the past. Furthermore, the usefulness of test results and assessment practices in educational settings has been strongly queried in the Education White Paper 6, Special Needs Education: Building an Inclusive Education and Training System (Department of Education, 2001), for example. Psychometric test results should contribute to the identification of learning problems and educational programme planning, as well as informing the instruction of learners. Why is this not happening? Could it be that the measures used are not cross-culturally applicable or have not been adapted for our diverse population and thus do not provide valid and reliable results? Maybe it is because the measures used are not sufficiently aligned with the learning outcomes of Curriculum 21? Or could it be that psychological assessment reports are filled with jargon and recommendations which are not always translated into practical suggestions on what the educator can do in class to support and develop the learner?
Furthermore, within an inclusionary educational system, the role of the educational psychologist along with that of other professional support staff is rapidly changing. Multi-disciplinary professional district-based support teams need to be created and have been established in some provinces (e.g. the Western Cape) (Department of Education, 2005). Within these teams, the primary focus of the psychologists and other professionals is to provide indirect support (‘consultancy’) to learners through supporting educators and the school management (e.g.
to identify learner needs and the teaching and learning strategies that can respond to these needs, to conduct research to map out the needs of learners and educators, and to establish the efficacy of a programme). However, Pillay (2011) notes that psychologists often find it challenging to work in collaborative teams in educational contexts, which could be linked to the fact that they need in-service training to equip them in this regard. A further way in which psychologists can support educators in identifying barriers to learning in their learners is to train them in the use of educational and psychological screening measures (see Chapter 15 for a discussion on screening versus diagnostic assessment). The secondary focus of the professional support teams is to provide direct learning support to learners (e.g. diagnostic assessment to describe and understand a learner’s specific learning difficulties and to develop an intervention plan).
As is the case with the changing role of psychologists in assessment due to the advent of competency-based, computer-based and Internet-delivered testing, the policies on inclusive education and the strategies for its implementation in South Africa are changing the role that educational psychologists play in terms of assessment and intervention in the school system. This can be viewed as a positive challenge as educational psychologists have to develop new competencies (e.g. to train and mentor educators in screening assessment; to apply measurement and evaluation principles in a variety of school contexts). In addition, psychologists will have to ensure that their specialist assessment role is not eroded. However, rather than being too diagnostically focused, this specialist assessment also needs to be developmentally focused so that assessment results can, inter alia, be linked to development opportunities that the educator can provide (see Chapter 15).
2.4.4.2 The Employment Equity Act
To date, the strongest stance against the improper use of assessment measures has come from industry. Historically, individuals were not legally protected against any form of discrimination. However, with the adoption of the new Constitution and the Labour Relations Act in 1996, worker unions and individuals now have the support of legislation that specifically forbids any discriminatory practices in the workplace and includes protection for applicants, as they have all the rights of current employees in this regard. To ensure that discrimination is addressed within the testing arena, the Employment Equity Act (No. 55 of 1998, section 8) refers to psychological tests and assessment specifically and states:
Psychological testing and other similar forms of assessment of an employee are prohibited unless the test or assessment being used:
(a) has been scientifically shown to be valid and reliable
(b) can be applied fairly to all employees
(c) is not biased against any employee or group.
The Employment Equity Act has major implications for assessment practitioners in South Africa because many of the measures currently in use, whether imported from the US and Europe or developed locally, have not been investigated for bias and have not been cross-culturally validated here (as was discussed in Section 2.4.3).
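To make the Act’s requirements of reliability, fairness, and freedom from bias a little more concrete, the sketch below illustrates, in Python, two very rough first-pass checks of the kind a practitioner might run on item-level data before commissioning a proper bias study: an internal-consistency (Cronbach’s alpha) estimate per group and a standardised mean difference between two groups’ total scores. This is a minimal illustration only; the data, group labels, and variable names are hypothetical, and neither the Act nor the South African bias studies cited in this chapter prescribe these particular calculations. Full investigations rely on far more rigorous methods (e.g. differential item functioning and measurement invariance analyses).

import numpy as np

# Hypothetical illustration only: item-level scores for two groups on the same
# 20-item measure (rows = respondents, columns = items). Real data would come
# from a properly designed and sampled study.
rng = np.random.default_rng(0)
group_a_items = rng.integers(0, 5, size=(200, 20)).astype(float)
group_b_items = rng.integers(0, 5, size=(200, 20)).astype(float)

def cronbach_alpha(items: np.ndarray) -> float:
    # Internal-consistency reliability:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def standardised_mean_difference(a: np.ndarray, b: np.ndarray) -> float:
    # Cohen's d between two groups' total scores: a crude flag for group-level
    # score differences that would warrant a proper bias investigation.
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

print("alpha, group A:", round(cronbach_alpha(group_a_items), 2))
print("alpha, group B:", round(cronbach_alpha(group_b_items), 2))
print("standardised mean difference:",
      round(standardised_mean_difference(group_a_items.sum(axis=1),
                                         group_b_items.sum(axis=1)), 2))

In practice, such a screen could only flag whether further, properly designed bias and validity research is needed; it cannot on its own demonstrate that a measure meets the requirements of section 8.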
The impact of this Act on the conceptualisation and professional practice of assessment in South Africa in general is far-reaching, as assessment practitioners and test publishers are increasingly being called upon to demonstrate, or prove in court, that a particular assessment measure does not discriminate against certain groups of people. It is thus not surprising that there has been a notable increase in the number of test bias studies since the promulgation of the Act in 1998 (e.g. Abrahams and Mauer, 1999a, 1999b; Lopes, Roodt and Mauer, 2001; Meiring, Van de Vijver, Rothmann and Barrick, 2005; Meiring, Van de Vijver, Rothmann and Sackett, 2006; Schaap, 2003, 2011; Schaap and Basson, 2003; Taylor, 2000; Van Zyl and Visser, 1998; Van Zyl and Taylor, 2012; Visser and Viviers, 2010).
A further consequence of the Employment Equity Act is the emerging view that it would be useful if test publishers and distributors could certify a measure as being ‘Employment Equity Act Compliant’, as this would aid assessment practitioners when selecting measures (Lopes, Roodt and Mauer, 2001). While this sounds like a very practical suggestion, such certification could be misleading. For example, certifying a measure as compliant does not prevent its results from being used in an unfair way when selection decisions are made. Furthermore, given the variety of cultural and language groups in this country, bias investigations would have to be conducted for all the subgroups on whom the measure is to be used before it could be given the stamp of approval. Alternatively, it would have to be clearly indicated for which subgroups the measure has been found to be unbiased and that it is only for these groups that the measure complies with the Act.
The advent of the Employment Equity Act has also forced assessment practitioners to take stock of the available measures in terms of their quality, cross-cultural applicability, the appropriateness of their norms, and the availability of different language versions. To this end, the Human Sciences Research Council (HSRC) conducted a survey of the test use patterns and needs of practitioners in South Africa (Foxcroft, Paterson, le Roux, and Herbst, 2004). Among other things, it was found that most of the frequently used tests need to be adapted for our multicultural context or updated, and that appropriate norms and various language versions should be provided. The report on the survey, which can be viewed at www.hsrc.ac.za, provides an agenda for how to tackle the improvement of the quality of measures and assessment practices in South Africa, as well as a list of the most frequently used measures that should be earmarked for adaptation or revision first. You can read more about this agenda in Chapter 18.
Having an agenda is important, but having organisations and experts to develop and adapt tests is equally important. As was discussed in Section 2.4.3, the Human Sciences Research Council (HSRC), into which the NIPR and IPER were incorporated, developed a large number of tests during the 1960s through to the 1980s. For at least three decades the HSRC was almost the sole developer and distributor of tests in South Africa. However, at the start of the 1990s the HSRC was restructured, became unsure about the role that it should play in psychological and educational test development, and many staff with test development expertise left the organisation.
Consequently, since the South African adaptation of the Wechsler Adult Intelligence Scale-III (WAIS-III) in the mid-1990s, the HSRC has not developed or adapted any other tests, and its former tests that are still in circulation are now distributed by Mindmuzik Media. According to Foxcroft and Davies (2008), with the HSRC relinquishing its role as the major test developer and distributor in South Africa, other role players gradually emerged to fill the void in South African test development. Some international test development and distribution agencies such as SHL (www.shl.co.za) and Psytech (www.psytech.co.za) are represented in South Africa. Furthermore, local test agencies such as Jopie van Rooyen & Partners SA (www.jvrafrica.co.za) and Mindmuzik Media (www.mindmuzik.com) have established themselves in the marketplace. Each of these agencies has a research and development section, and much emphasis is being placed on adapting international measures for the South African context. However, there is still some way to go before all the tests listed on the websites of the agencies that operate in South Africa have been adapted for use here and have South African norms. The other encouraging trend is the greater involvement of universities (e.g. Unisa and the Nelson Mandela Metropolitan, North-West, Rhodes, Witwatersrand, Pretoria, Johannesburg, Stellenbosch, and KwaZulu-Natal universities) in researching and adapting tests, developing local norms, undertaking local psychometric studies, and even developing indigenous tests (e.g. Taylor and de Bruin, 2003). Furthermore, some organisations that conduct large-scale testing, such as the South African Breweries and the South African Police Service (SAPS), have undertaken numerous studies to provide psychometric information on the measures that they use, investigate bias, and adapt measures on the basis of their findings (e.g. Meiring, 2007).
South African assessment practitioners and test developers have thus not remained passive in the wake of legislation impacting on test use. Although the future use of psychological assessment, particularly in industry and education, still hangs in the balance at this stage, there are encouraging signs that progress is being made to research and adapt more measures appropriate for our context and to use them in fair ways to the benefit of individuals and organisations. The way in which psychologists and test developers continue to respond to this challenge will largely shape the future of psychological testing here. Consult Chapter 18 for future perspectives on this issue.
2.4.4.3 Professional practice guidelines
According to the Health Professions Act (No. 56 of 1974), the Professional Board for Psychology of the Health Professions Council of South Africa (HPCSA) is mandated to protect the public and to guide the profession of psychology. In recent years, the Professional Board has become increasingly concerned about the misuse of assessment measures in this country, while recognising the important role of psychological assessment in the professional practice of psychology as well as for research purposes. Whereas the Professional Board for Psychology had previously given the Test Commission of the Republic of South Africa (TCRSA) the authority to classify psychological tests and oversee the training and examination of certain categories of assessment practitioners, these powers were revoked by the Board in 1996.
The reason for this was that the TCRSA did not have any statutory power and, being a section 21 company that operated largely in Gauteng, its membership base was not representative of psychologists throughout the country. Instead, the Professional Board for Psychology formed the Psychometrics Committee, which, as a formal committee of the Board, has provided the Board with a more direct way of controlling and regulating psychological test use in South Africa. The role of the Psychometrics Committee of the Professional Board and some of the initiatives that it has launched will be discussed more fully in Chapter 8.
The further introduction of regulations and the policing of assessment practitioners will not stamp out test abuse. Rather, individual assessment practitioners need to make the fair and ethical use of tests a norm for themselves. Consequently, the Psychometrics Committee has actively participated in the development of internationally acceptable standards for test use in conjunction with the International Test Commission’s (ITC) test-use project (see Section 2.2 and Chapter 8) and has developed certain competency-based training guidelines (see www.hpcsa.co.za). Furthermore, the various categories of psychology professionals who use tests have to write a national examination under the auspices of the Professional Board before they are allowed to practise professionally. Such a national examination helps to ensure that professionals enter the field with at least the same minimum discernible competencies. In Chapter 8, the different categories of assessment practitioners in South Africa will be discussed, as will their scope of practice and the nature of their training.
To complicate matters, the introduction of computer-based and Internet-delivered testing has led to a reconceptualisation of the psychology professional’s role, especially in terms of test administration. There are those who argue that trained test administrators (non-psychologists) can oversee structured, computer-based testing and that the test classification system in South Africa should make allowance for this. Indeed, in the landmark court case which hinged on the definition of what test use means, the honourable Judge Bertelsmann concluded that the Professional Board needs to publish a list of tests that are restricted for use by psychologists as it ‘is unlikely that the primarily mechanical function of the recording of test results should be reserved for psychologists’ (Association of Test Publishers of South Africa and Another, Saville and Holdsworth South Africa v. The Chairperson of the Professional Board of Psychology, 2010, p. 20). Whether it is correct to conclude that test administration is purely mechanistic will be critically examined in Chapter 9.
There is also a growing recognition among South African psychologists that many of the measures used in industry and the educational sector do not fall under the label of ‘psychological tests’. Consequently, as the general public does not differentiate between psychological and other tests, there is a need to set general standards for testing and test use in this country. This, together with the employment equity and labour relations legislation in industry, has led to repeated calls to define uniform test-user standards across all types of tests and assessment settings and for a central body to enforce them (Foxcroft, Paterson, le Roux and Herbst, 2004). To date, such calls have fallen on deaf ears.
As you look back over this section, you should become aware that psychological assessment in South Africa has been and is currently being shaped by:
› legislation and the political dispensation of the day
› the need for appropriate measures to be developed that can be used in a fair and unbiased way for people from all cultural groups in South Africa
› the role that a new range of test development and distribution agencies and universities are playing to research, adapt and develop measures that are appropriate for our multicultural context
› the need for assessment practitioners to take personal responsibility for ethical test use
› training and professional practice guidelines provided by statutory (e.g. the Professional Board for Psychology) and other bodies (e.g. PsySSA, SIOPSA).
CRITICAL THINKING CHALLENGE 2.2
Spend some time finding the common threads or themes that run through the historical perspective of assessment from ancient to modern times, internationally and in the South African context. You might also want to read up on the development of psychological testing in Africa (see, for example, Foxcroft, C.D., 2011).
2.5 Can assessment measures and the process of assessment still fulfil a useful function in modern society?
One thing that you have probably realised from reading through the historical account of the origins of psychological testing and assessment is that its popularity has waxed and waned over the years. However, despite the attacks on testing and the criticism levelled at it, psychological testing has survived, and new measures and test development technologies continue to be developed each year. Why do you think this is so? In the South African context, do you think that, with all the negative criticism against it, psychological testing and assessment can still play a valuable role? Think about this for a while before you read some of the answers to these questions that Foxcroft (1997b), Foxcroft, Paterson, Le Roux, and Herbst (2004), and others have come up with for the South African context.
Ten academics involved in teaching psychological assessment at universities were asked whether there is a need for psychological tests in present-day South Africa and they all answered ‘Yes’ (Plug, 1996). One academic went so far as to suggest that ‘the need for tests in our multicultural country is greater than elsewhere because valid assessment is a necessary condition for equity and the efficient management of personal development’ (Plug, 1996, p. 3). In a more recent survey by Foxcroft, Paterson, Le Roux, and Herbst (2004), the assessment practitioners surveyed suggested that the use of tests was central to the work of psychologists and that psychological testing was generally being perceived in a more positive light at present. Among the reasons offered for these perceptions was the fact that tests are objective in nature and more useful than alternative methods such as interviews. Further reasons offered were that tests provide structure in sessions with clients and are useful in providing baseline information, which can be used to evaluate the impact of training, rehabilitation or psychotherapeutic interventions.
Nonetheless, despite psychological testing and assessment being perceived more positively, the practitioners pointed out that testing and assessment ‘only added value if tests are culturally appropriate and psychometrically sound, and are used in a fair and an ethical manner by well-trained assessment practitioners’ (Foxcroft, Paterson, Le Roux and Herbst, 2004, p. 135).
Psychological testing probably continues to survive and to be of value because assessment has become an integral part of modern society. Foxcroft (1997b) points out that in his realistic reaction to the anti-test lobby, Nell (1994) argued:
psychological assessment is so deeply rooted in the global education and personnel selection systems, and in the administration of civil and criminal justice, that South African parents, teachers, employers, work seekers, and lawyers will continue to demand detailed psychological assessments (p. 105).
Furthermore, despite its obvious flaws and weaknesses, psychological assessment continues to aid decision-making, provided that it is used in a fair and ethical manner by responsible practitioners (Foxcroft, Paterson, Le Roux and Herbst, 2004). Plug (1996) has responded in an interesting way to the criticisms levelled at testing. He contends that ‘the question is not whether testing is perfect (which obviously it is not), but rather how it compares to alternative techniques of assessment for selection, placement or guidance, and whether, when used in combination with other processes, it leads to a more reliable, valid, fair and cost-effective result’ (Plug, 1996, p. 5).
Is there such evidence? In the field of higher education, for example, Huysamen (1996b) and Koch, Foxcroft, and Watson (2001) have shown that biographical information, matriculation results, and psychological test results predict performance at university and show promise in assisting in the development of fair and unbiased admission procedures at higher education institutions. Furthermore, a number of studies have shown how the results from cognitive and psychosocial tests correlate with academic performance, predict success, and assist in the planning of academic support programmes (e.g. Petersen et al., 2009; Sommer & Dumont, 2011). In industry, 90 per cent of the human resource practitioners surveyed in a study conducted by England and Zietsman (1995) indicated that they used tests combined with interviews for job selection purposes, and about 50 per cent used tests for employee development. This finding was confirmed by Foxcroft, Paterson, Le Roux and Herbst (2004) when, as part of the HSRC’s test-use survey, it was found that 85 per cent of the industrial psychologists surveyed used psychological tests in work settings. In clinical practice, Shuttleworth-Jordan (1996) found that even if tests that had not been standardised for black children were used, a syndrome-based neuropsychological analysis model made it possible to make appropriate clinical judgements which ‘reflect a convincing level of conceptual validity’ (p. 99). Shuttleworth-Jordan (1996) argued that by focusing on common patterns of neuropsychological dysfunction rather than using a normative-based approach which relies solely on test scores, some of the problems related to the lack of appropriate normative information could be circumvented.
More recently, Shuttleworth-Edwards (2012) reported on descriptive comparisons of performance on the South African adapted WAIS-III and the Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV), which has not yet been adapted but for which she has developed normative guidelines. Based on a case study of a 20-year-old brain-injured Xhosa-speaking individual, she concluded: ‘Taking both clinical and statistical data into account, a neuropsychological analysis was elucidated … on the basis of which contextually coherent interpretation of a WAIS-IV brain-injury test protocol … is achieved’ (p. 399). Similarly, Odendaal, Brink and Theron (2011) reported on six case studies of black adolescents in which they used quantitative data obtained using the Rorschach Comprehensive System (RCS) (Exner, 2003) together with qualitative follow-up interview information, observations, and a culturally sensitive approach to RCS interpretation (CSRCS). Although cautioning that more research was needed, they concluded: ‘in the hands of the trained and experienced clinician … using the CSRCS potentially offers invaluable opportunities to explore young people’s resilience-promoting processes’ (p. 537). Huysamen (2002), however, cautions that when practitioners need to rely more on professional judgement than on objective, norm-based scores, they need to be aware that the conclusions and opinions ‘based on so-called intuitions have been shown to be less accurate than those based on the formulaic treatment of data’ (p. 31).
The examples cited above suggest that there is South African research evidence to support the value of psychological test information when it is used along with other pertinent information and clinical/professional judgement to make decisions. Furthermore, Foxcroft (1997b) asserts that in the process of grappling with whether assessment can serve a useful purpose in South Africa:
attention has shifted away from a unitary testing approach to multi-method assessment. There was a tendency in the past to erroneously equate testing and assessment. In the process, clinicians forgot that test results were only one source of relevant data that could be obtained. However, there now appears to be a growing awareness of the fact that test results gain in meaning and relevance when they are integrated with information obtained from other sources and when they are reflected against the total past and present context of the testee (Claassen, 1995; Foxcroft, 1997b, p. 231).
To conclud