Ability Testing
Silliman University
Margaret Helen U. Alvarez
Summary
This document discusses ability testing and psychological assessment, providing an overview of tests of intellectual ability, the nature of intelligence, and considerations in using the symbol "IQ".
ABILITY TESTING
PSYCHOLOGICAL ASSESSMENT
MARGARET HELEN U. ALVAREZ

TESTS OF INTELLECTUAL ABILITY
All psychological tests are designed to measure behavior. Hence the selection of proper tests and the interpretation of test results require knowledge about human behavior. Familiarity with relevant behavioral research is needed not only by the test constructor but also by the test user. What can psychological research contribute to the understanding of the behavior measured by tests of cognitive abilities, or “intelligence”?

THE NATURE OF INTELLIGENCE
A convenient index of intelligence is the intelligence quotient, or IQ. It expresses intelligence as the ratio of mental age (MA) to chronological age (CA), multiplied by 100:
$$\text{IQ} = \frac{MA}{CA} \times 100$$
MA is obtained by summing the number of items passed at each level. [Suggested by German psychologist William Stern, adopted by Lewis Terman]
NOTE THAT THE IQ IS NO LONGER CALCULATED USING THIS EQUATION. Tables are used to convert raw scores on the test to standard scores that are adjusted so that the mean at each age equals 100.
IMPORTANT: When considering the numerical value of a given IQ, one should always specify the test from which it was derived. Different intelligence tests that yield an IQ differ in content and in other ways that affect the interpretation of their scores.

SOME CONSIDERATIONS IN USING THE SYMBOL “IQ”
1. Tested intelligence should be regarded as a descriptive rather than an explanatory concept. An IQ is an expression of an individual’s ability level at a given point in time, in relation to the available age norms. No intelligence test can indicate the reasons for one’s performance. Intelligence tests should be used not to label individuals but to help in understanding them.
2. Intelligence is not a single, unitary ability but a composite of several functions. The term is commonly used to cover the combination of abilities required for survival and advancement within a particular culture.
   Implications: the specific abilities in this composite vary with time and place. In different cultures, and at different historical periods within the same culture, the qualifications for successful achievement differ. The composition of intelligence also varies within one individual from infancy to adulthood.

ADDITIONAL CONSIDERATIONS
To base decisions on tests alone, and especially on one or two tests alone, is clearly a misuse of tests. Decisions must be made by persons. Tests represent one set of data used in making decisions; they are not themselves decision-making instruments. At a broader level, all test results can best be understood within a contextual framework.

DEFINING INTELLIGENCE: BINET’S VIEWPOINT
Alfred Binet defined intelligence as the capacity
1. to find and maintain a definite direction or purpose,
2. to make necessary adaptations (that is, strategy adjustments) to achieve that purpose, and
3. to engage in self-criticism so that necessary adjustments in strategy can be made.
In developing tasks to measure judgment, attention, and reasoning, Binet was guided by two principles.

BINET’S PRINCIPLES OF TEST CONSTRUCTION
Principle 1: Age Differentiation
Refers to the simple fact that one can differentiate older children from younger children by the former’s greater capabilities.
Principle 2: General Mental Ability
Refers to the total product of the various separate and distinct elements of intelligence.

SPEARMAN’S MODEL OF GENERAL MENTAL ABILITY
Binet was not alone in his conception of general mental ability. Before Binet, the idea was propounded by Francis Galton (1869) in his book Hereditary Genius. Working independently of Binet, in Great Britain, Charles Spearman (1904, 1927) advanced this idea: intelligence consists of one general factor (g) plus a large number of specific factors.
Spearman’s general mental ability, which he referred to as psychometric g (or simply g), was based on a well-documented phenomenon: when a set of diverse ability tests is administered to large unbiased samples of the population, almost all of the correlations are positive. This phenomenon is called positive manifold. According to Spearman, it results from the fact that all tests, no matter how diverse, are influenced by g.

FACTOR ANALYSIS
To support the notion of g, Spearman developed a statistical technique called factor analysis, a method for reducing a set of variables or scores to a smaller number of hypothetical variables called factors. Through factor analysis, one can determine the variance that all the tests share in common. This common variance represents the g factor. Today, Spearman’s g is the most established and ubiquitous predictor of occupational and educational performance.

IMPLICATIONS OF GENERAL MENTAL ABILITY (g)
1. A person’s intelligence can best be represented by a single score, g, that presumably reflects the shared variance underlying performance on a diverse set of tests.
2. Differences in unique abilities stemming from specific tasks tend to cancel each other out, and overall performance comes to depend most heavily on the general factor.

THE gf-gc THEORY OF INTELLIGENCE
Recent theories of intelligence have suggested that human intelligence can best be conceptualized in terms of multiple intelligences rather than a single score. One such theory is the gf-gc theory. According to this theory, there are two basic types of intelligence: fluid (gf) and crystallized (gc). Fluid intelligence can best be thought of as those abilities that allow us to reason, think, and acquire new knowledge. Crystallized intelligence represents the knowledge and understanding that we have acquired.

INDIVIDUAL TESTS
1. Stanford-Binet Intelligence Scale
For use from the age of two years to the adult level.
15 tests representing five major cognitive areas:
– Fluid reasoning (FR)
– Knowledge (KN)
– Quantitative reasoning (QR)
– Visual/spatial reasoning (VS)
– Working memory (WM)

STANFORD-BINET INTELLIGENCE TEST
1905 Binet-Simon Scale
1908 Binet-Simon Scale
1916 Stanford-Binet Intelligence Scale
1937 Stanford-Binet
1960 Stanford-Binet
1986 Stanford-Binet (4th edition)
Current edition: 5th edition, 2003

INTELLIGENCE TESTS, CONT’D…
2. The Wechsler Scales (original 1939)
Why was the Wechsler developed?
a. Most tests overemphasized speed, which handicaps older adults.
b. Routine manipulation of words received undue weight in the traditional intelligence test.
c. Mental age norms were inapplicable to adults; few adults had previously been included in the standardization samples for intelligence tests.

THE WECHSLER SCALES
Although both Binet and Terman considered the influence of nonintellective factors on results from intelligence tests, David Wechsler, author of the Wechsler scales, has perhaps been one of the most influential advocates of the role of nonintellective factors in these tests. Throughout his career, Wechsler emphasized that factors other than intellectual ability are involved in intelligent behavior.

POINT AND PERFORMANCE SCALE CONCEPTS
Two of the most critical differences between the Wechsler and the original Binet scale were:
1. Wechsler’s use of the point scale concept rather than an age scale; and
2. Wechsler’s inclusion of a performance scale.

THE POINT SCALE CONCEPT
Credits or points are assigned to each item. An individual receives a specific amount of credit for each item passed. The point scale offers an inherent advantage: it makes it easy to group items of a particular content together (Binet did not do this until the 1986 version).
By arranging items according to content and assigning a specific number of points to each item, the Wechsler yielded not only a total overall score but also scores for each content area.

THE PERFORMANCE SCALE CONCEPT
The early Binet scale was criticized for emphasizing language and verbal skills. Wechsler included a measure of nonverbal intelligence: a performance scale. It consisted of tasks that require the person to do something (e.g., copy symbols or point to a missing detail) rather than merely answer questions. The performance scale attempts to overcome biases caused by language, culture, and education.

THE WECHSLER SCALES
Wechsler-Bellevue Scale (1939)
Wechsler Adult Intelligence Scale (WAIS) (1955)
WAIS-R (1981)
WAIS-III (1997)
Wechsler Intelligence Scale for Children (WISC) (1949); current edition: WISC-IV (2003)
Wechsler Preschool and Primary Scale of Intelligence (WPPSI) (1967); current edition: WPPSI-III (2002)

WECHSLER’S DEFINITION OF INTELLIGENCE
Like Binet, Wechsler defined intelligence as the capacity to act purposefully and to adapt to the environment. In his words, intelligence is the aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his/her environment. Intelligence comprises several specific, interrelated functions or elements, and general intelligence results from the interplay of these elements.
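The difference between a point scale and an age scale can be shown in a small sketch. Assume each administered item records only its content area and the credit earned; the data below are hypothetical and do not reflect real Wechsler items or scoring rules. Summing credits within each area gives the content-area scores, and summing everything gives the overall score. In practice, these raw sums would still be converted through norm tables before interpretation.

```python
from collections import defaultdict

# Hypothetical item records: (content area, points earned on that item).
# Area names and point values are made up for illustration; they are not
# actual Wechsler items or scoring rules.
responses = [
    ("Vocabulary", 2), ("Vocabulary", 1), ("Vocabulary", 0),
    ("Arithmetic", 1), ("Arithmetic", 1),
    ("Block design", 4), ("Block design", 2),
]

def point_scale_scores(items):
    """Sum credits within each content area, plus an overall total."""
    by_area = defaultdict(int)
    for area, points in items:
        by_area[area] += points
    return dict(by_area), sum(by_area.values())

area_scores, total = point_scale_scores(responses)
print(area_scores)
print("total:", total)
```

An age scale would instead credit items by the age level at which they are placed. Grouping by content is what allows separate area scores.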
WECHSLER SUBTESTS
Subtest: major function measured
Verbal Scales
– Vocabulary: vocabulary level
– Similarities: abstract thinking
– Arithmetic: concentration
– Digit Span: immediate memory, anxiety
– Information: range of knowledge
– Comprehension: judgment
– Letter-number sequencing: freedom from distractibility
Performance Scales
– Picture Completion: alertness to details
– Digit symbol-coding: visual-motor functioning
– Block design: nonverbal reasoning
– Matrix reasoning: inductive reasoning
– Picture arrangement: planning ability
– Symbol search: information-processing speed
– Object assembly: analysis of part-whole relationships

OTHER INDIVIDUAL TESTS OF ABILITY
Both the Binet and Wechsler scales are exceptionally good instruments for assessing intelligence in relatively normal individuals. However, both have their limitations: their standardization samples do not include individuals with sensory, physical, or language handicaps. Alternatives include:
– Cattell Scales
– McCarthy Scales of Children’s Abilities (MSCA)
– Kaufman Assessment Battery for Children (K-ABC)

DISADVANTAGES OF ALTERNATIVES
1. Weaker standardization samples
2. Less stable
3. Less documentation on validity
4. Limitations in test manuals
5. Not as psychometrically sound
6. IQ scores not interchangeable with Binet or Wechsler scores

ADVANTAGES OF ALTERNATIVES
1. Can be used for specific populations and special purposes: people with sensory, physical, or language limitations; culturally deprived people; foreign-born individuals; non-English-speaking people
2. Not as reliant on verbal responses
3. Not as dependent on complex visual-motor integration
4. Useful for screening, supplementing other tests, and reevaluations
5. Can be administered nonverbally
6. Less variability due to scholastic achievement

INDIVIDUAL VS. GROUP TESTS
– Individual tests: one subject is tested at a time. Group tests: many subjects are tested at a time.
– Individual tests: the examiner records responses. Group tests: subjects record their own responses.
– Individual tests: scoring requires considerable skill. Group tests: scoring is straightforward and objective.
– Individual tests: examiner flexibility can elicit maximum performance if permitted by standardization. Group tests: there are no safeguards.

ADVANTAGES & DISADVANTAGES OF GROUP TESTS
Advantages:
1. Large-scale or mass testing
2. Eliminate the need for a one-to-one relationship
3. Greatly simplify the examiner’s role
4. More uniform conditions than in individual testing
5. Better-established norms
6. Testing of large, representative samples in the standardization process is possible
Disadvantages:
1. Less opportunity to establish rapport, obtain cooperation, and maintain the interest of the client
2. Any temporary condition (e.g., illness, fatigue, worry) is less readily detected
3. Lack of flexibility (individual tests typically allow the examiner to choose items on the basis of the test-taker’s prior performance)

GROUP ABILITY TESTS: INTELLIGENCE, ACHIEVEMENT, AND APTITUDE
Intelligence tests: measure general ability. They also predict future performance, but only generally and broadly.
Achievement tests: assess what a person has learned.
Aptitude tests: assess potential for learning.

ACHIEVEMENT TESTS VERSUS APTITUDE TESTS
Achievement tests:
1. Evaluate the effects of a known or controlled set of experiences.
2. Evaluate the product of a course of training.
3. Rely heavily on content validation procedures.
Aptitude tests:
1. Evaluate the effects of an unknown, uncontrolled set of experiences.
2. Evaluate the potential to profit from a course of training.
3. Rely heavily on predictive criterion validation procedures.

GROUP TESTS
1. Differential Aptitude Tests (DAT)
2. *Raven Progressive Matrices (RPM)
3. *IPAT Culture Fair Intelligence Test
4. Wonderlic Personnel Test (WPT)
5. *Purdue Non-Language Test (PNLT)
6. *Goodenough-Harris Drawing Test (G-HDT)
7. Kuhlmann-Anderson Test (KAT)
8. Henmon-Nelson Test (H-NT)
9. Cognitive Abilities Test (CogAT)
*Nonverbal group ability tests

***

BASIC CONCEPTS IN TEST CONSTRUCTION AND INTERPRETATION: NORMS
Psychology 31 Psychological Assessment
Several concepts help the test user interpret test scores. A raw score is meaningless without additional interpretive data, so a uniform frame of reference is needed.

NORMS
Norms are the most common reference against which psychological test scores are interpreted. Norms represent the test performance of the standardization sample. They are established empirically by determining what persons in a representative group actually do on the test. Any individual’s raw score is then referred to the distribution of scores obtained by the standardization sample to discover where the person falls in that distribution. The raw score is converted into some relative measure.
These derived scores have a dual purpose:
1. To indicate the individual’s relative standing in the normative sample, thereby permitting an evaluation of his or her performance in reference to other persons.
2. To provide comparable measures that permit a direct comparison of the individual’s performance on different tests.
Raw scores may be converted in two ways to fulfill these objectives:
1. Developmental level attained, called developmental norms (e.g., age and grade norms)
2. Relative position within a specified group, called within-group norms (e.g., percentiles and standard scores)

Main Types of Norms for Educational and Psychological Tests
– Age norms. Type of comparison: individual matched to the group whose performance he or she equals. Type of group: successive age groups.
– Grade norms. Type of comparison: same as age norms. Type of group: successive grade groups.
– Percentile norms. Type of comparison: percentage of the group surpassed by the individual. Type of group: single age or grade group to which the individual belongs.
– Standard score norms. Type of comparison: number of standard deviations the individual falls above or below the average of the group. Type of group: same as percentile norms.

STATISTICAL CONCEPTS
Statistics organizes and summarizes quantitative (i.e., numerical) data to make it easier to understand.
I. Descriptive Statistics
   A. Univariate Descriptive Statistics
   B. Bivariate Descriptive Statistics
II. Inferential Statistics

Bivariate Descriptive Statistics: Correlation
Most common: Pearson r
– Studies the relationship between variable x and variable y
– Values range from -1 to +1
– Absolute value: intensity (strength) of the relationship
– Algebraic sign: direction of the relationship

Inferential Statistics
Descriptive statistics computed on a sample are used to make inferences about the larger population. Theoretical tools:
– Normal curve
– Sampling distribution
– Null hypothesis (reject or fail to reject the null)
– z distribution
– t distribution
– t-tests (correlated means or independent means)
Statistical tools that help us examine the relationship between one dependent variable and one or more independent variables:
– ANOVA
– Regression
Statistical methods (techniques of managing masses of data) do one of two things:
1. Describe a set of data [descriptive methods]: they point up a characteristic of the group being discussed.
2. Provide a basis for making generalizations about a large group of individuals when only a selected portion of that group has been observed [inferential procedures]: they allow us to make inferences about large numbers of individuals from a small sample of the larger group.

Four Classes of Measurement Scales
NOMINAL SCALE
– Merely assigns numbers to objects or categories to identify them, e.g., plate numbers or numbers on athletes’ uniforms.
– Does not tell us about varying degrees of quality.
ORDINAL SCALE
– Larger scores reflect greater quantity or quality, but units along the scale are unequal in size.
– E.g., the order of finish in a race (the difference in distance between runners 1 and 2 is not necessarily equal to the difference between runners 2 and 3).
– E.g., a teacher-made math test (one problem is not presumed to be as difficult as the other problems).
INTERVAL SCALE
– Has equal units.
– Is more precise than an ordinal scale, but we do not know where true zero is on the scale.
– E.g., a Fahrenheit thermometer (zero does not mean an absence of heat).
– E.g., standard scores on an intelligence test.
RATIO SCALE
– Has equal units, and zero on the scale means an absence of the quality being assessed.
– E.g., measures of length and weight.

Frequency Distributions
A frequency distribution is one of the most fundamental techniques for putting order into a disarray of data. It is a systematic procedure for arranging individuals from least to most in relation to some quantifiable characteristic. Frequency distributions are constructed primarily for two reasons:
1. They put the data into order so that the results of the measurements can be analyzed visually.
2. They provide a convenient structure for simple computations.
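The “least to most” arrangement described above can be sketched in a few lines of Python. This minimal example tallies a list of raw scores into a simple (ungrouped) frequency distribution. It also adds a running cumulative frequency and cumulative percentage, which is the quantity an ogive plots. The scores and the helper name frequency_distribution are hypothetical.

```python
from collections import Counter

def frequency_distribution(scores):
    """Rows of (score, frequency, cumulative frequency, cumulative %),
    ordered from the lowest score to the highest."""
    counts = Counter(scores)
    n = len(scores)
    rows, cumulative = [], 0
    for score in sorted(counts):
        cumulative += counts[score]
        rows.append((score, counts[score], cumulative, 100 * cumulative / n))
    return rows

# Hypothetical raw scores, for illustration only.
scores = [12, 15, 12, 18, 15, 15, 20, 12, 18, 15]
print("score  f  cum f  cum %")
for score, f, cum, pct in frequency_distribution(scores):
    print(f"{score:>5} {f:>2} {cum:>6} {pct:>6.1f}")
```

Grouping these scores into class intervals would turn this simple distribution into a grouped frequency distribution.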
Graphic representations of data
– Histogram
– Frequency polygon
– Ogive (cumulative percentage data): a “lazy” S shape

Measures of Central Tendency
Because so many distributions cluster near the middle:
– We think of this central point as representing the typical characteristic of the group.
– Measures of central tendency identify the point in the distribution around which other scores seem to group.

Measures of Central Tendency: Mode
The scale value that occurs more frequently than any other in a distribution. Bimodal: two separate points compete to be designated the mode.

Measures of Central Tendency: Median
The point in the distribution that divides the total observations into two parts that are equal in number: 50% of the cases fall above it, and 50% fall below it.

Measures of Central Tendency: Mean
Found by adding together the values of the quantities we have and dividing this sum by the number of quantities added. It is simply the arithmetic average.

Measures of Variability
Measures of central tendency tell us what typical performance is for a group. Measures of variability tell us how widely scores are dispersed around that central point. Other terms: dispersion, spread, scatter. The most widely used term is variability.

Measures of Variability: Range
Its numerical value rests on two, and only two, scores in a distribution: the most extreme scores. It is computed by subtracting the smallest score from the largest and adding one point:
Range = HS − LS + 1 (HS = highest score, LS = lowest score)
It establishes the amount of dispersion in a set of scores by noting how many scale points are included from the lowest value to the highest, inclusive.

Measures of Variability: Standard Deviation
The best indicator of the dispersion of scores. It begins with the mean, the most stable average from sample to sample, and then calculates the distances of scores above and below this central point. Specifically, it is the root-mean-squared deviation.
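The definitions above can be checked on a small set of scores. This Python sketch uses the standard-library statistics module for the mode, median, and mean. It writes out the inclusive range (HS − LS + 1) and the root-mean-squared deviation by hand. The scores are hypothetical, and the helper names inclusive_range and root_mean_squared_deviation are illustrative, not from any testing package.

```python
import math
from statistics import mean, median, multimode, pstdev

def inclusive_range(scores):
    """Range as defined above: highest score - lowest score + 1 (inclusive)."""
    return max(scores) - min(scores) + 1

def root_mean_squared_deviation(scores):
    """Standard deviation computed from its definition (divides by N)."""
    m = mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

# Hypothetical raw scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print("mode(s):", multimode(scores))   # multimode lists every most-frequent value
print("median:", median(scores))       # even N: average of the two middle scores
print("mean:", mean(scores))
print("range (HS - LS + 1):", inclusive_range(scores))
print("SD:", root_mean_squared_deviation(scores))
print("SD via statistics.pstdev:", pstdev(scores))  # same population formula
```

The hand-written SD divides by N, so it matches statistics.pstdev. The sample estimate statistics.stdev divides by N − 1 and is slightly larger.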
Standard Deviation Formula
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Measures of Variability: Variance
The mean-squared deviation:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where σ² = variance; ∑(X − µ)² = the sum of (X − µ)² over all data points; X = individual data points; µ = mean of the population; N = number of data points.

Significance of Measures of Variability
A raw score alone does not reveal the magnitude of a measurement or similar observation. That magnitude becomes evident once the raw score is expressed in terms of its deviation from the mean.

From Raw Score to Standard Score
In psychological testing, raw measurement data frequently need to be converted to some type of standard scale. Whenever we express a score in terms of its deviation from the mean, using the standard deviation (or multiples of it) as the unit, we have a standard score. Standard scoring methods include z scores and T scores:
$$z = \frac{X - \mu}{\sigma}, \qquad T = 50 + 10z$$

z scores
– Nothing more than a raw score converted to standard deviation units.
– Since standard deviations are measured from the mean, z scores begin at that point and range up and down the scale.
– A score one SD above the mean has a z score of +1; a raw score half an SD below the mean has a z score of -.5, and so on.
– We equate raw scores to a scale with a mean of 0 and an SD of 1.

T scores
– Remove the inconvenience of working with a scale that has zero in the middle and on which half the scores are negative.
– The T-score scale sets the mean raw score equal to a T score of 50 and equates the raw-score SD to 10 T-score points.
– Thus, a score one SD above the mean (z score of +1) has a T score of 60, and a score one SD below the mean (z score of -1) has a T score of 40.

Percentiles
– Can be expressed in terms of the percentage of persons in the standardization sample who fall below a given raw score.
– Can also be regarded as ranks in a group of 100. In ranking, however, the best person in the group gets a rank of one, whereas with percentiles, the lower the percentile, the poorer the individual’s standing.
– Do not confuse percentiles with percentages. Percentages are raw scores expressed in terms of the percentage of correct items; percentiles are derived scores expressed in terms of the percentage of persons.

***

Scales of Measurement
Continuous scales: it is theoretically possible to divide any of the values of the scale. They typically have a wide range of possible values (e.g., height or a depression scale).
Discrete scales: categorical values (e.g., male or female).
Error: the collective influence of all of the factors on a test score beyond those specifically measured by the test.
Nominal scales: involve classification or categorization based on one or more distinguishing characteristics. All things measured must be placed into mutually exclusive and exhaustive categories (e.g., apples and oranges, DSM-IV diagnoses).
Ordinal scales: involve classification, like nominal scales, but also allow rank ordering (e.g., Olympic medalists).
Interval scales: contain equal intervals between numbers. Each unit on the scale is exactly equal to any other unit on the scale (e.g., IQ scores and most other psychological measures).
Ratio scales: interval scales with a true zero point (e.g., height or reaction time).
Psychological measurement: most psychological measures are truly ordinal but are treated as interval measures for statistical purposes.

Describing Data
Distribution: a set of test scores arrayed for recording or study.
Raw score: a straightforward, unmodified accounting of performance that is usually numerical.
Frequency distribution: all scores are listed alongside the number of times each score occurred.
Frequency distributions may be in tabular form, as in the example above.
That example is a simple frequency distribution (scores have not been grouped).
Grouped frequency distributions have class intervals rather than actual test scores.
Histogram: a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
Bar graph: numbers indicative of frequency appear on the Y-axis, and reference to some categorization (e.g., yes/no/maybe, male/female) appears on the X-axis.
Frequency polygon: test scores or class intervals (indicated on the X-axis) meet frequencies (indicated on the Y-axis).
[Figure: Types of Distributions]

Measures of Central Tendency
Central tendency: a statistic that indicates the average or midmost score between the extreme scores in a distribution.
Mean: the sum of the observations (or test scores) divided by the number of observations.
Median: the middle score in a distribution. It is particularly useful when there are outliers, or extreme scores, in a distribution.
Mode: the most frequently occurring score in a distribution. When two scores occur with the highest frequency, the distribution is said to be bimodal.

Measures of Variability
Variability is an indication of the degree to which scores are scattered or dispersed in a distribution. Distributions A and B have the same mean score, but one of them has greater variability (its scores are more spread out).
Measures of variability are statistics that describe the amount of variation in a distribution.
Range: the difference between the highest and the lowest scores.
Interquartile range: the difference between the third and first quartiles of a distribution.
Semi-interquartile range: the interquartile range divided by 2.
Average deviation: the average of the absolute deviations of scores in a distribution from the mean.
Variance: the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
Standard deviation: the square root of the average squared deviation about the mean, that is, the square root of the variance. It is the typical distance of scores from the mean.

Skewness: the nature and extent to which symmetry is absent in a distribution.
– Positive skew: relatively few of the scores fall at the high end of the distribution.
– Negative skew: relatively few of the scores fall at the low end of the distribution.
Kurtosis: the steepness of a distribution in its center.
– Platykurtic: relatively flat.
– Leptokurtic: relatively peaked.
– Mesokurtic: somewhere in the middle.

The Normal Curve
The normal curve is a bell-shaped, smooth, mathematically defined curve that is highest at its center. It is perfectly symmetrical.

Area Under the Normal Curve
The normal curve can be conveniently divided into areas defined by units of standard deviation.

Standard Scores
A standard score is a raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
z score: conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
T score: can be called a “fifty plus or minus ten” scale, that is, a scale with a mean set at 50 and a standard deviation set at 10.
Stanine: a standard score with a mean of 5 and a standard deviation of approximately 2, divided into nine units.
Normalizing a distribution: involves “stretching” the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores.

Correlation and Inference
A coefficient of correlation (or correlation coefficient) is a number that provides an index of the strength of the relationship between two things. Correlation coefficients vary in magnitude between -1 and +1.
– A correlation of 0 indicates no relationship between two variables.
– Positive correlations indicate that as one variable increases or decreases, the other variable follows suit.
– Negative correlations indicate that as one variable increases, the other decreases.
– Correlation between variables does not imply causation, but it does aid in prediction.
Pearson r: a method of computing correlation when both variables are linearly related and continuous. Once a correlation coefficient is obtained, it needs to be checked for statistical significance (typically a probability level below .05). Squaring r gives the coefficient of determination, the proportion of variance that the two variables share.
Spearman rho: a method of computing correlation, used primarily when sample sizes are small or the variables are ordinal in nature.
Scatterplot: plots one variable on the X (horizontal) axis and the other on the Y (vertical) axis.
[Figure: scatterplots of no correlation (left) and moderate correlation (right)]
– Scatterplots of strong correlations feature points tightly clustered together along a diagonal line. For positive correlations, the line goes from bottom left to top right.
– Strong negative correlations form a tightly clustered diagonal line from top left to bottom right.
Outlier: an extremely atypical point (case), lying relatively far away from the other points in a scatterplot.
Restriction of range leads to weaker correlations.

Meta-Analysis
Meta-analysis allows researchers to look at the relationship between variables across many separate studies. It is a family of techniques for statistically combining information across studies to produce single estimates of the data under study. The estimates take the form of an effect size, which is often expressed as a correlation coefficient.
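The standard-score conversions and the Pearson r defined in this part of the notes can be sketched in Python. This is a minimal illustration, and several parts of it are assumptions:
– The norm sample and the paired scores are hypothetical, and the helper names are illustrative.
– The stanine uses the common 2z + 5 approximation rather than a published percentage-band table.
– The deviation-IQ helper assumes an SD of 15, the value used by current Wechsler and Stanford-Binet editions. A given test's manual governs.
Operational tests convert raw scores with the publisher's norm tables, not with these formulas.

```python
import math
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """How many SD units a raw score lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def t_score(z):
    """T scale: mean 50, SD 10."""
    return 50 + 10 * z

def deviation_iq(z, sd=15):
    """IQ-style standard score with mean 100; SD 15 is an assumption here."""
    return 100 + sd * z

def stanine(z):
    """Approximate stanine (mean 5, SD about 2) via 2z + 5, rounded half up
    and clipped to 1-9; published tests use percentage-band tables instead."""
    return max(1, min(9, math.floor(2 * z + 5 + 0.5)))

def percentile_rank(x, norm_sample):
    """Percentage of the norm sample scoring below x."""
    return 100 * sum(s < x for s in norm_sample) / len(norm_sample)

def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired, continuous scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical norm sample (mean 5, population SD 2) and one examinee's raw score.
norm = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = mean(norm), pstdev(norm)
z = z_score(7, mu, sigma)
print("z:", z, "T:", t_score(z), "stanine:", stanine(z))
print("deviation IQ (SD 15 assumed):", deviation_iq(z))
print("percentile rank:", percentile_rank(7, norm))

# Hypothetical paired scores on two tests.
test_a = [1, 2, 3, 4, 5]
test_b = [2, 4, 5, 4, 5]
r = pearson_r(test_a, test_b)
print("r:", round(r, 3), "coefficient of determination r^2:", round(r ** 2, 3))
```

Spearman rho can be obtained by applying the same pearson_r function to the ranks of each variable, with tied scores given their average rank. This is why rho suits ordinal data.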
PERSONALITY TESTING
PSYCHOLOGICAL ASSESSMENT

INTRODUCTION
Tests of mental ability were created to distinguish those with subnormal mental abilities from those with normal abilities in order to enhance the education of both groups. However, it is not enough to know that a person is high or low in such factors as speed of calculation, memory, range of knowledge, and abstract thinking. To make full use of information about a person’s mental abilities, one must also know how that person uses those abilities.

THE STUDY OF PERSONALITY
The nonintellective aspects of human behavior, typically distinguished from mental abilities, are called personality characteristics. Personality refers to the relatively stable and distinctive patterns of behavior that characterize an individual and her or his reactions to the environment.

STRUCTURED PERSONALITY TESTS
Structured personality tests attempt to evaluate personality traits, personality types, personality states, and other aspects of personality, such as self-concept. Personality traits are relatively enduring dispositions: tendencies to act, think, or feel in a certain manner in any given circumstance, which distinguish one person from another.

PERSONALITY TYPES AND PERSONALITY STATES
Personality types: general descriptions of people. For example, avoiding types have low social interest and low activity and cope by avoiding social situations.
Personality states: emotional reactions that vary from one situation to another.

SELF-CONCEPT
Self-concept: a person’s self-definition or, according to Rogers, an organized and relatively consistent set of assumptions that a person has about himself or herself.

HISTORICAL DEVELOPMENT OF PERSONALITY TESTING
Binet and others (Terman, Spearman, Thorndike) believed that a person’s pattern of intellectual functioning might reveal information about personality factors.
However, specific tests of personality were not developed until World War I, when there was a need to distinguish people on the basis of their emotional well-being. Psychologists used self-report questionnaires that provided a list of statements and required people to respond in some way (e.g., “True” or “False”) to indicate whether each statement applied to them.

SELF-REPORT: STRUCTURED PERSONALITY TESTS
The general procedure in which the person is asked to respond to a written statement is known as the structured or objective method of personality assessment, as distinguished from the projective method. A clear and definite stimulus is provided, and the requirements for responding are evident and specific. For example, the person responds “yes” or “no” to the statement “I am happy.”

STRATEGIES FOR STRUCTURED PERSONALITY TEST CONSTRUCTION
Like measures of mental ability, personality measures evolved through several phases.
– Deductive strategies comprise the logical-content and the theoretical approaches.
– Empirical strategies comprise the criterion-group and the factor analytic methods.
Some procedures combine two or more of these strategies.

DEDUCTIVE APPROACH TO CONSTRUCTING PERSONALITY TESTS
Deductive strategies use reason and deductive logic to determine the meaning of a test response. In the logical-content method, designers select items on the basis of simple face validity. In the theoretical approach, test construction is guided by a particular psychological theory.

EMPIRICAL APPROACH
Empirical strategies rely on data collection and statistical analysis to determine the meaning of a test response or the nature of personality. These strategies retain the self-report features of the deductive strategies in that persons are asked to respond to items that describe their own views, opinions, and feelings.
However, empirical strategies use experimental research to determine empirically the meaning of a test response, the major dimensions of personality, or both. In the criterion-group approach, test designers choose items to distinguish a group of individuals with certain characteristics, the criterion group, from a control group. The factor analytic approach uses the statistical technique of factor analysis to determine the meaning of test items.

All available structured personality tests can be classified according to whether they use one or some combination of the four strategies:
1. Logical-content
2. Theoretical
3. Criterion-group
4. Factor analytic

THE LOGICAL-CONTENT STRATEGY
The first personality test ever developed, the Woodworth Personal Data Sheet (1920), was based on the logical-content strategy. It was developed during World War I and published at the end of the war. Its purpose was to identify military recruits who would likely break down in combat.

WOODWORTH QUESTIONS
Items were selected from lists of known symptoms of emotional disorders and from questions asked by psychiatrists in a screening interview. The final form contained 116 items, for example:
“Do you wet the bed at night?”
“Do you usually feel in good health?”
“Do you frequently daydream?”
“Do you usually sleep soundly at night?”
A single score provided a global measure of functioning.

OTHER TESTS USING THE LOGICAL-CONTENT STRATEGY
Bell Adjustment Inventory: evaluated a person’s adjustment in areas such as home life, social life, and emotional functioning.
Bernreuter Personality Inventory: included items related to six personality traits, including introversion, confidence, and sociability.
Mooney Problem Checklist (1950): lists problems that recur in clinical case history data and in written statements of problems listed by 4000 high school students (U.S.).

THE CRITERION-GROUP STRATEGY
Main idea: assume nothing about the meaning of a person’s response to a test item.
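The criterion-group idea can be sketched in code: compare how often the criterion group and the control group answer “True” to each item, and keep only the items whose endorsement rates differ. The pooled two-proportion z statistic and the 1.96 cutoff used here are illustrative assumptions, not the procedure of any particular published inventory, and the response data are hypothetical.

```python
import math
from typing import Dict, List

Respondent = Dict[int, bool]  # item number -> answered "True"?


def endorsement_rate(group: List[Respondent], item: int) -> float:
    """Proportion of the group that answered "True" to `item`."""
    return sum(1 for person in group if person.get(item, False)) / len(group)


def select_items(criterion: List[Respondent], control: List[Respondent],
                 items: List[int], z_cutoff: float = 1.96) -> List[int]:
    """Keep items whose "True" rate differs between criterion and control.

    Uses a pooled two-proportion z statistic; 1.96 (about p < .05,
    two-tailed) is an illustrative cutoff, not a published rule.
    """
    n1, n2 = len(criterion), len(control)
    selected = []
    for item in items:
        p1 = endorsement_rate(criterion, item)
        p2 = endorsement_rate(control, item)
        pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        # se == 0 means every respondent in both groups gave the same answer.
        if se > 0 and abs(p1 - p2) / se >= z_cutoff:
            selected.append(item)
    return selected


if __name__ == "__main__":
    # Hypothetical data: item 1 separates the groups, item 2 does not.
    criterion = [{1: True, 2: i % 2 == 0} for i in range(10)]
    control = [{1: False, 2: i % 2 == 0} for i in range(10)]
    print(select_items(criterion, control, [1, 2]))  # prints [1]
```

The item’s wording plays no role in the selection: a statement such as “I like good food.” is kept only if it separates the groups, which is the sense in which the strategy assumes nothing about the meaning of a response.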
Examples: the Minnesota Multiphasic Personality Inventory (MMPI), developed by Hathaway and McKinley, and its revision, the MMPI-2 (1989). Sample statements: “I like good food.” “I never have trouble falling asleep.” Raw scores are converted to T scores.

MMPI AND MMPI-2
The MMPI and MMPI-2 contain validity scales that provide information on the person’s approach to testing:
“Fake bad”: endorsing more items of pathological content than any person’s actual problems could justify.
“Fake good”: avoiding pathological items.
Like the Woodworth, the MMPI and MMPI-2 are intended to assist in distinguishing normal from abnormal groups.

The criterion groups for the original MMPI came from University of Minnesota Hospital patients (n = 800), divided into eight groups according to psychiatric diagnosis and compared with controls (n = 700) composed of relatives and visitors of the patients. Final criterion groups: hypochondriacs, depressed patients, hysterics, psychopathic deviates, paranoids, psychasthenics, schizophrenics, and hypomanics.

ORIGINAL VALIDITY SCALES OF THE MMPI
Lie scale (L): 15 rationally derived items that evaluate a naïve attempt to present oneself in a favorable light. People who score high on L are unwilling to acknowledge minor flaws (weaknesses). Example: “I never lose control of myself when I drive.”
Infrequency scale (F): items answered in the scored direction by fewer than 10% of the normal population. High scores invalidate the profile. Example: “I am aware of a special presence that others cannot perceive.”
K scale: 30 items that detect attempts to deny problems and present oneself in a favorable light. Such individuals attempt to project an image of self-control and personal effectiveness.
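The MMPI material above notes that raw scores are converted to T scores, i.e., standard scores rescaled so the normative sample has a mean of 50 and a standard deviation of 10. The sketch below shows only the basic linear conversion, assuming a hypothetical normative sample; published tests such as the MMPI-2 supply their own conversion tables, so this is illustrative rather than the test’s actual scoring.

```python
from statistics import mean, pstdev
from typing import List


def linear_t_scores(norm_raw: List[float], raw: List[float]) -> List[float]:
    """Convert raw scores to linear T scores: T = 50 + 10 * (X - M) / SD.

    M and SD come from `norm_raw`, a hypothetical normative sample
    treated here as the whole reference population (hence pstdev).
    """
    m = mean(norm_raw)
    sd = pstdev(norm_raw)
    if sd == 0:
        raise ValueError("normative scores have no variability")
    return [50 + 10 * (x - m) / sd for x in raw]


if __name__ == "__main__":
    norms = [8, 12]                               # mean 10, SD 2
    print(linear_t_scores(norms, [10, 14, 6]))    # prints [50.0, 70.0, 30.0]
```

A raw score two normative standard deviations above the mean therefore lands at T = 70.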
MMPI AND MMPI-2 SCALES
Symbol currently in use | Old name | Number of items in scale | Common interpretation of elevation
Validity scales
L | Lie scale | 13 | Naïve attempt to fake good
F | F scale | 64 | Attempt to fake bad
K | K scale | 30 | Defensiveness
Clinical scales
1 | Hypochondriasis | 33 | Physical complaints
2 | Depression | 60 | Depression
3 | Hysteria | 60 | Immaturity
4 | Psychopathic deviate | 50 | Authority conflict
5 | Masculinity-femininity | 60 | Masculine or feminine interests
6 | Paranoia | 40 | Suspicion, hostility
7 | Psychasthenia | 48 | Anxiety
8 | Schizophrenia | 78 | Alienation, withdrawal
9 | Hypomania | 46 | Elated mood, high energy
0 | Social introversion | 70 | Introversion, shyness

CALIFORNIA PSYCHOLOGICAL INVENTORY (CPI, 3RD EDITION)
The CPI (1987) is a second example of a structured personality test constructed primarily by the criterion-group strategy. For 3 of the 36 CPI scales, criterion groups (men vs. women; homosexual vs. heterosexual men) were contrasted to produce measures of personality categorized as (1) introversion-extroversion, (2) conventional vs. unconventional, and (3) self-realization and sense of integration. In contrast to the MMPI and MMPI-2, the CPI attempts to evaluate personality in normally adjusted individuals.

The CPI has 20 scales, each of which is grouped into one of four classes:
Class I: poise, self-assurance, interpersonal effectiveness
Class II: socialization, maturity, responsibility
Class III: achievement potential, intellectual efficiency
Class IV: interest modes
A further 13 scales are designed for special purposes, such as managerial potential, tough-mindedness, and creativity.

THE FACTOR ANALYTIC STRATEGY
Structured personality tests share one common set of assumptions: humans possess characteristics or traits that are stable, vary from individual to individual, and can be measured. Nowhere are these assumptions better illustrated than in the factor analytic strategy.

GUILFORD’S PIONEERING EFFORTS
J.R. Guilford determined the interrelationships (intercorrelations) of a wide variety of tests and factor analyzed the results to find the main dimensions underlying all personality tests. The result was the Guilford-Zimmerman Temperament Survey (1956): 10 dimensions with 30 items each.

GUILFORD-ZIMMERMAN TEMPERAMENT SURVEY DIMENSIONS
General activity, restraint, ascendance (leadership), sociability, emotional stability, objectivity, friendliness, thoughtfulness, personal relations, and masculinity.

CATTELL’S CONTRIBUTION
R.B. Cattell began with all adjectives applicable to human beings to determine the essence of personality. Allport and Odbert (1936) had reduced an adjective list from a dictionary to 4504 traits. Cattell added to this list traits found in the psychological literature and reduced the list to 171 items. College students then rated their friends on the 171 traits, and the results were factor analyzed. The 171 traits were reduced to 36 dimensions, called surface traits. Subsequent factor analysis produced 16 distinct factors, which Cattell called source traits. In further factor analyses, items that correlated highly with each of the 16 source traits were included and those with low correlations were excluded.

SIXTEEN PERSONALITY FACTOR QUESTIONNAIRE (1972)
Other parallel inventories were developed:
High School Personality Questionnaire
Children’s Personality Questionnaire
Clinical Analysis Questionnaire (CAQ), for use with clinical populations

THE THEORETICAL STRATEGY
Items are selected to measure the variables or constructs specified by a major theory of personality. The following questionnaires were based on Murray’s need theory:
Edwards Personal Preference Schedule (EPPS) (1954)
Personality Research Form (PRF) (1967)
Jackson Personality Inventory (JPI) (1976)

EDWARDS PERSONAL PREFERENCE SCHEDULE (EPPS)
The EPPS is based on the need system proposed by Henry A. Murray (1938). Edwards selected 15 needs from Murray’s theory and constructed items with content validity for each.
Edwards included a consistency scale to check the validity of EPPS results: 15 pairs of statements are repeated in identical form.

TRAIT DESCRIPTIONS FOR THE JACKSON PERSONALITY INVENTORY
Scale trait | Trait description
Anxiety | Tendency to worry over minor matters
Breadth of interest | Curiosity; inquisitiveness
Complexity | Preference for abstract versus concrete thought
Conformity | Compliance; cooperativeness
Energy level | Energy; enthusiasm
Innovation | Originality; imagination
Interpersonal affect | Ability to identify with others
Organization | Planfulness; systematic versus disorganized
Responsibility | Responsibility; dependability
Risk taking | Reckless and bold versus cautious and hesitant
Self-esteem | Self-assured versus self-conscious
Social adroitness | Skill in persuading others
Social participation | Sociable and gregarious versus withdrawn and a loner
Tolerance | Broad-minded and open versus intolerant and uncompromising
Value orthodoxy | Moralistic and conventional versus modern and liberal
Infrequency | Validity of profile

SELF-CONCEPT
Self-concept is the set of assumptions individuals have about themselves. The Q-sort technique is based on Rogers’s theory of the self. A set of cards with self-statements is sorted twice: the first sort describes who the person really is (real self), and the second describes what the person believes he or she should be (ideal self). Rogers’s theory predicts that large discrepancies between the real and ideal selves reflect poor adjustment and low self-esteem.

COMBINATION STRATEGIES
The modern trend is to use a mix of strategies for developing structured personality tests. Indeed, most personality tests use factor analysis regardless of their main strategy. The NEO Personality Inventories are a good example of tests of personality characteristics that rely on a combination of strategies in scale development.
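The real-self versus ideal-self comparison in the Q-sort can be quantified in more than one way. The sketch below assumes each card is placed in a numbered pile (higher number = more descriptive) on both sorts and reports two simple indices: the Pearson correlation between the two sorts (a low r means a large discrepancy) and the sum of squared pile differences (a high value means a large discrepancy). The pile scale, card labels, and the function name `q_sort_indices` are hypothetical.

```python
import math
from typing import Dict


def q_sort_indices(real: Dict[str, int], ideal: Dict[str, int]) -> Dict[str, float]:
    """Compare a real-self sort with an ideal-self sort of the same cards.

    Each dict maps a card label to the pile it was placed in
    (higher pile = more descriptive). Returns:
      r  - Pearson correlation between the sorts (low r = large gap)
      d2 - sum of squared pile differences (high d2 = large gap)
    """
    if set(real) != set(ideal):
        raise ValueError("both sorts must use the same cards")
    cards = sorted(real)
    x = [real[c] for c in cards]
    y = [ideal[c] for c in cards]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sort with every card in one pile has no variance")
    r = sxy / math.sqrt(sxx * syy)
    d2 = float(sum((a - b) ** 2 for a, b in zip(x, y)))
    return {"r": r, "d2": d2}


if __name__ == "__main__":
    # Hypothetical 5-card sorts on a 1-5 pile scale.
    real = {"I am shy": 5, "I am confident": 1, "I am calm": 2,
            "I am organized": 3, "I am friendly": 4}
    ideal = {"I am shy": 1, "I am confident": 5, "I am calm": 4,
             "I am organized": 3, "I am friendly": 4}
    print(q_sort_indices(real, ideal))
```

Under the prediction stated above, a person whose two sorts correlate weakly or negatively would be expected to show poorer adjustment; the code only measures the gap and does not interpret it.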
POSITIVE PERSONALITY MEASUREMENT
The early history of personality measurement focused on negative characteristics such as anxiety, depression, and other manifestations of psychopathology. Research suggests advantages in evaluating individuals’ positive characteristics to understand their resources. Kobasa (1979) studied “hardiness,” and Bandura (1986) studied “self-efficacy,” a strong belief in one’s ability to organize resources and manage situations.

POSITIVE PERSONALITY MEASUREMENT AND THE NEO PERSONALITY INVENTORY-REVISED (NEO-PI-R)
The developers of the NEO-PI-R (Costa & McCrae, 1985) used both factor analysis and theory in item development and scale construction. It is a multipurpose inventory for predicting interests, health and illness behavior, psychological well-being, and characteristic coping styles. Based on a review of factor analytic studies and personality theory, the authors identified 3 broad domains: neuroticism (N), extroversion (E), and openness (O). Each domain has six facets.

FACETS OF EACH DOMAIN IN THE NEO-PI-R
Neuroticism (N): anxiety, hostility, depression, self-consciousness, impulsiveness, and vulnerability.
Extroversion (E): warmth, gregariousness, assertiveness, activity, excitement seeking, and positive emotions.
Openness (O): fantasy, aesthetics, feelings (openness to feelings of self and others), actions (willingness to try new activities), ideas (intellectual curiosity), and values.

THE FIVE-FACTOR MODEL OF PERSONALITY
Research with the NEO has supported five dimensions, considered the minimum number needed to describe human personality:
1. Extroversion: the degree to which a person is sociable, leader-like, and assertive as opposed to withdrawn, quiet, and reserved.
2. Neuroticism: the degree to which a person is anxious and insecure as opposed to calm and self-confident.
3. Conscientiousness: the degree to which a person is persevering, responsible, and organized as opposed to lazy, irresponsible, and impulsive.
4. Agreeableness: the degree to which a person is warm and cooperative as opposed to unpleasant and disagreeable.
5. Openness to experience: the degree to which a person is imaginative and curious as opposed to concrete-minded and narrow in thinking.

FREQUENTLY USED MEASURES OF POSITIVE PERSONALITY TRAITS
Rosenberg Self-Esteem Scale
General Self-Efficacy Scale
Ego Resiliency Scale
Dispositional Resilience Scale
Hope Scale
Life Orientation Test-Revised
Satisfaction with Life Scale
Positive and Negative Affect Schedule
Coping Inventory for Stressful Situations
Core Self-Evaluations
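Guilford’s and Cattell’s work, and the five-factor research with the NEO, all start from correlations among tests or items. The sketch below covers only two preliminary steps, not a full factor analysis: computing the Pearson intercorrelation of every pair of score lists (the matrix a factor analysis would start from), and retaining items whose correlation with a trait score reaches a cutoff, in the spirit of Cattell’s item selection. The 0.30 cutoff, the use of the absolute value, the data, and all function names are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


def intercorrelations(scores: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every pair of tests (the input to a factor analysis)."""
    return {(a, b): pearson(scores[a], scores[b])
            for a, b in combinations(scores, 2)}


def retain_items(items: Dict[str, List[float]], trait_score: List[float],
                 cutoff: float = 0.30) -> List[str]:
    """Keep items whose |r| with a trait (factor) score reaches `cutoff`.

    Using |r| also keeps negatively related items, which would be
    scored in the reverse direction.
    """
    return [name for name, vals in items.items()
            if abs(pearson(vals, trait_score)) >= cutoff]


if __name__ == "__main__":
    tests = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
    print(intercorrelations(tests))
    items = {"x": [1, 2, 3, 4], "y": [1, 0, 0, 1]}
    print(retain_items(items, [1, 2, 3, 4]))  # prints ['x']
```

Extracting the factors themselves (e.g., from the eigenstructure of the correlation matrix) is a further step that this sketch deliberately leaves out.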