Psychological Testing and Assessment
Uploaded by RelaxedDifferential
National University
Summary
This document provides a comprehensive overview of psychological testing and assessment, tracing its historical roots from early 20th-century France to modern applications. It explores various assessment methods, including retrospective, remote, ecological momentary assessment, and collaborative assessment, along with different tools and procedures used in the process. The document also examines the roles of various stakeholders in assessment, such as test developers, test users, and test-takers.
Full Transcript
Chapter 1: Psychological Testing and Assessment

Testing and Assessment
The roots of contemporary psychological testing and assessment can be found in early 20th-century France.
1905: Alfred Binet and a colleague published a test designed to help place Paris schoolchildren in appropriate classes.
- The test aimed to identify intellectually disabled children.
Within a decade, an English version of Binet's test was created for US schools.
In 1917, as the US entered WWI, the military used psychological testing to quickly screen recruits for intellectual and emotional issues.
During WWII, the military depended even more on psychological tests to screen recruits for service.
William Stern developed a refined method of scoring Binet's test, the Intelligence Quotient (IQ).
During WWI, the term "testing" described the group screening of thousands of military recruits.
By WWII, a semantic distinction between testing and a more inclusive term, "assessment," began to emerge.

Psychological assessment: The gathering and integration of psychology-related data to make a psychological evaluation that is accomplished through the use of various tools.
Psychological testing: The process of measuring psychology-related variables using devices or procedures designed to obtain a sample of behavior.

Testing
- To obtain some gauge, usually numerical, about an ability or attribute.
- It can be done individually or in groups, with results based on the number of correct answers.
- The tester is not key to the process.
- It requires technician-like skills in administering and scoring as well as in interpreting a test result.
- It yields a test score or series of test scores.

Assessment
- To answer a referral question, solve a problem, or decide on the use of tools of evaluation.
- It is individualized, focusing on how a person processes.
- The assessor is key to the process.
- It requires careful tool selection, evaluation skills, and thoughtful data organization.
- It involves a problem-solving approach using various data to address the referral question.

Varieties of assessment
Retrospective: It involves using evaluative tools to determine a person's past psychological state.
Remote: It involves using psychological tools to evaluate someone remotely.
Ecological momentary assessment (EMA): It is the "in the moment" evaluation of specific problems and related cognitive and behavioral variables as they occur.
Collaborative psychological assessment: The assessor and assessee may work as "partners" from initial contact through final feedback.
Therapeutic psychological assessment: Therapeutic self-discovery and new understandings are encouraged throughout the assessment process (Stephen Finn).
Dynamic assessment: An interactive approach to psychological assessment that usually follows a model of (1) evaluation, (2) intervention of some sort, and (3) evaluation.

The Tools of Psychological Assessment
Psychological test: A device or procedure designed to measure psychology-related variables.
- Content: subject matter
- Format: form, plan, structure, arrangement, and layout of test items as well as related considerations such as time limits.
- Administration procedures: individual or group
- Scoring and interpretation procedures:
  - Score: A code or summary statement that reflects an evaluation of performance on a test, task, interview, or behavior sample.
  - Scoring: The process of assigning evaluative codes or statements to performance on tests, tasks, interviews, or behavior samples.
  - Cut score: A numerical reference point used to classify a set of data into two or more categories.
- Psychometric soundness or technical quality: How consistently and how accurately a psychological test measures what it purports to measure.
- Utility: The usefulness or practical value that a test or other tool of assessment has for a particular purpose.
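A cut score like the one defined above can be applied mechanically. A minimal sketch in Python; the cut value of 70 and the pass/fail labels are hypothetical, not from the notes:

```python
# Hypothetical cut score: a numerical reference point that classifies
# a set of scores into two categories.
CUT_SCORE = 70

def classify(score, cut=CUT_SCORE):
    # Scores at or above the cut score fall in one category, the rest in the other
    return "pass" if score >= cut else "fail"

labels = [classify(s) for s in [55, 70, 88]]
print(labels)  # ['fail', 'pass', 'pass']
```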
Interview: A method of gathering information through direct communication involving reciprocal exchange.
- Panel interview (or board interview): More than one interviewer participates in the assessment.
- Motivational interviewing: A therapeutic dialogue that blends person-centered listening with cognition-altering techniques to enhance motivation and facilitate change.
Portfolio: Samples of one's ability and accomplishment.
Case History Data: Records and transcripts in various forms that preserve archival information and data relevant to an assessee.
- Case study (or case history): A report or illustrative account concerning a person or an event that was compiled based on the case history data.
- Groupthink: The result of varied forces that drive decision-makers to reach a consensus.
Behavioral observation: Monitoring actions through visual or electronic means while recording quantitative and qualitative data.

Who?
Test developer and publisher: Create tests or other methods of assessment.
Test user: A trained professional who administers and interprets psychological or educational tests.
Test-taker: Anyone who is the subject of an assessment or an evaluation.
- Psychological autopsy: A reconstruction of a deceased individual's psychological profile based on archival records, artifacts, and interviews with those who knew them.
Society at large: Community's influence on psychological assessment standards and practices to support mental health and well-being.
Other settings: Organizations, companies, and governmental agencies.

In What Types of Settings Are Assessments Conducted, and Why?
Educational setting:
- As mandated by law, tests are administered early in school life to help identify children who may have special needs.
- Achievement test: Evaluates accomplishment or the degree of learning that has taken place.
- Diagnostic test: Helps narrow down and identify areas of deficit to be targeted for intervention.
- Informal evaluation: Nonsystematic assessment that leads to the formation of an opinion or attitude.
Clinical setting:
- Helps screen for or diagnose behavior problems.
- Testing is conducted individually, while group testing is mainly for screening individuals needing further evaluation.
Counseling setting:
- The ultimate objective is the improvement of the assessee in terms of adjustment, productivity, or some related variable.
Geriatric setting:
- Quality of life: Evaluated variables related to perceived stress, loneliness, sources of satisfaction, personal values, quality of living conditions, and quality of friendships and other social support.
Business and military setting:
- It is used primarily for decision-making regarding personnel careers.
- It is involved in the engineering and design of products and environments.
- It is involved in taking the pulse of consumers.

- Natural observation: Observing in a setting in which the behavior would typically be expected to occur.
Role-Play Test: Assessees are directed to act as if they were in a particular situation.
Computer:
- Local processing: on-site
- Central processing: at some central location
- Teleprocessing: phone lines
- Simple scoring report: A mere listing of a score or scores.
- Extended scoring report: Includes statistical analyses of the test-taker's performance.
- Interpretive report: Includes numerical or narrative interpretive statements in the report.
- Consultative report: Written in professional language to provide expert analysis of the data.
- Integrative report: Employs previously collected data into the test report.
- CAT (computer adaptive testing): The computer's ability to tailor the test to the test-taker's ability or test-taking pattern.
- CAPA (computer-assisted psychological assessment): The assistance computers provide to the test user.
Other tools: video and physiological devices

Who, What, Why, How, and Where?
Governmental and organizational credentialing:
- Governmental licensing, certification, or general credentialing of professionals.
Academic research setting:
- It is essential for measurement, and researchers should have a strong understanding of measurement principles and assessment tools.
Other setting: court, program evaluation, health psychology
- Health psychology: Focuses on the impact of psychological variables on the onset, course, treatment, and prevention of illness and disability.

How Are Assessments Conducted?
- Protocol: The form, sheet, or booklet on which a test-taker's responses are entered.
- Rapport: A working relationship between the examiner and the examinee.
- Accommodation: The adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.
- Alternative assessment: An evaluative procedure that deviates from standard measurement methods, either through special accommodations or alternative techniques to assess the same variables.

Chapter 2: Historical, Cultural, and Legal/Ethical Considerations

A Historical Perspective
The first systematic tests were developed in China as early as 2200 B.C.E. as a means of selecting people for government jobs.
The Ancient Egyptian and Greco-Roman cultures also had specific ideas relating to mental health and personality but no formal means of psychological assessment.
By the 18th century, Christian von Wolff had envisioned psychology as a science and psychological measurement as a specialized field within it.
Darwin's interest in individual differences led his half-cousin, Francis Galton, to devise several measures for psychological variables such as questionnaires, rating scales, and self-report inventories.
In Germany, Wilhelm Wundt started the first experimental psychology laboratory and measured variables such as reaction time, perception, and attention span.
1890: James McKeen Cattell coined the term mental test.
Charles Spearman: Originated the concept of test reliability and built the mathematical framework for the statistical technique of factor analysis.
Victor Henri: Collaborated with Alfred Binet on papers suggesting how mental tests could be used to measure higher mental processes.
Emil Kraepelin: An early experimenter with the word association technique as a formal test.
Lightner Witmer: "Little-known founder of clinical psychology"
The 20th century brought the first tests of abilities such as intelligence.
1905: Alfred Binet and Theodore Simon developed the first intelligence test to identify mentally retarded Paris schoolchildren.
1939: David Wechsler introduced a test designed to measure adult intelligence.
- Originally known as the Wechsler-Bellevue Intelligence Scale, it was later renamed the Wechsler Adult Intelligence Scale (WAIS).
WWI and WWII brought the need for large-scale testing of the intellectual ability of recruits.
After WWII, psychologists increasingly used tests in large corporations and private organizations.

Where To Go for Authoritative Information: Reference Sources
Test catalogs: Contain only a brief description of the test and seldom contain the kind of detailed information that a prospective user might require.
- The objective is only to sell the test.
Test manuals: Detailed information concerning the development of a particular test and technical information relating to it.
Professional books: An in-depth discussion of a test that offers assessment students and professionals insights and actionable knowledge from experienced practitioners.
Reference volumes: Provide comprehensive information on assessment principles, tools, methodologies, and best practices for various contexts.
Journal articles: Contain reviews of the test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in either research or an applied context.
Online databases: Offer access to research articles, assessments, measurement tools, and relevant literature on assessment methodologies and practices.
Other sources: Unpublished tests and measures, and university libraries.

Robert Woodworth: Developed a measure of adjustment and emotional stability that could be administered quickly and efficiently to groups of recruits.
- To disguise the true purpose of the test, it was labeled as a "Personal Data Sheet".
- After the war, he developed a personality test for civilian use called the Woodworth Psychoneurotic Inventory, a self-report test.
- Self-report: Assessees supply assessment-related information by responding to questions, keeping a diary, or self-monitoring thoughts or behaviors.
Projective test: An individual is assumed to "project" onto some ambiguous stimulus his/her own unique needs, fears, hopes, and motivation.
- The best-known of all projective tests is the Rorschach, a series of inkblots developed by Hermann Rorschach.
- The use of pictures as projective stimuli was popularized in the late 1930s by Henry Murray and Christiana Morgan.

Discrimination: The practice of favoring majority group members in hiring or promotion decisions, regardless of qualifications.
Reverse discrimination: The practice of favoring diverse individuals in hiring or promotion decisions, regardless of qualifications.
Disparate treatment: The result of an employer's hiring or promotion practice intentionally designed to produce discrimination.
Disparate impact: The unintended discriminatory outcome of an employer's hiring or promotion practice.

Test User Qualifications
1950: The APA committee published the Ethical Standards for the Distribution of Psychological Tests and Diagnostic Aids, defining three test levels based on the required knowledge of testing and psychology.
- Level A: Tests that can be administered, scored, and interpreted using the manual and a basic understanding of the institution or organization.
- Level B: Tests that require technical knowledge of test construction, use, and related fields like statistics, individual differences, and psychology.
- Level C: Tests requiring extensive knowledge of testing, related psychological fields, and supervised experience.

CAPA's major issues:
- Comparability of pencil-and-paper and computerized versions of tests
- The value of computerized test interpretations
- Unprofessional, unregulated "psychological testing" online

Academic tradition: Researchers at universities throughout the world use the tools of assessment to help advance knowledge and understanding of human and animal behavior.
Applied tradition: Helps select applicants for various positions based on merit.

Culture and Assessment
Early psychological testing of immigrant populations by Henry Goddard was controversial; he found that the majority of the immigrant population was "feeble-minded".
- His findings stemmed from a translated Stanford-Binet intelligence test that overestimated mental deficiency in native English speakers and immigrants.
Culture-specific tests: Designed for use with people from one culture but not from another.
Affirmative action: Efforts by governments, employers, and schools to combat discrimination and promote equal opportunity in education and employment.

Legal and Ethical Considerations
Code of professional ethics: A set of guidelines and principles that govern the conduct of professionals within a specific field.

The Rights of Test-Takers
The right of informed consent
- Informed consent: Test-takers have a right to know why they are being evaluated, how the test data will be used, and what information will be released to whom.
The right to be informed of test findings
The right to privacy and confidentiality
- Privacy right: Affirms an individual's right to decide when and how much to share or withhold personal beliefs and opinions.
- Privileged information: Information shared between parties who communicate with each other in the context of certain relationships.
- Confidentiality: Concerns matters of communication outside the courtroom.
The right to the least stigmatizing label

- Standard of care: The level at which a reasonable professional provides diagnostic or therapeutic services under similar conditions.
Minimum competency testing programs: Formal testing programs designed to be used in decisions regarding various aspects of students' education.
Truth-in-testing legislation
- The objective was to provide test-takers with a means of learning the criteria by which they are being judged.

Chapter 3: A Statistics Refresher

Scales of Measurement
Measurement: The act of assigning numbers or symbols to characteristics of things according to rules.
Scale: A set of numbers (or other symbols) whose properties model the empirical properties of the objects to which the numbers are assigned.
- Continuous: Has infinite values within a range, allowing precise quantification of smoothly varying variables.
- Discrete: Consists of distinct, separate values or categories, used for counting variables that cannot have fractional values.

Measure of central tendency: Indicates the average or midmost score between the extreme scores in a distribution.
- Mean: The average, calculated by dividing the sum of all values in a dataset by the number of values.
  - Interval and ratio data
- Median: The middle score of the distribution.
  - Ordinal, interval, and ratio data
  - Useful in cases where relatively few scores fall at the high end of the distribution or relatively few scores fall at the low end of the distribution.
- Mode: The most frequently occurring score in a distribution.
  - Bimodal distribution: Two scores that occur with the highest frequency.

Error: The collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement.

Nominal Scales: Involve classification or categorization based on one or more distinguishing characteristics.
Ordinal Scales: Rank-ordering on some characteristic is also permissible.
Interval Scales: Contain equal intervals and have no absolute zero point.
Ratio Scales: Have a true zero point.

Describing Data
Distribution: A set of test scores arrayed for recording or study.
Raw score: A straightforward, unmodified accounting of performance that is usually numerical.
Frequency distribution: All scores are listed alongside the number of times each score occurred.
- Simple frequency distribution: Indicates that individual scores have been used and the data have not been grouped.
- Grouped frequency distribution: Test-score intervals (or class intervals) replace the actual test scores.
Graph: A diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data.

Variability: An indication of how scores in a distribution are scattered or dispersed.
Measures of variability: Describe the amount of variation in a distribution.
- Range: Equal to the difference between the highest and the lowest scores in a distribution.
- Interquartile range: Equal to the difference between Q3 and Q1.
  - Semi-interquartile range: Equal to the interquartile range divided by 2.
  - Quartiles: The dividing points between the four quarters in the distribution.
- Standard deviation: Equal to the square root of the average squared deviations about the mean.
- Variance: Equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
Skewness: The nature and extent to which symmetry is absent.
- Positive skew: Relatively few of the scores fall at the high end of the distribution.
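The measures of central tendency and variability defined in this refresher can be computed directly with Python's standard statistics module; the score list is hypothetical:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical distribution of test scores

# Central tendency
mean = sum(scores) / len(scores)          # sum of all values / number of values
median = statistics.median(scores)        # middle score of the ordered distribution
mode = statistics.mode(scores)            # most frequently occurring score

# Variability
rng = max(scores) - min(scores)           # range: highest minus lowest score
variance = statistics.pvariance(scores)   # mean of squared deviations about the mean
sd = statistics.pstdev(scores)            # standard deviation: square root of variance
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                             # interquartile range: Q3 - Q1
semi_iqr = iqr / 2                        # semi-interquartile range
```

Note the population formulas (pvariance, pstdev) are used here; sample versions (variance, stdev) divide by n - 1 instead.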
- Negative skew: Relatively few of the scores fall at the low end of the distribution of test scores.
Kurtosis: The steepness of a distribution.
- Platykurtic: Relatively flat
- Leptokurtic: Relatively peaked
- Mesokurtic: Somewhere in the middle

- Histogram: Has vertical lines drawn at the true limits of each test score, forming a series of contiguous rectangles.
  - Abscissa: X axis
  - Ordinate: Y axis
- Bar graph: Numbers indicative of frequency appear on the Y-axis, and reference to some categorization appears on the X-axis.
- Frequency polygon: Expressed by a continuous line connecting the points where test scores meet frequencies.

Meta-analysis: A family of techniques used to statistically combine information across studies to produce single estimates of the data under study.
- Effect size: The estimates derived.
- Evidence-based practice: Practice based on clinical and research findings.

The Normal Curve
Development of the concept of a normal curve began in the middle of the 18th century with the work of Abraham DeMoivre and, later, the Marquis de Laplace.
- Formerly known as the "Laplace-Gaussian curve".
Karl Pearson: The first to refer to the curve as the normal curve.
Normal curve: A bell-shaped, smooth, and mathematically defined curve that is highest at its center.
Tail: The area on the normal curve between 2 and 3 standard deviations above the mean (and, symmetrically, between 2 and 3 standard deviations below it).
Standard Score: A raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
- Z score: Results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.

Chapter 4: Of Tests and Testing

Assumption 1: Psychological Traits and States Exist
Trait: Any distinguishable, relatively enduring way in which one individual varies from another.
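The proportion of the normal curve falling in a given region follows from the standard cumulative distribution function, Phi(z) = (1 + erf(z / sqrt(2))) / 2 (a standard result, not spelled out in the notes). A sketch:

```python
import math

def phi(z):
    # Cumulative proportion of the normal curve falling below z standard deviations
    return (1 + math.erf(z / math.sqrt(2))) / 2

within_1_sd = phi(1) - phi(-1)   # proportion between -1 and +1 SD, about 0.68
tail_2_to_3 = phi(3) - phi(2)    # proportion between +2 and +3 SD, about 0.02
```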
- T scores: (Also called the "fifty plus or minus ten" scale) A scale with a mean set at 50 and a standard deviation set at 10.
  - Devised by W.A. McCall.
- Stanine: A contraction of the words standard and nine.
- Linear transformation: Retains a direct numerical relationship to the original raw score.
- Nonlinear transformation: Required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made.
Normalizing a distribution: Involves "stretching" the skewed curve into the shape of a normal curve.
- Normalized standard score scale: A corresponding scale of standard scores.

Correlation and Inference
Coefficient of correlation (or correlation coefficient): Provides an index of the strength of the relationship between two things.
Correlation: An expression of the degree and direction of correspondence between two things.

States: Distinguish one person from another but are relatively less enduring.
A psychological trait exists only as a construct.
- Construct: An informed, scientific concept developed or constructed to describe or explain behavior.
Overt behavior: An observable action or the product of an observable action, including test- or assessment-related responses.
Assumption 2: Psychological Traits and States Can Be Quantified and Measured
- Cumulative scoring: A trait is measured by a series of test items.
Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
Assumption 4: All Tests Have Limits and Imperfections
Assumption 5: Various Sources of Error Are Part of the Assessment Process
- Error variance: The component of a test score attributable to sources other than the trait or ability measured.
Assumption 6: Unfair and Biased Assessment Procedures Can Be Identified and Reformed
Assumption 7: Testing and Assessment Offer Powerful Benefits to Society
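The z-score and T-score conversions described above are simple linear transformations; a sketch with hypothetical raw scores:

```python
import statistics

raw_scores = [10, 12, 14, 16, 18]      # hypothetical raw scores
mean = statistics.mean(raw_scores)
sd = statistics.pstdev(raw_scores)

def z_score(x):
    # Number of SD units the raw score lies below or above the mean
    return (x - mean) / sd

def t_score(x):
    # Linear transformation to the "fifty plus or minus ten" scale
    return 50 + 10 * z_score(x)
```

Because the transformation is linear, the relative standing of every score is preserved; a nonlinear transformation (as in normalizing a skewed distribution) would not have this property.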
Pearson r: Used when the two variables being correlated are continuous and linear.
- Devised by Karl Pearson.
- Coefficient of determination: An indication of how much variance is shared by the X- and the Y-variables.
Spearman Rho: Frequently used when the sample size is small (fewer than 30 pairs of measurements) and when both sets of measurements are in ordinal form.
- Developed by Charles Spearman.
Scatterplot: A simple graphing of the coordinate points for the X-variable and Y-variable values.
- Curvilinearity: An "eyeball gauge" of how curved a graph is.
- Outlier: An extremely atypical point located at a relatively long distance—an outlying distance—from the other coordinate points in a scatterplot.

Norms
Norm-referenced testing and assessment: A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test-taker's score and comparing it to scores of a group of test-takers.
Norms: The test performance data of a particular group of test-takers that are designed for use as a reference when evaluating or interpreting individual test scores.
Normative sample: The group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test-takers.
Norming: The process of deriving norms.
- Race norming: The controversial practice of norming based on race or ethnic background.
Subgroup norms: A normative sample that can be segmented by any of the criteria initially used in selecting subjects for the sample.
Local norms: Provide normative information with respect to the local population's performance on some test.
User norms or program norms: Consist of descriptive statistics based on a group of test-takers in a given period of time rather than norms obtained by formal sampling methods.
Standardization or test standardization: The process of administering a test to a representative sample of test-takers for the purpose of establishing norms.
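Pearson r and Spearman rho can be sketched as follows (population formulas; the simple ranking here ignores ties, which a full implementation would average):

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    # Pearson r: covariance of x and y divided by the product of their SDs
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def spearman_rho(x, y):
    # Spearman rho: Pearson r computed on the ranks of the measurements
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson_r(ranks(x), ranks(y))
```

A usage check: perfectly linear data give r = 1, while data that rise monotonically but not linearly can have rho closer to 1 than r.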
Sample: A portion of the universe of people deemed to be representative of the whole population.
Sampling: The process of selecting the portion of the universe deemed to be representative of the whole population.
- Stratified sampling: A method where a population is divided into subgroups (strata) based on specific characteristics, and samples are taken from each stratum to ensure representation.
  - Stratified-random sampling: Stratified sampling in which the selection within each stratum is random.
- Purposive sampling: If we arbitrarily select some sample because we believe it to be representative of the population.
- Incidental or convenience sample: One that is convenient or available for use.

Types of Norms
Percentile: An expression of the percentage of people whose score on a test or measure falls below a particular raw score.
- Percentage correct: Refers to the distribution of raw scores—to the number of items that were answered correctly multiplied by 100 and divided by the total number of items.

Fixed reference group scoring systems: The distribution of scores obtained on the test from one group of test-takers—referred to as the fixed reference group—is used as the basis for the calculation of test scores for future administrations of the test.
Criterion-referenced testing and assessment: A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard.
- Also called domain- or content-referenced testing and assessment.

Chapter 5: Reliability

Reliability coefficient: Quantifies reliability, ranging from 0 (not at all reliable) to 1 (perfectly reliable).
Measurement error: The inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes.
- Random error: Consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process.
- Systematic error: Systematic errors do not cancel each other out because they influence test scores in a consistent direction.
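The percentile and percentage-correct conversions defined under Types of Norms reduce to one-line formulas; a sketch with hypothetical data:

```python
def percentile_rank(raw_score, scores):
    # Percentage of people whose score falls below the given raw score
    below = sum(1 for s in scores if s < raw_score)
    return 100 * below / len(scores)

def percentage_correct(num_correct, total_items):
    # Items answered correctly, multiplied by 100 and divided by total items
    return num_correct * 100 / total_items

group = [60, 65, 70, 75, 80, 85, 85, 90, 95, 100]  # hypothetical norm group
p = percentile_rank(85, group)   # norm-referenced: standing relative to the group
pc = percentage_correct(42, 50)  # criterion-referenced flavor: mastery of the items
```

The pairing illustrates the norm-referenced vs. criterion-referenced contrast: the first number only has meaning relative to the comparison group, the second relative to the test content itself.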
Age norms (or age-equivalent scores): The average performance of different samples of test-takers who were at various ages at the time the test was administered.
Grade norms: The average test performance of test-takers in a given school grade.
- Developmental norms: Norms developed based on any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.
National norms: Derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.
National anchor norms: An equivalency table for scores on the two tests.
- Equipercentile method: The equivalency of scores on different tests is calculated with reference to corresponding percentile scores.

- Bias: The degree to which a measure predictably overestimates or underestimates a quantity.
Variance: The standard deviation squared.
- True variance: Variance from true differences.
- Error variance: Variance from irrelevant, random sources.
Reliability: The proportion of the total variance attributed to true variance.

Sources of Error Variance
Test construction
- Item sampling or content sampling: Variation among items within a test as well as variation among items between tests.
Test administration
- Test-taker's attention or motivation
- Test environment
- Test-taker variables
- Examiner-related variables
Test scoring and interpretation

Reliability Estimates
Test-retest reliability: An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
- Heterogeneous: An estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability.
Dynamic vs. static characteristics
- Dynamic characteristic: A trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
- Static characteristic: A trait, state, or ability presumed to be relatively unchanging.
Restriction or inflation of range
Speed tests vs. power tests
- Power test: When a time limit is long enough to allow test-takers to attempt all items, and if some items are so difficult that no test-taker is able to obtain a perfect score.
- Speed test: Contains items of uniform level of difficulty so that, when given generous time limits, all test-takers should be able to complete all the test items correctly.
Criterion-referenced tests
- It provides an indication of where a test-taker stands with respect to some variable or criterion.
Classical Test Theory: A framework used in psychometrics to assess the reliability and validity of psychological tests.

- Coefficient of reliability: The estimate of test-retest reliability.
Coefficient of equivalence: The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability.
Parallel forms: For each form of the test, the means and the variances of observed test scores are equal.
- Parallel forms reliability: An estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Alternate forms: Simply different versions of a test that have been constructed so as to be parallel.
- Alternate forms reliability: An estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.
Internal consistency estimate of reliability (or estimate of inter-item consistency): An estimate of the reliability of a test that can be obtained without developing an alternate form of the test and without having to administer the test twice to the same people.
Split-half reliability: Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
- Odd-even reliability: An estimate of split-half reliability.
- Spearman-Brown formula: Estimates internal consistency reliability from a correlation between two halves of a test.
Inter-item consistency: The degree of correlation among all the items on a scale.
Coefficient alpha: The mean of all possible split-half correlations.
- Developed by Cronbach.
Inter-scorer reliability: The degree of agreement or consistency between two or more scorers with regard to a particular measure.
- Coefficient of inter-scorer reliability: Measures the consistency of scores given by different evaluators.

The Nature of the Test
Homogeneity vs. heterogeneity of test items
- Homogeneous: A test that is functionally uniform throughout.

Domain sampling theory: Seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
Generalizability theory: A person's test scores vary from testing to testing because of variables in the testing situation.
- Universe: The details of the particular test situation.
- Facets: Include considerations such as the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.
- Universe score: Analogous to a true score in the true score model.
- Generalizability study: Examines how generalizable scores from a particular test are if the test is administered in different situations.
- Coefficients of generalizability: Measure the extent to which test scores or measurements can be generalized across different conditions.
- Decision study: Developers examine the usefulness of test scores in helping the test user make decisions.
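The split-half, Spearman-Brown, and coefficient-alpha estimates above can be illustrated on a small hypothetical item-response matrix. The notes describe alpha as the mean of all possible split-half correlations; the sketch below uses the equivalent, more common variance formula instead:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical item-response matrix: rows = test-takers, columns = items (1 = correct)
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# Odd-even split-half: correlate totals on odd items with totals on even items
odd = [sum(row[0::2]) for row in responses]
even = [sum(row[1::2]) for row in responses]
r_half = pearson_r(odd, even)

# Spearman-Brown: step the half-test correlation up to full-test length
r_sb = 2 * r_half / (1 + r_half)

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
k = len(responses[0])
item_var = [pstdev([row[j] for row in responses]) ** 2 for j in range(k)]
total_var = pstdev([sum(row) for row in responses]) ** 2
alpha = (k / (k - 1)) * (1 - sum(item_var) / total_var)
```

Note how the Spearman-Brown estimate exceeds the raw half-test correlation: halving a test lowers its reliability, so the formula corrects the split-half value back to full length.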
Item Response Theory: A framework used in psychometrics that models the relationship between an individual’s latent traits and their probability of answering items correctly on a test.
- Also called latent-trait theory.
- Discrimination: The degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.
- Dichotomous test items: Test items or questions that can be answered with only one of two alternative responses.
- Polytomous test items: Test items or questions with 3 or more alternative responses.
- Rasch model: Estimates the probability of a correct response based on the difficulty of an item and the ability of the respondent.
Standard error of measurement: A tool used to estimate or infer the extent to which an observed score deviates from a true score.
- Standard error of a score: An index of the extent to which one individual’s scores vary over tests presumed to be parallel.
- Confidence interval: A range or band of test scores that is likely to contain the true score.
- Standard error of the difference: A
coverage, the organization of the items in the test, and so forth.
Criterion-related validity: A judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest.
- Concurrent validity: An index of the degree to which a test score is related to some criterion measure obtained at the same time.
- Predictive validity: An index of the degree to which a test score predicts some criterion measure.
- Base rate: The extent to which a particular trait, behavior, characteristic, or attribute exists in the population.
- Hit rate: The proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.
- Miss rate: The proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.
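Two quantities from these notes lend themselves to a short sketch: the Rasch model's response probability, and the standard error of measurement with the confidence interval built from it. The SEM formula used here, SD × √(1 − r), is the conventional classical-test-theory form; all numeric values are made up for illustration.

```python
import math

def rasch_p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: P(correct) rises with (ability - difficulty) on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def sem(sd: float, reliability: float) -> float:
    """Conventional CTT standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval_95(observed: float, sd: float, reliability: float):
    """A roughly 95% band (observed +/- 1.96 SEM) likely to contain the true score."""
    e = 1.96 * sem(sd, reliability)
    return observed - e, observed + e

# When ability equals item difficulty, the probability of a correct answer is .50.
p = rasch_p_correct(ability=1.0, difficulty=0.0)  # able examinee, middling item
lo, hi = confidence_interval_95(110, sd=15, reliability=0.91)
```

With SD = 15 and reliability .91, the SEM is 4.5 points, so an observed score of 110 carries a 95% band of roughly 101 to 119 — a concrete reminder of why single observed scores should not be over-interpreted.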
statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
Chapter 6: Validity
Validity: A judgment or estimate of how well a test measures what it purports to measure in a particular context.
Inference: A logical result or deduction.
Validation: The process of gathering and evaluating evidence about validity.
Validation studies: Assess the accuracy and reliability of a test or measurement tool by examining how well it measures what it is intended to measure.
- Local validation studies: Conducted in a specific setting or population to confirm that a test is valid and reliable for that particular group, as test performance can vary based on factors like culture, language, or demographics.
Face validity: A judgment concerning how relevant the test items appear to be.
Content validity: A judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
- False positive: A miss wherein the test predicted that the test-taker did possess the particular characteristic or attribute being measured when in fact the test-taker did not.
- False negative: A miss wherein the test predicted that the test-taker did not possess the particular characteristic or attribute being measured when the test-taker did.
Criterion: The standard against which a test or a test score is evaluated.
Characteristics of a criterion
- Relevant
- Valid
- Uncontaminated
- Criterion contamination: A criterion measure that has been based, at least in part, on predictor measures.
Validity coefficient: A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
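The base rate, hit rate, and the two kinds of misses defined in these notes can be sketched from a simple tally of test decisions against actual status. The counts below are invented, and the "hit" convention used here (any correct classification, positive or negative) is one common reading; textbook definitions vary.

```python
# Hypothetical screening results as (test_says_positive, actually_positive) pairs.
results = [
    (True, True), (True, True), (True, False),   # 2 true positives, 1 false positive
    (False, True),                               # 1 false negative (a miss)
    (False, False), (False, False),              # 2 true negatives
]

n = len(results)

# Base rate: proportion of the group that actually has the attribute.
base_rate = sum(actual for _, actual in results) / n

# Hit rate: proportion of people the test classifies correctly.
hits = sum(pred == actual for pred, actual in results) / n

# The two kinds of misses distinguished above.
false_positives = sum(pred and not actual for pred, actual in results) / n
false_negatives = sum(actual and not pred for pred, actual in results) / n
```

The practical point of the base rate is that the same test can look impressive or useless depending on how common the attribute is in the population being screened.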
- Test blueprint: A plan regarding the types of information to be covered by the items, the number of items tapping each area of
- Intercept bias: Occurs when the use of a predictor results in consistent underprediction or overprediction of a specific group’s performance or outcomes.
- Slope bias: Occurs when a predictor has a weaker correlation with an outcome for specific groups.
Rating: A numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a rating scale.
Rating error: A judgment resulting from the intentional or unintentional misuse of a rating scale.
- Leniency error (or generosity error): An error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading.
- Severity error: At the other extreme.
Incremental validity: The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
Construct validity: A judgment about the appropriateness of inferences drawn from the test scores regarding individual standings on a variable called a construct.
- Construct: An informed, scientific idea developed or hypothesized to describe or explain behavior.
Evidence of Construct Validity
Evidence of homogeneity
- Homogeneity: How uniform a test is in measuring a single concept.
Evidence of changes with age
Evidence of pretest—post-test changes
Evidence from distinct groups
- Method of contrasted groups: Demonstrate that scores on the test vary predictably as a function of membership in some groups.
Convergent evidence: Scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same construct.
Discriminant evidence: A validity coefficient showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not theoretically be correlated.
- Multitrait-multimethod matrix: The matrix or table that results from correlating variables within and between methods.
- Convergent validity: The correlation between measures of the same trait but different methods.
- Discriminant validity: The correlations of different traits via different methods are near zero.
- Method variance: The similarity in scores due to use of the same method.
Factor analysis: A class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
- Central tendency error: The rater, for some reason, exhibits a general and systematic reluctance to give ratings at either the positive or negative extreme.
Rankings: A procedure that requires the rater to measure individuals against one another instead of against an absolute scale.
- Halo effect: A tendency to give a particular ratee a higher rating than the ratee objectively deserves because the rater fails to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior.
Test fairness: The extent to which a test is used in an impartial, just, and equitable way.
Chapter 7: Utility
Test utility: The usefulness or practical value of testing to improve efficiency.
Factors that affect a test’s utility
- Psychometric soundness
- Costs
- Benefits
Utility analysis: A family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or
practical value of a tool of assessment.
- Exploratory factor analysis: Entails estimating or extracting factors; deciding how many factors to retain; and rotating factors to an interpretable orientation.
- Confirmatory factor analysis: Tests the degree to which a hypothetical model fits the data.
- Factor loading: Conveys information about how much the factor determines the test score or scores.
Bias: A factor inherent in a test that systematically prevents accurate, impartial measurement.
histogram containing items deemed to be of equivalent value.
- Bookmark method: A standard-setting technique where experts set cut scores by marking the point on an ordered list of test items where a specific performance level begins.
- Other methods
- Method of predictive yield: A technique for setting cut scores which took into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores.
How is a utility analysis conducted?
- Expectancy data
- Taylor-Russell tables: Provide an estimate of the extent to which the inclusion of a particular test in the selection system will improve selection.
- Naylor-Shine tables: Entail obtaining the difference between the means of the selected and unselected groups to derive an index of what the test (or other tool of assessment) is adding to already established procedures.
- Brogden-Cronbach-Gleser formula: Used to calculate the dollar amount of utility gain resulting from the use of a particular selection instrument under specified conditions.
- Utility gain: An estimate of the benefit (monetary or otherwise) of using a particular test or selection method.
- Productivity gain: An estimated increase in work output.
- Decision theory and test utility
Some practical considerations
- The pool of job applicants
- The complexity of the job
- The cut score in use
- Relative cut score: A reference point that is set based on norm-related considerations rather than on the relationship of test scores to a criterion.
- Norm-referenced cut score: Set with reference to the performance of a group (or some target segment of a group).
- Fixed cut score: A reference point that is typically set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification.
- Multiple cut scores: The use of 2 or more cut scores with reference to one predictor for the purpose of categorizing test-takers.
- Compensatory model of selection: An assumption is made that high scores on one attribute can “balance out” or compensate for low scores on another attribute.
Methods for cutting scores
- Angoff method: Devised by William Angoff, this method for setting fixed cut scores can be applied to personnel selection tasks
- Discriminant analysis: Used to shed light on the relationship between identified variables and two naturally occurring groups.
Chapter 8: Test Development
Test Conceptualization
Some preliminary questions
- What is the test designed to measure?
- What is the objective of the test?
- Is there a need for this test?
- Who will use this test?
- Who will take this test?
- What content will the test cover?
- How will the test be administered?
- What is the ideal format of the test?
- Should more than one form of the test be developed?
- What special training will be required of test users for administering or interpreting the test?
- What types of responses will be required of test-takers?
- Who benefits from an administration of this test?
- Is there any potential for harm as the result of an administration of this test?
- How will meaning be attributed to scores on this test?
Norm-referenced vs.
criterion-referenced tests:
as well as to questions regarding the presence or absence of a particular trait, attribute, or ability.
- Known groups method: Entails collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest.
- IRT-based methods
- Item-mapping method: Entails the arrangement of items in a histogram, with each column in the
- Completion item: Requires the examinee to provide a word or phrase that completes a sentence.
- Short-answer item: A type of question that requires respondents to provide a brief, concise response, typically a few words or sentences, to assess knowledge or understanding of a specific topic.
- Essay item: A test item that requires the test-taker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation.
sentence that requires the test-taker to indicate whether the statement is or is not a fact.
Item development issues
Pilot work: The preliminary research surrounding the creation of a prototype of the test.
Test construction
Scaling: The process of setting rules for assigning numbers in measurement.
Types of scale
- Age-based scale: If the test-taker’s test performance as a function of age is of critical interest.
- Stanine scale: If all raw scores on the test are to be transformed into scores that can range from 1 to 9.
Scaling methods
- Summative scale: The final score is obtained by summing ratings across all the items.
- Likert scale: Used in surveys to measure attitudes or opinions, where respondents indicate their level of agreement or disagreement with a series of statements, typically on a 5- or 7-point scale.
- Method of paired comparisons: Test-takers are presented with pairs of stimuli, which they are asked to compare.
- Sorting
- Comparative scaling: Entails judgments of a stimulus in
comparison with every other stimulus on the scale.
- Categorical scaling: Stimuli are placed into one of 2 or more alternative categories that differ quantitatively with respect to some continuum.
- Guttman scale: Items on it range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
- Scalogram analysis: An item-analysis procedure and approach to test development that involves a graphic mapping of a test-taker’s responses.
Item format: The form, plan, structure, arrangement, and layout of individual test items.
- Selected-response format: Requires test-takers to select a response from a set of alternative responses.
- Multiple-choice format: Has 3 elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives or options variously referred to as distractors or foils.
- Constructed-response format: Requires test-takers to supply or to create the correct answer, not merely to select it.
Item bank: A relatively large and easily accessible collection of test questions.
Computerized adaptive testing (CAT): An interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker’s performance on previous items.
- Floor effect: The diminished utility of an assessment tool for distinguishing test-takers at the low end of the ability, trait, or other attribute being measured.
- Ceiling effect: The diminished utility of an assessment tool for distinguishing test-takers at the high end of the ability, trait, or other attribute being measured.
Item branching: The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.
Scoring items
- Class scoring (or categorical scoring): Test-taker responses earn credit toward placement in a particular class or category with other test-takers whose pattern of
responses is presumably similar in some way.
- Ipsative scoring: Compares a test-taker’s score on one scale within a test to another scale within that same test.
Test Tryout
Item Analysis
Item-difficulty index (or item-endorsement index): The statistic provides not a measure of the percent of people passing the item but a measure of the percent of people who said yes to, agreed with, or otherwise endorsed the item.
Item-reliability index: Provides an indication of the internal consistency of a test; the higher the index, the greater the test’s internal consistency.
Item-validity index: A statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure.
Item-discrimination index: Compares performance on a particular item with performance in the upper and lower regions of a distribution of continuous test scores.
Item-characteristic curve: A graphic representation of item difficulty and discrimination.
Other Considerations in Item Analysis
- Guessing
- Item fairness: The degree, if any, a test item is biased.
- Matching item: The test-taker is presented with two columns: premises on the left and responses on the right.
- Binary-choice item: A multiple-choice item that contains only two possible responses.
- True-false item: A type of selected-response item that usually takes the form of a
- Scoring drift: A discrepancy between the scoring in an anchor protocol and the scoring of another protocol.
The Use of IRT in Building and Revising Tests
- Evaluating the properties of existing tests and guiding test revision.
- Determining measurement equivalence across test-taker populations.
- Differential item functioning: A phenomenon wherein an item functions differently in one group of test-takers as compared to another group of test-takers known to have the same level of the underlying trait.
- DIF analysis: Test developers scrutinize group-by-group item response curves, looking for DIF items.
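The item-difficulty and item-discrimination indices defined in the Item Analysis notes can be sketched with a small worked example. The 0/1 response matrix is invented, and the upper/lower split used for discrimination (top third vs. bottom third) is one common convention among several.

```python
# Invented 0/1 responses (1 = correct), rows sorted by total score, top scorers first.
responses = [
    [1, 1, 1],  # highest scorer
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],  # lowest scorer
]

n = len(responses)
n_items = len(responses[0])

# Item-difficulty index p: proportion of all test-takers answering the item correctly.
# (Despite the name, a HIGHER p means an EASIER item.)
p = [sum(row[i] for row in responses) / n for i in range(n_items)]

# Item-discrimination index d: proportion correct in the upper-scoring group minus
# the proportion correct in the lower-scoring group (here, top third vs. bottom third).
upper, lower = responses[:2], responses[-2:]
d = [
    sum(row[i] for row in upper) / len(upper)
    - sum(row[i] for row in lower) / len(lower)
    for i in range(n_items)
]
```

An item with d near +1 separates high and low scorers sharply; d near 0 (or negative) flags an item that fails to discriminate, which is exactly what an item-characteristic curve displays graphically.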
- Biased test item: An item that favors one particular group of examinees in relation to another when differences in group ability are controlled.
- Speed tests
Qualitative item analysis: A general term for various nonstatistical procedures designed to explore how individual test items work.
- “Think aloud” test administration: A qualitative research tool designed to shed light on the test-taker’s thought processes during the administration of a test.
- Expert panels
- Sensitivity review: A study of test items, typically conducted during the test development process, in which items are examined for fairness to all prospective test-takers and for the presence of offensive language, stereotypes, or situations.
Test Revision
Cross-validation: The revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.
Validity shrinkage: The decrease in item validities that inevitably occurs after cross-validation of findings.
- DIF items: Those items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership.
- Developing item banks.
Chapter 9: Intelligence and its Measurement
Intelligence consists of the ability to:
- Understand complex ideas;
- Adapt effectively to the environment;
- Learn from experience;
- Engage in various forms of reasoning;
- Overcome obstacles by taking thought.
A major thread running through the theories of Binet, Wechsler, and Piaget is a focus on interactionism.
- Interactionism: The complex concept by which heredity and environment are presumed to interact and influence the development of one’s intelligence.
Louis L. Thurstone conceived of intelligence as composed of primary mental abilities (PMAs).
- He developed and published the Primary Mental Abilities test, which consisted of separate tests, each designed to measure one PMA: verbal meaning, perceptual
speed, reasoning, number facility, rote memory, word fluency, and spatial relations.
Galton believed that the roots of intelligence were to be found in the ability to discriminate between small differences in sensations.
Although Binet never explicitly defined intelligence, he discussed its components in terms of reasoning, judgment, memory, and abstraction.
Wechsler defined intelligence as the aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment.
For Piaget, intelligence may be conceived of as a kind of evolving biological adaptation to the outside world.
Factor-analytic theories: The focus is squarely on identifying the ability or groups of abilities deemed to constitute intelligence.
Information-processing theories: The focus is on identifying the specific mental processes that occur when intelligence is applied to solving a problem.
Factor-Analytic Theories of Intelligence
As early as 1904, the British psychologist Charles Spearman pioneered new techniques to measure intercorrelations between tests.
Co-validation: A test validation process conducted on 2 or more tests using the same sample of test-takers.
Co-norming: When used in conjunction with the creation of norms or the revision of existing norms.
Anchor protocol: A test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies.
- Auditory processing (Ga)
- Quantitative reasoning (Gq)
- Speed of processing (Gs)
- Facility with reading and writing (Grw)
- Short-term memory (Gsm)
- Long-term storage and retrieval (Glr)
- Vulnerable abilities: They decline with age and tend not to return to preinjury levels following brain damage.
- Maintained abilities: They tend not to decline with age and may return to preinjury levels following brain damage.
John Carroll proposed the three-stratum theory of cognitive abilities because he thought intelligence
is best described at three levels (or strata): general, broad, and narrow.
- Cattell-Horn-Carroll theory of cognitive abilities (termed by Kevin McGrew)
The Information-Processing View
Aleksandr Luria
- His approach focuses on the mechanisms by which information is processed—how information is processed.
- 2 basic types of information-processing styles:
- Simultaneous (or parallel) processing: Information is integrated all at one time.
- Successive (or sequential) processing: Each bit of information is individually processed in sequence.
PASS model of intellectual functioning
- Planning: Strategy development for problem-solving
- Attention (or arousal): Receptivity to information
- Simultaneous
- Successive
Measuring Intelligence
In infancy, intellectual assessment consists primarily of measuring sensorimotor development.
The focus in evaluation of the older child shifts to verbal and performance abilities.
According to Wechsler, adult intelligence scales should tap abilities such as retention of general
- Two-factor theory of intelligence
- General factor g
- Specific ability s
- Measurement error e
- Group factors: An intermediate class of factors common to a group of activities but not to all.
Gardner developed a theory of multiple intelligences:
- Logical-mathematical
- Bodily-kinesthetic
- Linguistic
- Musical
- Spatial
- Interpersonal: The ability to understand other people: what motivates them, how they work, how to work cooperatively with them.
- Successful salespeople, politicians, teachers, clinicians, and religious leaders.
- Intrapersonal: A capacity to form an accurate, veridical model of oneself and to be able to use that model to operate effectively in life.
Through Gardner’s intrapersonal and interpersonal intelligences, Mayer and his colleagues proposed emotional intelligence.
- They hypothesize the existence of specific brain modules that allow people to perceive, understand, use, and manage emotions intelligently.
Raymond B.
Cattell
- General fluid intelligence (Gf): Its function is to identify novel patterns, solve unfamiliar problems, and acquire new knowledge.
- General crystallized intelligence (Gc): A repository of knowledge and skills that have proved useful in solving problems in the past.
Horn
- Visual processing (Gv)
information, quantitative reasoning, expressive language and memory, and social judgment.
Tests of intelligence are seldom administered to adults for purposes of educational placement; rather, they may be given to obtain clinically relevant information or some measure of learning potential and skill acquisition.
The Stanford-Binet Intelligence Scales: Fifth Edition (SB5)
- It was the first published intelligence test to provide organized and detailed administration and scoring instructions.
- It was also the first American test to employ the concept of IQ.
- It was the first test to introduce the concept of an alternate item, an item to be substituted for a regular item under specified conditions.
- Revisions:
- Innovations in the 1937 scale included the development of two equivalent forms, labeled L (for Lewis) and M (for Maud), as well as new types of tasks for use with pre-school level and adult-level test-takers.
- The use of deviation IQ tables in place of the ratio IQ tables.
young as 2 and as old as 85 (or older).
- The test yields a number of composite scores, including a Full Scale IQ derived from the administration of 10 subtests.
- All composite scores have a mean set at 100 and a standard deviation of 15.
- The test yields 5 Factor Index scores corresponding to each of the 5 factors that the test is presumed to measure.
- Fluid reasoning
- Knowledge
- Quantitative
reasoning
- Visual-spatial processing
- Working memory
- Routing test: A task used to direct or route the examinee to a particular level of questions.
- Teaching items: Designed to illustrate the task required and assure the examiner that the examinee understands.
- Floor: The lowest level of the items on a subtest.
- Ceiling: The highest-level item of the subtest.
- Basal level: A base-level criterion that must be met for testing on the subtest to continue.
- If and when examinees fail a certain number of items in a row, a ceiling level is said to have been reached and testing is discontinued.
- Adaptive testing: Testing individually tailored to the test-taker.
- Extra-test behavior: The way the examinee copes with frustration; how the examinee reacts to items considered easy; the amount of support the examinee seems to require; the general approach to the task; how anxious, fatigued, cooperative, distractible, or compulsive the examinee appears to be.
- Ratio IQ was based on the concept of mental age (the age level at which an individual appears to be functioning intellectually as indicated by the level of items responded to correctly).
- ratio IQ = (mental age / chronological age) x 100
- The deviation IQ reflects a comparison of the performance of the individual with the performance of others of the same age in the standardization sample.
- The third revision was criticized as the manual was vague about the number of racially, ethnically, socioeconomically, or culturally diverse individuals in the standardization sample, stating only that a “substantial portion” of Black and Spanish-surnamed individuals was included.
- The fourth edition employed a point scale, a test organized into subtests by category of item, not by age (age scale) at which most test-takers are presumed capable of responding in the way that is keyed as correct.
- Test composite: A test score or index derived from the combination of, and/or
mathematical transformation of, one or more subtest scores.
- The fifth edition was designed for administration to assessees as
- Picture Completion
Measured IQ Range and Category
- 145-160: Very gifted or highly advanced
- 130-144: Gifted or very advanced
- 120-129: Superior
- 110-119: High average
- 90-109: Average
- 80-89: Low average
- 70-79: Borderline impaired or delayed
- 55-69: Mildly impaired or delayed
- 40-54: Moderately impaired or delayed
The WAIS-IV standardization sample consisted of 2,200 adults from the age of 16 to 90 years, 11 months.
Wechsler Intelligence Scale for Children (WISC) and Wechsler Pre-School and Primary Scale of Intelligence (WPPSI)
The Wechsler tests are all point scales that yield deviation IQs with a mean of 100 and a standard deviation of 15.
Short Forms of Intelligence Tests
Short form: A test that has been abbreviated in length, typically to reduce the time needed for test administration, scoring, and interpretation.
Group Tests of Intelligence
Army Alpha test: The test would be administered to Army recruits who could read.
Army Beta test: Designed for administration to foreign-born recruits with poor knowledge of English or to illiterate recruits.
Screening tool: An instrument or procedure used to identify a particular trait or constellation of traits at a gross or imprecise level.
Other Measures of Intellectual Abilities
Cognitive style: A psychological dimension that characterizes the consistency with which one acquires and processes information.
The Wechsler Tests
In the early 1930s, psychologist David Wechsler’s employer, Bellevue Hospital in Manhattan, needed an instrument for evaluating the intellectual capacity of its multilingual, multinational, and multicultural clients.
Core subtest: One that is administered to obtain a composite score.
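The ratio IQ formula from the Stanford-Binet notes and the deviation IQ used by the SB5 and Wechsler composites (mean 100, SD 15) can be sketched side by side; the sample ages and scores below are made up.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) x 100."""
    return mental_age / chronological_age * 100

def deviation_iq(raw_score: float, age_group_mean: float, age_group_sd: float) -> float:
    """Deviation IQ: standing relative to same-age peers, rescaled to mean 100, SD 15."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100 + 15 * z

# A child functioning at a mental age of 10 with a chronological age of 8:
iq1 = ratio_iq(10, 8)            # 125.0
# A raw score one standard deviation above the age-group mean:
iq2 = deviation_iq(60, 50, 10)   # 115.0
```

The contrast makes the historical shift concrete: the ratio IQ compares a person to age norms via mental age (and behaves oddly for adults, whose chronological age keeps rising), while the deviation IQ expresses standing within the person's own age group in the standardization sample.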
Supplemental subtest (or optional subtest): Used for purposes such as providing additional clinical information or extending the number of abilities or processes sampled.
The WAIS-IV contains 10 core subtests:
- Block Design
- Similarities
- Digit Span
- Matrix Reasoning
- Vocabulary
- Arithmetic
- Symbol Search
- Visual Puzzles: The assessee’s task is to identify the parts that went into making a stimulus design.
- Information
- Coding
And 5 supplemental subtests:
- Letter-Number Sequencing
- Figure Weights: The assessee’s task is to determine what needs to be added to balance a two-sided scale—one that is reminiscent of the “blind justice” type of scale.
- Comprehension
- Cancellation: A timed subtest used in calculating the Processing Speed Index; the assessee’s task is to draw lines through targeted pairs of colored shapes.
Measures of creativity:
- Originality: The ability to produce something that is innovative or nonobvious.
- Fluency: The ease with which responses are reproduced; usually measured by the total number of responses produced.
- Flexibility: The variety of ideas presented and the ability to shift from one approach to another.
- Elaboration: The richness of detail in a verbal explanation or pictorial display.
Convergent thinking: A deductive reasoning process that entails recall and consideration of facts as well as a series of logical judgments to narrow down solutions and eventually arrive at one solution.
Divergent thinking: A reasoning process in which thought is free to move in many different directions, making several solutions possible.
Issues in the Assessment of Intelligence
Culture-free intelligence test: Designed to minimize cultural biases, aiming to assess cognitive abilities without being influenced by the test-taker’s cultural background, language, or social experiences.
Culture loading: The extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, and feelings associated with a
particular culture.
Culture-fair intelligence test: A test or assessment process designed to minimize the influence of culture with regard to various aspects of the evaluation procedures, such as administration instructions, item content, responses required of test-takers, and interpretations made from the resulting data.
Flynn effect: The progressive rise in intelligence test scores that is expected to occur on a normed test of intelligence from the date when the test was first normed (described by James R. Flynn).
Chapter 10: Assessment for Education
Achievement Tests
Achievement tests: Designed to measure accomplishment/past learning.
- Designed to measure the degree of learning that has taken place as a result of exposure to a relatively defined experience.
- Adequately sample the targeted subject matter and reliably gauge the extent to which the examinees have learned it.
- Help in making decisions about placements, gauging the quality of instruction in a particular institution, screening for difficulties, etc.
Goodenough-Harris Drawing Test (G-HDT): One of the quickest, easiest, and least expensive to administer of all ability tests.
- Draw a picture of a whole man and do the best job possible.
- Each detail is given one point.
Culture Fair Intelligence Test (CFIT): Provides an estimate of intelligence relatively free of cultural and language differences.
- Covers 3 levels and randomly selected adults.
Aptitude Tests
Aptitude tests: Tend to focus more on informal learning or life experiences.
- Also referred to as prognostic tests: used to predict future behavior.
Checklist: Questionnaire on which marks are made to indicate the presence or absence of a behavior.
Rating scales: Form completed by an evaluator to make a judgment of relative standing with regard to a specific variable or list of variables.
Informal evaluation: Nonsystematic, relatively brief and off-the-record assessment leading to the
Tests such as the WPPSI-III and SB5 may be used to gauge developmental strengths and weaknesses by sampling children's performance in cognitive, motor, and social/behavioral content areas.
The most obvious example of an aptitude test is the Scholastic Aptitude Test (SAT).
Wechsler Individual Achievement Test–Third Edition (WIAT-III): Designed for use in the schools as well as clinical and research settings.
- Has the potential to yield actionable data relating to student achievement in academic areas such as reading, writing, etc.
The test most appropriate for use is the one most consistent with the educational objectives of the individual teacher or school system.
Curriculum-based Assessment: Used to refer to assessment of information acquired from teaching at school.
Curriculum-based Measurement: Characterized by the use of standardized measurement procedures to derive local norms to be used in the evaluation of student performance.
Different types of achievement test items:
- Fact-based items: Items that draw primarily on facts and how to apply those facts.
- Conceptual items: Designed to measure mastery of the material.
Diagnostic Tests
Evaluative: Applied to tests or test data that are used to make judgments.
Diagnostic information: As used in an educational context, typically applied to tests or test data used to pinpoint a student's difficulty.
Diagnostic test: Used to identify areas of deficit to be targeted for intervention.
Woodcock Reading Mastery Tests, Third Edition (WRMT-III): A measure of reading readiness, reading achievement, and reading difficulties.
Stanford Diagnostic Mathematics Test, Fourth Edition (SDMT4)
KeyMath Diagnostic System
Raven Progressive Matrices (RPM): One of the best-known and most popular nonverbal group tests.
- A suitable test anytime one needs an estimate of an individual's general intelligence.
- The original RPM has 60 items, which were believed to be of increasing difficulty.
Bender Visual Motor Gestalt Test: Consists of 9 geometric figures that the subject is simply asked to copy.
- Anyone older than 9 who cannot copy the figures may suffer from some type of deficit.
Brazelton Neonatal Assessment Scale: An individual test for infants between 3 days and 4 weeks of age that provides an index of the newborn's competence.
- Assesses reflexes, response to stress, startle reactions, cuddliness, motor maturity, ability to habituate to sensory stimuli, and hand-mouth coordination.
Gesell Developmental Schedule: One of the oldest and most established infant intelligence measures.
- Gesell Maturity Scale, Gesell Developmental Observation, Yale Tests of Child Development
- Provides an appraisal of the developmental status of children from 2.3 months to 6.3 years of age.
- Developmental quotient: Determined by a test score, evaluated by assessing the presence and absence of behavior associated with maturation.
Bayley Scales of Infant and Toddler Development: Based on normative maturational developmental data; designed for infants between 1 and 42 months old and assesses development across 5 domains: cognitive, language, motor, socioemotional, and adaptive.
Chapter 11 and 12: Personality
Personality and Personality Assessment
McClelland: Personality is the most adequate conceptualization of a person's behavior in all its detail.
Menninger: Personality is the individual as a whole, his height and weight and loves and hates and blood pressure and reflexes; his smiles and hopes and bowed legs and enlarged tonsils. It means all that anyone is and all that he is trying to become.
Personality: An individual's unique constellation of psychological traits that is relatively stable over time.
Personality assessment: The measurement and evaluation of psychological traits, states, values, interests, attitudes, worldview, acculturation, sense of humor, cognitive and behavioral styles, and/or related individual characteristics.
Personality traits: Real physical entities that are bona fide mental structures in each personality (Allport).
- A trait is a generalized and focalized neuropsychic system with the capacity to render many stimuli functionally equivalent, and to initiate and guide consistent forms of adaptive and expressive behavior.
- Any distinguishable, relatively enduring way in which one individual varies from another (Guilford).
Personality type: A constellation of traits that is similar in pattern to one identified category of personality within a taxonomy.
Cattell Infant Intelligence Scale: Designed as a downward extension of the Stanford-Binet Scale for infants and preschoolers between 2 and 30 months of age; measures the intelligence of infants and young children.
Psychoeducational Batteries
Psychoeducational test batteries: Generally contain two types of tests: those that measure abilities related to academic success and those that measure educational achievement.
Kaufman Assessment Battery for Children (K-ABC): Designed for ages 2½ through 12½; measures both intelligence and achievement.
- The KABC-II was published in 2004, and the age range was extended up to 18 years old.
Woodcock-Johnson IV (WJ IV): Consists of three co-normed batteries: achievement, cognitive abilities, and oral language.
Columbia Mental Maturity Scale-Third Edition