Psychological Testing and Assessment
Uploaded by RelaxedDifferential
National University
Summary
This document provides a comprehensive overview of psychological testing and assessment, tracing its historical roots from early 20th-century France to modern applications. It explores various assessment methods, including retrospective, remote, ecological momentary assessment, and collaborative assessment, along with different tools and procedures used in the process. The document also examines the roles of various stakeholders in assessment, such as test developers, test users, and test-takers.
Full Transcript
Chapter 1: Psychological Testing and Assessment

Testing and Assessment
The roots of contemporary psychological testing and assessment can be found in early 20th-century France.
1905: Alfred Binet and a colleague published a test designed to help place Paris schoolchildren in appropriate classes.
- The test aimed to identify intellectually disabled children.
Within a decade, an English version of Binet's test was created for US schools.
In 1917, as the US entered WWI, the military used psychological testing to quickly screen recruits for intellectual and emotional issues.
During WWII, the military depended even more on psychological tests to screen recruits for service.
William Stern developed a refined method of scoring Binet's test, the Intelligence Quotient (IQ).
During WWI, the term "testing" described the group screening of thousands of military recruits.
By WWII, a semantic distinction between testing and a more inclusive term, "assessment," began to emerge.

Psychological assessment: The gathering and integration of psychology-related data to make a psychological evaluation that is accomplished through the use of various tools.
Psychological testing: The process of measuring psychology-related variables using devices or procedures designed to obtain a sample of behavior.

Testing
- To obtain some gauge, usually numerical, about an ability or attribute.
- It can be done individually or in groups, with results based on the number of correct answers.
- The tester is not key to the process.
- It requires technician-like skills in administering and scoring as well as in interpreting a test result.
- It yields a test score or series of test scores.

Assessment
- To answer a referral question, solve a problem, or decide on the use of tools of evaluation.
- It is individualized, focusing on how a person processes.
- The assessor is key to the process.
- It requires careful tool selection, evaluation skills, and thoughtful data organization.
- It involves a problem-solving approach using various data to address the referral question.

Varieties of assessment
Retrospective: It involves using evaluative tools to determine a person's past psychological state.
Remote: It involves using psychological tools to evaluate someone remotely.
Ecological momentary assessment (EMA): It is the "in the moment" evaluation of specific problems and related cognitive and behavioral variables as they occur.
Collaborative psychological assessment: The assessor and assessee may work as "partners" from initial contact through final feedback.
Therapeutic psychological assessment: Therapeutic self-discovery and new understandings are encouraged throughout the assessment process (Stephen Finn).
Dynamic assessment: An interactive approach to psychological assessment that usually follows a model of (1) evaluation, (2) intervention of some sort, and (3) evaluation.

The Tools of Psychological Assessment
Psychological test: A device or procedure designed to measure psychology-related variables.
- Content: subject matter
- Format: form, plan, structure, arrangement, and layout of test items as well as related considerations such as time limits.
- Administration procedures: individual or group
- Scoring and interpretation procedures:
  - Score: A code or summary statement that reflects an evaluation of performance on a test, task, interview, or behavior sample.
  - Scoring: The process of assigning evaluative codes or statements to performance on tests, tasks, interviews, or behavior samples.
  - Cut score: A numerical reference point used to classify a set of data into two or more categories.
- Psychometric soundness or technical quality: How consistently and how accurately a psychological test measures what it purports to measure.
- Utility: The usefulness or practical value that a test or other tool of assessment has for a particular purpose.
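A cut score like the one defined above can be applied mechanically. A minimal sketch in Python; the cut value of 70 and the pass/fail labels are hypothetical, not from the notes:

```python
# Hypothetical cut score: a numerical reference point that classifies
# a set of scores into two categories.
CUT_SCORE = 70

def classify(score, cut=CUT_SCORE):
    # Scores at or above the cut score fall in one category, the rest in the other
    return "pass" if score >= cut else "fail"

labels = [classify(s) for s in [55, 70, 88]]
print(labels)  # ['fail', 'pass', 'pass']
```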
Interview: A method of gathering information through direct communication involving reciprocal exchange.
- Panel interview (or board interview): More than one interviewer participates in the assessment.
- Motivational interviewing: A therapeutic dialogue that blends person-centered listening with cognition-altering techniques to enhance motivation and facilitate change.
Portfolio: Samples of one's ability and accomplishment.
Case History Data: Records and transcripts in various forms that preserve archival information and data relevant to an assessee.
- Case study (or case history): A report or illustrative account concerning a person or an event that was compiled based on the case history data.
- Groupthink: The result of varied forces that drive decision-makers to reach a consensus.
Behavioral observation: Monitoring actions through visual or electronic means while recording quantitative and qualitative data.

Who?
Test developer and publisher: Create tests or other methods of assessment.
Test user: A trained professional who administers and interprets psychological or educational tests.
Test-taker: Anyone who is the subject of an assessment or an evaluation.
- Psychological autopsy: A reconstruction of a deceased individual's psychological profile based on archival records, artifacts, and interviews with those who knew them.
Society at large: Community's influence on psychological assessment standards and practices to support mental health and well-being.
Other settings: Organizations, companies, and governmental agencies.

In What Types of Settings Are Assessments Conducted, and Why?
Educational setting:
- As mandated by law, tests are administered early in school life to help identify children who may have special needs.
- Achievement test: Evaluates accomplishment or the degree of learning that has taken place.
- Diagnostic test: Helps narrow down and identify areas of deficit to be targeted for intervention.
- Informal evaluation: Nonsystematic assessment that leads to the formation of an opinion or attitude.
Clinical setting:
- Helps screen for or diagnose behavior problems.
- Testing is conducted individually, while group testing is mainly for screening individuals needing further evaluation.
Counseling setting:
- The ultimate objective is the improvement of the assessee in terms of adjustment, productivity, or some related variable.
Geriatric setting:
- Quality of life: Evaluated variables related to perceived stress, loneliness, sources of satisfaction, personal values, quality of living conditions, and quality of friendships and other social support.
Business and military setting:
- It is used primarily for decision-making regarding personnel careers.
- It is involved in the engineering and design of products and environments.
- It is involved in taking the pulse of consumers.

- Natural observation: Observing in a setting in which the behavior would typically be expected to occur.
Role-Play Test: Assessees are directed to act as if they were in a particular situation.
Computer:
- Local processing: on-site
- Central processing: at some central location
- Teleprocessing: phone lines
- Simple scoring report: A mere listing of a score or scores.
- Extended scoring report: Includes statistical analyses of the test-taker's performance.
- Interpretive report: Includes numerical or narrative interpretive statements in the report.
- Consultative report: Written in professional language to provide expert analysis of the data.
- Integrative report: Employs previously collected data into the test report.
- CAT (computer adaptive testing): The computer's ability to tailor the test to the test-taker's ability or test-taking pattern.
- CAPA (computer-assisted psychological assessment): The assistance computers provide to the test user.
Other tools: video and physiological devices

Who, What, Why, How, and Where?
Governmental and organizational credentialing:
- Governmental licensing, certification, or general credentialing of professionals.
Academic research setting:
- It is essential for measurement, and researchers should have a strong understanding of measurement principles and assessment tools.
Other setting: court, program evaluation, health psychology
- Health psychology: Focuses on the impact of psychological variables on the onset, course, treatment, and prevention of illness and disability.

How Are Assessments Conducted?
- Protocol: The form, sheet, or booklet on which a test-taker's responses are entered.
- Rapport: A working relationship between the examiner and the examinee.
- Accommodation: The adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with exceptional needs.
- Alternative assessment: An evaluative procedure that deviates from standard measurement methods, either through special accommodations or alternative techniques to assess the same variables.

Chapter 2: Historical, Cultural, and Legal/Ethical Considerations

A Historical Perspective
The first systematic tests were developed in China as early as 2200 B.C.E. as a means of selecting people for government jobs.
The Ancient Egyptian and Greco-Roman cultures also had specific ideas relating to mental health and personality but no formal means of psychological assessment.
By the 18th century, Christian von Wolff had envisioned psychology as a science and psychological measurement as a specialized field within it.
Darwin's interest in individual differences led his half-cousin, Francis Galton, to devise several measures for psychological variables such as questionnaires, rating scales, and self-report inventories.
In Germany, Wilhelm Wundt started the first experimental psychology laboratory and measured variables such as reaction time, perception, and attention span.
1890: James McKeen Cattell coined the term mental test.
Charles Spearman: Originated the concept of test reliability and built the mathematical framework for the statistical technique of factor analysis.
Victor Henri: Collaborated with Alfred Binet on papers suggesting how mental tests could be used to measure higher mental processes.
Emil Kraepelin: An early experimenter with the word association technique as a formal test.
Lightner Witmer: "Little-known founder of clinical psychology"
The 20th century brought the first tests of abilities such as intelligence.
1905: Alfred Binet and Theodore Simon developed the first intelligence test to identify mentally retarded Paris schoolchildren.
1939: David Wechsler introduced a test designed to measure adult intelligence.
- Originally known as the Wechsler-Bellevue Intelligence Scale, it was later renamed the Wechsler Adult Intelligence Scale (WAIS).
WWI and WWII brought the need for large-scale testing of the intellectual ability of recruits.
After WWII, psychologists increasingly used tests in large corporations and private organizations.

Where To Go for Authoritative Information: Reference Sources
Test catalogs: Contain only a brief description of the test and seldom contain the kind of detailed information that a prospective user might require.
- The objective is only to sell the test.
Test manuals: Detailed information concerning the development of a particular test and technical information relating to it.
Professional books: An in-depth discussion of a test that offers assessment students and professionals insights and actionable knowledge from experienced practitioners.
Reference volumes: Provide comprehensive information on assessment principles, tools, methodologies, and best practices for various contexts.
Journal articles: Contain reviews of the test, updated or independent studies of its psychometric soundness, or examples of how the instrument was used in either research or an applied context.
Online databases: Offer access to research articles, assessments, measurement tools, and relevant literature on assessment methodologies and practices.
Other sources: Unpublished tests and measures, and university libraries.

Robert Woodworth: Developed a measure of adjustment and emotional stability that could be administered quickly and efficiently to groups of recruits.
- To disguise the true purpose of the test, it was labeled as a "Personal Data Sheet".
- After the war, he developed a personality test for civilian use called the Woodworth Psychoneurotic Inventory, a self-report test.
- Self-report: Assessees supply assessment-related information by responding to questions, keeping a diary, or self-monitoring thoughts or behaviors.
Projective test: An individual is assumed to "project" onto some ambiguous stimulus his/her own unique needs, fears, hopes, and motivation.
- The best-known of all projective tests is the Rorschach, a series of inkblots developed by Hermann Rorschach.
- The use of pictures as projective stimuli was popularized in the late 1930s by Henry Murray and Christiana Morgan.

Discrimination: The practice of favoring majority group members in hiring or promotion decisions, regardless of qualifications.
Reverse discrimination: The practice of favoring diverse individuals in hiring or promotion decisions, regardless of qualifications.
Disparate treatment: The result of an employer's hiring or promotion practice intentionally designed to produce discrimination.
Disparate impact: The unintended discriminatory outcome of an employer's hiring or promotion practice.

Test User Qualifications
1950: The APA committee published the Ethical Standards for the Distribution of Psychological Tests and Diagnostic Aids, defining three test levels based on the required knowledge of testing and psychology.
- Level A: Tests that can be administered, scored, and interpreted using the manual and a basic understanding of the institution or organization.
- Level B: Tests that require technical knowledge of test construction, use, and related fields like statistics, individual differences, and psychology.
- Level C: Tests requiring extensive knowledge of testing, related psychological fields, and supervised experience.

CAPA's major issues:
- Comparability of pencil-and-paper and computerized versions of tests
- The value of computerized test interpretations
- Unprofessional, unregulated "psychological testing" online

Academic tradition: Researchers at universities throughout the world use the tools of assessment to help advance knowledge and understanding of human and animal behavior.
Applied tradition: Helps select applicants for various positions based on merit.

Culture and Assessment
Early psychological testing of immigrant populations by Henry Goddard was controversial; he found that the majority of the immigrant population was "feeble-minded".
- His findings stemmed from a translated Stanford-Binet intelligence test that overestimated mental deficiency in native English speakers and immigrants.
Culture-specific tests: Designed for use with people from one culture but not from another.
Affirmative action: Efforts by governments, employers, and schools to combat discrimination and promote equal opportunity in education and employment.

Legal and Ethical Considerations
Code of professional ethics: A set of guidelines and principles that govern the conduct of professionals within a specific field.

The Rights of Test-Takers
The right of informed consent
- Informed consent: Test-takers have a right to know why they are being evaluated, how the test data will be used, and what information will be released to whom.
The right to be informed of test findings
The right to privacy and confidentiality
- Privacy right: Affirms an individual's right to decide when and how much to share or withhold personal beliefs and opinions.
- Privileged information: Information shared between parties who communicate with each other in the context of certain relationships.
- Confidentiality: Concerns matters of communication outside the courtroom.
The right to the least stigmatizing label

- Standard of care: The level at which a reasonable professional provides diagnostic or therapeutic services under similar conditions.
Minimum competency testing programs: Formal testing programs designed to be used in decisions regarding various aspects of students' education.
Truth-in-testing legislation
- The objective was to provide test-takers with a means of learning the criteria by which they are being judged.

Chapter 3: A Statistics Refresher

Scales of Measurement
Measurement: The act of assigning numbers or symbols to characteristics of things according to rules.
Scale: A set of numbers (or other symbols) whose properties model the empirical properties of the objects to which the numbers are assigned.
- Continuous: Has infinite values within a range, allowing precise quantification of smoothly varying variables.
- Discrete: Consists of distinct, separate values or categories, used for counting variables that cannot have fractional values.

Measure of central tendency: Indicates the average or midmost score between the extreme scores in a distribution.
- Mean: The average, calculated by dividing the sum of all values in a dataset by the number of values.
  - Interval and ratio data
- Median: The middle score of the distribution.
  - Ordinal, interval, and ratio data
  - Useful in cases where relatively few scores fall at the high end of the distribution or relatively few scores fall at the low end of the distribution.
- Mode: The most frequently occurring score in a distribution.
  - Bimodal distribution: Two scores that occur with the highest frequency.

Error: The collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement.

Nominal Scales: Involve classification or categorization based on one or more distinguishing characteristics.
Ordinal Scales: Rank-ordering on some characteristic is also permissible.
Interval Scales: Contain equal intervals and have no absolute zero point.
Ratio Scales: Have a true zero point.

Describing Data
Distribution: A set of test scores arrayed for recording or study.
Raw score: A straightforward, unmodified accounting of performance that is usually numerical.
Frequency distribution: All scores are listed alongside the number of times each score occurred.
- Simple frequency distribution: Indicates that individual scores have been used and the data have not been grouped.
- Grouped frequency distribution: Test-score intervals (or class intervals) replace the actual test scores.
Graph: A diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data.

Variability: An indication of how scores in a distribution are scattered or dispersed.
Measures of variability: Describe the amount of variation in a distribution.
- Range: Equal to the difference between the highest and the lowest scores in a distribution.
- Interquartile range: Equal to the difference between Q3 and Q1.
  - Semi-interquartile range: Equal to the interquartile range divided by 2.
  - Quartiles: The dividing points between the four quarters in the distribution.
- Standard deviation: Equal to the square root of the average squared deviations about the mean.
- Variance: Equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean.
Skewness: The nature and extent to which symmetry is absent.
- Positive skew: Relatively few of the scores fall at the high end of the distribution.
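The measures of central tendency and variability defined in this refresher can be computed directly with Python's standard statistics module; the score list is hypothetical:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical distribution of test scores

# Central tendency
mean = sum(scores) / len(scores)          # sum of all values / number of values
median = statistics.median(scores)        # middle score of the ordered distribution
mode = statistics.mode(scores)            # most frequently occurring score

# Variability
rng = max(scores) - min(scores)           # range: highest minus lowest score
variance = statistics.pvariance(scores)   # mean of squared deviations about the mean
sd = statistics.pstdev(scores)            # standard deviation: square root of variance
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                             # interquartile range: Q3 - Q1
semi_iqr = iqr / 2                        # semi-interquartile range
```

Note the population formulas (pvariance, pstdev) are used here; sample versions (variance, stdev) divide by n - 1 instead.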
- Negative skew: Relatively few of the scores fall at the low end of the distribution of test scores.
Kurtosis: The steepness of a distribution.
- Platykurtic: Relatively flat
- Leptokurtic: Relatively peaked
- Mesokurtic: Somewhere in the middle

- Histogram: Has vertical lines drawn at the true limits of each test score, forming a series of contiguous rectangles.
  - Abscissa: X axis
  - Ordinate: Y axis
- Bar graph: Numbers indicative of frequency appear on the Y-axis, and reference to some categorization appears on the X-axis.
- Frequency polygon: Expressed by a continuous line connecting the points where test scores meet frequencies.

Meta-analysis: A family of techniques used to statistically combine information across studies to produce single estimates of the data under study.
- Effect size: The estimates derived.
- Evidence-based practice: Practice based on clinical and research findings.

The Normal Curve
Development of the concept of a normal curve began in the middle of the 18th century with the work of Abraham DeMoivre and, later, the Marquis de Laplace.
- Formerly known as the "Laplace-Gaussian curve".
Karl Pearson: The first to refer to the curve as the normal curve.
Normal curve: A bell-shaped, smooth, and mathematically defined curve that is highest at its center.
Tail: The area on the normal curve between 2 and 3 standard deviations above the mean (and, symmetrically, between 2 and 3 standard deviations below it).
Standard Score: A raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
- Z score: Results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.

Chapter 4: Of Tests and Testing

Assumption 1: Psychological Traits and States Exist
Trait: Any distinguishable, relatively enduring way in which one individual varies from another.
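The proportion of the normal curve falling in a given region follows from the standard cumulative distribution function, Phi(z) = (1 + erf(z / sqrt(2))) / 2 (a standard result, not spelled out in the notes). A sketch:

```python
import math

def phi(z):
    # Cumulative proportion of the normal curve falling below z standard deviations
    return (1 + math.erf(z / math.sqrt(2))) / 2

within_1_sd = phi(1) - phi(-1)   # proportion between -1 and +1 SD, about 0.68
tail_2_to_3 = phi(3) - phi(2)    # proportion between +2 and +3 SD, about 0.02
```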
- T scores: (Also called the "fifty plus or minus ten" scale) A scale with a mean set at 50 and a standard deviation set at 10.
  - Devised by W.A. McCall.
- Stanine: A contraction of the words standard and nine.
- Linear transformation: Retains a direct numerical relationship to the original raw score.
- Nonlinear transformation: Required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made.
Normalizing a distribution: Involves "stretching" the skewed curve into the shape of a normal curve.
- Normalized standard score scale: A corresponding scale of standard scores.

Correlation and Inference
Coefficient of correlation (or correlation coefficient): Provides an index of the strength of the relationship between two things.
Correlation: An expression of the degree and direction of correspondence between two things.

States: Distinguish one person from another but are relatively less enduring.
A psychological trait exists only as a construct.
- Construct: An informed, scientific concept developed or constructed to describe or explain behavior.
Overt behavior: An observable action or the product of an observable action, including test- or assessment-related responses.
Assumption 2: Psychological Traits and States Can Be Quantified and Measured
- Cumulative scoring: A trait is measured by a series of test items.
Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
Assumption 4: All Tests Have Limits and Imperfections
Assumption 5: Various Sources of Error Are Part of the Assessment Process
- Error variance: The component of a test score attributable to sources other than the trait or ability measured.
Assumption 6: Unfair and Biased Assessment Procedures Can Be Identified and Reformed
Assumption 7: Testing and Assessment Offer Powerful Benefits to Society
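The z-score and T-score conversions described above are simple linear transformations; a sketch with hypothetical raw scores:

```python
import statistics

raw_scores = [10, 12, 14, 16, 18]      # hypothetical raw scores
mean = statistics.mean(raw_scores)
sd = statistics.pstdev(raw_scores)

def z_score(x):
    # Number of SD units the raw score lies below or above the mean
    return (x - mean) / sd

def t_score(x):
    # Linear transformation to the "fifty plus or minus ten" scale
    return 50 + 10 * z_score(x)
```

Because the transformation is linear, the relative standing of every score is preserved; a nonlinear transformation (as in normalizing a skewed distribution) would not have this property.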
Pearson r: Used when the two variables being correlated are continuous and linear.
- Devised by Karl Pearson.
- Coefficient of determination: An indication of how much variance is shared by the X- and the Y-variables.
Spearman Rho: Frequently used when the sample size is small (fewer than 30 pairs of measurements) and when both sets of measurements are in ordinal form.
- Developed by Charles Spearman.
Scatterplot: A simple graphing of the coordinate points for the X-variable and Y-variable values.
- Curvilinearity: An "eyeball gauge" of how curved a graph is.
- Outlier: An extremely atypical point located at a relatively long distance—an outlying distance—from the other coordinate points in a scatterplot.

Norms
Norm-referenced testing and assessment: A method of evaluation and a way of deriving meaning from test scores by evaluating an individual test-taker's score and comparing it to scores of a group of test-takers.
Norms: The test performance data of a particular group of test-takers that are designed for use as a reference when evaluating or interpreting individual test scores.
Normative sample: The group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test-takers.
Norming: The process of deriving norms.
- Race norming: The controversial practice of norming based on race or ethnic background.
Subgroup norms: A normative sample that can be segmented by any of the criteria initially used in selecting subjects for the sample.
Local norms: Provide normative information with respect to the local population's performance on some test.
User norms or program norms: Consist of descriptive statistics based on a group of test-takers in a given period of time rather than norms obtained by formal sampling methods.
Standardization or test standardization: The process of administering a test to a representative sample of test-takers for the purpose of establishing norms.
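Pearson r and Spearman rho can be sketched as follows (population formulas; the simple ranking here ignores ties, which a full implementation would average):

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    # Pearson r: covariance of x and y divided by the product of their SDs
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def spearman_rho(x, y):
    # Spearman rho: Pearson r computed on the ranks of the measurements
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson_r(ranks(x), ranks(y))
```

A usage check: perfectly linear data give r = 1, while data that rise monotonically but not linearly can have rho closer to 1 than r.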
Sample: A portion of the universe of people deemed to be representative of the whole population.
Sampling: The process of selecting the portion of the universe deemed to be representative of the whole population.
- Stratified sampling: A method where a population is divided into subgroups (strata) based on specific characteristics, and samples are taken from each stratum to ensure representation.
  - Stratified-random sampling: Stratified sampling in which the selection within each stratum is random.
- Purposive sampling: If we arbitrarily select some sample because we believe it to be representative of the population.
- Incidental or convenience sample: One that is convenient or available for use.

Types of Norms
Percentile: An expression of the percentage of people whose score on a test or measure falls below a particular raw score.
- Percentage correct: Refers to the distribution of raw scores—to the number of items that were answered correctly multiplied by 100 and divided by the total number of items.

Fixed reference group scoring systems: The distribution of scores obtained on the test from one group of test-takers—referred to as the fixed reference group—is used as the basis for the calculation of test scores for future administrations of the test.
Criterion-referenced testing and assessment: A method of evaluation and a way of deriving meaning from test scores by evaluating an individual's score with reference to a set standard.
- Also called domain- or content-referenced testing and assessment.

Chapter 5: Reliability

Reliability coefficient: Quantifies reliability, ranging from 0 (not at all reliable) to 1 (perfectly reliable).
Measurement error: The inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes.
- Random error: Consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process.
- Systematic error: Systematic errors do not cancel each other out because they influence test scores in a consistent direction.
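The percentile and percentage-correct conversions defined under Types of Norms reduce to one-line formulas; a sketch with hypothetical data:

```python
def percentile_rank(raw_score, scores):
    # Percentage of people whose score falls below the given raw score
    below = sum(1 for s in scores if s < raw_score)
    return 100 * below / len(scores)

def percentage_correct(num_correct, total_items):
    # Items answered correctly, multiplied by 100 and divided by total items
    return num_correct * 100 / total_items

group = [60, 65, 70, 75, 80, 85, 85, 90, 95, 100]  # hypothetical norm group
p = percentile_rank(85, group)   # norm-referenced: standing relative to the group
pc = percentage_correct(42, 50)  # criterion-referenced flavor: mastery of the items
```

The pairing illustrates the norm-referenced vs. criterion-referenced contrast: the first number only has meaning relative to the comparison group, the second relative to the test content itself.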
Age norms (or age-equivalent scores): The average performance of different samples of test-takers who were at various ages at the time the test was administered.
Grade norms: The average test performance of test-takers in a given school grade.
- Developmental norms: Norms developed based on any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.
National norms: Derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.
National anchor norms: An equivalency table for scores on the two tests.
- Equipercentile method: The equivalency of scores on different tests is calculated with reference to corresponding percentile scores.

- Bias: The degree to which a measure predictably overestimates or underestimates a quantity.
Variance: The standard deviation squared.
- True variance: Variance from true differences.
- Error variance: Variance from irrelevant, random sources.
Reliability: The proportion of the total variance attributed to true variance.

Sources of Error Variance
Test construction
- Item sampling or content sampling: Variation among items within a test as well as variation among items between tests.
Test administration
- Test-taker's attention or motivation
- Test environment
- Test-taker variables
- Examiner-related variables
Test scoring and interpretation

Reliability Estimates
Test-retest reliability: An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
- Heterogeneous: An estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability.
Dynamic vs. static characteristics
- Dynamic characteristic: A trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
- Static characteristic: A trait, state, or ability presumed to be relatively unchanging.
Restriction or inflation of range
Speed tests vs. power tests
- Power test: When a time limit is long enough to allow test-takers to attempt all items, and if some items are so difficult that no test-taker is able to obtain a perfect score.
- Speed test: Contains items of uniform level of difficulty so that, when given generous time limits, all test-takers should be able to complete all the test items correctly.
Criterion-referenced tests
- It provides an indication of where a test-taker stands with respect to some variable or criterion.
Classical Test Theory: A framework used in psychometrics to assess the reliability and validity of psychological tests.

- Coefficient of reliability: The estimate of test-retest reliability.
Coefficient of equivalence: The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability.
Parallel forms: For each form of the test, the means and the variances of observed test scores are equal.
- Parallel forms reliability: An estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Alternate forms: Simply different versions of a test that have been constructed so as to be parallel.
- Alternate forms reliability: An estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.
Internal consistency estimate of reliability (or estimate of inter-item consistency): An estimate of the reliability of a test that can be obtained without developing an alternate form of the test and without having to administer the test twice to the same people.
Split-half reliability: Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
- Odd-even reliability: An estimate of split-half reliability.
- Spearman-Brown formula: Estimates internal consistency reliability from a correlation between two halves of a test.
Inter-item consistency: The degree of correlation among all the items on a scale.
Coefficient alpha: The mean of all possible split-half correlations.
- Developed by Cronbach.
Inter-scorer reliability: The degree of agreement or consistency between two or more scorers with regard to a particular measure.
- Coefficient of inter-scorer reliability: Measures the consistency of scores given by different evaluators.

The Nature of the Test
Homogeneity vs. heterogeneity of test items
- Homogeneous: A test that is functionally uniform throughout.

Domain sampling theory: Seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
Generalizability theory: A person's test scores vary from testing to testing because of variables in the testing situation.
- Universe: The details of the particular test situation.
- Facets: Include considerations such as the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.
- Universe score: Analogous to a true score in the true score model.
- Generalizability study: Examines how generalizable scores from a particular test are if the test is administered in different situations.
- Coefficients of generalizability: Measure the extent to which test scores or measurements can be generalized across different conditions.
- Decision study: Developers examine the usefulness of test scores in helping the test user make decisions.
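The split-half, Spearman-Brown, and coefficient-alpha estimates above can be illustrated on a small hypothetical item-response matrix. The notes describe alpha as the mean of all possible split-half correlations; the sketch below uses the equivalent, more common variance formula instead:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical item-response matrix: rows = test-takers, columns = items (1 = correct)
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# Odd-even split-half: correlate totals on odd items with totals on even items
odd = [sum(row[0::2]) for row in responses]
even = [sum(row[1::2]) for row in responses]
r_half = pearson_r(odd, even)

# Spearman-Brown: step the half-test correlation up to full-test length
r_sb = 2 * r_half / (1 + r_half)

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of totals)
k = len(responses[0])
item_var = [pstdev([row[j] for row in responses]) ** 2 for j in range(k)]
total_var = pstdev([sum(row) for row in responses]) ** 2
alpha = (k / (k - 1)) * (1 - sum(item_var) / total_var)
```

Note how the Spearman-Brown estimate exceeds the raw half-test correlation: halving a test lowers its reliability, so the formula corrects the split-half value back to full length.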
Item Response Theory: A framework used in psychometrics that models the relationship between an individual’s latent traits and their probability of answering items correctly on a test.
- Also called latent-trait theory.
- Discrimination: The degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.
- Dichotomous test items: Test items or questions that can be answered with only one of two alternative responses.
- Polytomous test items: Test items or questions with 3 or more alternative responses.
- Rasch model: Estimates the probability of a correct response based on the difficulty of an item and the ability of the respondent.
Standard error of measurement: A tool used to estimate or infer the extent to which an observed score deviates from a true score.
- Standard error of a score: An index of the extent to which one individual’s scores vary over tests presumed to be parallel.
- Confidence interval: A range or band of test scores that is likely to contain the true score.
- Standard error of the difference: A
coverage, the organization of the items in the test, and so forth.
Criterion-related validity: A judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest.
- Concurrent validity: An index of the degree to which a test score is related to some criterion measure obtained at the same time.
- Predictive validity: An index of the degree to which a test score predicts some criterion measure.
- Base rate: The extent to which a particular trait, behavior, characteristic, or attribute exists in the population.
- Hit rate: The proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.
- Miss rate: The proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.
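Two quantities from these notes lend themselves to a short sketch: the Rasch model's response probability, and the standard error of measurement with the confidence interval built from it. The SEM formula used here, SD × √(1 − r), is the conventional classical-test-theory form; all numeric values are made up for illustration.

```python
import math

def rasch_p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: P(correct) rises with (ability - difficulty) on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def sem(sd: float, reliability: float) -> float:
    """Conventional CTT standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval_95(observed: float, sd: float, reliability: float):
    """A roughly 95% band (observed +/- 1.96 SEM) likely to contain the true score."""
    e = 1.96 * sem(sd, reliability)
    return observed - e, observed + e

# When ability equals item difficulty, the probability of a correct answer is .50.
p = rasch_p_correct(ability=1.0, difficulty=0.0)  # able examinee, middling item
lo, hi = confidence_interval_95(110, sd=15, reliability=0.91)
```

With SD = 15 and reliability .91, the SEM is 4.5 points, so an observed score of 110 carries a 95% band of roughly 101 to 119 — a concrete reminder of why single observed scores should not be over-interpreted.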
statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
Chapter 6: Validity
Validity: A judgment or estimate of how well a test measures what it purports to measure in a particular context.
Inference: A logical result or deduction.
Validation: The process of gathering and evaluating evidence about validity.
Validation studies: Assess the accuracy and reliability of a test or measurement tool by examining how well it measures what it is intended to measure.
- Local validation studies: Conducted in a specific setting or population to confirm that a test is valid and reliable for that particular group, as test performance can vary based on factors like culture, language, or demographics.
Face validity: A judgment concerning how relevant the test items appear to be.
Content validity: A judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
- False positive: A miss wherein the test predicted that the test-taker did possess the particular characteristic or attribute being measured when in fact the test-taker did not.
- False negative: A miss wherein the test predicted that the test-taker did not possess the particular characteristic or attribute being measured when the test-taker did.
Criterion: The standard against which a test or a test score is evaluated.
Characteristics of a criterion
- Relevant
- Valid
- Uncontaminated
- Criterion contamination: A criterion measure that has been based, at least in part, on predictor measures.
Validity coefficient: A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
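The base rate, hit rate, and the two kinds of misses defined in these notes can be sketched from a simple tally of test decisions against actual status. The counts below are invented, and the "hit" convention used here (any correct classification, positive or negative) is one common reading; textbook definitions vary.

```python
# Hypothetical screening results as (test_says_positive, actually_positive) pairs.
results = [
    (True, True), (True, True), (True, False),   # 2 true positives, 1 false positive
    (False, True),                               # 1 false negative (a miss)
    (False, False), (False, False),              # 2 true negatives
]

n = len(results)

# Base rate: proportion of the group that actually has the attribute.
base_rate = sum(actual for _, actual in results) / n

# Hit rate: proportion of people the test classifies correctly.
hits = sum(pred == actual for pred, actual in results) / n

# The two kinds of misses distinguished above.
false_positives = sum(pred and not actual for pred, actual in results) / n
false_negatives = sum(actual and not pred for pred, actual in results) / n
```

The practical point of the base rate is that the same test can look impressive or useless depending on how common the attribute is in the population being screened.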
- Test blueprint: A plan regarding the types of information to be covered by the items, the number of items tapping each area of
- Intercept bias: Occurs when the use of a predictor results in consistent underprediction or overprediction of a specific group’s performance or outcomes.
- Slope bias: Occurs when a predictor has a weaker correlation with an outcome for specific groups.
Rating: A numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a rating scale.
Rating error: A judgment resulting from the intentional or unintentional misuse of a rating scale.
- Leniency error (or generosity error): An error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading.
- Severity error: At the other extreme.
Incremental validity: The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
Construct validity: A judgment about the appropriateness of inferences drawn from the test scores regarding individual standings on a variable called a construct.
- Construct: An informed, scientific idea developed or hypothesized to describe or explain behavior.
Evidence of Construct Validity
Evidence of homogeneity
- Homogeneity: How uniform a test is in measuring a single concept.
Evidence of changes with age
Evidence of pretest—post-test changes
Evidence from distinct groups
- Method of contrasted groups: Demonstrate that scores on the test vary predictably as a function of membership in some groups.
Convergent evidence: Scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same construct.
Discriminant evidence: A validity coefficient showing little relationship between test scores and/or other variables with which scores on the test being construct-validated should not theoretically be correlated.
- Multitrait-multimethod matrix: The matrix or table that results from correlating variables within and between methods.
- Convergent validity: The correlation between measures of the same trait but different methods.
- Discriminant validity: The correlations of different traits via different methods are near zero.
- Method variance: The similarity in scores due to use of the same method.
Factor analysis: A class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
- Central tendency error: The rater, for some reason, exhibits a general and systematic reluctance to give ratings at either the positive or negative extreme.
Rankings: A procedure that requires the rater to measure individuals against one another instead of against an absolute scale.
- Halo effect: A tendency to give a particular ratee a higher rating than the ratee objectively deserves because the rater fails to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior.
Test fairness: The extent to which a test is used in an impartial, just, and equitable way.
Chapter 7: Utility
Test utility: The usefulness or practical value of testing to improve efficiency.
Factors that affect a test’s utility
- Psychometric soundness
- Costs
- Benefits
Utility analysis: A family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or
practical value of a tool of assessment.
- Exploratory factor analysis: Entails estimating or extracting factors; deciding how many factors to retain; and rotating factors to an interpretable orientation.
- Confirmatory factor analysis: Tests the degree to which a hypothetical model fits the data.
- Factor loading: Conveys information about how much the factor determines the test score or scores.
Bias: A factor inherent in a test that systematically prevents accurate, impartial measurement.
histogram containing items deemed to be of equivalent value.
- Bookmark method: A standard-setting technique where experts set cut scores by marking the point on an ordered list of test items where a specific performance level begins.
- Other methods
- Method of predictive yield: A technique for setting cut scores which took into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores.
How is a utility analysis conducted?
- Expectancy data
- Taylor-Russell tables: Provide an estimate of the extent to which the inclusion of a particular test in the selection system will improve selection.
- Naylor-Shine tables: Entail obtaining the difference between the means of the selected and unselected groups to derive an index of what the test (or other tool of assessment) is adding to already established procedures.
- Brogden-Cronbach-Gleser formula: Used to calculate the dollar amount of utility gain resulting from the use of a particular selection instrument under specified conditions.
- Utility gain: An estimate of the benefit (monetary or otherwise) of using a particular test or selection method.
- Productivity gain: An estimated increase in work output.
- Decision theory and test utility
Some practical considerations
- The pool of job applicants
- The complexity of the job
- The cut score in use
- Relative cut score: A reference point that is set based on norm-related considerations rather than on the relationship of test scores to a criterion.
- Norm-referenced cut score: Set with reference to the performance of a group (or some target segment of a group).
- Fixed cut score: A reference point that is typically set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification.
- Multiple cut scores: The use of 2 or more cut scores with reference to one predictor for the purpose of categorizing test-takers.
- Compensatory model of selection: An assumption is made that high scores on one attribute can “balance out” or compensate for low scores on another attribute.
Methods for cutting scores
- Angoff method: Devised by William Angoff, this method for setting fixed cut scores can be applied to personnel selection tasks
- Discriminant analysis: Used to shed light on the relationship between identified variables and two naturally occurring groups.
Chapter 8: Test Development
Test Conceptualization
Some preliminary questions
- What is the test designed to measure?
- What is the objective of the test?
- Is there a need for this test?
- Who will use this test?
- Who will take this test?
- What content will the test cover?
- How will the test be administered?
- What is the ideal format of the test?
- Should more than one form of the test be developed?
- What special training will be required of test users for administering or interpreting the test?
- What types of responses will be required of test-takers?
- Who benefits from an administration of this test?
- Is there any potential for harm as the result of an administration of this test?
- How will meaning be attributed to scores on this test?
Norm-referenced vs.
criterion-referenced tests:
as well as to questions regarding the presence or absence of a particular trait, attribute, or ability.
- Known groups method: Entails collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest.
- IRT-based methods
- Item-mapping method: Entails the arrangement of items in a histogram, with each column in the
- Completion item: Requires the examinee to provide a word or phrase that completes a sentence.
- Short-answer item: A type of question that requires respondents to provide a brief, concise response, typically a few words or sentences, to assess knowledge or understanding of a specific topic.
- Essay item: A test item that requires the test-taker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation.
sentence that requires the test-taker to indicate whether the statement is or is not a fact.
Item development issues
Pilot work: The preliminary research surrounding the creation of a prototype of the test.
Test construction
Scaling: The process of setting rules for assigning numbers in measurement.
Types of scale
- Age-based scale: If the test-taker’s test performance as a function of age is of critical interest.
- Stanine scale: If all raw scores on the test are to be transformed into scores that can range from 1 to 9.
Scaling methods
- Summative scale: The final score is obtained by summing ratings across all the items.
- Likert scale: Used in surveys to measure attitudes or opinions, where respondents indicate their level of agreement or disagreement with a series of statements, typically on a 5- or 7-point scale.
- Method of paired comparisons: Test-takers are presented with pairs of stimuli, which they are asked to compare.
- Sorting
- Comparative scaling: Entails judgments of a stimulus in
comparison with every other stimulus on the scale.
- Categorical scaling: Stimuli are placed into one of 2 or more alternative categories that differ quantitatively with respect to some continuum.
- Guttman scale: Items on it range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
- Scalogram analysis: An item-analysis procedure and approach to test development that involves a graphic mapping of a test-taker’s responses.
Item format: The form, plan, structure, arrangement, and layout of individual test items.
- Selected-response format: Requires test-takers to select a response from a set of alternative responses.
- Multiple-choice format: Has 3 elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives or options variously referred to as distractors or foils.
- Constructed-response format: Requires test-takers to supply or to create the correct answer, not merely to select it.
Item bank: A relatively large and easily accessible collection of test questions.
Computerized adaptive testing (CAT): An interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker’s performance on previous items.
- Floor effect: The diminished utility of an assessment tool for distinguishing test-takers at the low end of the ability, trait, or other attribute being measured.
- Ceiling effect: The diminished utility of an assessment tool for distinguishing test-takers at the high end of the ability, trait, or other attribute being measured.
Item branching: The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.
Scoring items
- Class scoring (or categorical scoring): Test-taker responses earn credit toward placement in a particular class or category with other test-takers whose pattern of
responses is presumably similar in some way.
- Ipsative scoring: Compares a test-taker’s score on one scale within a test to another scale within that same test.
Test Tryout
Item Analysis
Item-difficulty index (or item-endorsement index): The statistic provides not a measure of the percent of people passing the item but a measure of the percent of people who said yes to, agreed with, or otherwise endorsed the item.
Item-reliability index: Provides an indication of the internal consistency of a test; the higher the index, the greater the test’s internal consistency.
Item-validity index: A statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure.
Item-discrimination index: Compares performance on a particular item with performance in the upper and lower regions of a distribution of continuous test scores.
Item-characteristic curve: A graphic representation of item difficulty and discrimination.
Other Considerations in Item Analysis
- Guessing
- Item fairness: The degree, if any, a test item is biased.
- Matching item: The test-taker is presented with two columns: premises on the left and responses on the right.
- Binary-choice item: A multiple-choice item that contains only two possible responses.
- True-false item: A type of selected-response item that usually takes the form of a
- Scoring drift: A discrepancy between the scoring in an anchor protocol and the scoring of another protocol.
The Use of IRT in Building and Revising Tests
- Evaluating the properties of existing tests and guiding test revision.
- Determining measurement equivalence across test-taker populations.
- Differential item functioning: A phenomenon wherein an item functions differently in one group of test-takers as compared to another group of test-takers known to have the same level of the underlying trait.
- DIF analysis: Test developers scrutinize group-by-group item response curves, looking for DIF items.
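The item-difficulty and item-discrimination indices defined in the Item Analysis notes can be sketched with a small worked example. The 0/1 response matrix is invented, and the upper/lower split used for discrimination (top third vs. bottom third) is one common convention among several.

```python
# Invented 0/1 responses (1 = correct), rows sorted by total score, top scorers first.
responses = [
    [1, 1, 1],  # highest scorer
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],  # lowest scorer
]

n = len(responses)
n_items = len(responses[0])

# Item-difficulty index p: proportion of all test-takers answering the item correctly.
# (Despite the name, a HIGHER p means an EASIER item.)
p = [sum(row[i] for row in responses) / n for i in range(n_items)]

# Item-discrimination index d: proportion correct in the upper-scoring group minus
# the proportion correct in the lower-scoring group (here, top third vs. bottom third).
upper, lower = responses[:2], responses[-2:]
d = [
    sum(row[i] for row in upper) / len(upper)
    - sum(row[i] for row in lower) / len(lower)
    for i in range(n_items)
]
```

An item with d near +1 separates high and low scorers sharply; d near 0 (or negative) flags an item that fails to discriminate, which is exactly what an item-characteristic curve displays graphically.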
- Biased test item: An item that favors one particular group of examinees in relation to another when differences in group ability are controlled.
- Speed tests
Qualitative item analysis: A general term for various nonstatistical procedures designed to explore how individual test items work.
- “Think aloud” test administration: A qualitative research tool designed to shed light on the test-taker’s thought processes during the administration of a test.
- Expert panels
- Sensitivity review: A study of test items, typically conducted during the test development process, in which items are examined for fairness to all prospective test-takers and for the presence of offensive language, stereotypes, or situations.
Test Revision
Cross-validation: The revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.
Validity shrinkage: The decrease in item validities that inevitably occurs after cross-validation of findings.
- DIF items: Those items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership.
- Developing item banks.
Chapter 9: Intelligence and its Measurement
Intelligence consists of the ability to:
- Understand complex ideas;
- Adapt effectively to the environment;
- Learn from experience;
- Engage in various forms of reasoning;
- Overcome obstacles by taking thought.
A major thread running through the theories of Binet, Wechsler, and Piaget is a focus on interactionism.
- Interactionism: The complex concept by which heredity and environment are presumed to interact and influence the development of one’s intelligence.
Louis L. Thurstone conceived of intelligence as composed of primary mental abilities (PMAs).
- He developed and published the Primary Mental Abilities test, which consisted of separate tests, each designed to measure one PMA: verbal meaning, perceptual
speed, reasoning, number facility, rote memory, word fluency, and spatial relations.
Galton believed that the roots of intelligence were to be found in the ability to discriminate between small differences in sensations.
Although Binet never explicitly defined intelligence, he discussed its components in terms of reasoning, judgment, memory, and abstraction.
Wechsler defined intelligence as the aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment.
For Piaget, intelligence may be conceived of as a kind of evolving biological adaptation to the outside world.
Factor-analytic theories: The focus is squarely on identifying the ability or groups of abilities deemed to constitute intelligence.
Information-processing theories: The focus is on identifying the specific mental processes that occur when intelligence is applied to solving a problem.
Factor-Analytic Theories of Intelligence
As early as 1904, the British psychologist Charles Spearman pioneered new techniques to measure intercorrelations between tests.
Co-validation: A test validation process conducted on 2 or more tests using the same sample of test-takers.
Co-norming: When used in conjunction with the creation of norms or the revision of existing norms.
Anchor protocol: A test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies.
- Auditory processing (Ga)
- Quantitative reasoning (Gq)
- Speed of processing (Gs)
- Facility with reading and writing (Grw)
- Short-term memory (Gsm)
- Long-term storage and retrieval (Glr)
- Vulnerable abilities: They decline with age and tend not to return to preinjury levels following brain damage.
- Maintained abilities: They tend not to decline with age and may return to preinjury levels following brain damage.
John Carroll proposed the three-stratum theory of cognitive abilities because he thought intelligence
is best described at three levels (or strata): general, broad, and narrow.
- Cattell-Horn-Carroll theory of cognitive abilities (termed by Kevin McGrew)
The Information-Processing View
Aleksandr Luria
- His approach focuses on the mechanisms by which information is processed—how information is processed.
- 2 basic types of information-processing styles:
- Simultaneous (or parallel) processing: Information is integrated all at one time.
- Successive (or sequential) processing: Each bit of information is individually processed in sequence.
PASS model of intellectual functioning
- Planning: Strategy development for problem-solving
- Attention (or arousal): Receptivity to information
- Simultaneous
- Successive
Measuring Intelligence
In infancy, intellectual assessment consists primarily of measuring sensorimotor development.
The focus in evaluation of the older child shifts to verbal and performance abilities.
According to Wechsler, adult intelligence scales should tap abilities such as retention of general
- Two-factor theory of intelligence
- General factor g
- Specific ability s
- Measurement error e
- Group factors: An intermediate class of factors common to a group of activities but not to all.
Gardner developed a theory of multiple intelligences:
- Logical-mathematical
- Bodily-kinesthetic
- Linguistic
- Musical
- Spatial
- Interpersonal: The ability to understand other people: what motivates them, how they work, how to work cooperatively with them.
- Successful salespeople, politicians, teachers, clinicians, and religious leaders.
- Intrapersonal: A capacity to form an accurate, veridical model of oneself and to be able to use that model to operate effectively in life.
Through Gardner’s intrapersonal and interpersonal intelligences, Mayer and his colleagues proposed emotional intelligence.
- They hypothesize the existence of specific brain modules that allow people to perceive, understand, use, and manage emotions intelligently.
Raymond B.
Cattell
- General fluid intelligence (Gf): Its function is to identify novel patterns, solve unfamiliar problems, and acquire new knowledge.
- General crystallized intelligence (Gc): A repository of knowledge and skills that have proved useful in solving problems in the past.
Horn
- Visual processing (Gv)
information, quantitative reasoning, expressive language and memory, and social judgment.
Tests of intelligence are seldom administered to adults for purposes of educational placement; rather, they may be given to obtain clinically relevant information or some measure of learning potential and skill acquisition.
The Stanford-Binet Intelligence Scales: Fifth Edition (SB5)
- It was the first published intelligence test to provide organized and detailed administration and scoring instructions.
- It was also the first American test to employ the concept of IQ.
- It was the first test to introduce the concept of an alternate item, an item to be substituted for a regular item under specified conditions.
- Revisions:
- Innovations in the 1937 scale included the development of two equivalent forms, labeled L (for Lewis) and M (for Maud), as well as new types of tasks for use with pre-school level and adult-level test-takers.
- The use of deviation IQ tables in place of the ratio IQ tables.
young as 2 and as old as 85 (or older).
- The test yields a number of composite scores, including a Full Scale IQ derived from the administration of 10 subtests.
- All composite scores have a mean set at 100 and a standard deviation of 15.
- The test yields 5 Factor Index scores corresponding to each of the 5 factors that the test is presumed to measure.
- Fluid reasoning
- Knowledge
- Quantitative
reasoning
- Visual-spatial processing
- Working memory
- Routing test: A task used to direct or route the examinee to a particular level of questions.
- Teaching items: Designed to illustrate the task required and assure the examiner that the examinee understands.
- Floor: The lowest level of the items on a subtest.
- Ceiling: The highest-level item of the subtest.
- Basal level: A base-level criterion that must be met for testing on the subtest to continue.
- If and when examinees fail a certain number of items in a row, a ceiling level is said to have been reached and testing is discontinued.
- Adaptive testing: Testing individually tailored to the test-taker.
- Extra-test behavior: The way the examinee copes with frustration; how the examinee reacts to items considered easy; the amount of support the examinee seems to require; the general approach to the task; how anxious, fatigued, cooperative, distractible, or compulsive the examinee appears to be.
- Ratio IQ was based on the concept of mental age (the age level at which an individual appears to be functioning intellectually as indicated by the level of items responded to correctly).
- ratio IQ = (mental age / chronological age) x 100
- The deviation IQ reflects a comparison of the performance of the individual with the performance of others of the same age in the standardization sample.
- The third revision was criticized as the manual was vague about the number of racially, ethnically, socioeconomically, or culturally diverse individuals in the standardization sample, stating only that a “substantial portion” of Black and Spanish-surnamed individuals was included.
- The fourth edition employed a point scale, a test organized into subtests by category of item, not by age (age scale) at which most test-takers are presumed capable of responding in the way that is keyed as correct.
- Test composite: A test score or index derived from the combination of, and/or
mathematical transformation of, one or more subtest scores.
- The fifth edition was designed for administration to assessees as
- Picture Completion
Measured IQ Range and Category
- 145-160: Very gifted or highly advanced
- 130-144: Gifted or very advanced
- 120-129: Superior
- 110-119: High average
- 90-109: Average
- 80-89: Low average
- 70-79: Borderline impaired or delayed
- 55-69: Mildly impaired or delayed
- 40-54: Moderately impaired or delayed
The WAIS-IV standardization sample consisted of 2,200 adults from the age of 16 to 90 years, 11 months.
Wechsler Intelligence Scale for Children (WISC) and Wechsler Pre-School and Primary Scale of Intelligence (WPPSI)
The Wechsler tests are all point scales that yield deviation IQs with a mean of 100 and a standard deviation of 15.
Short Forms of Intelligence Tests
Short form: A test that has been abbreviated in length, typically to reduce the time needed for test administration, scoring, and interpretation.
Group Tests of Intelligence
Army Alpha test: The test would be administered to Army recruits who could read.
Army Beta test: Designed for administration to foreign-born recruits with poor knowledge of English or to illiterate recruits.
Screening tool: An instrument or procedure used to identify a particular trait or constellation of traits at a gross or imprecise level.
Other Measures of Intellectual Abilities
Cognitive style: A psychological dimension that characterizes the consistency with which one acquires and processes information.
The Wechsler Tests
In the early 1930s, psychologist David Wechsler’s employer, Bellevue Hospital in Manhattan, needed an instrument for evaluating the intellectual capacity of its multilingual, multinational, and multicultural clients.
Core subtest: One that is administered to obtain a composite score.
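The ratio IQ formula from the Stanford-Binet notes and the deviation IQ used by the SB5 and Wechsler composites (mean 100, SD 15) can be sketched side by side; the sample ages and scores below are made up.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) x 100."""
    return mental_age / chronological_age * 100

def deviation_iq(raw_score: float, age_group_mean: float, age_group_sd: float) -> float:
    """Deviation IQ: standing relative to same-age peers, rescaled to mean 100, SD 15."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100 + 15 * z

# A child functioning at a mental age of 10 with a chronological age of 8:
iq1 = ratio_iq(10, 8)            # 125.0
# A raw score one standard deviation above the age-group mean:
iq2 = deviation_iq(60, 50, 10)   # 115.0
```

The contrast makes the historical shift concrete: the ratio IQ compares a person to age norms via mental age (and behaves oddly for adults, whose chronological age keeps rising), while the deviation IQ expresses standing within the person's own age group in the standardization sample.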
Supplemental subtest (or optional subtest): Used for purposes such as providing additional clinical information or extending the number of abilities or processes sampled.
The WAIS-IV contains 10 core subtests:
- Block Design
- Similarities
- Digit Span
- Matrix Reasoning
- Vocabulary
- Arithmetic
- Symbol Search
- Visual Puzzles: The assessee’s task is to identify the parts that went into making a stimulus design.
- Information
- Coding
And 5 supplemental subtests:
- Letter-Number Sequencing
- Figure Weights: The assessee’s task is to determine what needs to be added to balance a two-sided scale—one that is reminiscent of the “blind justice” type of scale.
- Comprehension
- Cancellation: A timed subtest used in calculating the Processing Speed Index; the assessee’s task is to draw lines through targeted pairs of colored shapes.
Measures of creativity:
- Originality: The ability to produce something that is innovative or nonobvious.
- Fluency: The ease with which responses are reproduced; usually measured by the total number of responses produced.
- Flexibility: The variety of ideas presented and the ability to shift from one approach to another.
- Elaboration: The richness of detail in a verbal explanation or pictorial display.
Convergent thinking: A deductive reasoning process that entails recall and consideration of facts as well as a series of logical judgments to narrow down solutions and eventually arrive at one solution.
Divergent thinking: A reasoning process in which thought is free to move in many different directions, making several solutions possible.
Issues in the Assessment of Intelligence
Culture-free intelligence test: Designed to minimize cultural biases, aiming to assess cognitive abilities without being influenced by the test-taker’s cultural background, language, or social experiences.
Culture loading: The extent to which a test incorporates the vocabulary, concepts, traditions, knowledge, and feelings associated with a
particular culture.
Culture-fair intelligence test: A test or assessment process designed to minimize the influence of culture with regard to various aspects of the evaluation procedures, such as administration instructions, item content, responses required of test-takers, and interpretations made from the resulting data.
Flynn effect: The progressive rise in intelligence test scores that is expected to occur on a normed test of intelligence from the date when the test was first normed (described by James R. Flynn).
Chapter 10: Assessment for Education
Achievement Tests
Achievement tests: Designed to measure accomplishment/past learning.
- Designed to measure the degree of learning that has taken place as a result of exposure to a relatively defined experience.
- Adequately sample the targeted subject matter and reliably gauge the extent to which the examinees have learned it.
- Help in making decisions about placements, gauging the quality of instruction in a particular institution, screening for difficulties, etc.
Goodenough-Harris Drawing Test (G-HDT): One of the quickest, easiest, and least expensive to administer of all ability tests.
- Draw a picture of a whole man and do the best job possible.
- Each detail is given one point.
Culture Fair Intelligence Test (CFIT): Provides an estimate of intelligence relatively free of cultural and language differences.
- Covers 3 levels and randomly selected adults.
Aptitude Tests
Aptitude tests: Tend to focus more on informal learning or life experiences.
- Also referred to as prognostic tests: used to predict future behavior.
Checklist: Questionnaire on which marks are made to indicate the presence or absence of a behavior.
Rating scales: Form completed by an evaluator to make a judgment of relative standing with regard to a specific variable or list of variables.
Informal evaluation: Nonsystematic, relatively brief and off-the-record assessment leading to the
Tests such as the WPPSI-III and SB5 may be used to gauge developmental strengths and weaknesses by sampling children's performance in cognitive, motor, and social/behavioral content areas.
The most obvious example of an aptitude test is the Scholastic Aptitude Test (SAT).
Wechsler Individual Achievement Test–Third Edition (WIAT-III): Designed for use in the schools as well as clinical and research settings.
- Has the potential to yield actionable data relating to student achievement in academic areas such as reading, writing, etc.
The test most appropriate for use is the one most consistent with the educational objectives of the individual teacher or school system.
Curriculum-based Assessment: Used to refer to assessment of information acquired from teaching at school.
Curriculum-based Measurement: Characterized by the use of standardized measurement procedures to derive local norms to be used in the evaluation of student performance.
Different types of achievement test items:
- Fact-based items: Items that draw primarily on facts and how to apply those facts.
- Conceptual items: Designed to measure mastery of the material.
Diagnostic Tests
Evaluative: Applied to tests or test data that are used to make judgments.
Diagnostic information: As used in an educational context, typically applied to tests or test data used to pinpoint a student's difficulty.
Diagnostic test: Used to identify areas of deficit to be targeted for intervention.
Woodcock Reading Mastery Tests, Third Edition (WRMT-III): A measure of reading readiness, reading achievement, and reading difficulties.
Stanford Diagnostic Mathematics Test, Fourth Edition (SDMT4)
KeyMath Diagnostic System
Raven Progressive Matrices (RPM): One of the best-known and most popular nonverbal group tests.
- A suitable test anytime one needs an estimate of an individual's general intelligence.
- The original RPM has 60 items, which were believed to be of increasing difficulty.
Bender Visual Motor Gestalt Test: Consists of 9 geometric figures that the subject is simply asked to copy.
- Anyone older than 9 who cannot copy the figures may suffer from some type of deficit.
Brazelton Neonatal Assessment Scale: An individual test for infants between 3 days and 4 weeks of age that provides an index of the newborn's competence.
- Assesses reflexes, response to stress, startle reactions, cuddliness, motor maturity, ability to habituate to sensory stimuli, and hand-mouth coordination.
Gesell Developmental Schedule: One of the oldest and most established infant intelligence measures.
- Gesell Maturity Scale, Gesell Developmental Observation, Yale Tests of Child Development
- Provides an appraisal of the developmental status of children from 2.3 months to 6.3 years of age.
- Developmental quotient: Determined by a test score, evaluated by assessing the presence and absence of behavior associated with maturation.
Bayley Scales of Infant and Toddler Development: Based on normative maturational developmental data; designed for infants between 1 and 42 months old and assesses development across 5 domains: cognitive, language, motor, socioemotional, and adaptive.
Chapter 11 and 12: Personality
Personality and Personality Assessment
McClelland: Personality is the most adequate conceptualization of a person's behavior in all its detail.
Menninger: Personality is the individual as a whole, his height and weight and loves and hates and blood pressure and reflexes; his smiles and hopes and bowed legs and enlarged tonsils. It means all that anyone is and all that he is trying to become.
Personality: An individual's unique constellation of psychological traits that is relatively stable over time.
Personality assessment: The measurement and evaluation of psychological traits, states, values, interests, attitudes, worldview, acculturation, sense of humor, cognitive and behavioral styles, and/or related individual characteristics.
Personality traits: Real physical entities that are bona fide mental structures in each personality (Allport).
- A trait is a generalized and focalized neuropsychic system with the capacity to render many stimuli functionally equivalent, and to initiate and guide consistent forms of adaptive and expressive behavior.
- Any distinguishable, relatively enduring way in which one individual varies from another (Guilford).
Personality type: A constellation of traits that is similar in pattern to one identified category of personality within a taxonomy.
Cattell Infant Intelligence Scale: Designed as a downward extension of the Stanford-Binet Scale for infants and preschoolers between 2 and 30 months of age; measures the intelligence of infants and young children.
Psychoeducational Batteries
Psychoeducational test batteries: Generally contain two types of tests: those that measure abilities related to academic success and those that measure educational achievement.
Kaufman Assessment Battery for Children (K-ABC): Designed for ages 2½ through 12½; measures both intelligence and achievement.
- The KABC-II was published in 2004, and the age range was extended up to 18 years old.
Woodcock-Johnson IV (WJ IV): Consists of three co-normed batteries: achievement, cognitive abilities, and oral language.
Columbia Mental Maturity Scale-Third Edition