Chapter 6: Test Development

CHAPTER 6: TEST DEVELOPMENT

Test Development
The process of test development occurs in five stages:
1. Test conceptualization
2. Test construction
3. Test tryout
4. Item analysis
5. Test revision

I. Test Conceptualization
o The beginnings of any published test can probably be traced to thoughts (self-talk, in behavioral terms).
o A review of the available literature on existing tests designed to measure a particular construct might indicate that such tests leave much to be desired in psychometric soundness.
o An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.
o Involves planning the construct to be measured, scaling items, and scoring.

Some preliminary questions:
1. What is the test designed to measure?
   The answer is closely linked to how the test developer defines the construct being measured and how that definition is the same as or different from other tests purporting to measure the same construct.
2. What is the objective of the test?
   In the service of what goal will the test be employed?
   In what way or ways is the objective of this test the same as or different from other tests with similar goals?
3. Is there a need for this test?
   Are there any other tests purporting to measure the same thing?
   In what ways will the new test be better than or different from existing ones?
   Will there be more compelling evidence for its reliability or validity?
   Will it be more comprehensive?
   Will it take less time to administer?
   In what ways would this test not be better than existing tests?
4. Who will use the test?
   Clinicians? Educators? Others?
   For what purposes would this test be used?
5. Who will take this test?
   Who is this test for?
   Who needs to take it?
   Who would find it desirable to take it?
   For what age range of test takers is the test designed?
   What reading level is required of a test taker?
   What cultural factors might affect test taker response?
6. What content will the test cover?
   Why should it cover this content?
   Is this coverage different from the content coverage of existing tests with the same or similar objectives?
   How and why is the content area different?
   To what extent is this content culture-specific?
7. How will the test be administered?
   Individually or in groups?
   Is it amenable to both group and individual administration?
   What differences will exist between individual and group administrations of this test?
   Will the test be designed for or amenable to computer administration?
8. What is the ideal format of the test?
   Should it be true-false, essay, multiple-choice, or in some other format?
   Why is the format selected for this test the best format?
9. Should more than one form of the test be developed?
   On the basis of a cost-benefit analysis, should alternate or parallel forms of this test be created?
10. What special training will be required of test users for administering or interpreting the test?
   What background and qualifications will a prospective user of data derived from an administration of this test need to have?
   What restrictions, if any, should be placed on distribution of the test and on the test's usage?
11. What types of responses will be required of test takers?
   What kind of disability might preclude someone from being able to take this test?
   What adaptations or accommodations are recommended for persons with disabilities?
12. Who benefits from an administration of this test?
   What would the test taker learn, or how might the test taker benefit, from an administration of the test?
   What would the test user learn, or how might the test user benefit?
   What social benefit, if any, derives from an administration of this test?
13. Is there any potential for harm as the result of an administration of this test?
   What safeguards are built into the recommended testing procedure to prevent any sort of harm to any of the parties involved in the use of this test?
14. How will meaning be attributed to scores on this test?
   Will a test taker's score be compared to others taking the test at the same time?
   To others in a criterion group?
   Will the test evaluate mastery of a particular content area?

Pilot Work
o Preliminary tasks surrounding the creation of the prototype of the test.
o Involves writing preliminary items to find out what the content of the test might be. It is the "rough draft" of the test, where content, scaling, and scoring may be tried out in a very small group.

Pilot Study
o Standardized tests require a pilot study prior to test construction, but for tests used only in small settings, such as a classroom achievement test, a pilot study is not required.

II. Test Construction
1) Scaling
2) Items
3) Scoring

Floor effect - refers to the diminished utility of an assessment tool for distinguishing test takers at the low end of the ability, trait, or other attribute being measured. Test takers who have not yet achieved such ability might fail all of the items.

Ceiling effect - refers to the diminished utility of an assessment tool for distinguishing test takers at the high end of the ability, trait, or other attribute being measured. In such a case, the test user would conclude that the test was too easy for this group of test takers and so discrimination was impaired by a ceiling effect.

1. Scale - a set of numbers whose properties model empirical properties.

Scaling Methods
a. Rating Scale - a grouping of words or statements on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the test taker.
b. Likert Scale - each item presents the test taker with five (or four, or seven) alternative responses, usually on an agree/disagree or approve/disapprove continuum.
c. Paired Comparisons - test takers are presented with pairs of stimuli, which they are asked to compare. They must choose one over the other.

III. Item Writing
o Refers to the process of writing items for the construct that we intend to measure.

Item Pool - the reservoir of items from which the items on the final version of the test will be drawn.

Item Format - the form, plan, structure, arrangement, and layout of individual test items.

a. Selected-response format - the test taker chooses an answer from the options provided.
   1. Multiple-choice questions
   2. Matching type
   3. True or false
   a.1. An item written in multiple-choice format has 3 elements: 1) a stem, 2) a correct alternative, and 3) several incorrect alternatives (distractors or foils).
   a.2. In matching type, the test taker is presented with two columns: premises on the left, responses on the right.
   a.3. Binary-choice format: True or False; Yes or No; Fact or Opinion.
b. Constructed-response format - the test taker is required to supply the answer, for example a word or phrase that completes a sentence.
   1. Completion item
   2. Short-answer item
   3. Essay item

CHAPTER 7: COMMONLY USED PSYCHOLOGICAL TESTS

INDIVIDUALLY ADMINISTERED INTELLIGENCE TESTS

1. STANFORD-BINET 5
   Developer: Gale H. Roid in 2003.
   Administration: Ages 2 to 89 years 11 months.
   Purpose/goal of the test: Designed to test an individual's intelligence quotient (IQ) and cognitive abilities.
   Psychometric properties: Scores range from a low of 40 (Moderately Impaired/Delayed) to a high of 160 (Very Gifted or Highly Advanced). Reliability for the SB5 is very high, and extensive validity studies were conducted, including clinical-group differences, correlations with other tests, age trends, factor structure, and consequential validity.
   Type of scale used: The scale provides a Full Scale IQ (FSIQ), Nonverbal (NV) and Verbal (V) domain scores, as well as 5 factor scores:
      1. Fluid Reasoning (FR)
      2. Knowledge (KN)
      3. Quantitative Reasoning (QR)
      4. Visual-Spatial Processing (VS)
      5. Working Memory (WM)

2. WECHSLER SCALES
   The original WAIS (Form I) was published in February 1955 by David Wechsler at Bellevue Hospital in NYC as a revision of the Wechsler-Bellevue Intelligence Scale, released in 1939.

   A. WECHSLER PRESCHOOL AND PRIMARY SCALE (WPPSI)
      Developer: David Wechsler in 1967.
      Administration: Children ages 2 years 6 months to 7 years 7 months.
      Purpose/goal of the test: Measures cognitive development, such as how to think and problem-solve, thinking processes, and decision-making skills.
      Psychometric properties: Reliability coefficients range from good (0.86) to excellent (≥ 0.90). The test is also said to measure what it intends to measure.
      Type of scale used: The WPPSI-R battery provides a full-scale IQ but also has separate verbal and performance scales. The verbal subtests include information, comprehension, arithmetic, vocabulary, similarities, and sentences. The performance subtests include object assembly, block design, mazes, picture completion, geometric design, and animals/pegs. The battery allows the examiner to assist the child on early items to ensure that the child understands the test requirements.

   B. WECHSLER INTELLIGENCE SCALE FOR CHILDREN (WISC)
      Developer: David Wechsler in 1949.
      Administration: Children between the ages of 6 and 16 years old.
      Purpose/goal of the test: An IQ test that assesses cognitive abilities in children. It is widely used in educational institutions to identify strengths and weaknesses in cognitive functioning, and to inform educational and intervention planning.
      Psychometric properties: The WISC has been shown to have high reliability and validity in measuring intelligence in children, with internal consistency ranging from 0.7 to 0.9 and test-retest reliability ranging from 0.7 to 0.9. The WISC is also widely used in research and clinical settings to assess cognitive abilities in children with various disabilities.
      Type of scale used: The WISC uses a norm-referenced scale, which means that a child's score is compared to the scores of other children of the same age group who have taken the test. The test is made up of 15 subtests, which are used to calculate scores for several different domains, including verbal comprehension, perceptual reasoning, working memory, and processing speed.

   C. WECHSLER ADULT INTELLIGENCE SCALE (WAIS)
      Developer: David Wechsler in 1955. Originally called the Wechsler-Bellevue Intelligence Scale and developed at the Bellevue Hospital in NYC.
      Administration: People who are 16 to 90 years of age.
      Purpose/goal of the test: To measure a wide range of cognitive abilities, including verbal comprehension, perceptual reasoning, working memory, and processing speed. It is intended to provide a comprehensive profile of an individual's intellectual strengths and weaknesses, and to be used in a variety of settings, including clinical, educational, and research settings.
      Type of scale used: The WAIS uses a standardized scale, with a mean score of 100 and a standard deviation of 15. Scores on individual subtests and on overall IQ are reported in standard scores, percentile ranks, and scaled scores.

3. COMPREHENSIVE TEST OF NONVERBAL INTELLIGENCE, 2ND EDITION (CTONI-2)
   Developer: Donald D. Hammill, PhD, Nils A. Pearson, and J. Lee Wiederholt in 2009.
   Administration: Between 6 years and 89 years 11 months.
   Purpose/goal of the test: Measures analogical reasoning, categorical classification, and sequential reasoning by using pictures of familiar objects and geometric designs.
   Psychometric properties: Internal consistency, test-retest stability, and interscorer agreement were used to measure the reliability of scores produced by the CTONI-2. Content description validity, construct identification validity, and criterion prediction validity were evaluated by the authors to provide evidence that the test measures general intelligence.
   Type of scale used: The production of three composite scores is one of the main features of the CTONI-2. These are a pictorial scale, a geometric scale, and an overall score. The composite score of the pictorial scale is the sum of the scores on the three subtests involving pictorial objects. The composite score for the geometric scale is the total of the three subtests that use geometric objects.

4. KAUFMAN ASSESSMENT BATTERY FOR CHILDREN, SECOND EDITION
   Developer: Alan S. Kaufman and Nadeen L. Kaufman in 2004.
   Administration: 3 years to 18 years 11 months of age.
   Purpose/goal of the test: Strengthening the theoretical foundations, increasing the number of constructs measured, enhancing the test's clinical utility, developing a test that fairly assesses children from minority groups, and enhancing fair assessment of preschoolers.
   Psychometric properties: As evidence of construct validity, Kaufman and Kaufman present intercorrelations among subtests and results of factor analyses at various stages of development.
   Type of scale used: For the CHC model, scales are identified as Short-Term Memory (Gsm), Visual Processing (Gv), Learning Ability (Gl), Fluid Reasoning (Gf), and Crystallized Ability (Gc). The global score for the CHC model is called the Fluid-Crystallized Index (FCI).

5. WOODCOCK-JOHNSON III COMPLETE BATTERY
   Developer: Richard Woodcock and Mary E. Bonner Johnson in 2001.
   Administration: Ages 2 to 90 years old.
   Purpose/goal of the test: To assess and evaluate an individual's cognitive abilities and academic achievement.
   Psychometric properties: The Woodcock-Johnson III (WJ III) test is highly reliable, valid, and standardized. It accurately measures cognitive abilities and academic achievement, with norms reflecting diverse populations. Its sensitivity identifies strengths and weaknesses, while its specificity distinguishes between different levels of ability or achievement.
   Type of scale used: The WJ III utilizes cognitive scales to measure cognitive abilities and achievement scales to assess academic proficiency. It also includes diagnostic and supplemental scales for additional insights and normative scales for comparison to standardized groups. These scales offer a comprehensive evaluation of an individual's abilities and performance across different domains.

6. SLOSSON INTELLIGENCE SCALE (SIT)
   Developer: Richard Slosson in 1963.
   Administration: Ages 4 to 65 years old.
   Purpose/goal of the test: A verbal screening measure of cognitive ability for children and adults. It is ideal for those with visual impairment, reading disabilities, or other conditions.
   Psychometric properties: Research confirms that the SIT is a valid, reliable individual IQ test, with nine correlational studies falling within the .90s range. The test does not produce results that are significantly biased by administrator or subject.
   Type of scale used: IQ scale. It is one of the few measures that assesses the infant, toddler, and preschool years (two and above), and it can also be used with severely or profoundly mentally challenged populations. Scores range from 10 to 164.

7. UNIVERSAL NONVERBAL INTELLIGENCE TEST II (UNIT2)
   Developer: Bracken & McCallum in 2015.
   Administration: Ages 5 years 0 months to 21 years 11 months.
   Purpose/goal of the test: Measures general intelligence and three foundational cognitive abilities (memory, fluid reasoning, and quantitative reasoning).
   Psychometric properties: Inter-rater scorer consistency was assessed by having two PRO-ED staff members independently score 50 protocols drawn at random from the normative sample. The resulting inter-rater coefficients ranged from 0.98 to 0.99, indicating excellent scoring consistency.
   Type of scale used: The UNIT2 subtests produce raw scores, age equivalents, scaled scores (standard scores with a mean of 10 and a standard deviation of 3), index or composite scores (standard scores with a mean of 100 and a standard deviation of 15), and percentile ranks.

GROUP-ADMINISTERED INTELLIGENCE AND APTITUDE TESTS

1. RAVEN'S PROGRESSIVE MATRICES
   Developer: John C. Raven in 1938.
   Administration: Individuals aged 6 years and above. It can be administered to groups or individuals.
   Purpose/goal of the test: Assess general human intelligence and abstract reasoning abilities. This test is used in employment assessments, in clinical research on Autism Spectrum Disorders, and to identify individuals with high intellectual potential.
   Psychometric properties: The test has high reliability and validity. It has been shown to be a reliable and accurate measure of general cognitive ability and is widely used in educational and clinical settings.
   Type of scale used: Uses a standardized score scale, with a mean score of 100 and a standard deviation of 15. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.

2. STANDARD PROGRESSIVE MATRICES (SPM)
   Developer: John C. Raven in 1938.
   Administration: Individuals aged 6 and above, including adults.
   Purpose/goal of the test: Measure nonverbal reasoning abilities. Assess general cognitive abilities, including abstract reasoning, problem-solving, and spatial visualization.
   Psychometric properties: The test has high reliability and validity. It is widely used in research and practical settings, and it has been adapted for use in multiple languages.
   Type of scale used: The SPM uses a standardized score scale, with a mean score of 100 and a standard deviation of 15. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.

3. ADVANCED PROGRESSIVE MATRICES (APM)
   Developer: John C. Raven in the 1940s.
   Administration: Adults and older adolescents (ages 16 and up).
   Purpose/goal of the test: Measure high-level reasoning abilities that are relatively independent of language and educational background. Assess nonverbal reasoning ability, specifically the ability to perceive relationships and make inferences based on visual patterns.
   Psychometric properties: The test has high reliability and validity. It has been shown to be a good predictor of academic and occupational success, and it is widely used in research and clinical settings.
   Type of scale used: Uses a standardized score scale, with a mean score of 100 and a standard deviation of 15. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.

4. CULTURE FAIR INTELLIGENCE TEST
   Developer: Raymond B. Cattell in 1949.
   Administration: Individuals aged 12 and up.
   Purpose/goal of the test: Measure cognitive abilities that are relatively independent of language, cultural background, and educational level. Assess general intelligence as well as specific abilities such as fluid reasoning, spatial ability, and quantitative ability.
   Psychometric properties: The test has high reliability and validity. It is widely used in research and clinical settings, and it has been translated into multiple languages.
   Type of scale used: Uses a standardized score scale, with a mean score of 100 and a standard deviation of 15. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.

5. PURDUE NON-LANGUAGE TEST (PNLT)
   Developer: Ralph M. Stogdill and Harry W. Davis at Purdue University in the 1940s.
   Administration: Individuals aged 10 and up.
   Purpose/goal of the test: Assess intellectual functioning in individuals who may have difficulty with verbal tests, such as those with language impairments or those from non-English-speaking backgrounds.
   Psychometric properties: The test has high reliability and validity. It is widely used in research and practical settings, and it has been adapted for use in multiple languages and cultures.
   Type of scale used: The PNLT uses a standardized score scale, with a mean score of 10 and a standard deviation of 3. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.

6. SRA VERBAL AND NONVERBAL FORM
   Developer: Louis Thurstone in 1973.
   Administration: Individuals aged 8 to 70 years.
   Purpose/goal of the test: Assess general cognitive abilities, including verbal and nonverbal reasoning, problem-solving, and abstract thinking.
   Psychometric properties: The test has high reliability and validity. It is often used in educational and clinical settings to identify individuals with high levels of cognitive ability or to diagnose learning disabilities.
   Type of scale used: The SRA Verbal and Nonverbal Form uses a standardized score scale, with a mean score of 100 and a standard deviation of 15. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.

7. THURSTONE TEST OF MENTAL ALERTNESS (TMA)
   Developer: Louis Leon Thurstone and Thelma Gwin Thurstone in Chicago, Illinois, United States of America, in 1943.
   Administration: 13 years old and above. It can be taken online.
   Purpose/goal of the test: A pre-employment assessment that evaluates general mental ability and the capability to learn new information and acquire new skills through 126 numerical and verbal questions. It covers four job-related tasks:
      - adjusting to new situations
      - learning new skills quickly
      - understanding complex or subtle relationships
      - thinking flexibly
   Psychometric properties: In earlier research, Rossini, Wygonik, Barrett, and Friedman (1994) demonstrated that the TMA is a valid, brief measure of intelligence by comparing it to the Wechsler Adult Intelligence Scale-Revised.

8. REVISED BETA EXAMINATION
   Developer: Basic revision: C. E. Kellogg and N. W. Morton. Most recent revision in 1984: Robert Lindner and Milton Gurvit. Publisher: Psychological Association.
   History:
      1920 - introduction of the Group Examination Beta, developed by the United States Army during the First World War for the assessment of the intellectual ability of illiterate recruits.
      1934 - revised to make the instrument more suitable for civilian purposes and designated the Revised Beta Examination.
      1978 - a second revision of the instrument was introduced, known as the Revised Beta Examination, Second Edition, or Beta II.
   Administration: 16 to 69.11 years.
   Purpose/goal of the test: Measure the general intellectual ability of persons who are relatively illiterate, non-English speaking, or suspected of having other language difficulties. Used as a nonverbal measure for members of the general population.
   Psychometric properties: Test-retest reliability coefficient: r = .91.
   Type of scale used: Raw scores on each of the six tests are converted to scaled scores through the use of a table provided in the Revised Beta Examination Manual.

9. WONDERLIC COGNITIVE ABILITY TESTS (WCAT)
   Developer: Eldon F. Wonderlic in 1936.
   Administration: For ages 16 and above.
   Purpose/goal of the test: Measure cognitive abilities that are relevant to success in a wide range of occupations. Often used in employment and educational settings to screen candidates for suitability and to identify individuals with high levels of cognitive ability. Its purpose is to assess general cognitive ability, as well as specific abilities such as verbal and quantitative reasoning, spatial perception, and general knowledge.
   Psychometric properties: Good psychometric properties; high reliability and validity. The reliability of the Wonderlic is in the range of .91 to .93.
   Type of scale used: Uses a standardized score scale, with a mean score of 20 and a standard deviation of 5. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.
   Sample items:
      1. Which word is the opposite of "friendly"?
         A. Amicable
         B. Hostile
         C. Congenial
         D. Cordial
      2. If a car travels 60 miles in one hour, how many miles will it travel in 3 hours?
         A. 80 miles
         B. 120 miles
         C. 140 miles
         D. 180 miles

10. OTIS-LENNON MENTAL ABILITY TESTS
   Developer: Arthur S. Otis and Roger T. Lennon in 1925.
   Administration: Individuals aged 12 and above.
   Purpose/goal of the test: Measures general cognitive ability and specific abilities such as verbal comprehension, quantitative ability, and spatial perception. Assesses intellectual functioning in individuals and identifies those with exceptional intellectual abilities. Often used in educational and clinical settings to screen individuals for gifted and talented programs or to diagnose intellectual disabilities.
   Psychometric properties: Strong psychometric properties with high reliability and validity. It is widely used in research and practical settings, and it has been translated into multiple languages.
   Type of scale used: Uses a standardized score scale, with a mean score of 50 and a standard deviation of 10. This means that scores on the test are scaled to a normal distribution, allowing for easy interpretation and comparison of scores.

11. OTIS-LENNON SCHOOL ABILITY TESTS
   Developer: Originally published in 1979 as a modification and replacement of the earlier Otis-Lennon mental ability test; now in its eighth edition (published in 2003).
   Administration: Available in different levels for students in kindergarten through grade 12. It currently consists of various tasks.
   Purpose/goal of the test: Assessment of verbal and nonverbal reasoning abilities that are predictors of success in school. Designed to measure verbal comprehension and verbal, pictorial, figural, and quantitative reasoning.

12. WATSON-GLASER CRITICAL THINKING TEST
   Developer: Goodwin Watson and Edward Glaser in 1925.
   Administration: Individuals in graduate, professional, and managerial recruitment.
   Purpose/goal of the test: Measures the skills required to present a certain point of view in a clear, well-structured, well-reasoned, and persuasive way to convince others of your argument, and assesses a candidate's critical thinking skills. Employers want to evaluate your ability to identify assumptions, dissect arguments, and draw conclusions.
   Sample items:
      Sample Question #5: Evaluation of Arguments
      Should parents put their children in preparation courses for gifted tests, in order for them to reach their full potential?
      Yes. Parents are responsible for their children's future and should do whatever they can to help them succeed in life.
         A. Strong argument
         B. Weak argument

13. PANUKAT NG KATALINUHANG PILIPINO
   Developer: Aurora R. Palacio, Ed.D., in 1991.
   Administration: For Filipinos aged 16 and above.
   Purpose/goal of the test:
      In school: Serves as the basis for screening, classifying, and identifying needs to enhance the learning process.
      In business and industry: Utilized as a predictor of occupational achievement. It serves as the gauge of an applicant's ability and fitness for a particular job and for promotional purposes.
   Psychometric properties: Reliable and valid.
   Type of scale used: Wechsler Intelligence Scales.
   Sample items:
      Verbal Comprehension (Talasalitaan)
      1. Si Ninoy ay may matatag na paninindigan. (Ninoy has firm convictions.)
         a. makabayan (patriotic)
         b. marangal (honorable)
         c. matayog (lofty)
         d. matibay (sturdy)
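Several of the scales listed above report standard scores on a normal distribution, for example a mean of 100 with a standard deviation of 15, or a mean of 10 with a standard deviation of 3. The sketch below illustrates the usual arithmetic behind such scores: a raw score is first converted to a z-score against the norm group and then rescaled. It is only an illustration. The function names and the norm-group values (raw mean 20, raw SD 5) are hypothetical and are not taken from any test manual, and real tests typically use published conversion tables rather than a single formula.

```python
from statistics import NormalDist


def to_standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Linearly rescale a raw score to a standard-score scale.

    norm_mean and norm_sd describe the raw scores of the norm group
    (hypothetical values in the example below).
    """
    z = (raw - norm_mean) / norm_sd      # distance from the norm mean in SD units
    return scale_mean + scale_sd * z     # same distance expressed on the target scale


def percentile_rank(standard_score, scale_mean=100.0, scale_sd=15.0):
    """Percentage of a normal distribution falling below the given standard score."""
    return 100.0 * NormalDist(scale_mean, scale_sd).cdf(standard_score)


# Hypothetical norm group: raw mean 20, raw SD 5; test taker's raw score 30.
iq_style = to_standard_score(30, norm_mean=20, norm_sd=5)           # 130.0
scaled = to_standard_score(30, 20, 5, scale_mean=10, scale_sd=3)    # 16.0
print(iq_style, scaled, round(percentile_rank(iq_style), 1))        # 130.0 16.0 97.7
```

A raw score two standard deviations above the norm mean lands at 130 on a 100/15 scale and at 16 on a 10/3 scale. Both express the same relative standing, which is why the percentile rank is the same in either metric.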
APTITUDE TESTS

1. DIFFERENTIAL APTITUDE TEST (5TH EDITION)
   Developer: Harold G. Raven, John M. Oldfield, and John C. Raven in 2003.
   Administration: Individuals who are considering educational or vocational choices. The age range for test takers is typically between 14 and 65 years old.
   Purpose/goal of the test: Assess an individual's aptitudes across various domains, including verbal reasoning, numerical ability, abstract reasoning, mechanical reasoning, space relations, spelling, language usage, and clerical speed and accuracy. Aims to provide insights into an individual's relative strengths and weaknesses.
   Psychometric properties: Has undergone extensive psychometric evaluation. It has been standardized on large and diverse samples to establish normative data and percentile ranks. The test has shown good internal consistency, test-retest reliability, and construct validity across its different subtests.
   Type of scale used: Utilizes a norm-referenced scoring system. The test scores are typically reported in percentile ranks, which indicate an individual's relative standing compared to the normative sample.
   Sample items:
      Verbal Reasoning: Select the word that is most nearly opposite in meaning to "benevolent."
         a) Malevolent
         b) Friendly
         c) Generous
         d) Kind-hearted

2. DETROIT TEST OF LEARNING APTITUDE (DTLA)
   Developer: Dr. Robert L. Williams in the 1960s.
   Administration: For children and adolescents, typically between the ages of 6 and 17 years old, who may require educational intervention or support due to learning challenges.
   Purpose/goal of the test: Assess the learning potential of children and adolescents, particularly those who may have learning disabilities or other difficulties in traditional educational settings. Evaluates various cognitive abilities such as verbal comprehension, reasoning, spatial visualization, and memory. It consists of several subtests, including tasks like matching shapes, completing patterns, and solving puzzles. The results provide insights into an individual's strengths and weaknesses in different areas of learning.
   Psychometric properties: The reliability of the DTLA is typically assessed through measures such as internal consistency reliability (e.g., Cronbach's alpha) and test-retest reliability (i.e., consistency of scores over time). High reliability indicates that the test consistently measures the same construct across different administrations. The DTLA demonstrates content validity by assessing relevant cognitive abilities related to learning potential. Additionally, criterion-related validity may be established by comparing test scores with other measures of academic achievement or cognitive functioning.
   Sample items: Sample items from the DTLA may include tasks such as:
      Verbal Comprehension: "Which word is most similar in meaning to 'happy'?"
         (a) joyful
         (b) sad
         (c) angry
         (d) tired
      Reasoning: "If all dogs have tails, and Rover is a dog, does Rover have a tail? Yes or No?"
      Spatial Visualization: "Which shape would result from folding the unfolded shape?"
      Memory: "Listen to these words: apple, chair, pencil, cat. Now, which word was not on the list?"

3. FLANAGAN INDUSTRIAL TEST (FIT)
   Developer: John C. Flanagan between 1960 and 1965.
   Administration: Adults in various industrial positions.
   Purpose/goal of the test: Assesses mental abilities and skills of industrial workers and measures job elements based on analyses of critical behaviors required in different occupations.
   Psychometric properties: Provides predictive validity for job success in multiple positions, based on more than 40 job classifications.
   Type of scale used: Normative data (percentile norms and stanines) are provided.

4. ARMED SERVICES VOCATIONAL APTITUDE BATTERY (ASVAB)
   Developer: Department of Defense in 1966.
   Administration: 16 or older at the time you take the test.
   Purpose/goal of the test: Measures developed abilities and helps predict future academic and occupational success in the military. A standardized tool for personnel selection and classification (particular job assignment).
   Psychometric properties: Content validity, construct validity, and criterion-related validity.

5. EMPLOYEE APTITUDE SURVEY
   Developer: Psychological Services in 1946.
   Administration: By selecting employees with the necessary aptitudes, an employer can actually improve the productivity of the workforce.
   Purpose/goal of the test: Some companies use aptitude tests to help them make hiring decisions. These tests, called career assessment tests, help human resources personnel learn more about a prospective employee's strengths and weaknesses. Designed to assess cognitive, perceptual, and psychomotor abilities that are important for a wide variety of occupations.
   Type of scale used: Rating scales are used in performance management systems to indicate an employee's level of performance or achievement.
   Sample items:
      "If the first two statements below are true, is the third statement true?"
         a. Mr. Brown's rabbits are grey
         b. All grey creatures are kind
         c. Mr. Brown's rabbits are unkind
      "A dress was initially marked at $150, and a pair of jeans was priced at $50. If Emily got a 40% discount off the dress and a 20% discount on the jeans, what was the total percentage she saved on her purchases?"
         a. 27%
         b. 35%
         c. 45%
         d. 50%

6. STANDARD APTITUDE TEST (FOR TEACHERS)
   Developer: Horace Mann in 1845.
   Purpose/goal of the test: Assess an individual's knowledge or educational background related to a particular career skill. Measure or assess the intellectual dimensions necessary for an individual to succeed in the teaching profession.
   Type of scale used: Interval scale; measures intelligence and aptitude.

7. MULTIDIMENSIONAL APTITUDE BATTERY II
   Developer: Douglas Jackson in the 1980s. Short form: 1984. Current version: 1995.
   Administration: Ages 16 and above.
   Purpose/goal of the test: Provide a comprehensive assessment of an individual's cognitive abilities that can be used to guide educational and career planning. Provide information about an individual's aptitude or potential for various types of educational and vocational pursuits. Valuable in assessing and distinguishing between selection candidates for high-stakes positions in business, military, and law-enforcement settings.
   Psychometric properties: Has strong psychometric properties and has been extensively researched and validated. Test-retest reliability: 0.95 for verbal, 0.96 for performance, and 0.97 for the full-scale total scores.
   Type of scale used: Uses a standardized scale to measure an individual's performance on the various cognitive tasks included in the test. The test uses a norm-referenced scoring system.

8. OCCUPATIONAL APTITUDE SURVEY AND INTEREST SCHEDULE, THIRD EDITION (OASIS-3)
   Developer: Dr. Jo-Ida Hansen and Dr. John L. Holland in 1970.
   Administration: Individuals aged 14 to 18 years. It can be administered to groups or individuals.
   Purpose/goal of the test: Designed to assist in the career development of handicapped, nonhandicapped, and disadvantaged students in grades 8-12. Measures 12 interest factors directly related to the occupations listed in the Guide of Occupational Exploration.
   Psychometric properties:
      Construct validity: Good
      Test-retest reliability: High
      Internal consistency: High
   Type of scale used: Uses a variety of scales to measure an individual's interests and abilities in relation to potential career paths:
      o Forced-choice and multiple-choice scales
      o Likert-type rating scales
   Sample items:
      INTEREST SCHEDULE
         Which do you prefer: repairing a car or designing a new car?
         Which do you prefer: working on a research project or creating a piece of art?
         Which do you prefer: helping someone with a personal problem or conducting a business transaction?
      APTITUDE SURVEY
         In this pattern of dots, which line continues the pattern? (Spatial Reasoning)
         Which word does NOT belong with the others? (Verbal Reasoning)
         Which number is the next in the sequence: 2, 4, 6, 8, --? (Numerical Reasoning)
         Can you use your left hand as well as your right hand? (Manual Dexterity)

9. WIESEN TEST OF MECHANICAL APTITUDE (WTMA)
   Developer: Eldon F. Wiesen in the 1960s.
   Administration: The test is primarily intended for individuals seeking employment or training in fields that require mechanical aptitude, such as manufacturing, construction, automotive, and engineering. While there isn't a strict age range for the test, it's typically administered to late high school students or adults who are exploring technical career paths.
   Purpose/goal of the test: The primary goal of the WTMA is to assess an individual's mechanical aptitude, which includes their understanding of mechanical principles, tools, and devices, and their ability to apply this knowledge to solve problems in a practical context. The WTMA measures various aspects of mechanical aptitude, including understanding mechanical systems, tools, simple machines, and basic physics principles, and the ability to interpret diagrams and schematics.
   Psychometric properties: The reliability of the WTMA is typically assessed through measures such as internal consistency reliability (e.g., Cronbach's alpha) and test-retest reliability. High reliability indicates that the test produces consistent results over repeated administrations. The WTMA demonstrates content validity by aligning with the knowledge and skills required for success in mechanical occupations. Additionally, criterion-related validity can be established by comparing test scores with job performance ratings or other measures of mechanical proficiency.
   Type of scale used: The WTMA typically uses a norm-referenced scoring system, where an individual's performance is compared to that of a normative sample of test takers. Scores may be reported as raw scores, percentile ranks, or standard scores, allowing for comparison of an individual's performance relative to others.
   Sample items:
      1. Which type of lever has the effort applied between the fulcrum and the load?
         a) First-class lever
         b) Second-class lever
         c) Third-class lever
         d) None of the above
      2. Which of the following is used to measure pressure?
         a) Voltmeter
         b) Barometer
         c) Tachometer
         d) Micrometer

10. PHILIPPINE APTITUDE CLASSIFICATION TEST (PACT)
   Administration: It is most appropriate for Grade 9 or third-year students aged 14-15 years.
   Purpose/goal of the test: Developed to measure students' abilities and help students decide on the course they will take after high school. Used to predict a student's probable performance in various courses of study. Provides a profile of aptitudes for several Philippine educational programs in order to assist students in the choice of their careers.

PERSONALITY TESTS

1. 16 PERSONALITY FACTORS (16PF)
   Developer: Dr. Raymond Cattell in 1949.
   Administration: Individually or to a group of people. Ages 16 years and older.
   Purpose/goal of the test: To provide a thorough, research-based map of normal personality. The test is used by psychologists and other mental health professionals as a clinical instrument to help diagnose psychiatric disorders and to help with prognosis and therapy planning. It is also used as a career evaluation tool, for couples counseling, and for personality assessment.
   Psychometric properties: Moderate to good reliability ratings have been reported for the 16PF. Based on a sample of 10,261 individuals, internal consistency reliabilities average 0.76 for the primary scales, with a range of 0.68 to 0.87 across all 16 scales.
   Type of scale used: 185 multiple-choice items with three options: agree, disagree, and neither agree nor disagree.
   Sample items:
      "My thoughtfulness and charitable nature are my foundation."
      "I continue until everything is perfect."
      "I am not especially interested in abstract ideas."

2.MYERS-BRIGGS Katharine is not be suitable for To allow According to the Uses 4 different Choose the statement TYPE INDICATOR Briggs and her those younger than 13 respondents to Myers & Briggs Scales that best describes you: daughter years of age. further explore and Foundation, the 1. Isabel Briggs understand their own MBTI meets 1.Extraversion a. I am the life of the personalities accepted standards of (E) - party.
Myers in 1943 including their likes, reliability and Introversion (I) b. I prefer small dislikes, strengths, validity. The official 2.Sensing (S) - gatherings weaknesses, possible website for the test Intuition (N) career preferences, suggests that it has a 3.Thinking (T) - 2. and compatibility 90% accuracy and Feeling (F) a. I think of the future with other people. test-retest reliability 4.Judging (J) - b. I stay in the present. rating. One study Perceiving (P) The Myers-Briggs found that while the 3. Type Indicator scale showed strong a. I dream of tangible (MBTI) is an internal consistency things assessment that is and test-retest b. I dream of abstract believed to measure reliability, variations things psychological were observe. preferences in how people perceive the world and make decisions. 3.EMOTIONS Robert any age beginning To yield information Initially administered PROFILE INDEX Plutchik and with early adolescent. about certain basic to 60 women and Kellerman personality traits and test-retest reliability Henry in 1974 personality conflicts determined +.90. in an individual’s 50 test records to test life. the split half reliability This test has been shown to have good internal reliability. It has good validity, however when used in a selection it's validity was drastically reduced. 4.MINESSOTA Starke R. with adults 18 and It is a psychological MULTIPHASIC Hathaway and over. test that assesses PERSONALITY J. C. McKinley personality traits and INVENTORY - II in 1940 psychopathology. It is primarily intended to test people who are suspected of having mental health or other clinical issues. 5.NEO Paul Costa and 17 years of age and Provides a It falls within the PERSONALITY Robert older comprehensive generally accepted INVENTORY - III McCrae in picture of an range for good 1970s individual's reliability (.70 and personality based on above is considered the Five Factor ideal, but values Model. 
between.50 and.70 can be acceptable Each of these major depending on the dimensions or context). domains of personality may be subdivided into individual traits or facets measured by the NEO PI-III. 6.BASIC Douglas N. Adults and Intended for use The BPI is a reliable The Basic PERSONALITY Jackson in Adolescents (12 and with clinical and and valid measure of Personality INVENTORY 1974. above years of age) normal populations psychopathology. Inventory (BPI; to identify sources The BPI scales have Jackson 1996) is of maladjustment demonstrated sizable a 12-scale, 240- and personal correlations with item, true/false strengths. The BPI other self-report self-report can be used with measures intended to measure of the both adolescents and assess the same general domain adults, and can be dimensions of of completed in half psychopathology. psychopatholog the time of other y.
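
Several quantities named in the entries above can be made concrete with a short calculation: the discount item in the Employee Aptitude Survey sample, the internal consistency coefficient (Cronbach's alpha) cited for the WTMA, 16PF, and NEO PI-III, and the percentile ranks and standard scores used in norm-referenced scoring. The Python sketch below is illustrative only. The function names and all score data are hypothetical and do not come from any test manual, and the percentile rank uses the common mid-rank convention (scores below, plus half of the tied scores), which is an assumption here.

```python
from statistics import mean, stdev, variance


def total_percent_saved(items):
    """Overall percent saved; items is a list of (price, discount_percent) pairs."""
    total_price = sum(price for price, _ in items)
    total_saved = sum(price * pct / 100 for price, pct in items)
    return 100 * total_saved / total_price


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list per item, with each list
    giving that item's score for every respondent in the same order."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def percentile_rank(score, norm_sample):
    """Percent of the norm sample scoring below, counting ties as half."""
    below = sum(1 for s in norm_sample if s < score)
    equal = sum(1 for s in norm_sample if s == score)
    return 100 * (below + 0.5 * equal) / len(norm_sample)


def z_score(score, norm_sample):
    """Standard score: distance from the norm mean in standard deviations."""
    return (score - mean(norm_sample)) / stdev(norm_sample)


if __name__ == "__main__":
    # Dress: $150 at 40% off saves $60; jeans: $50 at 20% off saves $10.
    # $70 saved on $200 is 35%, which is option b.
    print(total_percent_saved([(150, 40), (50, 20)]))  # 35.0

    # Hypothetical data: 3 items answered by 4 respondents.
    items = [[2, 4, 3, 5], [3, 5, 3, 4], [2, 5, 4, 5]]
    print(round(cronbach_alpha(items), 2))

    # Hypothetical norm sample of raw scores.
    norms = [10, 20, 30, 40, 50]
    print(percentile_rank(30, norms), z_score(30, norms))
```

Alpha rises when the items vary together, so that the variance of the total score is large relative to the sum of the individual item variances. This is why it is read as a measure of internal consistency.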
