Test Utility, Test Development, Item Analysis, and Test Revision PDF

Summary

This document is a presentation on psychological assessment topics, including test utility, test development, and item analysis. It provides a theoretical overview of these concepts and includes examples and formulas.

Full Transcript

Topic Learning Outcomes
1. Define test utility
2. Explain the process of test development
3. Discuss item analysis

Test Utility: Definition
“How useful is the test?”
The practical value of using a test to aid in decision making.

Test Utility
- Cost efficiency
- Savings in time
- Comparative utility: how useful is this test as compared to another test?
- Clinical utility: how useful is this test for the purposes of diagnostic assessment or treatment?
- Diagnostic utility: how useful is this test for classification purposes?
- How useful is this test in addition to another test?
- Effectiveness
- Should this intervention be used in place of an existing intervention?

Judgments concerning the utility of a test are made on the basis of test reliability and validity data as well as on other data.

Factors Affecting Test Utility
1. Psychometric Soundness
- Reliability and validity are acceptably high
- The higher the criterion-related validity of test scores, the higher the utility
- Utility also depends on the characteristics of the targeted test users
2. Cost: disadvantages, losses, or expenses in both economic and noneconomic terms
- Funds allocated to purchase a particular test
- Funds to supply blank test protocols
- Funds for computerized processing, scoring, and interpretation
- May also include the following:
  ○ Professional fees
  ○ Charges for the testing facility
  ○ Routine costs of doing business (legal, accounting, licensing)
3. Benefits: profits, gains, or advantages
- Justification of cost
- Increase in worker performance
- Reduction in training time
- Reduction in accidents
- Reduction in turnover

Utility Analysis: Definition
A family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.

General Approaches in Utility Analysis
1. Expectancy Table/Chart: Provides an indication of the likelihood that a test taker will score within some interval of scores on a criterion. An interval may be categorized as “passing”, “acceptable”, or “failing”.
2. Taylor-Russell Tables: Provide an estimate of the percentage of employees hired by the use of a particular test who will be successful at their jobs, given different combinations of three variables: the test's validity, the selection ratio used, and the base rate.
   Selection Ratio = Number of Candidates Hired / Total Number of Applicants
   Base Rate = Percentage of Current Employees Who Are High Performers
3. Naylor-Shine Tables: Used for obtaining the difference between the means of the selected and unselected groups to derive an index of what the test (or some other tool of assessment) is adding to already established procedures.
4. Brogden-Cronbach-Gleser Formula: A formula used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument under specific conditions.
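The selection ratio and base rate defined above, and the Brogden-Cronbach-Gleser utility gain, utility gain = (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C), are simple arithmetic, so a small sketch may help. The function and parameter names below are hypothetical. Two readings are assumptions, not statements from the slides: Z_m is taken to be the mean standardized test score of the selected applicants, and C (which the transcript does not define) is taken to be the cost of testing per applicant. All input values are made up for illustration.

```python
def selection_ratio(num_hired, num_applicants):
    """Selection ratio = candidates hired / total applicants (a proportion)."""
    if num_applicants <= 0:
        raise ValueError("num_applicants must be positive")
    return num_hired / num_applicants


def base_rate(num_high_performers, num_current_employees):
    """Base rate = share of current employees who are high performers.

    Returned as a proportion; multiply by 100 for a percentage.
    """
    if num_current_employees <= 0:
        raise ValueError("num_current_employees must be positive")
    return num_high_performers / num_current_employees


def bcg_utility_gain(n, tenure, validity, sd_y, mean_z, cost):
    """Utility gain = (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C).

    n        -- N, number of applicants selected per year
    tenure   -- T, average length of time in the position
    validity -- r_xy, criterion-related validity coefficient
    sd_y     -- SD_y, standard deviation of performance in dollars
    mean_z   -- Z_m, ASSUMED: mean standardized test score of those selected
    cost     -- C, ASSUMED: cost of testing per applicant
    """
    benefit = n * tenure * validity * sd_y * mean_z
    total_cost = n * cost
    return benefit - total_cost


if __name__ == "__main__":
    # Illustrative numbers only; none of these come from the slides.
    print(selection_ratio(10, 50))
    print(base_rate(30, 100))
    print(bcg_utility_gain(n=10, tenure=2, validity=0.4,
                           sd_y=10000, mean_z=1.0, cost=200))
```

With these made-up inputs the benefit term is 80,000 and the cost term is 2,000, so the estimated gain is about 78,000 dollars. A negative result would suggest the instrument costs more than it returns under the stated assumptions.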
General Approaches in Utility Analysis: Brogden-Cronbach-Gleser Formula
utility gain = (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C)
- N represents the number of applicants selected per year
- T represents the average length of time in the position (or tenure)
- r_xy represents the (criterion-related) validity coefficient for the given predictor and criterion
- SD_y represents the standard deviation of performance (in dollars) of employees
- Z_m represents the mean (standardized) score on the test for ...

General Approaches in Utility Analysis
5. Decision Theory (also called the theory of choice; not to be confused with choice theory): A branch of applied probability theory concerned with making decisions based on assigning probabilities to various factors and assigning numerical consequences to the outcomes.

Practical Considerations
1. The Pool of Job Applicants
- The issue of how many people would actually accept the employment position offered to them even if they were found to be qualified candidates
- Many of the top performers on the test are people who are also being offered positions by one or more other potential employers
2. The Complexity of the Job
- As Hunter et al. (1990) observed, the more complex the job, the more people differ in how well or poorly they do that job.
3. The Cut Scores in Use
- Cut Score/Cutoff Score
- Multiple Cut Scores
- Multistage or Multiple Hurdle
- Compensatory Model of Selection

A. Cut Score/Cutoff Score: A (usually numerical) reference point derived as a result of a judgment and used to divide a set of data into two or more classifications, with some action to be taken or some inference to be made on the basis of these classifications.

Relative Cut Score (normative): A reference point in a distribution of test scores, used to divide a set of data into two or more classifications, that is set based on norm-related considerations rather than on the relationship of test scores to a criterion. Also known as a norm-referenced cut score.

Fixed Cut Score (criterion): A reference point in a distribution of test scores, used to divide a set of data into two or more classifications, that is typically set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification. Also known as an absolute cut score.

B. Multiple Cut Scores: The use of two or more cut scores with reference to one predictor for the purpose of categorizing test takers.
Examples:
- UNO: 90 and above
- DOS: 80 to 89
- TRES: 70 to 79

C. Multistage or Multiple Hurdle: The achievement of a particular cut score on one test is necessary in order to advance to the next stage of evaluation in the selection process.
Example: Miss Universe Pageant

D. Compensatory Model of Selection: A model of applicant selection based on the assumption that high scores on one attribute can balance out low scores on another attribute.
Example: I got a high score in ABPSYCH but a low score in ASSESSMENT. My score in ABPSYCH will compensate for my score in ASSESSMENT.

Steps in Test Development
Step 1: Test Conceptualization
Step 2: Test Construction
Step 3: Test Tryout
Step 4: Item Analysis
Step 5: Test Revision

Step 1: Test Conceptualization
- An emerging phenomenon or pattern of behaviour might serve as the stimulus for the development of a new test.
- Preliminary questions:
  ○ What is the test designed to measure?
  ○ What is the objective of the test?
  ○ Is there a need for this test?
  ○ Who will use this test?
  ○ Who will take this test?
  ○ What content will the test cover?
  ○ How will the test be administered?
  ○ What types of responses will be required of test takers?
  ○ Who benefits from an administration of this test?
  ○ Is there any potential for harm as the result of an administration of this test?
- Pilot Work: the generalized term for the preliminary research surrounding the creation of the test prototype.
- Remember: items must be subject to pilot studies to evaluate whether or not they should be included in the final form of the test.

Step 2: Test Construction (Scaling)
Scaling: The process of setting rules for assigning numbers in measurement.
Scaling methods:
- Likert Scales
- Paired Comparison
- Guttman Scale/Scalogram Analysis
- Comparative Scaling
- Categorical Scaling

1. Likert Scales
- A type of summative rating scale
- Five alternative responses (sometimes seven)
- Ordinal in nature
2. Paired Comparison
- A scaling method whereby one of a pair of stimuli (such as photos) is selected according to a rule (such as “select the one that is more appealing”)
3. Guttman Scale/Scalogram Analysis
- Named for its developer, a scale wherein items range sequentially from weaker to stronger expressions of the attitude or belief being measured
4. Comparative Scaling
5. Categorical Scaling

Step 2: Test Construction (Writing Items)
When devising a standardized test using a multiple-choice format, it is usually advisable that the first draft contain approximately twice the number of items that the final version of the test will contain.
- Item Pool: The reservoir or well from which items will or will not be drawn for the final version of the test; the collection of items to be further evaluated for possible selection for use in an item bank.
- Item Format: The form, plan, structure, arrangement, and ...

Item Formats:
- Selected-Response Format: A form of test item requiring test takers to select a response (multiple-choice, matching, and binary-choice or true/false items)
- Constructed-Response Format: A form of test item requiring the test taker to construct or create a response (essay and completion/short answer)

Multiple-Choice Format: Has three elements (stem, correct alternative or option, and distractors or foils).
A good multiple-choice item:
- has one correct alternative
- has grammatically parallel alternatives
- has alternatives of similar length
- has alternatives that fit grammatically with the stem
- includes as much of the item as possible in the stem to avoid unnecessary repetition
- avoids ridiculous distractors
- is not excessively long

Matching Item
- A test taker is presented with two columns, premises and responses, and must determine which response is best associated with which premise.
- Two columns: premises on the left, responses on the right
- A test taker could get a perfect score even without actually knowing all the answers.
- To minimize this possibility, provide more options, or state in the directions that each response may be a correct answer once, more than once, or not at all.
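To see why extra options or reusable responses help with matching items, the chance of completing the unknown matches by blind guessing can be worked out directly. The sketch below is illustrative only and not from the slides. It assumes the test taker guesses uniformly at random among the responses still available, and the function name and parameters are hypothetical.

```python
from math import factorial


def chance_of_perfect_guess(unknown_premises, available_responses, reusable=False):
    """Probability of matching every unknown premise correctly by blind guessing.

    unknown_premises    -- premises the test taker does not actually know
    available_responses -- responses the test taker can still choose from
    reusable            -- True if each response may be used more than once

    ASSUMPTION: guesses are uniformly random; this is an illustrative model.
    """
    k, r = unknown_premises, available_responses
    if k < 0 or r < 1:
        raise ValueError("need k >= 0 and at least one available response")
    if reusable:
        # Each premise is an independent 1-in-r guess.
        return (1 / r) ** k
    if r < k:
        raise ValueError("not enough responses to match every premise once")
    # Ordered ways to hand out k distinct responses from r: r! / (r - k)!
    arrangements = factorial(r) // factorial(r - k)
    return 1 / arrangements


if __name__ == "__main__":
    # One unknown premise, one response left: elimination gives the answer away.
    print(chance_of_perfect_guess(1, 1))
    # Same gap, but two extra responses were provided.
    print(chance_of_perfect_guess(1, 3))
```

With equal-length columns and single-use responses, a test taker who knows all but one match gets the last one for free, which is the problem the slide describes. Adding two extra responses drops that chance to one in three under this model.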
Step 2: Test Construction (Writing Items)
Completion or Short Answer (Fill in the Blanks)
- Requires the examinee to provide a word or phrase that completes a sentence
Essay
- Is useful when the test developer wants the examinee to demonstrate a depth of knowledge about a single topic
- Allows for the creative integration and expression of the material in the test taker's own words
- The main problem with essay items is the subjectivity in ...

Writing Items for Computer Administration
- Item Bank: A collection of questions to be used in constructing tests for computer administration
- Item Branching: In computerized adaptive testing, the individualized presentation of test items drawn from an item bank based on the test taker's previous responses

Computer adaptive testing reduces:
A. Floor Effect: A phenomenon arising from the diminished utility of a tool of assessment in distinguishing test takers at the low end of the ability, trait, or other attribute being measured.
- Very low scores due to very hard questions
B. Ceiling Effect: The diminished utility of an assessment tool for distinguishing test takers at the high end of the ability, trait, or other attribute being measured.
- Very high scores due to very easy questions

Step 2: Test Construction (Scoring Items)
1. Cumulative Model: A method of scoring whereby points or scores accumulated on individual items or subtests are tallied; the higher the total sum, the higher the individual is presumed to be on the ability, trait, or other characteristic being measured.
   Example: High IQ score -> more intelligent
2. Class or Category Scoring: A method of evaluation in which test responses earn credit toward placement in a particular class or category with other test takers.
   Example: A GWA of 1.5 and above will be placed in the Star Section; a GWA of 2 and below will be placed in the Lower Section.

Step 2: Test Construction (Scoring Items)
3. Ipsative Scoring: A scale in which the points distributed to the various items must sum to a specific total. In such a scale, all participants will have the same total score, but the distribution of the points among the various items will differ for each individual.
   Example: A supervisor using an ipsative scale to indicate an employee's strength in different areas initially might assign 20 points for communication, 30 for timeliness, and 50 for work ...

Step 3: Test Tryout
- The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
- The test tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered.
- No fewer than 5 subjects, and preferably as many as 10, for each item on the test
- The more subjects in the tryout, the better

What Is a Good Item?
- A good item is reliable and valid
- It helps to discriminate among test takers
- If:
  ○ High scorers answer incorrectly = bad item
  ○ Low scorers answer correctly = bad item
  ○ High scorers answer correctly = good item
  ○ Low scorers answer incorrectly = good item

Step 4: Item Analysis
Item Analysis: Statistical procedures used to analyze items.
Item Difficulty Index: In achievement or ability testing and other contexts in which responses are keyed correct, a statistic indicating the proportion of test takers who responded correctly to an item.
Formula:
   Item Difficulty Index = Number of Test Takers Who Answered Correctly / Total Number of Test Takers
- 0.00: no one answered correctly
- 1.00: everyone answered correctly

Standards:
- 0.50: optimal average item difficulty (whole test)
- 0.30 to 0.80: range of difficulty for individual items
- 0.75: optimal difficulty for true-or-false items
- 0.625: optimal difficulty for multiple-choice items (4 choices)

Step 5: Test Revision
1. Test Revision as a Stage in New Test Development
- “Polishing and finishing touches”
2. Test Revision in the Life Cycle of an Existing Test
- No hard-and-fast rules exist for when to revise a test
- However, a test should be revised when significant changes in the domain represented, or new conditions of test use and interpretation, make the test inappropriate for its intended use

Cross-Validation: A revalidation on a sample of test takers other than the test takers on whom test performance was originally found to be a valid predictor of some criterion.

Co-Validation: The test validation process conducted on two or more tests using the same sample of test takers; when used in conjunction with the creation of norms or the revision of existing norms, this process may also be referred to as co-norming.

Any questions?
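Returning to Step 4, the item difficulty index is a plain proportion, and the 0.75 and 0.625 standards are consistent with taking the midpoint between the chance success rate and 1.00. The slides do not state that midpoint rule, so it is treated here as an assumption. The function names and sample responses below are hypothetical.

```python
def item_difficulty(responses):
    """Item difficulty index: proportion of test takers who answered correctly.

    responses -- list of booleans, True where the test taker was correct.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(responses) / len(responses)


def optimal_difficulty(num_options):
    """Midpoint between chance success (1 / num_options) and 1.00.

    ASSUMPTION: this midpoint rule is not stated on the slides; it is used
    because it reproduces the 0.75 and 0.625 figures listed under Standards.
    """
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    chance = 1 / num_options
    return (chance + 1.0) / 2


def in_typical_range(p, low=0.30, high=0.80):
    """True if an item's difficulty falls in the 0.30 to 0.80 range."""
    return low <= p <= high


if __name__ == "__main__":
    # Made-up responses from four test takers to a single item.
    p = item_difficulty([True, True, False, True])
    print(p, in_typical_range(p))
    print(optimal_difficulty(2))  # true-or-false item
    print(optimal_difficulty(4))  # four-option multiple-choice item
```

Because a higher index means more test takers answered correctly, a value near 1.00 marks an easy item and a value near 0.00 marks a hard one.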
