Full Transcript


WRITING AND EVALUATING TEST ITEMS: PART 1
PSY61204 Psychological Tests and Measurements
Dr Michele Anne

Overview
- Principles underlying test construction
- Item construction
- Philosophical issues

Principles Underlying Test Construction

Identifying a need
- Identify a reason and need for the test to be constructed
- Find an area / variable which does not have an existing test to measure it

Role of theory
- The test should be guided by and based on relevant theories in the area of measurement, rather than solely on observation and personal beliefs
- Theory helps to determine the components / subscales which need to be included in the test

Practical choices
- Decisions about test items, appearance, constraints, and rules
- Format of the test, rating scale, time limit, medium, method of scoring, depth of items, etc.

Pool of items
- Create a table of specifications: subtopics included, importance of each subtopic, number of items per subtopic
- Create items following the table of specifications
- The table of specifications and the items are constructed based on one's own perception, observation, theories, past research, other tests in the area, and expert opinions
- The pool of items should be four to five times as large as the actual number of items needed

Tryouts and refinement
- The initial pool of items will be unrefined, repetitive, and unclear; refine it to a smaller but usable pool
- Get feedback from experts to identify unclear items
- Conduct pilot testing: administer a preliminary version of the test to a sample to determine whether any changes should be made
- Carry out item analysis to determine which items are relevant
- Carry out content analysis to sort items into categories and to identify overrepresented and underrepresented content

Reliability and validity
- Establish reliability (consistency): test-retest reliability, parallel-forms reliability, interrater reliability, internal consistency
- Establish validity (accuracy): face validity, content validity, criterion validity, construct validity
- Determine which forms of reliability and validity are relevant for the test

Standardization and norms
- Standardize administration, time limits, scoring procedure, etc., to ensure results are comparable regardless of the test administrator
- Norms are obtained from the means and standard deviations of groups, and enable comparison between individual performance and group performance
- Norm groups should be large enough and representative
- Raw score (actual total) → derived score (e.g., z scores, T scores, percentiles) → comparison with norm scores

Further refinement
- After a test has been made available / released, it can undergo further refinement and revisions
- Changes can be due to changes in scientific knowledge, societal changes, or additions or changes to norms
- Variations such as short forms can be developed
- Factor analysis: check whether the dimensions / factors match the theory and the test construction
- Multivariate: a test composed of many scales, to which additional scales can be added

Item Construction

Writing test items
- Items should be clear and unambiguous
- Items should not be double-barreled (i.e., measuring more than one aspect in a single item); each item should ask about only one aspect
- Items should not use subjective perceptions of time (e.g., "sometimes", "frequently"); be specific about time (e.g., "once a week")

Categories of item
- Constructed-response items: the participant is presented with stimuli and produces a response (e.g., essay, short answer, interview response)
- Selected-response items: the participant selects the correct or best response from a list of options (e.g., rating, multiple choice)

Types of item
- Multiple-choice: an item with response options. Can measure both factual and subjective aspects. The keyed response is the correct / best answer; distractors are incorrect options that differentiate those who know the correct answer from those who don't. Pro: can be administered quickly and has broader coverage. Con: limits the responses which can be made.
- True-false: an item whose response is to determine whether it is true / correct or false / incorrect. More often seen in personality and opinion tests. Con: responses can be based on guessing (50% chance).
- Analogies: items involving comparison and logical thinking. Can use words, numbers, designs, etc. E.g., 12 is to 4 as 9 is to ____
- Odd-man-out: items involving identifying which component does not belong. Can use words, numbers, etc. E.g., bread, apple, book, rice, nuggets
- Sequences: items involving a series of components related to each other, with one component missing which needs to be generated. E.g., 6, 12, 18, 24, 30, 36, _____
- Matching: two lists of items which need to be matched based on the required criteria. The two lists can be of unequal length to counteract guessing. Used to determine factual knowledge.
- Completion: an item whose response completes a sentence. E.g., I am a person who _______________
- Fill in the blank: a variant of the completion item, but the required response can come from anywhere within the sentence (not just the end). E.g., ________ is the study of human behaviour
- Forced choice: items with two or more options which are all appealing / applicable, where participants must choose one. E.g., How would you spend most of your weekend: (1) sleeping, (2) eating
- Vignettes: a brief scenario is given (e.g., excerpt, case study, short story), and participants need to respond to it (e.g., story completion, choosing a response, making a judgement)
- Rearrangement / continuity: items where a list is provided and participants need to sort it into chronological order. Can be used for factual knowledge about a series / order. Difficult to score.

Objective-subjective continuum
- Test items can exist anywhere along an objective-subjective continuum
- Objective end: multiple-choice and true-false items. Easily scored, with one correct answer, and easily analysed; but they only provide information about the correct answer, which can be a result of guessing
- Subjective end: essay / open-ended questions. Responses can be unique, in-depth, or personal

Item format
- Can be affected by the test constructor's preferences and the test content
- Items which create variation in raw scores are better than binary items; responses can include several choices and rating / Likert scales instead of true-false responses
- Should a Likert scale have an odd number of points (with a middle / neutral option) or an even number (forcing a choice)?
- Should there be an additional option for "I don't know" or "prefer not to answer"?

Sequencing of items
- The order in which items will be presented: easy to difficult, or one item per subscale
- Spiral omnibus: a few items from easy to difficult, then the cycle repeats
- Filler items: not scored, but included to hide the real intent of the test

Direct assessment
- A test which provides direct measurement of a product or performance
- Relevant for skill-based measurements, problem solving, and higher-order cognition
- E.g., testing swimming ability by actually swimming in a pool instead of filling in a rating of how well you swim

Philosophical Issues

Are the items working as they should?
- By fiat: assuming the test measures the respective area based on one's own judgement and expertise in the area
- Criterion-keyed test: uses criterion validity, correlating the test with a test of a similar criterion / area. Done several times across several samples; affects the decision to retain items or not. May not reflect construct validity, and can be affected by the type of criterion chosen, the types of factors in the tests, and the settings in which they are administered.
- Factor analysis: assumes the scale should be univariate and independent, measuring one factor. Items should intercorrelate and load onto one factor to indicate they are measuring one variable / dimension. The test constructor needs to study the items, understand the underlying theories, and label the factors. May not take into account the complexity or multidimensionality of certain variables.

Questions?
