Item Writing PDF
Summary
This document discusses the item writing process for psychological tests, including considerations like content coverage, item types, and the appropriate number of items. It provides an overview of different item formats, such as dichotomous, polytomous, Likert, and category formats.
Full Transcript
Psychological Testing and Measurement (PSY-P631) VU
Lesson 11
Item Writing

The process of test construction involves careful planning, and a number of factors must be considered before any test items are actually written. Three important considerations in this regard are:
What range of content should the items cover?
Which of the many different types of item formats should be employed?
How many items should be written?

Nature and Range of Content:
The test developer has to decide on the nature and extent of the content to be included in the test. This is decided primarily with reference to the objectives laid down for measurement. Each item should measure some aspect of the content area.

Type of Items:
Different types of formats are available to test developers. The choice of format depends on the type of test and on the construct or trait being measured. For example, a projective item may be suitable for a personality test but not for an achievement test. Details of item formats are given in the following sections.

Number of Items:
The test developer also has to consider the number of items to be included in the test, since this determines the length of the test. The decision may also affect whether the test is administered individually or in a group. Another variable that affects the number of items is whether the test is going to measure a single trait, ability, or domain or several of them. Fewer items are required if a single domain is to be measured. When multiple traits or abilities are measured, a larger number of items is required so that enough items are available to measure each one. There are very lengthy inventories, like the MMPI, containing a few hundred items, and short scales, such as self-efficacy scales, containing 3 to 10 items.
When a standardized test based on a multiple-choice response format is developed, it is usually advisable that the first draft contain approximately twice as many items as the final version of the test will contain. The test developer may write a large number of items from personal experience. Help from experts can also be taken for item writing, and literature searches may be a valuable source of ideas for items. Considerations such as the purpose of the test and the number of examinees to be tested at one time enter into decisions regarding the format of the test.

Item Formats:
Kaplan and Saccuzzo (2001) have given a very good description of test item formats. The test developer can choose from the following formats described by them:
a. The dichotomous format
b. The polytomous format
c. The Likert format
d. The category format
e. Checklists and Q-sorts

Test formats are classified in many other ways as well: recall type and recognition type; constructed response type and identification type. A very common distinction is made between objective type and essay type formats. Another way is to divide test formats into the selected response format and the constructed response format.

Selected Response Format:
This type of format presents the examinee with a choice of answers and requires the selection of one alternative; for example, on an achievement test the test taker is required to select the correct option. The types of selected-response format are multiple choice, matching, and true/false items. The test formats described by Kaplan and Saccuzzo (2001) fall into this category. We will discuss these types in the following section.

a. The Dichotomous Format:
This is the alternate response format, in which the test taker is provided with two response options to choose from. It can be used in a number of ways.
Most commonly, one of the two options is right and the other is wrong. A certain mark or score, usually one, is given for choosing the right option, so that ten correctly marked options carry a score of ten. Another way is to use ‘yes’ and ‘no’ options, as in many personality inventories. Here the subject indicates whether or not a statement describes him. An even more popular use is the true/false format. In this format the test takers are presented with a number of statements and have to indicate whether each is true or false. In teacher-made tests, statements from textbooks or other materials are used in these items. Statements taken as they stand serve as ‘true’ items, whereas the text is altered to develop items that are ‘false’.

Such items have the following advantages:
They are easy to develop. The test developer does not have to think of a number of options that all seem to be correct.
Development of such tests does not take too much time.
These items are easy to score.

There are some disadvantages as well:
A test taker may be able to answer about 50% of the items correctly even when he does not know the right answers. The probability of answering an item correctly by chance is ½. If the teacher has used an equal number of true and false items in the test and the test taker marks all items as true (or marks all as false), then he will be marking half of the items correctly.
Such items may encourage rote memorization of course content in teacher-made tests. When students know that the items will be based on the text of the course content, they try to memorize the content rather than learn the concepts.

Nevertheless, these items are used quite often. In order to control the disadvantages of these tests, a large number of items covering a wide range of content should be used.

Activity:
Make two test items with alternate response options. One should be of the true/false type and the other of the yes/no type.

b. The Polytomous (Polychotomous) Format:
In this type of item, multiple response options, rather than two alternate response options, are given. Such an item includes these elements:
A stem
A correct alternative or option
Several incorrect alternatives or options, which are called “distracters” or “foils”

Example:
Stem: A psychological test, an interview, and a case study are:
Correct alternative: a. psychological assessment tools
Distractors: b. standardized behavioral samples
c. reliable assessment instruments
d. theory-linked instruments

These response options include one right answer, and the others are distractors. The question or item statement is called the stem. The stem should clearly state the question or the problem. The distractors should be framed in such a manner that each one of them appears plausible as the right answer.

In multiple-choice items the problem of answering half of the items without knowing the right answers is controlled to a great extent. However, there is still a possibility that someone may answer a certain percentage of questions correctly without knowing the content at all. If the items in a test have four options each, one may be able to answer about 25% of the items correctly simply by guessing. If three options are used, one may manage to mark about 33% of the items correctly by chance. This problem can be reduced by increasing the number of options, but practically speaking it is very difficult to write many options that all seem to be the right answer. Testing experts believe that it is good to use three or four options for each item.
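The chance-level figures above can be checked with a short sketch. It also applies the correction for guessing, R - W/(n - 1), which the lesson introduces next, to a test taker who guesses blindly on every item. The function names are illustrative assumptions rather than part of the lesson, and the sketch assumes that every item has the same number of options and that no item is left unanswered.

```python
from fractions import Fraction


def expected_chance_score(num_items: int, num_options: int) -> Fraction:
    """Expected number of items answered correctly by blind guessing."""
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    return Fraction(num_items, num_options)


def corrected_score(right: int, wrong: int, num_options: int) -> Fraction:
    """Correction for guessing: R - W / (n - 1)."""
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    return right - Fraction(wrong, num_options - 1)


if __name__ == "__main__":
    # Chance-level scores on a 100-item test for 2, 3, 4 and 5 options.
    for options in (2, 3, 4, 5):
        chance = expected_chance_score(100, options)
        print(f"{options} options: about {float(chance):.1f} of 100 by chance")

    # A blind guesser on 100 four-option items expects 25 right and 75 wrong;
    # the correction removes exactly that chance-level score.
    print(corrected_score(25, 75, 4))
```

Under these assumptions the corrected score of a pure guesser comes out at zero, which is the rationale for dividing the number of wrong answers by n - 1.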
At times a correction for guessing is used, employing the following formula:

Corrected score = R - W/(n - 1)

Where,
R = the number of right responses
W = the number of wrong responses
n = the number of choices for each item

For example, a test taker scores 40 out of 100 on a test that used a four-option format, with the remaining 60 items answered wrongly. The score of 40 will be corrected like this:

Corrected score = 40 - 60/(4 - 1) = 40 - 60/3 = 40 - 20 = 20

This example shows that guessing in a test may have more serious consequences than one might foresee. If the scoring procedure involves a correction for guessing, then blind guessing should be avoided. If one has little or no knowledge of the content, the likelihood of making wrong guesses will also be very high. Looking at the formula, it can be seen that the greater the number of wrong answers, the greater the number subtracted from the right answers, and so the smaller the corrected score.

The test developer should use each response option as the correct answer in roughly the same proportion. Some teachers have a tendency to favor a particular option and use it frequently as the right one, e.g. ‘C’ or ‘D’ more often than ‘A’ or ‘B’. If students notice this trend, they will be more likely to choose the teacher’s favorite options and mark many answers correctly even without knowing the answer.

Activity:
Try to develop a multiple-choice item with five options. Can you make five options that all look like correct answers to the stem? If you find it hard to make five, then try making three options.

c. Matching Item:
A matching item is a format different from the multiple-choice format. The test taker is presented with two columns of responses and has to determine which item from the first column matches which item from the second column.
Both columns may have an equal number of items, which makes it easy for the examinee to match even the items to which he does not know how to respond. For example, if a respondent is unsure about one of the options given, he may deduce the right answer by matching all the other options first. A perfect score could thus be obtained without full knowledge. Providing more options than are needed is designed to minimize this possibility.

d. The Likert Format:
A very popular tool for measuring personality and attitudes is the Likert scale. The Likert scale format gives test takers an opportunity to indicate the degree of their agreement with a statement. These statements pertain to attitude or personality. Likert used it as part of his method of attitude scale construction. Likert scales typically provide five response options ranging from ‘strongly disagree’ to ‘strongly agree’, with ‘neutral’ in between. For example:
“I like to make friends who are older than me”. Choose from the following options:
Strongly disagree _____, disagree _____, neutral _____, agree _____, strongly agree _____

At times people have a tendency to mark ‘neutral’ rather than give a clear-cut answer. Therefore six options, rather than five, may also be used:
Strongly disagree _____, moderately disagree _____, mildly disagree _____, mildly agree _____, moderately agree _____, strongly agree _____

The responses are summed to determine a person’s score. Negatively worded items are reverse scored before being added to the total.

The Category Format:
This format is similar to the Likert scale but has more than five options. A 10-point scale is most commonly used; however, the number of response categories may be more or less than that. This scale is used to rate something, e.g. the performance of a team. It is felt that people may not be very accurate in rating a player’s or a team’s performance using this scale.
It may happen that a person who watches a player alongside a superb player rates that player at a lower level, whereas he may rate the same player very high when comparing him with a poor performer. Some authors have recommended that, in order to control this factor, the persons who will be rating be shown videos of performances that could be rated ‘10’ and of those that deserve a ‘1’ (Kaplan & Ernst, 1983).

e. Checklists and Q-sorts:
Adjective checklists contain a list of adjectives. The test taker checks the adjectives that are true of him. Checklists may be used for indicating one’s own characteristics or those of other people. They are mostly used in personality assessment. In a Q-sort the test taker is provided with a number of statements and has to sort them into nine piles according to the extent to which they are true of her. This can be used for rating other people as well.

Activity:
Imagine you have to assess your classmates’ ratings of a certain teacher using the Q-sort format. What five statements will you write to be used by the subjects? And if you had to rate a TV serial, what characteristics would you consider?

Constructed Response Format:
This is the response format that requires the examinee to provide or create the correct answer rather than just select it. Three types of constructed-response items are the completion item, the short answer, and the essay.
A completion item requires the examinee to provide a word or phrase that completes a sentence. A good completion item should be worded clearly enough that the correct answer is specific.
A good short-answer item is written clearly enough that the test taker can indeed respond briefly, with a short answer. There is no hard and fast rule about how short an answer should be.
An essay is a type of response format in which the examinees are asked to discuss a single assigned topic in detail. The skills measured by essay-type items are different from those measured by other item formats: an essay requires recall, organization, planning, and writing ability, whereas the other types of items require only recognition.

©copyright Virtual University of Pakistan
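As a worked illustration of the Likert scoring rule given in the section on the Likert format (sum the responses, reverse scoring the negatively worded items), a minimal sketch follows. The lesson does not specify a numeric coding, so the sketch assumes the common convention of coding the options 1 (strongly disagree) up to the number of scale points and reversing an item as (points + 1 - rating). The function name and its parameters are hypothetical.

```python
def score_likert(responses, reverse_keyed=(), points=5):
    """Sum Likert ratings, reverse scoring the negatively worded items.

    responses     -- integer ratings, one per item, coded 1..points (assumed coding)
    reverse_keyed -- zero-based positions of the negatively worded items
    points        -- number of response options on the scale
    """
    total = 0
    for index, rating in enumerate(responses):
        if not 1 <= rating <= points:
            raise ValueError(f"rating {rating} is outside 1..{points}")
        # On a 5-point scale a reverse-keyed 1 counts as 5, a 2 as 4, and so on.
        total += (points + 1 - rating) if index in reverse_keyed else rating
    return total


if __name__ == "__main__":
    # Three items; the second one (position 1) is negatively worded.
    print(score_likert([5, 1, 4], reverse_keyed={1}))
```

With this coding, strong disagreement with a negatively worded item counts the same as strong agreement with a positively worded one, so a higher total always indicates more of the attitude being measured.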