Topic 9: Appraising Classroom Tests and Item Analysis

LEARNING OUTCOMES

By the end of this topic, you should be able to:
1. Define item analysis;
2. Compute the difficulty index;
3. Compute the discrimination index;
4. Analyse the effectiveness of distractors in a question;
5. Discuss the relationship between the difficulty index and discrimination index of an item; and
6. Explain the role of an item bank.

INTRODUCTION

When you develop a test, it is important to identify the strengths and weaknesses of each item. In other words, to determine how well the items in a test perform, some statistical procedures need to be used. In this topic, we will discuss item analysis, which involves the use of three procedures, namely item difficulty, item discrimination and distractor analysis, to help the test developer decide whether the items in a test should be accepted, modified or rejected. These procedures are quite straightforward and easy to use, but educators need to understand the logic underlying the analyses in order to use them properly and effectively.

9.1 WHAT IS ITEM ANALYSIS?

Having administered a test and marked it, most teachers would discuss the answers with learners. Discussion would usually focus on the right answers and common errors made by learners. Some teachers may focus on the questions most learners performed poorly on and the questions learners did very well on. However, there is much more information available about a test that is often ignored by teachers. This information will only be available if item analysis is conducted.

What is item analysis? Item analysis is a process which examines the responses to individual test items or questions in order to assess the quality of those items and of the test as a whole. Item analysis is especially valuable in improving items or questions that will be used again in later tests. Moreover, it can also be used to eliminate ambiguous or misleading items in a single test administration. Specifically, in Classical Test Theory (CTT), the statistics produced from analysing the test results include the difficulty index and the discrimination index. Analysing the effectiveness of distractors is also part of the process. We will discuss each of these components of item analysis in detail later.

The quality of a test is determined by the quality of each item or question in the test. The teacher who constructs a test can only roughly estimate its quality. This estimate is based on the fact that the teacher has followed all the rules and conditions of test construction. However, it is possible that this estimate is inaccurate and that certain important aspects have been overlooked. Hence, to obtain a more comprehensive understanding of the test, item analysis should be conducted on the responses of learners. Item analysis is conducted to obtain information about individual items or questions in a test and about how the test can be improved. It also facilitates the development of an item or question bank which can be used in the construction of future tests.

9.2 STEPS IN ITEM ANALYSIS

Both Classical Test Theory (CTT) and modern test theories such as Item Response Theory (IRT) provide useful statistics to help us analyse test data. For many item analyses, CTT is sufficient to provide the information we need. As such, CTT will be used in this module.
Let us take the example of a teacher who has administered a 30-item multiple-choice objective test in geography to 45 learners in a secondary school classroom.

(a) Step 1
Upon receiving the answer sheets, the first step is to mark each of the answer sheets.

(b) Step 2
Arrange the 45 answer sheets from the highest score obtained to the lowest score obtained. The paper with the highest score is on top and the paper with the lowest score is at the bottom.

(c) Step 3
Multiply 45 (the number of answer sheets) by 0.27 (or 27 per cent), which gives 12.15, rounded to 12. The use of the value 0.27 or 27 per cent is not inflexible; it is possible to use any percentage between 27 and 35 per cent. However, the 27 per cent rule can be ignored if the class size is too small. Instead of taking a 27 per cent sample, divide the number of answer sheets by two.

(d) Step 4
From the pile of 45 answer sheets arranged by score, take 12 answer sheets from the top of the pile and 12 answer sheets from the bottom of the pile. Call these two piles the "high mark" learners and the "low mark" learners. Set aside the middle group of papers (21 papers). Although these could be included in the analysis, using only the high and low groups simplifies the procedure.

(e) Step 5
Refer to Item #1 or Question #1:
(i) Count the number of learners from the "high mark" group who selected each of the options (A, B, C or D); and
(ii) Count the number of learners from the "low mark" group who selected each of the options A, B, C or D (refer to Figure 9.1).

Figure 9.1: Item analysis for one item or question

From the analysis, 11 learners from the "high mark" group and two learners from the "low mark" group selected "B", which is the correct answer. This means that 13 out of 24 learners selected the correct answer. Also, note that each of the distractors (A, C and D) was selected by at least one learner. However, the information provided in Figure 9.1 is insufficient and further analysis has to be conducted.

9.3 THE DIFFICULTY INDEX

Using the information provided in Figure 9.1, you can compute the difficulty index, a quantitative indicator of the difficulty level of an individual item or question. It can be calculated using the following formula:

Difficulty index, p = R / T

where R is the number of learners with the correct answer and T is the total number of learners who attempted the question. For the item in Figure 9.1:

p = R / T = 13 / 24 = 0.54

What does a difficulty index (p) of 0.54 mean? The difficulty index is a coefficient that shows the percentage of learners who got the correct answer out of the total number of learners in the two groups who answered. In other words, 54 per cent of learners selected the correct answer. Although our computation is based on the high and low scoring groups only, it provides a close approximation to the estimate that would be obtained with the total group. Thus, it is proper to say that the index of difficulty for this item is 54 per cent (for this particular group). Note that, since difficulty refers to the percentage getting the item right, the smaller the percentage, the more difficult the item.
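To illustrate, the computation for Figure 9.1 can be scripted. The following is a minimal sketch in Python, using the counts quoted above (11 correct answers in the high group, two in the low group, 12 learners per group); the function and argument names are our own, not from the module.

```python
def difficulty_index(correct_high, correct_low, n_high, n_low):
    """Difficulty index p = R / T, where R is the number of correct
    answers in the combined high and low groups and T is the total
    number of learners in those two groups."""
    r = correct_high + correct_low
    t = n_high + n_low
    return r / t

# Item #1 from Figure 9.1: 11 of 12 "high mark" and 2 of 12 "low mark"
# learners chose the correct answer (B).
p = difficulty_index(correct_high=11, correct_low=2, n_high=12, n_low=12)
print(round(p, 2))  # 0.54
```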
Lien (1980) provides the following guidelines on the interpretation of the difficulty index (refer to Figure 9.2):

Figure 9.2: Interpretation of the difficulty index (p)

If a teacher believes that the achievement of 0.54 on the item is too low, he can change the way he teaches to better meet the objective represented by the item. Another interpretation might be that the item was too difficult, confusing or invalid, in which case the teacher can replace or modify it, perhaps using information from the item's discrimination index or distractor analysis.

Under CTT, the item difficulty measure is simply the proportion of correct answers for an item. For an item with a maximum score of 2, there is a slight modification to the computation of the proportion correct. Such an item has possible partial-credit scores of 0, 1 and 2. Suppose that 100 learners attempt the item: 23 learners score 0, 60 learners score 1 and 17 learners score 2. A simple calculation shows that 23 per cent of the learners score 0, 60 per cent score 1 and 17 per cent score 2 on this particular item. The average score for the item is then (0 × 0.23) + (1 × 0.60) + (2 × 0.17) = 0.94. Thus, the observed average score for this item is 0.94 out of a maximum of 2, so the average proportion correct is 0.94/2 = 0.47, or 47 per cent.
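The same partial-credit arithmetic can be scripted. Below is a minimal sketch assuming the score distribution above; the function name is our own.

```python
def partial_credit_difficulty(score_counts, max_score):
    """Difficulty of a partial-credit item: the average observed score
    divided by the maximum possible score.

    score_counts maps each possible score to the number of learners
    who earned it, e.g. {0: 23, 1: 60, 2: 17}."""
    n = sum(score_counts.values())
    average = sum(score * count for score, count in score_counts.items()) / n
    return average / max_score

# The worked example above: 23 learners score 0, 60 score 1, 17 score 2.
p = partial_credit_difficulty({0: 23, 1: 60, 2: 17}, max_score=2)
print(round(p, 2))  # 0.47
```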
ACTIVITY 9.1

A teacher gave a science test to a group of 35 learners. The correct answer for Question #25 is "C" and the results are as follows:

Options                    A   B   C   D   Blank
High mark group (n = 12)   0   2   8   2   0
Low mark group (n = 12)    2   4   3   2   1

(a) Compute the difficulty index (p) for Question #25.
(b) Is Question #25 an easy or difficult question? Justify.
(c) Do you think you need to improve Question #25? Why?

Share your answers with your coursemates in the myINSPIRE online forum.

9.4 THE DISCRIMINATION INDEX

The discrimination index is a basic measure which shows the extent to which a question discriminates or differentiates between learners in the "high mark" and "low mark" groups. This index can be interpreted as an indication of the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an item. It is most crucial for a test item that learners got the answer correct due to their level of knowledge or ability and not due to something else such as chance or test bias.

Note that in our earlier example, 11 learners in the "high mark" group and two learners in the "low mark" group selected the correct answer. This indicates positive discrimination, since the item differentiates between learners in the same way that the total test score does. That is, learners with high scores on the test (the high mark group) got the item right more frequently than learners with low scores on the test (the low mark group).

Although analysis by inspection may be all that is necessary for most purposes, an index of discrimination can easily be computed using the following formula:

Discrimination index, D = (R_H - R_L) / (T/2)

where
R_H = number of learners in the "high mark" group with the correct answer
R_L = number of learners in the "low mark" group with the correct answer
T = total number of learners in the two groups

Example: A test was given to a group of 43 learners, and 10 out of the 13 learners in the "high mark" group got the correct answer compared with 5 out of the 13 learners in the "low mark" group. The discrimination index is computed as follows:

D = (R_H - R_L) / (T/2) = (10 - 5) / (26/2) = 5/13 = 0.38

What does a discrimination index of 0.38 mean? The discrimination index is a coefficient that shows the extent to which the question discriminates or differentiates between "high mark" learners and "low mark" learners. Blood and Budd (1972) provided the following guidelines on the meaning of the discrimination index (refer to Figure 9.3):

Figure 9.3: Interpretation of the discrimination index

A question that has a high discrimination index is able to differentiate between learners who know the answer and those who do not. A question that has a low discrimination index is not able to differentiate between the two. A low discrimination index may mean that many "low mark" learners got the correct answer because the question was too simple. It could also mean that learners from both the "high mark" group and the "low mark" group got the answer wrong because the question was too difficult.

The formula for the discrimination index is such that if more learners in the "high mark" group chose the correct answer than did learners in the low scoring group, the value will be positive. At the very least, one would hope for a positive value, as that would indicate that it is knowledge of the content that resulted in the correct answer.

(a) The greater the positive value (the closer it is to 1.0), the stronger the relationship between overall test performance and performance on that item.

(b) If the discrimination index is negative, that means that for some reason learners who scored low on the test were more likely to get the answer correct. This is a strange situation which suggests poor validity for the item.

ACTIVITY 9.2

A teacher gave a 35-item economics test to 42 learners. For Question #16, 8 out of the 11 learners in the "high mark" group got the correct answer compared with 4 out of 11 from the "low mark" group.

(a) Compute the discrimination index for Question #16.
(b) Does Question #16 have a high or low discrimination index? Justify.

Share your answers with your coursemates in the myINSPIRE online forum.
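Continuing the Python sketch from subtopic 9.3, the discrimination index can be computed the same way; the function name is our own, and the data are from the worked example above.

```python
def discrimination_index(correct_high, correct_low, n_high, n_low):
    """Discrimination index D = (R_H - R_L) / (T/2), where T is the
    total number of learners in the high and low groups combined.
    With equal group sizes, this is the difference between the two
    groups' proportions correct."""
    t = n_high + n_low
    return (correct_high - correct_low) / (t / 2)

# Worked example: 10 of 13 "high mark" and 5 of 13 "low mark" learners
# answered correctly.
d = discrimination_index(correct_high=10, correct_low=5, n_high=13, n_low=13)
print(round(d, 2))  # 0.38
```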
9.5 APPLICATION OF ITEM ANALYSIS TO ESSAY-TYPE QUESTIONS

The previous subtopics explained the use of item analysis on multiple-choice questions. Item analysis can also be applied to essay-type questions, and this subtopic illustrates how. For ease of understanding, the illustration uses a short-answer essay question as an example.

Let us assume that a group of 20 learners has responded to a short-answer essay question with scores ranging from a minimum of 0 to a maximum of 4. Table 9.1 provides the scores obtained by the learners.

Table 9.1: Scores Obtained by Learners for a Short-answer Essay Question

Item Score      No. of Learners Earning Each Score   Total Scores Earned
4               5                                    20
3               6                                    18
2               5                                    10
1               3                                    3
0               1                                    0
Total                                                51
Average score                                        51/20 = 2.55

The difficulty index (p) of the item can be computed using the following formula, as suggested by Nitko (2004):

p = average score / possible range of scores

Using the information from Table 9.1, the difficulty index of the short-answer essay question can easily be computed. The average score obtained by the group of learners is 2.55, while the possible range of scores for the item is (4 - 0) = 4. Thus,

p = 2.55 / 4 = 0.64

The difficulty index (p) of 0.64 means that, on average, learners received 64 per cent of the possible maximum score for the item. The difficulty index can be interpreted in the same way as that of the multiple-choice question discussed in subtopic 9.3. The item is of a moderate level of difficulty (refer to Figure 9.2).

Note that in computing the difficulty index in the above example, the scores from the whole group are used to obtain the average score. However, for a large group of learners, it is possible to estimate the difficulty index for an item based on only a sample of learners comprising the "high mark" and "low mark" groups, as in the case of computing the difficulty index of a multiple-choice question.

To compute the discrimination index (D) of an essay-type question, the following formula is suggested by Nitko (2004):

D = (difference between upper and lower group average scores) / possible range of scores

Using the information from Table 9.1 and presenting it in the format shown in Table 9.2, we can compute the discrimination index of the short-answer essay question.

Table 9.2: Distribution of Scores Obtained by Learners

Score                      0   1   2   3   4   Total Score   Average Score
High mark group (n = 10)   0   0   1   4   5   34            3.4
Low mark group (n = 10)    1   3   4   2   0   17            1.7

"n" refers to the number of learners.

The average score obtained by the upper group of learners is 3.4, while that of the lower group is 1.7. Using the formula suggested by Nitko (2004), we can compute the discrimination index of the short-answer essay question as follows:

D = (3.4 - 1.7) / 4 = 0.43

The discrimination index (D) of 0.43 indicates that the short-answer question does discriminate between the upper and lower groups of learners, and at a high level (refer to Figure 9.3). As in the computation of the discrimination index of a multiple-choice question for a large group of learners, a sample comprising the top 27 per cent and the bottom 27 per cent of learners may be used to provide a good estimate.

ACTIVITY 9.3

The following information shows the performance of the high mark and low mark groups on a short-answer essay question.

Score                      0   1   2   3   4
High mark group (n = 10)   2   2   3   1   2
Low mark group (n = 10)    3   2   2   3   0

(a) Calculate the difficulty index.
(b) Calculate the discrimination index.
(c) Discuss the findings.

Discuss the findings with your coursemates in the myINSPIRE online forum.
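The Nitko (2004) formulas translate directly into a short sketch. Below is a minimal Python version using the Table 9.2 data; the function names are our own.

```python
def essay_difficulty(scores, max_score, min_score=0):
    """Nitko (2004): p = average score / possible range of scores."""
    average = sum(scores) / len(scores)
    return average / (max_score - min_score)

def essay_discrimination(high_scores, low_scores, max_score, min_score=0):
    """Nitko (2004): D = (high group average - low group average)
    divided by the possible range of scores."""
    high_avg = sum(high_scores) / len(high_scores)
    low_avg = sum(low_scores) / len(low_scores)
    return (high_avg - low_avg) / (max_score - min_score)

# Table 9.2 data expressed as individual scores.
high = [2] + [3] * 4 + [4] * 5           # 10 learners, average 3.4
low = [0] + [1] * 3 + [2] * 4 + [3] * 2  # 10 learners, average 1.7

print(essay_difficulty(high + low, max_score=4))     # 2.55 / 4 = 0.6375, i.e. 0.64
print(essay_discrimination(high, low, max_score=4))  # (3.4 - 1.7) / 4 = 0.425, i.e. 0.43
```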
9.6 RELATIONSHIP BETWEEN THE DIFFICULTY INDEX AND THE DISCRIMINATION INDEX

Theoretically, the more difficult a question (or item), or the easier the question (or item), the lower its discrimination index will be. Stanley and Hopkins (1972) provided a theoretical model to explain the relationship between the difficulty index and the discrimination index of a particular question or item (refer to Figure 9.4).

Figure 9.4: Theoretical relationship between the difficulty index and the discrimination index
Source: Stanley & Hopkins (1972)

According to the model, a difficulty index of 0.2 can result in a discrimination index of about 0.3 for a particular item (which may be described as an item of moderate discrimination). Note that as the difficulty index increases from 0.1 to 0.5, the discrimination index increases even faster. When the difficulty index reaches 0.5 (an item of moderate difficulty), the discrimination index is +1.00 (very high discrimination). Interestingly, for difficulty indexes greater than 0.5, the discrimination index decreases. Why is this so?

(a) For example, a difficulty index of 0.9 results in a discrimination index of about 0.2, which describes an item of low to moderate discrimination. What does this mean? The easier the question, the harder it is for that question or item to discriminate between those learners who know the answer and those who do not.

(b) Similarly, when the difficulty index is about 0.1, the discrimination index drops to about 0.2. What does this mean? The more difficult the question, the harder it is for that question or item to discriminate between those learners who know the answer and those who do not.

ACTIVITY 9.4

1. What can you conclude about the relationship between the difficulty index of an item and its discrimination index?
2. Do you take these factors into consideration when giving a multiple-choice test to students in your school? Explain.

Share your answers with your coursemates in the myINSPIRE online forum.
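One way to see why discrimination peaks at moderate difficulty is to work out the maximum possible value of D for a given p. The sketch below is our own illustration, not taken from Stanley and Hopkins; it assumes equal-sized high and low groups and asks how large D can be when a proportion p of all learners answer correctly.

```python
def max_discrimination(p):
    """Upper bound on D for an item with difficulty p, assuming
    equal-sized high and low groups. If p <= 0.5, at best every
    correct answer comes from the high group: D = 2p. If p > 0.5,
    the high group is saturated and the surplus correct answers
    must come from the low group: D = 2(1 - p)."""
    return 2 * p if p <= 0.5 else 2 * (1 - p)

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(p, round(max_discrimination(p), 2))
# 0.1 -> 0.2, 0.3 -> 0.6, 0.5 -> 1.0, 0.7 -> 0.6, 0.9 -> 0.2
```

The curve this produces peaks at D = 1.0 when p = 0.5 and falls towards zero at both extremes, matching the shape of the model in Figure 9.4.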
9.7 DISTRACTOR ANALYSIS

In addition to examining the performance of test items as a whole, teachers are also interested in examining the performance of individual distractors (the incorrect answer options) in multiple-choice items. By calculating the proportion of learners who chose each answer option, teachers can identify which distractors are "working", attracting learners who do not know the correct answer, and which distractors are simply taking up space, being chosen by few learners. To counter blind guessing, which produces correct answers purely by chance (and hurts the validity of a test item), teachers want as many plausible distractors as is feasible. Analysis of the response options allows teachers to fine-tune and improve items they may wish to use again with future classes.

Let us examine performance on an item or question (refer to Figure 9.5).

Example: Which European power invaded Melaka in 1511?

Figure 9.5: Effectiveness of distractors

Generally, a good distractor is able to attract more "low mark" learners towards selecting that particular response. What determines the effectiveness of distractors? In Figure 9.5, a total of 24 learners selected among the options A, B, C and D for a particular question. Option B is a less effective distractor because many "high mark" learners (n = 5) selected it. Option D is a relatively good distractor because two learners from the "high mark" group and five learners from the "low mark" group selected it. The analysis of the response options shows that those who missed the item were equally likely to choose option B and option D. No learners chose option C; therefore, option C does not act as a distractor. Learners are not really choosing among four answer options on this item but among only three, as they are not even considering option C. This makes guessing correctly more likely, which hurts the validity of the item. The discrimination index can be improved by modifying and improving options B and C.

ACTIVITY 9.5

Which British Resident was killed by Maharajalela at Pasir Salak?
A. Hugh Low
B. Birch
C. Brooke
D. Gurney

Options              A   B   C   D   No Response
High mark (n = 15)   4   7   0   4   0
Low mark (n = 15)    6   3   2   4   0

The answer is B. Analyse the effectiveness of the distractors. Share your answer with your coursemates in the myINSPIRE online forum.
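A distractor tally like the ones above is easy to automate. The sketch below is our own illustration using the Activity 9.5 data; the flagging rules are common rules of thumb (a distractor that draws more low-group than high-group learners is "working"), not the module's own criteria.

```python
def distractor_report(high_counts, low_counts, key):
    """Print a simple effectiveness note for each option.
    high_counts and low_counts map each option letter to the number
    of learners in that group who chose it; key is the correct option."""
    for option in sorted(set(high_counts) | set(low_counts)):
        h = high_counts.get(option, 0)
        l = low_counts.get(option, 0)
        if option == key:
            print(f"{option}: correct answer (high={h}, low={l})")
        elif h + l == 0:
            print(f"{option}: not functioning, chosen by no one")
        elif l > h:
            print(f"{option}: working distractor (high={h}, low={l})")
        else:
            print(f"{option}: weak distractor (high={h}, low={l})")

# Activity 9.5 data; the correct answer is B.
distractor_report({"A": 4, "B": 7, "C": 0, "D": 4},
                  {"A": 6, "B": 3, "C": 2, "D": 4},
                  key="B")
```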
9.8 PRACTICAL APPROACH TO ITEM ANALYSIS

Some teachers may find the techniques discussed earlier time consuming, which cannot be denied, especially for a long test. Imagine that you have administered a 40-item test to a class of 30 learners. Analysing the effectiveness of every item will take a lot of time, and this may discourage you from analysing each item in a test. However, there is a more practical approach which takes less time. Diederich (1971) proposed a method of item analysis which can be conducted by the teacher together with the learners in class. The steps are as follows:

(a) Step 1
Arrange the 30 answer sheets from the highest score obtained to the lowest score obtained.

(b) Step 2
Select the answer sheet that obtained the middle score. Group all answer sheets above this score as the "high marks" group (mark an "H" on these answer sheets). Group all answer sheets below this score as the "low marks" group (mark an "L" on these answer sheets).

(c) Step 3
Divide the class into two groups (high and low) and distribute the "high" answer sheets to the high group and the "low" answer sheets to the low group. Assign one learner in each group to be the counter.

(d) Step 4
The teacher then asks the class:
Teacher: The answer for Question #1 is "C". Those who got it correct, raise your hand.
Counter from "H" group: 14 for group H.
Counter from "L" group: 8 for group L.

(e) Step 5
The teacher records the responses on the whiteboard as follows:

              High   Low   Total Correct Answers
Question #1   14     8     22
Question #2   12     6     18
Question #3   16     7     23
...
Question #n   .      .     .

(f) Step 6
Compute the difficulty index for Question #1 as follows:

p = (R_H + R_L) / T = (14 + 8) / 30 = 0.73

(g) Step 7
Compute the discrimination index for Question #1 as follows:

D = (R_H - R_L) / (T/2) = (14 - 8) / 15 = 0.40

Note that earlier we took the top 27 per cent and the bottom 27 per cent of all the answer sheets as the "high mark" and "low mark" groups. In this approach, however, we divide all the answer sheets into two groups; there is no middle group. The important thing is to use a large enough fraction of the group to provide useful information. Selecting the top and bottom 27 per cent of the group is recommended for more refined analysis. The method shown in this example may be less accurate, but it is a "quick and dirty" method.

ACTIVITY 9.6

1. Compare the difficulty index and discrimination index obtained using this rough method with the theoretical model by Stanley and Hopkins in Figure 9.4. Are the indexes very far off?
2. Teachers should perform an item analysis every time after administering a test. Do you agree?

Explain your answers with your coursemates in the myINSPIRE online forum.
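The whiteboard tally in Step 5 converts directly into a few lines of Python. This sketch is our own; it assumes the hand-count tallies are entered as a dict per question.

```python
def quick_item_analysis(tallies, n_learners):
    """Diederich-style analysis with a median split (no middle group).
    tallies maps each question to (correct_in_high, correct_in_low)."""
    half = n_learners / 2
    for question, (r_h, r_l) in tallies.items():
        p = (r_h + r_l) / n_learners  # difficulty index
        d = (r_h - r_l) / half        # discrimination index
        print(f"{question}: p = {p:.2f}, D = {d:.2f}")

# Tallies recorded on the whiteboard in Step 5.
quick_item_analysis({"Q1": (14, 8), "Q2": (12, 6), "Q3": (16, 7)},
                    n_learners=30)
# Q1: p = 0.73, D = 0.40
# Q2: p = 0.60, D = 0.40
# Q3: p = 0.77, D = 0.60
```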
9.9 USEFULNESS OF ITEM ANALYSIS TO TEACHERS

After each test or assessment, it is advisable to carry out an item analysis of the test items because the information from the analysis is useful to teachers. The benefits of the analysis include the following:

(a) From the discussions in the earlier subtopics, it is obvious that the results of item analysis can provide answers to the following questions:
(i) Did the item function as intended?
(ii) Were the items of appropriate difficulty?
(iii) Were the items free from irrelevant clues and other defects?
(iv) Was each of the distractors effective (in multiple-choice questions)?

The answers to these questions can be used to select or revise test items for future use. This helps to improve the quality of the test items and the test paper. It also saves teachers time when preparing test items in future, because good items can be stored in an item bank.

(b) Item analysis data can provide a basis for efficient class discussion of the test results. Knowing how effectively each test item functions in measuring the achievement of the intended learning outcome, and how learners perform on each item, teachers can have a more fruitful discussion with learners, as feedback based on item analysis is more objective and informative. For example, teachers can highlight the misinformation or misunderstanding reflected in the choice of particular distractors in multiple-choice questions, or frequently repeated errors in essay-type questions, thereby enhancing the instructional value of assessment. If, during the discussion, the item analysis reveals technical defects in the items or in the marking scheme, learners' marks can also be rectified to ensure a fairer test.

(c) Item analysis data can be used for remedial work. The analysis will reveal the specific areas in which learners are weak, and teachers can use this information to focus remedial work directly on those areas of weakness. For example, the distractor analysis may show that a specific option was chosen by a high number of learners from both the high mark and low mark groups, suggesting some misunderstanding of a particular concept. Remedial lessons can then be planned to address the problem.

(d) Item analysis data can reveal weaknesses in teaching and provide useful information for improving it. For example, a properly constructed item may nevertheless have a low difficulty index, suggesting that most learners failed to answer it satisfactorily. This might indicate that the learners have not mastered the particular syllabus content being assessed. This could be due to weakness in instruction, and thus calls for more effective teaching strategies. Furthermore, if the item is repeatedly difficult for learners, there might be a need to revise the curriculum.

(e) Item analysis procedures provide a basis for teachers to improve their skills in test construction. As teachers analyse learners' responses to items, they become aware of the defects of the items and what causes them. When revising the items, they gain experience in rewording statements so that they are clearer, rewriting distractors so that they are more plausible, and modifying items so that they are at a more appropriate level of difficulty. As a result, teachers improve their test construction skills.

9.10 CAUTIONS IN INTERPRETING ITEM ANALYSIS RESULTS

Despite the usefulness of item analysis, the results from such an analysis are limited in many ways and must be interpreted cautiously. The following are some of the major concerns to observe:

(a) Item discriminating power does not indicate item validity. A high discrimination index merely indicates that learners from the high mark group performed relatively better than learners from the low mark group. The division into high mark and low mark groups is based on the total test score obtained by each learner, which is an internal criterion. By using the internal criterion of total test score, item analysis offers evidence concerning the internal consistency of the test rather than its validity. The validity of a test needs to be judged against an external criterion, that is, the extent to which the test assesses the intended learning outcomes.

(b) The discrimination index is not always an indicator of item quality. For example, a low index of discriminating power does not necessarily indicate a defective item. If an item does not discriminate but has been found to be free from ambiguity and other technical defects, the item should be retained, especially in a criterion-referenced test. In such a test, a non-discriminating item may suggest that all learners have achieved the criterion set by the teacher; the item then does not discriminate between good and weak learners. Another possible reason for low discrimination is that the item may be either very easy or very difficult. Sometimes such an item is necessary or desirable to retain in order to measure a representative sample of learning outcomes and course content. Moreover, an achievement test is usually designed to measure several different types of learning outcomes (knowledge, comprehension, application and so on). In such a case, some learning outcomes will be assessed by fewer test items, and these items will have low discrimination because they have less representation in the total test score. Removing these items from the test is not advisable, as it will affect the validity of the test.

(c) Traditional item analysis data are tentative. They are not fixed, but are influenced by the type and number of learners being tested and the instructional procedures employed. The data will thus change with every administration of the same test items. Therefore, if repeated use of items is possible, item analysis should be carried out for each administration of each item. The tentative nature of item analysis data should be taken seriously and the results interpreted cautiously.
9.11 ITEM BANK

An item bank is a large collection of easily accessible questions or items that have been administered over a period of time. For achievement tests which assess performance in a body of knowledge such as geography, history, chemistry or mathematics, the questions that can be asked are rather limited. Hence, it is not surprising that previous questions are recycled with some minor changes and administered to different groups of learners.

Making good test items is not a simple task and can be time consuming for teachers, so an item or question bank can be of great assistance. An item bank consists of questions that have been analysed and stored because they are good items. Each stored item carries information on its difficulty index and discrimination index, and each item is stored according to what it measures, especially in relation to the topics of the curriculum. The items are organised in the form of a Table of Specifications indicating the content being measured as well as the cognitive levels measured. For example, from the item bank you will be able to draw items measuring the application of concepts for the topic of "Electricity". You will also be able to draw items with different difficulty levels. Perhaps you want to arrange easier questions at the beginning of the test so as to build confidence in learners, and gradually introduce questions of increasing difficulty.

With computerised databases, item banks are easy to access. Teachers have hundreds of items at their disposal from which they can draw when developing classroom tests. This certainly helps teachers with the tedious and time consuming task of constructing items or questions from scratch. Unfortunately, not many educational institutions are equipped with such an item bank. The more common practice is for teachers to select items or questions from commercially prepared workbooks, past examination papers and sample items from textbooks. These sources do not provide information about the difficulty index and discrimination index of the items, or about the cognitive levels of the questions and what they aim to measure. Teachers have to work out the characteristics of the items for themselves, based on their experience in teaching the content.

However, there are some issues with regard to the use of an item bank. One major concern is how to place different test items collected over time on a common scale. The scale should indicate the difficulty of the items, with one scale per subject matter. Retrieval of items from the bank is made easy when all items are placed on the same scale. The person in charge must also take every effort to add only quality items to the item pool. Developing and maintaining a good item bank requires a great deal of preparation, planning, expertise and organisation. Although the Item Response Theory (IRT) approach is not a cure-all for item banking problems, it can solve many of these issues.
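As an illustration of the kind of record an item bank might store, here is a minimal sketch. The fields follow the description above (difficulty, discrimination, topic, cognitive level), but the structure itself is our own invention, not a standard from the module.

```python
from dataclasses import dataclass

@dataclass
class BankedItem:
    """One stored question, tagged as the text describes: the content
    and cognitive level it measures, plus its indices from a past
    administration."""
    question: str
    topic: str             # e.g. "Electricity"
    cognitive_level: str   # e.g. "application"
    difficulty: float      # difficulty index (p); higher means easier
    discrimination: float  # discrimination index (D)

def easiest_first(bank, topic):
    """Draw all items on a topic, arranged from easiest to hardest,
    e.g. to open a test with confidence-building questions."""
    items = [item for item in bank if item.topic == topic]
    return sorted(items, key=lambda item: item.difficulty, reverse=True)

bank = [
    BankedItem("Define electrical resistance.", "Electricity",
               "knowledge", difficulty=0.80, discrimination=0.35),
    BankedItem("Calculate the current in the circuit shown.", "Electricity",
               "application", difficulty=0.45, discrimination=0.50),
]
for item in easiest_first(bank, "Electricity"):
    print(item.question, item.difficulty)
```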
9.12 PSYCHOMETRIC SOFTWARE

Software programs designed for general statistical analysis, such as SPSS, can often be used for certain types of psychometric analysis. Various software programs are also available specifically for analysing test data.

Classical Test Theory (CTT) is an approach to psychometric analysis that has weaker assumptions than Item Response Theory (IRT) and is more applicable to smaller sample sizes. Under CTT, a learner's raw test score is the sum of the scores received on the items in the test. For example, Iteman is a commercial program for classical analysis, while TAP is a free one.

IRT is a psychometric approach which assumes that the probability of a certain response is a direct function of an underlying trait or traits. Under IRT, the concern is whether the learner answered each item correctly, rather than the raw test score. The basic concepts of IRT relate to the individual items of a test rather than the test scores. Learners' traits or abilities and item characteristics are referenced to the same scale. For example, ConQuest is a computer program for item response and latent regression models, and TAM is an R package for item response models.

SUMMARY

• Item analysis is a process which examines the responses to individual test items or questions in order to assess the quality of those items and of the test as a whole.
• Item analysis is conducted to obtain information about individual items or questions in a test and about how the test can be improved.
• The difficulty index is a quantitative indicator of the difficulty level of an individual item or question.
• The discrimination index is a basic measure which shows the extent to which a question discriminates or differentiates between learners in the "high mark" group and the "low mark" group.
• Theoretically, the more difficult a question (or item), or the easier the question (or item), the lower its discrimination index will be.
• By calculating the proportion of learners who chose each answer option, teachers can identify which distractors are "working" and appear attractive to learners who do not know the correct answer, and which distractors are simply taking up space and not being chosen by many learners.
• Generally, a good distractor is able to attract more "low mark" learners towards selecting that particular response.
• An item bank consists of questions that have been analysed and stored because they are good items.

KEY TERMS

Classical Test Theory (CTT)
Difficult question
Difficulty index
Discrimination index
Distractor analysis
Easy question
Good distractor
High mark group
Item analysis
Item bank
Item Response Theory (IRT)
Low mark group
Psychometric software