Assessment Prelims Quiz 2 PDF

Document Details


Saint Louis University

Tags

psychological testing test standardization education educational assessment

Summary

This document is an assessment module from Saint Louis University on standardization in test administration. It discusses the importance of giving test takers the same procedures and instructions to ensure consistency, and presents two testing scenarios to be analyzed for differences in standardization. Keywords include psychological testing, test standardization, and education.

Full Transcript


UNIT 1: Standardization in Test Administration

Engage: "UNIFORMLY USEFUL"

Today, a second group of 40 college freshmen will be taking an Intelligence Test. Ana, a Psychometrician, makes sure to prepare the same materials, set up the testing room the same way, and distribute the materials in the same order as she did when she conducted the same test last week with the first group of freshmen. Once the examinees are settled, she gives the same instructions she gave during last week's test.

What do you think is the importance of:
- doing the same procedure when conducting a test?
- giving the same instructions to examinees?

Explore: Read the given scenario:

Two junior classes, Class A and Class B, will be taking an Aptitude Test today. The school is currently being renovated, so the Psychometricians-in-charge need to be flexible. Class A will be taking the test in the mini gymnasium, where the windows are big enough to let sunlight in and the room is wide enough for proper ventilation. The chairs and tables are spread apart, making it comfortable for the examinees. The gym's sound system is used, so the Psychometrician's voice is audible and clear enough to be heard across the gym. Class B, on the other hand, is assigned to take the test in a classroom in the basement, where it is cold and quite dark. The lights tend to flicker, as the classroom has not been used for a while. For their test, armchairs will be used, and there is not much distance between examinees. Every now and then, the sounds of hammering and pouring of cement disrupt the Psychometrician who is giving instructions.

Reflect on the scenario. What can you say about it? Will test performance be affected? How so?

To better understand these two activities, let us discuss the concept of "Standardization".

Explain: Do you still recall that a psychological test is a standardized measure? What exactly do we mean by "standardized"?

STANDARDIZATION refers to the uniformity of procedures in the administration and scoring of a psychological test. A prerequisite in psychological testing is the creation of a behavioral situation that is standardized in space as well as in time. When conditions are standardized, differences in test results are more certainly attributable to differences in the person factor, and not to differences in the stimuli or conditions affecting the test takers.
Refer back to the scenario in the Explore part of this unit. Classes A and B will be taking the same test, but under different conditions. This goes against standardization. Class B will probably get lower scores, resulting in the test not being able to accurately and reliably measure what it intends to measure. This is caused not by individual differences, but by the test-taking condition the examinees are in.

What should have been done? To apply standardization, the rooms should have been the same in terms of lighting, ventilation, seating arrangements, and the like. The condition of the examinees should also have been considered. The rationale requires that the Psychometrician try to standardize the state of the examinee as well as the test stimuli. Procedures must be designed to eliminate irrelevant individual differences or extraneous variables, so that only the factors the test aims to measure are left. The Psychometrician should try to reduce all examinees to a "standard state" of motivation, expectation, and interpretation of the task. It is important to keep in mind that the single independent variable is usually the individual being tested, the examinee.

UNIFORMITY OF PROCEDURES IN TEST ADMINISTRATION

✓ Task of the Test Constructor

Psychological tests are administered under a prescribed set of procedures. The test constructor is the authority on matters regarding the test, from the construction of items to the final version of the test. He/she is responsible for providing a detailed set of procedures for administering the developed test. These procedures should be clearly and completely stated in the test manual, as the interpretation of a psychological test is most reliable when measurements are obtained under standardized conditions. Directions or instructions are a major part of the standardization of a new test, and it helps to have instructions that a test examiner can simply read to examinees. Standardization extends to the exact materials employed, time limits, preliminary demonstrations, ways of handling examinees' queries, and other details of the testing situation. Necessary test materials should be included in the test packet.

✓ Task of the Test Examiner/Psychometrician

With RA 10029 (the Philippine Psychology Act of 2009), there is a need for a qualified examiner (a licensed Psychometrician). The examiner should be trained in the task he/she is expected to perform in every test he/she will be using. A prerequisite of test administration is familiarity with the test.

Advance preparation is the most important single requirement for good testing procedures. There can be no emergencies in testing, as every step should be planned and prepared for. Special efforts must be made to foresee and forestall emergencies. It is in this way that uniformity of procedures can be assured.
Advance preparation for the testing session can be achieved in various forms and ways:

Memorizing the exact verbal instructions is most essential in individual testing. In group testing, instructions are read to the test takers, but familiarity with the statements to be read is still a must to prevent misreading and hesitation ("umm..", "err.."). It also permits a more natural, informal manner during test administration and allows the examiner to be more at ease with him/herself. Instructions should be read word for word, adding nothing and changing nothing.

Test materials should be prepared before the test is administered. Familiarization with the test materials should be done prior to test administration. This includes knowing when materials are to be used and how to use them. In individual testing, the actual layout of the necessary materials facilitates subsequent use and decreases the need to search or fumble. Materials should be placed within easy reach of the examiner and should not distract the test taker. In group testing, all materials such as answer sheets, test questionnaires, pencils, and the like should be carefully counted, checked, and arranged in advance of the testing day.

Time limits for each test or subtest should be strictly observed. Sufficient time should be allotted for the entire testing process. This includes setting up, reading the instructions, and actual test taking.

Thorough familiarity with the specific testing procedure is an important prerequisite in both individual and group testing. For individual testing, supervised training in test administration is essential. For group testing, examiners and proctors should be briefed so that each is aware of the functions he/she is to perform.

The Psychometrician should also be aware of the following test conditions:

Physical Conditions

Place: The selection of a suitable testing room should be given attention. The room should be free from undue noise and distraction, and should have adequate lighting and ventilation. Seating facilities and enough working space for test takers should be provided. The type of desks or chairs should be considered. A sign on the door indicating that testing is ongoing may be posted to avoid disruptions.

There are possible differences between paper-and-pencil and computer administration of the same test. Professional guidelines have been formulated to aid test users in assessing the comparability of test scores obtained under these two types of administration. The nature of the test and the population of test takers are considered in relation to the effect of differences in test administration on norms, reliability, and validity.

Time: In administering a test, the time at which the test is taken should be considered. Alert subjects are more likely to give their best than subjects who are tired. Children, for instance, may be more active in the morning after having a good breakfast than in the afternoon when there is a need for a nap.
Generally, though, equally good results can be produced at any hour of the day if the examinee really wants to do well. Occasionally, it is necessary to administer a test to a person at a psychologically inopportune time. If this is the case, the only correct procedure is to maintain an adequately critical attitude toward the results. The unfavorable testing conditions should be taken into account when interpreting results.

Motivational Conditions

It is but proper for the Psychometrician to remember that the "subject" being tested is a person, a human being. This makes testing a more complex psychological relationship. The traditional concern with motivation recognizes this fact.

Influence of the Examiner: Rapport is defined as "a close and harmonious relationship in which the people or groups concerned understand each other's feelings or ideas and communicate well." Examiners may differ in their abilities to establish rapport. Those who are quite unfriendly or unwelcoming will likely obtain less cooperation from their subjects, which may result in reduced performance on ability tests or defensive, distorted results on personality tests. On the other hand, those who are overly warm or affectionate may err in the opposite direction, and may even give subtle cues to correct answers. Examinees may feel uncomfortable with both extremes; thus, both should be avoided.

Examiners are urged to establish rapport with their subjects. In psychometrics, rapport refers to the examiner's efforts to arouse the examinee's interest in the test, elicit cooperation, and ensure that the examinee follows the standard test instructions. A crucial aspect of valid testing is the ability to create a cordial testing environment. Part of examiner training is training in techniques for establishing rapport. These vary depending on the nature of the test and on the age and other special characteristics of the person tested. In ability tests, for instance, the objective is careful concentration on the given tasks and giving one's best effort to perform well. In personality inventories, the objective calls for frank and honest responses, while in projective tests, full reporting of the associations evoked by the stimuli, without censoring or editing content, is required. In all instances, the examiner aims to motivate test takers to follow directions as conscientiously as they can.

In terms of age groups, there are special factors to consider in establishing rapport and building motivation. In testing preschool children, a friendly, cheerful, and relaxed approach on the part of the examiner may help reassure the child. The test may be presented as a game, and should be intrinsically interesting. Early elementary school children may also be given the game approach, while older school children can usually be motivated through an appeal to their competitive spirit and the desire to do well on tests.
Adult testing may present some additional problems, as adults are not likely to work hard at a task merely because it is assigned. It is then important to explain the purpose of the test to the adult so that he/she understands the need for it and is motivated as well. For people of any age, the examiner should note that every test presents an implied threat to the individual's prestige. Some reassurance should therefore be given from the very start.

Special motivational problems may be encountered in testing emotionally disturbed persons, prisoners, or juvenile delinquents. Such persons are likely to manifest unfavorable attitudes, such as suspicion, insecurity, fear, or cynical indifference. The examiner should make special efforts to establish rapport under these conditions. He or she must be sensitive to these special conditions and take them into account when interpreting and explaining test results and performance.

Background and Motivation of the Examinees: Examinees differ not only in personality and internal characteristics, but also in other extraneous ways that might affect or influence test results. In some instances, test results may be inaccurate due to the filtering and distorting effects of certain characteristics, such as:

❖ Test Anxiety – This is a type of performance anxiety characterized by a combination of physiological over-arousal, tension, and somatic symptoms, along with worry, dread, fear of failure, and catastrophizing, that occur before or during test situations. Undoubtedly, subjects experience different levels of test anxiety, ranging from a carefree outlook to incapacitating dread at the prospect of being tested.

Emotionality and worry are two important components of test anxiety. The emotionality component consists of feelings and physiological reactions such as an increased heartbeat and tension. The worry component is the cognitive component, which includes negative self-oriented thoughts, such as expecting to do poorly and concern about the consequences of failure. These thoughts may disrupt performance as they draw attention away from the task.

The process of how anxiety affects test performance is better understood by first analyzing the role of motivation as a determinant of performance. Motivation theorists found that the level of performance depends on the product of two factors: ability and efficiency. Efficiency is influenced by the strength of motivation expressed in the task. The relationship between the strength of motivation and the efficiency of performance is curvilinear. This means that it is only when the strength of motivation is optimal that the efficiency of performance is 100%. An efficiency level of 100% means that the subject's true level of ability is fully expressed in his or her performance. Levels of motivation that are stronger or weaker than the optimal level result in performance that does not reflect one's true ability.
Below is a numerical example to supplement the explanation above. Given five examinees with the same level of true ability, but with different strengths of motivation (such as test anxiety), the following levels of performance result:

Examinee    True Ability    Strength of Motivation    Efficiency    Level of Performance
A           100             1                         .50           50
B           100             2                         .80           80
C           100             3                         1.00          100
D           100             4                         .80           80
E           100             5                         .50           50

The table above shows that only examinee C had an optimal level of motivation and was 100% efficient in the task. The same applies to psychological testing; that is, the relationship between the level of test performance and anxiety is also curvilinear. A slight amount of anxiety is beneficial, while a large amount of anxiety, or none at all, is detrimental. This implies that an able but unmotivated or overly motivated person could actually perform worse than one who is less able but more optimally motivated.

The detrimental effect of anxiety may be decreased by the same strategies used by the examiner to elicit the subject's best efforts. The practices that enhance rapport may also serve to reduce anxiety. A well-organized, smooth-running testing operation on the part of the examiner will contribute to the goal of lowering the subject's anxiety.
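To make the curvilinear relationship concrete, here is a minimal Python sketch (an illustration, not part of the original module) that reproduces the numerical example above. The efficiency values for each strength of motivation are taken from the table; the function itself is only a toy version of the ability-times-efficiency model, not an empirically derived formula.

    # Hypothetical inverted-U mapping of motivation strength (1-5) to efficiency,
    # copied from the table above; performance = true ability x efficiency.
    EFFICIENCY = {1: 0.50, 2: 0.80, 3: 1.00, 4: 0.80, 5: 0.50}

    def performance(true_ability, motivation):
        return true_ability * EFFICIENCY[motivation]

    for examinee, motivation in zip("ABCDE", range(1, 6)):
        print(examinee, performance(100, motivation))  # 50.0, 80.0, 100.0, 80.0, 50.0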
Motivation to Deceive: This refers to the intention to falsify responses on a test. Test results may be inaccurate if the subject intends to perform in an inadequate manner, or to give answers that are not true of him/her. Although rare, overt faking of test results may happen. A well-trained Psychometrician can detect conscious faking by asking two questions: 1) Does the client have a motivation to perform deceitfully on the tests? and 2) Is the overall pattern of test results suspicious in the light of other information known about the client? Answering "Yes" to both questions means that the test results should be approached with skepticism. This is also one of the reasons why other relevant information about the client is gathered, as the examiner can check for consistency between the information given and the test results obtained.

Effects of Prior Training on Test Performance: Any educational experience undergone by the individual, whether formal or informal, should be reflected in his/her performance on tests that measure or sample the relevant aspects of behavior. For instance, studying math with a tutor should result in higher scores on intelligence tests with areas relevant to math, and should also be reflected in the individual's behavior (e.g., responses when asked about math equations).

In evaluating the effects of training or practice on test scores, a fundamental question is whether the improvement is limited to the specific items in a test or whether it extends to the broader behavior domain that the test is designed to assess. The answer to this question represents the difference between coaching and education.

COACHING refers to the tutoring process of preparing for a psychological assessment by practicing with the tests beforehand. Its influence on test scores has been widely studied, and it presents a serious problem to test developers because it can inflate a subject's test score without correspondingly improving his/her overall abilities in the domain being tested. The influence of coaching on test scores may include several components, such as: 1) extra practice on test-like materials, 2) review of fundamental concepts likely to be covered by the test, and 3) advice on optimal test-taking strategies. These help ensure that the coached individual performs at his/her best, but they also give him/her an improper advantage.

Coaching tends to concentrate on a particular sample of the skills and knowledge covered by the test, rather than on the broader knowledge domain that the test tries to assess. Take, for instance, an individual who is coached by being given sample items of the test, particularly in the Numerical area. He would be able to answer these on the actual test and get a good score; however, he would not actually possess the knowledge in a broader sense. In addition, coaching may be available to some test takers but not to others. This tends to introduce individual differences in narrowly defined test-taking skills, thereby reducing the diagnostic value of the test. Scores would no longer be a valid representation of an individual's ability. Coaching, however, may be more beneficial for those with deficient educational backgrounds than for those who have received superior educational opportunities and are already prepared to do well on tests. In addition, the more similar the test content and the coaching material, the greater the improvement in test scores will be.

TEST SOPHISTICATION refers to familiarity with a test or a subtype of a test, or sheer test-taking practice. It consists of extensive prior experience in taking standardized tests. An individual with such test sophistication may enjoy a certain advantage in test performance. This advantage partly stems from having overcome an initial feeling of unfamiliarity, and from having developed more self-confidence and a better attitude toward test-taking. Part of it is the result of a certain overlap in the type of content and functions covered by many tests. Performance may be slightly improved by specific familiarity with common item types and practice in the use of objective answer sheets. It is particularly important to take test sophistication into account when comparing scores obtained by persons whose test-taking experience may have varied widely.
Short orientation and practice sessions can be quite effective in equalizing test sophistication. Such familiarization training (becoming familiar with the test situation only, without the involvement of actual test items) reduces the effects of prior differences in test-taking experience. Experience with similar tests, coaching, or the ability to respond advantageously to items that contain extraneous clues and suggestions may yield a score that is higher than the true ability of the test taker.

WHAT DO WE PICK UP FROM ALL THESE?
1. Standard procedures are to be followed to the minutest detail. The test author and test publisher are responsible for describing such procedures clearly and completely in the test manual.
2. Any unusual condition, no matter how minor, should be recorded and taken note of.
3. Testing conditions should be taken into account in the interpretation of test results.

Elaborate: For further understanding and awareness, let us delve deeper into the influence of the background and state of the examinee on his/her obtained test results.

Extensive studies have confirmed the notion that test anxiety is negatively correlated with aptitude test scores, measures of intelligence, and school achievement. These correlational findings, however, cannot be interpreted straightforwardly. The following are possible explanations:
1. Persons develop test anxiety because of a history or past experience of performing poorly on tests. Reductions in performance may come before, and cause, test anxiety.
2. Paulman and Kennelly (1984) found that even without anxiety, many test-anxious students display ineffective test taking in academic settings. Such students would perform poorly on tests whether they were anxious or not.
3. Naveh-Benjamin et al. (1987) determined that poor study habits predispose a large proportion of test-anxious college students to perform poorly on tests.
4. Test anxiety is a by-product of lifelong frustration over mediocre or average test results.
5. Test anxiety has a directly detrimental effect on test performance, in which case test anxiety is both a cause and an effect of poor test performance.

Sarason (1961) conducted a study on this topic, testing high- and low-anxious college students under neutral or anxiety-inducing instructions. The subjects were required to memorize two-syllable words low in meaningfulness, which was a difficult task. Half of the subjects performed under neutral instructions – they were simply told to memorize the lists. The other half were told to memorize the lists and were told that the task was an intelligence test. As such, they were urged to perform as well as possible.

Result: The two groups did not differ significantly in performance when the instructions were neutral and non-threatening. When the instructions triggered anxiety, however, performance levels for the high-anxious subjects dropped significantly. This left them at a huge disadvantage compared to the low-anxious subjects.
This indicates that test-anxious subjects show significant reductions in performance when the situation is perceived as a test. Low-anxious subjects, on the other hand, are relatively unaffected by such a simple redefinition of the context.

In addition to the above-mentioned implications of test anxiety, it is important to note that tests with narrow time limits pose a special problem for persons with high levels of test anxiety. Time pressure seems to worsen the degree of personal threat, causing significant reductions in the performance of test-anxious persons.

Have you experienced test anxiety? What was it like? How did you overcome it?

==========================

Read the following findings on coaching, and answer the question at the end of the text.

The effects of coaching differ in procedure and results, as found in some studies. In a British study, for instance, it was found that an individual who has no previous experience with objective speed tests may show a small increase in test score with coaching. Increases in test scores were measured by repeating the same test after the experimental interval. According to these studies:
a. The control group (the group that did not receive coaching) gained, on average, about 2-3 points in IQ, merely as a result of taking the first test.
b. The experimental ("coached") group gained about 5-6 points, after having been told about the test and having numerous representative items explained.
c. The "practiced, uncoached" group gained about 6 points, after taking around four to eight tests without special explanations.
d. The "practiced and coached" group gained 8-10 points. The gains from a longer period of coaching are no greater.

To this, Cronbach states that "all the gains from coaching are rather small". Around 6 points added to the score might be enough to push, let's say, a college applicant over the borderline, but coaching will not carry the really poor prospect past the examination hurdle. In other words, it may still not be enough for him/her to be accepted.

The conclusion arrived at by French and Dear is that "an eager college candidate should not spend money on special coaching for the Scholastic Assessment Test (SAT). He would probably gain at least as much by reviewing mathematics on his own and by reading a few good books."

A partial solution to the problem of coaching that many test developers are considering is to make self-tutored coaching materials available to the public. Not everyone will make use of the materials, but at least the opportunity will be available.

Contrast COACHING versus TEST SOPHISTICATION versus "EDUCATION". Which of these involves a breach in standardization and uniformity in measurement?

UNIT 2: Standardization of Procedures in Scoring & The Concept of Guessing

Engage: Recall the last time you answered an essay question.
What score was given for your answer? Do you think the score you got was fair? Have you ever wondered how answers to essay questions are scored?

Explore: Here are items from several standardized psychological tests. HOW would you SCORE the answers to these? Outline and elaborate on your answer.
1. What is a tricycle?
2. In what way are a banana and a mango alike?
3. Why do we wash our clothes?

Explain: In the previous unit, we discussed uniformity of procedures in test administration. This time, let us look into standardizing procedures in scoring.

UNIFORMITY OF PROCEDURES IN SCORING

For scores obtained from different individuals to be comparable, testing conditions, as well as scoring procedures, should be standardized. Scientific research or sound practical decisions cannot be reached if scoring standards vary unreliably. So how are scoring procedures standardized? Here are several proposed solutions, depending on the type of test item:

For Open-Ended Types of Items: Different types of tests employ different types of questions. When the test taker or examinee has to construct a suitable response orally or in writing, or sometimes by manipulating objects, this is called an open-ended type of question. Recall tests, essay tests, other complex problem-solving tests, and most psychomotor and job-performance tests fall under this category. Free-response items, which ask the examinee to supply whatever response is fitting, are also included within this group.

HOW TO SCORE OPEN-ENDED ITEMS to ensure standardized and objective scoring:

The PRIMARY strategy to ensure standardized and objective scoring for open-ended types of items is this: develop rules for judgment that all scorers will follow.

Specifically, the rules for judgment can be made using the following methods:

a. It is possible to construct guides that show possible acceptable responses and the corresponding credits, such as those in the test manuals of the WISC, WAIS, and Stanford-Binet.

Example: Recall the items from the Explore part of this unit. How would you score the answers to:

In what way are a banana and a mango alike?
Rule: The answer must express the main feature or main essence of the similarity.

POINT/S     ANSWER
2 points    both are fruits; both can be eaten
1 point     both are yellow; both grow on trees; both have seeds
No points   other answers, such as: both are long; both are nutritious; both are God's creations; I love them both

What is a tricycle?
Rule: A good synonym will be given full points. A complete description with mention of the function or use will be given full credit.
POINT/S     ANSWER
2 points    a mode of transportation; a ride with three wheels
1 point     it has three wheels
No points   other answers, such as: I own one; it is fun to ride in; bigger than a bike; it's shiny

b. Guides that display specimens representing various levels of quality (e.g., handwriting, drawing) may be used as references for scoring.

HOW TO CONSTRUCT OPEN-ENDED ITEMS to ensure standardized and objective scoring:

Suggested rules for item construction:
a. The direct question form is preferred over the statement form. Example: "Who is the National Hero of the Philippines?" instead of: "He is the National Hero of the Philippines."
b. The question should be designed in such a way that there is only one possible correct answer.
c. The question should be worded so that the answer is short and definite – preferably a single word, number, or symbol. This will facilitate objective scoring.
d. Extraneous hints should be avoided, as these may enable examinees to identify the correct answer.

ADVANTAGES AND DISADVANTAGES of the Open-Ended Type of Item:

Advantages:
✓ For free-response items, familiarity and naturalness are the foremost advantages worth mentioning.
✓ Open-ended questions and free-response items are relatively easy to construct.
✓ These types of items decrease the likelihood that examinees will resort to guessing the correct answer.
✓ These types of items are argued to be more difficult, as examinees are confronted with a situation that is not highly structured and have to choose answers from among a very large number of possible options.

Limitations:
o The answers to these types of items call for judgment in scoring. Scoring, then, is not always entirely objective.
o These types of items are more time-consuming in terms of answering and scoring, and scoring is more difficult.
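Before moving on to structured items, here is a minimal Python sketch (an illustration, not part of the original module) of how a scoring guide like the banana/mango table above can be reduced to a fixed lookup that every scorer applies in exactly the same way. The accepted responses are simplified from the table; in practice, the test manual's guide, not this sketch, is authoritative.

    # Hypothetical, simplified scoring guide for:
    # "In what way are a banana and a mango alike?"
    SIMILARITIES_GUIDE = {
        2: ["both are fruits", "both can be eaten"],
        1: ["both are yellow", "both grow on trees", "both have seeds"],
    }

    def score_response(response, guide=SIMILARITIES_GUIDE):
        """Return the credit for a response; anything not listed in the guide earns 0."""
        answer = response.strip().lower()
        for points, accepted in guide.items():
            if answer in accepted:
                return points
        return 0

    print(score_response("Both are fruits"))    # 2
    print(score_response("Both have seeds"))    # 1
    print(score_response("I love them both"))   # 0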
Now, since there is still a possibility of subjectivity in scoring open-ended types of items, test constructors and developers may opt to come up with structured items.

For Structured Items: Use structured items. These are items where the subject has to choose the one right answer, or the best answer, from among alternatives that the test provides. With this, the chance of subjectivity in scoring is eliminated, ensuring standardized and objective scoring. Multiple-choice items, true-or-false items, and matching-type items, where a direct question or incomplete statement is presented and a number of possible responses is given, are examples of structured items.

Examinees are then asked to choose the response that best answers the question or that best completes the statement. The question or incomplete statement introducing the test item is referred to as the STEM. Generally, four or five responses are listed, and all but one are distracters. The correct answer is called the KEYED OPTION or KEY. Any undesired or incorrect answer is called a DISTRACTER or FOIL.

Example:

STEM: Who established the very first experimental laboratory for psychology?
A. William James
B. Wilhelm Wundt (KEY)
C. Sigmund Freud
D. James McKeen Cattell
(Options A, C, and D are the DISTRACTERS.)

Variations:

Single-response – the most familiar form, where only one correct or best answer is required.
Example: The capital of the Philippines is __.
A. Baguio City
B. NCR
C. Manila
D. Cebu

Multiple-response – having two or more correct responses.
Example: Which of the following are fruits?
A. Cucumber
B. Tomato
C. Carrot
D. Apple
E. Cherry

Combined-response – the answers include two or more correct choices, but only one option is the keyed answer: the one that lists all the correct choices.
Example: Which of the following is a fruit?
A. Cucumber
B. Tomato
C. Apple
D. Both B and C

HOW TO CONSTRUCT STRUCTURED ITEMS to ensure standardized and objective scoring:

Suggested rules for construction:
▪ The stem should contain the problem and all qualifications, including words that would otherwise be repeated in each alternative.
▪ Omissions in incomplete statements should not occur early in the stem, as this necessitates re-reading of the stem and more time spent on an item. Example: "The capital of the Philippines is __." instead of: "__ is the capital of the Philippines."
▪ Each item should be as short as possible, consistent, and clear.
▪ Negative expressions should be minimized, as such expressions reduce the clarity of an item, create confusion, and may artificially add to its difficulty. Example: "A tomato is not a vegetable because it does not __." Note the use of negative expressions (not, does not).
▪ Make certain that only one answer among the choices is clearly the best answer.
▪ Maintain grammatical consistency in all response alternatives.
▪ Vary the number of options included in the test items as needed. There should be at least three and not more than five, and examinees should be told about this at the beginning of the test.
▪ Distracters must be plausible and equally attractive to the examinee who does not know the correct response.
▪ Avoid stems that reveal the answer to another item.
▪ Responses such as "none of these", "not given", or "can't tell" should be carefully presented; they may be used as a last option or as distracters, but only when an exactly correct response is possible.
▪ Avoid placing the correct response in the same position all the time (for instance, in a 25-item test, the correct answer being the third option in 20 items, or "patterns" formed by the keyed answers). Vary the position of the correct answer.

ADVANTAGES AND DISADVANTAGES of the Structured Type of Item:

Advantages:
✓ Once the initial key (answer) is fixed, no judgment in scoring is required.
✓ Versatility is the outstanding feature of the multiple-choice item. It can be used to determine how well subjects can recall specific pieces of information, as well as their ability to apply important principles in new situations.
✓ Successful guessing may not be eliminated, but it can be reduced considerably.
✓ Multiple-choice items may be used at all grade levels (except primary), and in all subject areas.

Limitations:
o Multiple-choice items are difficult to construct, and suitable distracters may be hard to find.
o The response time should be considered. For a given amount of testing time, subjects can complete fewer multiple-choice items than true-or-false items. This is especially true when items demand fine discriminations and fundamental understandings.

POINTS TO KEEP IN MIND:
o A test is characterized as an objective, as well as standardized, measure.
o The purpose of standardizing a test is to give it objectivity, that is, to devise an instrument that will be free from subjective/personal judgments regarding the skill, ability, knowledge, trait, or potentiality to be measured and evaluated.
o A test that is objective allows every observer or judge seeing a performance to arrive at precisely the same report.
o To make a test objective, several elements should be paid attention to. These include administration procedures, the conditions of the examinee, immediate and accurate recording of observations to eliminate errors of recall, and scoring by the same rules.
o The use of time and the observance of time limits in some tests are important in establishing the objectivity of a measurement. There is a distinction between so-called "time-limit" and "work-limit" tasks. In both cases, a reliable timer/chronometer is necessary for objective and standardized procedures in both administration and scoring.

TIME-LIMIT
- Examinees are given the same amount of time.
- The score depends on the number of items completed.
- e.g., In 10 minutes, how many items can you finish?
  Examinee A: 15 items; Examinee B: 10 items; Examinee C: 45 items

WORK-LIMIT
- Examinees are given the same number of items.
- The score depends on the time needed to complete all the items.
- e.g., Given 100 items, how long will it take you to finish?
  Examinee X: 12.5 minutes; Examinee Y: 30 minutes; Examinee Z: 24 minutes
At this point, it would be good to remind yourself of the purpose of following strict protocols for standardization and objectivity in the administration and scoring of a test. Ensuring such procedures will certainly help build a psychometrically worthy tool of measurement. The next section presents another aspect that many of us have experienced: guessing in a test or exam. This time, however, let us try to understand how guessing can interfere with objectivity in the measurement of behavior.

Elaborate: The Problem of GUESSING

Imagine taking a test in which there is an item that you do not know the answer to, or are not sure of the answer. "Eeny-meeny-miny-moe… okay, B it is." Is this a familiar scenario? Like many, you have probably guessed answers on tests. But in what instances do you guess? Do you end up getting the right answer? Do you have strategies for guessing?

Indeed, guessing is a common behavior in test-taking. In psychological tests, the occurrence of guessing can also be quite common. Is it bad to guess? But what if you end up getting the right answer? Will this not be advantageous? Let us try to understand this phenomenon more.

When taking an objective ability test, some examinees are likely to ask, "Should I guess when I am not sure?" At times, the test directions provide an answer to this question, but even where advice to guess or not to guess is given, some ambiguity remains. It is against the rules for the tester to give supplementary advice, and he/she must retreat to: "Use your own judgment." The following discussion intends to clarify the guessing problem for the examiner and test developer.

When one does not know the answer on an objective ability test, a pervasive tendency is to guess. Sometimes this guess is based on partial knowledge, at other times on misinformation, and at still other times on no information at all. In the last instance, one may not even have read the test item, or, if one did, its answer is a total mystery. If the score is determined by the number of correct responses, any correct guesses will raise the score. Obviously, this should not be the case. The test score should reflect ability only, and NOT ability plus the examinee's willingness to guess and the amount of success he/she has in guessing (read that again; make sure you understand that statement). To register what the examinee really knows, one should be able to distinguish between knowledge, competence, and "good luck". Given this, examinees should be discouraged from wild guessing.
In instances where some people guess more freely than others, a correction formula is desirable to wipe out gains due to guessing. In tests, items may be thought of as falling into two categories: those to which the examinee knows the answer, and those which the examinee cannot answer. Common sense tells us that it is only in the latter category that guesses are made. If the item calls for a choice among alternatives, the examinee has a chance of picking the correct response even on items to which he/she does not know the answer. When guesses are made, a scoring formula can be applied to correct the final score. Scoring rules assume two things:
1. That a wrong choice (incorrect answer) represents an unlucky guess, and
2. That the number of unlucky guesses is equal to the number of lucky guesses.

Various formulas have been developed for the correction for guessing. The general formula, for both 2-choice tests and tests with 3 or more choices per item, is as follows:

S = R - W / (n - 1)

Where:
S is the corrected test score
R is the number of correct responses (number of "rights")
W is the number of incorrect responses (number of "wrongs")
n is the number of response alternatives per item

Example: In a 30-item multiple-choice test with 3 alternatives per item, subject A is SURE about the answers to 21 items. He is either unsure of or does not know the correct answer to the other 9 items, so he guesses the answers to these 9 items. Afterwards, the test examiner checks the paper of subject A and finds that there are 24 correct answers and 6 wrong answers.

S = 24 - 6 / (3 - 1)
S = 24 - 3
S = 21

This means that although subject A got a score of 24 out of 30, the correction formula, considering that he guessed some of the answers, gives a corrected test score of 21.

For 2-choice tests, as in true-false tests, n is 2 and the formula reduces to "the number of items right minus the number of items marked wrong":

S = R - W

For tests with n choices per item, as in multiple-choice tests, the chance probability of a correct guess is 1/n, and that of an unlucky guess is (n-1)/n. For every (n-1) incorrect guesses, 1 correct guess is expected on average; hence, the correction most often used in this case is "rights minus wrongs divided by (n-1)", as shown above. The possibility of "increasing" one's score by guessing is greatly reduced as the number of distracters increases.
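As a quick check on the arithmetic above, here is a minimal Python sketch (an illustration, not part of the original module) of the general correction formula. The example values (24 rights, 6 wrongs, 3 alternatives) come from the worked example above; the function name and the second call are assumptions added for illustration.

    def corrected_score(rights, wrongs, n_alternatives):
        """Correction for guessing: S = R - W / (n - 1).

        For 2-choice (true-false) items, n = 2 and this reduces to S = R - W.
        """
        return rights - wrongs / (n_alternatives - 1)

    print(corrected_score(24, 6, 3))   # 21.0, matching the worked example above
    print(corrected_score(18, 12, 2))  # 6.0 for a hypothetical true-false test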
CRITIQUE OF THE CORRECTION FORMULAS FOR GUESSING

Unfortunately, the logic followed so far does not describe the situation accurately. The assumptions only hold true in cases where guessing is done by "pure chance." However, guessing is not a matter of pure chance, because experience and common sense permit one to guess correctly even on items one knows least about. One may have partial information on some items, or perhaps may "know" that the distracters are not the right answers. One cannot simply divide the items into those that the subject knows perfectly and those which he does not know at all. There are items he knows fairly well but is not positive about, and others that he has only hazy knowledge about. For these reasons, the correction formulas do not apply.

From the point of view of the tester, on the other hand, the tendency to guess remains an unstandardized aspect of the situation that interferes with accurate measurement. Hence, several attempts to solve the problem of guessing have been devised. Soderquist proposed "confidence scoring": the subject must not only give his answer, but must also indicate how sure he is about the correctness of his response. Both are combined in scoring the items. That is, the examinee receives more credit for a correct answer rated as "highly confident" than for a response rated "uncertain." Confidence scoring yielded higher reliability than conventional scoring, but it does not seem to increase validity. Furthermore, it becomes very complex and rather difficult to follow. According to Stanley and Wang, "The (above) procedure offers no clear advantages at the present time."

At present, the actual stance on guessing is that if the test items offer at least 3 or 4 or more alternatives, or if the test is long enough, the influence of guessing should not be overestimated. To "guess one's way to a high score" is impossible, even if guessing may give one a few more points on the test. The systematic advantage of the guesser is eliminated if the test manual directs everyone to guess. It is usual now to inform subjects that wild guessing is to their disadvantage, and to encourage them to respond when they can make an informed judgment as to the most reasonable answer. In other words, encourage examinees to guess intelligently by developing an appropriate answer strategy.

Do you make use of any strategies when guessing? If so, what are these strategies?

UNIT 3: Standardization: Introduction to Item Analysis, Distribution of Test Scores, Item Analysis for the Index of Difficulty, including Interval Scales for Difficulty Indices and Decisions in Relation to Testing Purpose

Learning Tip: This is where you will have to get re-acquainted with your best friend, the scientific calculator! Please make sure you utilize your calculator in doing the computations for the next modules. You may also apply what you learned in Psychological Statistics Lab in the use of Excel for computing the different statistical indices! DO NOT PANIC. YOU.CAN.DO.THIS.
(refer to Test Anxiety and how it can benefit learning)

Engage: Pre-assessment activity: "Identifying strengths and flaws"

What makes a good test?
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________

Explore:
