Week 9 - MC Testing (2024-25)
Summary
This document is an overview of multiple-choice (MC) testing. It covers test formats, correction for guessing, MC testing as a learning tool (retrieval practice with repeated and related questions, the negative testing effect), and question generation with PeerWise.
Full Transcript
Learning to Learn - Week 9: Multiple-Choice Testing

Overview
- MC testing as assessment
- MC testing as a learning tool: retrieval practice with repeated questions
- Pitfalls of MC testing: the negative testing effect
- Retrieval practice with related questions
- PeerWise: generating MC questions

MC Testing as Assessment
- Long history, dating back to at least 1916.
- Used in schools and universities because MC tests are easy to grade and can be seen as more "objective": two markers will often disagree on blind marks assigned to students' work, whereas MC testing needs no second marker.
- Also used with large-scale standardised tests, historically including the SAT (Scholastic Aptitude Test) and the GRE (Graduate Record Exam).
- MC tests can themselves be tested: remove questions that are at ceiling or floor, and remove questions that behave "oddly" (e.g., don't correlate well with the overall score).

MC Test Formats
- There are many different formats. The most common is to select the single option representing the correct answer.
- Criticism: this format doesn't measure "partial knowledge" very well (the examinee knows only part of the answer, or is not confident about their answer).
- Many other formats do measure partial knowledge:
  - Confidence marking (assign a confidence rating to the favourite option)
  - Elimination testing (eliminate all options the examinee identifies as incorrect)
  - Complete ordering (rank the options from most to least favourite)
  - Partial ordering (eliminate options where confident and rank the remaining ones)
  - Probability testing (distribute 100 points across the options)

Probability Testing
- The final score is the sum of the probabilities assigned to the correct answers.
- Offers precision in the measurement of partial knowledge, and examinees tend to like it (it allows them to fully express their knowledge in a natural way).
- "…the performance of probability testing was impressive for most tests and by most criteria" (Ben-Simon et al., 1997, p. 85).
- We used to use this type of testing for both PSYC1016 and PSYC1022, but the data files were impossible to interpret (i.e., the technology was not up to the task).

Partial Knowledge
- Suppose a student's favourite option is incorrect, but their second favourite is the correct option.
- Standard format: they score 0% (they select the incorrect option).
- Probability testing: the student assigns a higher probability (e.g., 60%) to the favourite (but incorrect) option and a lower probability (e.g., 40%) to the second-favourite (but correct) option, so they score >0% (here, 40%) for that question.
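As a concrete illustration, here is a minimal Python sketch of the probability-testing scoring rule just described; the function name and the points-per-option data layout are illustrative assumptions, not taken from any actual testing platform.

```python
# Probability testing: the examinee distributes 100 points across the
# options of each question; the score is the points placed on correct answers.

def probability_test_score(responses, answer_key):
    """Mean points placed on the correct option, on a 0-100 scale.

    responses  -- one dict per question mapping option label -> points,
                  where each dict's values sum to 100
    answer_key -- the correct option label for each question
    """
    total = 0.0
    for points, correct in zip(responses, answer_key):
        assert abs(sum(points.values()) - 100) < 1e-9, "points must sum to 100"
        total += points.get(correct, 0.0)
    return total / len(answer_key)

# The partial-knowledge example from the slides: 60 points on the favourite
# (incorrect) option, 40 on the second favourite (correct) option, so the
# student scores 40% on the question rather than 0%.
print(probability_test_score([{"A": 0, "B": 60, "C": 40, "D": 0}], ["C"]))  # 40.0
```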
Multiple Correct Answers
- Used in some modules in the School of Psychology.
- The examinee can select one or more options, and there may be more than one correct answer; the examinee must decide how many correct answers there are and which ones they are.
- There is not much research on this type of testing.
- Concern: risk-taking students will score differently from risk-averse students, when that has nothing to do with what we're trying to measure.

Correction for Guessing
- The limited number of options means that correct answers can occur without underlying knowledge (correct guessing).
- Many different scoring rules have been developed to counter the effect of correct guessing.
- Most common for a standard test: subtract a partial score of 1/(n-1), where n = number of options, for each incorrect response. (Under pure random guessing among n options, an examinee produces one correct answer for every n-1 incorrect ones on average, so guessing contributes zero to the corrected score in expectation.)
- Examinees can omit answers to avoid the penalty.
- Example: on a 4-option MC test with 100 questions, a student gives 70 correct responses, 20 incorrect responses, and 10 omissions. Score = 70 - 20*(1/[4-1]) = 70 - 20*(0.333) = 70 - 6.67 = 63.33.

Problems with formula scoring
- Examinees don't like being penalised!!
- The test score is influenced by metacognition: students must assess their knowledge correctly when deciding which answers to omit, and metacognition is not perfect.
- The test score is influenced by risk-taking tendencies.
- Students tend to omit answers that are correct, thereby reducing their score.

[Figure slides: Correction for Guessing (Higham, 2007)]
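A minimal Python sketch of the formula-scoring rule above; the function name is illustrative, but the rule is the one stated on the slide.

```python
# Formula scoring: each incorrect response costs 1/(n-1) of a mark,
# and omissions cost nothing.

def formula_score(n_correct, n_incorrect, n_options):
    """Number-right score corrected for guessing: R - W/(n - 1)."""
    return n_correct - n_incorrect / (n_options - 1)

# The worked example from the slides: a 4-option, 100-question test with
# 70 correct responses, 20 incorrect responses, and 10 omissions.
print(formula_score(70, 20, 4))  # 70 - 20/3 = 63.33...
```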
MC Testing as a Learning Tool: Retrieval Practice with Repeated Questions

McDaniel et al. (2007)
- Web-based university course on Brain and Behaviour.
- Students were assigned weekly readings and then practiced with MC quizzing, short-answer quizzing, or rereading; corrective feedback was provided. Some facts were not practiced at all (not exposed).
- Example item in each format:
  - Read only: "All preganglionic axons, whether sympathetic or parasympathetic, release acetylcholine as a neurotransmitter."
  - Short answer: "All preganglionic axons, whether sympathetic or parasympathetic, release ________ as a neurotransmitter."
  - MC: "All preganglionic axons, whether sympathetic or parasympathetic, release ________ as a neurotransmitter: a. acetylcholine, b. epinephrine, c. norepinephrine, d. adenosine."
- [Figure: final-test performance for practiced vs. not-exposed facts in the read-only, MC-quiz, and short-answer-quiz conditions.]

MC vs Recall Retrieval Practice
- McDaniel et al. (2007) suggests that short-answer (cued-recall) practice is best; however, other studies suggest otherwise.
- Bottom line: ANY format of retrieval practice will provide benefits.
- [Figure: meta-analytic effect sizes for MC vs. cued-recall retrieval practice from Rowland (2014), Adesope et al. (2017), and Yang et al. (2021).]

Pitfalls of MC Practice Tests: the Negative Testing Effect
- Retrieval practice (MC): "From what classic movie comes the line, 'Frankly my dear, I don't give a damn'? A. All About Eve, B. Casablanca, C. Gone with the Wind, D. Sunset Blvd" - the participant wrongly selects "All About Eve".
- Final test (cued recall): "From what classic movie comes the line, 'Frankly my dear, I don't give a damn'?" - people sometimes respond with a lure from the practice test ("All About Eve"). This is the negative testing effect.

Marsh et al. (2009)
- Participants practiced answering SAT II MC test questions on biology, chemistry, world history, and US history. The practice test was formula scored, so there was the option to omit responses.
- After a filler task, participants answered short-answer questions: 40 items from the earlier test and 40 new items.
- [Figures: proportions of correct answers and MC lures on the final test (tested vs. not tested), and of correct answers, MC lures, and omissions on the initial test, for undergraduates and high-school juniors.]

Corrective Feedback and the Negative Testing Effect
- If the practice test supplies the correction ("Gone with the Wind") after the erroneous selection, the negative testing effect is reduced by the corrective feedback (e.g., Butler & Roediger, 2008).

MC Testing as a Learning Tool: Retrieval Practice with Related Questions

Related Questions with Different Correct Answers
- Retrieval practice (MC): "From what classic movie comes the line, 'Frankly my dear, I don't give a damn'?" - the participant wrongly selects "All About Eve" and receives the correction "Gone with the Wind".
- Final test: "From what classic movie comes the line, 'Here's looking at you kid'?" (correct answer: Casablanca).
- What happens? Do people respond with the earlier correction (now wrong)? With their previous response? Or with the correct response?
- The pairing can also run the other way: practice with the cued-recall question "Here's looking at you kid" (answer: Casablanca), then take a final MC test on "Frankly my dear, I don't give a damn" whose options include Casablanca. The earlier answer now acts as a competitive lure, but processing the relationship between the questions can also support elaborative retrieval.

Little et al. (2019)
- Participants first completed an online MC practice test with general-knowledge questions; elimination testing was used to encourage processing of all the lures.
- After a distractor task, they completed a cued-recall test containing previously tested items (repeated), related questions (like the Gone With The Wind/Casablanca example), and new questions (control).
- [Figure: final cued-recall performance for repeated, related, and new (control) questions.]

However…
- Whether practice with related items facilitates later test performance depends on whether the final test is in cued-recall or MC format.

Alamri & Higham (2022)
- Followed a similar design to Little et al. (2019), but tested two groups of participants: Group 1 (practice test = MC; final test = cued recall) and Group 2 (practice test = MC; final test = MC).
- Group 1 replicates Little et al. (2019).
- New finding: impaired performance on related questions in Group 2. Most of the errors were due to participants selecting the corrective feedback from the practice test, which was no longer correct: participants believed the related questions were repeated (and had the same correct answer) - false recognition.

Alamri & Higham (in press)
- Does false recognition of practice questions also lead to impaired MC performance in genuine educational contexts? Can sequencing of related pairs be used to boost student performance?
- First-year psychology students (n = 164) wrote a 55-item MC test in preparation for the final exam.
- The main manipulation was the sequencing of related pairs: 11 related-separated pairs, 11 related back-to-back pairs, and 11 new items. There were no repeated items.

Alamri & Higham (in prep)
- Related-separated pairs: lots of false recognition and performance impairment (although there were no repeated items).
- Related back-to-back pairs: very little false recognition (discrepancy detection); comparing the questions might boost performance.
- Facilitation was observed even though participants answered all questions in MC format, with a bigger boost in the back-to-back condition.

Negative Testing Effect vs. Related Questions Effect

Negative testing effect:
- MC (T1) -> cued recall (T2)
- Repeated questions with the same correct answers
- (Selected) lures on T1 are falsely recalled on T2
- Reduced by feedback
- Errors caused by repeating an earlier error

Related questions effect:
- MC (T1) -> MC (T2)
- Related questions with different correct answers
- Feedback on T1 is erroneously selected again on T2
- Caused by feedback (original erroneous selections are largely avoided)
- Errors caused by responding with earlier corrective feedback

MC Testing as a Learning Tool: Generating MC Questions

PeerWise
- An online tool that allows students in a course to collaborate and learn by creating, sharing, answering, and discussing multiple-choice questions.
- There is substantial evidence that it promotes learning.
Kelley et al. (2019)
- PeerWise involves two effective learning techniques, and this study investigated the role of each:
  - Generation: generating information leads to better memory than reading it, at least on standard memory tests like those used in university.
  - Retrieval practice: retrieving information leads to better long-term retention than restudying it.
- Participants were 40 students enrolled in a cognitive psychology course, who were required to both generate and answer questions.
- For each of eight textbook chapters, participants had to generate one question (the generation component); they could generate more than one if they wanted and were free to choose the particular content. Questions were evenly spaced throughout the semester.
- Participants were required to provide explanations for the correct and incorrect options (including a book citation with page numbers) and to tag the question with keywords (i.e., chapter, broad topic, narrow topic).
- Students were also required to answer and evaluate other students' questions (the retrieval-practice component).
- At the end of the course, the researchers examined the questions generated and answered by each student and determined whether any exam questions overlapped with them. For example: a student generates a question on topic X and there is an exam question on topic X, or a student answers another student's question on topic Y and there is an exam question on topic Y.
- Exam performance on these overlapping questions was compared to control questions: exam questions on topics that did not overlap with a student's authored or answered PeerWise topics.
- [Figure: Kelley et al. (2019) results - exam performance on overlapping vs. control questions.]

Recommendations
- Take practice tests, and lots of them. Don't worry about the format (MC vs. cued recall).
- If practicing with MC tests, have corrective feedback available to avoid the negative testing effect.
- Try not to search for repeated questions: if a question reminds you of a practice question, solve it again rather than trying to remember the corrective feedback.
- Use PeerWise!!

References
Alamri, A., & Higham, P. A. (2022). The dark side of corrective feedback: Controlled and automatic influences of retrieval practice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 48, 752–768. https://doi.org/10.1037/xlm0001138
Ben-Simon, A., Budescu, D. V., & Nevo, B. (1997). A comparative study of measures of partial knowledge in multiple-choice tests. Applied Psychological Measurement, 21, 65–88. http://dx.doi.org/10.1177/0146621697211006
Butler, A. C., & Roediger, H. L. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36(3), 604–616. https://doi.org/10.3758/MC.36.3.604
Higham, P. A. (2007). No Special K! A signal detection framework for the strategic regulation of memory accuracy. Journal of Experimental Psychology: General, 136, 1–22. https://doi.org/10.1037/0096-3445.136.1.1
Kelley, M. R., Chapman-Orr, E. K., Calkins, S., & Lemke, R. J. (2019). Generation and retrieval practice effects in the classroom using PeerWise. Teaching of Psychology, 46, 121–126. https://doi.org/10.1177/0098628319834174
Little, J. L., Frickey, E. A., & Fung, A. K. (2019). The role of retrieval in answering multiple-choice questions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45, 1473–1485. https://doi.org/10.1037/xlm0000638
Marsh, E. J., Agarwal, P. K., & Roediger, H. L. (2009). Memorial consequences of answering SAT II questions. Journal of Experimental Psychology: Applied, 15, 1–11. https://doi.org/10.1037/a0014721
McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513. https://doi.org/10.1080/09541440701326154