Language Testing in the 21st Century

Summary

This document provides an overview of language testing in the 21st century. It examines various approaches, including computer-assisted language learning (CALL), music-based methods, and considerations for testing in digital environments. It also addresses different types of testing, such as high-stakes and low-stakes assessments, along with the challenges presented by automated scoring.

Full Transcript

# Language Testing in the 21st Century

## Introduction

- **Game:** a system in which players engage in an artificial conflict, defined by rules, that results in a quantifiable outcome.
- Over the past decade, the number of publications on CALL (Computer-Assisted Language Learning) games has increased significantly.
- This growth should be compared to the total number of CALL publications over the same period for better context.
- Most research on CALL games has focused on design issues.
- Since digital game-based language learning (DGBLL) is costly to develop and interest in traditional tutorial CALL has declined, the strong focus on design suggests potential for integrating tutorial CALL tools into gaming environments.

## Learners' Perceptions

- Cornillie, Clarebout, and Desmet highlight the importance of considering learners' perceptions when designing language learning games, especially regarding corrective feedback.
- Their mixed-methods study shows that learners generally view corrective feedback in immersive role-playing games positively.

## Music-Based Language Teaching Methods

- Music is recognized as an effective tool for language learning at all levels.
- Approaches vary: some use music in lesson planning or for teaching grammar and vocabulary, while others base entire courses on music.
- Edwards found that many ESL teachers avoid music because of cost and lack of training, though 72% were interested in learning music-based strategies.
- Initially focused on children, music-based methods are now also tailored for older learners.

### The Audio-Singual Method

- Uses familiar songs to teach English.
- Familiar tunes create a sense of recognition, helping learners overcome fear and resistance.
- Music and songs are more effective, faster, and easier to recall than traditional drills.

## Materials Development

- Provides teachers with materials for integrating songs into lessons, ranging from supplemental use to the core of instruction.
## Low-Stakes and High-Stakes Tests

- **Low stakes:** use has only indirect or relatively insignificant consequences for stakeholders.
- **High stakes:** directly influences a decision that may be life-changing for the test-taker, or extremely costly to remedy if made in error.

## Automated Scoring

- Automated scoring raises important concerns about test validity.
- A key issue is that, because of its inherent limitations, it may fail to fully represent the construct being measured.

## Digital Environment

- When a language test is developed in a digital environment, automated systems can help share test items with reviewers and add approved items to final test forms.
- With built-in quality checks, such systems can prevent mistakes and keep tests consistent.
- A well-organized digital system can also collect evidence for checking test reliability and validity, such as item reviews, reviewer qualifications, and feedback on items, helping keep scores consistent and fair.
- Computer-based tests offer a more standardized environment than paper- or interviewer-based tests, especially when multimedia such as audio and video is used.

## Technology Incorporation

- Incorporating technology into an established high-stakes testing program in order to transform test development and administration procedures has implications for score interpretations.
- Some researchers argue that taking a test on a computer versus on paper might measure different skills, and they suggest carefully studying how test format affects performance.
- Computer-administered tests also require some proficiency in using computers; differences in computer skills among test-takers can introduce bias, advantaging those more skilled with computers.
- Score differences between mobile and non-mobile devices lead some test-takers to avoid mobile devices for high-stakes assessments.
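The item-review workflow described under Digital Environment (items shared with reviewers, approved items entering the final test, built-in quality checks) could be sketched as a small item bank. All names here (`Item`, `ItemBank`, the `draft`/`reviewed`/`approved` statuses) are hypothetical, not from any real testing platform:

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    item_id: str
    text: str
    status: str = "draft"  # draft -> reviewed -> approved
    reviews: list = field(default_factory=list)

class ItemBank:
    """Minimal sketch of a digital item bank with a review workflow."""

    def __init__(self):
        self.items = {}

    def add(self, item):
        self.items[item.item_id] = item

    def review(self, item_id, reviewer, approved, comment=""):
        # Record the review as validity evidence (who reviewed, verdict, comment).
        item = self.items[item_id]
        item.reviews.append({"reviewer": reviewer, "approved": approved, "comment": comment})
        item.status = "approved" if approved else "reviewed"

    def build_form(self, item_ids):
        # Quality check: only approved items may enter the final test form.
        unapproved = [i for i in item_ids if self.items[i].status != "approved"]
        if unapproved:
            raise ValueError(f"unapproved items: {unapproved}")
        return [self.items[i] for i in item_ids]
```

The stored review records double as the reliability/validity evidence the section mentions: item reviews and reviewer identities travel with each item.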
## Computer-Adaptive Tests

- Computer-adaptive tests use a bank of pre-tested items with known characteristics to select questions matched to the test-taker's ability level, making the test more efficient.

### Adaptivity

- Adaptivity can involve varying the item types or the number of items, at the same or different difficulty levels.
- In high-stakes tests, item types are usually fixed to maintain consistent score interpretation.
- Adaptivity can be controlled:
  1. by the test developer, who selects items based on performance, or
  2. by the test-taker, who chooses the items or constructs to measure.
- Currently, very few high-stakes language assessments use adaptive procedures, and those that do usually rely on examiner-adaptive methods (e.g., IELTS or the OPI) rather than computer-adaptive ones.
- High-stakes language tests typically give control to the test developer for several reasons:
  - Allowing test-takers to control the test may confuse them, as they might not fully understand their own language ability.
  - When test-takers choose question difficulty, they tend to score higher, introducing bias.
  - Self-adaptive testing can produce inaccurate scores, with the most capable test-takers scoring too low and the least capable scoring too high, though results vary.
  - Self-adaptive tests are less reliable and have only a modest effect on reducing test anxiety.
- Even when the test developer maintains control, computer-adaptive procedures may still be unsuitable for high-stakes language tests.

## Challenges of Computer-Adaptive Tests

- All items must be pre-tested with large samples to estimate their parameters.
- A large item bank is needed to cover various ability levels and to ensure enough items exist at the ability levels most test-takers fall into.
- Maintaining such an item bank can be very costly.

## Technology's Role in Rater Training and Scoring

- Technology can help improve rater training and scoring.
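The developer-controlled item selection behind a computer-adaptive test can be sketched with a two-parameter IRT model: at each step, pick the unused item that is most informative at the current ability estimate. This is an illustrative toy, not any operational test's algorithm; the parameter names (`a` for discrimination, `b` for difficulty) follow IRT convention, but the fixed-step ability update is a deliberate simplification of real ability estimation:

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_item(item_bank, theta, used):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [i for i in item_bank if i["id"] not in used]
    return max(candidates, key=lambda i: item_information(theta, i["a"], i["b"]))

def run_adaptive_test(item_bank, answer_fn, n_items=3, theta=0.0, step=0.5):
    """Administer n_items adaptively; answer_fn(item) -> True if answered correctly."""
    used = set()
    for _ in range(n_items):
        item = select_item(item_bank, theta, used)
        used.add(item["id"])
        # Crude up/down update: real CATs re-estimate theta (e.g., maximum likelihood).
        theta += step if answer_fn(item) else -step
    return theta
```

The sketch also shows why the item bank must be large and pre-calibrated: `select_item` only works if every item already has known `a` and `b` parameters, and it needs candidates near every plausible `theta`.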
- **Rater calibration:** raters score sample performances and receive feedback to ensure they score consistently; this process helps avoid bias in scoring.
- Technology allows test developers to monitor the quality of ratings.
- For example, in an online system, raters score test performances, receive feedback comparing their scores to benchmarks, and may need retraining if their scores differ significantly from the expected ones.

## Automated Scoring

- Automated scoring may limit the scope of what is being assessed, which could weaken the validity of claims based on the test's scores.
- Validity evidence can come from various sources: expert human raters' ratings, analysis of scores across different groups, comparisons of automated scoring factors, and the ability of automated systems to detect strategies used to manipulate scores.
- Stakeholder acceptance and perceptions are also crucial when automated scoring is used.
- A study on the TOEFL Practice Online test showed that users trusted scores produced by a combination of automated and human raters more than those from automated scoring alone.
- Users also indicated they were more likely to try to "game" the system when only automated scoring was used.

## High and Low Stakes

- One important distinction between high- and low-stakes testing is the level of test-taker motivation each generates.
- When the stakes are high, test-takers' motivation to cheat should not be underestimated.

## Test Security

- Test score interpretations and their usefulness can be weakened if security is not properly maintained.
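The rater-calibration loop described above (score benchmark samples, compare to expert scores, flag raters who drift) might look like the following minimal sketch. The function name, data shapes, and the `tolerance` threshold are all illustrative assumptions, not part of any real rating platform:

```python
def calibration_report(benchmark, rater_scores, tolerance=0.5):
    """Compare each rater's calibration scores to expert benchmark scores.

    benchmark: {sample_id: expert_score}
    rater_scores: {rater_id: {sample_id: score}}
    Returns {rater_id: (mean_abs_deviation, needs_retraining)}.
    """
    report = {}
    for rater, scores in rater_scores.items():
        # Deviation from the benchmark on each calibration sample the rater scored.
        devs = [abs(scores[s] - benchmark[s]) for s in benchmark if s in scores]
        mad = sum(devs) / len(devs) if devs else 0.0
        # Flag raters whose average deviation exceeds the tolerance for retraining.
        report[rater] = (mad, mad > tolerance)
    return report
```

Running this after every calibration round gives test developers the ongoing rating-quality monitoring the section describes, with the flag serving as the trigger for retraining.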
- The Standards for Educational and Psychological Testing emphasize test security, stating that efforts should be made to prevent cheating and that test materials should be protected to maintain the integrity of scores.

## Forms of Cheating

- Cheating in assessments can take various forms, including receiving or sharing information with others, using prohibited materials, or bypassing the testing process entirely.
- Examples include paying for illegally obtained test items, using technology to get help during the test, hiring someone to impersonate the test-taker, or memorizing pre-prepared responses that can be adapted to different speaking or writing prompts.

## Dimensions of Cheating

- Cheating has two dimensions:
  - high-tech vs. low-tech methods
  - individual vs. collaborative cheating
- Technology makes cheating easier, e.g., small electronic devices such as video cameras, cell phones, or digital recorders can capture test items or information.
- The internet allows test-takers to collaborate across countries, making cheating more accessible on a global scale.

## Prevention of Cheating

- Technology can be used not only for cheating but also to prevent and detect it.
- Four areas of prevention:
  - identification and authentication
  - digital security
  - on-site cheating
  - theft of item banks
- To validate a test-taker's identity, measures such as multiple forms of photo ID, digital photos, facial recognition, biometric identification (e.g., retinal scans or fingerprints), and access tokens or registration codes can be used before, during, and after the test.
- On-site cheating can be prevented or detected with monitoring equipment such as video or audio recording devices, and digital communication devices can be blocked or identified using tools such as noise generators, spectrum analyzers, and metal detectors.

## Large-Scale Testing

- Large-scale testing involves exposing items to many test-takers at least once.
- Due to the high cost of test development, item banks are typically updated only periodically.
- Despite monitoring efforts, a large item bank remains necessary, as any level of exposure leaves it vulnerable.
- Unexpectedly or consistently higher test scores at specific test centers are a red flag, signaling possible coordinated cheating.

## Virtual Reality

- The development of consumer-oriented VR devices (e.g., the Oculus Rift) makes fully immersive virtual environments accessible to test developers, enabling more authentic and interactive tasks.

## Multimodal Tasks

- There is growing interest in multimodal tasks, in which test-takers engage in multiple activities: reading a text, listening to speakers, and then responding by writing or speaking.

## Redefining "Reading Ability"

- It is argued that language ability should be seen as an interaction between language and technology, so "reading ability" could be defined more broadly to include skills such as gathering information from the internet in addition to traditional print sources.
