NU Dasmariñas Test Development Lecture 5 PDF
Summary
This document is a lecture presentation on test development at NU Dasmariñas. It covers scaling methods (rating scales, equal-appearing intervals, absolute scaling, Likert scales, Guttman scales, and empirical keying), the five stages of test development (conceptualization, construction, tryout, item analysis, and revision), and considerations for test construction, such as clearly defining the test's purpose and choosing appropriate item formats.
Full Transcript
Test Development Lecture 5
PSYCH ASSESSMENT

OBJECTIVE: At the end of the session, the students shall be able to:
- Identify the concepts needed for test development
- Explore the steps for scientifically constructing a test
- Construct their own items to be evaluated using item analysis

Test Development
- An umbrella term that refers to the process of creating a test.

The 5 stages of test development:
1. Test Conceptualization
2. Test Construction
3. Test Tryout
4. Item Analysis
5. Test Revision

Step 1: TEST CONCEPTUALIZATION
- The process can often be traced to a thought: "There ought to be a test designed to measure _____ in such and such a way."
- An emerging phenomenon or pattern of behavior might serve as the stimulus for test conceptualization.

Pilot Work
- The general term for preliminary research surrounding the creation of a test prototype.
- Items must be subjected to pilot studies to evaluate whether or not they should be included in the final form of the test.

Preliminary questions to ask as a test developer:
- What is the test designed to measure?
- What is the objective of the test?
- Is there a need for this test?
- Who will use this test?
- Who will take this test?
- What content will the test cover?
- How will the test be administered?
- What is the ideal format of the test?
- Should more than one form of the test be developed?
- What special training will be required of test users for administering and interpreting the test?
- What type of responses will be required of test takers?
- Who benefits from the administration of this test?
- Is there any potential for harm as a result of the administration of this test?
- How will meaning be attributed to scores on this test?

Step 2: TEST CONSTRUCTION

Scaling
- The process of setting rules for assigning numbers in measurement.
Rating Scale
- A grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the test taker.

Scaling Methods

Rankings of Experts
- A panel of experts ranks the behavioral indicators and provides a meaningful numerical score.

Method of Equal-Appearing Intervals
- Developed by L. L. Thurstone (1929)
- A large number of true-false statements reflecting positive and negative attitudes
- Items fall on an interval scale
- Reliability and validity analyses are important to determine appropriateness and usefulness
- An item with a larger standard deviation would be dropped

Method of Absolute Scaling
- Obtains a measure of absolute item difficulty based on results for different age groups of test takers
- Commonly used in group achievement and aptitude testing

Likert Scale
- Consists of ordered responses on a continuum
- The total score is obtained by adding the scores from the individual items

Guttman Scales
- Respondents who endorse a stronger statement will also endorse the milder ones

Method of Empirical Keying
- Test items are selected based entirely on how well they discriminate a criterion group from a normative sample

Method of Paired Comparisons
- Test takers are presented with pairs of stimuli which they are asked to compare

Categorical Scaling
- Stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum

Step 2: TEST CONSTRUCTION - Writing Items

Questions to consider:
- What range of content should the items cover?
- Which of the many different item formats should be employed?
- How many items should be written in total and for each content area covered?
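Two of the scaling methods above lend themselves to a short computational illustration: Likert summation scoring (total score = sum of item ratings) and the Guttman cumulative pattern (endorsing a stronger statement implies endorsing all milder ones). The following is a minimal Python sketch; the data values and the reverse-keying convention for negatively worded items are illustrative assumptions, not part of the lecture.

```python
# Likert scoring: sum the ordered responses; reverse-keyed (negatively
# worded) items are flipped so a higher total always means "more" of
# the construct being measured.
def likert_total(responses, reverse_keyed=(), scale_max=5):
    """responses: integer ratings (1..scale_max);
    reverse_keyed: indices of negatively worded items."""
    total = 0
    for i, r in enumerate(responses):
        total += (scale_max + 1 - r) if i in reverse_keyed else r
    return total

# Guttman check: in a perfect Guttman scale, the endorsement pattern,
# ordered from mildest to strongest item, is a run of 1s followed by 0s;
# a 0 immediately followed by a 1 violates the cumulative assumption.
def is_guttman_pattern(endorsements):
    """endorsements: list of 0/1, ordered from mildest to strongest item."""
    return all(not (a == 0 and b == 1)
               for a, b in zip(endorsements, endorsements[1:]))

print(likert_total([4, 2, 5], reverse_keyed={1}))  # item 2 reversed: 4 + (6-2) + 5 = 13
print(is_guttman_pattern([1, 1, 0, 0]))  # True: consistent cumulative pattern
print(is_guttman_pattern([1, 0, 1, 0]))  # False: stronger item endorsed without a milder one
```

Reverse keying matters when mixing positively and negatively worded items in one scale: without it, the two item directions would cancel each other out in the total.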
Step 2: TEST CONSTRUCTION - Writing Items

Guidelines:
- Define clearly what you want to measure
- Generate an item pool (a reservoir of items from which items will or will not be drawn for the final version of the test)
- Avoid exceptionally long items
- Keep the level of difficulty appropriate for those who will take the test
- Avoid double-barreled items that convey two or more ideas at the same time
- Consider mixing positively and negatively worded items within the scale

Approaches to Test Construction
- Rational (Theoretical) Approach: reliance on reason and logic rather than data collection and statistical analysis
- Empirical Approach: reliance on data gathering to identify items that relate to the construct
- Bootstrap Approach: a combination of the rational and empirical approaches; items are first written based on a theory, then an empirical approach is used to identify the items that are highly related to the construct

Item Format
- The form, plan, structure, arrangement, and layout of individual test items
- Examples: multiple choice, matching, binary-choice (i.e., true or false), short answer

Scoring Models
- Cumulative: the number of items endorsed/responses matching the key represents the construct being measured
- Class/Category: the placement of an individual into a particular class for description or prediction
- Ipsative: the respondent chooses between two or more equally socially acceptable options

Step 3: TEST TRYOUT
- The test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
- Rule of thumb for sample size: n = A x 5 to 10, where A is the number of items on the questionnaire and n is the number of participants (e.g., a 20-item questionnaire calls for roughly 100 to 200 participants).
- For validation purposes, there must be at least 20 participants per item.
- A good test helps in discriminating among test takers.

Step 4: ITEM ANALYSIS
- A process that examines responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole.
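Two of the item-analysis statistics defined below can be computed directly from a matrix of 0/1 (incorrect/correct) responses: the item-difficulty index (proportion answering the item correctly) and the item-discrimination index (how the item separates high scorers from low scorers). A minimal Python sketch follows; the response data are hypothetical, and the top/bottom 27% split used for the discrimination index is a common convention assumed here, not stated in the lecture.

```python
def item_difficulty(item_responses):
    """Item-difficulty index p: proportion of test takers who
    answered the item correctly (1 = correct, 0 = incorrect)."""
    return sum(item_responses) / len(item_responses)

def item_discrimination(item_responses, total_scores, frac=0.27):
    """Item-discrimination index d = p_upper - p_lower: the difference
    in the item's difficulty between the highest- and lowest-scoring
    groups (here the conventional top/bottom 27% split)."""
    n = max(1, round(len(total_scores) * frac))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    p_upper = sum(item_responses[i] for i in upper) / n
    p_lower = sum(item_responses[i] for i in lower) / n
    return p_upper - p_lower

# Hypothetical data: one item's 0/1 answers and each taker's total score.
item = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
totals = [9, 8, 3, 7, 2, 4, 10, 6, 1, 5]
print(item_difficulty(item))               # 0.6 -> 60% answered the item correctly
print(item_discrimination(item, totals))   # 1.0 -> item fully separates high and low scorers
```

A discrimination index near 0 (or negative) flags an item that fails to separate high- and low-scoring test takers and is a candidate for revision or removal.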
Step 4: ITEM ANALYSIS

Item-Difficulty Index
- The proportion of the total number of test takers who answered the item correctly

Item-Reliability Index
- An indication of the test's internal consistency
- Factor analysis can be used

Item-Validity Index
- Indicates the degree to which a test is measuring what it intends to measure

Item-Discrimination Index
- How well an item discriminates between high scorers and low scorers

Considerations:
- Guessing
- Item fairness
- Speed tests

Qualitative Item Analysis
- Nonstatistical procedures designed to explore how individual test items work
- Comparison of individual test items with one another and with the test as a whole

"Think Aloud" Test Administration
- An innovative approach to cognitive assessment in which respondents verbalize their thoughts as they occur

Expert Panels
- Sensitivity review
- Expert panels may also play a role in the development of new tools of assessment for members of underserved populations

Step 5: TEST REVISION

Why we revise tests:
- Materials look dated and test takers can't relate to them
- Popular culture changes
- Adequacy of test norms
- Changes in reliability or validity
- Theoretical modifications

Cross-Validation
- Re-validation of a test on a sample of test takers other than those on whom test performance was originally found to be a valid predictor of some criterion

Co-Validation
- A test validation process conducted on two or more tests using the same sample of test takers

Quality Assurance

Anchor Protocol
- A protocol produced by a highly authoritative scorer, designed to model scoring and resolve the discrepancies that go along with it

Scoring Drift
- A discrepancy between the scoring in an anchor protocol and another protocol

Other applications:
- Evaluate properties of existing tests and guide revisions
- Determine measurement equivalence across populations
- Development of item banks

End of Discussion!
ASSIGNMENT:
1. Construct a five-item binary scale for measuring motivation.
2. Construct a five-item Likert scale for measuring selfishness.
3. Construct a five-item semantic differential scale for measuring attitude toward the COVID-19 response of the Philippine government.
4. Construct a five-item Guttman scale for measuring attitude toward depression.