week4-1_testing (1).ppt
Reliability
01/18/24

Reliability
Reliability = the consistency of test scores.
A measure’s ability to produce consistent results.
Something or somebody is unreliable if we cannot count on it to behave consistently.
The extent to which a variable is measured without error.
Error-free measurement is impossible!

Sources of Consistency and Inconsistency (Why do test scores vary at all?)
Lasting and general characteristics of the individual.
  Some people do consistently better than others because they are simply good at the task, e.g., spelling ability.
Lasting but specific characteristics of the individual.
  Some people who are generally poor spellers might nevertheless know how to spell many of the particular words included in the test.
Temporary but general characteristics of the individual.
  Internal distractions (e.g., fatigue): a person who is ill or tired might do poorly.
Temporary and specific characteristics of the individual.
  The test might contain words like Baltimore or Seattle. A child who took the test shortly after looking at sports news might have a temporary advantage.
Testing situation.
  External distractions (e.g., noises): getting lower scores in a noisy, poorly lit classroom.
Chance factors.
  Luck.
  Momentary distraction.

General Model of Reliability
Reliability is the extent to which individual differences in test scores are attributable to “true” differences rather than to chance errors.
The model holds that every score has two components:
  True score: reflects the examinee’s true skills, abilities, knowledge, etc. It is the combination of all factors that result in consistency in measurement.
  Error score.
Error variance is due to random, unsystematic factors.
Any condition that is irrelevant to the purpose of the test represents error variance.
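The two-component model above is conventionally written as follows. This is a sketch in standard classical test theory notation; the symbols X (observed score), T (true score), and e (error score) are assumed here, and the variance decomposition holds only under the stated assumption that errors are uncorrelated with true scores.

```latex
% Observed score = true score + error score
X = T + e

% If errors are random and uncorrelated with true scores, Cov(T, e) = 0,
% so the observed-score variance splits into two parts:
\sigma_x^2 = \sigma_T^2 + \sigma_e^2
```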
General Model of Reliability: The Reliability Coefficient
The ratio of true score variance to the total variance of test scores:
  r_{xx} = \sigma_T^2 / \sigma_x^2 = \sigma_T^2 / (\sigma_T^2 + \sigma_e^2)
r_{xx} = .90 indicates that 90% of the score variance is due to true score variance.

Reliability Estimates
Estimates of true score variance. There are 3 types we will discuss:
1. Test-retest
2. Alternate forms
3. Internal consistency
   Split-half
   Alpha reliability

Test-Retest
The tendency of a test to yield relatively similar scores for the same individual over time.
Administer a test at two different points in time to the same group of people.
Calculate the correlation between these scores (Pearson r between Time 1 score and Time 2 score).
High correlation indicates temporal stability.
Problem 1: Practice effects.
Problem 2: Some behaviors may fluctuate daily.
Problem 3: Testing can be time-consuming and expensive.

Alternate Forms
Make up 2 equivalent forms of a test.
Give both to 1 group of people, at one time.
Compute the correlation between Form A and Form B of the test (consistency between parallel tests).
Practice effects are no longer a problem, since the forms have different items.
Problem 1: Order effects.
Problem 2: It is difficult to develop several alternate forms.

Internal Consistency
How much the individual items in a test “go together”, or whether they are all measuring the same thing.
Based on the intercorrelations of the items and the number of items.
Two types of internal consistency:
1. Split-half
2. Alpha reliability

Split-Half
Make a very large test and administer it to one group of people.
Divide the items into 2 smaller tests. The two halves must be as similar as possible.
Compute a correlation between the two halves.
Only one test administration is needed, so practice effects are minimized.
High correlation indicates internal consistency.
The simplest method is the odd-even split.
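The odd-even split described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the slides: the function names and the small 0/1 score matrix are hypothetical, and the result is the plain correlation between the two half-test totals, with no further correction applied.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers.

    Assumes both lists have non-zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def odd_even_split_half(scores):
    """Correlate each person's total on odd-numbered items with
    their total on even-numbered items.

    scores: one row per person, one column per item.
    """
    odd_totals = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical data: 5 people, 6 items scored 1 (correct) or 0 (incorrect).
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

reliability = odd_even_split_half(scores)
print(f"Odd-even split-half correlation: {reliability:.2f}")
```

The same `pearson_r` helper, applied to each person's Time 1 and Time 2 totals, gives a test-retest estimate; applied to Form A and Form B totals, it gives an alternate-forms estimate.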
Alpha Reliability
Coefficient alpha represents the mean reliability coefficient one would obtain from all possible split-halves:
  r_{xx} = \frac{k \bar{r}_{ij}}{1 + (k - 1) \bar{r}_{ij}}
k = number of items; k usually > 1
\bar{r}_{ij} = average intercorrelation among the items
First, administer the test to a group of people.
Then, compute the correlations among all items and compute the average of those intercorrelations.
Lastly, use the formula above to estimate reliability.
If there is more than one dimension, report an estimate of internal consistency for each homogeneous subtest or factor.
  Example: a hypothetical test for accountants that contains subtests for calculation skills and use of a spreadsheet.
Source: From Personality Assessment Inventory by L. C. Morey. Copyright © 1991. Published by Psychological Assessment Resources (PAR).

Alpha values        Judgment
α ≥ 0.9             Excellent (high-stakes testing)
0.7 ≤ α < 0.9       Good (low-stakes testing)
0.6 ≤ α < 0.7       Acceptable
0.5 ≤ α < 0.6       Poor
α < 0.5             Unacceptable

Split-Half vs. Alpha Reliability
The difference between them is the unit of analysis.
Split-half compares one half of the test with the other half.
Alpha reliability compares each item with every other item.

Scorer Reliability and Agreement
Scorer reliability: the amount of consistency among scorers’ judgments.
Examiner variance in clinical instruments.
Scorer variance in projective techniques.
Do not deal with:
  Factors which we actually measure.
  Disturbing factors which can be controlled experimentally.
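Returning to the three alpha-reliability steps listed earlier (administer the test, average the inter-item correlations, apply the formula), a minimal Python sketch might look like this. The function names and the 1-to-5 rating data are hypothetical, and the code implements only the correlation-based formula given in the slides, r_xx = k r̄ / (1 + (k − 1) r̄).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_inter_item_r(scores):
    """Average Pearson r over every distinct pair of items.

    scores: one row per person, one column per item.
    """
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    pairs = list(combinations(range(len(items)), 2))
    return sum(pearson_r(items[i], items[j]) for i, j in pairs) / len(pairs)


def alpha_from_scores(scores):
    """Apply r_xx = k * r_bar / (1 + (k - 1) * r_bar) to raw item scores."""
    k = len(scores[0])
    r_bar = mean_inter_item_r(scores)
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical data: 5 people rating 3 items on a 1-5 scale.
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 1],
    [1, 3, 2],
]

alpha = alpha_from_scores(ratings)
print(f"Mean inter-item r: {mean_inter_item_r(ratings):.2f}")
print(f"Alpha: {alpha:.2f}")
```

Because every item is compared with every other item, this uses the item as the unit of analysis, in contrast to the half-test totals used by split-half.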
One might be interested in stability over time rather than in the stability of scores obtained by different psychologists.

The Interpretation of Reliability Coefficients
What is an acceptable level of reliability? .90? .70?
It depends on the purpose of measurement.
Applied work generally requires more.
For theoretical research, .70 is usually OK.

Factors Affecting Reliability
The test itself (e.g., sample of items, test length, typos, item quality, scoring difficulty)
Administration conditions (variation across sessions, distractions)
The scoring process (scorers make mistakes, scoring equipment is faulty, scorers are inconsistent, scorers are biased)
Test takers (motivation, fatigue)
Test developers (poor definition of domain, biased coverage)
Test-retest interval

Take-Home Message!
To improve reliability, reduce the proportion of random error.
To improve reliability: eliminate weak items, add good ones, adjust difficulty, increase administration time, and improve standardization.
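The advice to add good items can be illustrated with the alpha formula from earlier in the deck. The sketch below assumes, purely for illustration, that the added items have the same average intercorrelation as the existing ones; the function name and the value 0.20 are hypothetical.

```python
def alpha_from_mean_r(k, mean_r):
    """Alpha formula from the slides: k * r_bar / (1 + (k - 1) * r_bar).

    k: number of items
    mean_r: average intercorrelation among the items
    """
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical scenario: the average inter-item correlation stays at 0.20
# while the test is lengthened from 5 to 10 to 20 items.
for k in (5, 10, 20):
    print(f"k = {k:2d}  alpha = {alpha_from_mean_r(k, 0.20):.2f}")
```

Under this assumption the estimate rises as items are added. Adding weak items would lower the average intercorrelation and could offset the gain, which is why the message pairs adding good items with eliminating weak ones.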