week4-2_testing.ppt
Using and Interpreting Information about Test Reliability (Chapter 7)

Aim
The aim is to use information about reliability in evaluating, interpreting, and improving psychological tests. Reliability information alone is NOT enough; we also need to understand the relationship between reliability and validity.

Using the Reliability Coefficient
The reliability coefficient provides a relative measure of the accuracy of test scores. It does not indicate how accurate test scores really are in absolute terms. Consider a score of 110 on an intelligence test: is it really higher than the average score (i.e., 100)? How much variability should we expect on the basis of measurement error? The reliability coefficient does not answer this in concrete terms, so we need to know the size of the standard error of measurement.

Using the Reliability Coefficient vs. the SEM
Reliability coefficients are most useful for comparing the scores produced by different tests. The standard error of measurement (SEM) is more useful when interpreting test scores.

SEM – Standard Error of Measurement
The SEM is a measure of how much an individual's observed score is likely to differ from that individual's true score. It depends on two factors: the reliability of the test (rxx) and the variability of the test scores (the standard deviation, SD):
SEM = SD × √(1 − rxx)
If a spelling test has a reliability coefficient of .84 and a standard deviation of 10, then SEM = 10 × √(1 − .84) = 10 × .40 = 4.

For testing purposes, the SEM is more useful than the reliability coefficient, because we can use it to create a confidence interval around a test taker's score:
95% confidence interval = observed score ± 1.96 × SEM
As reliability increases, the SEM decreases, and as a test becomes more reliable, we can feel more confident that an individual's observed score is close to the individual's true score.
Example: for a score of 100 (the mean) and a standard error of 4.7, the margin is 4.7 × 1.96 ≈ 9.2, so the 95% confidence interval runs from 90.8 to 109.2.

Confidence Intervals
Confidence intervals reflect a range that is likely to contain the examinee's true score. They are calculated from the SEM, which in turn depends on the reliability and the SD of the scores. As reliability increases, the SEM and the confidence intervals get smaller.
Example: "Johnny's FSIQ is 113 (between 108 and 118 with 95% confidence)." The SEM and confidence intervals remind us that scores are not perfect.

SEM – Standard Error of Measurement (practice question)
If a person's true score is 110 on a test with a standard error of measurement of 3.7 and a mean of 100, we would expect 95% of the person's observed test scores to fall within
a) 102.75 – 117.25
b) 92.75 – 107.25
c) 90.75 – 120.25
d) 100 – 110
The margin is 3.7 × 1.96 ≈ 7.25, and the interval is centered on the person's true score of 110 (not on the test mean), so the range is 102.75 to 117.25: answer (a).

Reliability and Validity
A reliable test is NOT necessarily valid: a test can be reliable and yet not valid.
Measurement error decreases the correlation between two tests, X and Y (in other words, the validity of predictions). The correction for attenuation is a method of estimating the true correlation between X and Y, given the observed correlation between two unreliable measures of X and Y. If the reliability of the tests is increased, the validity of the tests would also be expected to increase; the aim is to increase the correlation between the two tests.
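To make these calculations concrete, here is a minimal Python sketch (not part of the original slides) of the three standard formulas used above: the SEM from a test's standard deviation and reliability, a 95% confidence interval around an observed score, and the correction for attenuation. The correlation and reliabilities in the attenuation example are hypothetical.

    import math

    def sem(sd, rxx):
        """Standard error of measurement: SEM = SD * sqrt(1 - rxx)."""
        return sd * math.sqrt(1 - rxx)

    def confidence_interval(observed_score, sem_value, z=1.96):
        """Interval of +/- z standard errors around an observed score (z = 1.96 for 95%)."""
        margin = z * sem_value
        return observed_score - margin, observed_score + margin

    def correct_for_attenuation(rxy, rxx, ryy):
        """Estimated true correlation between X and Y, given the observed
        correlation rxy and the reliabilities rxx and ryy of the two measures."""
        return rxy / math.sqrt(rxx * ryy)

    # Spelling-test example from the slides: SD = 10, reliability = .84
    print(sem(10, 0.84))                              # ≈ 4.0
    # IQ example from the slides: observed score 100, SEM = 4.7
    print(confidence_interval(100, 4.7))              # ≈ (90.8, 109.2)
    # Hypothetical attenuation example: observed r = .40, reliabilities .84 and .75
    print(correct_for_attenuation(0.40, 0.84, 0.75))  # ≈ 0.50

The same confidence_interval call reproduces the practice question when centered on the true score: confidence_interval(110, 3.7) gives roughly (102.75, 117.25).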
Reliability and Validity
Example (a) shows what an unreliable test would look like. Example (b) shows what a reliable but invalid test would look like: it is similar to a rifle that has its sights misaligned. The high degree of reliability is shown by the consistency of the strikes; the lack of validity is shown by the fact that the missiles miss their target, the bullseye. For example, a job satisfaction test given to unskilled workers may measure literacy skills rather than job satisfaction if the test is written in complex language; in psychometric terms, the test is not measuring what it was intended to measure. Example (c) is what a valid and reliable test would look like: the missiles hit the mark, and they hit it consistently.

Special Issues: Speed Tests vs. Power Tests
Speed test: the items are trivially easy but the time limit is tight, e.g., 60 seconds for a 100-item test. Power test: e.g., a 20-item test with no time limit.
A pure speed test should have an odd-even split-half reliability of about 1.0, so split-half estimates tell us little about such tests. The most useful method of assessing the reliability of highly speeded tests is the test-retest method. A participant may simply be slow and unable to finish all of the questions in time, and some test items may be poorly constructed yet never even reached by some participants.

Selecting a Reliability Coefficient
If a test is to be administered multiple times: test-retest reliability. If a test is to be administered one time: coefficient alpha for homogeneous content, or the split-half coefficient for heterogeneous content.

How Reliable Should Tests Be?
A lower level of reliability is acceptable when tests are used for preliminary rather than final decisions.

Reliability of Composite and Difference Scores
Composite scores are formed when several scores are combined into one; IQs, for example, are typically composite scores. The reliability of a composite score is typically higher than the reliability of the individual scores that make it up.
Difference scores involve calculating the difference between two scores. The reliability of difference scores is typically lower than the reliability of the individual scores.
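As a companion to the last two slides, here is a minimal Python sketch (not part of the original slides) of the standard classical-test-theory formulas for the reliability of a two-part composite and of a difference score. They assume standardized components with equal variances and uncorrelated measurement errors; the reliabilities and the correlation in the example are hypothetical.

    def composite_reliability(rxx, ryy, rxy):
        """Reliability of the sum X + Y for standardized scores with
        uncorrelated errors: (rxx + ryy + 2*rxy) / (2 + 2*rxy)."""
        return (rxx + ryy + 2 * rxy) / (2 + 2 * rxy)

    def difference_reliability(rxx, ryy, rxy):
        """Reliability of the difference X - Y under the same assumptions:
        (rxx + ryy - 2*rxy) / (2 - 2*rxy)."""
        return (rxx + ryy - 2 * rxy) / (2 - 2 * rxy)

    # Two hypothetical subtests, each with reliability .80, correlated .50
    print(composite_reliability(0.80, 0.80, 0.50))   # ≈ 0.87
    print(difference_reliability(0.80, 0.80, 0.50))  # = 0.60

With both components at .80 reliability and correlated .50, the composite comes out more reliable (about .87) and the difference less reliable (.60) than either component, which is exactly the pattern described on the slides.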