Concepts of Reliability (PDF)
Summary
This document discusses the concepts of reliability in measurement, focusing on classical test theory and the major methods for estimating reliability. It explores the sources of measurement error and how they affect test scores, highlights types of reliability such as test-retest and internal consistency, and explains how to interpret reliability coefficients.
Full Transcript
THE CONCEPTS OF RELIABILITY

MAIN POINTS: Reliability, Classical Test Theory, Sources of Error Variance, Methods of Estimating Reliability

RELIABILITY is a quality of test scores that suggests they are sufficiently consistent and free from measurement error to be useful. Measurement error is any fluctuation in scores. To be useful, do test scores need to be totally error-free?

MEASUREMENT ERROR
Random error (noise) - caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
Systematic error - typically constant or proportionate to what is presumed to be the true value of the variable

SOURCES OF MEASUREMENT ERROR
Intrinsic error variance: test construction, item selection
Extrinsic error variance: test administration, test-taker variables, examiner influence

CLASSICAL THEORY AND THE SOURCES OF MEASUREMENT ERROR
Also called the theory of true and error scores; Charles Spearman laid down the foundation for the theory. The basic point of the classical theory is that test scores result from the influence of two factors: true variance and error variance.

Classical Test Theory formula: X = T + e
Where: X = obtained score, T = true score, e = measurement error
Error in measurement: e = X - T

TWO FACTORS THAT AFFECT TEST SCORES
True variance - factors that contribute to the consistency of scores; also known as stable attributes
Error variance - factors that contribute to the inconsistency of scores; also known as measurement error

METHODS OF ESTIMATING RELIABILITY
1. Reliability coefficient - obtained when two sets of measures are obtained and correlated.
2. Standard error of measurement - expresses reliability in terms of the estimated deviation of a set of obtained scores from the true score.

CORRELATION COEFFICIENT AS A RELIABILITY COEFFICIENT
The correlation coefficient is used to gauge the consistency of psychological test scores. Two broad groups:
1. Temporal stability approaches
2. Internal consistency approaches

TYPES OF RELIABILITY
1. RELIABILITY AS TEMPORAL STABILITY
Stability of test scores over a period of time. Involves administering identical tests twice to the same sample group; scores may be somewhat higher the second time because of practice, maturation, schooling, or other intervening variables.

a. Test-Retest Reliability
- repeating the identical test on a second occasion
- r_tt is the correlation between the two scores obtained by the same person on the two testing occasions
- the higher the reliability, the less susceptible the score is to random daily changes in the condition of the test taker or the testing situation; also called the coefficient of stability (Cronbach)
- the interval between test and retest should rarely exceed six months; the recommended interval is two weeks

b. Alternate-Forms and Parallel-Forms Reliability Estimates
- the same persons are tested with one form on the first occasion and with another, equivalent form on the second
- the reliability coefficient is called the coefficient of equivalence

Parallel forms vs. alternate forms
- Parallel forms: the test means and variances of observed test scores are equal, and in theory each parallel form correlates equally with the true score. Parallel forms reliability is the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
- Alternate forms: simply different versions of a test that cover the same content at the same difficulty level but do not meet the requirements for parallel forms. Alternate forms reliability is an estimate of the extent to which these different forms of the same test have been affected by item sampling error or other error.
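Both temporal-stability estimates come down to correlating two sets of scores from the same people. The following is a minimal Python sketch of that idea, together with the standard error of measurement mentioned above; the arrays occasion_1 and occasion_2 are invented illustration data, not values from the document.

```python
# Minimal sketch: a temporal-stability reliability estimate as a Pearson
# correlation, plus the standard error of measurement derived from it.
# The score arrays are invented illustration data, not taken from the document.
import numpy as np

occasion_1 = np.array([12, 15, 9, 20, 17, 14, 11, 18], dtype=float)  # first administration
occasion_2 = np.array([13, 16, 10, 19, 18, 13, 12, 19], dtype=float)  # retest or alternate form

# Reliability coefficient: correlation between the two sets of scores (r_tt).
r_tt = np.corrcoef(occasion_1, occasion_2)[0, 1]

# Standard error of measurement: estimated spread of obtained scores around
# the true score, SEM = SD * sqrt(1 - r_tt).
sd = occasion_1.std(ddof=1)
sem = sd * np.sqrt(1 - r_tt)

print(f"reliability coefficient r_tt = {r_tt:.2f}")
print(f"standard error of measurement = {sem:.2f}")
```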
2. RELIABILITY AS INTERNAL CONSISTENCY
The reliability of a scale based on the degree of within-scale item correlation. Three approaches: split-half reliability; Kuder-Richardson and coefficient alpha; inter-scorer/inter-rater reliability.

a. Split-Half Reliability
- requires only one test session; the test is divided into equivalent halves
- provides a measure of consistency with regard to content sampling; a coefficient of internal consistency
- since the Pearson r is only the correlation between the two halves of the test, the half-test reliability coefficient must be adjusted to estimate the reliability of the whole test; the Spearman-Brown formula provides this adjustment
- the Spearman-Brown formula for a test split into two halves: r_SB = 2 r_hh / (1 + r_hh), where r_hh is the correlation between the half-test scores
- in using this formula, both halves should be parallel forms: their variances must be the same
- Rulon-Flanagan formula: requires only the variance of the difference between each person's scores on the two half-tests; use Rulon-Flanagan if the SDs of the halves are not the same
- Guttman formula: does not require real equivalency of both halves; derived from the Rulon-Flanagan and Spearman-Brown formulas

b. Kuder-Richardson and Coefficient Alpha
- inter-item consistency is influenced by two sources of error variance: (1) content sampling and (2) heterogeneity of the behavior domain sampled; the higher the homogeneity of the domain, the higher the inter-item consistency
- Kuder-Richardson (KR-20): the most common procedure for finding inter-item consistency; based on an examination of performance on each item; used when each item has a right or wrong answer
- Coefficient alpha: used when there are no right or wrong answers
- Horst's modification: allows the computation of inter-item consistency even when the items have different levels of difficulty

c. Inter-Scorer/Inter-Rater Reliability
- used for tests with some level of subjectivity in scoring
- Cohen's kappa (2 raters); Fleiss' kappa (3 or more raters)

What is the accepted reliability range for research? For clinical use?

USING AND INTERPRETING A COEFFICIENT OF RELIABILITY
- Homogeneity: high internal consistency. Heterogeneity: low internal consistency.
- Dynamic characteristic: use internal consistency. Static characteristic: use test-retest or alternate forms.
- Restricted range: lower reliability. Inflated range: higher reliability.
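To make the split-half adjustment and the inter-item coefficients concrete, here is a minimal Python sketch. The item matrix `items` is invented illustration data (rows = examinees, columns = items scored 1 = right, 0 = wrong); it computes an odd-even split-half correlation, applies the Spearman-Brown adjustment, and computes coefficient alpha, which coincides with KR-20 for right/wrong items.

```python
# Minimal sketch of split-half and inter-item consistency estimates.
# The item matrix is invented illustration data (rows = examinees, columns = items).
import numpy as np

items = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
], dtype=float)

# Split-half: correlate odd-item totals with even-item totals, then adjust
# the half-test correlation to full length with the Spearman-Brown formula.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_spearman_brown = 2 * r_half / (1 + r_half)

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total scores).
# With dichotomous (right/wrong) items and a consistent variance convention,
# this is the KR-20 case described above.
k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_variance = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"split-half r = {r_half:.2f}, Spearman-Brown adjusted = {r_spearman_brown:.2f}")
print(f"coefficient alpha (KR-20 for right/wrong items) = {alpha:.2f}")
```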