Psychology Module 5: Reliability

Reliability

Reliability refers to consistency in measurement, and is a crucial aspect of psychological assessment.

Concept of Reliability

In the language of psychometrics, a reliability coefficient is an index of reliability: a proportion that indicates the ratio between the true score variance on a test and the total variance.
A score on an ability test is presumed to reflect not only the test-taker's true score on the ability being measured but also error.
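The ratio described above can be sketched in a few lines. This is a minimal illustration, assuming total variance is simply the sum of true variance and error variance; the function name and the example numbers are hypothetical and do not come from any particular test.

```python
def reliability_from_variances(true_variance: float, error_variance: float) -> float:
    """Return the proportion of total variance that is true variance."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical numbers: 80 units of true variance, 20 units of error variance.
reliability = reliability_from_variances(80.0, 20.0)
print(reliability)
```

The larger the share of error variance, the closer the ratio moves toward 0; with no error variance at all it would equal 1.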

Sources of Error Variance

Test construction: item sampling or content sampling can be a source of error variance.
Test administration: factors such as room temperature, lighting, and ventilation can influence the test-taker's attention or motivation.
Examiner-related variables: the examiner's physical appearance, demeanor, and nonverbal gestures can be sources of error variance.
Test scoring and interpretation: scorer differences can be a source of error variance, especially in subjective scoring.

Reliability Estimates

Test-Retest Reliability Estimates: reliability is estimated by using the same instrument to measure the same thing in the same people at two points in time and correlating the two sets of scores.
Parallel-Forms and Alternate-Forms Reliability Estimates: the degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability.
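A test-retest estimate (or an alternate-forms estimate) is commonly computed as a Pearson correlation between two sets of scores from the same people. The sketch below assumes that approach; the helper name `pearson_r` and the score lists are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five people tested on two occasions.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]
test_retest = pearson_r(time1, time2)
print(round(test_retest, 3))
```

For an alternate-forms coefficient, the second list would hold scores on the other form rather than on a second administration of the same form.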

Key Concepts

True variance refers to the portion of the total variance attributable to true differences among test-takers.
Error variance refers to the portion of the total variance attributable to irrelevant, random sources.
A systematic source of error does not affect score consistency, because it influences all scores in the same way.
Variance from true differences is true variance, and variance from irrelevant, random sources is error variance.

Reliability Coefficients

Item sampling is an additional source of error variance in computing an alternate- or parallel-forms reliability coefficient.
Developing alternate forms of tests can be time-consuming and expensive, but it minimizes the effect of memory for the content of a previously administered form of the test.

Internal Consistency Estimates of Reliability

An internal consistency estimate of reliability evaluates the consistency of the items within a single administration of a single test.
Methods of obtaining internal consistency estimates of reliability include split-half estimates, the Kuder-Richardson formulas, and coefficient alpha.

Split-Half Reliability Estimates

Split-half reliability entails dividing the test into equivalent halves, correlating scores on the two halves, and adjusting using the Spearman-Brown formula.
Ways to split a test include randomly assigning items to one or the other half, assigning odd-numbered items to one half and even-numbered items to the other half, or dividing the test by content.
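The odd-even split described above can be sketched as follows. The response matrix is invented for illustration (rows are test-takers, columns are items scored 1 or 0), and `pearson_r` is a hypothetical helper. The result is the unadjusted correlation between the two half-tests, before any Spearman-Brown adjustment.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_half_scores(responses: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Sum odd-numbered items (1, 3, 5, ...) and even-numbered items per person."""
    odd = [sum(person[0::2]) for person in responses]
    even = [sum(person[1::2]) for person in responses]
    return odd, even


# Hypothetical data: 5 test-takers, 6 dichotomous items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
odd_scores, even_scores = odd_even_half_scores(responses)
half_test_r = pearson_r(odd_scores, even_scores)
print(odd_scores, even_scores, round(half_test_r, 3))
```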

The Spearman-Brown Formula

The Spearman-Brown formula allows a test developer to estimate internal consistency reliability from a correlation of two halves of a test.
The formula is used to estimate the reliability of a whole test from the reliability of one half of a test.
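As a rough sketch, the split-half case of the Spearman-Brown formula is r_SB = 2r / (1 + r), where r is the correlation between the two halves. The function below also accepts a general length factor n, using the widely cited form n·r / (1 + (n − 1)·r); the function and parameter names are assumptions made for this example.

```python
def spearman_brown(r: float, length_factor: float = 2.0) -> float:
    """Estimate reliability after changing test length by `length_factor`.

    With the default factor of 2, this steps a half-test correlation up
    to an estimate for the whole test.
    """
    denominator = 1.0 + (length_factor - 1.0) * r
    if denominator == 0:
        raise ValueError("formula is undefined for this combination of inputs")
    return length_factor * r / denominator


# Hypothetical half-test correlation of 0.60.
whole_test_estimate = spearman_brown(0.60)
print(whole_test_estimate)
```

Because the whole test is twice as long as either half, the adjusted estimate is higher than the half-test correlation whenever that correlation lies between 0 and 1.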

Other Methods of Estimating Internal Consistency

Inter-item consistency refers to the degree of correlation among all the items on a scale.
Tests are said to be homogeneous if they contain items that measure a single trait, and heterogeneous if they measure more than one trait.

The Kuder-Richardson Formulas

The Kuder-Richardson formulas were developed by G. Frederic Kuder and M.W. Richardson to estimate reliability.
KR-20 is widely used for estimating inter-item consistency of dichotomous items, and KR-21 is an approximation of KR-20.
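KR-20 is usually written as r = (k / (k − 1)) · (1 − Σpq / σ²), where k is the number of items, p is the proportion passing an item, q = 1 − p, and σ² is the variance of total scores. The sketch below assumes that form and uses the population variance so that it matches the p·q item variances; the data are invented for illustration.

```python
from statistics import pvariance
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR-20 for a matrix of 0/1 item scores (rows = people, columns = items)."""
    n_people = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    sum_pq = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n_people  # proportion passing
        sum_pq += p * (1.0 - p)
    total_variance = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical data: 5 test-takers, 6 dichotomous items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
kr20_estimate = kr20(responses)
print(round(kr20_estimate, 3))
```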

Coefficient Alpha

Coefficient alpha is the mean of all possible split-half correlations, corrected by the Spearman-Brown formula.
It is appropriate for use on tests containing nondichotomous items and is widely used as a measure of reliability.
Coefficient alpha is the preferred statistic for obtaining an estimate of internal consistency reliability.
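In practice, coefficient alpha is typically computed from item variances rather than by averaging split-half correlations: α = (k / (k − 1)) · (1 − Σσᵢ² / σ²), where σᵢ² is the variance of item i and σ² is the variance of total scores. The sketch below assumes that computational form; the ratings are hypothetical 1-to-5 responses.

```python
from statistics import pvariance
from typing import Sequence


def coefficient_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a score matrix (rows = people, columns = items)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([person[i] for person in scores]) for i in range(k)]
    total_variance = pvariance([sum(person) for person in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical data: 4 respondents, 3 items rated on a 1-5 scale.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = coefficient_alpha(ratings)
print(round(alpha, 3))
```

The same kind of variance (population or sample) should be used for the items and the totals; the ratio, and therefore alpha, is the same either way as long as the choice is consistent.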

Measures of Inter-Scorer Reliability

Inter-scorer reliability is the degree of agreement or consistency between two or more scorers with regard to a particular measure.
One way to measure inter-scorer reliability is to calculate a coefficient of correlation between the scorers' ratings, referred to as a coefficient of inter-scorer reliability.

Using and Interpreting a Coefficient of Reliability

There are three approaches to the estimation of reliability: test-retest, alternate or parallel forms, and internal consistency estimates.
A reliability index published in a test manual may not be applicable to a new group of test-takers, and measures of reliability are themselves subject to error.

Purpose of the Reliability Coefficient

How high a reliability coefficient should be depends on the purpose of the test and the importance of the decisions made based on its scores.
A test designed for multiple administrations over time should demonstrate reliability across time, and an estimate of test-retest reliability is desirable.

Nature of the Test

The nature of the test influences the choice of reliability coefficient.
Considerations include:
- Homogeneity or heterogeneity of test items
- Dynamic or static characteristics being measured
- Restricted or inflated range of test scores
- Speed or power test
- Criterion-referenced test

Homogeneity vs. Heterogeneity

Homogeneous test items: functionally uniform throughout, measuring one factor or ability, expected to have high internal consistency.
Heterogeneous test items: not uniformly measuring one factor or ability, internal consistency may be low.

Dynamic vs. Static Characteristics

Dynamic characteristics: ever-changing traits, states, or abilities; reliability is best estimated by internal consistency.
Static characteristics: relatively unchanging traits, states, or abilities; reliability is best estimated by test-retest or alternate-forms methods.

Restriction or Inflation of Range

Restricted range of test scores: correlation coefficient tends to be lower.
Inflated range of test scores: correlation coefficient tends to be higher.
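The effect of range restriction can be seen on a small, fully made-up data set. In the sketch below, `y` tracks `x` with a fixed alternating error of plus or minus 2, and the correlation is computed once over the full range of `x` and once over a narrow middle band. The data and the helper `pearson_r` are assumptions for illustration only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Full range: x runs from 1 to 20; y equals x plus an alternating error of +2 / -2.
x = list(range(1, 21))
y = [value + (2 if value % 2 == 1 else -2) for value in x]
r_full = pearson_r(x, y)

# Restricted range: keep only the cases with x between 8 and 13.
restricted_pairs = [(a, b) for a, b in zip(x, y) if 8 <= a <= 13]
x_restricted = [a for a, _ in restricted_pairs]
y_restricted = [b for _, b in restricted_pairs]
r_restricted = pearson_r(x_restricted, y_restricted)

print(round(r_full, 3), round(r_restricted, 3))
```

The error is the same size in both cases, but in the restricted sample it is large relative to the reduced spread of scores, so the coefficient drops.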

Speed vs. Power Tests

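Speed tests: items of uniformly low difficulty, with a time limit set so that few if any test-takers can complete all items; reliability should be estimated from two independent testing periods using test-retest, alternate-forms, or split-half reliability from two separately timed half tests.
Power tests: a time limit long enough for test-takers to attempt all items, but with some items so difficult that no one is expected to obtain a perfect score; reliability estimated using test-retest, alternate-forms, or split-half reliability.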

Criterion-Referenced Tests

Designed to measure mastery of specific skills or knowledge.
Scores interpreted in pass-fail terms, used for diagnostic and remedial purposes.

Alternatives to True Score Model

Domain sampling theory: estimates the extent to which specific sources of variation contribute to the test score.
Generalizability theory: examines how generalizable scores are across different testing situations.
Item response theory: models the probability that a person with a given level of ability will perform at a given level on a test item.

Reliability and Individual Scores

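Standard error of measurement (SEM): estimates the precision of an observed test score; it has an inverse relationship with reliability, so the higher the reliability, the lower the SEM.
The SEM is used to estimate the range in which the true score is likely to fall, by establishing a confidence interval around the observed score.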
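A common way to compute the SEM is SEM = σ · √(1 − r), where σ is the standard deviation of test scores and r is the reliability coefficient. The sketch below assumes that formula and a normal-curve multiplier of 1.96 for a 95% confidence interval; the function names and the example values (σ = 15, r = .91, observed score = 100) are hypothetical.

```python
from math import sqrt
from typing import Tuple


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM from the score standard deviation and a reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def confidence_interval(observed: float, sem: float, z: float = 1.96) -> Tuple[float, float]:
    """Band of z SEMs on either side of the observed score."""
    return observed - z * sem, observed + z * sem


sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
lower, upper = confidence_interval(observed=100.0, sem=sem)
print(round(sem, 2), round(lower, 2), round(upper, 2))
```

Raising the reliability in this example shrinks the SEM and narrows the interval, which is the inverse relationship noted above.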

Standard Error of the Difference between Two Scores

Used to compare scores between tests, individuals, or both.
Essential to convert scores to the same scale when comparing across tests.
Formula for standard error of the difference between two scores: √(SEM1^2 + SEM2^2).
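The formula above translates directly into code. The sketch also includes the equivalent form σ · √(2 − r₁ − r₂), which applies only under the assumption that both tests are on the same scale with the same standard deviation σ; the function names and example values are hypothetical.

```python
from math import sqrt


def se_difference(sem1: float, sem2: float) -> float:
    """Standard error of the difference from the two standard errors of measurement."""
    return sqrt(sem1 ** 2 + sem2 ** 2)


def se_difference_from_reliabilities(sd: float, r1: float, r2: float) -> float:
    """Equivalent form when both tests share the same standard deviation."""
    return sd * sqrt(2.0 - r1 - r2)


# Hypothetical tests on the same scale (sd = 15) with reliabilities .91 and .84,
# which correspond to SEMs of 4.5 and 6.0.
sed_from_sems = se_difference(4.5, 6.0)
sed_from_rs = se_difference_from_reliabilities(15.0, 0.91, 0.84)
print(round(sed_from_sems, 2), round(sed_from_rs, 2))
```

Because two sources of measurement error are combined, the standard error of the difference is larger than either SEM on its own.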
