Understanding Reliability and True Scores

Questions and Answers

What does the observed test score equation X = T + E represent?

  • Observed score equals the true score plus the error score. (correct)
  • Observed score equals the true score multiplied by the error score.
  • Observed score equals the true score divided by the error score.
  • Observed score equals the true score minus the error score.

Which of the following statements is true regarding the reliability coefficient 'r'?

  • The value of 'r' can be greater than 1.
  • An 'r' of 0 indicates perfect reliability.
  • An 'r' of 1 indicates perfect reliability. (correct)
  • An 'r' of 1 indicates no reliability.

What is the primary challenge associated with test-retest reliability?

  • The length of the test.
  • The cost of administering the test.
  • Changes in test-taker, environment, and testing conditions between administrations. (correct)
  • The complexity of the scoring process.

What does a high test-retest reliability coefficient suggest about a construct being measured?

  • The construct is relatively stable over time. (correct)

Which type of reliability assessment involves administering two equivalent forms of the same measure to the same group?

  • Alternate-form reliability (correct)

What is a key challenge associated with alternate-form reliability?

  • It is expensive and time-consuming to develop two truly equivalent forms. (correct)

In split-half reliability, what statistical method is used to estimate the reliability of the full test?

  • Spearman-Brown formula (correct)

Why is the reliability coefficient in split-half reliability often considered an underestimation?

  • Because it is based on only half of the items. (correct)

What does inter-item consistency primarily assess?

  • The degree to which different items on a test measure the same construct. (correct)

Which formula is used to compute inter-item consistency for measures with dichotomous responses (scored as 0 or 1)?

  • Kuder-Richardson 20 (KR 20) formula (correct)

Which of the following is a significant challenge to inter-item consistency?

  • Establishing other types of reliability is sometimes neglected as a result. (correct)

When evaluating inter-rater reliability, what does a high inter-scorer reliability coefficient indicate?

  • High consistency of ratings among different raters. (correct)

Which of the following is a key consideration when interpreting the magnitude of a reliability coefficient?

  • How the measure is being used. (correct)

What is the primary purpose of cross-validation in criterion-prediction procedures?

  • To assess the stability of validity coefficients and correct for overestimation. (correct)

According to the provided text, what is the BEST description of a psychological construct?

  • A theoretical concept that serves as a label for a set of behaviors that appear to go together. (correct)

Which of the following BEST describes content validity?

  • Whether the content of the measure covers a representative sample of the behavior domain to be measured. (correct)

What does it mean for a test to have discriminant validity?

  • It correlates minimally with measures of unrelated constructs. (correct)

What is the purpose of concurrent validity?

  • To measure how accurately a test diagnoses current behaviors or characteristics. (correct)

What does 'shrinkage' in the context of validity refer to?

  • The decrease in validity after cross-validation. (correct)

In the context of validity, what is the coefficient of determination ($r^2$) primarily used to assess?

  • The amount of variance shared by two variables (correct)

Flashcards

Reliability Definition

The consistency of a measure.

Observed Test Score Equation

Observed score equals true score plus error score.

Reliability (R)

Ratio of true score variance to observed score variance.

Reliability Coefficient

A number ranging between 0 and 1 indicating reliability.

Test-Retest Reliability

Administering the same test twice to the same test-takers.

Alternate-Form Reliability

Administering two equivalent forms of the same measure.

Split-Half Reliability

Splitting a test into two equivalent halves.

Inter-Item Consistency

Consistency of results across items in a test.

Inter-Scorer (Rater) Consistency

Consistency of ratings between different raters.

Intra-Scorer (Rater) Consistency

When a rater scores the same test at different times.

Non-Response Errors

Self-selection and people not completing a test.

Response Bias

Responding systematically, creating a skewed picture.

Extremity Bias

When someone responds very positively or negatively.

Centrality Bias

Constantly choosing neutral response options.

Acquiescence Bias

Agreement with all statements.

Halo Effect

Raters are influenced by favorable or unfavorable attributes of what they rate or assess.

Social Desirability Bias

Respond in a way that is socially desirable.

Purposive Falsification

Providing factually incorrect responses on purpose.

Unconscious Misrepresentation

Giving incorrect answers unintentionally.

Validity

Refers to whether a measurement tool measures what it is supposed to measure.

Study Notes

Defining Reliability

  • Reliability refers to the consistency of a measure
  • Reliability assesses if a test measures the same attribute consistently each time
  • Reliability refers to whether results are similar when a test is administered repeatedly

True Score Concept

  • A single measure doesn't capture the true trait amount an individual possesses
  • Scores are affected by systematic or chance factors like emotional state, fatigue, and noise
  • A true score refers to a theoretical concept
  • A person's true score is generally unknown

Observed Test Score Equation

  • Observed test score = true score + error score
  • X = T + E, where X is the observed score, T is the true score, and E is the error score
  • T represents the proportion of true score, indicating the reliability of the measure
  • E represents the proportion of error score, also known as unexplained variance

Variance in Test Scores

  • Variance of observed test scores (X) is expressed in terms of true (T) and error (E) variance:
  • Sx² = St² + Se²
  • Reliability (R) is the ratio of true score variance to observed score variance
  • The formula is: R = St² / Sx² = (Sx² - Se²) / Sx²
  • Reliability can thus be expressed as True Score Variance / Observed Score Variance
  • Variance is the average of squared deviations from the mean and is a squared measure itself

Numerical Example

  • If true score variance = 17 and error score variance = 3
  • Observed score variance = 20 (17 + 3)
  • The reliability of the measure = .85, calculated as 17/20 or (20 - 3)/20
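
A minimal Python sketch of this calculation, using the values from the bullets above:

```python
# Reliability as the ratio of true-score variance to observed-score variance.
true_var = 17.0    # hypothetical true-score variance (St^2)
error_var = 3.0    # hypothetical error-score variance (Se^2)

observed_var = true_var + error_var      # Sx^2 = St^2 + Se^2  -> 20.0
reliability = true_var / observed_var    # R = St^2 / Sx^2     -> 0.85
print(observed_var, reliability)
```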

Types of Reliability

  • Reliability of a test is indicated by the reliability coefficient, denoted by "r"
  • The reliability coefficient ranges between 0 and 1
  • r = 0 indicates no reliability
  • r = 1 indicates perfect reliability
  • Finding a test with perfect reliability is very rare
  • The higher the reliability coefficient, the more consistent the test scores

Test-Retest Reliability

  • Test entails administering the same test twice to the same test-takers
  • Intervals for retesting are usually around a month
  • Some constructs are more stable than others impacting reliability coefficients
  • Expect a higher test-retest reliability coefficient on a reading test than on an anxiety test

Data Analysis for Test-Retest Reliability

  • Correlating scores from the first and second administration of the test
  • This results in reliability coefficient, also called coefficient of stability
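
A minimal sketch of this correlation step, using hypothetical scores and plain numpy:

```python
import numpy as np

# Hypothetical scores for the same five test-takers on two administrations.
time1 = np.array([12, 18, 15, 20, 10])
time2 = np.array([13, 17, 16, 19, 11])

# The test-retest reliability (coefficient of stability) is the Pearson
# correlation between the two sets of scores.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 2))
```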

Challenges to Test-Retest Reliability

  • Test-taker, environment, and testing conditions can change
  • Transfer effects like practice and memory can influence scores
  • These transfer effects contribute systematic error variance to test-retest reliability estimates

Alternate-Form Reliability

  • Alternate-form reliability uses two equivalent forms of the same measure
  • These equivalent forms are administered to the same group twice

Data Analysis for Alternate-Form Reliability

  • Correlate the two sets of scores
  • Results in a reliability coefficient, also known as the coefficient of equivalence

Challenges to Alternate-Form Reliability

  • Creating two equivalent test forms can be time-consuming and expensive

Split-Half Reliability

  • Administer the test once
  • Split the test into two equivalent halves (e.g., odd/even numbered questions)

Split-Half Data Analysis

  • Correlate the scores on the two halves, then apply the Spearman-Brown formula
  • The corrected reliability coefficient (rtt) estimates the reliability of the full-length test (see the formula below)
  • This results in a reliability coefficient, also called the coefficient of internal consistency
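
The Spearman-Brown correction referred to above is the standard formula: with $r_{hh}$ the correlation between the two half-test scores, the estimated full-test reliability is

$$r_{tt} = \frac{2\,r_{hh}}{1 + r_{hh}}$$

For example, a half-test correlation of .70 corrects to $2(.70)/(1 + .70) \approx .82$.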

Challenges to Split-Half Reliability

  • It can often underestimate the reliability coefficient, as each half-test score is based on only half of the items

Inter-Item Consistency

  • Requires administering a test only once to a group of test-takers

Data Analysis for Inter-Item Consistency

  • Compute the coefficient of internal consistency
  • Kuder-Richardson 20 (KR 20) formula is applied for dichotomous questions
  • Coefficient Alpha (α) used for multiple response categories (non-dichotomous)

Variables in KR20

  • rtt = reliability coefficient
  • n = number of items in the test
  • St² = variance of the total test score
  • pᵢ = proportion of test-takers who answered item i correctly
  • qᵢ = proportion of test-takers who answered item i incorrectly
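
Written in terms of the variables listed above, the standard KR-20 formula is

$$r_{tt} = \frac{n}{n-1}\left(1 - \frac{\sum_{i} p_i q_i}{S_t^2}\right)$$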

Variables in Coefficient Alpha

  • α = reliability coefficient
  • n = number of items in the test
  • St² = variance of the total test score
  • ΣSi² = sum of the individual item variances
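
Written in terms of the variables listed above, the standard coefficient alpha formula is

$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i} S_i^2}{S_t^2}\right)$$

A minimal numpy sketch of this calculation, using hypothetical item scores:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a respondents-by-items matrix of item scores."""
    n = items.shape[1]                              # number of items
    sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the total test score
    return (n / (n - 1)) * (1 - sum_item_var / total_var)

# Hypothetical data: 5 respondents answering 4 Likert-type items.
scores = np.array([[4, 5, 4, 4],
                   [2, 3, 2, 3],
                   [5, 5, 4, 5],
                   [3, 3, 3, 2],
                   [1, 2, 2, 1]])
print(round(cronbach_alpha(scores), 2))
```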

Challenges to Inter-Item Consistency

  • It is popular due to only one administration being needed for the test
  • Relying mainly on this method may result in neglecting other types of reliability

Inter-Scorer (Rater) Consistency

  • Involves administering a test and having all test protocols scored/marked by two psychological-assessment practitioners

Data Analysis for Inter-Scorer Reliability

  • Scores from both practitioners are correlated
  • An inter-scorer reliability coefficient reflects the consistency of ratings among the raters

Formula variables for Inter Rater Consistency

  • α = inter-rater reliability coefficient
  • n = number of items or rating dimensions in the test
  • St² = variance of all the raters' summative ratings (total scores)
  • ΣSi² = sum of the raters' variances across the different rating dimensions (sub-scores)

Challenges for Inter-Scorer Reliability

  • The application is limited as it is mainly useful when scoring procedures are not highly standardized or when questions are open-ended

Intra-Scorer (Rater) Consistency

  • Administer test and have test protocols scored/marked twice by one psychological assessment practitioner

Data Analysis for Intra-Scorer Reliability

  • Correlate the two sets of scores for the same protocols
  • An intra-scorer reliability coefficient reflects the consistency of ratings for a single rater

Variables in Formula for Intra-Rater Reliability

  • α = intra-rater reliability coefficient
  • n = number of items or rating dimensions in the test
  • St² = variance of the rater's scores on different individuals' summative ratings (total scores)
  • ΣSi² = sum of the rater's variances on each rating dimension (sub-scores) for the different individuals being assessed

Challenges of Intra-Scorer Reliability

  • It is time-consuming for the rater to score/mark the same protocol twice
  • Errors may occur if the rater remembers the protocol and how it was scored before

Contemporary Approaches to Reliability

  • Cronbach expressed reservations about traditional reliability methods
  • Cronbach's alpha makes assumptions that are rarely true in applied settings
  • Tests are unlikely to measure only one construct
  • Emerging consensus: alternative reliability estimates are needed, e.g., based on random-effects models or confirmatory factor analysis (CFA)
  • Omega is a common alternative, easily calculated by statistical software
  • Reliability measures are changing rapidly

Factors Affecting Reliability

  • Random and systematic error can affect reliability
  • Systematic error arises from respondent/test-taker and administrative factors

Respondent/Test-Taker Error

  • Non-response/self-selection bias occurs when respondents do not complete tests
  • Response bias occurs when respondents respond systematically
  • The timing of a measure impacts its reliability
  • Variability in individual scores (e.g., range restriction) affects the reliability coefficient
  • Ability level variability of test-takers can affect the reliability coefficient

Respondent Error: Response Bias Types

  • Extremity Bias: Test-taker responds very positively or negatively
  • Centrality Bias: Test-taker constantly opts for neutral response options
  • Stringency/Leniency Bias: Raters are very strict or lenient
  • Acquiescence Bias: Test-taker agrees with all statements/questions
  • Halo Effect: Respondents influenced by favorable/unfavorable attributes of what they rate
  • Social Desirability Bias: Test-taker wants to create a favorable impression
  • Purposive Falsification: Test-takers purposefully misrepresent facts
  • Unconscious Misrepresentation: Test-takers give incorrect answers unintentionally

Administrative Error

  • Instructions must be consistent and standardized; variations in instructions introduce error
  • Test manuals and instruction booklets are important
  • Variations in assessment conditions lead to deviations from 'standard conditions'
  • Instructions need to be understood similarly by all test-takers
  • Scoring/rating variations can result in outcomes that vary

Reliability Interpretation

  • Reliability is affected by many factors; no assessment is fully reliable
  • Reliability estimates vary from sample to sample, so estimates obtained for the specific sample are more useful

Magnitude of Reliability Coefficient

  • Interpretation depends on measure use
  • Standardized measures used for individual decision-making need a reliability coefficient of between 0.80 and 0.85

Standard Measurement Error

  • The standard measurement error (SME) works like a standard deviation and is used to interpret test scores within reasonable limits
  • It can be computed from the reliability coefficient (see the formula below)
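
The standard formula for this quantity (often abbreviated SEM), with $S_x$ the standard deviation of the observed scores and $r_{xx}$ the reliability coefficient, is

$$SEM = S_x\sqrt{1 - r_{xx}}$$

For example, with $S_x = 15$ and $r_{xx} = .91$, the SEM is $15\sqrt{.09} = 4.5$.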

Mastery Assessment

  • Mastery tests differentiate those who have mastered skills from those who have not
  • Usual correlation procedures are inappropriate and different techniques should be used

Finding Information on Test Score Reliability

  • Information on test score reliability can be found in published and unpublished sources
  • Can look in: databases, test manuals, academic journals, master’s dissertations and PhD theses

Defining Validity

  • Validity refers to the extent to which inferences can be made from test scores
  • Does the test accomplish the purpose for which it was designed?
  • Validity determines if the test measures what it intends to measure
  • Validity pertains to the accuracy of measurement and is specific to purpose and sample

Measurement Process Requirements

  • Understand the entity being tested and the exact purpose of the test
  • Know the exact nature of the measure and how well it can measure
  • Understand the rules for measuring the object

Three types of validity evidence

    1. Content-description (internal focus)
  • Face and content validity
    2. Construct-identification (internal focus)
  • Construct, factorial, convergent/discriminant validity
    3. Criterion-prediction (external focus)
  • Concurrent, predictive, known-groups, incremental, generalization and algorithm validity

Content-Description: Face Validity

  • Face validity concerns whether the test looks like it measures the construct
  • It is non-psychometric and non-statistical
  • Prospective test-takers/experts judge whether the test's 'look and feel' is appropriate, e.g., the words and pictures used

Content Validity

  • Assesses whether the content covers a representative sample of the behavior domain

Psychological Construct

  • A theoretical concept that serves as a label for a set of behaviors that appear to go together in nature
  • Examples can include intelligence, personality, memory, etc.

Construct Validity

  • Assesses whether the test adequately measures the theoretical construct, trait, or concept it is intended to measure

How to measure Construct Validity

  • Use factor analysis (factorial validity)
  • Can include exploratory factor analyses & confirmatory factor analyses

Construct Validity Evaluation

  • Achieved when measure correlates strongly with relevant or similar variables (convergent validity)
  • Achieved when measure correlates minimally with differing or unrelated variables or traits (discriminant validity)

Construct Validity: Correlation with Other Tests

  • Test correlates well with similar and established tests

Construct Validity

  • Measure explains additional variance in predicting outcomes (incremental validity)
  • Measure distinguishes between different groups (differential validity)
  • Calculate the correlation between the test (predictor) and job performance (criterion)

Criterion Validity: Types

  • Concurrent Validity: Accurately diagnoses present behaviors or characteristics
  • Predictive Validity: Predicts future outcomes or behaviors
  • Known-groups validity: determine whether test scores discriminate between groups that are known, in theory, to differ
  • Incremental validity: identify how much additional variance the test explains in predicting a criterion

Possible Criterion Measures

  • Assess the ability of a test to predict performance on a benchmark variable, e.g., job performance

Common Criterion Measures

  • Academic achievement: used to validate intelligence and aptitude measures
  • Performance in specialized training: used to validate aptitude measures
  • Job performance: key measure for validating intelligence and aptitude measures
  • Psychiatric diagnoses: used to validate personality measures

Other Common Criterion Measures

  • Ratings from teachers, supervisors, etc.
  • Must train the raters
  • Other valid tests can also serve as criterion measures

Validity Generalization

  • Establishing whether a measure's validity generalizes beyond the specific/localized contexts in which it was demonstrated
  • Requires statistical integration and analysis of previous studies' data through meta-analysis

Validity procedures using Cross-Validation

  • No measure can be assumed to operate perfectly on the basis of a first validation study
  • A refined version of the measure should be administered to a separate sample

Cross Validation Considerations

  • A refined version is developed after performing an item analysis in this process
  • The process requires the validity coefficients to be recalculated for the second sample
  • Cross-validation typically leads to a decreased coefficient (shrinkage)
  • It identifies and corrects for spuriously high validity coefficients

Algorithm validity

  • Predictive models (algorithms) describe the relationship between scores and outcomes
  • Algorithms are increasingly used in candidate assessment, e.g., AI and machine learning (ML)
  • AI and ML must still abide by the same validity standards, as legally required within industry in South Africa (SA)

Unitary Validity

  • A body of evidence must be developed that evolves through multiple validation studies on diverse groups
  • The validation process never rests on a single study

Indices and Interpretation of Validity

  • Requires calculating the validity coefficient (and considering the factors affecting it), the coefficient of determination, the standard error of estimation, and the prediction of the criterion

Magnitude of Validity: Interpretation

  • Interpretation depends on the use of the measure; the coefficient should be statistically significant at the .05 or .01 level
  • For selection purposes, measures should yield validity coefficients of roughly .20–.30 (i.e., around .30)

Factors Affecting Validity

  • Reliability
  • Differential impact of subgroups
  • Sample homogeneity
  • Linear relationship between predictor and criterion.
  • Criterion contamination
  • Moderator variables

Validity - Reliability

  • Validity depends on reliability: an unreliable measure cannot be valid
  • Reliability on its own, however, has limitations and does not imply validity
  • Reliability is therefore a necessary but not a sufficient condition for validity

Validity - Subgroups

  • Validity must be consistent across the subgroups that take a test
  • These subgroups include age, gender, education, etc.
  • Subgroup differences can have a differential impact on the validity coefficient

Linear Relationship - Validity

  • The relationship between the predictor and the criterion must be linear
  • The Pearson product-moment correlation assumes a linear relationship
  • Check linearity first to determine whether this technique can be applied

Understanding the Coefficient of Determination

  • The square of the validity coefficient
  • It shows how much of the variation in the criterion variable is explained by the test score
  • It also represents the proportion of variance shared by the two variables
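
For example, a validity coefficient of $r_{XY} = .50$ gives $r^2 = .25$, i.e., the test score explains 25% of the variance in the criterion.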

Visualizing Common Variance

  • Predictor variable (independent variable): helps explain or predict the outcome of another variable
  • Criterion variable (dependent variable): the variable you are trying to predict or explain

Standard Error of Estimation and How to use it

  • Validity should also be interpreted in terms of how accurately the criterion score can be estimated from the test score
  • The standard error of estimation works like a standard deviation; formula: SEest = Sy√(1 - r²xy)

Variables for Regression

  • Sy = standard deviation of scale Y
  • r²xy = coefficient of determination for scales X and Y
  • Used to express confidence limits (±) around the estimated criterion score
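
A short worked example with hypothetical values: if $S_Y = 10$ and $r_{XY} = .60$ (so $r^2_{XY} = .36$), then

$$SE_{est} = 10\sqrt{1 - .36} = 10\sqrt{.64} = 8$$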

Predicting Criterion - Analysis

  • Requires some positive correlation between the predictor and the criterion
  • Predictions are obtained from the regression line

Regression Analysis and Equations

  • Regression involves predicting a criterion from one or more predictor variables
  • Simple regression: uses one predictor
  • Multiple regression: uses two or more predictors
  • Simple equation: Y = bX + a (illustrated in the sketch below)
  • Multiple equation: Y = b₁X₁ + b₂X₂ + b₃X₃ + b₀
    • Y = predicted criterion score
    • X₁, X₂, X₃ = predictors 1, 2, 3
    • b = weights of the respective predictors
    • b₀ = intercept
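
A minimal numpy sketch of the simple regression case, with hypothetical predictor and criterion scores:

```python
import numpy as np

# Hypothetical data: test scores (predictor X) and job-performance ratings (criterion Y).
x = np.array([55, 60, 65, 70, 75, 80], dtype=float)
y = np.array([50, 58, 62, 69, 74, 79], dtype=float)

# Fit the simple regression equation Y = bX + a by least squares.
b, a = np.polyfit(x, y, deg=1)

# Predict the criterion score for a new test score of 68.
print(round(b * 68 + a, 1))
```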
