Measuring Reliability - Chapter 2, Part 2

Questions and Answers

Which of the following best describes the purpose of Internal Consistency Reliability assessment?

  • To evaluate the stability of measurements across different time points.
  • To check the reliability of a composite score and identify problematic items. (correct)
  • To assess the correlation between two different versions of the same measurement tool.
  • To measure the degree to which different raters or observers give consistent estimates of the same phenomenon.

In the context of Internal Consistency Reliability, what does a high correlation between two items in a questionnaire suggest?

  • They measure completely different constructs.
  • There is a high level of disagreement among respondents.
  • They measure very similar things, potentially indicating redundancy. (correct)
  • One of the items is not reliable.

When evaluating Internal Consistency Reliability using Cronbach's alpha, what is generally considered a good alpha level?

  • Below 0.5
  • At least 0.8 (correct)
  • Between 0.2 and 0.4
  • Exactly 1.0

In the context of assessing reliability, what is the primary use of the Kappa statistic?

  • To measure inter-rater agreement for categorical measures. (correct)

What does a Kappa value of 0 indicate?

  • Agreement is equivalent to chance. (correct)

Intraclass Correlation is most suitable for which purpose?

  • Measuring the agreement between interval or ratio variables. (correct)

Under what circumstances would it be more appropriate to use ICC rather than simple correlation?

  • When the goal is to assess agreement between measurements where absolute agreement is important. (correct)

When conducting a test-retest reliability analysis, why might a researcher choose to measure 'only once' and use only that single measure as the score, according to the material?

  • Because in real-world applications, the measurement is typically taken only once; this approach reflects that scenario. (correct)

How does increasing the number of items on a questionnaire typically affect Cronbach's alpha, assuming the new items correlate well with the existing ones?

  • It can increase Cronbach's alpha up to a point, after which it may level off or decrease. (correct)

If a researcher finds that removing a particular item from a scale increases the overall Cronbach's alpha, what does this suggest about the item?

  • The item is not consistent with the other items in the scale. (correct)

What is the potential consequence of dropping an unreliable item from a questionnaire, as described in the content?

  • It may affect the coverage of the questionnaire and content validity. (correct)

When is it acceptable to have a relatively lower reliability coefficient (RC)?

  • If the tool will be used to identify those with poor knowledge (e.g., in a public health setting) so they can be sent for training, a relatively low RC might be acceptable. (correct)

If the test-retest coefficient is not acceptable and you want to identify the items responsible for the inconsistency, what can you do?

  • Do a test-retest for each item score (item 1 from the test vs item 1 from the retest). (correct)

Why is it important to examine the inter-item correlation matrix when assessing internal consistency?

  • To identify items that are highly correlated with each other, suggesting potential redundancy. (correct)

In the context of a questionnaire, what does 'content validity' refer to?

  • Whether the questionnaire comprehensively covers the concept it aims to measure. (correct)

When calculating inter-rater reliability, Cohen's Kappa is often used. What type of data is most appropriate for this statistic?

  • Categorical data (correct)

What does a 'negative' Kappa value indicate about the agreement between raters?

  • The level of agreement is less than would be expected by chance. (correct)

What is the primary reason for preferring Intraclass Correlation Coefficient (ICC) over Pearson correlation when assessing the reliability of measurements?

  • ICC assesses both the degree of correlation and agreement in absolute values, whereas Pearson correlation only measures the degree of linear relationship. (correct)

In the context of Intraclass Correlation (ICC), what is the key difference between a 'single measure ICC' and an 'average measures ICC'?

  • Single measure ICC assesses the reliability of a single measurement, whereas average measures ICC assesses the reliability of the average of multiple measurements. (correct)

How is the choice between using a 'one-way' versus a 'two-way' random effects model in Intraclass Correlation (ICC) determined?

  • By whether the raters are considered a random sample from a larger population or are the only raters of interest. (correct)

When should a 'two-way mixed' model be used for Intraclass Correlation?

  • When respondents are considered 'random' and the interviewers are fixed. (correct)

Compared to situations with linear measurements using a caliper, why is it often more difficult to achieve high reliability (e.g., ≥ 0.95) with scores derived from questionnaires?

  • Questionnaire scores are 'soft measures' that capture subjective constructs, while caliper measurements capture objective physical dimensions. (correct)

Under what circumstances is it appropriate to report the 95% Confidence Interval (CI) of the Intraclass Correlation Coefficient (ICC)?

  • If the study is a validation study with a reasonable sample size. (correct)

For categorical measures, what is an important consideration?

  • Repeated measures (at least twice) are often necessary when evaluating categorical measures. (correct)

Flashcards

Internal Consistency Reliability Assessment

Assess the reliability of a composite score by identifying problematic items.

Cronbach's Alpha

A measure of internal consistency, indicating how well the items in a set measure a single unidimensional latent construct.

Intraclass Correlation Coefficient (ICC)

A statistical measure indicating how much scores within a group resemble each other.

Kappa Statistic

A statistical measure of inter-rater agreement for categorical data, correcting for agreement occurring by chance.


Content Validity

The degree to which an instrument comprehensively covers the concept it is intended to measure.


Test-Retest Reliability

Administering the same measurement to the same subjects on two occasions and comparing the results.


Inter-rater Reliability

A type of reliability analysis that measures the extent to which data collected by different raters or observers are consistent.


Item-Total Correlation

Examine the correlation between each item and the total score. Removing items with low correlation might increase reliability.


Acceptable Reliability Coefficient

Depends on the type of measurement: for linear measurements (e.g., taken with a caliper), a high RC such as 0.95 is expected because it is easy to achieve; lower values are acceptable for questionnaire ('soft') scores.


Study Notes

  • Measuring Reliability - Chapter 2, Part 2

Contents

  • Internal Consistency Reliability (Cronbach's Alpha)
  • Intraclass Correlation
  • Kappa (agreement measure)

Internal Consistency Reliability

  • Internal consistency reliability checks the reliability of a 'composite score.'
  • It identifies 'items' that may have problems forming the composite score.
  • Conceptually, the items can be thought of as vectors: consistent items share explained variance, while an inconsistent item shares little variance or even correlates negatively, lowering the reliability of the composite score.
  • When taking a questionnaire, as an example QOL, there can be 2 domains or scales: Physical Function (PF) & Mental Health (MH).
  • There could be 4 items for each. For PF: PF1, PF2, PF3, PF4 and for MH: MH1, MH2, MH3, MH4
  • Items should behave consistently, as shown in figures demonstrating correlation.
  • A possible problem with PF4 arises if it asks something like "Can you attend a wedding ceremony at a neighboring house?", because the answer depends on more than physical function.
  • If person A is stronger than person B, but person B is happier, person A could get higher scores in PF1 to 3, while person B gets a higher score in PF4.
  • Those who score high in PF1 may not necessarily score high in PF4.
  • PF1 to 3 might be correlated, but may not be with PF4.
  • The 'corrected' item-total correlation correlates each item with the total of the remaining items, so the item itself does not inflate the value (see the sketch after this list).
  • Internal Consistency Reliability Coefficient
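
The notes themselves work through SPSS menus, but a short Python sketch may clarify what the corrected item-total correlation is: each item is correlated with the sum of the remaining items, so the item itself does not inflate the value. The column names pf1–pf4 and the toy data below are illustrative only, not the PF-MH.sav data.

```python
import pandas as pd

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlate each item with the sum of the *other* items."""
    out = {}
    for col in items.columns:
        rest_total = items.drop(columns=col).sum(axis=1)  # total without this item
        out[col] = items[col].corr(rest_total)
    return pd.Series(out, name="corrected item-total r")

# Illustrative toy data: pf4 behaves differently from pf1-pf3.
toy = pd.DataFrame({
    "pf1": [1, 2, 3, 4, 5, 5],
    "pf2": [1, 2, 3, 4, 4, 5],
    "pf3": [2, 2, 3, 5, 5, 5],
    "pf4": [5, 4, 2, 3, 1, 2],
})
print(corrected_item_total(toy))  # pf4 shows a low (here negative) value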

Internal Consistency RA (Procedure)

  • Use PF-MH.sav: Analyze >>> Scale >>> Reliability Analysis.
  • Move all PF items into the 'Items' box; you can list the item labels, with Model set to 'Alpha'.
  • Click Statistics to get descriptives for Item and Scale, and Inter-Item Correlations and Covariances.
  • Example output: Cronbach's Alpha = .869, Cronbach's Alpha Based on Standardized Items = .869, N of Items = 10 (a code sketch of this calculation appears after this list).
  • The item-total statistics show Scale Mean if Item Deleted, Scale Variance if Item Deleted, Corrected Item-Total Correlation, Squared Multiple Correlation, and Cronbach's Alpha if Item Deleted.
  • Good alpha is >= 0.8.
  • Shorter questionnaires get better cooperation from respondents, so this is an opportunity to reduce the number of items.
  • High correlation between 2 items means they measure similar things, suggesting one item could be dropped if desired.
  • Check the 'inter-item correlation matrix' table to find highly correlated items.
  • If r is the highest between pf07 & pf08 (0.71), consider dropping one.
  • Check Mental Health scale reliability
  • If the alpha is not favorable, it could be due to mh5; if the alpha becomes 0.719 when mh5 is removed, drop the item.
  • Analysis identifies problem items.
  • For the PF scale, consider dropping some items to improve alpha, as long as doing so does not affect content coverage.
  • You cannot drop items based on only the statistics; statistics only indicate the problem.
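
As a hedged illustration of what the SPSS output above reports, here is a minimal Python/pandas sketch of Cronbach's alpha and of 'alpha if item deleted'; the file handling in the usage comment is an assumption, not taken from PF-MH.sav.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def alpha_if_item_deleted(items: pd.DataFrame) -> pd.Series:
    """Alpha recomputed after dropping each item in turn
    (the SPSS column "Cronbach's Alpha if Item Deleted")."""
    return pd.Series({col: cronbach_alpha(items.drop(columns=col))
                      for col in items.columns})

# Hypothetical usage (column prefix assumed):
# pf = pd.read_spss("PF-MH.sav").filter(like="pf")
# print(cronbach_alpha(pf))
# print(alpha_if_item_deleted(pf))
```

An item whose removal raises the alpha, or whose corrected item-total correlation is low, is flagged as a problem item, exactly as described in the bullets above.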

Internal Consistency RA - Scenarios

  • Scenario 1: Removing mh5 affects the content validity of the questionnaire; mh5 cannot be dropped.
  • The researcher must improve mh5 by addressing sources of error (ambiguous words, double-barreled phrases, jargon, technical terms, etc.).
  • Scenario 2: Removing mh5 does not affect the overall coverage of the questionnaire.
  • Researchers are happy with remaining items; mh5 can be dropped.

Intra-class Correlation (Procedure)

  • The ICC can be reported for 'single measures' or 'average measures'.
  • Here we repeat the measurement twice to conduct the test-retest analysis.
  • In real studies, we measure 'only once' and that 'single measure' is used as the score.
  • If the score used in practice is a single measurement, use the 'single measures' ICC.
  • If we measure twice and take the average of the two, the relevant reliability is the 'average measures' ICC.
  • In the Statistics dialog, tick 'Intraclass correlation coefficient'; Model: One-Way Random, Confidence interval: 95% (a code sketch of this one-way, single-measures ICC follows this list).
  • Use inter-intv test.sav: Analyze >>> Scale >>> Reliability Analysis.
  • Move both scores and click Statistics; here a Two-Way Mixed analysis is used.
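
Since the notes use SPSS menus, here is a hedged Python/numpy sketch of the one-way random, single-measures ICC, i.e. ICC(1,1), that the dialog above produces; the toy test-retest numbers are made up.

```python
import numpy as np

def icc_oneway_single(scores) -> float:
    """ICC(1,1): one-way random effects, single measures.

    scores: 2-D array-like, one row per subject, one column per
    repeated measurement (e.g. test and retest).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    row_means = x.mean(axis=1)
    grand_mean = x.mean()
    # One-way ANOVA mean squares: between subjects and within subjects.
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Toy test-retest data: 5 subjects measured twice.
test_retest = [[10, 11], [14, 15], [9, 9], [12, 13], [20, 19]]
print(round(icc_oneway_single(test_retest), 3))
```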

Intra-class Correlation - Models

  • In inter-interviewer and inter-observer scenarios, use a two-way model.
  • If only the interviewers who were tested will conduct the study (here, only 2 were tested), then the interviewers are 'fixed' while the respondents are random; therefore, use the 'two-way mixed' model (see the sketch after this list).
  • If the interviewers are a random sample of the real interviewers, then the 'two-way random' model should be applied.
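
The model choices above can be checked against software output. As a hedged illustration (assuming the third-party pingouin package, which the notes do not mention), the following sketch returns one-way random (ICC1), two-way random (ICC2), and two-way mixed (ICC3) estimates, plus their average-measures counterparts, from long-format data; the column names and values are made up.

```python
import pandas as pd
import pingouin as pg  # assumption: third-party package providing intraclass_corr

# Long-format data: one row per (respondent, interviewer) measurement.
long = pd.DataFrame({
    "respondent": [1, 1, 2, 2, 3, 3, 4, 4],
    "interviewer": ["A", "B"] * 4,
    "score": [10, 11, 14, 15, 9, 9, 12, 13],
})

icc_table = pg.intraclass_corr(data=long, targets="respondent",
                               raters="interviewer", ratings="score")
# ICC1 -> one-way random; ICC2 -> two-way random (absolute agreement);
# ICC3 -> two-way mixed (consistency); the "k" variants are average measures.
print(icc_table[["Type", "Description", "ICC", "CI95%"]])
```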

Why ICC? Why not Simple Correlation?

  • Simple (Pearson) correlation can show a 'perfect correlation' of 1.0 even when the retest values differ systematically from the test values (e.g., every retest score is 2 points higher than the test score).
  • In that situation the ICC comes out below 1, even though the correlation is perfect, because absolute agreement is lacking (illustrated in the sketch after this list).
  • ICC gives a 'perfect correlation' (i.e., 1.0) only if the test and retest values are the same (equal).
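
A numeric illustration of the point above, assuming numpy and the same one-way ICC formula sketched earlier: a constant 2-point shift leaves Pearson's r at 1.0 but pulls the single-measures ICC below 1. The numbers are made up.

```python
import numpy as np

test = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
retest = test + 2.0  # every retest value is 2 points higher

# Pearson correlation ignores the systematic shift:
pearson_r = np.corrcoef(test, retest)[0, 1]  # exactly 1.0

# Single-measures, one-way ICC penalises the lack of absolute agreement:
x = np.column_stack([test, retest])
n, k = x.shape
row_means = x.mean(axis=1)
ms_between = k * np.sum((row_means - x.mean()) ** 2) / (n - 1)
ms_within = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(pearson_r, round(icc, 3))  # 1.0 vs roughly 0.82
```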

Categorical Measures Procedures & Statistics

  • Categorical measures (e.g., screening for depression) also need a replicated analysis (test-retest), or ratings from ≥ 2 interviewers.
  • Use 'kappa' analysis in such situations; like the ICC, it assesses 'absolute agreement'.
  • In the example data, kappa.sav, there are ≥ 2 interviewers; in either situation (test-retest or multiple interviewers), several workers are each classified as having 'excessive job stress' or not.
  • Use kappa.sav: Analyze >> Descriptive Statistics >> Crosstabs.
  • Click Statistics >> tick 'Kappa', then click Cells >> tick the 'Total' percentage.
  • The Kappa statistic is 0.46; the interviewers agree on 23 of the total 30 cases (a sketch of the kappa calculation follows this list).
  • If Kappa is "zero", the agreement is only at the level expected by chance.
  • If Kappa is negative, the level of agreement is less than by chance.
  • Kappa: 0-0.2=slight; 0.2-0.4=fair; 0.4-0.6=moderate; 0.6-0.8=substantial; >0.8=almost perfect
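
A minimal sketch of the kappa calculation itself, in Python/numpy rather than SPSS: kappa = (observed agreement - chance agreement) / (1 - chance agreement). The 2x2 counts below are illustrative, not the actual kappa.sav data, but they are chosen so that 23 of 30 cases agree and kappa comes out near 0.46.

```python
import numpy as np

def cohens_kappa(table) -> float:
    """Cohen's kappa from a square agreement table.

    table[i][j] = number of cases rated category i by interviewer 1
    and category j by interviewer 2.
    """
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_observed = np.trace(t) / n                                 # exact agreement
    p_expected = (t.sum(axis=1) * t.sum(axis=0)).sum() / n ** 2  # chance agreement
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical counts: 'excessive job stress' yes/no rated by two interviewers.
table = [[17, 4],
         [3, 6]]
print(round(cohens_kappa(table), 2))  # about 0.46
```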

Reliability Coefficient Considerations

  • It is difficult to set a single cut-off for an acceptable reliability coefficient (RC).
  • Linear measurements on the face taken with a caliper should reach an RC as high as 0.95 or more.
  • Scores derived from questionnaires ('soft measures') can find even 0.7 difficult to achieve.
  • The application of the scale also influences the acceptable RC.
  • Clinical screening needs a higher RC for identifying people with high stress or depression.
  • A tool that merely identifies those with poor knowledge so they can be sent for training can tolerate a relatively low RC.
  • For questionnaires (soft measures): a Cronbach's alpha of ≥ 0.8 (or ≥ 0.7) and a corrected item-total correlation of roughly 0.2–0.4 or higher are acceptable.
  • ICC of 0.7 level would be acceptable.
  • Test-retest reliability is usually assessed on the total/composite score.
  • Doing a test-retest for each item score (item 1 from the test vs item 1 from the retest) can identify the items responsible for the inconsistency (a sketch follows this list).
  • SPSS gives the 95% CI of the ICC; if it comes from a pilot study, the CI is wide and unrealistic because of the small sample size.
  • If the study itself is a validation study with a reasonable sample size, present the 95% CI of the ICC.
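
Where the bullet above suggests a per-item test-retest, a minimal Python/pandas sketch is shown below; the column naming scheme (item1_t1, item1_t2, ...) and the file name are assumptions, and a per-item ICC (as sketched earlier) can replace the Pearson correlation when absolute agreement matters.

```python
import pandas as pd

def per_item_test_retest(df: pd.DataFrame, n_items: int) -> pd.Series:
    """Pearson r between each item's test and retest scores.

    A low r flags the item(s) responsible for a poor overall
    test-retest coefficient.
    """
    return pd.Series({
        f"item{i}": df[f"item{i}_t1"].corr(df[f"item{i}_t2"])
        for i in range(1, n_items + 1)
    })

# Hypothetical usage with an SPSS export (names assumed):
# df = pd.read_spss("test_retest.sav")
# print(per_item_test_retest(df, n_items=5).sort_values())
```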
