Questions and Answers
Which of the following best describes the purpose of Internal Consistency Reliability assessment?
- To evaluate the stability of measurements across different time points.
- To check the reliability of a composite score and identify problematic items. (correct)
- To assess the correlation between two different versions of the same measurement tool.
- To measure the degree to which different raters or observers give consistent estimates of the same phenomenon.
In the context of Internal Consistency Reliability, what does a high correlation between two items in a questionnaire suggest?
- They measure completely different constructs.
- There is a high level of disagreement among respondents.
- They measure very similar things, potentially indicating redundancy. (correct)
- One of the items is not reliable.
When evaluating Internal Consistency Reliability using Cronbach's alpha, what is generally considered a good alpha level?
- Below 0.5
- At least 0.8 (correct)
- Between 0.2 and 0.4
- Exactly 1.0
In the context of assessing reliability, what is the primary use of the Kappa statistic?
What does a Kappa value of 0 indicate?
Intraclass Correlation is most suitable for?
Under what circumstances would it be more appropriate to use ICC rather than simple correlation?
When conducting a test-retest reliability analysis, why might a researcher choose to measure 'only once' and use only that single measure as the score, according to the material?
How does increasing the number of items on a questionnaire typically affect Cronbach's alpha, assuming the new items correlate well with the existing ones?
If a researcher finds that removing a particular item from a scale increases the overall Cronbach's alpha, what does this suggest about the item?
What is the potential consequence of dropping an unreliable item from a questionnaire, as described in the content?
When is it acceptable to have a relatively lower reliability coefficient (RC)?
If the test-retest coefficient is not acceptable and you want to identify the items responsible for the inconsistency, what can you do?
Why is it important to examine the inter-item correlation matrix when assessing internal consistency?
In the context of a questionnaire, what does 'content validity' refer to?
When calculating inter-rater reliability, Cohen's Kappa is often used. What type of data is most appropriate for this statistic?
What does a 'negative' Kappa value indicate about the agreement between raters?
What is the primary reason for preferring Intraclass Correlation Coefficient (ICC) over Pearson correlation when assessing the reliability of measurements?
In the context of Intraclass Correlation (ICC), what is the key difference between a 'single measure ICC' and an 'average measures ICC'?
How is the choice between using a 'one-way' versus a 'two-way' random effects model in Intraclass Correlation (ICC) determined?
When should a 'two-way mixed' model be used for Intraclass Correlation?
Compared to situations with linear measurements using a caliper, why is it often more difficult to achieve high reliability (e.g., ≥ 0.95) with scores derived from questionnaires?
Under what circumstances is it ethically imperative to report the 95% Confidence Interval (CI) of the Intraclass Correlation Coefficient (ICC)?
For categorical measures, what is an important consideration?
Flashcards
Internal Consistency Reliability Assessment
Assesses the reliability of a composite score by identifying problematic items.
Cronbach's Alpha
A measure of internal consistency, indicating how well items in a set measure a single unidimensional latent construct.
Intraclass Correlation Coefficient (ICC)
A statistical measure indicating how much scores within a group resemble each other.
Kappa Statistic
A measure of agreement between raters for categorical measures; 0 means agreement no better than chance, and negative values mean less agreement than expected by chance.
Content Validity
The degree to which the questionnaire's items cover the content the instrument is meant to measure; dropping an item can harm it.
Test-Retest Reliability
Reliability assessed by repeating the same measurement on the same respondents and comparing the two sets of scores.
Inter-rater Reliability
The degree to which different interviewers or observers give consistent ratings of the same subjects.
Item-Total Correlation
The correlation between an item and the total score; the 'corrected' version correlates the item with the total of the remaining items.
Acceptable Reliability Coefficient
Depends on the measure and its application: around 0.95 or higher for hard linear measurements, a Cronbach's alpha of at least 0.8 (or 0.7) and an ICC of about 0.7 for questionnaire (soft) measures.
Study Notes
- Measuring Reliability - Chapter 2, Part 2
Contents
- Internal Consistency Reliability (Cronbach's Alpha)
- Intraclass Correlation
- Kappa (agreement measure)
Internal Consistency Reliability
- Internal consistency reliability checks the reliability of a 'composite score.'
- It identifies 'items' that may have problems forming the composite score.
- Conceptually, each item can be pictured as a vector: items pointing in the same direction share explained variance and support reliability, while items that correlate negatively with the rest undermine it.
- Take a quality-of-life (QOL) questionnaire as an example: it can have two domains or scales, Physical Function (PF) and Mental Health (MH).
- Each domain could have 4 items: PF1, PF2, PF3, PF4 for PF and MH1, MH2, MH3, MH4 for MH.
- Items should behave consistently, as shown in figures demonstrating correlation.
- PF4 can be a problem if it is worded like "Can you attend a wedding ceremony at a neighboring house?", because the answer depends on more than physical function and can influence the results.
- If person A is stronger than person B, but person B is happier, person A could get higher scores in PF1 to 3, while person B gets a higher score in PF4.
- Those who score high in PF1 may not necessarily score high in PF4.
- PF1 to PF3 might be correlated with each other, but not with PF4.
- The 'corrected item-total correlation' correlates each item with the total of the remaining items, so the item itself does not inflate the correlation.
- The internal consistency reliability coefficient (Cronbach's alpha) summarizes how consistently the items behave; the standard formula is shown below.
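The notes refer to Cronbach's alpha without writing it out; for reference, the standard formula (not shown in the original material) is:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```

where k is the number of items, \sigma^{2}_{Y_i} is the variance of item i, and \sigma^{2}_{X} is the variance of the total (composite) score.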
Internal Consistency RA (Procedure)
- Use PF-MH.sav: Analyze >>> Scale >>> Reliability Analysis.
- Move all PF items into the Items box; item labels can be listed, and the Model should be set to Alpha.
- Click Statistics to get descriptives for Item and Scale, and Inter-Item Correlations and Covariances.
- In the output, Cronbach's Alpha = .869, Cronbach's Alpha Based on Standardized Items = .869, N of Items = 10.
- The item-total statistics table shows Scale Mean if Item Deleted, Scale Variance if Item Deleted, Corrected Item-Total Correlation, Squared Multiple Correlation, and Cronbach's Alpha if Item Deleted.
- Good alpha is >= 0.8.
- Shorter questionnaires get better cooperation from respondents, so there is an incentive to reduce the number of items.
- High correlation between 2 items means they measure similar things, suggesting one item could be dropped if desired.
- Check the 'inter-item correlation matrix' table to find highly correlated items.
- If r is the highest between pf07 & pf08 (0.71), consider dropping one.
- Check Mental Health scale reliability
- If the MH alpha is not favorable, it could be due to mh5; if removing mh5 raises the alpha to 0.719, the item can be dropped.
- Analysis identifies problem items.
- For the PF scale, consider dropping some items to improve alpha, as long as doing so does not affect content coverage.
- Items cannot be dropped based on the statistics alone; the statistics only flag the problem (a sketch of how these statistics are computed follows this list).
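As a rough illustration of what the SPSS Reliability Analysis output contains, the following Python sketch computes Cronbach's alpha, the inter-item correlation matrix, the corrected item-total correlation, and 'alpha if item deleted'. It uses simulated data and hypothetical column names (pf01-pf04), not the PF-MH.sav file.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Simulated responses for four hypothetical PF items (the SPSS example has 10 items in PF-MH.sav)
rng = np.random.default_rng(seed=1)
latent_pf = rng.normal(loc=3, scale=1, size=100)                   # 'true' physical function
pf = pd.DataFrame({f"pf0{i}": latent_pf + rng.normal(0, 0.7, 100)  # each item = latent + noise
                   for i in range(1, 5)})

print(pf.corr().round(2))                             # inter-item correlation matrix
print(f"Cronbach's alpha: {cronbach_alpha(pf):.3f}")
for item in pf.columns:
    others = pf.drop(columns=item)
    corrected_r = pf[item].corr(others.sum(axis=1))   # corrected item-total correlation
    alpha_if_deleted = cronbach_alpha(others)         # Cronbach's alpha if item deleted
    print(f"{item}: corrected item-total r = {corrected_r:.2f}, "
          f"alpha if item deleted = {alpha_if_deleted:.2f}")
```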
Internal Consistency RA - Scenarios
- Scenario 1: Removing mh5 affects the content validity of the questionnaire; mh5 cannot be dropped.
- The researcher must improve mh5 by addressing sources of error (ambiguous words, double-barreled phrases, jargon, technical terms, etc.).
- Scenario 2: Removing mh5 does not affect the overall coverage of the questionnaire.
- Researchers are happy with remaining items; mh5 can be dropped.
Intra-class Correlation (Procedure)
- For test-retest reliability, the measurement is repeated twice on the same respondents.
- In the real study, the measurement is taken 'only once' and that single measure is used as the score, so the 'single measures' ICC is the relevant coefficient.
- If the score used in the study is the average of the repeated measurements, report the 'average measures' ICC instead.
- In the Statistics dialog, tick 'Intraclass correlation coefficient'; Model: One-Way Random, Confidence interval: 95% (a sketch of the calculation follows this list).
- For inter-interviewer data, use inter-intv test.sav: Analyze >>> Scale >>> Reliability Analysis.
- Move both interviewers' scores into the Items box and click Statistics; here a Two-Way Mixed model is used.
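For intuition about the single- and average-measures coefficients, here is a minimal Python sketch of the one-way random-effects ICC (Shrout and Fleiss case 1), using made-up test-retest scores rather than the data files mentioned in the notes:

```python
import numpy as np

def icc_one_way(ratings):
    """One-way random-effects ICC (Shrout & Fleiss case 1).
    ratings: 2-D array, rows = subjects, columns = repeated measurements."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    # Between-subject and within-subject mean squares from a one-way ANOVA
    ms_between = k * np.sum((subject_means - x.mean()) ** 2) / (n - 1)
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average

# Hypothetical test-retest scores for six respondents (not from the notes' data files)
scores = [[12, 13], [15, 15], [9, 10], [20, 19], [14, 16], [11, 11]]
single_icc, average_icc = icc_one_way(scores)
print(f"Single measures ICC:  {single_icc:.3f}")
print(f"Average measures ICC: {average_icc:.3f}")
```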
Intra-class Correlation - Models
- In inter-interviewer and inter-observer scenarios, use a two-way model.
- If the interviewers who were tested (only those 2) are the ones who will conduct the actual study, the interviewers are 'fixed' while the respondents are random; therefore use the 'two-way mixed' model.
- If the tested interviewers are a random sample of the interviewers who will do the real work, apply the 'two-way random' model.
Why ICC? Why not Simple Correlation?
- Simple (Pearson) correlation can give a 'perfect correlation' of 1.0 even when every retest score differs from the test score by the same amount.
- In that situation the ICC is less than 1, even though the correlation is perfect.
- The ICC reaches a 'perfect' value of 1.0 only if the test and retest scores are exactly equal; a short demonstration follows.
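A small Python demonstration of this point, using hypothetical scores in which every retest value is exactly 5 points higher than the test value:

```python
import numpy as np

test = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])   # hypothetical test scores
retest = test + 5                                        # retest systematically 5 points higher

# Pearson correlation ignores the systematic shift and is still exactly 1.0
pearson_r = np.corrcoef(test, retest)[0, 1]

# One-way random-effects single-measure ICC penalizes the disagreement
x = np.column_stack([test, retest])
n, k = x.shape
ms_between = k * np.sum((x.mean(axis=1) - x.mean()) ** 2) / (n - 1)
ms_within = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))
icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(f"Pearson r = {pearson_r:.3f}")   # 1.000
print(f"ICC       = {icc_single:.3f}")  # clearly below 1 because test != retest
```

Pearson correlation only asks whether the two sets of scores move together; the ICC also penalizes the systematic difference between them.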
Categorical Measures Procedures & Statistics
- Categorical measures (e.g., screening for depression) also need a replicated analysis: test-retest or ratings by >= 2 interviewers.
- Use 'kappa' analysis in such situations; like the ICC, it assesses absolute agreement.
- In the example data, kappa.sav, two interviewers each rated the same workers on whether or not they have 'excessive job stress'.
- Use kappa.sav: Analyze >> Descriptive Statistics >> Crosstabs.
- Click Statistics >> Click 'Kappa', and then click Cell >> Click percentages 'total'
- The Kappa statistic is 0.46; the two interviewers agree on 23 out of the 30 cases (a sketch of the calculation follows this list).
- If Kappa is "zero", the agreement is only at the level expected by chance.
- If Kappa is negative, the level of agreement is less than by chance.
- Kappa: 0-0.2=slight; 0.2-0.4=fair; 0.4-0.6=moderate; 0.6-0.8=substantial; >0.8=almost perfect
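A minimal Python sketch of the kappa calculation behind the SPSS Crosstabs output, using hypothetical ratings rather than the kappa.sav file:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings of 'excessive job stress' (1 = yes, 0 = no) by two interviewers;
# these are made-up values, not the contents of kappa.sav
rater1 = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0])
rater2 = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0])

table = pd.crosstab(rater1, rater2)            # the same cross-tabulation SPSS Crosstabs produces
n = table.values.sum()
p_observed = np.trace(table.values) / n        # proportion of cases on which the raters agree
p_chance = np.sum(table.sum(axis=0).values * table.sum(axis=1).values) / n**2  # chance agreement
kappa = (p_observed - p_chance) / (1 - p_chance)

print(table)
print(f"Observed agreement = {p_observed:.2f}, chance agreement = {p_chance:.2f}, kappa = {kappa:.2f}")
```

The same value can be obtained with scikit-learn's cohen_kappa_score(rater1, rater2).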
Reliability Coefficient Considerations
- It is difficult to set a single cut-off for an acceptable reliability coefficient (RC).
- For linear measurements on the face using a caliper, the RC should be as high as 0.95 or more.
- For scores derived from questionnaires (soft measures), it is difficult to reach such high values; even 0.7 can be hard to achieve.
- The intended application of the scale also influences what RC is acceptable.
- Clinical screening needs a higher RC, e.g., for identifying people with high stress or depression.
- For tools that identify people with poor knowledge so they can be given training, a relatively low RC might be acceptable.
- For questionnaires (soft measures), a Cronbach's alpha of >= 0.8 (or at least >= 0.7) and a corrected item-total correlation of 0.4 or higher are acceptable.
- ICC of 0.7 level would be acceptable.
- Test-retest reliability is usually reported for the total/composite score.
- Doing the test-retest analysis on each item's score can identify the items responsible for the inconsistency.
- SPSS gives the 95% CI of the ICC; in a pilot study the CI is wide and not realistic because of the small sample size.
- If the study itself is a validation study with a reasonable sample size, present the 95% CI of the ICC (see the sketch after this list).
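If a ready-made implementation that reports the confidence interval is preferred, the pingouin package (an assumption; it is not part of the notes' SPSS workflow) returns the ICC and its 95% CI for each ICC type:

```python
import pandas as pd
import pingouin as pg   # assumed to be installed; not mentioned in the original notes

# Hypothetical long-format data: each respondent scored by two interviewers
data = pd.DataFrame({
    "respondent":  [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "interviewer": ["A", "B"] * 6,
    "score":       [12, 13, 15, 15, 9, 10, 20, 19, 14, 16, 11, 11],
})

icc_table = pg.intraclass_corr(data=data, targets="respondent",
                               raters="interviewer", ratings="score")
# The table lists each ICC type with its point estimate and 95% confidence interval;
# report the CI when the study itself is a validation study with a reasonable sample size.
print(icc_table[["Type", "Description", "ICC", "CI95%"]])
```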