Validity & Reliability in Mental Health Research
UCL
Summary
This document discusses the core principles of validity and reliability in mental health research. It outlines the importance of accurate measurement and explores psychometric properties of questionnaires and rating scales, including inter-rater and test-retest reliability. The document also briefly touches upon internal consistency and different types of validity.
Full Transcript
**[Core Principles In Mental Health Research]**

**[Testing the Validity & Reliability of Measures in Mental Health Research]**

[Learning Outcomes:]
- For students to be familiar with the criteria for assessing questionnaires and rating scales in mental health research.
- For students to understand the main types of reliability and validity.
- For students to be able to outline plans for investigating the psychometric properties of questionnaires and rating scales.

[Importance of accurate measurement in mental health research:]
- Ensuring we're referring to the same people
  - UK-US Diagnostic Project (1971) found that US psychiatrists had a much broader concept of schizophrenia
- Generally, no lab tests for mental illnesses
  - Research relies on being able to measure symptoms, functioning, etc. through questionnaires and rating scales
- Many of our interventions make a relatively small difference
  - Important to be able to measure this accurately

[Main Psychometric Properties of Measures:]
- Reliability
  - Are measurements replicable and consistent?
- Validity
  - Is it measuring what it's supposed to be measuring?
- Feasibility and Acceptability
  - Is it realistically possible to administer? Is it burdensome or intrusive?
- Sensitivity to Change/Responsiveness
  - Are changes that are clinically significant or subjectively important detected?
- Appropriate Scaling
  - Does it tend to produce floor or ceiling effects, where most people score very low or very high?
- Relevance
  - Does what's measured matter?

**[Reliability:]**
- Inter-rater reliability
- Test-retest reliability
- Internal consistency
- Parallel forms reliability

[Inter-Rater Reliability:]
- Agreement between 2+ raters/observers
- Applies to rating scales where an independent interviewer/observer makes a judgement
- **E.g., the Brief Psychiatric Rating Scale (BPRS),** where trained observers rate the severity of 24 different types of symptoms based on a structured interview with a patient
- **Cohen's Kappa:** measure of agreement that takes into account how much agreement you'd expect by chance (Kappa \> 0.75: excellent inter-rater reliability) \*no clear cutoff (0.6 is also decent)

[Test-Retest Reliability:]
- Replication of measurements over time to see how stable the responses are
- Circumstances of testing ideally constant
- Correlation coefficient **(*r*-value)**
- In general, *r*-values are considered good if *r* ≥ 0.7, but need to consider how stable you expect the phenomenon to be

[Deciding when to repeat measures for test-retest reliability:]
- Need to balance the possibility that the construct measured has changed (e.g., symptoms have improved) against the risk that, if repeated too soon, responses may be influenced by remembering the test or by getting feedback
- 2 weeks is best practice, but the timeframe can depend on how much you expect the construct you are measuring to change over time

[Internal Consistency:]
- Do scores for individuals tend to be consistent across items, suggesting the items are measuring the same thing?
- Applicable with a single underlying construct, not so much with a battery of items in a questionnaire
- May have multiple internally consistent sub-scales within a measure
- **Cronbach's Alpha**: measures internal consistency among a group of items combined to form a single scale
  - Reflects the extent to which items are inter-related
  - Interpret like a correlation coefficient (\>0.70 is good)

[Other Methods of Assessing if items in a scale all measure the same thing:]
**Not very common.** These also measure internal consistency, but Cronbach's alpha is better.
- Parallel forms reliability: two versions of a questionnaire are developed; how well do results from them agree?
- Similar to split-half reliability, whereby you randomly split an instrument that you wish to use as a single measure into two halves; how strongly do they correlate?

[Lack of Reliability in practice:]
- May result from imprecise measurement instruments, OR
- From poor rater training/performance (do raters really have the intended training?)
- Training the research team to achieve good reliability is generally part of setting up a research study with rating scales
- Instruments may also seem unreliable because of fluctuations in what's being measured

**[Validity:]**
Internal Validity: whether the effects observed in a study are due to the manipulation of the independent variable and not some other confounding factor

Can be improved by:
- Controlling extraneous variables
- Using standardized instructions
- Counterbalancing
- Eliminating demand characteristics and investigator effects

Types of validity of a measure:
- Face Validity
- Content Validity
- Criterion Validity
  - Concurrent Validity
  - Predictive Validity
- Construct Validity
  - Convergent and Divergent Validity
  - Structural Validity

Two main types of Validity:
1. Content Validity: extent to which a measure represents all aspects of the intended content domain
2. Criterion Validity: performance of the measure based on correlation with a known external criterion

[Face Validity:]
- Does it look as though it measures the right thing? May ask a range of people for their views
- Possibly the most basic and simplest form of validity
- Less formalised than a full assessment of content validity

[Content Validity:]
- Examines the extent to which the concepts of interest are comprehensively represented by the items in the questionnaire
- A more in-depth and structured process than for face validity of ensuring that the concepts of interest are represented
- Good content validity is achieved through a comprehensive development process, e.g.,
  - Literature review aimed at a full conceptual understanding
  - Discussions with experts
  - Consultation with stakeholders, e.g., service users and carers
- Reviewing the way a scale was developed helps assess content validity

[Criterion Validity:]
- How far does the measure agree with other relevant measures or outcomes?
- Comes in two forms:
  1. Concurrent validity: agreement with other measures that have already been validated
     - Usually using a "gold standard" from the field
  2. Predictive validity: how far the measure predicts relevant outcomes

[Construct Validity:]
- How meaningful the measure is when it's in practical use
- Does it perform as you'd expect it to if it's really measuring the intended construct?
- Convergent and Divergent Validity: are the measure's relationships with other relevant concepts as you'd expect them to be if the underlying construct is real?
- Structural Validity: does the measure behave statistically as you'd expect from your ideas about what construct or constructs are being captured? (most often assessed with factor analysis)

[Group Task 1:]
Answer: Convergent Validity

[How to assess other forms:]
- Divergent (Discriminant) Validity: test whether the engagement measure correlates with something theoretically unrelated to work engagement (a weak or absent correlation supports divergent validity).
- Internal Consistency: check whether all the items intended to measure engagement are highly correlated with each other by calculating Cronbach's alpha
- Content Validity: ensure, by consulting experts, that the measure taps into all dimensions of engagement, such as enthusiasm, involvement, and commitment
- Concurrent Validity: compare the new engagement measure's results with another established measure of engagement

[Group Task 2:]
Answer: A/B/C

[The Context of Reliability and Validity:]
- Culture may affect whether a test is reliable and valid
  - E.g., a measure of disability that asks about doing domestic tasks may not be valid where doing such tasks is not culturally normal for men
- Instruments should really be validated in each new culture where they are used (but practicalities often limit this, and many instruments are developed in multicultural settings, e.g., London)
- Translation: needs to be a careful process, including back-translation and some validation work in the new culture
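
[Worked sketch of the reliability statistics:]
The three reliability statistics named in these notes (Cohen's kappa for inter-rater reliability, the correlation coefficient *r* for test-retest reliability, and Cronbach's alpha for internal consistency) can be illustrated with a minimal Python sketch. This is not taken from the lecture: the function names and all the data below are hypothetical, invented purely for illustration, and the formulas are the standard textbook ones (sample variances are used for alpha). In a real study you would more likely use an established statistics package.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    # Undefined (division by zero) if chance agreement is exactly 1.
    return (observed - expected) / (1 - expected)


def pearson_r(x, y):
    """Pearson correlation, e.g. between scores at time 1 and time 2."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    # Total scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Hypothetical data: two raters classify 50 patients as "y" or "n".
rater_a = ["y"] * 20 + ["y"] * 5 + ["n"] * 10 + ["n"] * 15
rater_b = ["y"] * 20 + ["n"] * 5 + ["y"] * 10 + ["n"] * 15
print(round(cohens_kappa(rater_a, rater_b), 2))  # about 0.4

# Hypothetical scores for five people tested two weeks apart.
time_1 = [10, 14, 8, 20, 16]
time_2 = [11, 13, 9, 19, 17]
print(round(pearson_r(time_1, time_2), 2))

# Hypothetical three-item scale answered by four respondents.
items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
print(round(cronbach_alpha(items), 2))
```

In the kappa example the raters agree on 70% of cases, but 50% agreement would be expected by chance, so kappa comes out at about 0.4, well below the 0.6 to 0.75 range the notes treat as decent to excellent.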