Questionnaire Design & Assessing Psychometric Properties
Summary
This document is a session 2 handout on questionnaire design and assessing psychometric properties. It covers principles of designing questionnaires, assessment of psychometric properties, and includes key concepts and notes for class prep. The document is suitable for undergraduate mental health research methods.
Session 2: Fri 27th Sep
Created: August 10, 2024 9:48 AM
Questionnaire Design & Assessing Psychometric Properties
Teacher: Alice Wichersham

To be covered: principles of designing questionnaires and research instruments; assessment of psychometric properties.

Reading list: PSBS0002: Core Principles of Mental Health Research | University College London (talis.com)

Class Prep

NOTES: Selecting, designing, and developing your questionnaire
Questionnaires offer an objective way of collecting information about knowledge, beliefs, attitudes, and behaviour. They can be used as the sole research instrument (e.g. in a cross-sectional survey) or within clinical trials or epidemiological studies.

What info are you trying to collect?
- It is advisable to use qualitative methods, such as focus groups, as an initial step when you lack familiarity with the research area or with a specific population subgroup, when you cannot predict the range of possible responses, or when the relevant details are not available in the existing literature.
- It is important to explore and map key areas before conducting more detailed studies; this helps you gain insights and identify important topics for further investigation.

Is the questionnaire appropriate?
- Participants must be able to give meaningful answers (with help from a professional interviewer if necessary).
- Questionnaires are sometimes used for questions that may require different methods. A questionnaire may be suitable primarily when employed in a mixed-methods study, for example to enhance and quantify results from an initial exploratory phase.

Other prep materials: What is Validity in Psychology; Administering, analysing, and reporting your questionnaire; How to design a questionnaire; Video: Question types & piloting; Exploratory factor analysis: 5-step guide.

To Do:
- Read: Selecting, Designing and Developing your Questionnaire
- Read: What is Validity in Psychology
- Read: How to Design a Questionnaire
- Watch: Question Types & Piloting - Part 1 of 3 on Questionnaire Design
- Read: Exploratory Factor Analysis: A Five-Step Guide for Novices
- Read: Administering, Analysing, and Reporting your Questionnaire

Notes:

Outcomes:
- Be aware of the ways in which self-report scales/questionnaires are used in MH research
- Understand how to write good questions and items for scales/questionnaires
- Understand how scales and questionnaires are piloted

Designing Questionnaires and Scales in Mental Health Research

What are questionnaires and scales?
- Both are aimed at reliably quantifying experiences, characteristics, behaviours or attitudes.
- Questionnaires: any instrument for eliciting information in a standard way; may include closed and open-ended questions.
- Closed questions - Pros: standardised; efficient. Cons: boredom/fatigue.
- Scales: measure a specific construct (or constructs) - you need to show that the items hang together theoretically and statistically to measure the construct in question.

Why use fixed-response scales and questionnaires?
- Validated by research, with known properties; developed to measure a clear construct.
- Can be more objective/standardised than a free-form assessment: the same questions, asked the same way, every time - no variation in how something is asked.
- Can be mailed, delivered over the internet or left with the person (or you can go through the questions in person) - efficient.
- But they can be found constraining and unnatural, and can miss subjectively important issues. They can also be frustrating, as participants are unable to elaborate on their responses.

What are scales used for in mental health research?
Uses include:
- Diagnosis and symptom identification
- Measuring psychological processes (e.g. metacognition)
- Measuring outcomes (e.g. quality of life, satisfaction with treatment)
- Overcoming communication problems (e.g. visual analogue scales, such as pain scales)
The same measures may be used in clinical practice; professionals other than clinical psychologists often don't use structured tools, but they are used across mental health settings.

Formats
Administration:
- Self-report: you complete the questionnaire yourself
- Interviewer-administered: a clinician or researcher goes through the questions with you
- Informant-based: someone who knows the person completes the questionnaire or is interviewed - e.g. a parent or carer
Format: paper or online.

Examples of standardised questionnaires and scales in mental health:
- BDI - Beck Depression Inventory (self-report scale for severity of depression symptoms)
- PHQ-9 - Patient Health Questionnaire (to screen for and measure severity of depression)
- GAD-7 - Generalised Anxiety Disorder scale (to screen for and assess the severity of GAD)
- HADS - Hospital Anxiety and Depression Scale (screening measure of anxiety and depression symptoms)
For most constructs in mental health, there is usually already a scale or tool.

Thinking of developing your own measure? Not advised:
- Most constructs already have measures
- Development, validation and piloting of instruments is lengthy
- Invalid measures are hard to publish - it is difficult to know their validity
- Adapting an existing instrument may be a useful approach
- BUT you often need to write some questions in a study - an important skill

Development Process
1. Identify the construct(s) to be measured in a scale / the scope of a questionnaire
2. Literature review - to check there is no other existing questionnaire/scale
3. Consult stakeholders iteratively
4. Write questions and select response type
5. Pilot-refine cycle
6. Reliability and validity study
7. Refine and study further if necessary

Identify Target - what is being measured?
- Why is there a need to develop a standardised questionnaire or scale?
- Can you clearly define the scope of what you're aiming to measure in your questionnaire or scale?
- Are there new research questions that you will be able to address?
- Especially for scales: can the construct to be measured be clearly defined, and can you describe its theoretical basis?

Literature Review - to confirm there is no existing questionnaire
- Is there another validated questionnaire or scale that already measures this? Often there is.
- Are existing measures appropriate and acceptable to the population you want to test? E.g. for children/adolescents (consider reading level).
- Could one be adapted? (e.g. removing inappropriate questions, such as questions about self-harm and suicide for young participants)
- If there is no existing scale, do you have a good enough understanding of the concept or phenomenon, and enough resources, to begin developing your own?

Stakeholder Input
- Very important to have input from people with relevant (lived) experience, e.g. service users and clinicians - experience with what you're trying to measure or the condition you're researching.
- Gather their views before starting, e.g. through qualitative interviews.
- Make sure concepts and language are meaningful to them.
- Make sure it covers what they feel is most important.
- Go back and forth with them to help refine - perhaps through a co-production process.
Key Considerations
- Coverage: will you be able to make the case that your questionnaire covers the important aspects of the construct to be measured?
- Stigma: e.g. the impact of labels like "personality disorder" - review with those with lived experience to find the most appropriate language.
- Relevance: e.g. what concepts are relevant to the user? Is it age-appropriate? Is it culturally appropriate? What about people with intellectual disabilities?
- Length, layout, technical performance and ease of navigation: many new pitfalls with the advent of online questionnaires - consider usability and the platforms used (e.g. allowing respondents to skip questions or navigate the questionnaire depending on the answers chosen - ensure constant review).

Neutrality
Avoid questions with an implicit premise built in. This could mean leading questions or descriptions which suggest a value judgement.
- Leading question: e.g. "Do you think your life could be improved by being more accepting of the present moment?"
- Value judgement: e.g. "How often do you break down and cry?"

Clarity and Precision
Questions should be comprehensible to the full range of typical readers. Get specialist advice for groups like children and people with intellectual disabilities. Tools are available to identify the age group/reading level of the language used - to review whether questions are appropriate.
Avoid:
- Overly complex language (use plain English and check reading level)
- Psychological jargon
- Ambiguous descriptions or time frames
- Mixed questions (e.g. "are you aware of and do you understand xyz?")
- Overlapping categories or answers on different dimensions

Group Task - identify problems with the questionnaire items:
1. Too vague ("emotional problems") - ambiguous, non-specific frequency/timeframe, and not the clearest scale of answers
2. Vague answers; physiological answer options
3. Leading question; could include more response options; mixed question
4. Stigmatised language ("severe personality disorder"); pushes towards a negative response; limited scale of responses; leading question

Framing and Introduction
- Friendly initial explanation of the purpose and content of the questionnaire.
- Instructions on completing the questionnaire.
- How long it will take.
- Demographics - in the form of the UK Census categories where possible (if the research is based in the UK) - keep it consistent with national standards.

Question Format
- Open questions (qualitative analysis) - "What factors influence your referral decisions?"
- Closed questions (quantitative analysis) - "Do you ever feel as if your own thoughts were being echoed back to you?" - discrete answers that can readily be coded
- Statement rating - "Unpleasant thoughts come into my mind against my will and I cannot get rid of them"

Response Formats
- Likert and Likert-like scales: typically an ordinal scale - consider the difference between each category (see the coding sketch after this list)
- Value entered
- Visual analogue scale
- Pictorial visual scales: useful when there is a good visual analogy (like anger); often, but not exclusively, used with children or with people with intellectual disabilities
- Yes/No - single choice: can avoid 'central tendency', as people tend not to give middling ratings, but limits the amount of information participants can give
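A minimal sketch in Python, not from the session materials, of how Likert responses might be coded to ordinal values and combined into a scale score, including reverse scoring of a reverse-keyed item (relevant to acquiescence bias, covered below). The response labels, items and keys are illustrative assumptions.

```python
# Minimal sketch: coding 5-point Likert responses to ordinal values and
# summing them into a scale score. Labels, items and keys are made up.

LIKERT_VALUES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}
MAX_SCORE = max(LIKERT_VALUES.values())

def reverse(score: int) -> int:
    """Reverse-score a reverse-keyed item: 1 <-> 5, 2 <-> 4, etc."""
    return MAX_SCORE + 1 - score

# One participant's responses; True marks a hypothetical reverse-keyed item
responses = [("Agree", False), ("Strongly disagree", True), ("Agree", False)]

item_scores = [
    reverse(LIKERT_VALUES[label]) if reverse_keyed else LIKERT_VALUES[label]
    for label, reverse_keyed in responses
]
print(item_scores, sum(item_scores))  # [4, 5, 4] 13
```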
Some Sources of Bias
- Boredom: repeated filling out of forms may reduce their validity
- Translation problems: some common psychological terms do not translate directly into other languages or cultures
- Presentation biases: from social desirability biases up to outright deception (faking 'good' or 'bad')
- Acquiescence bias: respondents keep ticking yes/the same place on the page; reverse scoring may help (as in the Likert coding sketch above)

Group Task - Question Writing
Create and present some questions for a brief questionnaire or scale to measure the following (with justification): visual hallucinations.
1. How often do you 'see things that aren't there'?
   a. Never
   b. Once a month or less
   c. 2-3 times a month
   d. 1-2 times a week
   e. Daily
   f. Multiple times a day
2. Which emotions do you feel before experiencing visual hallucinations? (choose all that apply)
   a. Stress
   b. Sadness
   c. Anxiety
   d. Fear
   e. Happiness
   f. Excitement
   g. Other: please specify
3. At what age did you start experiencing visual hallucinations?
   a. Free-write answer
   b. Unsure
Consider how responses will be analysed.

Piloting
Pilot on participants who are representative of your sample, at least in the later stages. Points to check:
- Are the instructions easy to follow?
- Is the wording easy to understand?
- Does the questionnaire collect the information you want?
- Are all the response choices appropriate or used?
- Do you get a wide range of responses?
- How long does it take?

Pilot Process
- Generally multiple stages, with back-and-forth questionnaire refinement.
- May use interviews or focus groups for feedback.
- Explore acceptability and clarity.
- Ask participants to explain items in their own words.
- May involve a stakeholder group in iterative refinement of the tool.

Validating the Scale
The next step is to complete a study to show that your scale is:
- Valid: measures what it says it measures
- Reliable: does so reliably

Testing the Validity and Reliability of Measures in Mental Health Research
Outcomes:
- To be familiar with the criteria for assessing questionnaires and rating scales in MH settings.
- To understand the main types of reliability and validity.
- To outline plans for investigating the psychometric properties of questionnaires and rating scales.

The importance of accurate measurement in mental health research
- Importance of ensuring we're talking about the same people: the UK-US diagnostic project (1971) found US psychiatrists had a much broader/different concept of schizophrenia than UK psychiatrists.
- Generally no lab tests for mental illnesses: research relies on being able to measure symptoms, functioning etc. through questionnaires and rating scales.
- Many of our interventions make a relatively small difference: important to be able to measure this accurately.
- Full development and testing of a new measure is a substantial research project. Avoid where possible!

Main Psychometric Properties of Measures
- Reliability: are measurements replicable and consistent?
- Validity: is it really measuring what it's supposed to measure?
- Feasibility and acceptability: is it realistically possible to administer? Is it burdensome or intrusive? Found out through piloting.
- Sensitivity to change/responsiveness: are changes that are clinically significant or subjectively important detected?
- Appropriate scaling: does it tend to produce floor or ceiling effects, where most people score very low or very high?
- Relevance: does what's measured matter?
Reliability - are measurement tools producing replicable and consistent results?
Not all types of reliability are relevant to all measures; e.g. a self-report questionnaire does not require inter-rater reliability.

Forms of reliability:
- Inter-rater reliability: agreement between 2+ raters/observers. Applies to rating scales where an independent interviewer/observer makes a judgement - mainly relevant if you have informants or clinicians filling out forms based on observations. E.g. the Brief Psychiatric Rating Scale (BPRS), where trained observers rate the severity of 24 different types of symptom based on a structured interview with a patient (e.g. 2 raters assess one patient and you see how similar their scores are).
  - Cohen's Kappa: a measure of agreement that takes into account how much agreement you'd expect by chance. Kappa > 0.75 indicates excellent inter-rater reliability.
- Test-retest reliability: replication of measurements over time to see how stable responses are; the circumstances of testing are ideally kept constant.
  - Correlation coefficient (r-value): in general, r values are considered good if r ≥ 0.7, but you need to consider how stable you expect the phenomenon to be.
  - Deciding when to repeat measures: balance the possibility that the construct has changed (e.g. symptoms have improved) against the risk that, if you repeat too soon, results are influenced by participants remembering the test or by feedback they received. Also consider how much you would expect the construct to change: symptoms change a lot, personality traits not so much.
- Internal consistency: the degree to which items on a test or survey that are designed to measure the same construct produce consistent results. Do scores for individuals tend to be consistent, suggesting the items are measuring the same thing? Applicable where there is a single underlying construct, not so much with a battery of items in a questionnaire. A measure may have multiple internally consistent sub-scales.
  - Cronbach's Alpha: measures internal consistency among a group of items combined to form a single scale; reflects the extent to which the items are all inter-related. Interpret like a correlation coefficient (≥ 0.70 is good).
- Other methods of assessing whether items in a scale all measure the same thing:
  - Parallel forms reliability: two versions of the questionnaire are developed - how well do results from them agree?
  - Similar: split-half reliability - randomly split an instrument that you wish to use as a single measure into two halves, e.g. split a questionnaire in half and compare how strongly the halves correlate.
(A sketch computing these statistics follows after the group task below.)

Lack of reliability in practice
- May result from imprecise measurement instruments, or from poor rater training/performance (do raters really have the intended training, e.g. for an observer-rated scale?). Training the research team to achieve good reliability is generally part of setting up a research study with rating scales.
- Instruments may also seem unreliable because of fluctuations in what's being measured.

Group task on reliability: A, D and E
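A minimal sketch in Python (numpy, plus scikit-learn's cohen_kappa_score), not from the session materials, of the reliability statistics above: Cohen's kappa, Cronbach's alpha, a test-retest correlation, and split-half reliability with the Spearman-Brown correction. All ratings and item scores are made up for illustration.

```python
# Minimal sketch of the reliability statistics above; all data are made up.
import numpy as np
from sklearn.metrics import cohen_kappa_score  # chance-corrected agreement

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def spearman_brown(r_half: float) -> float:
    """Step a split-half correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

# Inter-rater reliability: two raters' symptom ratings for six patients
rater1 = [2, 0, 1, 2, 0, 1]
rater2 = [2, 0, 1, 1, 0, 1]
print("kappa =", round(cohen_kappa_score(rater1, rater2), 2))  # >0.75 = excellent

# Item scores: 6 participants (rows) x 4 items (columns) on a 1-5 scale
items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
])
print("alpha =", round(cronbach_alpha(items), 2))  # >=0.70 is good

# Test-retest reliability: correlate total scores from two occasions
time1 = items.sum(axis=1)
time2 = time1 + np.array([0, 1, -1, 0, 1, 0])  # pretend retest totals
print("test-retest r =", round(np.corrcoef(time1, time2)[0, 1], 2))  # r >= 0.7 good

# Split-half reliability: odd vs even items, then Spearman-Brown correction
odd, even = items[:, ::2].sum(axis=1), items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
print("split-half =", round(spearman_brown(r_half), 2))
```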
Validity - are you really measuring what you aim to measure?
- Face validity: does it look as though it measures the right thing? May ask a range of people for their views. E.g. a scale is being developed to measure resilience (the ability to adapt to stress/trauma); to establish face validity, it is sent to a number of researchers in this field and to trauma clinicians, who are asked to say whether they think it captures the concept of resilience. Arguably the most basic/simplest form of validity; less formalised than a full assessment of content validity.
- Content validity: examines the extent to which the concepts of interest are comprehensively represented by the items in the questionnaire. A more in-depth and structured process than face validity. Good content validity is achieved through a comprehensive development process: a literature review aimed at a full conceptual understanding, discussions with experts, and consultation with stakeholders, e.g. service users and carers. Reviewing the way a scale was developed helps assess content validity.
- Criterion validity: how far does the measure agree with other relevant measures or outcomes?
  - Concurrent: agreement with other measures that have already been validated, usually using a 'gold standard' from the field. E.g. the association is examined between the new brief measure of resilience and a longer, well-validated measure generally acknowledged to be the gold standard.
  - Predictive: how far the measure predicts relevant outcomes. E.g. in a prospective study, the association is tested between the new short measure of resilience and whether people become depressed after experiencing significant trauma or major physical health problems.
- Construct validity: how meaningful the measure is in practical use. Does it perform as expected if it's really measuring the intended construct?
  - Convergent and divergent validity: are the measure's relationships with other relevant concepts as you'd expect them to be if the underlying construct is real?
    - Convergent: does the measure have the expected associations with measures of related concepts? Present if it correlates strongly. E.g. the literature on resilience suggests it's associated with hope, confidence and well-being; convergent validity of the new resilience scale could be tested by examining associations with measures of these.
    - Divergent (discriminant): are associations lacking with measures of unrelated concepts? Present if it correlates weakly. E.g. resilience is not thought to be associated with obsessional traits or with IQ; this could be examined for a new scale by testing for associations with these - if the associations are very strong, validity is doubtful.
  - Structural: does the measure behave statistically as you'd expect from your ideas about what construct or constructs are being captured? Most often assessed with factor analysis: does a single main factor emerge from the scale, or does the scale consist of several different factors that measure related but distinct concepts? (See the sketch at the end of these notes.)

Group tasks
- Convergent: two measures that are related to each other but not the same. A, B & C

The relationship between reliability and validity
Can you have one without the other? You can!

The context of reliability and validity
- Culture may affect whether a test is reliable and valid - consider the audience. E.g. a measure of disability that asks about doing domestic tasks may not be valid where these are not culturally normal for men.
- Instruments should really be validated in each new culture where they are used, but practicalities often limit this, and many instruments are developed in multicultural settings, e.g. London.
- Translation needs to be a careful process, including back-translation and some validation work in the new culture.
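A minimal sketch in Python, not from the session materials, of how convergent/divergent validity can be checked as simple correlations and structural validity explored with factor analysis (here scikit-learn's FactorAnalysis stands in for a full EFA workflow). All scores are simulated, and the resilience/well-being/obsessional-traits setup is the hypothetical example from the notes above.

```python
# Minimal sketch: convergent/divergent correlations and a basic factor
# analysis for structural validity. All scores below are simulated.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200  # simulated participants

# Convergent/divergent validity: a hypothetical new resilience scale vs a
# related construct (well-being) and an unrelated one (obsessional traits)
resilience = rng.normal(50, 10, n)
wellbeing = 0.6 * resilience + rng.normal(0, 8, n)  # built to be related
obsessional = rng.normal(30, 5, n)                  # built to be unrelated

print("convergent r =", round(np.corrcoef(resilience, wellbeing)[0, 1], 2))   # expect strong
print("divergent r  =", round(np.corrcoef(resilience, obsessional)[0, 1], 2)) # expect near 0

# Structural validity: simulate 6 items, the first 3 driven by one latent
# factor and the last 3 by another, then check the loadings recover this
f1, f2 = rng.normal(size=(2, n))
items = np.column_stack(
    [f1 + rng.normal(0, 0.5, n) for _ in range(3)]
    + [f2 + rng.normal(0, 0.5, n) for _ in range(3)]
)
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
print(np.round(fa.components_, 2))  # rows = factors; items should split 3/3
```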