ITRIP Lecture 4 - Validity and Threats to Validity 2024 PDF
La Trobe University
2024
Dr Melanie Murphy
Summary
This ITRIP lecture from 2024 covers validity and threats to validity in psychological science research. It explores concepts like internal and external validity, experimental design, and potential biases. The lecture features readings on learning statistics and research methods in psychology.
Full Transcript
SCIENTIFIC FOUNDATIONS OF PSYCHOLOGICAL SCIENCE
Lecture 4: Validity and Threats to Validity
Dr Melanie Murphy

Reading: Navarro DJ and Foxcroft DR (2022). learning statistics with jamovi: a tutorial for psychology students and other beginners. (Version 0.75). Section 2.6 - Assessing the validity of a study
Optional Reading: Howitt, D., & Cramer, D. (2011). Introduction to research methods in psychology. Pearson/Prentice Hall. (pp. 266-279)

What is Validity?
The extent to which the score ‘behaves’ as expected from theory.
[Figure: ‘A Theory’ diagram with nodes labelled K, N, C, B, F, X and Y]

VALIDITY
A researcher needs to keep validity in mind when designing a study so they can eliminate as many potential biases as possible; failing to do so can make the results difficult to interpret.
Types of validity fall into three broad categories: measurement validity, internal validity and external validity.

INTERNAL AND EXTERNAL VALIDITY
Validity of a research design (and how much we can trust the findings reported) depends on our confidence that:
- The outcome is due to the manipulation/measures we used.
  internal validity: X (and X alone) leads to Y
- That what we found applies to the real world.
  external validity, or generalizability: the X → Y findings apply to the (outside) population

INTERNAL VALIDITY

EXPERIMENTAL DESIGN
Advantages of between-subjects design:
- intact groups (e.g., gender)
- irreversible changes (e.g., learning)
- treatments (A, B) with carryover effects
- time-consuming treatments (e.g., therapy)
Advantages of within-subjects design:
- large individual differences expected (e.g., attitudes, weight, psychological health)
- studying change
- short duration experiments (e.g., perception)
- small sample size

HIGH INTERNAL VALIDITY
- Maximising hypothesized systematic variance (e.g., stronger manipulation)
- Eliminating other possible systematic variance
  - Or, if we cannot, balancing these effects between groups (e.g., using an equal number of each gender in the groups)
- Minimising error variance (i.e., random errors, measurement errors), e.g., by using a standardised testing procedure

Threats To Internal Validity

INTERNAL VALIDITY
Internal validity can be compromised by a number of factors:
- Events affecting participants during data collection.
- Bias in allocating participants to groups.
  - Can be avoided by using a randomised controlled trial (RCT)
- Long-term changes in participants' responses.
  - An issue for longitudinal studies
- Interaction between testing and participants' refusal to continue.
  - There may be characteristic differences between the people who stay in a study and those who drop out
- Effect of differences in testing conditions or procedures.
  - Should always try to use the same testing situations and measures/equipment.
- Experimenter expectancy
  - Just as testing can affect performance, the participant can be influenced by their perception of the experimenter's expectations.
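
The two ideas above, random allocation to avoid allocation bias and balancing a nuisance variable such as gender across groups, can be combined in what is often called stratified random assignment. The following is only a minimal sketch under stated assumptions: the function name `stratified_assignment`, the participant records and the group labels are hypothetical and are not part of the lecture.

```python
import random
from collections import defaultdict


def stratified_assignment(participants, stratum_key,
                          groups=("treatment", "control"), seed=None):
    """Randomly assign participants to groups, balanced within each stratum.

    participants: list of dicts, each with an "id" and the stratum_key field.
    Returns a dict mapping participant id -> group label.
    """
    rng = random.Random(seed)

    # Split the sample into strata (e.g., one list per gender).
    strata = defaultdict(list)
    for person in participants:
        strata[person[stratum_key]].append(person)

    allocation = {}
    for members in strata.values():
        # Chance, not the researcher, decides the order within a stratum.
        rng.shuffle(members)
        # Dealing participants out in turn keeps group sizes equal
        # (or within one of each other) inside every stratum.
        for index, person in enumerate(members):
            allocation[person["id"]] = groups[index % len(groups)]
    return allocation


# Hypothetical sample: 12 participants, alternating gender labels.
participants = [{"id": i, "gender": "F" if i % 2 == 0 else "M"}
                for i in range(12)]
allocation = stratified_assignment(participants, "gender", seed=42)

for person in participants:
    print(person["id"], person["gender"], allocation[person["id"]])
```

Because the shuffle happens inside each stratum, every group ends up with the same number of each gender while the individual allocations remain random.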
SELECTION BIAS (Threat)
Definition: non-random factors responsible for participants being in one group or condition and not another
e.g., self-selection bias, assignment bias
Remedy: random sampling or assignment of participants to groups

REGRESSION TO THE MEAN (Threat)
Definition: tendency for extreme scores on one occasion to be closer to the mean on another occasion
e.g., top versus bottom quartile groups getting more ‘average’ over time
Remedy: control group (with same characteristics)

THREATS TO INTERNAL VALIDITY
Maturation
Definition: changes within participants over a (usually) extended period of time
e.g., fatigue, practice (computer tasks), people getting better in long-term therapy
Remedy: control group
History
Definition: concurrent events happening between pretest and posttest
e.g., major socio-political event
Remedy: control group, control over test environment, monitoring confounds
Testing
Definition: aspects of testing (not hypothesized) that affect the participant’s response
e.g., placebo effect (case of the Hawthorne studies)
Remedy: more equivalent control group/condition, counterbalancing order of treatment

THREATS TO INTERNAL VALIDITY
Mortality
Definition: loss (attrition) of participants that is related to the treatment
e.g., death, loss of interest, avoidance (addiction programs)
Remedy: control group, matched controls on factors related to subject attrition
Instrument change
Definition: changes in the instrument (or observer) over repeated testing
e.g., observer bias, effect of the pretest itself (evaluation of racial tolerance), change in scoring criteria (essay marking)
Remedy: double-blind (participant, observer)
Experimenter Bias
Definition: contamination of participant’s response by experimenter
e.g., standardised testing of groups (case of Clever Hans)
Remedy: triple-blind (participant, observer, tester)

ENSURING HIGH INTERNAL VALIDITY
Use a control group and random assignment to address:
- selection bias
- regression to the mean
- history
- maturation
with special control:
- testing (e.g., sham operation)
with covariate pretesting and yoking:
- mortality (e.g., motivation)
with double-blind testing:
- instrument change (e.g., scoring criteria)
- experimenter bias (e.g., drug testing)

EXTERNAL VALIDITY
How well can the results of the study be applied (generalised) to other similar situations with different people at different times?
In other words: to what populations, settings, treatment variables and measurement variables can this effect be generalized?
To maximise external validity, a researcher should ask the following:
How representative are our participants?
- Are they a good snapshot of the wider population of interest?

Ecological Validity
We need evidence that use of the test is valid for its context (e.g., can it be generalized to real-life situations in practice?)

Is this test valid?
- On the whole, I am satisfied with myself.
- At times I think I am no good at all.
- I feel that I have a number of good qualities.
- I am able to do things as well as most other people.
- I feel I do not have much to be proud of.
- I feel that I’m a person of worth, at least on an equal plane with others.
- I certainly feel useless at times.
- I wish I could have more respect for myself.
- All in all, I am inclined to feel that I am a failure.
- I take a positive attitude toward myself.
(examine items to determine face and content validity)

KINDS OF VALIDITY
- Face validity: ‘looks’ like a measure of self-esteem?
- Content validity: items cover various aspects of self-esteem?
- Construct validity
  - convergent validity (i.e., correlates with other measures of self-esteem)
  - discriminant validity (i.e., does not correlate with measures of a different trait, such as IQ)
  - known groups validity (i.e., distinguishes between groups that have previously been shown to differ)

Convergent Validity
Convergent validity is demonstrated by moderate to high correlations between measures of the same trait (e.g., two measures of self-esteem, or self-efficacy).
[Figure: convergent (shared) measurement of self-esteem]

Convergent Validity (cont.)
Note that scores from similar methods used to measure a trait tend to correlate more highly than scores from dissimilar methods. For example, the Beck Anxiety Questionnaire would correlate more highly with the Spielberger Anxiety Questionnaire than with physiological (heart rate, finger tremor) measures of anxiety because both are questionnaires.

Correlations
                  Spielberger Anxiety Q   Heart Rate
Beck Anxiety Q    High                    Moderate
Finger Tremor     Moderate                Moderately High

Discriminant Validity
Discriminant validity is demonstrated by low correlations between measures of different traits (e.g., self-esteem and memory, aggression and mindfulness), even when the methods employed are the same (e.g., both pencil-and-paper questionnaires).
Researchers/clinicians need to be sure they are not measuring something else!

EXTERNAL VALIDITY
How representative are our variables?
- Are the variables being tested the best ones to examine in order to answer the question? Do they cover all bases?
How representative is our test situation?
- Is the situation where data is being collected suitable for gaining realistic measures that could be generalised?
How stable are the above factors over time (how enduring is the finding)?
- Can this design be replicated by someone else and still get the same results?
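
The convergent and discriminant validity checks described earlier in this section both come down to inspecting correlations between measures. The sketch below illustrates that logic with a hand-written Pearson correlation. All scores are invented for illustration, and the variable names (`self_esteem_a`, `self_esteem_b`, `memory_span`) are hypothetical; nothing here comes from a real data set.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squared deviations for each variable.
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Invented scores for eight participants (illustration only).
self_esteem_a = [12, 18, 25, 22, 15, 28, 20, 10]  # questionnaire A
self_esteem_b = [14, 17, 27, 21, 16, 26, 22, 11]  # questionnaire B, same trait
memory_span = [7, 5, 6, 8, 6, 7, 5, 7]            # a different trait

convergent = pearson_r(self_esteem_a, self_esteem_b)
discriminant = pearson_r(self_esteem_a, memory_span)

print(f"same trait, two measures: r = {convergent:.2f}")
print(f"different traits:         r = {discriminant:.2f}")
```

In this made-up example the two self-esteem measures correlate strongly (evidence of convergent validity), while self-esteem and memory span are close to uncorrelated (evidence of discriminant validity). With real data, what counts as "moderate to high" or "low" is a judgement made against theory and prior findings.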
HIGH EXTERNAL VALIDITY
Depends on how representative each of the following is:
- sample of participants (to population)
- X (to domain/concept of X)
- Y (to domain/concept of Y)
- test situation (to all situations)

Threats To External Validity
Representative sample of participants
Random selection is rare due to ethics and logistics.
e.g., self-selection, volunteers
Remedy:
- adequate description of the sample
- replication with other samples (e.g., another culture, age group)
Representative sample of Xs and Ys
Accurate and complete operationalisation of a concept is rare due to the artificial and multifactorial nature of the people and behaviours we measure, ethics and logistics.
e.g., anxiety, restricted range
Remedy:
- adequate description of all variables
- possible replication with other operationalisations (e.g., do physiological representations of anxiety produce the same results?)
Representative sample of situations
Behaviour is often dependent on context.
e.g., laboratory versus natural settings
Remedy:
- adequate description of test situation
- replication with other situations

When designing an experiment, a researcher must always keep in mind the balance between internal and external validity. It can be a delicate balance.

EXPERIMENTAL VS CORRELATIONAL RESEARCH
Correlational Research
- Higher external validity
- Uses a group sampled with varying amounts of X and Y
- Looking at the relationship between measures
Experimental Research
- Higher internal validity
- Uses a (homogeneous) sample of participants who are randomly allocated to groups of treatments (X), including a control treatment, and tested on Y
- Manipulation of X to study its effect on Y, while keeping other factors constant.

Concepts in Action
Montreal Cognitive Assessment (MoCA)
https://www.nytimes.com/article/trump-cognitivetest.html
Nasreddine, Z. S., Phillips, N. A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I.,... & Chertkow, H. (2005).
The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society.
[Figure: MoCA test form (© Z. Nasreddine MD, https://www.mocatest.org), including a Memory Index Score (MIS); total score out of 30]

ASSESSING RELIABILITY AND VALIDITY FOR THE MOCA
Reliability
- Test re-test reliability: tested participants approx. 30 days after 1st assessment (r = .92)
- Parallel forms: different but equivalent tasks to test within each domain
- Internal consistency: large Cronbach’s alpha (.83)
- Inter-rater reliability: training required for administration and scoring
Validity
- Face validity: the items look like they assess cognitive skills
- Content validity: the items cover a range of different skills associated with cognitive ability
- Known groups validity: shown to distinguish between different conditions (MCI and AD from control)
- Convergent validity: high degree of agreement with a similar, previously validated and accepted cognitive assessment (MMSE)
- Discriminant validity: not specifically reported in the paper for other constructs, but the pattern of results suggests a degree of discriminant validity for diagnostic classification
- Ecological validity: works well in hospital/allied health settings and in different languages

MINI-MENTAL STATE EXAMINATION (MMSE)
Standardised Mini-Mental State Examination (SMMSE)
Please see accompanying guidelines for administration and scoring instructions.
Say: I am going to ask you some questions and give you some problems to solve. Please try to answer as best you can.
1. Allow ten seconds for each reply. Say:
   a) What year is this? (accept exact answer only) /1
   b) What season is this? (during the last week of the old season or first week of a new season, accept either) /1
   c) What month is this? (on the first day of a new month or the last day of the previous month, accept either) /1
   d) What is today’s date? (accept previous or next date) /1
   e) What day of the week is this? (accept exact answer only) /1
2. Allow ten seconds for each reply. Say:
   a) What country are we in? (accept exact answer only) /1
   b) What state are we in? (accept exact answer only) /1
   c) What city/town are we in? (accept exact answer only) /1
   d) What is the street address of this house? (accept street name and house number or equivalent in rural areas) / What is the name of this building? (accept exact name of institution only) /1
   e) What room are we in? (accept exact answer only) / What floor of the building are we on? (accept exact answer only) /1
3. Say: I am going to name three objects. When I am finished, I want you to repeat them. Remember what they are because I am going to ask you to name them again in a few minutes (say slowly at approximately one-second intervals).
   Ball, Car, Man
   For repeated use: Bell, jar, fan; bill, tar, can; bull, bar, pan
   Say: Please repeat the three items for me (score one point for each correct reply on the first attempt) /3
   Allow 20 seconds for reply; if the person did not repeat all three, repeat until they are learned or up to a maximum of five times (but only score first attempt)
4. Say: Spell the word WORLD (you may help the person to spell the word correctly). Say: Now spell it backwards please (allow 30 seconds; if the person cannot spell world even with assistance, score zero). Refer to accompanying guide for scoring instructions (score on reverse of this sheet) /5
5. Say: Now what were the three objects I asked you to remember? (score one point for each correct answer regardless of order; allow ten seconds) /3
6. Show wristwatch. Ask: What is this called? (score one point for correct response; accept ‘wristwatch’ or ‘watch’; do not accept ‘clock’ or ‘time’, etc.; allow ten seconds) /1
7. Show pencil. Ask: What is this called? (score one point for correct response; accept ‘pencil’ only; score zero for pen; allow ten seconds for reply) /1
8. Say: I would like you to repeat a phrase after me: No ifs, ands, or buts (allow ten seconds for response. Score one point for a correct repetition. Must be exact, e.g. no ifs or buts, score zero) /1
9. Say: Read the words on this page and then do what it says. Then, hand the person the sheet with CLOSE YOUR EYES (score on reverse of this sheet) on it. If the subject just reads and does not close eyes, you may repeat: Read the words on this page and then do what it says, a maximum of three times. See point number three in Directions for Administration section of accompanying guidelines. Allow ten seconds; score one point only if the person closes their eyes. The person does not have to read aloud. /1
10. Hand the person a pencil and paper. Say: Write any complete sentence on that piece of paper (allow 30 seconds. Score one point. The sentence must make sense. Ignore spelling errors). /1
11. Place design (see page 3), pencil, eraser and paper in front of the person. Say: Copy this design please. Allow multiple tries. Wait until the person is finished and hands it back. Score one point for a correctly copied diagram. The person must have drawn a four-sided figure between two five-sided figures. Maximum time: one minute. /1
12. Ask the person if he is right or left handed. Take a piece of paper, hold it up in front of the person and say the following: Take this paper in your right/left hand (whichever is non-dominant), fold the paper in half once with both hands and put the paper down on the floor.
    Takes paper in correct hand /1
    Folds it in half /1
    Puts it on the floor /1
TOTAL TEST SCORE: /30    ADJUSTED SCORE: /
The SMMSE tool and guidelines are provided for use in Australia by the Independent Hospital Pricing Authority under a licence agreement with the copyright owner, Dr D. William Molloy. The SMMSE Guidelines for administration and scoring instructions and the SMMSE tool must not be used outside Australia without the written consent of Dr D. William Molloy.
Molloy DW, Alemayehu E, Roberts R. Reliability of a standardized Mini-Mental State Examination compared with the traditional Mini-Mental State Examination. American Journal of Psychiatry, Vol. 148, 1991a, pp. 102-105.

LET’S LOOK AT SOME RESEARCH OUTCOMES
The top figure compares the average score of participants administered the MMSE and MoCA.
- NC = controls. They have the highest scores on both assessments.
- MCI = Mild cognitive impairment. There is a smaller difference between these scores (relative to controls) for the MMSE compared to the MoCA.
  - This is why the authors argue the MoCA is more sensitive for identifying this condition.
- AD = Alzheimer’s Disease. This group shows the lowest scores on both, but the greatest difference on the MoCA*
The bottom figure looks at the relationship between scores on the MMSE and MoCA for the different groups (conditions).
- Another way of showing that there is more overlap between control and MCI scores for the MMSE compared to the MoCA (looking for clusters of dots with triangles), whereas the AD group has more scores at the lower end of the scale (looking for squares).
- Importantly, if we consider approx. 25 as typical cognitive ability, this graph shows that the MMSE is more likely to classify someone already known to have MCI or (less severe) AD within the range of typical function.
- An indication of the ‘sensitivity and specificity’ of the scales.
Nasreddine, Z. S., Phillips, N. A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I.,... & Chertkow, H. (2005). The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment.
Journal of the American Geriatrics Society.

SUMMARY
- Measurement Error: arises due to discrepancies between the construct we intend to measure and how well we actually capture it.
- Reliability: the extent to which a score is consistent (i.e., reproducible) across time and between observers.
- Validity: the extent to which the score is consistent with theoretical expectations about how the construct should behave.

Summary
Research design is a balance between internal validity and external validity.
- Internal Validity: extent to which X (and X alone) is associated with the observed change in Y
  - stricter experimental control
- External Validity: extent to which the X → Y relationship can be generalised beyond the study
  - more natural representations

NEXT WEEK
Experimental Research
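
As a supplement to the reliability material in this lecture, internal consistency is commonly summarised with Cronbach's alpha, the statistic reported for the MoCA. The sketch below computes it with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical and invented purely for illustration; they are not MoCA data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents' item-score rows.

    item_scores: list of rows, one per respondent; each row holds that
    respondent's score on every item (all rows the same length).
    """
    k = len(item_scores[0])                # number of items
    columns = list(zip(*item_scores))      # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Invented responses: six respondents answering four items (illustration only).
responses = [
    [4, 4, 3, 4],
    [2, 2, 1, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

The items in this made-up matrix rise and fall together across respondents, so alpha comes out high. Items that did not covary would push the sum of item variances towards the total-score variance and alpha towards zero.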