Reliability & Validity Lecture PDF

Summary

This lecture discusses reliability and validity in measurement, covering different types of reliability like test-retest, internal consistency, and inter-rater reliability. It also introduces the concept of true score and random error, and their implications for assessing the reliability of a measure.
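
As a rough illustration of the true score idea summarised above, the sketch below simulates observed scores as true score plus random error and compares the share of true-score variance with the value the formula predicts. Everything here is made up for the illustration: the function names, the mean of 70, the standard deviations, and the sample size are assumptions, not part of the lecture.

```python
import random
import statistics


def theoretical_reliability(true_sd, error_sd):
    """Reliability = true-score variance / (true-score variance + error variance)."""
    true_var = true_sd ** 2
    error_var = error_sd ** 2
    return true_var / (true_var + error_var)


def simulated_reliability(true_sd, error_sd, n=20000, seed=0):
    """Simulate observed = true + random error and return var(true) / var(observed)."""
    rng = random.Random(seed)
    # Hypothetical true scores centred on 70 (an arbitrary choice for the demo).
    true_scores = [rng.gauss(70, true_sd) for _ in range(n)]
    # Random error is drawn independently of the true score.
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


if __name__ == "__main__":
    # (true_sd, error_sd) pairs chosen so the formula gives .20 and .90.
    for true_sd, error_sd in [(1.0, 2.0), (3.0, 1.0)]:
        print(
            true_sd,
            error_sd,
            round(theoretical_reliability(true_sd, error_sd), 2),
            round(simulated_reliability(true_sd, error_sd), 2),
        )
```

The simulated value will only approximate the theoretical one, because the random error in a finite sample is never perfectly uncorrelated with the true scores.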

Full Transcript


Reliability & Validity

Reliability and Validity of Measurement
Validity: Is the measure measuring the intended construct?
Reliability: Is the measure consistently producing similar results under the same situation?

True Score Theory
Theoretically, observed score = TRUE score + random error
Example: a 70% test score = true knowledge + random error (noisy room, lack of sleep, confusing scantron)
Reliability is basically how much of the observed scores is “real” vs. just error or noise
Reliability = variance of true score / total variance
Reliability = variance of true score / (variance of true score + variance of random error)

Implication
Observed score = TRUE score + random error
Reliability = .20 → 20% systematic variance, 80% random error
Reliability = .70 → 70% systematic variance, 30% random error
Reliability = .90 → 90% systematic variance, 10% random error
Acceptable reliability = .70 or higher

Random Error
Factors that make some people score higher and some people score lower in a random fashion:
Participant: random participant factors
  Mood, hunger
Measure: the instrument is faulty
  Ambiguous items
Environmental: random factors in the environment
  Noisy test room, weather, time of day
Observer: responses are not measured or recorded in the same way
  Coder misses behaviours, participant marks wrong response

Take away points
Rule of thumb: reliability of .70 or higher
Low reliability = low construct validity
  Conceptually, what are you measuring if it’s just error?
Low reliability = low statistical validity
  More random error
  Reduces relationships among variables (i.e., weaker effect size)
  Weaker power → can’t find the relationships that are actually there!

Assessing reliability
1. Test-retest reliability: consistent over time
2. Internal reliability: consistent across items measured at one time point
3. Interrater reliability: consistent across raters

Test-retest reliability
Consistency on a measure over two time points
[Figure labels: BDI, BDI]
Time 1 score predicts Time 2 score → Pearson r = .70 or higher
[Figure labels: Time 2, Time 1]
When to use: stable attributes (e.g., intelligence, personality)
When not to use: attributes that might not be stable (e.g., mood)
Consider the length of time
  Taking the tests too far apart → actual change might have occurred
  Taking the tests too close together → carryover effects
Carryover effects
  Taking the same measure twice might bias people’s responses
  Parallel forms: two different forms of a test
  Try to ensure that they are as similar as possible!
  Example item sets, each labelled ANXIETY:
    Worry about things. Get stressed out easily. Fear for the worst. Get caught up in my problems.
    Am not easily bothered by things. Am not easily disturbed by events. Am relaxed most of the time. Adapt easily to new situations.

Internal reliability
AKA internal consistency
Items are consistent within the measure
Relevant to multiple-item measures (e.g., surveys)
Items are conceptually similar
Measured at one time point

Split half reliability
Correlate odd and even items, or first half and second half
Example items:
  Am the life of the party.
  Don't talk a lot.
  Feel comfortable around people.
  Keep in the background.
  Don't like to draw attention to myself.
  Start conversations.
  Talk to a lot of different people at parties.
  Have little to say.
  Don't like to draw attention to myself.
  Don't mind being the center of attention.
  Make friends easily.
  Find it difficult to approach others.
  Take charge.
  Often feel uncomfortable around others.
  Know how to captivate people.
  Feel at ease with people.
  Bottle up my feelings.
  Am skilled in handling social situations.
  Wait for others to lead the way.
  Am a very private person.
There are so many ways to split the measure; did you pick the right one?
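
A minimal sketch of the split-half idea just described: total up each half of the items per participant, then correlate the two half-scores. The data are simulated, and the helper names (`pearson_r`, `split_half_r`, `make_fake_responses`) are hypothetical. The sketch also assumes that negatively worded items (e.g., "Don't talk a lot.") have already been reverse-scored.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_r(responses, split="odd_even"):
    """Correlate the summed scores of two halves of a multi-item measure.

    responses: one row per participant, one column per item.
    split: "odd_even" (items 1, 3, 5, ... vs 2, 4, 6, ...) or "first_second".
    """
    n_items = len(responses[0])
    if split == "odd_even":
        half_a = range(0, n_items, 2)
        half_b = range(1, n_items, 2)
    elif split == "first_second":
        half_a = range(0, n_items // 2)
        half_b = range(n_items // 2, n_items)
    else:
        raise ValueError("split must be 'odd_even' or 'first_second'")
    score_a = [sum(row[i] for i in half_a) for row in responses]
    score_b = [sum(row[i] for i in half_b) for row in responses]
    return pearson_r(score_a, score_b)


def make_fake_responses(n_people, n_items, seed=0):
    """Made-up data: each item = the person's trait level + independent noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)
        rows.append([trait + rng.gauss(0, 1) for _ in range(n_items)])
    return rows


if __name__ == "__main__":
    data = make_fake_responses(n_people=500, n_items=10, seed=1)
    print("odd/even split:    ", round(split_half_r(data, "odd_even"), 2))
    print("first/second split:", round(split_half_r(data, "first_second"), 2))
```

The two splits generally give slightly different correlations on the same data, which is the concern raised above about choosing a split.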
Cronbach’s alpha (⍺)
Essentially the average of all possible split halves
Rule of thumb: ⍺ = .70 or higher
Sensitive to number of items

Interrater reliability
Consistency among two or more raters/observers, i.e., consensus or the degree of agreement between raters
Coding behaviour
  How often did they interrupt each other?
Cohen’s kappa
  Categorical/nominal scale (e.g., yes/no the behaviour is present)
Intraclass correlation (ICC)
  Interval or ratio scale (e.g., 1-7 scale)
Example coding questions:
  Did they interrupt each other? Yes/No
  How rude were they? Scale: 1 (not at all rude) to 7 (very rude)
  How many times did they interrupt each other?

Improving reliability
Longer surveys are more reliable
  Tradeoff: short surveys are less tiring for participants, give greater participant compliance, and are good for multiple ratings
Consistency of items
Minimize error
  Clear instructions and questions
  Pilot test
  Train coders, define what to look for, break down coding segments

Reliability & Validity
Examples: finger length as a measure of IQ; single item IQ test; mood as measure of IQ; Wonderlic IQ test

Take away points
Just because it’s reliable does not mean it’s valid
When reliability is low, it can’t be valid (it’s all error)
We like valid, reliable measures!

Construct Validity
Subjective validities
  1. Face validity
  2. Content validity
Does the pattern of relationships make sense?
  3. Convergent validity
  4. Divergent (discriminant) validity
Criterion validity
  5. Predictive validity
  6. Concurrent validity

Face Validity
Does the measure look like it measures what it is supposed to measure?
If the measure is transparent/items clearly measure the construct, then it is high in face validity
High face validity
  Do you have high self-esteem?
  Are you happy in your relationship?
Low face validity
  Do you take baths or showers? (MMPI)

Content Validity
Does the measure assess the full range of the construct?
Example: narcissism
  High: are you vain? selfish? arrogant? hostile?
  Low: are you arrogant?
Example: Test 1
  High: covers all lecture material and textbook readings
  Low: covers just ethics

Convergent & Divergent Validity: Does the pattern of correlations make sense?
Convergent validity: related to measures of similar constructs
  Emotional intelligence (EQ) is related to social skills
Divergent (discriminant) validity: unrelated to measures of dissimilar constructs
  Emotional intelligence is not related to IQ

Criterion Validity
Predictive validity: correlated with an expected behavioural outcome
  Do people who score higher on the SAT in high school perform better at university?
Concurrent validity: correlated with the gold standard measure of the same construct
  Self-report of height should be correlated with actual height
Known groups paradigm: tests whether the measure can distinguish between groups whose behaviour is known

Threats to construct validity
Experimenter bias: expectations affect manipulation or how researcher interacts with participants
  Rosenthal & Fode (1963): smart rat study
  Single (researcher) blind study
  Double blind study: researcher and participant are blind
Demand characteristics: factors that clue participants into what the study is about or how to behave
  Good subject role: try to help
  Bad subject role: try to undermine
  Single (participant) blind study
  Double blind study: researcher and participant are blind
Reactivity: people know they are being observed and act differently
  AKA Observer Effect
Note: internal validity threat if these effects systematically affect conditions

Reminders
Practice questions
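
To make the Cronbach's alpha slide in the transcript above more concrete, here is a small sketch that computes ⍺ with the standard variance-based formula, ⍺ = k/(k − 1) × (1 − sum of item variances / variance of total scores), rather than by literally averaging split halves. The data are simulated; the function names, sample size, and noise level are assumptions made for the illustration. Running it on a short and a long version of the same made-up scale shows the sensitivity to the number of items.

```python
import random
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for rows = participants, columns = items."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across participants.
    item_vars = [statistics.variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each participant's total score.
    total_var = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def make_fake_responses(n_people, n_items, noise_sd=1.5, seed=0):
    """Made-up data: each item = the person's trait level + independent noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)
        rows.append([trait + rng.gauss(0, noise_sd) for _ in range(n_items)])
    return rows


if __name__ == "__main__":
    short_form = make_fake_responses(n_people=1000, n_items=4, seed=2)
    long_form = make_fake_responses(n_people=1000, n_items=20, seed=2)
    print("4 items: ", round(cronbach_alpha(short_form), 2))
    print("20 items:", round(cronbach_alpha(long_form), 2))
```

With equally noisy items, the longer form should come out with the higher ⍺, which matches the point that longer surveys are more reliable.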
