Advanced Research Methods PDF

**[Week 1 -- Lecture 1]**

**Refreshing vocabulary**

1. **True experiment:** A design in which participants are assigned randomly to treatments.
2. **Quasi experiment:** A design that resembles an experiment in that discrete groups are used, but participants are not randomly assigned to treatments, nor are treatments randomly determined for groups.
3. **"Between groups" vs. "within groups" design:** A between-groups design compares different participants across treatment conditions; a within-groups design compares the same participants across conditions or at different times.
4. **Construct variable:** A theoretical variable that has "reality status," such as competition, attractiveness, or negative mood.
5. **Operational variable:** The researchers' precise operational definition (i.e., measurement) of a theoretical construct.
6. **Independent variable:** The variable presumed to cause a change in the dependent variable.
7. **Dependent variable:** The variable presumed to be affected by the independent variable.
8. **Hypothesis:** A statement of a proposed relation between constructs.
9. **Theory:** A well-established principle that has been developed to explain some aspect of the natural world.
10. **Construct validity:** The degree to which the operational definition accurately measures the construct of interest.
11. **Convergent validity:** Overlap among variables presumed to measure the same construct.
12. **Discriminant validity:** The extent to which it is possible to discriminate between dissimilar constructs.
13. **Random assignment:** The process by which subjects receive an equal chance of being assigned to a particular condition.
14. **Manipulation check:** A measured variable designed to assess whether the manipulation worked and tapped a desired construct.
15. **Demand characteristic:** Aspect of the experiment encouraging the participant to respond according to situational constraints.
16. **Reliability:** The extent to which a construct is measured without error or bias.
17. **Subject expectancies:** A demand characteristic whereby subjects think they know the experimenters' interests and act accordingly.
18. **Double-blind procedure:** A procedure in which neither the experimenter nor the participant knows to which condition the participant is assigned.
19. **Order effects:** The effects on behavior of presenting two or more treatments to the same participants.
20. **Counter-balancing:** A technique for controlling order effects by which each condition is presented first to each participant an equal number of times; each condition is presented in each possible order.
21. **Moderation (interaction):** The effect of an independent variable on a dependent variable depends on the level of a third variable (i.e., the moderator).
22. **Mediation:** The effect of an independent variable on a dependent variable is explained by the change in a third variable (i.e., the mediator).

**[How to measure psychological constructs?]**

**Measurement**

- Theoretical construct → Observations → Values/Scores (measure or operational variable)
- How people score on a measure should reflect the theoretical construct
- Construct *validity*: Does it measure what it is supposed to measure?
- *Reliability*: Is the construct measured reliably, that is, without too much error?

[Figure: four pictures illustrating validity and reliability]

- Picture 1: not valid, as it only measures some part; it never measures what it should measure
- Picture 2: valid, as it measures on average what it should measure
- Picture 3: reliable, as it is always in the same spot, but not valid, as it only measures that one spot
- Picture 4: reliable and valid

**Types of measure**

- There are different ways of measuring a psychological construct. Choosing an appropriate measure is an important aspect of the final assignment.
  - Self-report
  - Other-report
  - Physiological
  - Implicit measures
  - Behavioural
- Now (today): self-report and other-report
- Focus on practical recommendations
- Some of this might sound familiar
  - In this 'advanced' course we expect understanding, insight, and the ability to apply

**Self-report**

What is a self-report measure?

- You ask participants, for example, how they feel

Why use a self-report measure?

- Easier
- Introspection

Examples of statements that can be used as self-report measures:

- I try to restrict my consumption of meat
- I believe COVID-19 is real
- Generally, I like working on course assignments

**Self-report question wording**

- Easy to understand
- Unambiguous
  - *I identify with my groups* (unspecific)
  - *I work hard* (not objective, unspecific)
- Avoid double-barrelled questions
  - *Do you think that parents and teachers should teach 12-year-olds about birth control options?*

**Type of question**

- Open versus closed questions
  - **Open**: open answer, respondent is free to write whatever they want
  - **Closed**: respondent chooses from a restricted number of options
- *'What is the most important problem you face at work?'*
- *'Thinking about the past 7 days, what has made you happy?'*
- *'What was the last behaviour that you consciously changed for environmental reasons?'*
- Open questions can lead to better, more valid responses
- But it takes time to code the responses to open questions
- Open questions are not used often but can be very helpful
- Less of a concern if a well-validated closed question/scale exists

**Response options**

- Statements with agree-disagree response options are widely used
  - *The government should make wages more equal (1=Disagree, 5=Agree)*
  - *Same-sex relationships should have the same legal status as heterosexual relationships (1=Disagree, 5=Agree)*
- In many cases it is better to use response options that match the question
  - *I am angry; I have high self-esteem (1=Disagree, 5=Agree)*
  - *How angry are you? (0=Not at all, 5=Very angry); My self-esteem is (very low, low, moderate, high, very high)* (this is the better option)
- Respondents prefer all response options to be labeled
- How many response options?
  - *5 for unipolar scale*
  - *7 for bipolar scale*
  - But this depends on the specific question and on the target respondents

**Creating scales for multiple items**

- Measuring a construct with a single item/statement is often undesirable
- Reliability is assessed with Cronbach's alpha, or with omega
  - Reliability increases with the number of items and with the correlation between items
- Number of items?
  - Bandwidth versus fidelity problem
  - A broad construct requires many diverse items
  - Diversity in item content leads to low fidelity/reliability (lower correlation between items)
  - It is artificial to repeat the same question in a slightly different way to increase alpha
- If you want to measure a broad construct, then item content should reflect this
  - For example, extraversion includes sociability, assertiveness, and talkativeness
  - So can you measure extraversion with five items about how much people like parties?
Of course liking parties is relevant, but it is only a small part of the construct
  - These five items lead to a scale with high reliability but low validity, because only one aspect of extraversion has been assessed
- Acquiescence: some people tend to agree with most statements
  - Solution: include both forward-scored and reverse-scored items
    - *I like to work in a group*
    - *I do NOT like to work in a group*
  - New problem: the negation in a reverse-scored item is often overlooked by respondents
    - Good practice to highlight the negation (e.g., underline)
  - Sometimes different substantive meaning
    - *I like my boss*
    - *I dislike my boss*
    - (maybe I just don't care about my boss, or maybe I sometimes like and sometimes dislike her, and these are not completely dependent on each other)
  - Important to consider when constructing a scale

**Construct validity of self-report scales**

- **Content validity:** Are items representative of the construct? (e.g., MC exam)
- **Structural validity:** Is the factor structure (dimensions) consistent with the structure of the theoretical construct? (e.g., personality measure)
- **Generalizability:** The measure works in different contexts and for different populations
- Convergent and discriminant validity
  - Correlation with related constructs
  - No correlation with unrelated constructs
- Method variance
  - When two constructs are measured with the same method (e.g., self-report, maybe even with structurally similar items), this inflates their correlation
  - E.g., the relation between self-esteem and health when both are self-reported
  - Due to a real relation or to common method variance?
  - Avoiding common method variance makes for a stronger design

**Construct validity**

- Assessing construct validity can be a long and complex process

**[What are potential problems of self-report measures?]**

**What are the problems of self-report methods?**

- Answers do not only reflect the underlying attitude
  - Social desirability bias
  - Response bias
    - e.g., acquiescence response style: respond positively ("true" or "yes")
  - Questions are misinterpreted or interpreted differently by different subjects
  - Survey fatigue
  - Low introspective ability (people do not know their own mental state)

Concern about construct validity of self-report methods

- Many influences other than item content; it is not clear what precisely is measured
- Solution?
  - Observation by researcher
  - Others report on target individual (e.g., co-worker)
  - The target completes a non-self-report measure (see Meeting 2)

**Observation**

- Observation of individuals
  - Natural environment
  - Lab setting
- Observation can be *unobtrusive*
  - Does not interfere with the behaviour that is being observed
  - People are not aware of being observed
- The drawbacks of observation are
  - It can take a lot of time
  - You can only observe behaviour that is observable!
- Archival research (not really observation, but similar)
  - E.g., public records of legal decisions to study discrimination

**Other-ratings**

- Other-ratings/observer-ratings are frequently used in occupational psychosocial risk assessment
  - Moderate to high agreement between workers' self-rated (*N* = 669) and observer-rated occupational psychosocial demands
  - Observer-ratings by occupational safety and health (OSH) committees
  - OSH experts: e.g., occupational health physicians, health & safety experts, industrial & organizational psychologists
  - Both methods are useful assessment strategies in the context of psychosocial risk assessment of adverse working conditions
- ...may bring other biases
  - **Halo:** inappropriate generalization from one aspect of a worker (e.g., an outstanding trait) to other aspects
  - **Contrast effects:** tendency to evaluate a person relative to other persons in the team
  - **Liking bias:** giving more positive feedback to persons the rater likes
  - **Rater motivation, emotions, intentions**
    - Processes that might lead raters to intentionally or unintentionally provide inaccurate ratings (e.g., role of power motive, trust, threat perceptions; Schmitt et al., under review; Urbach & Fay, 2018)
- A lot depends on the relation between the rater and the target
  - Other-ratings (i.e., colleagues, supervisors) of organizational citizenship behavior (OCB) could lead to underestimation → some OCBs may be difficult to observe (Ilies et al., 2009), e.g.,
    - keeping up with what is going on with the organization
    - preventing work-related conflicts with others (courtesy)
  - Only public behaviors are observable
    - Not private behaviors
    - Not (always) mental states
  - People might behave differently in front of others; few others have a complete view of who we are
    - In front of supervisors, e.g., impression management (Spector, 1994; 2019)
    - Different friends can bring out different aspects of who we are

**Multi-rater example: 360° Feedback**

- Feedback regarding an employee's behavior from a variety of points of view
- Used for personnel development, performance appraisal, decisions (e.g., compensation)
- Multi-rater assessments may generate diverging feedback
  - Self-ratings tend to be higher
  - ***Reactivity, defensiveness, and dissatisfaction*** in response to discrepant evaluations

![page37image65991264](media/image2.jpeg)

**How good are other-reports (summary)?**

- Varying degrees of agreement depending on both the *construct* (e.g., behavior, affect, attitude) and the *source* (e.g., supervisor, external committee)
- Other-ratings (i.e., colleagues, supervisors) could lead to underestimation. Some states are difficult to observe, e.g., keeping up with what is going on with the organization
- Only public, not private, behaviors are observable for others
- People might behave differently in front of (different) others
- Other-ratings may also introduce other biases, so be aware of those when you use them in your study design
- Other-ratings fail to capture individuals' subjective experience and perception
- Some constructs are by definition perceptual in nature → self-report (values, attitudes, affect)

**Different research designs**

- Experience with designing correlational (e.g., cross-sectional) and experimental designs. What was easier, and why?
- What advantages/disadvantages can you name for the different designs?
- Correlational designs: describe social reality (not unimportant!)
- Experimental designs: test causal relationships
- Choice depends on research question
  - *"Which groups in society are not following coronavirus measures and why?"*: correlational
  - *"Do social norms affect people's willingness to get tested and self-quarantine?"*: experiment (manipulate social norms)

**Why experiments?**

- Random assignment eliminates systematic individual differences between the participants in different conditions
- Important for theory and for effective interventions/policy
- Poll: who did an experiment for their bachelor thesis?
- Disadvantages?
- (This part on experiments is loosely based on the Smith (2014) chapter, but the chapter is more advanced and is not required knowledge)

**Manipulation**

- Construct validity of the manipulation: does it manipulate (only) what you wanted to manipulate?
- Control condition
  - Everything equal apart from the crucial element of the manipulation
  - If you are unsure about the best control condition and have plenty of resources, you can include more than one control condition
- Manipulation check
  - Checks whether the manipulation had the intended effect
  - For example, self-affirmation: check how positively participants view themselves after self-affirmation vs. control
  - Often better to validate a manipulation in a separate study
  - For the final assignment, you need to include a manipulation check in the main study
- Avoid attrition (= participants dropping out), even if equal between conditions
- Avoid demand effects
  - Demand effects = people guess from the study materials what the research question is about and try to help the researcher by giving the 'right answer'
  - Participants guess the research question or hypothesis and change their behavior
  - E.g., evaluate job candidates with a typical Dutch versus Turkish name
  - A solution can be to construct a good cover story so that participants are not aware of what you are actually interested in
  - Ideally, the experimenter is not aware of the experimental condition
    - Then they also cannot signal it, not even unconsciously
  - *Example: priming and walking slowly*
    - *Bargh, Chen, & Burrows (1996): priming participants with the 'old' stereotype makes them walk more slowly after the experiment*
    - *Doyen, Klein et al. (2012): this only occurs when the experimenter is expecting this to happen!*
    - *The Bargh findings might have been due to experimenters being aware of the condition participants were in, and unconsciously giving subtle cues to participants about being slow/fast*
- Avoid demand effects
  - At the end of a study on environmental values and behavior, ask participants to throw away a cup, where they have to choose between the general bin and recycling
  - At the end of a study on health attitudes and behavior, ask participants to choose between an apple and a candy bar
  - These are too obvious for participants
- Avoid demand effects
  - Ask people to imagine a situation
  - Here people simulate how they think they would respond
  - This might be different from their actual reaction if the situation were real
- Extremity of manipulation
  - Ecological: what exists in the real world
    - Good for generalizability (external validity)
  - Preference for strong manipulation
    - Has large effect sizes
    - 'Start manipulation strong and you can't go wrong'; it only goes wrong when the manipulation is so strong that it becomes unrealistic
    - Statistical power is higher (more likely to find an effect if it exists)
    - When interested in theory and causality, it is often simply important to show that the effect exists
    - Of course participants still need to believe it
      - Psychological realism: the experiment is psychologically meaningful, it engages participants
- Some things are difficult to manipulate, or it is difficult to mimic the strength of real-world factors
  - *Exposure to violence on TV, videogames, media*
    - How can an experiment reflect the effect of years of exposure?
  - *Background of immigrants*
    - Are higher educated natives more negative towards higher educated immigrants due to competition for resources?
    - Can ask for reactions towards an immigrant with or without a degree
    - Can ask for reactions towards 'higher educated' or 'lower educated' immigrants
    - But if opposition to higher educated immigrants results from years of local exposure to higher educated immigrants and the experience of threat, this is very difficult to manipulate
    - Sometimes a manipulation cannot achieve as much as you would want it to
- Experimental material should be a representative sample from all possible relevant material
  - *You want to study the effects of having a female versus male teacher*
    - Same content delivered by one female and one male teacher
    - Effect of teacher gender might be due to any of the other differences between the male and the female teacher
    - There are many **confounding** factors (e.g., attractiveness)
    - Solution is to use a larger sample of different female and male teachers
  - Same principle holds for all manipulations
    - Need to vary the specific content of the manipulation; use a sample of stimuli rather than just one
    - Conceptual replication (replication with different stimuli and measures that are conceptually equivalent)

**Experimental design: crossed factors**

Crossed: several factors, each combination of conditions exists

- 2 x 2 is the simplest example
- To control for an important source of variance
  - E.g., study on preferences for lower or higher educated political candidates
  - Created fake profiles of politicians
  - Interested in effect of candidate education
  - Need to add candidates' political opinion, as this will be an important source of variance in preferences
  - Solution is to cross candidate education with candidate political opinion
- For theoretical reasons (see next slides)
  - Dual processing and attitude change
  - Effect of argument quality and argument source on persuasion depends on involvement/motivation of participants

**Dual processing and attitude change**

[Figure: dual processing and attitude change]
- In systematic processing, whether persuasion occurs is decided by the arguments
- When there is no processing capacity, a heuristic is used
- Petty, Cacioppo & Goldman (1981) investigated whether they could persuade university students to support changes to exam policy
- Personal involvement
  - Low: to be implemented in 10 years
  - High: to be implemented next year
- Argument quality
  - Weak
  - Strong

![Diagram](media/image4.png)

- Source (= heuristic cue)
  - Expert
  - Non-expert

[Figure: diagram]

**Experimental design: confounding factors**

Confounding factors

- Effect is caused by the manipulation, but by which aspect of the manipulation?
- Sometimes conditions differ in more aspects than only the key aspect in which they were supposed to differ
- Or, a manipulation might have effects in addition to the effect on the key construct it was supposed to affect
- *Example: male versus female teacher; gender was confounded with all other differences between the two teachers*
- *Example: educational background of politicians*
  - Manipulate the educational background of fictional political candidates to see whether higher or lower educated politicians are preferred
  - Educational background might be confounded with perceived candidate competence
  - Solution: crossed design with independent manipulations of education and competence

**Between vs within designs**

- **Between participants:** each participant is in one condition
- **Within participants:** each participant is in all conditions
- (Mixed design: combination of between and within)
- Advantages of a within design
  - Higher statistical power (= more likely to find an effect if the effect indeed exists in the population)
  - Smaller sample needed
- Disadvantages of a within design
  - Demand effects: participants experience all conditions and are more likely to start thinking about the purpose of the study
  - Carryover effects (e.g., emotion manipulation)
  - Important to counterbalance: systematically vary the order of conditions (and check for order effects)

**Controlling for covariates?**

- Controlling might make the design more sensitive, because uninteresting variance is explained away by the covariate
- Even with randomisation, groups can differ in relevant characteristics
- Should we control for covariates?
  - Maybe, but only when the covariate is
    - Theoretically relevant
    - Specified beforehand (preregistration)
    - Measured before the manipulation

**Randomisation**

- Simple randomisation
  - Each participant is randomly allocated to a condition
- Block randomisation
  - Controlling for a nuisance factor, a covariate you are not interested in
  - Not interested in participant gender, but important to control for it
  - Can add as covariate (see previous slide), but better to use block randomisation
  - Separate men and women, and do randomisation within each group
  - Result: conditions will be balanced in terms of participant gender
  - Given the independence (correlation = zero) between intervention and gender, gender is automatically controlled for

**Quasi-experiment**

- No random allocation of participants to conditions
- Intervention in one company but not another, because only one company wanted to allow it
- Lots of potential confounds: all differences between companies
- You can measure important covariates and control for them in the analyses, but this design can never give the same information as a true, randomized experiment

**[Week 1 -- Lecture 2]**

**[Beyond self-report: Measures of attitudes and other mental states]**

How to measure attitudes/values/motivation?
-------------------------------------------

What is the easiest way to measure people's attitudes/values/motivations?

- Ask them!

Why does this not always work?
- People cannot verbalize psychological processes
- People do not want to verbalize psychological processes
  - Social desirability bias

Different types of measures

- Explicit versus implicit
- Attitudes
- Values
- Motivation
- Goals

The example of attitude measures
--------------------------------

Types of attitude measures
==========================

- Explicit
  - Direct
  - Self-report
- Indirect
  - Implicit (more effortful)
- → Implicit measures were developed to bypass social desirability
- → Social desirability is not the only difference

Explicit, direct, self-report
=============================

- Self-report: Participants control the outcome of the measurement procedure
- Direct: Just ask participants what their attitude is
- Traditional questionnaire measures
  - *"I don't like old people"*

Indirect
========

Attitude is inferred from participant's responses or behavior

- *Seating distance from older person*
- *Discriminating against an older person in a hiring procedure (e.g., giving people a CV to evaluate)*
- *Facial expression in natural interaction with older person*

Are these measuring attitudes or behaviours?
Implicit
--------

[Priming]

- **Affective priming task**
  - *Task is to decide whether a word is positive or negative*
  - *Preceded by a prime related to 'young' or 'old'*
  - *Response facilitation*
  - *Relying on response time*
  - *The prime influences **speed** of response to related stimuli*
  - *Implicit reaction times (faster/slower responses)*
  - *Participants are often aware of the prime-target relationship, but can't control automatic responses*
- **Affect Misattribution Procedure**
  - *Based on **evaluation** of a neutral target stimulus*
  - *Judge whether a neutral stimulus is pleasant/unpleasant*
  - *The prime influences **judgments** about unrelated stimuli*
  - *Participants are unaware of the prime's influence on their judgments*
  - *Especially useful for measuring attitudes that people might not want to consciously disclose (e.g., racial or political attitudes)*
  - *Not relying on response time*

[Response competition tasks]

- **Implicit Association Test (IAT)**
  - *Two pairs of (related?) concepts to be categorized left or right*
  - *For example, good-bad and old-young*
  - *Task with many different trials (first trial: good and bad; second trial: Coke and Pepsi; if Coke and good are on the same button and you like Coke, it should be easy for you)*
  - *Judgement whether it is one category or another*

  ![Test Your Implicit Bias - Implicit Association Test (IAT) - Loyola Marymount University](media/image6.png)

- **Go/No-go Association Test (GNAT)**
  - Participants are asked to "go" (respond) by pressing a key when a target category (e.g., images or words representing a group or concept) appears along with a positive or negative attribute (e.g., "pleasant" or "unpleasant").
  - They must "no-go" (withhold response) when non-target stimuli appear.
  - Faster and more accurate responses for certain pairings (e.g., "Go" for a target group and positive words) suggest a stronger implicit association between the concepts.

These are indirect too!
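
The response-time tasks above infer an association from how much faster participants respond in one pairing than in the other. The Python sketch below illustrates only that basic logic: it divides the difference in mean response time between two blocks by the standard deviation of all trials. It is a simplified illustration under stated assumptions, not the official scoring algorithm of the IAT or of any other published task. The function name `latency_difference_score` and the response times are hypothetical and made up for this example.

```python
from statistics import mean, stdev


def latency_difference_score(compatible_ms, incompatible_ms):
    """Standardised latency difference for one participant.

    Positive values mean slower responses in the 'incompatible' block,
    i.e. the pairing that conflicts with the participant's association.
    Simplified illustration only, not a published scoring procedure.
    """
    if len(compatible_ms) < 2 or len(incompatible_ms) < 2:
        raise ValueError("need at least two trials per block")
    # Standard deviation across all trials of both blocks
    pooled_sd = stdev(list(compatible_ms) + list(incompatible_ms))
    if pooled_sd == 0:
        raise ValueError("no variation in response times")
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd


# Made-up response times in milliseconds (illustration only, not real data)
compatible = [600, 650, 620, 630]
incompatible = [800, 780, 820, 760]

score = latency_difference_score(compatible, incompatible)
print(round(score, 2))
```

Dividing by the standard deviation puts participants with generally slow or variable responses on a comparable scale. Real scoring procedures add further steps, such as handling error trials and extreme latencies, that this sketch leaves out.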
Awareness
=========

- Implicit measures have often been claimed to assess unconscious attitudes
- No evidence for unawareness
- Recent evidence for some level of awareness (Hahn, Judd, Hirsch, & Blair, 2013)

Controllability
===============

- Related to the issue of social desirability
- Much evidence that naïve participants cannot control the measurement outcome
- However, informed participants can control or even fake their responses on measures like the IAT and the AMP

Attitude object
===============

- Self-report measures: anything
- Implicit measures: the object must be simple if it is to be represented by a word or picture
- Difficult to compare a self-reported attitude to a complex object with an implicitly measured attitude to a simple object
  - *For example: symbolic racism (~ideology) versus Black-White evaluative priming*
- Importance of **structural fit**

Links between different measures and a behavior (Gawronski & De Houwer, 2014)
=============================================================================

[Figure: diagram]

![Diagram](media/image8.png)

- Simple Association Pattern: only one measure adds to the prediction of behavior
- Additive Pattern: both types of measure add to the prediction of behavior
- Double Dissociation Pattern: the explicit measure captures a different aspect of behavior than the implicit measure
- Moderation Pattern: both add to behavior, but a moderator also strengthens or weakens the relationship
- Multiplicative Pattern: both measures combine to predict behavior

**The link between explicit and implicit measures of attitudes**

Correlations between the two types of measures are small to medium

Interpretations of this finding:

- They could measure different things
- They might matter in different situations → finding the moderators

**Stability of individual differences on implicit measures**

Gawronski, Morrison, Phills, & Galdi (2016)

- Hypothesis: Individual differences on implicit measures are more stable over time than individual differences on explicit measures
  - Variance in measures can be caused by the situation, while the rank order of individuals stays the same
- Longitudinal studies using the IAT and the Affect Misattribution Procedure (AMP) (as well as explicit measures), with a two-month interval between measurements
- Implicit measures turned out to be less (!) stable over time than explicit measures
- Explanation: Implicit measures rely on memory associations that can be triggered by situational factors. For explicit measures, potential conflicts between such associations and a person's beliefs can be deliberately resolved → resulting in more stability for explicit measures

**[Values]**

**Values**

- Definition: A general predisposition towards something, not an attitude toward a specific object

Self-report

- Values measures (e.g., Schwartz) are often very abstract
  - *"How important is...
as a life-guiding principle"*
  - *"Achievement, that is, success, capability, ambition, and influence on people and events"*
  - *"Power, that is, social power, authority, wealth"*
- There are variations of this type of measure, for example to focus on biospheric values (Van der Werff, Steg, & Keizer, 2013)
- When you just ask people on the street, these questions are often hard for them to answer

**More indirect measures of value(s)**

- In consumer/economic psychology (and economics in general), value is oftentimes taken quite literally:
  - *"How much would you be willing to pay for...?"*
- But: monetary value is not necessarily equal to the psychological value construct
- More indirect measures of the psychological value construct:
  - Decision scenarios in which the response options correspond to different types of values (Mumford et al., next slide)

***Example from Mumford et al. (2002)***

*1. What is most important to you when considering what to do? (Circle 3)*

- *a. Risking your reputation with family and friends you see every day.*
- *b. Keeping your job and supporting your family.*
- *c. Keeping your spouse's respect.*
- *d. Keeping your job and position within the organization.*
- *e. Being proud of the job you do.*
- *f. Doing what is right.*
- *g. Having others see you as a moral and upstanding person.*
- *h. Having the clients respect you.*

A coding manual specifies which options correspond to which type of value.

**[Motivation]**

**What do you want to measure?**

A specific goal (in a given situation):

- An end state that an organism has not yet attained (and is focused on attaining in the future) and that the organism is committed to approach or avoid. (Moskowitz, 2012)
- *E.g., you want to end the course with a specific grade*

A motive:

- A class of incentives that are intuitively attractive to a person (McClelland, 1985)
- Power, affiliation, intimacy, achievement
- Motives can be the source of goals
- *E.g.
you want to be an active student during the course*

**Measuring goals**

Self-report:

- Just directly ask questions about whether people have a certain goal
- *E.g., ask which grade someone wants*

Directly observe the behavior:

- Underlying logic: if a behavior is present or absent, a goal was activated (or not)
- Or, in a more organizational setting: measure the actual performance

Indirect measures:

- Approach and avoidance behavior (e.g., pulling/pushing a joystick)
- Response time tasks for goal accessibility (e.g., goal-related words or images as stimuli)
- Persistence in a task (e.g., how long people work on very difficult tasks)
- Picking up a task after an interruption

**Motives**

Explicit measures

- Questionnaires, oftentimes specific to one motive (e.g., achievement or power)

**Achievement motivation (Elliot & Church, 1997)**

[Figure: screenshot of a document]

**Implicit motives**

Nonconscious motivational needs that orient, select and energize behavior (McClelland, 1987)

Power, achievement, affiliation most commonly assessed

Methods used to assess these motives are usually indirect measures:

- Most commonly used: projective measures
  - TAT (Thematic Apperception Test): Participants are shown ambiguous pictures and asked to tell a story about what's happening
  - Picture story exercise: A variation of the TAT, where participants are asked to write stories in response to pictures.
  - Multimotive grid measure: Combines projective storytelling with self-report methods.

Picture Story exercise

1. What is happening? Who are the people?
2. What has led up to this situation? That is, what has happened in the past?
3. What is being thought? What is wanted? By whom?
4. What will happen? What will be done?

**Example (taken from Fodor, 2010)**

The picture shows a man and a woman performing a trapeze act.

- *The people are Superwoman and her arch nemesis Rhinosorous (sic) Man. She is finally going to catch him and rid the world of his evil ways.
Up to this point, Rhino Man has been terrorizing the world in his attempt to gain ultimate control. He wants to make every person his servant. Superwoman wants to rid the world of his evil ways. The only way she can do this is to knock him into the next galaxy. The people of the world will be forever grateful for her services. For her efforts she will be made Honorary Princess of the World. She can travel anywhere and do anything she wants all over the world.*

→ Coding into categories

**Links between implicit motives and other variables**

Schultheiss, Campbell, & McClelland (1999):

- Individuals high only in personalized power motives had elevated testosterone after imagining a success in a subsequent dominance contest and continued to have high testosterone levels after actually winning, but not after losing, the contest.

**[Physiological measures]**

**Physiological measures (see Blascovich, 2014 for an overview)**

- Heart rate & blood pressure → challenge and threat
- EMG → detect muscle movements to infer affect/emotions
- Startle eyeblink → enhanced during negative affect/inhibited during positive affect
- Cortisol level → stress
- Brain-imaging techniques → activation/co-activation of regions in the brain

Do Physiological Indexes Provide the "Gold Standard" for Psychological Measurement?
===================================================================================

Blascovich (2014):

- *"That physiological measures enjoy intrinsic superiority over other types is a naïve notion that perhaps stems from the mystique of physiological measurement \[...\]. Rather, physiological indexes provide a third set of measurement methods in addition to subjective and behavioral ones.
Together with the other types of measures, physiological ones add to the power of multimethod triangulation."*

Mixed-method studies
====================

- A combination of different methods
- It's important to make sure that there is sufficient evidence that the various measures actually measure what they are supposed to measure; otherwise convergent/divergent results become uninterpretable
- The methods should also not interfere with each other
  - *Example: An explicit measure of sexism might well already give away what you are planning to measure. An implicit measure might therefore be completed differently by informed participants.*
- Keep in mind that the relations between two measures and the dependent variable(s), which we discussed in detail for attitudes, are also important here, since these patterns may emerge whenever you use different types of measures in the same study.

**How to create appropriate stimuli for a study**

- *How do Dutch participants interact with refugees from Muslim vs. non-Muslim countries?*
- How would you set up a study answering this question?
- Oftentimes, a study design would be chosen in which participants interact with one specific person of each category
  - → even though participants are randomized, the stimulus isn't
- Issues arise in terms of generalizability, but also in terms of construct validity

**Stimulus sampling**

- Many methods and designs compare the responses of a certain group of participants to a certain set of stimuli
- Example: *Do people in a happy vs. sad mood prefer sweet over non-sweet drinks?*
- ![Figure](media/image10.png)
- What if all of the potential findings are just produced by a preference for a specific drink?
- Averaging across stimuli may conceal differences between the stimuli
- Stimulus sampling is especially necessary if individuals are aware of variations of instances within a category (e.g.
using specific exemplars of a group)
- If possible, stimuli should be pretested in order to avoid systematic but unintended variations
- A very accessible description of the issue: Wells & Windschitl (1999)
- Statistical methods exist to deal with the issue:
  - Judd, Westfall, & Kenny (2012)

**Qualitative methods**

- Examples:
  - Interviews
  - Focus groups
  - Historical analyses
  - Discourse analyses

Exercise on creating more indirect measures
-------------------------------------------

Think of an implicit and an explicit measure of these concepts:

[Lying about environmentally friendly behavior]

**1. Explicit Measure:**

- A self-report questionnaire could be used. For instance, participants could be directly asked about their environmentally friendly behaviors and attitudes, such as:
  - "How often do you engage in recycling?"
  - "How frequently do you make efforts to reduce your energy consumption?"

**2. Implicit Measure:**

- An Implicit Association Test (IAT) could be designed to assess the automatic associations people have between themselves and environmentally friendly behaviors. In this case, the test would measure the speed and accuracy with which participants associate words related to **themselves** (e.g., "I," "me") with **environmentally friendly behaviors** (e.g., "recycling," "sustainable") versus **environmentally harmful behaviors** (e.g., "littering," "wasteful"). If participants are faster to associate themselves with harmful behaviors, this might indicate an implicit truth about their actual behavior, even if their explicit responses suggest otherwise.
- This IAT would serve as an **implicit measure** by tapping into unconscious attitudes or behaviors, revealing discrepancies between actual and claimed environmental behavior.
[Sad feelings in response to the end of a friendship]

- Showing a movie about such a situation and observing participants' behavior

**[Advanced Research Methods in Social and Organizational Psychology (PSMSM-1)]**

**[Week 2 - Session 3: Longitudinal Research]**

**[Understand what longitudinal research is and the suitable research questions for longitudinal designs]**

**1. What is Longitudinal Research**

- Research emphasizing the study of change, for which data are collected
  - from the same individual
  - at three or more points in time
  - on at least one of the constructs of interest
  - using the same instruments

**Wave**: one application of the study (one measurement occasion)

Are two measures not sufficient?

- **Two** measures are not sufficient
  - Cannot determine the form/shape of change over time (Rogosa, 1995)
  - Difficult to differentiate true change from measurement error
  - More than two measures increase reliability
  - But two measures are useful if temporal processes are not part of the research question

**Longitudinal Research vs. Cross-Sectional Studies**

- What are the differences between them?
  - Cross-sectional is only one wave; longitudinal is 3+ waves
  - Cross-sectional is mostly self-reported

**Cross-Sectional surveys**

**Apt to** (suitable for):

- Validate between-person measures
- Study the structure underlying constructs
- Examine how two or more variables interact to explain the criterion variable (moderation)

**Insufficient to:**

- study almost everything else
- E.g., capture within-person processes or indirect links

**[2. To explain when a longitudinal design is useful]**

**2. Why Longitudinal Design?**

**If you are interested in\...**

1\. study change, growth, development over time.

- E.g., How does life satisfaction change over the life course?

2\. understand how previous experiences affect later processes or outcomes

- E.g., How do experiences at university predict career success?

3\. understand the temporal order

4\.
separate between- and within-person effects

- E.g., People who chronically experience more stress exercise less (between-person)
- On days when person A experiences more stress than they usually do, they exercise less (within-person)

5\. limit common method bias: answering all questions in the same questionnaire at the same time can lead to similarly formatted responses

- responses look more alike than they really are
- there may also be a response shift: people start to think differently than they did before
- e.g. they notice that they answered rather extremely before and realize that they need to move a bit more to the middle
- we assume that people have the same understanding of the items over time, but that may also change

**[3. the different types of longitudinal designs]**

**3. Research Questions and Types of Longitudinal Design**

- Experimental and quasi-experimental interventions → very strong methodology
  - disadvantage = external validity
  - most rigorous way of doing a study
- Diary/experience sampling studies/intensive longitudinal design
  - several surveys per day or week
- Panel studies
  - long periods of time between the waves

**Experimental and quasi-experimental interventions**

- Goal: effect of intervention on outcome
- Comparison of change in dependent variable(s) between experimental and control groups
- Data collected from the same individuals
  - Pre-/post-intervention (most used; ideally with a control condition → strong causal inference)
  - Multiple post-measures to investigate process and durability
  - During intervention

**Example: Mindfulness Intervention**

- Mindfulness intervention (Hülsheger et al., 2015)
- Intervention:
  - Daily guided mindfulness exercises (4 types in total)
  - [Figure not available]
- Groups:
  - Intervention group: 67
  - Control: 73
- Measurements:
  - Trait mindfulness, psychological detachment, sleep quality and duration
- Results: ![Figure](media/image12.png)

**Diary Studies/Intensive Longitudinal Design**

- "Psychology needs to concern itself with life as it is lived." \~Gordon Allport, 1942
- Goal:
  - to understand the "particulars of life": detecting within-person processes while ruling out between-person differences
- Data collected on multiple days/weeks/events from the same individuals
  - Participants receive a signal to complete questionnaires
  - 1 versus multiple measures per time unit

**Experience sampling methods**

- Week-level studies (one self-report per week)
  - e.g. ask participants at the end of the work week what happened during the week and how they are feeling
  - helpful when studying rare positive/negative events, e.g. a yearly performance appraisal
  - good for capturing a long period of time
- Daily diary studies (one self-report per day)
  - almost impossible to run over a whole year
  - e.g. how did you perceive today's stress?
- Experience sampling studies (several self-reports per day); the term is mostly used as an umbrella term
- Ecological momentary assessment (EMA)
  - focused on subjective, self-reported data (e.g., mood, stress, behaviors)
- Ambulatory assessment (often incl. physiological data)
  - includes both subjective reports and **physiological** measures
- Ecological momentary interventions (EMI)
  - e.g. reminders to drink or to move; mindfulness training

**Example: Week-level study**

[Figure not available]

**Diary Studies/Intensive Longitudinal Design**

- The strengths of intensive longitudinal designs are that they\...
  - Document that certain processes actually occur in everyday life
  - Examine how varying contexts/situations influence daily affect, cognition, and behavior
  - Minimize retrospective bias: distortion that occurs when people recall and report past events
  - Provide daily process data that complement macro-level longitudinal designs

**Example: daily diary study**

- Pindek et al.
(2021)
- Reactivity to incivility across the working week
- 139 participants
- 681 self-reports
- Predictors: day of week, incivility, organizational constraints
- Criterion variable: job satisfaction
- Reactivity to incivility across the working week: ![Figure](media/image14.png)
  - Result: at the start of the work week there is a strong negative link between incivility and job satisfaction; towards the end of the week the link is less negative → people are happy that the work week is over

**Example: ESM -- Casper & Wehrt (2022)**

- Indirect links between psychological detachment during off-job time and appraisal of the job
- Three surveys per day: afternoon, evening, and morning
- Across two consecutive working weeks
- 183 participants
- 1,220 daily self-reports

[Figure not available]

**Example: EMA -- Kosenkranius et al.**

- 8 self-reports of human energy per day across 4 days
  - Every two hours from 8.00 to 22.00
- 4 self-reports of needs-based crafting per day
  - Every four hours from 8.00 to 22.00
- 110 participants
- 396 days
- 2,358 self-reports
- Predictors: time and needs-based crafting
- Criterion variable: human energy
- Trajectories of human energy within the day: ![Figure](media/image16.png)

**Panel Studies**

- Goal: analyse the direction of effects
- Data collected repeatedly from the same individual over a longer period of time
  - Months, years, etc.
  - Sometimes: observe outcomes or events during the study period, such as graduation, transition to retirement, birth of first child, illness, pandemic
- There are three different types of panel studies
  - **Cohort panel surveys**: track the same group of individuals, often based on a shared characteristic (e.g., birth year), over a period of time to observe changes
  - **Household panel surveys**: longitudinal surveys that collect data from the same households repeatedly over time to study dynamics like income & employment
  - **Rotating panel surveys**: survey designs in which participants are periodically replaced by new respondents, allowing for both longitudinal data on some individuals and cross-sectional data on the overall population

**Panel Studies: Cohort Panel Study**

- Focuses on a population subgroup that experienced the same event during the same period (a cohort)
  - E.g., born in a particular month, graduated from university in a given year, or married during a given year
  - Can study long-term change and individual development processes
    - E.g., transition into adulthood
    - E.g., 1970 British Cohort Study (BCS70): 1975 (age 5), 1980 (age 10) ... 1986 (age 16), 1996 (age 26) ... 2004/2005 (age 34/35)

**Panel Studies: Household Panel Survey**

- A household panel has an indefinite life and is set up to study individual and household changes
- Well-known examples:
  - German Socio-Economic Panel (SOEP), U.S.
Panel Study of Income Dynamics (PSID)

**Example 1: Social Problems**

- Example 1: Concerns about social problems decline with age (Chow et al., 2018)
- Data drawn from the Edmonton Transition Study (ETS), which began in 1985 and continued for 25 years

**Example 2: Well-being and Covid-19**

- Example 2: Well-being and Covid-19 (Zacher & Rudolph, 2020)
- Assumptions:
  - Quarantine, physical distancing, and isolation lead to feelings of uncertainty and loneliness
  - People experience more stressors such as job insecurity, work-family conflict, discrimination
  - *Hypothesis*: well-being decreases during the early stages of the COVID-19 pandemic

[Figure not available]

![Figure](media/image18.png)

- Most of the indicators dropped, even if only for a short period

**Research Questions**

- Two goals when assessing changes over time

1\) Descriptive

- Change process over the study period in one or more variables

2\) Explanatory

- Why did the change process occur the way it did?
- Establish the temporal order of a relationship

**Research Questions: Change**

- Investigate **change** in one or more variable(s)
  - Trajectory is of interest

[Figure not available]

- Main focus is on change in one variable
  - How does self-esteem develop over the lifespan?
  - Does a time management training reduce stress?
- Change in one variable is linked to change in another variable
  - Does a change in workload correlate with a change in burnout?
  - Does an increase in workload predict an increase in burnout?
- Honeymoon-hangover effect: changes in job satisfaction after employees start a position with a new employer
  - Job satisfaction first goes up after changing jobs
  - Afterwards it goes down again

![Figure](media/image20.png)

- Moderators to investigate
  - speed of change or
  - shape of change

[Figure not available]

**Research Questions: Temporal Order**

- Mutual relationships and temporal order
  - Does A predict B and/or does B predict A?
- To establish causality of a relationship...
  - predictor and outcome variables are correlated
  - predictor must precede outcome temporally
  - alternative explanations for the relationship can be ruled out
- Third variables: need to be considered but cannot be ruled out completely
  - Causal statements are therefore not warranted
- Example: Job satisfaction and life satisfaction (Unanue et al., 2017)
- Assumption: Higher job satisfaction predicts higher life satisfaction or vice versa (top-down vs. bottom-up models)
- 275 Chilean working adults participated in both waves of data collection

![Figure](media/image22.png)

- Justice and depression (Lang et al., 2011)
  - Established theoretical assumption: Organizational (in)justice leads to mental health problems
    - H1. There is a time-lagged effect of \[...\] justice perception on depressive symptoms.
  - Employees with mental health problems may perceive more organizational injustice
    - H2. There is a time-lagged effect of depressive symptoms on \[...\] justice perception dimensions.
- Tested assumptions with 625 active soldiers; 6 months between waves

[Figure not available]

**Research Questions: Mediation Models**

- Mediation implies a mechanism/causal chain
- Idea that causes take time to manifest and show effects
  - Mediator can only be caused by values of prior variables
- Also allows for testing of alternative models (e.g., Y → M → X)

![Figure](media/image24.png)

**[4.
To know how to analyze longitudinal data]**

**4. Data Analysis**

**Interested in Predicting Change**

- Outcome changes in response to predictor
  - Workload predicts less green behavior at work
  - Relationship conflicts predict more negative emotions
- Outcome variable is measured at least twice
  - Be mindful of slow versus fast change in the outcome

**Difference Scores**

- Calculate difference scores: t2x − t1x, t3x − t2x, etc.
- Difference scores can be used as
  - predictors
  - outcomes
- Controversial in the literature
  - Concerns about reliability of the difference score
  - Measurement intervals of the scale need to be equal

**Regression Analysis**

- Adding fatigue at T1 as a predictor changes the meaning of fatigue at T2
  - Deviation from fatigue at T1 (more or less fatigue at T2 than at T1)
  - Workload predicts the deviation in fatigue

[Figure not available]

![Figure](media/image26.png)

**Interested in Change Patterns**

- How does a construct evolve over time?
  - Self-esteem after graduation from university
- Does the change pattern look different for different groups?
  - Females vs. males
  - Intervention vs. control group
- Variable of interest is measured on multiple occasions (3 or more)

[Figure not available]

**Repeated Measures ANOVA**

![Figure](media/image28.png)

[Figure not available]

**The Problem of Nested Data**

**Nested data** refers to a data structure in which observations are organized in a hierarchical manner, where units at one level of analysis are contained within units at a higher level

- Observations are nested in participants, similar to
  - Students nested in schools
  - Employees nested in teams

![Figure](media/image30.png)

- Requires use of multilevel analysis or structural equation modeling; the simpler alternatives are problematic:
  - Aggregation (loss of power)
  - Disaggregation (violation of independence)

**Advanced Techniques**

- Does A predict B or B predict A?
- (Random intercept) cross-lagged panel analysis:
  - A method that examines the relationships between variables over time while accounting for individual differences in baseline levels (random intercepts) of the variables.

[Figure not available]

- Is change in A correlated with change in B?
- Latent growth curve models:
  - A statistical approach that models how a variable changes over time by estimating individual growth trajectories based on repeated measures.

![Figure](media/image32.png)

- Does (change in) A predict change in B, or does (change in) B predict change in A?
- Latent change score models:
  - A model that focuses on the amount of change in a variable between time points, allowing for the analysis of how changes in one variable are related to changes in another over time.

[Figure not available]

**Discontinuous growth models**

![Figure](media/image34.png)

- A model that analyzes changes in a variable over time, allowing for abrupt shifts or "discontinuities" in growth patterns, often due to interventions or significant events.
- There are different phases
- Vitality increases during the weekend and drops again on Monday (Blue Monday)

**[5. the challenges of longitudinal designs]**

**5. Issues to Consider**

- Missing cases and values
  - Report how many people participated at each wave
  - Allow for missed waves?
- Dropout analysis
  - Are remaining participants significantly different from participants who dropped out?
  - Dropout → missing data → biased data
- Clarify shape and duration of expected change
  - What is the theoretical reasoning to expect change?
  - Intra- or interindividual change?
  - Choose an appropriate number and spacing of measurement occasions
- Attrition: estimate dropout from first to last measurement
  - Number of measurements
  - Length of questionnaires
  - Compensation and benefits
- How to account for missing data in the analysis?
- Quality of measures
  - Select reliable and valid measures
- Power is affected by number and spacing of measurements and by sample size
  - Increasing sample size increases power more than increasing waves
  - Recommendation: see Gabriel et al. (2019)

**Recommendations**

- Ensure theory specifies longitudinal relationships among constructs
- Determine the optimal number of subjects, repeated measurement occasions, and timing of measurement occasions
- Address the attrition issue (loss of participants from a study over time) before conducting the study
- Understand the analytic method to be employed and its limits

**Conclusions**

- Longitudinal research designs are useful for
  - temporal research questions (development, process)
  - overcoming some limitations of cross-sectional research
- Important to theorize about change
  - Why could we expect change?
- Data collection is resource-intensive for researchers and participants

**[Week 2 - Session 4]**

**[Open science and dealing with data]**

How is the current research/publication process organized?
==========================================================

- Individual researchers/small teams study a specific phenomenon based on a specific theoretical perspective
- Studies are then written up in the form of an empirical journal article, mostly in the form of multiple-study papers (single-study papers are rare nowadays)
- Authors choose a journal to which they submit the article (mostly based on quality of the data, potential readership, quality of the journal...)

Most journals are **peer-reviewed** journals

- Every article is read by a Chief Editor and a Handling Editor
- The Handling Editor sends the paper to 2-4 people working on similar topics (reviewers) and asks them to write a short evaluation of the paper (review)
- The Handling Editor independently reads the paper, then looks at the reviews and decides whether they see a chance for publication
- Most papers are rejected.
- Most other decisions are "revisions", meaning that the authors have to make changes (which can range from simple changes in the text to conducting new studies)
- Papers can then be resubmitted and are sometimes sent out for review again...

What is the critique of the former status quo?
==============================================

- For editors and reviewers, it was difficult to determine whether the results reported were the whole story
  - Were there more studies than the ones reported?
  - Were there other variables?
  - Were the results expected or was the "story" constructed after the data were obtained?
- The reporting guidelines made it very difficult to actually replicate a study, due to the non-availability of materials and the lack of information on effect sizes and some other statistical indicators.
- Studies were often underpowered. There were no clear guidelines for required sample sizes
- Call for more transparency in the research process

Main points by Simmons et al. (2012)
====================================

- False positives are quite common in psychological research.
  - A finding or hypothesis seems to be confirmed even though the data are not strong enough to show it
- They occur due to "researcher degrees of freedom."
  - Researchers themselves have considerable power to influence the outcome of their study
- Guidelines for reviewers and authors.

Suggested guidelines
====================

[Requirements for authors:]

1. Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article.
2. Authors must collect at least 20 observations per cell (e.g. about 80 people with a 2x2 design) or else provide a compelling cost-of-data-collection justification.
3. Authors must list all variables collected in a study.
4. Authors must report all experimental conditions, including failed manipulations.
5. If observations are eliminated, authors must also report the statistical results if those observations are included.
6. If an analysis includes a covariate, authors must report the statistical results of the analysis without the covariate.

[Guidelines for reviewers:]

1. Reviewers should ensure that authors follow the requirements.
2. Reviewers should be more tolerant of imperfections in results.
3. Reviewers should require authors to demonstrate that their results do not hinge on arbitrary analytic decisions (= choices made during the data analysis process that are not based on clear theoretical or empirical justifications but are instead subjective or random).

False positives might not be the only problem.
==============================================

[Figure not available]

Fiedler et al. (2014)
=====================

- False negatives may be a more common problem than false positives.
- The reason for this problem is often vague theories and a lack of strong alternative hypotheses.

Some remedies for the described problems
========================================

- Preregistration of a study (describing the study beforehand) drastically reduces the degrees of freedom for the researcher.
- Data can (and sometimes must) be made available along with publications.
- A stronger focus on replication of existing findings helps determine their validity.
- Increasing statistical power, for example through larger sample sizes that match the expected effect.
- Meeting open science criteria is made visible in some articles (e.g., badges on the front page).

Important concepts for evaluating stability/reliability of findings
===================================================================

- **Reproducibility**
  - When using the same analysis on the same data, the same result should occur.
  - So the analysis should be described
- **Robustness**
  - Slight variations in the analysis may or may not impact the findings.
  - A robust finding = one that is not dependent on these variations.
- **Replicability**
  - Does the outcome occur again when the same study is repeated?

Personal data and privacy
=========================

- Can all research data be made open? NO
- **Personal data:** all data that could potentially be linked to an identifiable individual
  - contrasted with anonymous data
- Identifiable individual?
  - **Direct identifiers**: name, email address
  - **Indirect identifiers**: any information that could lead to identification (e.g., detailed demographics in a small population)
- When you collect personal data, EU privacy law (GDPR) applies
  - Informed consent is required
  - Collect only what is necessary and keep it only as long as it is needed (data minimization)
  - This means: properly anonymize data as soon as possible!
- Sensitive personal data needs stronger protection, including:
  - Health data
  - Genetic or biometric data
  - Religion, race, ethnicity
  - Political beliefs (e.g., union or party membership)

Dealing with data
-----------------

On the one hand: p-hacking
==========================

- Use researcher degrees of freedom to get **p \< .05**.
  - Measure the dependent variable (DV) in multiple ways and report the "best" result.
  - Add participants and monitor the p-value until it drops below .05.
  - Exclude certain groups (e.g., men or women, young or old, fast or slow)
  - Include or exclude outliers depending on which works "better"
  - Include or exclude control variables depending on which works
  - Only report studies with significant effects in the expected direction.
- All of these practices **inflate type I errors** (false positives).

**On the other hand**

- Research practice is messy: there are not always clear solutions for problems
- Even if data are not perfect, they might have some value
- Exploratory research/analysis often contributes to new ideas
- Have you seen examples of researchers dealing with data in questionable ways?
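The p-hacking practice "add participants and monitor the p-value until it drops below .05" can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not an analysis from the lecture: both groups are drawn from the same normal distribution (so every significant result is a false positive), the test is a simple two-sided z-test with known variance, and the sample sizes (start at 20 per group, add 10 at a time, stop at 50) as well as the function names are hypothetical choices.

```python
import math
import random


def two_sided_p(mean_diff, n_per_group, sigma=1.0):
    """Two-sided p-value of a z-test for a difference between two group means.

    Assumes a known standard deviation `sigma` in both groups (a simplification).
    """
    se = sigma * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


def simulate(n_sims=2000, start_n=20, step=10, max_n=50, alpha=0.05, seed=1):
    """Return (false-positive rate with fixed n, false-positive rate with peeking).

    There is no true effect: both groups come from N(0, 1).
    """
    rng = random.Random(seed)
    fixed_hits = 0    # significant at the first, pre-planned look only
    peeking_hits = 0  # significant at any look while adding participants
    for _ in range(n_sims):
        a, b = [], []
        first_look = True
        while len(a) < max_n:
            add = start_n if first_look else step
            a.extend(rng.gauss(0.0, 1.0) for _ in range(add))
            b.extend(rng.gauss(0.0, 1.0) for _ in range(add))
            p = two_sided_p(sum(a) / len(a) - sum(b) / len(b), len(a))
            if first_look and p < alpha:
                fixed_hits += 1
            first_look = False
            if p < alpha:
                peeking_hits += 1
                break  # the p-hacker stops collecting data and reports the result
    return fixed_hits / n_sims, peeking_hits / n_sims


if __name__ == "__main__":
    fixed_rate, peeking_rate = simulate()
    print(f"fixed n = 20 per group:     {fixed_rate:.3f}")
    print(f"peeking at n = 20/30/40/50: {peeking_rate:.3f}")
```

With a stopping rule fixed in advance, the false-positive rate stays close to the nominal 5%; with repeated peeking it is noticeably higher, which is why the Simmons et al. guidelines ask authors to decide on the stopping rule before data collection.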
Four real examples
------------------

- Job threat and attitudes towards immigrants (outliers)
- Gender equality and Olympic medals (control variables)
- Dropout in a longitudinal study (attrition)
- Social Dominance Orientation and attitudes towards different out-groups (scale reliability)
- Each group of 4 students discusses one problem, and later there will be a discussion in the large group

Covariates/Control variables
============================

- Why would you include control variables in an analysis?
  - Correlational: control for confounding variables
    - E.g. the number of firefighters who arrive correlates with the amount of damage (both depend on the size of the fire)
    - Strengthens the design, so should always be included
  - Experimental: filter out variance due to factors unrelated to the manipulation
    - Include covariates that have a relation with the outcome variable

Example: Gender equality and Olympic medals
===========================================

- Olympic medals in 2012/2014
- Gender equality index
- Country-level analysis
- Women's medals: r = .22 (positive)
- Men's medals: r = .24 (positive)
- Gender equality leads to more medals for women *and* men! (Published in *Journal of Experimental Social Psychology*, 2015)
- ![Figure](media/image36.png)

Control variables
=================

- Is gender equality or GDP per capita related to the number of medals?
- [Figure not available]
- What's the causal model? Indirect effects?
- Difficult to say based on these correlations
- The longitudinal relation between equality and GDP is bidirectional

Discussion control variables
============================

- Including control variables can affect results
- In correlational research, control for potential confounding factors
  - See Becker et al. (2015) Statistical control in correlational studies.
Journal of Organizational Behavior
- Do not control for a variable that could be a mediator
- In experimental studies the trend is against the use of control variables
  - Trying out all possible control variables is a form of p-hacking
  - Do not control for variables measured after the manipulation
  - Preregister the use of control variables, including the reason and the causal model

Outliers
========

- What are outliers?
  - Unusual cases
  - Unusual values on the dependent or independent variable, OR
  - Large influence on a statistical test, for example on a regression coefficient
- ![](media/image38.png)

Example: education and prejudice
================================

- Participants are undergraduate students
- 2 by 2 between-subjects experiment
  - High- versus low-skilled immigrants
  - High versus low subjective educational level
    - Participants compare themselves to those with higher or lower education levels than a university bachelor
- Prediction: highest prejudice when
  - Immigrants are highly skilled
  - Own education level is low (= threat, as position in the labour market is not so strong)
  - (Interaction)

Should you exclude the outlier?
===============================

- When is it acceptable to exclude cases?
  - Participant was suspicious about the manipulation?
  - Participant did not remember information about the manipulation?
  - Large influence on results (Cook's distance)?
- In this study, the predicted effect was found only after all these outliers were excluded

Results leaving out outlier
===========================

- Anger towards immigrants
  - Interaction: F(1,56) = 5.434, p = .02
  - Excluding further influential cases strengthens the interaction

Should you exclude the outlier?
===============================

- What did you think?
- Can/should you publish a study if you removed an outlier?
  - Flexibility in data-analytical choices can increase the rate of false positives
  - Is it a reliable effect?
- This case:
  - Follow-up studies have confirmed the effect
  - Excluding the outlier seems to have uncovered a real effect
  - A good way of getting to know your data
  - This one particular study did not allow strong claims
  - Here, excluding outliers led to an interesting research line
- ![](media/image40.png)

Attrition/dropout
=================

- Test for differences between responders and non-responders
  - Within one measurement point/experimental session: complete vs. non-complete
  - Across measurements: no response after one or more completed waves

Attrition
=========

- Example: abusive supervision, rumination, and physical health across 3 measurement points with 6-month time lags
  - T1: 253 (74% response rate)
  - T2: 184 (73% response rate)
  - T3: 182 (99% response rate)
- Tested for differences between
  - Participants who continued versus dropped out
  - Participants who stayed with their supervisor versus changed supervisor

Attrition: Questions
====================

- Why is it relevant to test for differences between responders and non-responders as well as differences between employees who changed their supervisor and those who didn't?
- How would the results and their interpretation be affected if abusive supervision were systematically higher among non-responders?
- How would the results and their interpretation be affected if physical health were systematically lower among responders?
- What did you think?
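The check described above, comparing participants who continued with those who dropped out, can be sketched in a few lines. This is a minimal illustration, not the analysis used in the study: it swaps in a permutation test of the mean difference (chosen because it needs only the standard library), and the scores, variable names, and function name are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(stayers, dropouts, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(stayers) - mean(dropouts))
    pooled = list(stayers) + list(dropouts)
    n_stayers = len(stayers)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the "stayed" / "dropped out" labels
        diff = abs(mean(pooled[:n_stayers]) - mean(pooled[n_stayers:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical T1 abusive-supervision scores (1-5 scale), invented for illustration
t1_stayers = [2.1, 1.8, 2.4, 2.0, 1.9, 2.2, 2.3, 1.7, 2.0, 2.1]
t1_dropouts = [3.4, 3.1, 3.6, 2.9, 3.3, 3.5]

p_value = permutation_p_value(t1_stayers, t1_dropouts)
print(f"mean stayers = {mean(t1_stayers):.2f}, mean dropouts = {mean(t1_dropouts):.2f}")
print(f"permutation p = {p_value:.4f}")
```

A small p-value here would indicate that dropouts already differed from stayers at T1, which is the situation in which later waves may under- or overestimate the true effect.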
Discussion
==========

- Systematic differences between responders and non-responders:
  - Responders and non-responders are systematically different on study variables: potential under- or overestimation of the true effect
  - Participants in difficult situations or with problem behavior often drop out
- Change in participants' situation:
  - Potential underestimation of effects if participants are no longer exposed to the abusive supervisor
- Attrition can lead to loss of power
  - 70% response rate at each of five waves
  - T1: 500; T2: 350; T3: 245; T4: 171; T5: 120
- Attrition can also systematically change the sample composition over time, e.g.,
  - More females
  - More highly educated participants

Scale reliability
=================

- Example: Social Dominance Orientation (SDO) and attitudes towards outgroups
  - SDO is a preference for hierarchy between groups
  - SDO predicts negative attitudes towards many groups
- Hypothesis: SDO predicts negative attitudes towards Turkish Dutch but does not predict attitudes towards the less educated
  - Interaction between SDO and group (Turkish Dutch versus less educated)
- Interaction between SDO and group: p = .06
  - SDO and Turkish Dutch: r = .32\*\*\*
  - SDO and less educated: r = .15
  - Not strong support for the hypothesis because the interaction is not significant
- SDO: 4 items
  - Validated scale but alpha = .65
  - Items:
    1. In setting priorities, we must consider all groups
    2. We should not push for group equality
    3. Group equality should be our ideal
    4. Superior groups should dominate inferior groups
- Items 2 and 3 correlate highly and form a more reliable scale (alpha = .76)
  - Interaction between SDO and group: p = .02
  - SDO and Turkish Dutch: r = .31\*\*
  - SDO and less educated: r = .07
- What should researchers do?

Discussion
==========

Should we present the analysis with the two-item SDO scale?
- Items 2 and 3 are very similar, therefore higher correlation
- Content of the construct changes when items are dropped
- Other research shows that SDO has two dimensions
  - **SDO-dominance**: preference for dominance hierarchy, aggressive
  - **SDO anti-egalitarianism**: preference for hierarchy, ideological
- Items 2 and 3 are from SDO anti-egalitarianism
- So maybe SDO anti-egalitarianism is what moderates the effect?

Additional analyses have led to a new hypothesis

Sometimes existing scales are not very good

Conclusion
==========

- Do you know what we mean by 'Dealing with data'?
- Importance of reflection, argumentation, openness, consistency, preregistration

Power
=====

- Small samples produce unstable estimates (large confidence intervals)
- Many articles in prestigious journals (used to) have small samples and large and/or sexy effects
  - Many of these turn out to be unreliable
- Power of .80: 80% probability of rejecting the null hypothesis (i.e., 'finding the effect') IF the effect is real
  - 'Hit': correctly rejecting H0 (see slide 10)
  - .80 is often considered a minimum now
- Problem with interactions
  - These have small effect sizes (and require larger samples)
  - Standardised effect size of an interaction is about half the size of a main effect of the same unstandardised size
  - Effect for women d = .5, effect for men d = 0, interaction d ≈ .25
  - Standardised effect size is used for power calculation
  - See and
- Within-subjects design can be a solution for low power
  - By nature more powerful, as participants are their own control group, so lower variability
  - Smaller sample required, but not always possible
- Need to justify sample size in final assignment
- Problem: effect size unknown
  - Based on previous research (but can be unreliable)
  - Assume small to medium effect size, which is common
- G\*Power can be used to calculate required sample size for experimental and correlational designs

G\*Power
========

For main effect of manipulation in 2 by continuous design

- Test family = F
tests
- Statistical test = ANCOVA
- Type of power analysis = A priori
- Effect size f = .18 (between small and medium effect)
- Alpha = .05
- Power = .80
- Numerator df = 1 (df associated with test, is 2 when there are 3 conditions)
- Number of groups = 2 (total number of groups in design)
- Number of covariates = 1
- Total sample size = 245

For main effect of manipulation in design with 3 conditions

- Test family = F tests
- Statistical test = ANCOVA
- Type of power analysis = A priori
- Effect size f = .18 (between small and medium effect)
- Alpha = .05
- Power = .80
- Numerator df = 2 (df associated with test)
- Number of groups = 3 (total number of groups in design)
- Number of covariates = 1
- Total sample size = 301

When comparing two conditions

- Test family = t tests
- Statistical test = Means: difference between 2 independent means
- Type of power analysis = A priori
- Effect size d = .35 (between small and medium effect)
- Alpha = .05
- Power = .80
- Total sample size = 260

For interaction in 2 by 3 design

- Test family = F tests
- Statistical test = ANOVA fixed effects, special, main effects and interactions
- Type of power analysis = A priori
- Effect size f = .09 (standardized interaction effect is small)
- Alpha = .05
- Power = .80
- Numerator df = 2 (df associated with test)
- Number of groups = 6 (total number of groups in design)
- Total sample size = 1193 (301 for f = .18)

Week 3 - Lecture 5: Moderation & Mediation - Correlational & Intervention Study Designs
---------------------------------------------------------------------------------------

Psychological models vs reality:
================================

- Predictor -> Outcome
- Reality:
  - Lots of variables and relations (like spaghetti)

Variables that influence / explain / cloud predictor -> outcome effect
=======================================================================
- Predictor sometimes also called X

Example: Men's collective action for gender equality
----------------------------------------------------

![](media/image42.png)

Moderator (=modifying)
======================

- Example hypothesis: Zero-sum beliefs more strongly decrease collective action (CA) intentions in more gender-equal countries (in which men's higher social status is less secure and more women
- Country-level gender equality is the moderator in this example

Moderation example figure
=========================

![](media/image44.png)

- » The higher people are on zero-sum beliefs, the lower they are on CA intentions
- » 2 lines are showing negative slopes
- » gender inequality reverses the effect

Mediator
========

- Example hypothesis: Zero-sum beliefs decrease CA intentions **because** such beliefs lead men to see women as insubordinate, manipulative, and needing dominative control by men (i.e., hostile sexism)
- Hostile sexism is the mediator in this example
- Mediator is about the process
  - Why / how does X influence Y?
- ![](media/image46.png)

Covariate
=========

- Age is a covariate in this example
- The covariate is only connected to the outcome variable

Example: Men's collective action for gender equality
====================================================

![](media/image48.png)
======================

- If the interaction term is not significant, the variable is not considered a moderator
- But it does significantly affect the outcome

Confounding variable
====================

- ![](media/image50.png)
- Confounder and covariate are often confused
- However, a covariate is only connected to the outcome variable, whereas a confounder is connected to both the predictor and the outcome variable

Variables that influence / explain / cloud predictor -> outcome effect
=======================================================================

Moderation & Mediation: What is what?
-------------------------------------

Moderation & Mediation
======================

**Moderation**

- Under ***what conditions** does a relationship occur* (e.g., under which conditions, in certain environments, for certain people)?
- ***When*** is a relationship stronger or weaker / positive or negative?
- Moderator variables change the *magnitude* and/or *direction* of the relationship between x and y
- ![](media/image52.png)

**Mediation**

- ***Why*** does a relationship exist (i.e., explain underlying mechanisms or processes)?
- ***How*** can a relationship or an effect be explained?

**» Combinations** possible (i.e., conditional process analysis)

Why Care about Moderation & Mediation?
======================================

- Key to ***theory*** building and testing (Whetten, 1989)
  - Advancing knowledge in psychology
- Critical for understanding when and why certain ***practices*** (e.g., interventions) are effective
- *Example: Tendency to believe in conspiracy theories*
  - **Moderation**
    - Under which condition does a positive effect on well-being exist or not?
    - For whom is the effect larger or smaller? Certain individuals, who do not benefit at all?
  - **Mediation**
    - Why does this effect exist; how can it be explained?
    - Why not beneficial for certain people?
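The statement above that a moderator changes the magnitude and/or direction of the relationship between x and y can be made concrete with a small simulation. The sketch below is a hedged illustration, not the lecture's analysis: the data are simulated with a known interaction, the predictors are mean-centered before the product term is formed, and the helper functions `solve` and `ols` are hypothetical stand-ins for what a statistics package would do.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(42)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]  # predictor X
z = [rng.gauss(0, 1) for _ in range(n)]  # moderator Z
# Simulated truth (an assumption of this sketch): the slope of X is 0.5 + 0.3 * Z
y = [0.5 * xi + 0.2 * zi + 0.3 * xi * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Center X and Z around their sample means, then form the product term
mean_x, mean_z = sum(x) / n, sum(z) / n
xc = [xi - mean_x for xi in x]
zc = [zi - mean_z for zi in z]
design = [[1.0, a, b, a * b] for a, b in zip(xc, zc)]  # intercept, X, Z, X*Z
b0, b1, b2, b3 = ols(design, y)


def simple_slope(z_value):
    """Estimated effect of X on Y at a given (centered) value of the moderator."""
    return b1 + b3 * z_value


print(f"b1 (X) = {b1:.2f}, b2 (Z) = {b2:.2f}, b3 (X*Z) = {b3:.2f}")
print(f"slope of X at Z = -1: {simple_slope(-1.0):.2f}, at Z = +1: {simple_slope(1.0):.2f}")
```

The coefficient on the product term carries the moderation: a positive value means the X-Y slope becomes steeper as Z increases, which the two simple slopes make visible.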
Moderation: Linear Regression Model with one Interaction Term (2-way Interaction)
=================================================================================

**Multiple Linear Regression**

- Preferred to examine moderator effects when either predictor or moderator (or both) are measured on a continuous scale (e.g., 0-10; 1-29) or are categorical (e.g., different groups)
- ![](media/image54.png)
- X = Predictor; Z = Moderator; x\*z = interaction term (coefficient β3); Y = Outcome

1\. Is coefficient β3 significantly different from zero?

- offers info on magnitude & direction of moderation

2\. Is the increment in squared multiple correlation (R²) given by the interaction significantly greater than zero?

- R² as effect size for moderation/interaction in regression analysis

Moderation: Multiple Linear Regression
======================================

**How to calculate the interaction term?**

- Some software (e.g., R; PROCESS macro) does this automatically
- Other software (e.g., SPSS): new variable, computed manually
- Center both X and Z around their respective sample means (i.e., subtract the mean from each value of the original variable so that it has a mean of 0)

**Steps**

- Compute the cross-product of X (predictor) and Z (moderator) (just multiply them)
- (Regress Y on control variables)
  - » Rule out alternative explanations for the hypothesized effect (Bernerth & Aguinis, 2016)
  - in research designs that cannot prevent the influence of confounding variables
- Regress Y on X and Z
- Add the X \* Z product term to the regression
  - How much additional variance is explained by the interaction?

Mediation
=========

Formulate the 4 hypotheses!

Answer:

1. H1: Abusive supervision is **positively *associated*** with employee stress (words like "leads to" or "results in" only when you have an experimental setup).
2. H2: Employee stress is **positively** associated with alcohol consumption.
3. H3: Abusive supervision is **positively** associated with employee alcohol consumption *(not necessary: evidence for x-y direct link not required to establish indirect effects)*
4. H4: Employee stress mediates the relationship between abusive supervision and employee alcohol consumption, such that abusive supervision is positively associated with employee alcohol consumption ***through*** higher levels of employee stress.

Mediation
=========

*A: Simple mediational model*

- Three paths (i.e., a, b, and c) represent direct effects
- Indirect effect of X on Y calculated as the product of paths a and b
- ![](media/image56.png)

*B: Parallel mediation*

- Is the mediation independent of the effect of the other mediators?
- Mediators should be conceptually different; not too highly correlated

*C: Sequential mediation*

- ![](media/image58.png)

Testing Mediation
=================

**Sobel Test**

- Multiplication: a \* b
- Statistical assumption: a and b are normally distributed » a \* b is normally distributed
- Assumption holds in large but not in small samples (= skewed a \* b)

**Recommended: Calculate confidence intervals (CI) using Bootstrapping**

- CI: considers uncertainty around the estimate; indicates the range of possible values for the indirect effect
- Calculated using resampling methods such as bootstrapping
  - Resampling with replacement: drawing repeated samples from the original data many times (e.g., 5000 times)
  - Each time, the indirect effect is computed; together these estimates form a sampling distribution
  - CI is computed; if the interval excludes zero: indirect effect is different from zero
  - Does not rely on the assumption of a normal distribution

PROCESS by Hayes
================

- PROCESS macro
  - Load it into your SPSS program
  - or SAS, R
  - Simple point-and-click interface
  - Many mediation, moderation, and conditional process models

Quiz
====

- A researcher proposes that students' cognitive ability is positively related to their pro-environmental behavior **through their improved**
**comprehension of complex topics** such as climate change, and this effect is assumed to be **stronger for female students and those high in conscientiousness.**
  - What are the moderator(s) and mediator(s) in this example?
    - Moderators: gender, conscientiousness
    - Mediator: comprehension of complex topics
- **What could have been the hypothesis here?**
  - Answer: Self-monitoring moderates the relationship between self-esteem and speaking up, *such that* this relationship is positive for low and negative for high self-monitors (cross-over interaction)

Choosing your research design
=============================

- No design is perfect
- Knowledge of
  - challenges
  - validity threats
- can help to prepare for
  - analyses
  - interpretation of results

Correlational Study Design: Cross-sectional Study Design
========================================================

- *Observing* what naturally happens in the environment *without interfering/manipulating*
- Based on: naturalistic observations, **survey** method, **interview**, archival data (e.g., existing datasets) or a combination
- **Cross-sectional**
  - Relationship between two or more variables at one point in time
  - Differs from longitudinal: same sample at multiple time points
- **Key features & challenges**
  - **No cause-effect** conclusions
  - No conclusions about **temporal order** (e.g., this happens first, then that)
  - **Impossible** to examine mediation hypotheses
  - Potential threat of common method bias
    - E.g., X and Y are both measured by self-report: a strong association may reflect the shared method (e.g., the questions) rather than a true relationship
    - Common systematic error variances of X & Y » inflated estimate of X-Y relationship
- "There seems to be a universal condemnation of the cross-sectional design..."
- Despite its limitations, popular and widely used in psychology
  - Easy to implement; less expensive; easy to analyze
- **Why useful?**
  - When you use it as a first step...
  - Conducting exploratory research; lack of knowledge about patterns of relationships and timing
  - Description of a sample (individuals, teams, cultures)
  - How do people see the world? Investigate relationships with (various) potential correlates, e.g.,
    - Differences in political attitudes across individuals, teams, cultural groups
    - *How do individuals across countries differ in their eating behaviors? Is life satisfaction higher in married people than in singles? Differences in personality traits between leaders and followers?*

Ways to Optimize: How to get the most out of a Cross-sectional Study?
=====================================================================

**Control variables**

- e.g., series of cross-sectional studies in different contexts/with different samples
- ...but other sources do not always provide information that is more accurate
  - e.g., peer ratings could lead to underestimation: only public, not private behaviors, thoughts, emotions, attitudes are observable
- T1: independent variable(s)
- T2: dependent variable(s)

Intervention Study Designs
==========================

- Is an intervention effective? Safe? Liked? Cost-effective?
- Theory and empirical evidence already available *(e.g., Job demands-resources model)*
- Translate into intervention
- Evaluate intervention (ideally: longitudinal, 3 or more waves; *see last session*)

Randomized Control-Group Intervention Designs
=============================================

- Intervention group & control group
- Researcher-manipulated variable(s)
- With or without pretest
- Example

Randomized Control-Group Pretest-Posttest Intervention Design
=============================================================

- Sample size determination: Estimate sample size needed to show an effect of the intervention based on expected power (e.g.
80%)
- Randomly assign participants to intervention & control group(s)
- Pre-test of both groups
- Implement the intervention in the intervention group & standard method/alternative in the control group
- Run a minimum of one post-test in both groups (more is better)
- Measure differences between groups

Internal Validity in Randomized Control-Group - Pretest-Posttest Intervention Design
====================================================================================

**Internal validity**

- Ensure that observed changes in outcome(s) are caused by the intervention

**Example threats to internal validity**

- **History**: events or experiences that happen in between and could cause the observed effects (e.g., corona crisis)
  - controlled if both groups are tested at the same time points
- **Maturation**: naturally occurring changes over time within the participants that could account for the results (people become "wiser")
  - controlled in that they are manifested equally in both intervention & control group
- **Regression to the mean:** when individuals are selected based on extreme pretest scores, they will have less extreme subsequent scores regardless of the success of the intervention (e.g., only highly exhausted students in a stress intervention)
  - controlled if intervention & control groups are randomly assigned from the same extreme participant pool; random assignment: extreme scores distributed equally

Randomized Control-Group Pretest-Posttest Intervention
======================================================

**Challenges & Limitations**

- Cost & time constraints
- Unethical (e.g., randomization: withholding treatment from a control group)
- Not accepted, not possible (e.g., in a company, school)
- Demand characteristics (purpose of intervention known to participants)
  - Use coherent and believable cover stories; keep experimenters and confederates unaware of participants' conditions
- Pretest sensitization (e.g., participants are influenced by the pretest content)
  - (Add
conditions without pretest measure; Randomized Solomon Four-Group Design)
- Randomization not successful
  - e.g., small sample: even distribution of confounding variables between the intervention and control group cannot be accomplished
- Attrition & differential treatment-related attrition
  - Sample retention procedures, e.g., regular contact, incentives, reducing barriers to program attendance
- Treatment adherence
  - e.g., treatment condition participants may refuse treatment; control group participants may seek out treatment

**Quasi**-Experiments: Non-Randomized Intervention Design
=========================================================

- can be with/without control groups
- can be with/without pretest

**Widely used in applied psychology!**

Quasi-Experiments
=================

Used when, e.g.,

- Random assignment is not feasible or unethical
  - e.g., comparing the effectiveness of an intervention for students with and without stress symptoms; students who are organized in classes; males vs. females; singles vs. married
- Rely on existing group memberships
- Only small sample available

**Key challenges**

- Absence of random assignment: Are the outcomes due to the intervention or other influences?
- Difficulty in measuring or controlling for important confounding variables
- *Lower internal validity*

Quasi-Experiments - Example
===========================

- What are the effects of a **recovery training program** on employees' recovery experiences (e.g., psychological detachment from work, relaxation during off-job time)?
- **Training goal:** Increase knowledge about recovery & recovery strategies
- **Method/Measures**
  - T1: Before the training
  - T2: One week after the training
  - T3: Three weeks after the training
  - Training group *N* = 48 individuals
  - Waitlist control group *N* = 47 individuals
  - Tested if training group & control group participants differed in demographic variables, recovery experiences and outcome variables (t-tests; chi-square difference tests)
    - no significant differences

**Results**

- Participants showed higher psychological detachment and relaxation one week and three weeks after the training and higher mastery experiences one week after the training

**Limitations**

- Limited implications regarding longer-term effects
- No randomization
- *Waitlist control group*
  - Better: add a second control group that receives another kind of "intervention" (e.g., lecture on the beneficial effects of recovery from work) ***(Active control group)***

Intervention Study Designs - Some Advice
========================================

- Randomized controlled design studies should be used whenever possible
  - Highest level of credibility
- If one opts for a quasi-experimental intervention study
  - Apply pretest-posttest design with control group
  - Measure potential confounding variables
- Apply longer post-test intervals » longer-term effectiveness of intervention, e.g., Mensmann & Frese (2018):
  - T0: 6 months before the training
  - T1: 1 month after the training (short-term effects)
  - T2: 5 months after the training (mid-term effects)
  - T3: 25 months after the training (long-term effects)
- If the intervention consists of several parts (e.g., different training components & coaching sessions): examine which parts are effective.
  - Is a single part responsible for the effects or the combination of all parts?
  - Isolate and systematically vary specific parts of the training

Quiz
====

1\.
Sam developed an intervention, which involves randomly assigning 180 employees to three separate conditions such that there are 60 participants in each condition. This study is a \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ design. a\. within-subjects, quasi-experimental
