3MD3 Notes PDF
Uploaded by Deleted User
Summary
This document reviews the credibility crisis in research, discusses several non-empirical ways of thinking, introduces the theory-data cycle, and highlights scientific-method concepts such as parsimony and falsifiability. It also stresses being skeptical of information such as anecdotal evidence, and covers how to identify, describe, and study human behaviour.
(1) Credibility crisis
- The "publish or perish" phenomenon raises the question of whether published findings are even credible in the first place: researchers are under pressure to publish in high-end journals or else they perish.
- Some researchers use biased research designs (a study designed so that the expected outcome is easy to find, rather than one that gives the hypothesis a real chance to be disproved).
- Peer review system: someone in your field reviews your work before publication; nowadays reviewers often don't have enough time to review properly, or have never been properly trained to do it.
- P-hacking: running different analyses (or adding and removing data) until a result reaches p < .05.
Solutions:
- Preregistration (proposed by APS): a badge on the published research shows that the study was preregistered, so you only test what you said you would test.
- Open materials: materials used in other research (surveys, stimuli, etc.) are available, so you can reuse them as-is in your own research.
- Open data: the raw data from earlier research are available, so if something similar was done in the past you can get all of its data.

UNIT 1: Foundations

An introduction to thinking empirically
Two ways of knowing:
- Plato's rationalism: knowledge is gained by reasoning (knowledge comes from within oneself).
- Aristotle's empiricism: knowledge is gained through our senses and data (if you want to understand something, you have to observe it).
Modern empiricism:
- Systematic empiricism
- Skepticism
Importance of being skeptical: withholding belief in a claim until there are data and research to support it.

(2) Three non-empirical ways of thinking
- Trusting experts can lead you astray (are you sure that person is an expert?): "experts" reach us through talk shows and media such as TikTok, and our own biases make us more willing to believe them.
  Experts are people too, so they are bound to make mistakes. Get a second or even a third opinion.
- Anecdotal evidence can be misleading ("I saw it with my own eyes"): personal experiences, and the experiences of others, are compelling.
  Why should we reject these sources of information?
  ➔ Lack of comparison groups
  ➔ Presence of confounds
  1. Comparison groups: required for falsification.
     Example: bloodletting (the belief that there was too much blood in the body, so some of it was let out).
  2. Confounds: alternative explanations for your effect.
     - Illusory correlations (something makes sense to you but isn't necessarily true)
     - What seems obvious to you isn't always what is actually happening
  Summary: anecdotal evidence is problematic because anecdotes
  - Don't include comparison groups
  - Tend to ignore confounding factors
  - Can't be falsified
  - Research doesn't describe you specifically, because it is about the average person, whom you might not match
- The problem with intuition:
  - It's lazy
  - Confirmation bias: our gut leads us to this!
  - It is very difficult to override
  How does our intuition operate? We seek belief-confirming information and avoid belief-disconfirming information. Examples:
  - Cherry picking: choosing to look only at information you already somewhat believe (controlling what information you are willing to expose yourself to)
  - Availability heuristic: believing that whatever comes to mind easily tends to be correct
  - Tenacity: once we believe something, it is really hard to let go
    (continuing to believe something just because we have believed it for a long time)

(3) Theory-data cycle
- A theory is a general statement about how a set of variables are related to one another.
- From your theory, you make very specific guesses about what to expect if the theory is correct.
- Broad theory → specific hypothesis → prediction (the specific thing you expect to happen in your study) → collect data (your set of observations)
- The cycle helps you discover whether your theory is right or wrong (whether to keep it or discard it).
Good theories are...
1) Supported by data
   - Replication
   - Converging evidence across researchers/labs
   - Converging evidence across methods
2) Falsifiable
   - The theory can be challenged by data
   - A theory can't be proven right, but it can be proven wrong
3) Parsimonious
   - "All other things being equal, the simplest solution is the best" (William of Ockham)
   - Simple solutions that also don't violate the laws of the universe (little green men living in the brain violates those laws; the idea is also unfalsifiable and not parsimonious either)
Summary:
- Science progresses through the testing of theories, from which we derive hypotheses and predictions
- The best theories:
  - Are supported by data
  - Are falsifiable
  - Explain things in the simplest way
  - Don't violate the laws of the universe

(4) Assessing research claims: what can researchers say about their data?
Foundational concepts:
- Variable: something that varies (has at least 2 levels)
- Constant: something that doesn't change
- IV (independent variable): the variable that gets manipulated
- DV (dependent variable): the variable that gets measured, to see how the IV affects it
- Random assignment
- Participant ("subject") variables: any characteristic on which participants vary (height, gender, weight)
- Quasi-experiment: aims to establish a cause-and-effect relationship between an IV and a DV, but without random assignment to conditions
- Confound: a third variable that varies along with the IV and also affects the DV
- Extraneous variables: variables you are not interested in that still show up in a study
  - Participant-related (mood, personality, intelligence, etc.)
  - Situation-related (e.g., research assistants)
- Operationalization: defining each variable precisely enough to know how to measure it

(5) Three types of claims
- Frequency
- Association
- Causal
Frequency claims:
- Describe a rate or level of something
- Focus on each variable in isolation
- Variables are measured, not manipulated
Association claims:
- Argue that 2 or more variables are related to one another (e.g., heavy phone use and poor sperm quality)
- Predicting Y from X
  - Prediction line (line of best fit)
  - Prediction error
Causal claims:
- Argue that one variable is responsible for changes in another variable
Establishing causality:
1) The variables are significantly related
2) The causal variable came first and the outcome later (temporal precedence)
3) There are no other explanations for the relationship between the variables

(6) How do we assess research claims?
3 claims, 4 validities
- Frequency: construct, external
- Association: construct, external, statistical
- Causal: construct, external, statistical, internal
Tryon's intelligence research on rats
- He selectively bred rats that were poor at completing a maze ("maze dull") and rats that were excellent at it ("maze bright"), and found that maze-bright rats made fewer errors than maze-dull rats.
Construct validity:
- Does the measure really capture the construct it claims to measure? Does the researcher's operationalization of the variables reasonably capture the constructs being studied?
- Frequency claims: what does "sad" mean (in the claim "people feel sadder in February")?
- Association claims: how are the variables operationalized (what counts as "heavy" cellphone use? as "poor" sperm quality?)
- Causal claims: which variables were manipulated, and what were the levels of that manipulation (how much exercise)? What was measured? Were there other variables?
Construct validity: back to Tryon's rats
- Selective breeding produced maze-performing rats
- Genes caused better performance
- Does selective breeding = heredity? (YES!)
- Are performance errors a good measure of "maze learning ability"?
- Does maze learning ability truly reflect "intelligence" (or just memory)?
Intelligence:
- Alternatives? (memory?)
  - Inquisitive vs. uninquisitive
  - Keen vs. dull sensory skills
  - Motivated vs. unmotivated
  - Fearful vs. not fearful
Statistical conclusion validity:
- The soundness of the researcher's statistical conclusions
- Proper treatment of the data and of the assumptions that come with it: number of participants, distribution, measurement scale, etc.
- Debate over the "robustness" of inferential statistical tests
- Type 1 errors ("false alarm")
- Type 2 errors ("miss")

(7) Internal validity:
- The degree of confidence that exposure to the IV, rather than some other factor (an alternative explanation), caused the changes in the DV
- Tryon's experiment: environmental control before and during testing; everything was controlled, so the only difference would be in the genes
- Example: research using humans
  - Random assignment to levels of the IV (or to orders of the levels of the IV)
  - Controlled environment:
    ❖ Lab room
    ❖ Instructions/cover story
    ❖ Researchers
    ❖ Confederates
External validity: can your claim be generalized across contexts (does it apply to everyone)?
- Generalizability of the research across:
  - Populations
  - Stimuli (maintaining the same conceptual definition across different tasks)
  - Situations (it works in the lab, but does it work in the real world?)
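The Type 1 ("false alarm") and Type 2 ("miss") errors listed under statistical conclusion validity can be made concrete with a small simulation. This is a minimal sketch rather than a course-provided method: it assumes two groups of n = 100 drawn from normal distributions, uses a large-sample z approximation instead of a t-test, and the effect size of 0.3 SD is an arbitrary, hypothetical value.

```python
import random
from statistics import NormalDist


def sample_variance(xs):
    """Unbiased sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def two_group_p_value(a, b):
    """Two-tailed p-value for a mean difference, using a large-sample z approximation."""
    se = (sample_variance(a) / len(a) + sample_variance(b) / len(b)) ** 0.5
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_diff, n=100, sims=2000, alpha=0.05, seed=1):
    """Share of simulated studies that reject the null hypothesis at the given alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        group_a = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_group_p_value(group_a, group_b) < alpha:
            rejections += 1
    return rejections / sims


# No true effect: every rejection is a Type 1 error ("false alarm").
false_alarm_rate = rejection_rate(true_diff=0.0)
# A real (hypothetical) effect of 0.3 SD: every non-rejection is a Type 2 error ("miss").
miss_rate = 1 - rejection_rate(true_diff=0.3)

print(f"Type 1 (false alarm) rate: {false_alarm_rate:.3f}")
print(f"Type 2 (miss) rate: {miss_rate:.3f}")
```

With alpha = .05 the false-alarm rate should land near .05 whatever the sample size, while the miss rate for a small effect is much larger and shrinks only as n grows.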
Establishing external validity:
- Ecological validity: mundane vs. psychological realism

(8) Satisfaction With Life Scale (SWLS): answer these statements on a scale of 1-7 (self-report)
- In most ways, my life is close to ideal
- The conditions of my life are excellent
- I am satisfied with my life
- So far, I have gotten the important things I want in life
- If I could live my life over, I would change almost nothing
Intelligence:
- "A mental ability that involves the capacity to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience" (Gottfredson, 1997)
Three types of measures:
- Self-report: surveys, polls, questionnaires, interviews; operationalizes variables by recording people's answers to questions about themselves (happiness, pain, stress)
- Observational: operationalizes variables by recording
  - Observable behaviours (happiness, stress, intelligence)
  - Physical traces of behaviour (shortcuts, stress, garbage/recycling)
- Physiological: operationalizes variables by recording biological data, which participants can't easily control (happiness [smiling], stress [sweating], intelligence [fMRI])
Measurement: scales of measurement
- Categorical: nominal (names)
- Quantitative: ordinal, interval, ratio
The problematic Likert scale:
- It is usually treated as an interval scale, but it may really be ordinal
Summary so far...
- Measurement is complex
  - Conceptual definition
  - Operational definition
  - Type of measure, measurement scale
- Researchers then assess the extent to which their measure is a good measure of their construct
- Reliability: a reliable measure produces consistent scores
  - Test-retest reliability: give the same test a day or so apart and check that scores stay consistent over time
  - Inter-rater reliability: the rate at which 2 or more observers agree; 0.8 or higher is considered highly reliable
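Inter-rater reliability is often summarized with Cohen's kappa, which compares the observed agreement p_o with the agreement expected by chance p_e, and internal reliability with Cronbach's alpha. The sketch below implements the standard textbook formulas, kappa = (p_o - p_e) / (1 - p_e) and alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), in plain Python; the function names and the tiny data sets are hypothetical illustrations, not course material.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(rater1, rater2):
    """Agreement between two raters, corrected for chance: (p_o - p_e) / (1 - p_e)."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must code the same non-empty set of observations")
    n = len(rater1)
    # p_o: proportion of observations on which the raters chose the same category
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # p_e: agreement expected by chance, from each rater's category proportions
    counts1, counts2 = Counter(rater1), Counter(rater2)
    p_e = sum((counts1[c] / n) * (counts2[c] / n) for c in set(counts1) | set(counts2))
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


def cronbachs_alpha(scores):
    """scores: one row per respondent, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose to one tuple of scores per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Hypothetical codes from two observers watching the same four episodes
observer_a = ["aggressive", "aggressive", "calm", "calm"]
observer_b = ["aggressive", "calm", "calm", "calm"]
print(cohens_kappa(observer_a, observer_b))  # 0.5 (p_o = 0.75, p_e = 0.5)

# Hypothetical responses of three people to a two-item scale
print(cronbachs_alpha([[1, 2], [2, 1], [3, 3]]))
```

Kappa is lower than raw percent agreement (0.5 vs. 0.75 here) because some agreement would happen by chance alone.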
    (Cohen's kappa is the statistic that measures agreement between 2 raters or observers while accounting for the possibility of chance agreement.)
  - Internal reliability: the same question posed in different ways should produce the same results
    - Cronbach's alpha (best practice); split-half reliability (not as good)
    ❖ If Cronbach's alpha is low, try factor analysis
    ❖ Why use so many items? With a single measurement, people can react differently to that one item and over- or underestimate

(9) Assessing a measure's validity
- Validity:
  - Subjective assessment: face validity, content validity
  - Empirical assessment: criterion validity
Subjective assessment of validity: face validity
- The extent to which your measure is a plausible measure of the construct (long ago, people believed personality could be predicted from someone's face)
Content validity:
- The extent to which your measure covers everything it should be covering

(10) Criterion validity:
- Is the new measure correlated with a concrete outcome it should be correlated with (matching a standard)?
- Predictive validity: the ability of a measure or test to predict a future outcome (e.g., an IQ test taken in 2024 predicting CGPA in 2028)
- Concurrent validity: the extent to which a test or measurement corresponds to previously established measurements of the same construct
- Alternate method: known-groups paradigm
  - Example 1: Stress
  - Example 2: Beck Depression Inventory
Empirical assessment of validity 2: convergent validity
- High correlation with other measures of the same construct (e.g., two different self-esteem questionnaires should correlate highly: 2 different measures, same construct)
- The BDI wasn't the best measure for diagnosing depression in older people; therefore, Beck et al. created the BDI-II
- Segal et al. (2008) on the convergent validity of the BDI-II with the CES-D in older populations
Empirical assessment of validity: discriminant validity
- Low correlation with measures of different, unrelated constructs; choose discriminant validity measures that your test should not be strongly related to
- Segal et al. on the discriminant validity of the BDI-II
Summary:
- Measures must be reliable and valid
- If not valid, then useless
- If unreliable, then usually useless (an exception to the rule?)

Survey research
Goals of survey research:
- Describe the characteristics of a population
- Describe and compare the characteristics of different populations or groups within a population
- Describe population time trends
- Describe relations among psychological variables
- Test hypotheses, theories, and models

(11) Developing a good survey:
- Convert research goals into a list of specific topics
- Develop your items: be careful with your grammar; if participants answer without understanding an item, there is no construct validity
- Pretest the questionnaire (if issues only come up during the real study, the entire study is wasted)
- Revise the questionnaire
Item formatting choices:
- Open-ended items: any items that let participants express their own answers in their own words
  - Main advantage: more room to respond; if you don't yet know the possible answers well enough to build a closed-ended survey, this works too
  - Main disadvantage: the answers are sometimes not good enough to use
- Closed-ended items:
  - Main advantage: the options are already coded, so there are no silly answers
  - Main disadvantage: the depth of open-ended answers is lost
Multiple-choice format:
- Single response (forced choice)
  - Political polls
  - Narcissistic Personality Inventory (Raskin & Terry, 1988)
  - Buss's jealousy questionnaire (1989)
- Multiple responses
- Ranking scale
Likert and Likert-type scales:
- The standard scale runs from 1 (strongly disagree) to 5 (strongly agree)
- Anything else is a Likert-type scale
- Advantage: this format is more fine-tuned.
  It allows more distinctions and has higher sensitivity.
- Disadvantage: these could be treated as interval scales, but could also be considered ordinal
Semantic differential scales:
- Adjectives that are opposites of each other are placed at the two ends of the scale, and participants rate where they fall between them
Visual analogue scales:
- Participants rate by marking a point on a line; interesting because no numbers are involved
Scenarios:
- Can be answered using multiple-choice format, forced choice, Likert, VAS, etc.
- Example: Burnstein et al. (1994) on prosocial responding in everyday situations vs. when a building is burning and participants are forced to pick only 3 people to save
Summary:
- Choosing your item format is just the beginning
- 2 more things to worry about:
  - Writing good items
  - Encouraging accurate responding
Common wording issues:
- Leading items: items that lead you to give a certain answer (responses are influenced by biased language); avoid loaded adjectives
- Loaded items: items that force people to give an answer that isn't true about them (e.g., "Where do you buy your makeup?", or a beer question asked of someone who doesn't drink beer); don't build assumptions into the question
- Double-barreled items: a question that asks about 2 different issues but allows only one answer (fix by splitting it into separate questions)
- Questions that use double negatives: words like "no" or "not" combined with a word that has a negative prefix such as "un-" (e.g., "Was the room not untidy?"); catch these by pretesting
- Questions with grammatical and punctuation errors: use clear wording that makes sense (avoid ambiguous words)
Item order:
- Order 1 vs. Order 2, which survey question is presented first:
  - Help women, then help racial minorities (most people preferred this)
  - OR help racial minorities, then help women (not many people picked this)

(11) Survey research: encouraging accurate responses
- Response sets: people adopt a consistent way of answering questions
  - Yea-saying: always agreeing
  - Nay-saying: always disagreeing
  - Fence sitting:
    ❖ always neutral
    ❖ lowers construct validity because it prevents measuring what is intended to be evaluated
    ❖ happens when the survey asks sensitive or controversial questions that people don't want to answer openly and honestly
    ❖ also happens when questions don't make sense, or with double-barreled questions
- Reverse scoring: on some items the numerical rating scale runs in the opposite direction
  5: strongly disagree, 4: disagree, 3: neutral, 2: agree, 1: strongly agree
  It makes people slow down and think about their responses rather than giving biased answers
  Raw data → math: reverse-scored items are recoded before scores are totalled
  - Someone who is in a response set doesn't get extreme results the way someone with genuinely "differing" answers does
  - The neutral option should only be offered in certain surveys, to maintain construct validity; sometimes a Likert-type scale without a midpoint is used so people are forced to choose one side or the other
Socially desirable responding, or "faking good":
- Lowers construct validity because the answers are inaccurate and don't measure what they are intended to measure
- Solutions:
  - Guarantee anonymity
  - Bogus pipeline: tell participants something made up ("I would know if you lied") so that they tell the truth
  - Use items meant to uncover socially desirable responding (e.g., "I'm always a good listener")
  - Use filler items (hide the real questions among them, so people answer without being cued about when they "should" lie)
  - Unconscious measures
Ask questions people can answer:
- People know what they do but don't necessarily know why (Nisbett & Wilson, 1977)
- People have biased recall of the past
  - Hindsight bias
  - Flashbulb memories (people feel they know exactly what happened; vivid recollection) (Pezdek et al., 2004)

(12) Observational research: non-experimental research design 2
- Surveys are mostly used in non-experimental or correlational studies
Basic characteristics of observational research: no IV, just a DV
3 approaches:
- Qualitative: no statistics, just holistic description (time-consuming)
- Quantitative: numerical measurements
- Mixed methods: mixes quantitative and qualitative approaches
  Example: Boesch (1991) on maternal influence in learning the hammer/anvil technique
Types of observational research:
- Naturalistic observation: passively observing behaviour while avoiding any direct involvement with participants (high ecological validity)
  - Undisguised: participants know they're being watched
  - Disguised: more ecologically valid, but raises ethical concerns
- Participant observation: becoming part of the group
  - Undisguised: ethnography (e.g., Jane Goodall)
    Pros: you can do things other than just observe
    Cons:
  - Disguised: Festinger, Riecken, and Schachter (1956), When Prophecy Fails; Rosenhan (1973), On Being Sane in Insane Places
    Pros and cons: the Rosenhan study was unethical, but it was worth it because it allowed progress
- Structured observation (also known as analogue behavioural observation)
  - Example: Ainsworth's Strange Situation technique
  - Unlike a regular experiment, there is no true IV: the researcher arranges the environment but does not manipulate a variable across conditions
Sampling:
- Goal: obtain a representative set of behavioural data
- 4 ways of sampling:
  ➔ Who: focal sampling (focus on one target at a time; good for many behaviours, or for behaviours that take a long time to appear), scan sampling (good for fewer behaviours, or ones that appear quickly)
  ➔ Where: situation sampling (observing behaviour in the different places and situations where it occurs)
  ➔ When: time sampling (what time of day)
Avoid the following (if possible):
- Causal claims. To make a causal claim, the research needs:
  ❖ An association between X and Y
  ❖ Temporal precedence
  ❖ Internal validity
- Observer bias: a type of confirmation bias (observers fail to notice things that differ from their expectations; they see what they expect to see, not what is actually there)
  How to fix it? Keep observers blind to your hypothesis
- Observer effects:
  - AKA reactivity
  - How to avoid this?
    - Unobtrusive (disguised) observation
    - Habituation
    - Measure behavioural route
    - Use archival data (crime statistics, corporate files)
Summary:
- Observational research is a useful way to gather behavioural data
  - High ecological and external validity
- But:
  - No causal inference
  - Observer bias
  - Observer effects

(13) Interpreting correlational research: correlation and causation
- Correlation: a statistical association between 2 variables
- Correlational research: any non-experimental research (it does not have to use a correlation statistic)
  - Bivariate: how 2 variables covary linearly
  - Multivariate: multiple variables correlating simultaneously
Example: Gerbner and Gross (1979)
- Gerbner worried that if you watch a lot of TV and get a lot of exposure to crime, you might think you are at heightened risk of being a victim of violent crime, leading to:
  - Being afraid to go out
  - Unreasonable attitudes about how to treat criminals
Correlation does not establish causation:
- Causal inference requires:
  - Association between X and Y
  - Temporal precedence
  - Internal validity
- Third-variable problem
  - Spurious correlations: violent crime and ice cream seem correlated, but the third variable in this situation is hot weather
- Directionality problem: does X cause Y, or does Y cause X?
Can correlational data ever establish causation?
- Rule out some third variables using partial correlations and multiple regression designs
- Use patterns of results to help us draw parsimonious causal conclusions
Assessing third variables:
- Testing for mediators:
  - The simplest way is partial correlation analysis (does Z change the relationship between X and Y?)
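Partial correlation asks whether the X-Y correlation survives once Z's linear relationship with both variables is removed. A minimal sketch, assuming the standard first-order formula r_xy.z = (r_xy - r_xz × r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)) and using simulated (not real) data in which temperature drives both ice cream and violent crime, mirroring the spurious-correlation example in these notes.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y with the linear influence of z removed from both."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated data: temperature (Z) drives both variables; they have no direct link.
rng = random.Random(42)
temperature = [rng.gauss(0, 1) for _ in range(500)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
violent_crime = [t + rng.gauss(0, 1) for t in temperature]

print(f"r(ice cream, crime)               = {pearson_r(ice_cream, violent_crime):.2f}")
print(f"r(ice cream, crime | temperature) = {partial_r(ice_cream, violent_crime, temperature):.2f}")
```

In this simulation the raw correlation is sizeable while the partial correlation is close to zero, the pattern expected when a third variable fully accounts for an association.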
  - 2 types of mediation:
    - Complete mediation: once Z is taken into account, the relationship between X and Y disappears (becomes nonsignificant); the X-Y correlation was entirely due to Z
    - Partial mediation: once Z is taken into account, X and Y are still correlated, but more weakly

(14) Partial correlation example 2: the relationship disappears when controlling for SES
Multiple regression designs:
- Can control for multiple variables at once
- Step 1: choose your dependent variable
- Step 2: choose your predictor variables
Interpreting multiple regression results:
- "The relationship between X and Y is significant even when Z is controlled for"
- "...independent of Z"
- "...when Z is held constant"
- "...not attributable to the third variable of poverty"
Does this mean we have found evidence of a causal relationship?
- NO
- We've found a significant association between recess and problem behaviour while controlling for many other things
Making a causal argument using pattern and parsimony (using the balance of the evidence)
Parsimony and Abelson (1995):
- Correlations between smoking and lung cancer in humans are significant and positive
- Non-human animal experiments demonstrate a causal relationship between smoking and lung cancer
Method:
1) Specify a mechanism for the causal path
2) Test predictions based on that mechanism
3) The pattern of results will either support or fail to support the causal argument

Experimental design: part 1, single-factor experiments (Oct 21, 2024)
Elliot et al. (2007):
- 2 broad ways students approach learning: approach (trying to succeed) vs. avoidance (trying not to fail) learning orientations
- Red should cue an avoidance orientation (biohazard sign, stop sign, red pen used for marking)
- This should lead to reduced cognitive performance compared with other colours
- IV: ink colour of the participant ID code (red vs. black vs. green)
- DV: anagram performance
- Experimental control
- Participants whose ID codes were written in red performed worse than those with the other colours
Logic of experimentation:
1) Come up with a hypothesis and prediction(s)
2) Manipulate one or more independent variables
3) Choose one or more dependent variables
4) Regulate (control) other aspects of the research environment
Experimental control:
- To make a causal claim:
  1) Association between X and Y
  2) Temporal precedence
  3) Internal validity: allows us to rule out alternative explanations
- Potential confounding of an IV
More about confounds:
- Environmental factors:
  - Hold the lab environment constant (same task, same lab assistant, same lighting, same temperature, same appearance of the room)
  - Distribute other potential confounds across conditions (random assignment is a solution for this)
    ➔ Multiple lab rooms
    ➔ Multiple experimenters
    ➔ Timing
- Participant factors:
  - Distribute differences across conditions: everyone is different, and the easy way to solve this is random assignment
Independent variable considerations:
- Type:
  - Quantitative
  - Qualitative
- Number of levels:
  - Personal preference (a lot of things in research happen because of researchers' preferences)
  - Resource availability
  - Your research question
Deciding on a control group:
- How to create a control group? (in the Asche et al. paper)
  1) Presence (having people to work with) vs. absence (all alone, no social presence)
  2) Presence vs. neutral (temperature)
- Is a control group necessary?
  Depends on the question

Experimental design: part 2, single-factor, between-groups designs
Advantages of between-groups designs (basically, any design that is not a within-groups design):
- No carryover between conditions (carryover effect: when the effects of one condition affect participants' behaviour in subsequent conditions)
- No need to create equivalent versions of the same task
- Low concern about giving away your hypothesis (the more levels of the IV a participant sees, the higher the chance they guess the hypothesis)
Disadvantages of between-groups designs:
- Can take a lot of resources (time, money, lab availability)
- Creating equivalent groups
  - Groups can never be as equivalent as in a within-groups design
  - Extra variation makes it harder to find effects than with a within-groups design
When to use a between-groups design:
- Your research question demands it (e.g., a shoe-tying method experiment: once you've learned how to tie shoes, you can't unlearn it)
- Your experiment is too demanding to do within-groups
- You want to use a between-groups design
(Oct 28)
Creating equivalent groups (so you can really claim that your IV caused the change in your DV):
- Random assignment
- Block randomization
- Matched groups
- Natural groups (non-equivalent by nature)
Assignment to conditions 1: random groups designs
- Random assignment distributes naturally occurring differences between participants across the levels of the IV
- Ways to improve the odds of equivalence across groups:
  - Exploit the laws of probability by using a large n; if n isn't big enough, random assignment won't work properly (e.g., an experiment with 100-year-olds: there might not be enough 100-year-olds in a given area)
  - Use block randomization to ensure that a bad result for one level of the IV doesn't happen by chance (in the Elliot et al. experiment, if people perform poorly on Mondays, we want to make sure we don't run the whole red group on a Monday)
Assignment to conditions 2: block randomization
- Imagine you're having a party and want to play a game with two teams, Team Red and Team Blue. You want the teams to be equal, with the same number of players on each. Instead of randomly picking people one at a time (which might make one team bigger than the other), you first make small groups, or "blocks", with an equal number of spots for each team. Then, within each block, you shuffle and assign players to Team Red or Team Blue.
- For example, you have 4 kids in a block: Anna, Ben, Carla, and David. You shuffle them and assign Anna and Ben to Team Red, and Carla and David to Team Blue. Then you do the same for the next block, and so on. This way, at the end, Team Red and Team Blue have the same number of players, and it's all fair!
Assignment to conditions 3: matched groups design
- Used when you have a small n
- Uses a matching variable
- Imagine you and your friends want to race scooters. Some of your friends are super fast and some are a little slower. To make the teams fair, you pair up friends who are similar in speed and then split each pair across the two teams.
- Why not always use matched groups?
  - Practicality
  - Tipping off your participants
Assignment to conditions 4: natural groups design
- Used with quasi-experimental research (when the thing you are interested in can't be randomly assigned)
- 2 ways to set this up:
  - Purely correlational: subject variable(s) plus DV
  - Partly experimental: subject variable(s), IV(s), plus DV
How many times to measure the DV?
2 options:
- Post-test-only design
- Pre-test/post-test design
Summary: between-groups designs
- The main concern is creating equivalent groups
- Plain old random assignment is usually good enough
- But if unsure, other options are available
- The main remaining decision is whether to use a pre-test/post-test design

Experimental design: part 3, single-factor, within-groups designs
Key advantages:
- You have equivalent groups
- Requires fewer participants to obtain the same amount of data per condition
Advantages of within-groups designs:
- Time-saving benefits
- Special populations
- Can answer questions that between-groups designs can't
Disadvantages of within-groups designs:
- Will participants figure out your hypothesis? Check by using post-experimental interviews
- Order effects (participants' responses are affected by the order in which they are exposed to the levels of the IV)
2 types of order effects:
1) Progressive order effects: participants change their responses because of cumulative exposure to prior conditions (over time they get tired, bored, or practiced; exposure to one level of the IV can affect their response to the next level)
2) Carryover effects: participants in one condition are uniquely influenced by the condition before it (a previous level of the IV has a unique impact on the next level)
Solution to order effects:
- Counterbalancing: imagine you and your friend are testing two ice cream flavours, chocolate and vanilla, to find out which one tastes better. If you try chocolate first and then vanilla, you might think the second one tastes better just because it's the last one you ate. To make it fair, your friend tries vanilla first, then chocolate. Switching the order gives both flavours a fair chance, so you can tell which one is truly better.
- Balances out order effects, essentially cancelling them out
- Analogous to random assignment (it distributes order effects across the levels of the IV)
Types of counterbalancing:
- Exposed to each level of the IV once:
  1) Complete counterbalancing
  2) Latin square design
  3) Randomly selected orders design
- Exposed to each level of the IV twice or more:
  1) Block randomization
  2) Reverse counterbalancing
Complete counterbalancing:
- Definition: before seeing any participants, arrange the levels of the IV into every possible order, then randomly assign an equal number of participants to each order (A followed by B; B followed by A)
- Main advantage: counterbalancing is complete, so order effects are completely cancelled out
- Main disadvantage: unwieldy
  - Recruitment and running (4 levels of the IV give 24 orders; if you need 30 participants, you actually have to run 48, the next multiple of 24)
Latin square design:
- A type of partial counterbalancing
- What is it? Imagine you and three friends (4 kids total) are playing with 4 different toys: a car, a doll, a ball, and a puzzle. You take turns so that everyone gets to play with each toy, but you also make sure you all play with the toys in a different order.
- To counterbalance 3 levels, make a 3x3 matrix; for 4 levels, make a 4x4 matrix
- Each row is one order, given to one participant (or group of participants)
- Advantage compared with complete counterbalancing: recruitment
- Disadvantage compared with complete counterbalancing: it is only partial counterbalancing
  Solution?
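Complete counterbalancing and a Latin square can both be generated mechanically. The sketch below is an illustration, not the course's procedure: it lists every order with `itertools.permutations`, rounds a target sample size up to a multiple of the number of orders (the 4-level, 30 → 48 participant case above), and builds the simplest cyclic Latin square, where each level appears once in each position. The helper names are hypothetical. A cyclic square does not balance which level immediately precedes which; a balanced (Williams) Latin square is needed for that.

```python
import itertools
import math


def all_orders(levels):
    """Complete counterbalancing: every possible order of the IV levels."""
    return list(itertools.permutations(levels))


def participants_needed(target_n, n_orders):
    """Smallest multiple of n_orders that is at least target_n (equal n per order)."""
    return math.ceil(target_n / n_orders) * n_orders


def cyclic_latin_square(levels):
    """Row r is the level list rotated by r; each level appears once per position."""
    k = len(levels)
    return [[levels[(row + pos) % k] for pos in range(k)] for row in range(k)]


orders = all_orders(["A", "B", "C", "D"])
print(len(orders))                           # 24 orders for 4 levels
print(participants_needed(30, len(orders)))  # 48

for order in cyclic_latin_square(["A", "B", "C"]):
    print(" -> ".join(order))
# A -> B -> C
# B -> C -> A
# C -> A -> B
```

Participants (or groups of participants) would then be randomly assigned to rows so that each order is used equally often.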
Randomly selected orders design:
- Another way of engaging in partial counterbalancing
- Each participant is assigned an order of the IV levels selected at random
- Only use this when the number of participants (n) is really high
- The higher the n, the likelier you are to cover a good sample of all the different orders, which helps cancel out order effects
Exposing participants to all conditions more than once:
- 2 main methods
- More dependable method is block randomization
- Less dependable method is ABBA (reverse) counterbalancing (Nov 4)
- ABBA only cancels out linear order effects, e.g., getting slightly more tired, bored, or practiced with each trial
Summary:
- Single-factor designs are the simplest types of between- and within-groups designs
- Between-groups designs suffer from concerns about equivalent groups (we've learned ways of dealing with those concerns)
- Within-groups designs suffer from concerns about order effects
Experimental design: Part 2: factorial designs
Factorial designs: designs with more than one IV
Three types:
- Between-groups factorial design
- Within-groups factorial design (every IV is manipulated within subjects)
- Mixed factorial design (a mix of between-groups and within-groups IVs)
Labelling these designs:
- 2x2, 2x3, 2x2x3
- The number of numbers tells you the number of IVs
- The numbers themselves tell you how many levels each IV has
Benefits of using a factorial design:
1) Interactions: does the effect of one variable (on our DV) depend on the level of some other variable? (do 2 IVs interact to produce an impact on the DV?)
- Types of interactions:
➔ Crossover interaction: whether people prefer hot or cold food depends on the food (e.g., pancakes are preferred hot, ice cream cold); graphed, the lines cross to form an X
➔ Spreading interaction: whether a dog sits when told to depends on whether the owner has a treat
  IV 1: what the owner says
  IV 2: whether the owner has a treat
  DV: probability of sitting
2) Generalizability: do the findings replicate with different kinds of people? In different situations?
Example: Strayer and Drews (2004): braking time while using a phone on the road (does the effect of phone use hold for both younger and older drivers?)
Testing for moderators: a moderator is a variable that changes how 2 other variables are related
- If there's no interaction, the finding generalizes; if there is an interaction, the second variable is a moderator
3) Theory testing:
- A theory is a set of statements about how variables interact (and impact one another)
- Wansink's (1996) research on the relationship between product packaging and the amount used
- Tested the "low price" theory using a 2x2 between-groups factorial design
- Spreading interaction for Wansink's theory
Summary: 3 benefits of using a factorial design
- Interactions: does the relationship between X and Y depend on Z?
- Generalizability: does Z moderate the relationship between X and Y?
- Theory testing: what does the relationship between X and Y mean?
How to interpret factorial results? 2 types of effect:
- Main effects: the effect of a single independent variable on the dependent variable, ignoring all other factors. There is one main effect for each independent variable.
- Interaction effects: the relationship between an IV and the DV changes depending on the level of a second variable (a moderating variable) (e.g., whether you prefer hot or cold food depends on what food it is)
Detecting main effects from a table: marginal means
- For each level of an IV, average the cell means across the levels of the other IV (these averages are the marginal means)
- Then take the difference between the 2 marginal means
Describing results in words: main effects
- Explain which main effects are significant and which are non-significant
Describing results in words: interaction effects
- Describe what happens to level 1 of IV1 across the levels of IV2
- Repeat for level 2 of IV1
- Or vice versa
Describing main effects when there is a significant interaction:
- Wording: "these significant main effects were qualified by an interaction..."
Adding complexity:
- You can add levels
- You can add variables (the number of effects to describe increases exponentially)
- Adding an IV: in a 3-way design (2x2x2):
  Three main effects
  Three 2-way interactions:
  ➔ IV1 x IV2
  ➔ IV1 x IV3
  ➔ IV2 x IV3
  One 3-way interaction: IV1 x IV2 x IV3
Marginal means:
How to describe a three-way interaction?
1) Describe the main effects
2) Main effects are qualified by any significant 2-way interactions → describe those
3) 2-way interactions are qualified by a significant 3-way interaction → describe it
(Nov 11): Experimental design 3: threats to internal validity and how to avoid them
Threats that are addressed by using proper experimental design and control:
1) Design confounds: e.g., varying both the design and the colour of an ID number (Elliot et al.), so you can't tell which one caused the effect
2) Selection effects: failing to create equivalent groups using some form of random assignment or matched groups (a threat in between-groups experiments)
3) Order effects: failing to counterbalance the order of your conditions in a within-groups design
4) Experimental demand
5) Experimenter expectancy effects
Experimental demand (Orne, 1959; 1962):
- Demand characteristics of an experiment give participants clues about the researcher's hypothesis (participants change their behaviour to match what they believe the experiment is about and what is expected of them)
- Participants are good at picking up on these clues and usually alter their behaviour to support the perceived research hypothesis
- Participants want to be "good subjects"
Orne and Scheibe (1964): sensory deprivation research and demand
Ways to assess and address demand:
1) Pilot testing: try the study out with a small group of people
2) Suspicion probes: ask participants whether they had any suspicions about the study's purpose
3) Statistics
4) Manipulate awareness of the hypothesis (and then more stats!)
(it's up to you to tell participants either the true hypothesis or its complete opposite)
5) Cover stories (use deception so that participants think the research is about something else)
6) Psychological realism (participants are so involved in the study that they can't think about anything else)
7) Unobtrusive (and similar) measures: reaction time, physiological responses
8) Between-groups designs: each participant sees only one condition, so there are far fewer clues to the hypothesis than in a within-groups design
(Nov 13) Experimenter expectancy effects:
- Experimenters change the experiment based on their beliefs
- They fall prey to their own biases
- They provide cues that participants pick up on, which changes participants' behaviour during the experiment
Intons-Peterson (1983):
- If I tell different experimenters different hypotheses, will I get different outcomes?
- What experimenters thought was happening:
  IV (within groups): prime (imaginal [imagination] vs. perceptual [you just perceive it] vs. control)
  DV: reaction time to target image
- What actually happened:
  IV (between groups): experimenter expectancy (imaginal priming is "better" vs. perceptual priming is "better")
  DV: reaction time to target
- Assistants who expected imaginal priming to produce better results read the imaginal-priming instructions more slowly and carefully than the perceptual-priming instructions
Solutions?
- Rigorous training (experimenters know their script and read it well)
- Automated procedures and instructions
- Keep experimenters blind to the hypothesis and/or condition
Avoiding internal validity threats 2: threats solved by using appropriate comparison groups
1) Maturation: a spontaneous change in participants over time, rather than a change caused by your IV. How do you know whether maturation might be responsible for your effect?
(e.g., depression and 12-week therapy: symptoms may improve over 12 weeks even without the therapy)
2) History: something external happens to participants that impacts the DV (September vs. November electric bill example)
3) Regression to the mean: scores on the dependent variable reflect not only individuals' natural performance but also measurement error (or chance) (e.g., an A student gets an F just because lots of random bad things happened at the same time). Be concerned when a group is measured at time 1 and time 2 and the time 1 scores are extreme (e.g., people were studied because something unusual happened to them); extreme scores tend to regress toward the average at time 2
- With maturation we aren't concerned about extreme scores
4) Placebo effects: What are they? Improvements that happen because people believe they are getting a treatment that works, even when the treatment actually does nothing; people get better largely because they wholeheartedly believe it works
- Your treatment needs to work much better than the placebo
- Use a double-blind placebo control design to rule out placebo effects
5) Testing threats: What is a testing threat? A threat in pre-test/post-test designs where participants' post-test responses are influenced by their pre-test responses. For instance, if students remember their answers from the pre-test, they may tend to repeat them on the post-test. Not conducting a pre-test avoids this threat.
- Difference between order effects and testing threats: in within-groups designs you say "order effects"; in between-groups (pre-test/post-test) designs you say "testing threats"
How to deal with testing threats?
- Best thing to do is drop the pre-test
- Use alternate forms (version A at time 1 and version B at time 2)
- Use a control group
6) Instrumentation threats: what are instrumentation threats?
You believe something has changed but it hasn't; only the instrument has (AKA instrument decay)
- e.g., you believe you lost weight, but actually the scale's springs have loosened and it shows the wrong weight
Solutions:
- Train coders well!!
- Ensure that alternate forms are equivalent
- Counterbalance alternate forms across pre- and post-test
Summary: internal validity threats
- In order to make a causal claim, you must deal with many internal validity threats
- Main solutions:
  Always use a comparison/control group
  Design an excellent experiment (think about its demand characteristics)
  Keep participants and researchers blind to the hypothesis/condition
Null results: experimental design 4
Four main reasons people get null results:
1) Confounding variables are obscuring the effect
2) There is so little variability between conditions that the stats can't pick it up
3) There is too much variability within conditions
4) There really was no difference between groups
1) Confounds:
- Reverse confounds: a confound that works in the opposite direction from the IV, obscuring a true effect
- Confounds can both produce results and obscure them
- Solution? Design the study so there are no confounds
2) Low variability between conditions:
- If the IV manipulation is too subtle, you will have difficulty finding a statistically significant effect
  ➔ Manipulate forcefully enough that the IV does what it is supposed to do (people in the high-stress group should be stressed enough, and people in the low-stress group should not be stressed)
- The DV was insensitive (you want enough units on the scale to detect whether a change is present)
- Floor or ceiling effects (e.g., if everyone scores D or F on a test, that's a floor effect; if everyone gets an A, that's a ceiling effect)
Solutions?
- Pre-test your IVs and DVs
- Use manipulation checks
3) High variability within conditions:
- High measurement error (the DV is not precise enough)
- High variation in participant characteristics (use exclusion criteria to address this)
- Lots of situation noise (e.g., in field experiments, surrounding noise can distract participants)
4) There really was nothing to find
Summary:
- When is a null result really a null result? If you are the first person to test a concept and you get a null result, ask whether you messed up a step in the experiment before giving up
- Null results aren't all that bad
- Can null results be published? Pre-registration, open science, good science
Questions to ask:
- What is convergent validity? Ask to explain this
- Ask about partial correlations
- How to calculate how many participants are required for a factorial design (in both between-groups and within-groups designs)
- Explain the concept of block randomization
- Requirements to make a causal claim (covariance, internal validity and temporal precedence) (is this true?)
Tutorial:
- The third-variable problem is associated with correlational research
- Confounds are associated with experimental research
- Criterion validity: your survey matches a standard (something concrete in the world); convergent validity: 2 different measures of a similar construct agree (self-esteem and extroversion)
- Discriminant validity: the measure does not correlate strongly with measures of different, unrelated constructs
- Counterbalancing distributes order effects; it is a solution to practice effects
Textbook notes:
Chapter 1:
Authority:
- We place our trust in someone else who we think knows more than we do (when we were young, we trusted our parents; as adults we trust doctors, etc.)
- The scientific approach rejects the notion that one can simply accept on faith that the statements of an authority figure are true.
We need good-quality evidence and skepticism before drawing any conclusions
- A fundamental characteristic of the scientific method is empiricism: gaining knowledge based on structured, systematic observations of the world
Universalism: one group can use empiricism, publish research, and draw one conclusion, while another group can disagree and publish research that reaches a different conclusion; research from both sides can be evaluated objectively by others (scientists try to find rules or ideas that are true for everyone, everywhere)
Communality: methods and results are shared openly. Replications help to ensure that the effects being reported are not the results of chance, false positives, scientific fraud or something else.
Disinterestedness: scientists should be motivated by an honest and careful quest for truth, not fame, ego or personal gain
Organized skepticism: skepticism is organized through the practice of peer review. Scientists should be critical of work even if it supports their ideas, and be open to research that contradicts their own ideas
Science as a way to ask questions and gather evidence:
- Whether God exists is not an uninteresting question; it is just not a falsifiable empirical question
- Pseudoscience uses scientific terms to make a claim look more compelling and scientific, but falls short of actually using scientific methods
Four goals of science:
- Describe behaviour: understand the phenomenon (MDD example): what it looks like, when and how often it occurs, and for whom
- Predict behaviour: when the behaviour will or will not occur (if 2 events have been consistently related, it becomes possible to predict when an event might occur and anticipate it)
- Determine the causes of behaviour:
  Covariation of cause and effect: when the cause is present, the effect occurs; when the cause is not present, the effect does not occur
  The cause must come before the effect in time (temporal precedence; see the multitasking example)
  Nothing other than the causal variable can be responsible for the observed effect (ruling out alternative explanations)
- Understand or explain behaviour: why the events and behaviours occur
- Basic research: answers fundamental questions about the nature of behaviour; often focuses on testing theories rather than developing a specific application
- Applied research: conducted to address practical problems in the real world, often to propose potential solutions
Chapter 2:
Observing the world around us:
- Serendipity: some of the most interesting discoveries are the result of accident or sheer luck (Ivan Pavlov's classical conditioning)
- Practical problems: real-world problems can trigger a research idea (which can in turn help solve day-to-day problems) (e.g., HM's brain surgery helped researchers understand basic memory processes)
- Theory: an organized system of logical ideas proposed to explain a particular phenomenon and its relationship to other phenomena. Theories serve 2 functions in science: they organize and explain a large number of previous data, and they help generate new knowledge by pointing us in directions where we can discover novel aspects of behaviour
Chapter 6:
Costa and McCrae (1985) developed the NEO Personality Inventory (NEO-PI) to measure 5 broad personality traits:
- Openness to experience
- Conscientiousness
- Extraversion
- Agreeableness
- Neuroticism (its inverse is also called emotional stability)