Summary

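This guide provides an overview of research methods and statistics in psychology. It covers hypotheses, measurement, validity and reliability, sampling, descriptive statistics, sampling distributions, hypothesis testing, experimental design, t-tests, ANOVA, chi-square, correlation and regression, and transparent research practices.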

Full Transcript

GOOD HYPOTHESIS
● Logical: follows from premises that were derived from a theory
● Empirically testable: all variables can be observed (measured)
● Refutable: able to be shown to be false
● Positive: proposes the existence of something
● Specific: generates testable predictions for specific situations

MEASUREMENTS
● Measurement: a systematic procedure for assigning scores or values to individuals so that the scores or values represent some characteristic of the individuals
● Much of what we want to know about in psychology is not directly observable, but can be measured indirectly in other ways
  ○ Constructs are the ideas we care about which cannot be observed directly
  ○ Variables are the things we can measure which tell us about constructs
● The way you choose to measure a construct in a particular study is called an operational definition
  ○ The same construct can be (and often is) operationally defined differently in different studies!

TYPES OF MEASUREMENT
● Self report: participant reports on their own thoughts, feelings, or behaviors
● Behavioral: researcher observes and records some aspect of a participant’s behavior
● Physiological: researcher measures some part or aspect of a participant’s body

WAYS TO ASSESS VALIDITY
● Construct validity: did I measure what I meant to measure?
● Internal validity: can I make a causal claim?
● External validity: do my results generalize?
● Statistical validity: how well does the data support my claim?

CONSTRUCT VALIDITY
● Subjective assessments of construct validity
  ○ Face validity: appears valid
  ○ Content validity: represents all parts of a construct
● Empirical assessments of construct validity
  ○ Convergent validity: relates to other measures of the same construct
  ○ Discriminant validity: does not relate to measures of a different construct
  ○ Predictive validity: predicts a relevant outcome

CONSTRUCT VALIDITY FOR INDEPENDENT VARIABLES
● How well was your independent variable (IV) manipulated?
  ○ Manipulation check: extra dependent variable (DV) added to check if the experimental manipulation worked as intended
  ○ Pilot study: simple study with a separate group of participants to confirm the effectiveness of an experimental manipulation

ADDITIONAL CONSIDERATIONS
● Reliability: How consistent is your measure?
  ○ Test-retest reliability
  ○ Interrater reliability
  ○ Internal consistency
● Measurement artifacts: Were your measures affected by the circumstances or design of the study?
  ○ Experimenter bias
  ○ Demand characteristics
  ○ Socially desirable responding
  ○ Range effects (ceiling, floor)

DIRECTIONALITY PROBLEM (TEMPORAL PRECEDENCE)
● When two variables are related, it is unclear which variable affects (causes a change in) the other

THIRD VARIABLE PROBLEM (INTERNAL VALIDITY)
● When some third variable can explain the observed relationship between two variables

POPULATIONS VS SAMPLES
● Population: group of interest
● Sample: individuals observed
  ○ Representative sample: a sample that closely mirrors or resembles the population
  ○ Biased sample: a sample that differs in important characteristics from the population, often due to selection bias

TWO KINDS OF SAMPLING
● Probability sampling
  ○ Every member of a target population is identified
  ○ Every member has a certain non-zero probability of being selected
  ○ Selection is random, based on probabilities of being selected
● Nonprobability sampling
  ○ Sampling in which one or more of the three above criteria are not met

TYPES OF PROBABILITY SAMPLING
● Simple random: all members have an equal chance of being selected
● Cluster: identify pre-existing clusters, randomly select some of the clusters, sample everyone in the selected clusters
● Systematic: put members of a population in an order, pick a random starting point, and then choose every nth member
● Stratified random: identify pre-existing groups, sample all of the groups equally
● Proportionate stratified random: identify pre-existing groups, sample all of the groups proportionately

TYPES OF NONPROBABILITY SAMPLING
● Convenience: recruit participants who are easily accessible
● Quota: identify pre-existing subgroups, recruit a specific number of participants from each group (using a non-random selection method)
● Snowball: ask participants to help recruit more participants by asking people they know to also participate
● Purposive: select participants on the basis of some characteristic they share (using a non-random selection method)

SAMPLING METHOD IMPACTS EXTERNAL VALIDITY
● Sampling method: how is your sample selected from your population?
  ○ Question of external validity
● Sample size: how many people do you sample?
  ○ Question of statistical validity
● Random sampling: sampling method in which you identify all of the members of your population and select a random subset
  ○ Question of external validity
● Random assignment: researchers place participants into groups at random
  ○ Question of internal validity

CENTRAL TENDENCY
● A key component of describing a distribution is to indicate where it is “centered”
● Three common measures of central tendency are:
  ○ Mean = the “balancing point” of the distribution = arithmetic average
  ○ Median = the central observation = 50th percentile
  ○ Mode = the most frequent value

MEASURES OF SPREAD
● Variability: the degree to which scores in a distribution are spread out or clustered together
  ○ The goal here is to capture how much scores deviate from the mean
● Deviation: difference between an individual score and the mean
● Sum of squares (SS): the sum of all squared deviations
● Variance (s²): mean of the squared deviations
● Standard deviation (s): square root of the variance
● Mean > Median → Positively skewed
● Mean < Median → Negatively skewed
● Ordinal variables → cannot find mean or standard deviation
● Nominal variables → cannot find median, mean or standard deviation

ESTIMATION
● An estimator is a process that generates an estimate of a population parameter
  ○ Population parameters are represented with Greek letters (mu, sigma, etc.)
● An estimate is the actual numerical guess of what the population parameter is
  ○ Sample statistics are represented with Roman letters (X, s, etc.)

SAMPLING ERROR
● Estimates of a parameter vary from sample to sample
  ○ Sample size (n): the number of participants in a study
  ○ Sampling error: difference between the sample statistic and the true population parameter
  ○ Law of large numbers: the larger the sample, the more representative it is of the population, and so the less sampling error we expect

DISTRIBUTION TRIAD
● Population distribution: distribution of all possible scores from all possible individuals in the population
● Sample distribution: distribution of observed scores measured from one sample of observed individuals
● Sampling distribution: distribution of all possible values of a sample statistic measured from all possible samples of size n

CENTRAL LIMIT THEOREM (CLT)
● The mean of all sample means (μx̄) is equal to the population mean (μ)
● The standard deviation of all sample means (σx̄) is smaller than the population standard deviation (σ), and gets smaller as the sample size (n) increases
● The distribution of sample means (x̄) is roughly normal when the sample size (n) is large, even if the population is highly non-normal

CHARACTERISTICS OF SAMPLING DISTRIBUTIONS OF MEANS
● Sampling distribution of sample means always has a mean equal to μ (population mean), no matter the shape of the population from which the samples are drawn
● As sample size (n) increases, the standard deviation of the sampling distribution decreases by a factor of the square root of n
● The sampling distribution will be normally distributed IF the population is normally distributed or the sample size is sufficiently large (>~30)

NORMAL DISTRIBUTIONS
● All normal distributions share the same functional form (the bell-shaped density), but they differ according to 2 parameters: mu and sigma

Z SCORES
● Z-score: a representation of a score’s deviation from a mean in terms of standard deviations
  ○ Sign (+ or -) indicates whether the score is above or below the mean
  ○ Value indicates the number of standard deviations the score is from the mean
● A linear transformation preserves the relative position of the scores, while changing the center and the scale of the distribution
● z = (x - μ)/σ

USING Z SCORES
● Describing scores in distributions… with a single number
  ○ We can tell whether a score is high, low, or average from just a z-score!
● Equating and rescaling entire distributions
  ○ Mean: always 0
  ○ Standard deviation: always 1
  ○ Shape: same as original distribution
● Making scores from non-equivalent distributions comparable
  ○ A z-score of 1.2 has the same meaning, no matter what the original distribution
  ○ Z-scores are “distribution-free units”

HYPOTHESIS TESTING
● Hypothesis test: a statistical method that uses sample data to evaluate a hypothesis about a population
  ○ Based on the probability of observing some sample statistic if a certain hypothesis is assumed to be true
● Null hypothesis (H0)
  ○ In the population there is no change, no difference, or no relationship
    ■ For an experiment, this means no effect of treatment
  ○ Any difference observed is due to sampling error only
● Alternative hypothesis (H1)
  ○ In the population there is a change, a difference, or a relationship
    ■ For an experiment, this means an effect of treatment
  ○ Any difference observed is due to sampling error AND a real effect

UNLIKELY ENOUGH TO REJECT H0?
● Alpha: the probability value that we use to determine which sample outcomes are considered very unlikely if the null hypothesis is true
  ○ This probability level is chosen by the researcher! (does not have to be 0.05)
● Critical region: the region of the sampling distribution that contains the sample outcomes that are considered very unlikely if the null hypothesis is true
  ○ This area depends on: what alpha is and whether the test is one- or two-tailed
● Critical value: the value(s) that define the boundaries of the critical region(s)
  ○ These values depend on: what alpha is and whether the test is one- or two-tailed

TEST STATISTICS
● z (z-test): compare sample mean to population mean (sigma known)
● t (t-test): compare 2 means, or compare a sample mean to a population mean (sigma unknown)
● F (ANOVA): compare 2+ means
● r (correlation): evaluate relationship between 2 quantitative variables
● χ² (chi-square): evaluate relationship between 2 categorical variables

P-VALUE (PROPORTION MORE EXTREME VALUE)
● A p-value is a way of describing how extreme a score is in a distribution. A p-value is the proportion of a distribution more extreme than a given score

CHARACTERISTICS OF A TRUE EXPERIMENT
● Manipulation:
  ○ The experimenter actively changes some quantity or quality of an independent variable in order to observe the subsequent effect it has on a dependent variable
  ○ Establishes temporal precedence
● Control:
  ○ Alternative explanations are eliminated by controlling confounds (any other variables that are systematically different across conditions)
  ○ Improves internal validity

KEY TERMS IN EXPERIMENTS
● Independent variable (IV):
  ○ Any variable that the researcher intentionally manipulates (i.e., actively changes) across conditions. The “cause”
● Dependent variable (DV):
  ○ Any variable that the researcher measures as an outcome of the study. The “effect”
● Extraneous variable:
  ○ Any variable in the context of the study that has some relationship to the DV but is not an IV or DV
● Confound:
  ○ An extraneous variable that is correlated with levels of the IV (i.e., different in different conditions) and can provide an alternative explanation of the results
  ○ A confounded experiment lacks internal validity

COMMON SOURCES OF CONFOUNDS
● Environment: setting or context differs across treatment conditions
  ○ Example: crowding, concentration, & temperature
● Individual differences: assignment to conditions results in groups with different personal characteristics
  ○ Example: pool temperature, endurance, & swimming experience
● Time-related: treatment conditions occur at different times and experience over time causes a change in the dependent variable
  ○ Example: cell phones, safe driving, & practice

METHODS OF CONTROLLING CONFOUNDS
● Randomization: use a random assignment process to avoid a systematic relationship between the potential confound and the conditions of the study so that differences are due only to chance
  ○ Often used to control for individual differences, sometimes for time-related confounds
● Hold constant: do not allow the potential confound to vary at all across participants or conditions of the study
  ○ Often used to control environmental confounds, sometimes individual differences
● Matching/counterbalancing: ensure that the average value of the confounding variable is the same across conditions of the study by matching participants or counterbalancing materials
  ○ Often used to control time-related confounds, sometimes for individual differences

DESIGN CHOICES
● Between-subjects design: each participant does only one condition of a study
  ○ Eliminates all passage-of-time confounds because each person is measured only once
  ○ Introduces individual differences as a potential confound that must be controlled because differences between people in the groups can provide alternative explanations of the results
● Within-subjects design: each participant does all conditions of a study
  ○ Eliminates all individual differences confounds because each person is compared to themselves
  ○ Can introduce time-related confounds because each person is measured more than once, so changes between conditions can result from alternative explanations

GENERAL FORM OF T-STATISTIC
● t = (sample statistic - population parameter) / estimated standard error of statistic
● t-tests look at the ratio of differences in 2 means to overall error

TYPES OF T-TESTS
● One sample: used to compare the mean of one sample to the mean of a population
  ○ Tells us how extreme the observed sample mean is relative to all the possible sample means we could have observed if the null hypothesis were true
  ○ Similar to a z-test
    ■ Only difference is whether the population standard deviation is known (z-test) or must be estimated from the sample data (t-test)
● Dependent measures: used to compare the means of two conditions in a within-subjects or matched-pairs design
  ○ Tells us if the observed mean difference is significantly different from zero (the population mean difference under the null hypothesis)
● Independent measures: used to compare the means of two groups in a between-subjects design
  ○ Tells us if the observed difference of means is significantly different from zero

WHY DO WE USE DEGREES OF FREEDOM (N-1)
● When n deviations from the sample mean are used to estimate variability in the population, only n-1 are free to vary
  ○ Because of the restriction that the sum of all deviations must equal zero
● Only n-1 sample deviations supply information for estimating variability
● Not doing this adjustment would cause an underestimate of variability in the population

WHAT IS ANOVA?
● ANOVA stands for Analysis of Variance
  ○ Analysis here means “breaking apart” or attributing: breaking apart the different sources of variance in a population
● ANOVA looks at the ratio of variance between 3+ groups to variance within groups (due to random error)
● Test statistic for ANOVA is F
  ○ H0: all means are equal
  ○ H1: not all means are equal; at least one group is different from the others
  ○ F(df treatment, df error) = F ratio, p = p-value
● ANOVA does not tell us which groups are different and how
  ○ Run post-hoc tests when the ANOVA is significant

DIFFERENT TYPES OF ONE-WAY ANOVA
● One-way (between-subjects) ANOVA
  ○ One factor with three or more levels, where participants are in only one of the conditions
  ○ Example: (1) placebo, (2) old medication, (3) new medication for treating depression
● One-way repeated measures ANOVA (within-subjects)
  ○ One factor with three or more levels, where participants are in all of the conditions
  ○ Example: people taste and rate all 4 different kinds of wine

FACTORIAL NOTATION
● Written out like: # x # x # …
  ○ Number of factors (IVs) = number of terms in the expression
  ○ Number of levels in each factor = the specific value of each term
  ○ Number of conditions = the product of the terms
● Example: 2 x 2
  ○ 2 IVs, each with 2 levels = 4 conditions
● Mixed factorial design
  ○ One or more of the factors is/are manipulated between-subjects, and one or more is/are manipulated within-subjects
  ○ Each participant experiences more than one but not all conditions

STATISTICAL EFFECTS IN FACTORIAL DESIGNS
● Main effects: the effect of one factor on average across all levels of the other factor(s); difference between marginal means
● Simple effects: the effect of one factor within a level of another factor
● Interactions: the effect of one factor depends on the levels of the other factor(s); difference between simple effects
  ○ Parallel lines indicate NO interaction

CHI-SQUARE TEST OF INDEPENDENCE
● When to use chi-square test
  ○ ANOVA and t-tests: categorical predictors (x), continuous outcome (y)
  ○ Chi-square test: categorical predictor (x), categorical outcome (y)
    ■ Cannot compute a mean (or standard deviation) of a categorical variable
● Compares observed frequencies to expected frequencies
  ○ If they are “close enough”, the test statistic is small and the null is retained
  ○ If they are “different enough”, the test statistic is large and the null is rejected
● Calculating p-value
  ○ Probability is drawn from a chi-square distribution
  ○ Shape of distribution is affected by degrees of freedom, df = (r-1)(c-1)

CORRELATION
● Correlation: continuous predictor (x), continuous outcome (y)
  ○ Not interesting to compare means of two different variables
  ○ Tells us how well the data fits the line
● Assesses how consistently a change in x predicts a change in y
  ○ Look at whether two continuous variables covary
    ■ When one variable deviates from its mean, the other variable should deviate in the same or directly opposite way
● Covariance vs Variance
  ○ Covariance = mean cross-product of deviations (sum of products over df)
  ○ Variance = mean squared deviation (sum of squares over df)
● Pearson’s correlation coefficient (r) is a standardization of covariance
  ○ Degrees of freedom (df): n-2

LINEAR REGRESSION
● Linear regression: use continuous x to predict continuous y
  ○ Tells us what that correlation line actually is (and more)
  ○ y = mx + b
● Least squares: an approach to fitting a model (line) to data where the sum of the squared distances to the data is minimized
  ○ Criterion: Ŷ(i) = b(0) + b(1)*X(i) such that Σ[Y(i) − Ŷ(i)]² = min
● Residual error: an individual’s residual is the difference between that individual’s observed value and the value predicted by the model
● Intercept = the expected value for y when x is zero; b(0)
● Slope = the expected change in y for each 1-unit change in x; b(1)
● Regression coefficient vs correlation coefficient
  ○ The slope is unbounded because there is no limit on how much larger the sum of the products can be relative to the sum of squares of x
  ○ The correlation coefficient is bounded between 1 and -1 because the covariance can never be larger than the square root of the product of the variances on x and y

CONFIRMATORY VS EXPLORATORY RESEARCH
● Confirmatory research
  ○ A priori hypotheses, data independent, hypothesis testing, p-values interpretable
● Exploratory research
  ○ Post hoc hypotheses, data contingent, hypothesis generating, p-values not interpretable
● Presenting exploratory as confirmatory increases publishability of results at the cost of credibility of results

QUESTIONABLE RESEARCH PRACTICES (QRPs)
● Underreporting
  ○ Problem: including multiple DVs but only reporting DVs that support hypothesis
  ○ Why is this a problem?
    ■ Familywise error rate (FWER): probability of making one or more false alarms when performing multiple pairwise comparisons
      ● Multiple comparisons lead to “alpha escalation”
    ■ Misrepresents exploratory research as confirmatory
● P-hacking
  ○ Problem: researchers make decisions during data analysis that lead to statistically significant results
    ■ Exploiting “research degrees of freedom”
  ○ Why is this a problem?
    ■ Goal of data analysis should be determining how well the data support the hypothesis, NOT obtaining a significant p-value
● HARKing
  ○ Problem: hypothesizing after the results are known

TRANSPARENCY SOLUTION
● Methods
  ○ Researchers are expected to disclose every study detail
    ■ How they determined their sample size
    ■ All data exclusions (if any)
    ■ All manipulations
    ■ All measures
  ○ Readers are better able to evaluate strength of the evidence
● Results
  ○ Researchers publicly share data files, how they prepared the data, how they computed composite scores
  ○ Helps address underreporting and p-hacking
● Hypotheses
  ○ Preregistration:
    ■ Document decisions in advance and post to public repositories with time stamp
    ■ Antidote to HARKing
  ○ Registered report:
    ■ A step beyond preregistration
    ■ Write a plan (introduction, method, data analysis) and submit to a journal before commencing data collection
      ● “Conditional acceptance” if study deemed good
    ■ Addresses publication bias (aka the ‘file drawer problem’)
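
WORKED EXAMPLE: CENTRAL TENDENCY, SPREAD, Z-SCORES, AND A ONE-SAMPLE T

The definitions under CENTRAL TENDENCY, MEASURES OF SPREAD, Z SCORES, and GENERAL FORM OF T-STATISTIC can be traced by hand on a small data set. The sketch below is only an illustration: the scores and the null value `mu_0` are made up, and the z-scores are computed against the sample mean and sample standard deviation (an assumption, since the notes state the formula with μ and σ).

```python
import math
import statistics

# Hypothetical sample of scores (made up for illustration, not from the notes).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

# Central tendency
mean = statistics.mean(scores)      # arithmetic average: 5
median = statistics.median(scores)  # 50th percentile: 4.5
mode = statistics.mode(scores)      # most frequent value: 4

# Spread: deviation -> sum of squares -> variance -> standard deviation
deviations = [x - mean for x in scores]
ss = sum(d ** 2 for d in deviations)  # sum of squared deviations: 32
variance = ss / (n - 1)               # divide by df = n - 1, not n
sd = math.sqrt(variance)

# z-scores: each score's deviation in standard-deviation units.
# Assumption: the sample mean and sample sd stand in for mu and sigma.
z_scores = [(x - mean) / sd for x in scores]

# One-sample t: (sample statistic - population parameter) / estimated standard error
mu_0 = 4  # hypothetical population mean under the null hypothesis
standard_error = sd / math.sqrt(n)
t = (mean - mu_0) / standard_error

print(f"mean={mean}, median={median}, mode={mode}")
print(f"SS={ss}, variance={variance:.3f}, sd={sd:.3f}")
print(f"t({n - 1}) = {t:.3f}")
```

Here the mean (5) is larger than the median (4.5), which matches the positive-skew rule in the notes, and the z-scores have mean 0 and standard deviation 1 as described under USING Z SCORES. Turning t into a p-value would need a t distribution with n-1 degrees of freedom, which this sketch leaves out.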

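WORKED EXAMPLE: COVARIANCE, PEARSON'S R, AND THE LEAST-SQUARES LINE

The CORRELATION and LINEAR REGRESSION sections describe r as a standardized covariance and the slope as the sum of products relative to the sum of squares of x. A minimal sketch of those calculations follows; the x and y values are hypothetical, and df = n-1 is assumed for the covariance and variances.

```python
import math

# Hypothetical paired observations (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of cross-products of deviations, and sums of squares
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

# Covariance and variances: sums divided by df (assumed to be n - 1 here)
covariance = sp / (n - 1)
var_x = ss_x / (n - 1)
var_y = ss_y / (n - 1)

# Pearson's r: covariance standardized by the two standard deviations
r = covariance / math.sqrt(var_x * var_y)

# Least-squares line: slope b1 and intercept b0
b1 = sp / ss_x               # expected change in y per 1-unit change in x
b0 = mean_y - b1 * mean_x    # expected y when x is zero

# Residuals: observed value minus the value predicted by the model
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print(f"r = {r:.3f}, slope = {b1:.3f}, intercept = {b0:.3f}")
print(f"sum of squared residuals = {sum(e ** 2 for e in residuals):.3f}")
```

The slope and r share the same numerator (the sum of products) but are scaled differently: the slope divides by the sum of squares of x only, so it can take any value, while r divides by the square root of the product of both variances, which keeps it between -1 and 1.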