Document Details

Uploaded by ProfoundNobelium

University of Nottingham Malaysia Campus

Tags

research methods, statistics, experiments, social science

Summary

This document provides a summary of research methods, covering topics such as statistics, experiments, quasi-experiments, correlations, and different types of variables. It includes definitions and examples for each concept.

Full Transcript

Lecture 1

What is statistics?
The process of finding out about patterns in the world using real data.

Why do you need statistics?
To understand which events are likely and which are due to chance.

Deterministic Model: a system that does not include elements of randomness.
- Every time the model is run with the same conditions, it gives the same results.
- Does not consider variation.

Probabilistic Model: a system that includes elements of randomness.
- Every time the model runs, it is likely to give different results, even with the same initial conditions.
- Incorporates some kind of random variation.

Unsystematic Variation: inaccuracy or anomaly resulting from unknown or uncontrollable factors not under statistical control.
- E.g., participants' intelligence, education, mood.
- How to limit it? Random assignment of participants.

Systematic Variation: a difference in performance caused by the manipulation of the IV.

Descriptive Statistics: summarise or describe the characteristics of a data set.
- Measures of central tendency, measures of variability, frequency distributions.

Inferential Statistics: use samples to make reasonable guesses about the larger population.
- Testing for differences
- Testing for correlations
- Testing for interactions

How can stats mislead others?
- May show only part of the data
- Exaggerated graphics

Lecture 2

What are the three scientific methods?
- Experiments
- Quasi-experiments
- Correlational methods

Experiments: investigate the cause-and-effect relationship between variables.
- An IV is manipulated, and the DV is measured.
- Extraneous variables are controlled so that only the IV affects the DV.
- Requires control and treatment groups (to avoid confounding variables).

Quasi-Experiments: investigate the cause-and-effect relationship between variables, but do NOT rely on random assignment of participants to groups.
- Groups differ because of innate differences between the participants themselves.
- The researcher does not have control over the treatment.
- E.g., a memory task comparing a group of clinically depressed participants with a control group of non-depressed participants.

Correlational Methods: investigate the strength of the association between variables.
- Variables are only observed, with no manipulation or intervention by researchers.
- Limited control means other variables may play a role (extraneous variables).
- E.g., the correlation between smoking cigars and cancer.

What are categorical variables?
Variables that classify data into distinct categories or groups.

Types of categorical variables:
- Nominal: categories with no inherent order (e.g., hair colour: black, brown, blonde).
- Ordinal: categories with a meaningful order (e.g., level of education: high school, bachelor's, master's, PhD). Usually discrete variables.

What are continuous variables?
Variables that can take on any value on the measurement scale.

Types of continuous variables:
- Interval scale: the intervals on the scale represent EQUAL differences in the thing being measured; no meaningful zero point is needed. E.g., temperature (0 °C is not an absence of temperature).
- Ratio scale: the intervals represent equal differences, AND the scale has a meaningful ZERO point that indicates absence of the attribute. E.g., a height of 0 cm, enzyme activity of 0, number of errors, reaction times.
Independent Variable: the factor under investigation in an experiment, manipulated to create two or more conditions, and expected to be responsible for changes in the DV.
- An IV can have two or more levels (conditions). E.g., testing participants' anxiety levels after 1, 2, or 3 cups of coffee is one IV with three levels.

Dependent Variable: the variable that the experimenter measures. An experiment can have one or more DVs.

Between-Subjects Design: an experimental design in which different groups of participants go through different conditions.
Weaknesses:
- Participant variables: individual differences may affect the DV rather than the IV (countered by random assignment).
- Time-consuming: more participants required.
Strengths:
- Eliminates boredom effects, practice effects, and demand characteristics, since each participant completes only one condition.

Within-Subjects Design: an experimental design in which one group of participants takes part in all levels of the IV.
Weaknesses (threats to validity):
- Demand characteristics
- Order effects
- Carryover effects
How to reduce them? Counterbalancing.
Strengths:
- Fewer participants needed than in between-subjects designs.
- Less participant variability.

Matched-Pairs Design: participants are matched in pairs (based on the same characteristic), with each member of a pair performing at a different level of the IV.
Strengths:
- Makes the groups comparable.
- Minimises individual differences.
Weaknesses:
- Time-consuming.
- Smaller sample size, so harder to generalise.

Extraneous Variables: any variable not being investigated that has the potential to affect the outcome of a research study.

Confounding Variable: a factor that systematically affects both the IV and the DV.

Hypothesis: a testable statement predicting the outcome of the study.

Types of hypotheses:
1. Alternative hypothesis: predicts that there will be a difference between groups. E.g., a pill will enhance alertness.
2. Null hypothesis: predicts that there will be no difference between groups/conditions.

Test statistics:
- p: the probability of obtaining an effect at least as large as the one measured if it were due to chance alone.
- a (alpha): the criterion level (.05) that p must fall below for us to conclude the measured effect is NOT due to chance.
- If p > a, we have failed to reject the null hypothesis (a worked sketch of this decision rule follows at the end of this lecture's notes).

Type 1 error (false positive): rejecting the null hypothesis when it was true.
- E.g., patients diagnosed with a medical condition they don't have.
- Due to confounding variables.

Type 2 error (false negative): accepting the null hypothesis when it was false.
- Due to noise in the data.

Validity: the extent to which the researcher is testing what they claim to be testing.
- Internal validity: order, practice, and boredom effects.
- Ecological validity
- Generalisability
- Demand characteristics

Reliability: the extent to which a task, procedure, or measure is consistent.
- Internal reliability: standardised procedures.
- External reliability: replication (test-retest).
- Inter-rater reliability.

Plan for designing and analysing experiments:
1. Design the experiment (IV and DV; between- or within-subjects design).
2. Specify the statistical hypothesis.
3. Collect data.
4. Describe the data.
5. Calculate test statistics.
6. Reject or fail to reject the hypothesis.
7. Write the paper.
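A minimal sketch of the p vs. alpha decision rule above, using hypothetical scores for two between-subjects groups. The data and variable names are invented for illustration; it assumes SciPy is installed.

```python
# Sketch of Lecture 2's decision rule: compare p to alpha = .05.
from scipy import stats

alpha = 0.05  # criterion level from the notes

treatment = [5.1, 6.3, 5.8, 6.0, 5.5, 6.4, 5.9, 6.1]  # hypothetical scores
control = [4.8, 5.0, 5.2, 4.9, 5.3, 4.7, 5.1, 5.0]

# Independent-samples t-test: is there a difference between group means?
t_stat, p_value = stats.ttest_ind(treatment, control)

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")
```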
Lecture 3

Discrete Variables: have certain fixed values (often integers).
- Ordinal variables are discrete variables.
- E.g., number of objects.

Continuous Variables: can take on any fractional value (within a given range).
- E.g., a distance run can be 5.4 km.
- Interval and ratio scales can be continuous.

What is central tendency?
A single value that attempts to describe a set of data by identifying the central position within that set of data.

Mode: the most common category/score in the data.
Pros:
- Can be used for categorical (nominal) data; no other measure of central tendency can.
- Always gives a real data value.
Cons:
- Can give more than one mode (bimodal/multimodal distributions).

Median: the middle value of a data set ranked in ascending order, or the mean of the two middle values.
Pros:
- Unaffected by extreme scores (outliers) at either end of the distribution.
- Can be used for ordinal, interval, and ratio variables.
- Often gives a real data value.
Cons:
- Cannot be used with nominal data, which is not numerical.
- Ignores a lot of the data.
- Not easy to calculate without a computer.

Mean: the average score in a data set; the sum of all values divided by the number of values.
Advantages:
- Uses all the data; does not leave out outlying values.
Disadvantages:
- Can be influenced by extreme scores.
- Not always a meaningful value (e.g., 2.4 children).

Measures of Spread: describe the spread of data around a central value (mean, median, mode), i.e., how much variability there is in the data.
Each measure of spread pairs with a measure of central tendency:
- Median: range / interquartile range
- Mean: variance / standard deviation
- Mode: none

Range: the largest score minus the smallest; affected by extreme scores.

Interquartile Range: tells you how spread out the middle half of your data is.
- Not affected by outliers (it ignores the extreme quarters of the data).
- Good for skewed distributions.
How to calculate the interquartile range (implemented in the code sketch after these notes):
1. Q1 position = (n + 1) × 1/4
2. Q3 position = (n + 1) × 3/4
3. Interquartile range = Q3 − Q1

Variance: measures the spread of data around the sample's mean, i.e., how much the values in a data set differ from the mean.
Formula (sample): s² = Σ(x − x̄)² / (n − 1)
Pros:
- Uses all the data to get its value.
Cons:
- More sensitive to outliers.
- Requires a 'normal' distribution.
- Does not have a sensible unit (the units are squared).

Standard Deviation: the square root of the variance, so its units are the same as the variable being measured.
- High standard deviation: values are far from the mean.
- Low standard deviation: values are close to the mean.
Formulas:
- Population: σ = √( Σ(x − μ)² / N )
- Sample: s = √( Σ(x − x̄)² / (n − 1) )

Why do we need the standard deviation of a sample?
It is used to make inferences and estimates about the population standard deviation: an unbiased estimate of what the population standard deviation would be if we were to measure it.

Standard Error of the Mean (SEM): measures how far the sample mean is likely to be from the true population mean.
Formula: SEM = s / √n
- The SEM decreases as sample size increases, giving a more precise/valid estimate of the population mean.

Z-scores: identify how many standard deviations a specific score differs from the mean.
- Z-scores have a mean of 0 and a standard deviation of 1.
- Positive z-score: the score lies above (to the right of) the mean.
- Negative z-score: the score lies below (to the left of) the mean (e.g., −2 means 2 standard deviations below).
- A z-score of zero means the data point equals the mean.
Z-score formula: z = (x − x̄) / s
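A sketch of the descriptive statistics above, computed step by step on a hypothetical data set so each line matches a formula from the notes. The data and the quartile helper are invented for illustration (the (n + 1)-position quartile rule follows the notes; other conventions exist).

```python
import math
import statistics

data = sorted([4, 7, 2, 9, 7, 5, 6, 8, 7, 3, 5, 6])  # hypothetical scores
n = len(data)

mode = statistics.mode(data)      # most common score
median = statistics.median(data)  # middle value of the ranked data
mean = sum(data) / n              # sum of values / number of values

def quartile(ranked, position):
    """Interpolate between ranked values at a fractional 1-based position."""
    lower = int(position) - 1
    fraction = position - int(position)
    if lower + 1 >= len(ranked):
        return ranked[-1]
    return ranked[lower] + fraction * (ranked[lower + 1] - ranked[lower])

q1 = quartile(data, (n + 1) * 1 / 4)
q3 = quartile(data, (n + 1) * 3 / 4)
iqr = q3 - q1  # interquartile range = Q3 - Q1

# Sample variance, standard deviation, SEM, and z-scores, as defined above.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)
sem = sd / math.sqrt(n)
z_scores = [(x - mean) / sd for x in data]

print(mode, median, round(mean, 2), round(iqr, 2), round(sd, 2), round(sem, 2))
```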
Empirical Rule:
- 68% of data lies within 1 SD of the mean.
- 95% lies within 2 SD.
- 99.7% lies within 3 SD.

Normal Distribution: data distributed symmetrically around the centre of all scores.
- Bell-shaped curve.
- In a normal distribution, mode = mean = median.

Skewed Distribution: the most frequent scores are at one end of the scale.
- Positively skewed: most frequent scores are at the lower end.
- Negatively skewed: most frequent scores are at the higher end.

SPSS

What is the Kolmogorov-Smirnov test for?
It tests the null hypothesis that a set of data is NORMALLY DISTRIBUTED.
- If p > .05, the data are normally distributed.

What is Levene's test for?
Homogeneity of variance: whether the groups' variances are equal or not.

Interpreting p for these tests:
If p > .05:
- High probability the result is due to chance; non-significant.
- KS test: data are normally distributed.
- Levene's test: variances are presumed equal.
If p < .05:
- Low probability the result is due to chance; significant.
- KS test: data are not normally distributed.
- Levene's test: variances are statistically different.

How do you test for normality in SPSS?
Analyse → Descriptive Statistics → Explore → select variables.

Independent-Samples T-test (SPSS output):
Levene's test determines which row of the output to use.
- If p > .05, use the upper row (equal variances assumed).
- If p < .05, use the lower row (equal variances not assumed).

Correlation

What is correlation?
Investigating the association between variables. E.g., the association between caffeine and alertness.

What is covariance?
A measure of the relationship between two random variables: the mean of the product of the deviations.

Best-Fit Line: the line that best represents the data.
- Correlation is indicated by how closely the data points fall to this line.
- How well the line describes the association is reflected by the magnitude of the correlation.

Two characteristics of correlations:
1. Strength (magnitude): between −1 and +1.
   - −1 = perfect negative relationship
   - 0 = no relationship
   - +1 = perfect positive relationship
2. Direction: negative vs. positive correlation (e.g., −0.67 or 0.56).

Two key tests of correlation:
1. Pearson's coefficient of correlation: used to explore the linear relationship between two continuous variables.
2. Spearman's rho correlation coefficient: used to explore the linear relationship between ranked scores (ordinal data) rather than continuous data.

Properties of Pearson's r:
- Depends only on how well related the variables are, not on how much they vary.
- Takes values between −1 and +1.
- −1: when x goes up, y goes down.
- +1: when x goes up, y goes up too.
- 0 signifies NO linear relationship between x and y.

Spearman's rho (rs):
- A non-parametric test.
- Does not assume that the data are normally distributed or evenly spaced, so it is good for ordinal data.
- First rank the data, then work out Pearson's r on the ranks.

What are scatterplots used for?
To get a pictorial view of the relationship between two variables.

Four measures of association:
- Covariance (parametric)
- Pearson's coefficient of correlation (parametric)
- Variance explained
- Spearman's coefficient of rank correlation (non-parametric)

What is variance explained?
A measure of association (correlation) calculated by squaring Pearson's r.
- Expressed as a percentage or fraction.
- A value close to 1 means your variable explains almost ALL the variation in your data.
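The lecture runs these checks in SPSS; the sketch below mirrors the same logic in Python with SciPy, on hypothetical caffeine/alertness data, as an illustrative equivalent rather than the course's procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
caffeine = rng.normal(3, 1, 30)                  # hypothetical cups of coffee
alertness = 2 * caffeine + rng.normal(0, 1, 30)  # hypothetical related outcome

# Kolmogorov-Smirnov test against a normal distribution with the sample's
# own mean and SD (null hypothesis: the data are normally distributed).
ks_stat, ks_p = stats.kstest(alertness, "norm",
                             args=(alertness.mean(), alertness.std(ddof=1)))
print(f"KS p = {ks_p:.3f}: {'normal' if ks_p > .05 else 'not normal'}")

# Levene's test for homogeneity of variance between two sets of scores.
lev_stat, lev_p = stats.levene(caffeine, alertness)
print(f"Levene p = {lev_p:.3f}: variances {'equal' if lev_p > .05 else 'unequal'}")

# Pearson's r (continuous data) and Spearman's rho (Pearson's r on ranks).
r, r_p = stats.pearsonr(caffeine, alertness)
rho, rho_p = stats.spearmanr(caffeine, alertness)
print(f"Pearson r = {r:.2f} (p = {r_p:.3f}), variance explained r^2 = {r**2:.2f}")
print(f"Spearman rho = {rho:.2f} (p = {rho_p:.3f})")
```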
Partial Correlations: a measure of the strength and direction of the linear relationship between two continuous variables while controlling for the effects of one or more other continuous variables.
- E.g., the relationship between running performance and VO2 max might be partly explained by wind speed and humidity, which can be controlled for.
- In SPSS: choose the Partial option instead of Bivariate.

Orders of correlation:
- Zero-order correlation: a simple (bivariate) correlation.
- First-order partial correlation: holds a single variable constant ('partials out' one variable).
- Second-order partial correlation: holds two variables constant, and so on.

If one or both variables are measured on an ordinal scale (categorical rather than continuous), what correlation analysis do you need to conduct?
Non-parametric analysis: Spearman's rank correlation.

If both variables are measured on an interval or ratio scale, what correlation analysis do you need to conduct?
Parametric analysis: Pearson's coefficient of correlation.

In a scatterplot, if the observations are close to the line, what is the magnitude?
The magnitude will be large; there might not be many other variables at play.

Reporting correlation (practice example):
"A positive correlation between both variables was found, r(8) = .97, p < .001."

Regression

Regression: used to describe the relationship between variables and to predict the value of a dependent variable based on one or more independent variables. Uses data to calculate a line of best fit. Includes linear and multiple regression.

Linear Regression: a regression analysis involving one predictor variable and one outcome variable: y = ax + b.
- E.g., height can predict weight, and exam performance can be predicted from revision time.

Outcome variable: the variable we want to predict (the DV); denoted y.

Predictor variable: the variable we use to predict the other variable's value (the IV); denoted x.

Intercept (b): the point where the regression line crosses the y-axis, representing the predicted value of y when x = 0. Part of the regression equation y = ax + b.

Slope (a): the rate of change in the dependent variable for a one-unit change in the independent variable. Represents the strength of the relationship.

R² (coefficient of determination): the proportion of variance in the dependent variable explained by the independent variable(s).
- Expressed as a percentage or fraction.
- Higher R² = better explanatory power.

Analysis of Variance (ANOVA): a statistical test used in regression to determine whether the variance explained by the model is statistically significant.
- p < .05: the model explains a significant amount of variance.

Reporting results for a linear regression model:
1. How much variance of the outcome variable is explained by the predictor variable (R²).
2. Whether the amount of variance explained is significant (ANOVA).
3. The slope (a) and intercept (b) of the regression equation, and whether they are significant.

Multiple Regression: a model that estimates the relationship between the dependent variable and two or more independent variables: z = ax + by + c. Uses a plane of best fit instead of a line.

Reporting regression in APA style (same format for linear and multiple regression):
State:
- The type of analysis.
- The relationship between the IV(s) and the DV.
- The significance of the variance explained: R² of the DV and the ANOVA result with its p-value.
- The predictors, their t-tests, and their significance.
- The unstandardised coefficients and how they change the DV, e.g., life satisfaction increasing by .211 for every point of social support.
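A sketch of both regression forms from the notes, y = ax + b and z = ax + by + c, fitted to hypothetical revision/exam data with SciPy and NumPy. All variable names and values are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
revision_hours = rng.uniform(0, 20, 25)                     # hypothetical predictor
exam_score = 3 * revision_hours + 40 + rng.normal(0, 5, 25)  # hypothetical outcome

# Simple linear regression: one predictor, one outcome (y = ax + b).
result = stats.linregress(revision_hours, exam_score)
print(f"slope a = {result.slope:.2f}, intercept b = {result.intercept:.2f}")
print(f"R^2 = {result.rvalue**2:.2f}, p = {result.pvalue:.4f}")

# Multiple regression (z = ax + by + c) via least squares: a plane of best
# fit through two predictors plus a constant column for the intercept.
sleep_hours = rng.uniform(4, 9, 25)  # hypothetical second predictor
X = np.column_stack([revision_hours, sleep_hours, np.ones(25)])
coef, *_ = np.linalg.lstsq(X, exam_score, rcond=None)
a, b, c = coef
print(f"z = {a:.2f}*x + {b:.2f}*y + {c:.2f}")
```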
T-tests

T-tests: statistical tests used to compare the means of two groups to determine whether the difference between them is significant.
- p < .05: significant.
- p > .05: not significant.

t-value: the value calculated to evaluate the reliability of the result; higher values indicate more reliable results.
- Negative t: the sample mean is less than the hypothesised mean.
- Positive t: the sample mean is greater than the hypothesised mean.
- t depends on the standard error of the mean.

Degrees of freedom (df): a measure based on the sample size, indicating the number of independent values that can be estimated in an analysis.
- Depends on N (sample size).

Independent-samples t-test: compares the means of two independent groups undergoing different conditions (between-subjects design).
Assumptions:
1. Interval or ratio scale (continuous data).
2. Normal distribution (KS test).
3. Independent data between groups.
4. Equal variances (tested with Levene's test).

Kolmogorov-Smirnov test: determines whether data are normally distributed.
- p > .05: data are normally distributed.
- p < .05: data are not normally distributed.

Levene's test: assesses the equality of variances between groups (homogeneity of variance).
- p > .05: variances are equal.
- Only used with independent t-tests.

Paired-samples t-test: compares the means of the same group under two different conditions (within-subjects design).
Assumptions:
1. Paired observations (dependent data).
2. Normal distribution of the differences.
3. Interval or ratio scale data.

One-sample t-test: compares the sample mean to a known population mean.
Assumptions:
1. Normal distribution.
2. Interval or ratio scale.
3. Random sampling.

Undirected hypothesis (two-tailed): states that there is a difference between groups without specifying the direction of the difference; non-specific.

Directed hypothesis (one-tailed): states that the difference between groups is in a specific direction; specific and directional.
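A sketch of the three t-test variants above using SciPy on hypothetical data. The groups and means are invented; the Levene's-test check mirrors the choice between the two SPSS output rows described earlier.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(10, 2, 20)        # hypothetical between-subjects group 1
group_b = rng.normal(11, 2, 20)        # hypothetical between-subjects group 2
before = rng.normal(10, 2, 20)         # within-subjects: condition 1
after = before + rng.normal(1, 1, 20)  # within-subjects: condition 2

# Independent-samples t-test; Levene's test decides equal_var.
_, lev_p = stats.levene(group_a, group_b)
t, p = stats.ttest_ind(group_a, group_b, equal_var=(lev_p > .05))
print(f"independent: t = {t:.2f}, p = {p:.4f}")

# Paired-samples t-test: same participants, two conditions.
t, p = stats.ttest_rel(before, after)
print(f"paired: t = {t:.2f}, p = {p:.4f}")

# One-sample t-test: sample mean vs. a known population mean of 10.
t, p = stats.ttest_1samp(group_a, popmean=10)
print(f"one-sample: t = {t:.2f}, p = {p:.4f}")
```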
