PSGY1014 Research Methods and Analyses 1 Revision Semester 1
2024
Steve Janssen
Summary
This document is a revision guide for the PSGY1014 Research Methods and Analyses 1 course, covering lecture L13 (Revision Semester 1) from 2024. It reviews the module objectives, descriptive statistics (averages, distributions, and measures of spread), and correlational and inferential statistics, with examples and explanations.
Full Transcript
PSGY1014 – Research Methods and Analyses 1
L13 – Revision Semester 1
Prof Steve Janssen
February 1, 2024

Objectives of the Module
By the end of this module, you should be able to:
– Understand more clearly the process of scientific discovery
– Understand the difference between effects and noise
– Identify some of the common pitfalls of statistical reporting
– Know when and how to apply the more common statistical tests, use statistical analysis software (SPSS or JASP) to conduct these tests, and interpret and report the results of these tests

PSGY1014 Semester 1 Revision 2

Why stats?
Many people do not enjoy learning about statistical analyses. But why are we teaching you how to do statistical analyses?
Statistical analyses help researchers to describe data (i.e., descriptive statistics) or to decide whether there is a relation between two variables (i.e., inferential statistics).

Descriptive Statistics
There are many types of descriptive statistics:
– Frequencies
  – Numbers, percentages, or proportions
– Averages (or central tendencies)
  – Mean, median, or mode
– Measures of spread
  – Range (incl. min and max) or interquartile range
  – Variance, standard deviation, population estimated standard deviation, or standard errors
Descriptive statistics are not only useful for describing the data, but they are also important for inferential statistics.

Averages
Mode
– Most prevalent score or scores
Median
– The middle value when all scores are ranked
Mean
– The sum of the scores divided by the number of scores
– Often used, but sensitive to outliers

Normal Distribution
Normal distributions (also called Gaussian distributions) are shaped like a bell curve. In these distributions, the values of the three averages are the same.
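
As a minimal sketch of the averages and measures of spread described above, the following Python snippet uses only the standard library. The scores are hypothetical values chosen for illustration (they are not taken from the module), and the single extreme score is there to show why the mean is described as sensitive to outliers.

```python
import math
import statistics

# Hypothetical scores for illustration only; 35 is a deliberate outlier.
scores = [4, 5, 5, 6, 7, 8, 35]

mean = statistics.mean(scores)        # sum of the scores / number of scores
median = statistics.median(scores)    # middle value when the scores are ranked
modes = statistics.multimode(scores)  # most prevalent score or scores

# Measures of spread that accompany the mean
sample_sd = statistics.stdev(scores)                 # SD with N - 1 in the denominator
standard_error = sample_sd / math.sqrt(len(scores))  # SE = SD / sqrt(N)

# The same averages after dropping the outlier (the last score)
trimmed = scores[:-1]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)

print(f"with outlier:    mean={mean:.2f}, median={median}, mode={modes}")
print(f"without outlier: mean={trimmed_mean:.2f}, median={trimmed_median}")
print(f"SD={sample_sd:.2f}, SE={standard_error:.2f}")
```

In this made-up sample, removing the outlier moves the mean from 10 to about 5.83, whereas the median moves only from 6 to 5.5.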
Skewed Distributions
However, in skewed distributions, the values of the three averages are not the same.
[Figure: positively skewed and negatively skewed distributions]
Whether the distribution is skewed will affect the choice of the most appropriate inferential statistic.

Measures of Spread
Central tendencies (i.e., averages) and measures of spread are related.
Median
– Range (max – min)
– Interquartile range (Q3 – Q1)
Mean
– Variance (σ²), standard deviation (σ, s, SD), population estimated standard deviation (ŝ), standard error (SE)
Mode
– No corresponding measure of spread

Exploring Data
With graphs, you can explore or present the data.
Descriptive
– Stem-and-leaf plots (hardly used)
– Histograms (one group)
– Population pyramids (two groups)
Inferential
– Bar charts (use means and standard error)
– Line charts (also use means and standard error)
– Boxplots (use median and interquartile range)
– Scatter plots (for correlational designs)

[Figures: Stem-and-Leaf Plot, Histogram, Population Pyramid, Bar Chart, Line Chart, Box Plot, Scatter Plot]

Inferential Statistics
There are many types of inferential statistics:
– Student t-tests
– Correlation coefficients (Pearson and Spearman)
– Regression analyses
– Chi-square tests
– Mann-Whitney, Wilcoxon, Kruskal-Wallis, Friedman
– Analyses of variance (ANOVA)
Inferential statistics are used to examine proposed relations between variables.

Hypotheses
These proposed relations can be written in the form of a hypothesis:
– The amount of time spent learning is related to the grade that students receive on the exam
If changing one variable leads to a change in another variable, then these variables might
be related
– One group studies for 5 minutes, the other group for 20 minutes
– Is there a difference in their performance on the test?

Hypotheses
The null hypothesis (H0) is a formal way of saying that there is no effect (i.e., there is no relation)
– The alternative hypothesis (H1) is that there is an effect (i.e., there is a relation)
Why do we set it up this way?
– We want the default belief to be set against the effect
– We cannot prove something to be true; only false
– So, we set H0 to be the default position, unless the observed data would be very unlikely if H0 were true

Variables
The two variables in a hypothesis are called:
– Independent variable (manipulation, cause, predictor)
  – Time spent learning
– Dependent variable (outcome, effect, criterion)
  – Exam grade
To test a hypothesis, we change the independent variable
– With and without manipulation (i.e., control group)
– With different levels
If the dependent variable changes with the changes in the independent variable, then the independent variable might influence the dependent variable.

Variables
Besides these two variables, there are variables that might unintentionally affect the relation between the independent and dependent variable:
– Confounding variable
– Extraneous (or nuisance) variable
The main difference between confounding and extraneous variables is that confounding variables are unequally distributed over the conditions, whereas extraneous variables are equally distributed over the conditions.
A variable can be a confounding variable in one study and an extraneous one in another study.

Variables
If visibility influences learning and if one group learns with better lighting conditions than the other group …
– Then lighting conditions are a confounding variable

Variables
However, if visibility influences learning but lighting conditions are poor for some
participants and good for other participants in Group 1 and poor for some participants and good for other participants in Group 2 …
– Then lighting conditions are an extraneous variable

Variables
If gender influences the dependent variable and if the group of participants with the manipulation has more women than the group of participants without the manipulation …
– Then gender is a confounding variable
However, if gender influences the dependent variable but the group of participants with the manipulation has a similar proportion of women as the group of participants without the manipulation …
– Then gender is an extraneous variable

Measurement Scales
Dependent variables are measured on four types of scales:
– Nominal: Numbers on the scale refer to classes without any ranking between them (e.g., gender, ethnicity)
– Ordinal: Numbers refer to a relative position on the scale, but the sizes of the steps are not meaningful (e.g., military rank, level of education)
– Interval: Numbers refer to equal steps on the scale (e.g., many psychological questionnaires, IQ tests)
– Ratio: Numbers refer to equal steps on the scale and there is a meaningful zero (e.g., age, length, reaction time)

Parametric vs. Non-Parametric Tests
It is important to understand on what kind of scale the dependent variable is measured, because it determines what kind of statistical test is appropriate:
– If interval or ratio scales, then parametric tests, such as Pearson’s correlation coefficient, Student t-tests, or ANOVAs, are appropriate
– If nominal or ordinal scales, then non-parametric tests, such as Spearman’s correlation coefficient or Chi-square tests, are appropriate
Not only the scale on which the dependent variable is measured but also the skewness of the distribution determines whether non-parametric tests should be used.

Experimental Design
To test a hypothesis, researchers set up studies, which tend to have one of three types of design:
– Correlational: Neither variable is manipulated, and only a relation and no causality can be inferred (“correlation is not causation”)
– Experimental: The independent variable is manipulated and everything else is held constant (“ceteris paribus”), which allows researchers to infer causality
– Quasi-experimental: The independent variable could not be manipulated but is varied (e.g., cats vs.
dogs), but everything else is held constant

Variables – Example 1
Go to: https://socrative.com
Login: Student Login
Room name: JANSSEN2363

Correlational Designs
Correlational designs examine whether there is a relation between variables.
There are four measures related to correlations:
– Covariance (cov)
– Pearson’s Coefficient of Correlation (r)
– Variance explained (R²)
– Spearman’s Coefficient of Rank Correlation (ρ, called rho)

Four Correlational Measures
Covariance (cov)
– Mean product of the deviations from the means; not standardized
Pearson’s Coefficient of Correlation (r)
– Covariance divided by the product of the standard deviations; standardized
Variance explained (R²)
– Coefficient of correlation squared
Spearman’s Coefficient of Rank Correlation (ρ, called rho)
– For ordinal data

Coefficients of Correlation
Correlations are conducted to examine whether two variables are (somehow) related
– This relation is represented by a straight line
Correlations do NOT tell us whether there is a difference between two variables
– Differences can only be observed between groups or between conditions on one variable

Correlations
r(168) = -.43, p < .05
A correlation coefficient consists of four pieces of information:
1. The degrees of freedom, representing the number of observations on which the analysis is based (N minus 2)
2. The direction (or sign) of the correlation, representing whether the relation between the two variables is positive or negative
3. The magnitude of the correlation, representing how well the relation is described by the analysis
4. The p-value, representing the probability of observing a correlation at least this strong if the null hypothesis were true

Degrees of Freedom
The number of values in a statistical analysis that are free to vary.
For Pearson’s Coefficient of Correlation, the degrees of freedom can be calculated by subtracting 2 from the total number of observations (i.e., N – 2).

Direction
A correlation has a positive value (+)
– As Variable X increases, Variable Y increases too
A correlation has a negative value (-)
– As Variable X increases, Variable Y decreases
– A negative correlation does NOT mean that there is no relation between Variables X and Y

Magnitude
The magnitude (or correlation coefficient) describes HOW WELL the line describes the data (i.e., quality)
– How close are the data points to the line (i.e., residuals)
It does NOT describe HOW the two variables are related (i.e., quantity)
– HOW the variables are related is described by regression: If Variable X increases by one unit, then Variable Y changes by a units

Magnitude
If the magnitude is larger, then the data points are closer to the line (i.e., smaller residuals): r(13) = .87
If the magnitude is smaller, then the data points are further away from the line (i.e., larger residuals): r(13) = .70

Magnitude
Magnitude is not affected by the angle of the line
– A relatively flat line can have a high magnitude: r(15) = .87
– And a steeper line can have a lower magnitude: r(15) = .76
– And vice versa

Magnitude
http://www.rossmanchance.com/applets/GuessCorrelation.html
http://guessthecorrelation.com/

Magnitude
The magnitude of correlation coefficients can be quantified
– Small (or weak): .10
– Medium: .30
– Large (or strong): .50
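
The pieces of information in a reported correlation can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure used by SPSS or JASP: the data are hypothetical, the helper names `pearson_r` and `magnitude_label` are made up for this example, and the label "negligible" for values below .10 is an assumption that does not appear in the slides. The p-value is left out because it requires the t-distribution.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance and SDs both use N - 1, so the denominators cancel in r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

def magnitude_label(r):
    """Label |r| with the thresholds from the slide (.10, .30, .50)."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label; not defined in the slides

# Hypothetical data: hours spent learning and exam grades for five students
hours = [1, 2, 3, 4, 5]
grades = [52, 55, 61, 60, 70]

r = pearson_r(hours, grades)
df = len(hours) - 2  # degrees of freedom: N minus 2
direction = "positive" if r > 0 else "negative"

print(f"r({df}) = {r:.2f} ({direction}, {magnitude_label(r)})")
```

Note that the label depends only on the absolute value of r, so `magnitude_label(-0.43)` and `magnitude_label(0.43)` both return "medium"; the sign carries the direction, not the strength.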
Correlations of .38 and .43 are therefore considered to be medium-sized.

P-Value
Indicates how likely the observed relation between the two variables would be if it were due to chance alone (i.e., if the null hypothesis were true)
– If the p-value is lower than .05 (or .01 and .001), then it is unlikely that the observed relation is due to chance
– If the p-value is higher than .05, then there is no support for a (direct) relation between the two variables
The significance value depends on the number of observations and the magnitude of the correlation.

Explained Variance
How much variance in the dependent variable is explained by the variance in the independent variable
If the explained variance is low, then there are other variables that influence the dependent variable
Simply square the correlation coefficient (i.e., magnitude)
– If r(168) = -.43, then R² = -.43 × -.43 = .1849 ≈ .185 (18.5%)

Causality
When a significant correlation is observed, it does NOT mean that Variable A causes Variable B
– Correlation is not causality
However, this absence of causality does NOT mean that it does not matter how the relation is described
If there is a negative correlation between alcohol consumption and age for men, you cannot say:
– “As men drink more alcohol, they become younger”
It would be more appropriate to say:
– “As men become older, they drink less alcohol”

Coefficients of Correlation
Correlations are conducted to examine whether two variables are (somehow) related
– This relation is represented by a straight line
A correlation consists of four pieces of information:
1. The degrees of freedom (or the number of observations)
2. The direction (or sign) of the correlation
3. The magnitude (or correlation coefficient)
4. The p-value (or significance level)

Coefficients of Correlation
Correlations do NOT tell whether there is a difference between two variables
– Differences can only be observed between groups or between conditions
Whether two correlations are different can, however, be examined with the Fisher Z test
Focus on Pearson’s coefficient of correlation
– Covariance, partial correlations, and Spearman’s coefficient of correlation are important too

Regression
Whereas correlation coefficients examine whether or not two variables are related, regression analyses examine how two (or more) variables are related
– How much does the dependent variable increase when the independent variable is increased by one unit?
– For regressions, independent variables are called predictors and the dependent variable is called the criterion

Regression
Regressions can be expressed with an equation
– y = ax + b
– The equation gives the value of the criterion (y) when you know the value of the predictor (x)
– The intercept b indicates the point where the line crosses the y-axis (i.e., the value of y when x = 0)
– The slope a indicates the steepness of the line (i.e., the amount by which the y-value changes when x increases by 1)

Regression
Regressions can have multiple predictors
– If there is one predictor, then the equation describes the line of best fit
– If there are two predictors, then the equation describes the plane of best fit

Experimental Design
Experimental and quasi-experimental studies tend to have three ways in which the independent variable is varied:
– Between-subjects: One group of participants receives the treatment, whereas the other group does not (independent-samples t-test)
– Within-subjects: All participants are tested twice, once with the treatment and once without the treatment
– Matched-subjects: Each
participant who received the treatment is matched (often on age and gender) to another participant who did not receive the treatment (within-subjects and matched-subjects designs both use the paired-samples t-test)

Between-Subjects
One group of participants receives the treatment, whereas the other group does not
Or one group receives a higher dosage than the other group
[Diagram: Group 1 → 50% → Test; Group 2 → 25% → Test; Different?]
Differences between Group 1 and 2?
– Random allocation

Within-Subjects
All participants are tested twice, once with the treatment and once without the treatment
Or first with a higher dosage and then with a lower dosage
[Diagram: Group → 50% → Test 1 → 25% → Test 2; Different?]
Order effects (practice, fatigue)?
– Counter-balancing

Variables – Example 2
Go to: https://socrative.com
Login: Student Login
Room name: JANSSEN2363

Hypotheses
Inferential statistics are used to test hypotheses, which can be written as null (H0) and alternative (H1) hypotheses
Null hypotheses state that there is no difference, whereas alternative hypotheses state that there is a difference
These two statements are mutually exclusive
– If one is true, then the other is false
As indicated earlier, inferential statistics test the null hypothesis (i.e., whether the statement that there is no difference is false)
If the null hypothesis is rejected, then this result would offer support for the alternative hypothesis

Student T-Tests
Student t-tests are used when there is one independent variable with two groups or levels, assuming that
– The results are measured on a ratio or interval scale
– The results are distributed normally (Shapiro-Wilk)
– The results in the conditions have similar variance (Levene)
– The results in one condition are independent of the results in other conditions
If there is more than one independent variable or if there are more than two groups or levels, then use analyses of
variance (ANOVA)
– Later this semester

Student T-Tests
There are three types of Student t-tests:
– Independent-samples t-test
  – Between-subjects designs
– Paired-samples t-test
  – Within-subjects designs
– One-sample t-test
  – The results of one group that is tested once are compared with a pre-defined value
  – Descriptive; not inferential

Probability
Inferential statistics calculate the probability (p) that, assuming that H0 is true, the observed difference between the groups is due to chance
There are three main factors that affect this probability:
– The difference between the means of the scores of the groups (i.e., M1 – M2)
– The variance in the scores of the groups (i.e., SD)
– The sample size of the groups (i.e., N)

Significance
If the probability is very low, then the assumption that H0 is true is considered to be wrong and H0 is therefore rejected
– Support for the alternative hypothesis, H1
The level below which the probability should fall for the null hypothesis to be rejected is called the alpha criterion (α) or significance level
– In statistics, significance is not the same as importance
The levels for alpha tend to be: .05, .01, and .001
– A probability of .05 suggests that there is a 1-in-20 chance that this kind of difference was found despite the null hypothesis being true

Student T-Tests
These inferential tests examine whether there is a difference between the two groups or the two levels
– If the difference is not significant, then we can say that the two groups or the two levels are not significantly different
– However, we cannot say then that the two groups or the two levels are the same
There are other tests, called equivalence tests, to determine whether the two groups or the two levels are the same
Alternatively, one can use a different approach to statistical analyses called Bayesian statistics

Significance
However, this argument is
made on the basis that there are no confounding or extraneous variables affecting the relation between the independent and dependent variable
When there are confounding or extraneous variables, then the outcome of the statistical analysis might be incorrect
– The test might suggest that there is a difference between two conditions, but there is none in reality (or vice versa)
– Type I or Type II Errors

Type I and Type II Errors
The value of alpha influences whether the null hypothesis is accepted or rejected by the statistical analysis

                  Outcome of the test
Reality           Accept H0         Reject H0
H0 is true        No support H1     Type I Error
H0 is false       Type II Error     Support H1

Type I Errors
Type I Errors represent situations in which the null hypothesis was incorrectly rejected, and the alternative hypothesis was therefore incorrectly supported (i.e., a false alarm)
Medicine
– The test suggests that the patient has the disease, but they do not actually have the disease

Type I Errors
Type I Errors can occur when
– The alpha was too lenient
– Or there was a confounding variable (causing the difference between the means of the groups to increase)

Type II Errors
Type II Errors represent situations in which the null hypothesis was incorrectly accepted, and the alternative hypothesis was therefore incorrectly rejected (i.e., a miss)
Medicine
– The test suggests that the patient does not have the disease, but they actually have the disease

Type II Errors
Type II Errors
can occur when
– The alpha was too stringent
– Or there was an extraneous variable (causing the variation in both groups to increase)

[Figure: Type I and Type II Errors]

Interpretation
Inferential statistics indicate whether the null hypothesis was accepted or rejected
If the null hypothesis is rejected, then this outcome is not “evidence” that the alternative hypothesis is true
It gives only “support” for the alternative hypothesis, because a competing hypothesis has been eliminated
Therefore, avoid strong words, such as “proof” or “evidence”, when describing the outcomes of statistical analyses

Interpretation
Please note that the results or outcomes of inferential statistics say something only about the statistical results of the study at hand
If the study was poorly designed and allowed for confounding variables, then (even though the statistical analyses are correct) the results of the study are misleading
For this reason, we put a lot of emphasis on the design of the experiments in the in-class assessments of this module

Causality
The direction of causality does not only affect studies with a correlational design; it can also affect studies with other types of designs
A researcher compared the self-esteem of students who had not passed the exam to the self-esteem of students who had passed the exam
– What kind of experimental design?

Causality
The students who had not passed the exam gave lower ratings of self-esteem than the students who had passed the exam, and the researcher concluded that low self-esteem caused failure
The problem with the researcher’s conclusion is that failing the exam may have caused low self-esteem (in the students who had failed the exam)
– Does low self-esteem cause poor performance on the exam?
– Or does poor performance on the exam cause low self-esteem?

Causality
The problem may become more evident when one plots the results of the study

Causality
The most appropriate way to address the problem of directionality is simply giving all students the self-esteem scale before they take the exam (or at the beginning of the academic year)
If students with high self-esteem perform better on the exam than students with low self-esteem, then there is some support for the conclusion that low self-esteem might lead to poor performance
Alternatively, if one measured self-esteem both before and after the exam, then one could also examine whether poor performance leads to low self-esteem

Control Group
The study design already has a control group (i.e., the students who passed the exam)
Adding a third group and comparing students with low, average, and high marks does not address the problem of directionality

Noise
As in any study, the results may have been affected by extraneous variables (“noise”)
However, overlooking extraneous variables was not the most glaring error that the researcher made when they drew their conclusions
Controlling for extraneous variables is good practice, but it would not have solved the problem of directionality

Confounding
As in any study, the results may have been affected by confounding variables
IQ -> Performance -> Self-Esteem
Although controlling for confounding variables is good practice, it would (again) not have solved the problem of directionality

Good vs. Best Answers
Why is committing murder bad?
Is it that you may get blood on your clothes and blood is notoriously difficult to get out of your clothes?
Or are there other, more important reasons?
When answering questions, choose the best answer

Good vs. Best Answers
To be clear, confounding or extraneous variables might be the best answer in one of the future tests
However, in those cases, you will need to indicate (a) which variable is the confounding or extraneous variable and (b) how this variable might have influenced the outcomes of the study
In addition, avoid giving many different answers in the hope that one of them is the correct answer

Semester 1 Revision
I have quickly reviewed the lectures from the first semester
I have not discussed everything from these lectures; note that parts that I have not discussed are important too!

Semester 2
Date      Lecture   Topic / Note
Feb 1     11        Revision Semester 1
Feb 8     12        Probability and Distributions + In-Class Test #3
Feb 15
Feb 22    13        Significance and Power
Feb 29    14        Categorical Data
Mar 7     15        Non-Parametric Tests 1 + In-Class Test #4
Mar 14    16        Non-Parametric Tests 2
Mar 21    17        ANOVA 1 + Explanation Test #5
Mar 28
Apr 4     18        ANOVA 2 + In-Class Test #5
Apr 11
Apr 18    19        ANOVA 3 + Explanation Coursework Assignment
Apr 25    20        Revision Semester 2 Deadline Coursework Assignment
May 2     21        Resit Opportunity Class Tests only for those with EC!

Next week: Test 3
Similar to previous tests: One “factual” question and one “applied” question
– In the factual question, you will be asked to give definitions or examples of statistical concepts
– In the applied question, you will be given a description of a study and you will be asked whether something is wrong with the study and how you would address this problem
Pay special attention to Lectures 2, 9, 10, and 11

Next lecture (February 8)
– Test 3
– Probability and Distributions
– Field (2009), pp. 40–48

Thank you!
[email protected]
B1B21

Questions
Go to: https://socrative.com
Login: Student Login
Room name: JANSSEN2363

Instatt