Chapter 12 – Analysis of Variance: One-Way Between-Subjects Design

Summary

This document contains lecture notes on analysis of variance (ANOVA), specifically the one-way between-subjects ANOVA. It covers the shift to analyzing variance, independent samples, and the role of the F-test in comparing two or more groups. It also details the different sources of variance and how they relate to research design, helping readers understand how researchers determine whether group means differ significantly. Later sections recap correlation, chi-square tests, regression analysis, and the corresponding SPSS procedures.

Full Transcript


(Note: in the original notes, text in blue or highlighted marks what was said or emphasized by the lecturers.)

Chapter 12 – Analysis of Variance: one-way between-subjects design

Increasing k: a shift to analyzing variance
The levels of the factor (k) = the number of groups, or the different ways in which an independent or quasi-independent variable is observed.

An introduction to analysis of variance
A one-way between-subjects ANOVA is a statistical procedure used to test hypotheses for one factor with two or more levels concerning the variance among group means. This test is used when different participants are observed at each level of a factor and the variance in any population is unknown.
o The test evaluates whether the means in each group significantly vary
o The larger the differences between group means, the larger the variance of the group means will be
A between-subjects design is a research design in which we select independent samples, meaning that different participants are observed at each level of a factor.
With the F-test, we investigate to what extent the categories of an independent ordinal/nominal variable help us explain the variation of a dependent interval/ratio variable.
o The dependent variable is an interval/ratio variable
ANOVA (for interval or ratio variables) = a statistical procedure used to test hypotheses for one or more factors concerning the variance among two or more group means (k ≥ 2), where the variance in one or more populations is unknown. It is an extension of the independent-samples t-test and can be used to compare more than two groups.
The F-test is the main test of significance of the ANOVA and is used when comparing two or more groups.
The H0 for the F-test states that in the population all group means are equal:
o H0: μ1 = μ2 = μ3 = ⋯ = μk
o e.g. in the population, the means of happiness are equal across all groups
The H1 states that in the population, at least one group mean differs from the other group mean(s). Note: this H1 is always non-directional (as opposed to the H1 for the independent-samples t-test).
o H1: there are differences across groups on μ
Once we have rejected H0, we need a post-hoc test that tells us which group(s) differ significantly from the rest.

Two ways to select independent samples
Select a sample from two or more populations
o Used for the quasi-experimental research method (a research design that does not have a comparison group and/or includes a factor that is preexisting and cannot be manipulated/changed)
Select one sample from the same population and randomly assign participants in the sample to two or more groups
o Used for the experimental research method (a research design that includes randomization, manipulation, and a control or comparison group)
o The only way to achieve an experiment using the between-subjects design is to randomly assign participants selected from a single population to different groups
For ANOVA: n = the number of participants per group / N = the total number of participants in the study
o When n is the same in each group: k × n = N

Sources of variation and the test statistic
A source of variation = any variation that can be measured in a study. In the one-way between-subjects ANOVA, there are two sources of variation: variation attributed to differences between group means and variation attributed to error.
Between-groups variation = the variation attributed to mean differences between groups.
Within-groups variation = the variation attributed to mean differences within each group. This source of variation cannot be attributed to or caused by having different groups and is therefore called error variation.
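As an illustration of the test just described, here is a minimal sketch of a one-way between-subjects ANOVA in Python. The three groups and their scores are made up for the example; the test itself uses the standard SciPy routine:

```python
# Hypothetical example: one-way between-subjects ANOVA on three
# independent groups (different participants in each group).
from scipy import stats

group1 = [4, 5, 6, 5, 7]   # made-up scores, e.g. happiness per condition
group2 = [6, 7, 8, 7, 9]
group3 = [5, 5, 6, 6, 7]

# f_oneway returns the F statistic and its p-value.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Decision rule: if p < 0.05, reject H0 that all population means are equal.
if p_value < 0.05:
    print("Reject H0: at least one group mean differs.")
else:
    print("Retain H0.")
```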
An F distribution = a positively skewed distribution derived from a sampling distribution of F ratios.
In analysis of variance, we distinguish between three different types of variance:
o Total variance: how much all x vary around the grand mean M
o Variance between groups (MSB): how much the group means (M1, M2, ..., Mk) vary around the grand mean M
o Variance within groups (MSW): how much all x vary around their respective group means
As the value of the test statistic increases, the likelihood of rejecting H0 increases, i.e. a larger value indicates a less probable H0, and the p-value decreases.
The value of F is defined as F = MSB / MSW → important for theory questions (a worked computation follows the post-hoc notes below).
o The value of F gets larger (making it easier to reject H0) in situations where (either or both):
§ the group means differ more (i.e. large between-group variance, the larger MSB)
§ the groups are more homogeneous (i.e. less within-group variance, the smaller MSW)

Degrees of freedom
The critical value for an ANOVA is the cutoff value for the rejection region.
Two sources of variation for the one-way between-subjects ANOVA = two df:
o The degrees of freedom between groups (dfBG) = the degrees of freedom associated with the variance of the group means in the numerator of the test statistic. They are equal to the number of groups (k) minus 1: df = k − 1
o The degrees of freedom error (dfW), the degrees of freedom within groups, = the degrees of freedom associated with the error variance in the denominator. They are equal to the total sample size (N) minus the number of groups (k): df = N − k

The one-way between-subjects ANOVA
We compute this test when we compare two or more group means, where different participants are observed in each group.

Post hoc
Once we reject the H0 of the F-test, we need to conduct a post-hoc test to determine which means differ significantly from each other.
o Scheffé's test
§ Is similar to a two-sample t-test: it runs through all possible pairs of group means and examines their mean differences
§ For each pair of group means, the H0 of Scheffé's test states that their population means are equal
§ For each pair of groups, the post-hoc test tells us: the sample mean difference (first group mean MINUS second group mean), and whether this difference is statistically significant (at the 0.05 level)
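The worked sketch below computes F by hand, directly from the definitions above (F = MSB / MSW, with dfBG = k − 1 and dfW = N − k). The group data are the same made-up scores as in the earlier example:

```python
# Hypothetical sketch: computing the one-way ANOVA F statistic by hand.
import numpy as np

groups = [np.array([4, 5, 6, 5, 7]),
          np.array([6, 7, 8, 7, 9]),
          np.array([5, 5, 6, 6, 7])]   # made-up scores

k = len(groups)                          # number of groups
N = sum(len(g) for g in groups)          # total sample size
grand_mean = np.concatenate(groups).mean()

# Between-groups sum of squares: group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-groups (error) sum of squares: scores around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # MSB, df = k - 1
ms_within = ss_within / (N - k)          # MSW, df = N - k
F = ms_between / ms_within
print(f"F({k - 1}, {N - k}) = {F:.2f}")
```

Note how F grows when the group means spread further apart (larger MSB) or when scores cluster tightly within each group (smaller MSW), exactly as stated above.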
Chapter 15 – Correlation

The structure of a correlational design
Correlation = a statistical procedure used to describe the strength and direction of the linear relationship between two factors.
o The statistic used to measure correlation = the correlation coefficient (r)
Describing a correlation
o A correlation can be used to
§ describe the pattern of data points for the values of two factors (described by the direction and strength of the relationship between the two factors = the correlation coefficient)
§ determine whether the pattern observed in a sample is also present in the population from which the sample was selected
o Scatter plot = scattergram = a graphical display of discrete data points (x, y) used to summarize the relationship between two (interval/ratio) variables. Pairs of values for x and y are called data points (or bivariate plots).
o In a scatter plot, the independent variable (x) is always placed on the horizontal axis
o Each dot represents a case, and its position reflects the case's values for x and y
o A scatter plot provides an intuitive, straightforward way of examining the relationship between two interval/ratio variables
o We need a measure of association to express the strength and direction of the relationship in a single, easily interpretable number: the correlation coefficient

Pearson's r
The direction of a correlation
o The correlation coefficient (r) is used to measure the strength and direction of the linear relationship, or correlation, between two factors. The value of r ranges from -1.0 to +1.0. Values closer to ±1.0 indicate a stronger correlation. The sign of the correlation (+ or -) indicates the direction of the correlation.
o A positive correlation (0 < r ≤ +1.0) is a positive value of r indicating that the values of the two factors change in the same direction: as the values of one factor increase, the values of the second factor also increase.
o A negative correlation (-1.0 ≤ r < 0) is a negative value of r indicating that the values of the two factors change in different directions: as the values of one factor increase, the values of the second factor decrease.
The strength of a correlation
o The closer a correlation coefficient is to r = 0, the weaker the correlation and the less likely it is that the two factors are related
o The closer a correlation coefficient is to r = ±1, the stronger the correlation and the more likely it is that the two factors are related
o Regression line = the best-fitting straight line to a set of data points. A best-fitting line is the line that minimizes the distance of all data points that fall from it. Scores are more consistent the closer they fall to the regression line.

Pearson correlation coefficient
Used to determine the strength and direction of the relationship between two factors on an interval or ratio scale of measurement. Each score should be transformed into a z-score. Pearson's r formula (a computational sketch follows below):

    r = covariance(x, y) / (sx × sy) = SSxy / √(SSx × SSy)

o First calculate the standard deviations of x and y (from the sums of squares, Σ(x − M)²)
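A minimal sketch of Pearson's r in Python, using two made-up interval/ratio variables (the variable names and data are invented for the example):

```python
# Hypothetical sketch: Pearson's r for two interval/ratio variables.
from scipy import stats

hours_studied = [2, 4, 5, 7, 8, 10]      # made-up data
exam_score    = [55, 60, 62, 70, 75, 80]

# pearsonr returns r and the p-value for H0: no linear relationship.
r, p_value = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.2f}, p = {p_value:.4f}")  # r near +1: strong positive correlation
```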
Chi-square tests
If χ²obtained > χ²critical → reject H0
If the p-value of the chi-square is smaller than alpha → reject H0
Conclusion (example, if rejecting H0): "A chi-square test for independence showed a significant relationship between...; (state the numerical results of chi-square and p). The data indicate... is associated with..."

Chi-square (both tests)
Steps in significance testing:
1. Formulate H0 and H1
a. For both chi-square tests, H1 is always non-directional
b. H0 = there is no relationship between the two categories
c. H1 = there is a relationship
2. Calculate the test statistic
a. In order to calculate chi-square, we need to construct a table of expected frequencies (fe) and apply the results to the formula: χ² = Σ (fo − fe)² / fe
3. Find the appropriate critical value (given alpha and df)
a. For the chi-square test for independence: df = (number of rows − 1) × (number of columns − 1)
b. For the chi-square goodness-of-fit test: df = k − 1
4. Compare the obtained and critical values and decide whether to reject H0
a. obtained > critical → reject H0

Important concepts – what was highlighted and revised in the lecture

Types of statistics (lecture 1)
o Descriptive & inferential
o Uni-, bi-, & multivariate
Types of variables (lecture 1)
o Continuous & discrete
o Levels of measurement: interval/ratio, ordinal, nominal
Univariate description of distributions (lecture 1)
o Measures of central tendency: mode, median & mean
o Calculation of the sum of squares
Bivariate statistics (lecture 2)
o Scatter plot
o Pearson's r

Key concepts: inferential statistics
Population = the group about which we want to generalize (not necessarily people)
Sample = a set of cases from the population that is selected to be studied
Sample design = the procedure used to select the cases for the sample
Sampling bias / non-response bias = a bias in the sample, caused by a flawed sampling procedure; the sample is not representative of the population
Sampling error = a deviation of a sample characteristic (e.g. the mean of a variable) from what actually exists in the population (not due to sampling bias)
o We always have to assume a certain degree of sampling error (unless the sample is as large as the population)
o Inferential statistics gives us the tools to deal with this uncertainty
Tests of statistical significance = statistical techniques that help us decide to what extent findings from the sample can be generalized to the population

Notation:
                       Sample   Population
Mean                   M        μ
Variance               s²       σ²
Standard deviation     s        σ
Size                   n        N

Key concepts: normal distribution
In inferential statistics, we typically assume our (interval/ratio) variables to be "normally distributed", meaning:
o Unimodal (only one mode/peak)
o Symmetric (mean = median = mode; just as many cases above the mean as below it)
o Asymptotic to the x-axis (the curve never reaches the x-axis)
o Theoretical

More bivariate statistics: crosstables = crosstabs = contingency tables (not in the book)
Independent variable = the variable that we expect to influence another variable in the model (x)
Dependent variable = the variable that we expect to be influenced by at least one (independent) variable in the model (y)
Used for nominal and ordinal variables.
Crosstable = a table that depicts a possible relationship between an independent and a dependent variable.
What does such a table tell us about the relationship between the two variables? Use percentages!
When calculating percentages of the column totals, we always compare the percentages horizontally.
To determine the relationship in a crosstab (see the sketch below):
o Put the independent variable in the columns and the dependent variable in the rows
o Calculate column percentages for easy interpretation
o Compare percentages horizontally within the categories of the dependent variable
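A hypothetical sketch of such a crosstab with column percentages, following the layout rules above (independent variable in the columns, dependent variable in the rows; the data frame and values are made up):

```python
# Hypothetical sketch: a crosstab with column percentages.
import pandas as pd

df = pd.DataFrame({
    "gender": ["m", "f", "f", "m", "f", "m", "f", "m"],   # independent (columns)
    "health": ["good", "good", "poor", "poor",            # dependent (rows)
               "good", "good", "good", "poor"],
})

# normalize="columns" turns counts into proportions of each column total,
# which we then compare horizontally across the columns.
table = pd.crosstab(df["health"], df["gender"], normalize="columns") * 100
print(table.round(1))
```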
In SPSS (tutorial 3): is there a difference in general subjective health between men and women?
§ Gender = independent; health / health perception = dependent
o Analyze → Descriptive statistics → Crosstabs
o Put the dependent variable in the rows and the independent variable in the columns
o Cells
§ Always column percentages

Steps in significance testing
1. Formulate H0 and H1 (directional or non-directional H1)
2. Determine the level of significance (default: alpha = 0.05)
3. Calculate the test statistic (the obtained value)
o One-sample z-test if we know the SD of the population
o One-sample t-test if the SD of the population is unknown
4. Find the appropriate critical value of the test statistic at the 0.05 level in the table
5. Compare the obtained statistic with the critical value: if the obtained statistic < the critical value, retain H0
6. Even if this step rejects H0, confirm that the sign of the obtained value supports the H1 when it is directional
7. State your conclusion, referring to the population and in as much detail as possible: mean, context

P-value
When the proportion of the distribution beyond the sample mean in the tail (the p-value) is small, the t-value is high.
o When p is low, H0 has to go
o Remember what to do with the p-value in SPSS for a directional H1 → divide it by 2 before comparing it to alpha
o ns (not significant) means p ≥ 0.05

Ways to remember rejection of H0 / acceptance of H1:
o z-test: Zobtained > Zcritical
o t-test: Tobtained > Tcritical
o t-test in SPSS: p-value < alpha → when p is low, H0 has to go

The differences and similarities between the z-test and the t-test (a one-sample t-test sketch follows the table):

                                    z-test                          t-test
Obtained value                      z statistic, p-value            t statistic, p-value
Distribution used to locate the     Normal distribution             t distribution
probability of the sample mean
Denominator of the test statistic   Standard error                  Estimated standard error
Population variance known?          Yes                             No; the sample variance is used to
                                                                    estimate the population variance
df required?                        No (the population variance     Yes: df = n − 1
                                    is known)
What the test measures              The probability of obtaining a measured sample outcome (both)
What can be inferred                Whether H0 should be retained or rejected (both)
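A minimal one-sample t-test sketch (population SD unknown, so the t-test applies); the sample and the hypothesized mean are made up:

```python
# Hypothetical sketch: a one-sample t-test (population SD unknown).
from scipy import stats

scores = [5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2]  # made-up sample
mu0 = 5.0                                      # H0: population mean = 5.0

# ttest_1samp returns the t statistic and a two-tailed p-value (df = n - 1).
t_stat, p_value = stats.ttest_1samp(scores, popmean=mu0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# For a directional H1, divide the two-tailed p-value by 2 (and check that
# the sign of t supports the predicted direction) before comparing to alpha.
```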
The characteristics of hypothesis testing and estimation:

                                    Significance testing            Point/interval estimation
Do we know the population mean?     Yes; it is stated in the H0     No; we are trying to estimate it
What is the process used to         The likelihood of obtaining     The value of a population mean
determine?                          a sample mean
What is learned?                    Whether the population mean     The range of values within which the
                                    is likely to be correct         population mean is likely to be contained
What is our decision?               To retain or reject H0          The population mean is estimated;
                                                                    there is no decision per se

Questions about the proportion of cases in a population with a value below/above/between given values: use the z-transformation and the z-table:

    z = (x − μ) / σ

Questions about the proportion of sample means from a population that fall below/above/between given mean value(s): use the z-transformation and the z-table:

    z = (M − μ) / (σ / √n)

Hypothesis testing (inferential statistics)
o H0: the population mean has a certain value (σ known = one-sample z-test / σ unknown = one-sample t-test)
o H0: the population means of two groups are equal (two-sample t-test; in SPSS, with Levene's test)
o H0: the population means of more than two groups are equal (ANOVA; F-test; in SPSS)
o H0: the ordinal/nominal variable has a certain distribution in the population (chi-square test for goodness of fit)
o H0: two ordinal/nominal variables are not related to each other in the population (chi-square test for independence)

Estimating confidence intervals for the population mean
o Use the z-formula if σ is known: μ = M ± z(σ/√n)
o Use the t-formula if σ is unknown: μ = M ± t(s/√n)

ANOVA
o The F-test is the main test of significance of the ANOVA
o It is used to compare the means of more than two groups
o The H0 of the F-test states that all population group means are equal
§ H1 is always non-directional here
o If we reject H0, we need to look at the post-hoc test (Scheffé) to determine which group means differ significantly from each other and how
o The value of the F-test in an ANOVA depends on the variance between group means and the variance within groups
o The obtained F value gets larger (making it easier to reject H0):
§ when the group means differ a lot from each other (large variance between groups)
§ when the variances within groups are small (small variance within groups)

Things you should know about regression analysis (a simple-regression sketch follows this list)
o We use regression analysis in situations where we want to test a causal model with one or more IVs and one DV, all measured on the interval/ratio level
o R² tells us the proportion of the variance in the DV that can be explained by the model (i.e. by the IV(s))
§ R² depends on the amount of total variance (SST) and the amount of variance explained by the model (SSM)
o The intercept (the constant in SPSS) tells us the predicted value of the DV when all IVs are zero
o The unstandardized coefficients tell us the change in the predicted value of the DV when the given IV increases by one unit, with the effects of the other IVs held constant
o The standardized coefficients are used to compare/make statements about the strength of effects (standardized in that they lie between -1 and 1 for simple regression)
o One F-test is calculated for the model as a whole (H0: in the population, R² = 0; the DV cannot be predicted by the IVs)
o One t-test is calculated for each individual IV (H0: in the population, the effect of that IV is zero)
o Regression analysis is based on a number of assumptions, whose violations can lead to false conclusions about the sample data and the population (e.g. type II errors)
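A minimal simple-regression sketch (one IV, one DV) showing the quantities listed above: the intercept, the unstandardized slope, R², and the t-test p-value for the slope. Variable names and data are made up:

```python
# Hypothetical sketch: a simple regression with one IV and one DV.
from scipy import stats

study_hours = [2, 4, 5, 7, 8, 10]       # IV (made-up data)
exam_score  = [55, 60, 62, 70, 75, 80]  # DV

res = stats.linregress(study_hours, exam_score)
print(f"intercept = {res.intercept:.2f}")  # predicted DV when the IV is 0
print(f"slope     = {res.slope:.2f}")      # change in DV per one-unit IV increase
print(f"R²        = {res.rvalue ** 2:.3f}")  # proportion of DV variance explained
print(f"p (t-test for the slope) = {res.pvalue:.4f}")
```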
General assumption & outlook
o For all covered tests, we assume our sample to be drawn randomly (simple random samples)
o For all covered parametric tests:
§ we assume the dependent variable to be measured on the interval/ratio level
§ we assume the variables to be normally distributed; however, this normality assumption may be relaxed in situations with large n

SPSS
Altering the dataset
o Select cases (Data → Select cases)
o Recode into different variable (Transform → Recode into different variable)
o Compute variable
Visualizing the dataset
o Frequency tables (with bar charts) (Analyze → Descriptive statistics → Frequencies)
o Scatterplots (Graphs → Legacy dialogs → Scatter/dot)
Testing the dataset
o Calculate Pearson's r (Analyze → Correlate → Bivariate)
o Calculate crosstabs (Analyze → Descriptive statistics → Crosstabs)
o One-sample t-test (Analyze → Compare means → One-sample t-test)
o Chi-square goodness-of-fit (Analyze → Nonparametric tests → Legacy dialogs → Chi-square)
o Two-sample t-test (Analyze → Compare means → Independent-samples t-test)
ANOVA
o Analyze → Compare means → One-way ANOVA
§ Factor = independent variable
o Post hoc → Scheffé
o Options → Descriptive

Regression analysis
o We use it in situations where we want to test a causal model with one or more IVs and one DV, all measured on the interval/ratio level
o R² tells us the proportion of the variance in the DV that can be explained by the model (by the IVs). Know the concept behind the formula.
o The intercept (constant) tells us the predicted value of the DV when all IVs are zero
o The unstandardized coefficients (slopes) tell us the change in the predicted value of the DV when the given IV increases by one unit, with the effects of the other IVs held constant
o The standardized coefficients are used to compare/make statements about the strength of effects
o One F-test is calculated for the model as a whole (H0: in the population, R² = 0; the IVs are not useful in predicting the DV)
o One t-test is calculated for each individual IV (H0: in the population, the effect of that IV is zero)
o Regression analysis is based on a number of assumptions, whose violations can lead to false conclusions:
§ Linearity = a linear relationship between the IVs and the DV
§ Lack of multicollinearity = there should be no strong linear relationship between any two IVs; no two IVs should correlate more strongly than r = 0.80 (a quick check is sketched below)
  - Diagnostics are available in SPSS that test for multicollinearity
  - Multicollinearity can lead to inflated p-values of the t-tests (type II errors), underestimation of R², and unreliable coefficients
§ Homoscedasticity = we assume the variances of the residuals to be constant for all values of the IVs (residuals = the distances between the data points and the regression line); the accuracy of our predictions should not depend on the values of one or more IVs
o Interaction effects: the effect that one IV has on the DV depends on another IV
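A quick multicollinearity check between two IVs, applying the r = 0.80 rule of thumb from the notes (the IVs and their values are made up for the example):

```python
# Hypothetical sketch: flag IV pairs that correlate more strongly than r = 0.80.
import numpy as np

iv1 = np.array([2, 4, 5, 7, 8, 10])  # made-up IVs
iv2 = np.array([1, 3, 5, 6, 8, 9])

r = np.corrcoef(iv1, iv2)[0, 1]      # Pearson correlation between the IVs
print(f"r(iv1, iv2) = {r:.2f}")
if abs(r) > 0.80:
    print("Warning: possible multicollinearity between these IVs.")
```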
