Effect Size, Normality, and the Central Limit Theorem
10 Questions

Questions and Answers

Explain the difference between measuring effect size relative to difference scores versus original variables. Why might a researcher choose one over the other?

Measuring effect size relative to difference scores focuses on the magnitude of change within the specific context of the comparison. Measuring effect size relative to original variables assesses the practical significance of the change in relation to the overall variability of the original data. If the practical consequences really matter, the effect size relative to original variables may be preferred.

Describe a scenario where a variable might be highly non-normal, and explain why this non-normality might occur.

Response time (RT) data is often non-normal because it represents the minimum time it takes for one of many potential triggers to elicit a response. This often results in a skewed distribution.

What is the purpose of using a QQ plot? How does it help in assessing the normality of a sample?

A QQ plot is used to visually check if a sample violates the assumption of normality. It plots observed quantiles against theoretical quantiles from a normal distribution. Systematic deviations from a straight line suggest non-normality.

Explain why the Central Limit Theorem often leads to real-world quantities being normally distributed. What condition needs to be met for this to occur?

The Central Limit Theorem states that the distribution of the sum (or average) of a large number of independent, identically distributed random variables will be approximately normal, regardless of the original distribution. This applies when a variable is an average of many different things.

If you suspect that your data violates the normality assumption required for a t-test, what are two methods you could use to assess whether this assumption is seriously violated?

You could use a QQ plot to visually inspect for systematic deviations from normality. Alternatively, the Shapiro-Wilk test can be used to statistically test the null hypothesis that the data are normally distributed.

In the context of ANOVA, explain why the term 'analysis of variance' can be considered misleading.

The term is misleading because ANOVA is primarily used to investigate differences in means between groups, even though the technique's name refers to variances.

Describe a scenario where a one-way ANOVA would be an appropriate statistical test. What specific type of data is needed for this test?

A one-way ANOVA is appropriate when comparing the means of several different groups of observations for a single outcome variable of interest. This test is used with a continuous outcome variable and a categorical predictor variable (grouping variable).
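
A minimal R sketch of such a design; the variable and data frame names are illustrative, echoing the clinical trial described below:

    my.anova <- aov( mood.gain ~ drug, data = clin.trial )   # continuous outcome ~ categorical predictor
    summary( my.anova )                                      # F-test for differences among the group means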

In a clinical trial studying the effectiveness of an antidepressant, what is the purpose of including a placebo group and an existing drug (like Anxifree) group, in addition to the new drug (Joyzepam) group?

The placebo group serves as a control to account for the placebo effect, while the existing drug group (Anxifree) provides a benchmark to compare the effectiveness of the new drug (Joyzepam).

Explain the role of 'post hoc tests' in ANOVA. Why are they used, and what problem do they address?

Post hoc tests are used after a significant ANOVA result to determine which specific group means are significantly different from each other. They address the problem of inflated Type I error rates (false positives) that occur when performing multiple pairwise comparisons.
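
One common post hoc procedure is Tukey's HSD, available in base R; a sketch continuing the illustrative names used above:

    TukeyHSD( aov( mood.gain ~ drug, data = clin.trial ) )   # all pairwise comparisons, with family-wise error control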

Briefly describe the relationship between ANOVA and t-tests. In what situation(s) would ANOVA be preferred over a series of t-tests?

Both ANOVA and t-tests compare means, but ANOVA is preferred over multiple t-tests when comparing three or more group means. Running multiple t-tests increases the risk of Type I error, while ANOVA controls for this inflated error rate.

Flashcards

Cohen's d (Original Variables)

Measures effect size compared to original variables, not just difference scores. Use when practical consequences relative to original scales matter.

Normality Assumption Rationale

Many real-world variables tend to be normally distributed due to the central limit theorem, especially if they're averages of many factors.

Non-Normality of Response Time (RT)

Response time data is often non-normal because the response occurs at the first trigger out of many possibilities.

QQ Plot

A visual tool to check if a sample's distribution significantly deviates from normality.


QQ plot (Normality)

Checks for systematic violations of normality assumption.


One-Way ANOVA

A statistical test to compare the means of several groups.


ANOVA Purpose

Concerned with investigating differences in means across groups.


Clinical Trial Example

A study design with three groups: placebo, existing drug, and new drug.


ANOVA Developer

Sir Ronald Fisher in the early 20th century.


One-Way ANOVA Focus

Compares groups based on some outcome variable of interest.


Study Notes

Paired Samples T-Test

  • A paired samples t-test expects two variables, x and y, and requires specifying paired=TRUE.
  • Since there is no "id" variable, it assumes that the first element of x and the first element of y correspond to the same subject, the second elements to the same subject, and so on.
  • The command t.test(x = chico$grade_test2, y = chico$grade_test1, paired = TRUE) performs a paired samples t-test on Dr. Chico's class data.
  • The output includes the t-statistic, degrees of freedom, p-value, confidence interval, and the mean of the differences.
  • Results match those calculated in Section 13.5.
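
A minimal runnable version of the above, assuming the chico.Rdata file is in the working directory:

    load( "chico.Rdata" )              # loads the chico data frame
    t.test( x = chico$grade_test2,     # each student's grade on test 2
            y = chico$grade_test1,     # each student's grade on test 1
            paired = TRUE )            # pair the observations by position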

Effect Size - Cohen's d

  • Cohen's d is a common measure of effect size for t-tests.
  • In the context of a Student's t-test, it is calculated as the difference between the means divided by an estimate of the standard deviation: d = (mean 1 − mean 2) / (standard deviation).
  • Cohen's d has a natural interpretation.
  • It describes the difference in means as the number of standard deviations separating them.
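
For example, if one group's mean is 72, the other's is 67, and the estimated standard deviation is 10, then d = (72 − 67) / 10 = 0.5: the two means sit half a standard deviation apart.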

Interpreting Cohen's d

  • Around 0.2 represents a small effect.
  • Around 0.5 represents a moderate effect.
  • Around 0.8 represents a large effect.
  • Context is important when thinking about the size.
  • A small effect can have practical importance.
  • A large effect might not matter in some situations.
  • The cohensD() function in the lsr package is used to calculate it.
  • Its method argument distinguishes between the different versions of Cohen's d.

Cohen's d - One Sample

  • When running a t-test with oneSampleTTest(), independentSamplesTTest(), or pairedSamplesTTest(), no new commands are needed, because these functions automatically produce an estimate of Cohen's d as part of the output.
  • If using t.test(), the cohensD() function (in the lsr package) is needed.
  • In the case of a one-sample t-test comparing a sample mean X to a population mean µ₀, Cohen's d is calculated as: d = (𝑋 - μ₀) / 𝜎̂
  • x is a numeric vector with sample data.
  • mu is the mean against which x is being compared (defaults to mu = 0)

Calculating Cohen's d - Zeppo's class

  • cohensD(x = grades, mu = 67.5) calculates the effect size for data from Dr. Zeppo's class.
  • Results indicate students achieved 72.3%, about 0.5 standard deviations higher than the expected level (67.5%).
  • This is considered a moderate effect size.
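
(Working backwards from the reported figures, d ≈ 0.5 with means of 72.3 and 67.5 implies an estimated standard deviation of roughly (72.3 − 67.5) / 0.5 ≈ 9.6 percentage points.)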

Cohen's d - Student t-test

  • Focuses on the situation analogous to that of the Student's independent samples t-test.
  • Several versions of d can be used.
  • The method argument to the cohensD() function picks one of the versions
  • Population effect size delta = (𝜇₁ - 𝜇₂) / 𝜎
  • 𝜇₁ and 𝜇₂ are the population means for groups 1 and 2
  • σ is the standard deviation, assumed to be the same in both populations

Estimating d

  • d = (𝑋₁ - 𝑋₂) / 𝜎̂_p
  • 𝑋₁ and 𝑋₂ are the sample means
  • 𝜎̂_p is the pooled standard deviation
  • This commonly used version is sometimes referred to as the Hedges' g statistic.
  • It is selected with method = "pooled" (the default in the cohensD() function).
  • Glass' Δ is used when you only want to use one of the two groups as the basis for calculating the standard deviation.
  • Use method = "x.sd" or method = "y.sd" in cohensD() when one of the two groups is thought to be a purer reflection of "natural variation" than the other.
  • method = "raw" omits the bias correction and divides by N rather than N − 1.
  • It is used primarily to calculate the effect size in the sample, rather than to estimate the effect size in the population.
  • method = "corrected" multiplies the value of d by (N − 3)/(N − 2.25).
  • This is based on Hedges and Olkin (1985), who point out that there is a small bias in the usual (pooled) estimate of Cohen's d.
  • Command example cohensD(formula = grade ~ tutor, data = harpo, method = "pooled" )
  • This outputs Cohen's d: [1] 0.7395614
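
In words: the two tutorial groups' mean grades differ by roughly 0.74 pooled standard deviations, which by Cohen's rough guidelines falls between a moderate and a large effect.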

Cohen's d - Welch test

  • The situation is more like that of the Welch test: two independent samples whose corresponding populations do not have equal variances.
  • A new measure is needed: δ′ = (μ₁ − μ₂) / σ′, so as to keep it distinct from the measure δ.
  • Cohen (1988) suggests defining the new population effect size by averaging the two population variances: σ′ = √((σ₁² + σ₂²) / 2).
  • method = "unequal" calculates this version of d: cohensD(formula = grade ~ tutor, data = harpo, method = "unequal")

  • This is the version of Cohen's d that gets reported by the independentSamplesTTest() function whenever it runs a Welch t-test.

Cohen's d - paired samples test

  • What should we do for a paired samples t-test?
  • If you want to measure your effect size relative to the distribution of difference scores, the measure of d you calculate is d = D̄ / σ̂_D, where D̄ is the mean difference score and σ̂_D is the estimated standard deviation of the difference scores.
  • cohensD( x = chico$grade_test2, y = chico$grade_test1, method = "paired")

  • If instead you care about effect sizes relative to the original variables (because that is what matters for the research question), use the same versions of Cohen's d that you would use for a Student or Welch test.

Checking the Normality of the sample

  • Many statistical tests, including the t-test, assume the data are normally distributed.
  • The Central Limit Theorem (Section 10.3.3) means that many real-world quantities are approximately normally distributed, making t-tests safe to use.
  • Normality is not guaranteed, however; variables can be highly non-normal.
  • Data such as response time (RT) data are often systematically non-normal.

QQ plots

  • A "quantile-quantile” plot (QQ plot) checks if the sample breaks normality
  • Visually determines if there are any systematic violations on sample
  • Each observation is plotted as a single dot.
  • The x co-ordinate is the theoretical quantile that the observation should fall in, if the data were normally distributed (with mean and variance estimated from the sample)
  • The y co-ordinate is the actual quantile of the data within the sample
  • If the data are normal, the dots should form a straight line
  • Example: generate normal data, then draw a histogram and a QQ plot:

    normal.data <- rnorm( n = 100 )   # generate N = 100 normally distributed numbers
    hist( x = normal.data )           # draw a histogram of these numbers
    qqnorm( y = normal.data )         # draw the QQ plot
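
For contrast, a small sketch (not from the original text) of what heavily skewed data look like; rexp() draws from a strongly right-skewed exponential distribution:

    skewed.data <- rexp( n = 100 )    # exponential data are strongly right-skewed
    hist( x = skewed.data )           # histogram shows the long right tail
    qqnorm( y = skewed.data )         # the dots curve away from the straight line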

Shapiro-Wilk tests

  • The Shapiro-Wilk test (Shapiro & Wilk, 1965) checks normality a bit more formally.
  • The null hypothesis being tested is that the N observations are normally distributed.
  • The test statistic, conventionally denoted W, is calculated as:

    W = ( Σᵢ aᵢ x₍ᵢ₎ )² / Σᵢ ( xᵢ − x̄ )²

    where x₍ᵢ₎ is the i-th smallest observation, x̄ is the sample mean, and the aᵢ are fixed coefficients that depend on the sample size N.

Shapiro-Wilk behavior

  • Small values of W indicate a departure from normality.
  • In R, the test is run with the shapiro.test() function.
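
A quick sketch applying it to the normally distributed data generated above:

    shapiro.test( x = normal.data )   # reports W and a p-value; a large p-value gives no evidence against normality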

Testing non-normal data with Wilcoxon tests

  • What if the data are substantially non-normal, but you still want to run something like a t-test?
  • This is the situation in which Wilcoxon tests are useful.

Wilcoxon tests

  • Come in two forms: one-sample and two-sample.
  • They apply in exactly the same situations as the corresponding t-tests.

Wilcoxon vs T-test

  • Wilcoxon: does not assume normality.
  • t-test: assumes normality.
  • Wilcoxon tests make no assumptions about what kind of distribution is involved, which is why they are called nonparametric tests.

2 sample Wilcoxon test

Suppose we run a test of "awesomeness" on two groups of people, "A" and "B"

  • File: awesome.Rdata
  • Contains a single data frame, called "awesome"
  • Code example:

    load( "awesome.Rdata" )
    print( awesome )

Two sample Wilcoxon use

  • To calculate the test statistic, construct a table that compares every observation in group A against every observation in group B.
  • When there are no ties, this is simple.
  • Each time the group A value is the larger one, place a check mark in the table.

Wilcoxon test statistic

  • W is the number of check marks.
  • The interpretation of W is qualitatively the same as the interpretation of t or z.
  • Two-sided test: reject the null hypothesis if W is very large or very small.
  • Directional (one-sided) hypothesis: use only one tail or the other.
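
A minimal sketch of this tabulation, assuming no ties; the score vectors here are illustrative, not from the original data:

    group.A <- c( 6.4, 10.7, 11.9 )            # hypothetical group A scores
    group.B <- c( 2.8, 6.1, 7.1 )              # hypothetical group B scores
    checks <- outer( group.A, group.B, ">" )   # TRUE wherever the A value beats the B value
    W <- sum( checks )                         # W = the number of check marks
    W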

Structure of Wilcox

  • The wilcox.test() function is structured like t.test(), so it should feel familiar by now.
  • It is organized using the same formula and data arguments: wilcox.test(formula = scores ~ group, data = awesome)
  • Just as with t.test(), the alternative argument can be used to switch to a one-sided test, and several other arguments are available.

data vs grouped data

  • The function also accepts x and y arguments when the data for each group are stored as separate variables.
  • load( "awesome2.Rdata" ) loads such a dataset, with one score vector per group.
  • The results are the same as in the previous run.
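
A hedged sketch, assuming awesome2.Rdata stores the two groups in vectors named score.A and score.B (names inferred from the notes):

    load( "awesome2.Rdata" )                   # loads score.A and score.B
    wilcox.test( x = score.A, y = score.B )    # same W and p-value as the formula version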

One sample Wilcoxon test

  • The one-sample Wilcoxon test.
  • It is effectively equivalent to the paired samples Wilcoxon test.
  • Suppose we want to know whether taking a statistics class has an effect on students' happiness.

One-sample vs paired samples Wilcoxon test

  • There is no fundamental difference between running a paired samples test and running a one-sample test on the change scores.
  • Think of it as a tabulation within a single sample, based on the positive change scores. Code: wilcox.test(x = happiness$change, mu = 0)
  • The output reports a Wilcoxon signed rank test.
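
Equivalently, the paired samples form can be run on the two original measurements; a sketch assuming the happiness data frame has before and after columns:

    wilcox.test( x = happiness$after, y = happiness$before, paired = TRUE )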

One-sample test results

  • As the output shows, there is a significant effect: taking the statistics class appears to affect happiness.
  • Switching between the one-sample and paired samples versions does not alter the answer.

Summary - Tests

  • A one-sample t-test is used to compare a single sample mean against a hypothesized population mean.
  • An independent samples t-test is used to compare the means of two groups and test the hypothesis that they have the same mean. It comes in two forms: the Student test assumes the groups share the same standard deviation, while the Welch test does not.

Paired Samples Test and Related Tools

  • A paired samples t-test compares two scores measured on the same subjects; it is equivalent to taking the difference between each pair of scores and running a one-sample test on the difference scores.
  • Effect sizes are calculated with Cohen's d (the cohensD() function).
  • QQ plots can be used to check data for normality.
  • If the data are non-normal, Wilcoxon tests can be used instead.


Description

This content explains effect size calculation with difference scores and original variables, scenarios causing non-normality, and the use of QQ plots for normality assessment. It also clarifies the Central Limit Theorem's role in normal distributions and methods to assess violations of normality assumptions for t-tests.
