Questions and Answers
What methodological weakness significantly undermines the study's validity, rendering it almost worthless for understanding preferences between humans and robots?
The human participants did not respond honestly to the question to avoid undesirable consequences.
Explain the fundamental approximation underlying the $\chi^2$ test and why it might not hold true when there is only 1 degree of freedom.
The $\chi^2$ test assumes the binomial distribution approximates a normal distribution for large N. This approximation often fails with only 1 degree of freedom, especially in 2x2 contingency tables.
In the context of the $\chi^2$ test, what problem does Yates' continuity correction address, and how does it attempt to resolve it?
Yates' correction addresses the problem of the goodness of fit statistic tending to be "too big" when N is small and df = 1, leading to inflated alpha values. It resolves this by subtracting 0.5 from the absolute difference between observed and expected values in the $\chi^2$ formula.
Describe the potential impact of ignoring the continuity correction when conducting a $\chi^2$ test with one degree of freedom. Focus on how it affects the p-value and the likelihood of Type I error.
Ignoring the correction leaves the $\chi^2$ statistic too large when df = 1 (especially for small N), so the p-value comes out too small. The test then rejects the null hypothesis more often than the nominal alpha level implies, inflating the Type I error rate.
Explain the difference between achieving a statistically significant result and obtaining a result with scientific value, using the example of the flawed human-robot preference study.
Statistical significance only indicates that the observed association is unlikely under the null hypothesis; it says nothing about whether the data are meaningful. In the human-robot study, the association was statistically significant, but because participants did not answer honestly (a reactivity effect), the result says nothing about genuine preferences and so has little scientific value.
Flashcards
Reactivity Effect
The effect where participants alter their behavior because they know they are being studied.
Significance vs. Value
A statistically significant result doesn't guarantee scientific value if there are methodological flaws.
Continuity Correction
A small adjustment used in chi-square tests with 1 degree of freedom to correct for the approximation of a continuous distribution.
Chi-Square Distribution
The sampling distribution of the $\chi^2$ statistic when the null hypothesis is true; it describes the sum of squares of df independent standard normal variables.
Yates Correction Formula
$\chi^2 = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i}$, i.e. the goodness-of-fit statistic with 0.5 subtracted from each absolute difference.
Study Notes
Goodness of Fit Test & Calculating Expected Frequencies
- One approach to calculating expected frequencies relies on the null hypothesis to determine what to expect
- The calculation is based on the probabilities the null hypothesis specifies as true
- The expected frequency for an option is the null probability of choosing that option multiplied by the sample size
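As an illustration of this arithmetic, here is a minimal Python sketch; the null probabilities and sample size are made-up values, not figures from the notes:

```python
# Expected frequency under the null: E_i = P_i * N.
null_probs = [0.5, 0.25, 0.25]  # hypothetical null probabilities (sum to 1)
n = 120                          # hypothetical sample size

expected = [p * n for p in null_probs]
print(expected)  # → [60.0, 30.0, 30.0]
```

These expected counts are what the observed frequencies are compared against in the goodness-of-fit statistic.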
Contingency Table & Chi-Square Statistic
- The challenge is the null hypothesis does not specify a particular value for probability, which needs estimation from data
- Estimate of probability involves dividing the row total by the total sample size
- The expected frequency can be expressed as the row total multiplied by the column total, divided by total observations
- A test statistic can be defined using the same strategy as the goodness of fit test
- The chi-square statistic is the sum of squared differences between observed and expected frequencies, each divided by the expected frequency
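The two formulas above can be sketched in a few lines of Python; the observed counts are hypothetical illustration values:

```python
# E_ij = (row_i total * col_j total) / N, then
# X^2 = sum over cells of (O_ij - E_ij)^2 / E_ij.
observed = [[20, 30],
            [40, 10]]  # hypothetical 2x2 contingency table

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # estimated expected frequency
        chi_sq += (o - e) ** 2 / e
print(round(chi_sq, 3))  # → 16.667
```

This is the same statistic that `chisq.test()` reports as "X-squared" (before any continuity correction).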
Understanding Degrees of Freedom
- Large chi-square values suggest the null hypothesis poorly describes data
- Small values suggest a good fit
- The null hypothesis should be rejected if the chi-square value is too large
- Degrees of freedom relate to the amount of data being analyzed minus constraints
- In a contingency table with r rows and c columns, there are r × c observed frequencies to analyze
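The "data minus constraints" counting can be sketched numerically (a Python illustration; the function name is mine). For the test of independence, the constraints are the fixed grand total plus the (r − 1) row probabilities and (c − 1) column probabilities estimated from the data:

```python
# df = (number of observed frequencies) - (number of constraints).
def df_contingency(r, c):
    cells = r * c                        # r x c observed frequencies
    constraints = 1 + (r - 1) + (c - 1)  # fixed total + estimated probabilities
    return cells - constraints

print(df_contingency(2, 2))  # → 1
# The count agrees with the usual shortcut (r - 1) * (c - 1):
assert df_contingency(3, 4) == (3 - 1) * (4 - 1)
```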
Constraints & Experimental Design
- Column totals fixed in advance by the experimenter's design count as constraints
- Free parameters in the null hypothesis affect the number of constraints
- Each free parameter estimated from the data acts as an additional constraint
- Probabilities carry a built-in constraint because they must sum to one
Test Implementation in R
- The associationTest() function in the lsr package simplifies testing
- A formula is needed to specify variables for cross-tabulation
- A data frame name containing such variables is also needed
- Similar testing can be done through chisq.test()
Interpreting Association Test Results
- The output of the chi-square test includes variables, hypotheses, observed and expected contingency tables
- Statistical significance is determined through the X-squared statistic, degrees of freedom, and p-value
- Effect size is quantified by Cramér's V
- A significant association indicates that preferences likely differ among groups
- Statistical significance does not guarantee scientific value if the study has methodological flaws
Yates' Correction (Continuity Correction)
- The chi-squared tests rely on assumptions and approximations
- Specifically, they rely on the binomial distribution being well approximated by a normal distribution when N is large
- With only 1 degree of freedom the goodness-of-fit statistic tends to be too big, so the p-value tends to be too small
- The correction subtracts 0.5 from each difference between observed and expected values
- Redefines the goodness of fit statistic
- It is not derived from principled theory; Yates simply examined the behavior of the test and observed that the corrected version performed better
- Continuity correction is explicitly noted in the output
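The effect of the correction is easy to see side by side. This Python sketch (with the same hypothetical 2x2 counts as above) computes the statistic with and without the 0.5 adjustment:

```python
# Yates' correction: subtract 0.5 from each |O - E| before squaring,
# which shrinks the statistic and counteracts too-small p-values at df = 1.
observed = [[20, 30],
            [40, 10]]  # hypothetical counts
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

plain, corrected = 0.0, 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n
        diff = abs(observed[i][j] - e)
        plain += diff ** 2 / e
        corrected += (diff - 0.5) ** 2 / e

print(plain > corrected)  # → True: the corrected statistic is smaller
```

A smaller statistic means a larger p-value, so the corrected test is more conservative.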
Effect Size Measures
- Reporting effect size indicates the strength of association or deviation
- Common measures include phi statistic and Cramer's V
- The phi statistic is obtained by dividing $\chi^2$ by the sample size, then taking the square root
- Cramer's V adjusts for contingency table size, proposed by Cramér
Advantages of Cramer's V
- Cramér's V generalizes phi: divide $\chi^2$ by the sample size times (min(rows, columns) − 1), then take the square root
- V ranges from 0 (no association) to 1 (perfect association)
- The core R packages do not have these functions
- The cramersV() function is available in the lsr package
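The two formulas can be sketched in Python (the $\chi^2$ value and sample size below are hypothetical, standing in for whatever the test produced):

```python
import math

# phi = sqrt(X^2 / N); Cramer's V = sqrt(X^2 / (N * (k - 1))), k = min(r, c).
chi_sq = 10.0      # hypothetical chi-square statistic
n = 100            # hypothetical sample size
rows, cols = 2, 2

phi = math.sqrt(chi_sq / n)
k = min(rows, cols)
v = math.sqrt(chi_sq / (n * (k - 1)))

print(round(phi, 3), round(v, 3))  # → 0.316 0.316
```

For a 2x2 table, k − 1 = 1, so phi and V coincide; V only diverges from phi for larger tables, which is exactly the adjustment it exists to make.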
Key Test Assumptions
- Expected frequencies should be sufficiently large
- Goal is to have all expected frequencies larger than 5, or at least most above 5
- Should be no expected frequencies below 1
- Guidelines are rough and somewhat conservative
- Observations should be independent of one another
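The expected-frequency rule of thumb is mechanical enough to automate. A small Python sketch (the helper name is mine) flags both thresholds:

```python
# Rough check of the rule of thumb: warn if any expected count is
# below 5; treat anything below 1 as a serious problem.
def check_expected(expected):
    flat = [e for row in expected for e in row]
    return {"all_at_least_5": min(flat) >= 5,
            "any_below_1": min(flat) < 1}

print(check_expected([[30, 20], [30, 20]]))    # fine on both counts
print(check_expected([[4.2, 0.8], [6.1, 8.9]]))  # violates both guidelines
```

When the check fails, the notes below suggest the Fisher exact test as an alternative.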
Non-Independence Issues
- The Chi-square test assumes observations are independent
- Non-independence "stuffs things up", causing false rejection or false retention of the null
- Extreme (and extremely silly) examples exist
- The cards experiment illustrates correlated observations leading to false retention of the null
- Potential alternatives include the McNemar or Cochran tests, or the Fisher exact test
R Functions
- goodnessOfFitTest() & associationTest() offer detailed output
- chisq.test() is more terse with output
- The goodness-of-fit test and test of independence are underpinned by the same mathematics
- chisq.test() can run either based on input type
- Input a frequency table for a goodness of fit test
- Input a cross-tabulation for a test of independence
Fisher Exact Test Basics
- Used when cell counts are too low for the chi-square approximation to be trusted
- The motivating example is a field study of the emotional state of people accused of witchcraft
- It is not easy to find people in the process of being set on fire, so cell counts in such data are often very small
- Unlike the chi-square test and others, Fisher's test has no test statistic; it calculates the p-value "directly"
- The test works fine even for the very small sample sizes such experiments produce
Fisher Analysis Basics
- Under the null hypothesis, with the margins fixed, the cell frequencies follow a hypergeometric distribution
- The p-value is the probability of observing the obtained table or one "more extreme"
- The conceptual difficulty is deciding which contingency tables count as more extreme than the observed one
- Tables with lower probability than the observed table are treated as more extreme
- fisher.test() provides a basic implementation
- fisher.test() has basic implementation
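The hypergeometric calculation can be sketched directly in Python for a 2x2 table (the counts are made up; "more extreme" here means any table whose probability is no larger than the observed one, one common convention for the two-sided test):

```python
from math import comb

# Fisher's exact test for a 2x2 table [[a, b], [c, d]] with fixed margins:
# the first cell follows a hypergeometric distribution under the null.
def fisher_exact_p(a, b, c, d):
    r1, r2 = a + b, c + d        # row totals
    c1 = a + c                   # first column total
    n = r1 + r2
    def prob(x):                 # P(first cell = x) with margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible values of the first cell
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs + 1e-12)

print(round(fisher_exact_p(2, 3, 4, 1), 4))  # → 0.5238
```

No normal approximation is involved, which is why small cell counts pose no problem.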
McNemar Test Situations
- Example scenario: being hired to demonstrate how effective advertisements are
- Participants report whether they intend to vote both before and after seeing the advertisements
- Real studies would include many other conditions, but these notes consider one simple experiment
- Data is expressed via a contingency table
McNemar Test Set-up
- If the observations were independent, the null hypothesis could be tested with an ordinary $\chi^2$ test
- But with 100 participants each answering twice, there are 200 observations, and they are not independent
- Each person contributes an answer to both the "before" column and the "after" column, so the paired answers are related
- If voter A says "yes" the first time and voter B says "no", you would expect voter A to be more likely than voter B to say "yes" the second time
- Applying the standard $\chi^2$ test of independence to such data therefore violates its assumptions
McNemar Test: Table Setup
- McNemar's solution starts by tabulating the data in a slightly different way
- It is exactly the same data, rewritten so that each of the 100 participants appears in only one cell, satisfying the independence assumption
- The rewritten table cross-classifies each person's "before" answer against their "after" answer (yes→yes, yes→no, no→yes, no→no)
- A $\chi^2$-style goodness-of-fit approach can then be applied; the tricky part is working out what the null hypothesis should be
McNemar Null Hypothesis
- The null hypothesis is that the "before" and "after" responses have the same distribution
- Equivalently, the row totals and column totals of the rewritten table come from the same distribution
- This property of the null hypothesis is called marginal homogeneity
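The resulting test statistic depends only on the discordant cells of the rewritten table, and the usual form includes a continuity correction. A Python sketch with hypothetical before/after counts:

```python
# McNemar's test uses only the discordant cells:
# b = "yes before, no after", c = "no before, yes after".
# With the continuity correction: X^2 = (|b - c| - 1)^2 / (b + c),
# compared against a chi-square distribution with df = 1.
def mcnemar_statistic(b, c):
    return (abs(b - c) - 1) ** 2 / (b + c)

b, c = 5, 25  # hypothetical: 5 switched yes->no, 25 switched no->yes
print(round(mcnemar_statistic(b, c), 3))  # → 12.033
```

The concordant cells (people who gave the same answer twice) carry no information about a before/after shift, which is why they drop out of the statistic.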
Description
Explore the calculation of expected frequencies using the null hypothesis. Learn how to apply the Chi-Square statistic. Understand degrees of freedom in statistical tests.