T-tests and Chi-Squared tests


Questions and Answers

In hypothesis testing, what does NHST stand for, and what is its primary focus?

Null Hypothesis Significance Testing; understanding how to assess the null hypothesis.

Besides effect size and p-values, what other measure should you consider when assessing a study?

Statistical power.

Why is understanding the limitations of hypothesis testing important when interpreting research results?

To prevent overreliance on p-values, and to consider the potential for errors in the study.

What type of variable is most appropriate for using the t-test?

Continuous variables.

For what type of data is the Chi-squared test most often used?

Categorical data.

How do t-tests and chi-squared tests contribute to inferential statistics?

They play a crucial role in inferential statistics and serve as building blocks for more advanced statistical techniques.

Explain the fundamental goal when comparing two variables using bivariate tests.

Determining if membership in one group is associated with a specific outcome or characteristic in another variable.

Why are t-tests and chi-squared tests considered foundational tests in statistics?

Because they play a significant role in comparing groups and making bivariate comparisons.

What specific characteristic of a variable makes it appropriate for analysis using a t-test?

It should be normally distributed.

Describe the purpose of an independent two-sample t-test.

To determine if the means of two independent groups are significantly different.

Define the null and alternative hypotheses of an independent samples t-test when comparing the means of two groups (G1 and G2).

Null hypothesis (H0): μG1 = μG2; Alternative hypothesis (HA): μG1 ≠ μG2.

In a t-test, what is assumed about the distribution of the difference between the sample means ($X_{G1} - X_{G2}$) if the null hypothesis is true?

The difference is normally distributed around zero.

Why is the t-distribution used instead of the normal (Z) distribution when conducting a t-test?

Because the population standard deviation is typically unknown and must be estimated from the sample.

Explain the concept of degrees of freedom (df) in the context of a t-distribution.

The number of independent pieces of information available to estimate the standard deviation.

Explain why the t-distribution is wider and shorter than the standard normal (Z) distribution.

To capture the uncertainty in estimating the standard deviation from a small sample.

As the degrees of freedom increase, how does the t-distribution change, and to what distribution does it become more similar?

It becomes more similar to the standard normal (Z) distribution.

What is the purpose of standardizing the signal (difference in means) in a t-test?

To account for noise; dividing the signal by the standard error of the mean maps it onto the t-distribution.

Define the standard error of the mean and explain why it's considered a "conservative" estimate.

It's the estimate of the standard deviation of the sample mean; it is conservative because it uses sample data instead of the population-level standard deviation.

Explain the influence of a one-tailed versus a two-tailed t-test on where the rejection region is placed.

A one-tailed test concentrates the rejection region on one side; a two-tailed test splits the rejection region between both sides.

What does a p-value less than 0.05 typically indicate in the context of hypothesis testing?

Strong evidence against the null hypothesis.

Describe one of the main assumptions that must be met when conducting a t-test.

The data must be drawn from a random sample.

What does the Shapiro-Wilk test assess and why is it used in the context of t-tests?

It assesses the normality of the data, which is important because t-tests assume the data is normally distributed.

What is the Levene test used for, and what assumption of the t-test does it help to check?

It tests the assumption of homogeneity of variance.

State the null and alternative hypotheses in a Chi-squared test regarding the relationship between two categorical variables, X and Y.

Null hypothesis (H0): X and Y are independent; Alternative hypothesis (HA): X and Y are not independent.

What is a contingency table (or crosstab), and how is it used in the context of a Chi-squared test?

A table that displays the frequencies of two categorical variables. It is the basis for calculating the Chi-squared statistic.

Explain how expected values are calculated in a Chi-squared test when assessing the independence of two categorical variables.

They are calculated by multiplying the row and column totals and dividing by the total sample size.

Describe the purpose of calculating a Chi-squared score.

To identify whether the observed counts are similar to or different from the expected counts.

Explain how degrees of freedom are determined in a Chi-squared test involving a contingency table.

df = (# of rows - 1) * (# of columns - 1).

What does a Chi-squared distribution represent, and how is it related to the Z-distribution?

It results from the sum of squared standard normal variables.

What are the assumptions of the Chi-squared test?

X and Y are both categorical; levels of X and Y are mutually exclusive; each observation is independent; expected value for each cell should be 5 or greater for 80% of cells and must be at least 1 for every cell.

For a chi-squared test, the expected value for each cell should meet what criteria?

The expected value should be 5 or greater for 80% of cells and must be at least 1 for every cell.

If a study violates the expected value conditions for performing a Chi-squared test, what adjustments or alternative tests might be considered?

Collapsing categories or exact tests.

What does ANOVA stand for, and in what situation is it applicable?

Analysis of Variance; comparing group means of three or more groups.

Describe the general logic behind ANOVA.

Compares between-group variation to within-group variation to assess differences between group means.

In ANOVA, what does the null hypothesis typically state?

The means of all groups equal each other.

What assumptions are typically made regarding the variable X when conducting an ANOVA?

Each observation must be independent; X must be a normally distributed variable within each group; the distribution of X for each group must have the same variance.

In ANOVA, why is it important that the distribution of the variable X for each group has the same variance?

Maintaining equal variances ensures that group differences are not due to variance alone, but to genuinely different mean values.

When is a t-test typically used, and what kind of variable is required for its application?

A t-test is commonly used to determine if group membership is associated with different values of a normally distributed variable.

What are the null and alternative hypotheses in an independent samples t-test?

Null hypothesis (H0): the population means of group 1 and group 2 are equal. Alternative hypothesis (HA): the population means of group 1 and group 2 are not equal.

Explain why the t-distribution is 'wider' and 'shorter' than the Z-distribution.

The t-distribution is wider and shorter than the Z-distribution because the standard deviation is derived from the sample, meaning there is inherently more uncertainty in the standard deviation estimate. This is especially true when there are fewer observations.

What do degrees of freedom represent in the context of a t-distribution?

Degrees of freedom represent the amount of data available to calculate the variability (i.e., standard deviation) of the data, effectively indicating the number of independent pieces of information used to estimate a parameter.

How is the t-statistic calculated, and how does dividing by the standard error of the mean influence the distribution?

The t-statistic is calculated as the difference between the sample means of two groups divided by the standard error of the mean. Dividing by the standard error maps the signal onto the t-distribution.

When is a one-sample t-test appropriate, and what is the null hypothesis in such a test?

A one-sample t-test is appropriate when comparing the sample mean of a single group to a known or predefined value. The null hypothesis is that the mean of the population, from which the sample is drawn, is equal to this predefined value.

In a paired samples t-test, what is being compared, and what does 'd' represent in the context of the null hypothesis?

A paired samples t-test compares the mean of a variable for one group at two different time points. In the null hypothesis, 'd' represents the mean difference in measurement between time 1 and time 2, and it is assumed to be 0.

Under what circumstances is the Chi-squared test used?

The Chi-squared test is used to assess whether two categorical variables are independent.

Differentiate between the null and alternative hypotheses in a Chi-squared test.

Null hypothesis (H0): the two categorical variables are independent. Alternative hypothesis (HA): the two categorical variables are not independent.

Explain how observed and expected frequencies are compared in the Chi-squared test.

The test compares observed patterns in the distribution of the two categorical variables with what would be expected if the null hypothesis (independence) were true. Large differences suggest the variables are not independent.

In the context of a Chi-squared test, how are expected values calculated for each cell in a contingency table?

Expected values are calculated by multiplying the number of people in that row by the number of people in that column, and dividing by the total number of observations, i.e., individuals.

Describe how the Chi-squared score influences the p-value and how this relates to the observed and expected counts.

The larger the difference between the observed and expected counts, the higher the Chi-squared score (and the lower the corresponding p-value).

Define what a chi-squared distribution is.

A chi-squared distribution is a specific type of probability distribution that arises from squaring a standard normal (Z) distributed variable.

In general terms, explain the relationship between the Z-distribution and the Chi-squared distribution.

If you have $k$ independent variables that follow the Z-distribution and create a new variable $Y$ by summing the squares of all $k$ variables, we say that $Y$ is distributed according to the chi-squared distribution.

What do degrees of freedom signify in the context of a Chi-squared distribution, and how are they determined?

Degrees of freedom (df) in a Chi-squared test reflect the number of independent pieces of information available to estimate a distribution, calculated as df = (# of rows - 1) * (# of columns - 1).

List the assumptions that must be met for an appropriate application of the Chi-squared test.

Both variables must be categorical, the levels of the variables must be mutually exclusive, each observation must be independent, and the expected value for each cell should be 5 or greater for 80% of cells and must be at least 1 for every cell.

Briefly introduce one-way ANOVA and explain how its underlying distribution differs from that of the t-test or Chi-squared test.

One-way ANOVA compares the group means of three or more groups and determines if they are all the same or if they differ in some way. Unlike the t-test and Chi-squared test, which stem from the Z-distribution, ANOVA uses the F-distribution, which arises by taking the ratio of two chi-squared distributed variables.

In the context of a One-Way ANOVA, what is the null hypothesis?

The null hypothesis ($H_0$) states that the means of all groups being compared are equal, i.e., μG1 = μG2 = ... = μGk.

Outline the assumptions that must be met to ensure the validity of One-Way ANOVA.

Each observation must be independent, the variable must be normally distributed within each group, and the distribution of the variable for each group must have the same variance.

In the context of ANOVA, describe the difference between between-group variation and within-group variation.

Between-group variation reflects differences among group means, whereas within-group variation is the variability within each group.

Flashcards

What is a T-test?

A statistical test used to determine if there is a significant difference between the means of two groups.

What is a Chi-squared test?

A statistical test used to determine if there is a significant association between two categorical variables.

What is ANOVA?

A statistical procedure that tests the null hypothesis that the population means of all groups are equal.

What is a p-value?

The probability of observing a test statistic as extreme as, or more extreme than, the statistic obtained, assuming the null hypothesis is true.


What is a grouping variable?

A variable that places observations into groups (e.g., trial/control, men/women).


What is the null hypothesis?

The hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.


What is the t-distribution?

A variation of the standard normal distribution used when the population standard deviation is unknown and estimated from the sample.


What are degrees of freedom?

The number of independent pieces of information available to estimate a parameter.


What is the standard error of the mean?

The standard deviation of the sampling distribution of a statistic.


What is Chi-squared distribution?

The distribution that arises from squaring the Z-distribution.


Study Notes

Module 2 Recap

  • Module 2 covered understanding Null Hypothesis Significance Testing (NHST).
  • Module 2 covered evaluating effect sizes and power in statistical tests.
  • It also included recognizing errors and limitations in hypothesis testing.

Introduction to Module 3

  • Module 3 explores and confirms multivariable associations and outcome distributions.
  • Week 4 lecture covers associations and two statistical tests for comparing groups: t-tests and Chi-Squared tests.
  • T-tests are for means of continuous variables and Chi-Squared tests are for categorical associations.
  • Both tests play a crucial role in inferential statistics and serve as building blocks for more advanced statistical techniques.
  • ANOVA will be previewed.

Bivariate Associations

  • Today's lecture is about comparing two variables.
  • Often, one variable places observations into groups, and we examine whether group membership is associated with another variable.
  • An example of this is looking at PhD students' anxiety levels compared to undergrads, or seeing if men are more prone to binge drinking than women.
  • Methods are needed to ask such questions.

Two Foundational Tests Overview

  • The t-test and chi-squared test are inferential tests.
  • They are used for comparing groups and making bivariate comparisons.
  • The tests play roles such as determining the significance of regression coefficients.
  • The goal is to understand these tests and their logic, and how to anticipate using them.

Student's T-Test

  • The t-test is for cases that use a normally distributed variable X, where the goal is to determine if specific group membership is associated with different values of X.
  • Three common variations of the t-test are one sample t-test, independent two sample t-test, and paired t-test.
  • The focus is on the independent two-sample t-test; the others are discussed after.

Independent Samples T-Test

  • Have a normally distributed random variable X and two groups, G1 and G2.
  • Goal is to determine if the population-level mean of X for G1 and G2 are the same or different.
  • The t-test defines null and alternative hypotheses: H0: μG1=μG2, HA: μG1≠μG2.
  • The null hypothesis is that the population means are equal and the alternate is that they are not.

Logic of the T-Test

  • Hypotheses can be represented as H0: μG1−μG2=0 and HA: μG1−μG2≠0.
  • Collect a sample of individuals, identify group G1/G2, and measure X for each person to run the study.
  • Calculate sample mean values for each group: x̄G1 and x̄G2.
  • Then calculate x̄G1 − x̄G2.
  • When the null hypothesis is true, the most likely value for x̄G1 - x̄G2 is 0, values near 0 are more likely than values further away, and values above 0 are as likely as values below 0.
  • In other words, the possible values for x̄G1 - x̄G2 appear normally distributed assuming the null is true.

Addressing Unknown Standard Deviation

  • Normal distribution is defined by two population-level parameters, the mean μ and the standard deviation σ.
  • While X is normally distributed, σ is often unknown.
  • This is fairly common in drug use epidemiology, when our populations are understudied (or difficult to fully capture): such as people who inject; undergraduates who vape; etc.
  • A new distribution developed because of this.

The T-Distribution Defined

  • The t-distribution is a variation of the standard normal distribution (Z-distribution).
  • Similar to the Z-distribution, the t-distribution has a mean value of 0 and is symmetrical around the mean.
  • The t-distribution is a little bit “wider” and a bit “shorter” than the Z-distribution.
  • Standard deviation is derived from the sample.

Degrees of Freedom

  • The t-distribution is defined in terms of “degrees of freedom”.
  • The more degrees of freedom exist to define the t-distribution, the more similar it becomes to the Z-distribution.
  • Degrees of freedom represent the amount of data available to calculate the variability of data.
  • Degrees of freedom refer to the number of parameters that are able to vary freely given some assumed outcome.
  • If there are 100 participants and their mean age is 60 years old, there are infinite possibilities for how age can be distributed throughout this group.
  • However, if 99 of their ages are known, the final person's age is fixed.
  • To calculate the mean value, one observation cannot "vary freely".
  • In this example, there are n = 100 observations, and 1 df must be spent to calculate the mean.
  • The normal distribution is defined by a mean value and a standard deviation.
  • With n observations, one degree of freedom must be spent to identify the mean value.
  • There are (n-1) degrees of freedom left to calculate the standard deviation.
  • The t-distribution is defined by (n-1) degrees of freedom as standard deviation is calculated from the sample.
  • The more observations, the more degrees of freedom to inform the t-distribution.
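The (n − 1) logic above is exactly what sample-variance formulas implement. A minimal sketch with made-up ages; Python's `statistics.variance` uses the (n − 1) denominator:

```python
import statistics

# Hypothetical sample of n = 5 ages; one df is "spent" on the mean,
# leaving n - 1 = 4 degrees of freedom to estimate the spread.
ages = [58, 61, 59, 63, 60]
mean = statistics.mean(ages)               # 60.2
ss = sum((a - mean) ** 2 for a in ages)    # sum of squared deviations

# Dividing by (n - 1) matches statistics.variance, which also uses
# the (n - 1) denominator for sample data.
manual = ss / (len(ages) - 1)
assert abs(manual - statistics.variance(ages)) < 1e-9
```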

Degrees of Freedom and Distribution Certainty

  • The t-distribution is intended to capture uncertainty in standard deviation measurement from a small sample.
  • The fewer df, the less certain that measured standard deviation s represents population-level standard deviation σ.
  • Therefore, the t-distribution is “shorter” and “wider” than the Z-distribution.
  • Values further from 0 become more probable due to less certainty about the standard deviation.

Mapping Tests

  • The t-test is nearly identical to the z-test, but the signal is mapped onto the t(n-1)-distribution.
  • First, calculate the signal x̄G1 − x̄G2.
  • The signal must then be standardized by dividing it by the noise.
  • The t-test divides the signal by the standard error of the mean.

Standard Error of the Mean Formula

  • The standard error of the mean is a "conservative" estimate of the standard deviation because the population level standard deviation is unknown.
  • SE = s · √(1/nG1 + 1/nG2)
  • Where s is the pooled standard deviation of X in the sample.

Calculating the T-Statistic

  • The test statistic, t, is calculated as the following: t = (x̄G1 − x̄G2) / SE = (x̄G1 − x̄G2) / (s · √(1/nG1 + 1/nG2))
  • Taking the signal and mapping it onto the t-distribution with nG1 + nG2 - 2 degrees of freedom is achieved by dividing by the standard error of the mean.
  • Degrees of freedom are nG1 − 1 to calculate the standard deviation for G1 and nG2 − 1 for G2.
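As a sketch of this calculation, with made-up data for the two groups (not from the lecture), the pooled t-statistic can be computed by hand using only the standard library:

```python
import math
import statistics

# Hypothetical samples for two independent groups (illustrative data only).
g1 = [20, 22, 19, 21, 23, 21]
g2 = [24, 23, 25, 22, 24, 26]

n1, n2 = len(g1), len(g2)
x1, x2 = statistics.mean(g1), statistics.mean(g2)

# Pooled standard deviation: combine the two (n - 1)-df variance estimates.
s2_pooled = ((n1 - 1) * statistics.variance(g1) +
             (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
s = math.sqrt(s2_pooled)

# t = (x̄G1 − x̄G2) / (s · √(1/nG1 + 1/nG2)), with n1 + n2 − 2 df.
t = (x1 - x2) / (s * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(t, df)  # t is negative here since G2's sample mean is larger
```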

T-Distribution Test Statistic Mapping

  • A test statistic is mapped onto the appropriate t-distribution.
  • Suppose G1 and G2 both have 100 people.
  • In this case, the average for G1 is x̄G1 = 21 and for G2 is x̄G2 = 22, with a pooled standard deviation of 3.
  • Then t = (21 − 22) / (3 · √(1/100 + 1/100)) ≈ −2.36.
  • Map this value to a t-distribution with 198 degrees of freedom.
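The lecture's numbers can be checked directly:

```python
import math

# Lecture example: n = 100 per group, means 21 and 22, pooled SD 3.
t = (21 - 22) / (3 * math.sqrt(1 / 100 + 1 / 100))
df = 100 + 100 - 2
print(round(t, 2), df)  # ≈ -2.36 with 198 degrees of freedom
```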

P-Value in T-Test

  • If the calculated p < 0.05, then we consider this significant evidence against our null hypothesis.
  • This indicates the signal (or a more extreme signal) is observed less than 5% of the time if the null were true.
  • Provides evidence that the null hypothesis is not true.
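In practice the t-statistic and p-value are rarely computed by hand. A sketch, assuming SciPy is available and using illustrative data; note that `scipy.stats.ttest_ind` assumes equal variances by default, i.e. the pooled-SD version described above:

```python
from scipy import stats

# Hypothetical data for two independent groups (illustration only).
g1 = [20, 22, 19, 21, 23, 21]
g2 = [24, 23, 25, 22, 24, 26]

# Independent two-sample t-test; by default ttest_ind pools the variances.
t_stat, p_value = stats.ttest_ind(g1, g2)
if p_value < 0.05:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: evidence against H0")
```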

Assumptions for T-Test to be Valid

  • The variable of interest, X, must be measured on an ordinal or continuous scale.
  • Data must be drawn from a random sample and the two groups being compared must be independent
  • X must be normally distributed. As sample size increases, this assumption becomes weaker and the t-test becomes more robust to violating it.
  • The variance of X in both groups must be the same. In other words, the standard deviation of X in both groups must be roughly equal.
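The two diagnostic tests named in the Q&A section (Shapiro-Wilk for normality, Levene for homogeneity of variance) are both available in SciPy. A sketch, assuming SciPy is installed and using made-up samples:

```python
from scipy import stats

# Illustrative samples for the two groups (made-up data).
g1 = [20, 22, 19, 21, 23, 21, 20, 22]
g2 = [24, 23, 25, 22, 24, 26, 23, 25]

# Shapiro-Wilk assesses normality within each group;
# a small p-value suggests the data deviate from normality.
for name, g in (("G1", g1), ("G2", g2)):
    w, p = stats.shapiro(g)
    print(name, "normality p =", round(p, 3))

# Levene checks homogeneity of variance across groups;
# a small p-value suggests the variances differ.
lev_stat, lev_p = stats.levene(g1, g2)
print("equal-variance p =", round(lev_p, 3))
```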

One sample t-test

  • Compares the mean of X for one group to some pre-defined level, y.
  • Null hypothesis is that: H₀: μ = y.
  • The t-score then can be calculated as follows: t = (x̄ − y) / (s / √n).
  • Compare to a t-distribution with n − 1 degrees of freedom.
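A sketch of the one-sample calculation; the measurements and the reference value y = 10 are made up for illustration:

```python
import math
import statistics

# One-sample t-test: compare the sample mean to a predefined value y.
x = [12, 14, 11, 13, 15, 12, 14]   # hypothetical measurements
y = 10                             # H0: μ = 10

n = len(x)
xbar = statistics.mean(x)
s = statistics.stdev(x)            # sample SD, (n - 1) denominator

# t = (x̄ − y) / (s / √n), compared to a t-distribution with n − 1 df.
t = (xbar - y) / (s / math.sqrt(n))
print(round(t, 2), "with", n - 1, "df")
```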

Paired Samples T-test

  • A t-test can compare the mean of X for one group at time 1 versus at time 2.
  • Null hypothesis is that H₀: d = 0.
  • "d" represents the difference in measurement from time 1 and time 2.
  • The sample mean d̄ and the sample standard deviation s of the differences are used to calculate the t-score as follows: t = d̄ / (s / √n).
  • Compare this to a t-distribution with n-1 degrees of freedom.
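The same idea as a sketch, with hypothetical before/after measurements on the same individuals:

```python
import math
import statistics

# Paired t-test: one group measured at two time points (made-up data).
time1 = [8, 10, 7, 9, 11, 8]
time2 = [10, 11, 9, 12, 12, 10]

# Work with the per-person differences; H0: mean difference d = 0.
d = [b - a for a, b in zip(time1, time2)]
n = len(d)
dbar = statistics.mean(d)
s = statistics.stdev(d)

# t = d̄ / (s / √n), compared to a t-distribution with n − 1 df.
t = dbar / (s / math.sqrt(n))
print(round(t, 2), "with", n - 1, "df")
```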

Chi-Squared Test

  • Assesses if two categorical variables, X and Y, are independent.
  • The null hypothesis is that X and Y are independent; the alternative hypothesis is that X and Y are not independent.
  • This test compares observed patterns in the distributions of X and Y to what is expected if the null hypothesis of independence is true.
  • The frequencies of two categorical variables are examined at the same time (contingency table or crosstabs).
  • For each cell, the number of people in that row is multiplied by the number of people in that column, then divided by the total number: (Nrow * Ncolumn) / n.
  • The goal of the chi-squared test is to identify if the observed counts are similar or different than the expected counts.
  • The greater the difference between the observed and expected counts, the higher is the chi-squared score (and the lower is the p-value).
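The expected counts and chi-squared score follow directly from the formulas above; the 2×2 contingency table here is made up for illustration:

```python
# Hypothetical 2x2 contingency table: rows = group, columns = outcome.
observed = [[30, 20],   # e.g. Group A: yes / no
            [10, 40]]   # e.g. Group B: yes / no

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count per cell under independence: (Nrow * Ncolumn) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared score: sum of (Observed - Expected)^2 / Expected.
chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
df = (2 - 1) * (2 - 1)
print(round(chi2, 2), "with", df, "df")
```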

Chi-Squared Distribution Overview

  • Normal distributions arise from understanding certain natural phenomena.
  • Chi-squared distribution with one degree of freedom is the square of the Z-distribution.
  • This means that a value x drawn from the Z-distribution gets mapped onto x² on the chi-squared distribution.

Chi-Squared General Distribution

  • Consider k random variables X1, X2, ..., Xk which are independent and follow the Z-distribution. A new variable Y can be created by summing their squares: Y = ∑Xi².
  • It follows that Y is distributed according to the chi-squared distribution with k degrees of freedom.
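This relationship can be illustrated by simulation (the draw counts below are arbitrary). The mean of a chi-squared variable with k degrees of freedom is k, so the simulated average of Y should land near k:

```python
import random

random.seed(0)
k = 3          # number of independent Z-distributed variables
reps = 20000   # number of simulated draws of Y

# Each draw of Y sums the squares of k standard-normal values, so Y
# follows a chi-squared distribution with k degrees of freedom.
ys = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(reps)]
mean_y = sum(ys) / reps
print(round(mean_y, 1))  # should land near k = 3
```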

Test Statistic as Sum of Squares

  • The test calculates: χ² = Σ((Observed - Expected)² / Expected).
  • This is a sum of squares, so it follows a chi-squared distribution.
  • When such a test sums 8 squares (4 years × 2 housing options), the terms are not all independent, which is why degrees of freedom are needed.

Degrees of Freedom for Test

  • The value for the test is calculated as the difference between what is expected assuming the null versus what is actually observed.
  • Degrees of freedom (df) is calculated as: df = (# of rows - 1) * (# of columns - 1)
  • Begin filling in such a table with one entry at a time.
  • The degrees of freedom is the number of pieces of information needed to be known in order to fill out the entire rest of the table.
  • In the lecture example, p < 0.00001 is obtained by taking the area under the curve of the chi-squared distribution with 3 degrees of freedom.

Chi-Squared Test Assumptions

  • The X and Y variables must be categorical.
  • Levels of X and Y are mutually exclusive. Each participant must belong to one and only one level of each.
  • Each observation must be independent (data is drawn from a random sample).
  • The expected value for each cell should be 5 or greater for 80% of cells and must be at least 1 for every cell.

ANOVA

  • ANOVA is touched upon.
  • F distribution arises by taking the ratio of two chi-squared distributed variables.
  • ANOVA allows comparison of the group means of three or more groups (extending the t-test) and determines if they are all the same or if they differ in some way.

One-Way ANOVA

  • A normally distributed random variable X is measured across k groups G1, G2, ..., Gk; the test determines if the mean value across the groups is the same or different.
  • The null hypothesis is H₀: μG1 = μG2 = ... = μGk
  • The alternative hypothesis says that they do not all equal each other. This could mean all of the means differ, or just one of them.

Variance for One-Way ANOVA

  • The logic of one-way ANOVA compares between-group variation to within-group variation.
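A sketch of this comparison with made-up data for three groups; the ratio of the two variance estimates is the F statistic:

```python
import statistics

# Hypothetical values of X for k = 3 groups (illustrative data only).
groups = [[4, 5, 6, 5], [7, 8, 9, 8], [5, 6, 7, 6]]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)
means = [statistics.mean(g) for g in groups]

# Between-group variation: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
# Within-group variation: spread of observations around their own group mean.
ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

# F = (between-group variance) / (within-group variance).
f = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f, 2))
```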

Assumptions for ANOVA

  • Each observation must be independent.
  • X must be a normally distributed variable within each group.
  • The distribution of X for each group must have the same variance.
