Questions and Answers
In hypothesis testing, what does NHST stand for, and what is its primary focus?
Null Hypothesis Significance Testing; understanding how to assess the null hypothesis.
Besides effect size, what other measure should you consider to assess a study, in addition to p-values?
Statistical power.
Why is understanding the limitations of hypothesis testing important when interpreting research results?
To prevent overreliance on p-values, and to consider the potential for errors in the study.
What type of variable is most appropriate for using the t-test?
For what type of data is the Chi-squared test most often used?
How do t-tests and chi-squared tests contribute to inferential statistics?
Explain the fundamental goal when comparing two variables using bivariate tests.
Why are t-tests and chi-squared tests considered foundational tests in statistics?
What specific characteristic of a variable makes it appropriate for analysis using a t-test?
Describe the purpose of an independent two-sample t-test.
Define the null and alternative hypotheses of an independent samples t-test when comparing the means of two groups (G1 and G2).
In a t-test, what is assumed about the distribution of the difference between the sample means ($\bar{x}_{G1} - \bar{x}_{G2}$) if the null hypothesis is true?
Why is the t-distribution used instead of the normal (Z) distribution when conducting a t-test?
Explain the concept of degrees of freedom (df) in the context of a t-distribution.
Explain why the t-distribution is wider and shorter than the standard normal (Z) distribution.
As the degrees of freedom increase, how does the t-distribution change, and to what distribution does it become more similar?
What is the purpose of standardizing the signal (difference in means) in a t-test?
Define the standard error of the mean and explain why it's considered a "conservative" estimate.
Explain the influence of a one-tailed versus a two-tailed t-test on where the rejection region is placed.
What does a p-value less than 0.05 typically indicate in the context of hypothesis testing?
Describe one of the main assumptions that must be met when conducting a t-test.
What does the Shapiro-Wilk test assess and why is it used in the context of t-tests?
What is the Levene test used for, and what assumption of the t-test does it help to check?
State the null and alternative hypotheses in a Chi-squared test regarding the relationship between two categorical variables, X and Y.
What is a contingency table (or crosstab), and how is it used in the context of a Chi-squared test?
Explain how expected values are calculated in a Chi-squared test when assessing the independence of two categorical variables.
Describe the purpose of calculating a Chi-squared score.
Explain how degrees of freedom are determined in a Chi-squared test involving a contingency table.
What does a Chi-squared distribution represent, and how is it related to the Z-distribution?
What are the assumptions of the Chi-squared test?
For a chi-squared test, the expected value for each cell should meet what criteria?
If a study violates the expected value conditions for performing a Chi-squared test, what adjustments or alternative tests might be considered?
What does ANOVA stand for, and in what situation is it applicable?
Describe the general logic behind ANOVA.
In ANOVA, what does the null hypothesis typically state?
What assumptions are typically made regarding the variable X when conducting an ANOVA?
In ANOVA, why is it important that the distribution of the variable X for each group has the same variance?
When is a t-test typically used, and what kind of variable is required for its application?
What are the null and alternative hypotheses in an independent samples t-test?
Explain why the t-distribution is 'wider' and 'shorter' than the Z-distribution.
What do degrees of freedom represent in the context of a t-distribution?
How is the t-statistic calculated, and how does dividing by the standard error of the mean influence the distribution?
When is a one-sample t-test appropriate, and what is the null hypothesis in such a test?
In a paired samples t-test, what is being compared, and what does 'd' represent in the context of the null hypothesis?
Under what circumstances is the Chi-squared test used?
Differentiate between the null and alternative hypotheses in a Chi-squared test.
Explain how observed and expected frequencies are compared in the Chi-squared test.
In the context of a Chi-squared test, how are expected values calculated for each cell in a contingency table?
Describe how the Chi-squared score influences the p-value and how this relates to the observed and expected counts.
Define what a chi-squared distribution is.
In general terms, explain the relationship between the Z-distribution and the Chi-squared distribution.
What do degrees of freedom signify in the context of a Chi-squared distribution, and how are they determined?
List the assumptions that must be met for an appropriate application of the Chi-squared test.
Briefly introduce one-way ANOVA and its relation to the F-distribution, differing from that of the t-test or Chi-squared test.
In the context of a One-Way ANOVA, what is the null hypothesis?
Outline the assumptions that must be met to ensure the validity of One-Way ANOVA.
In the context of ANOVA, describe the difference between between-group variation and within-group variation.
Flashcards
What is a T-test?
A statistical test used to determine if there is a significant difference between the means of two groups.
What is a Chi-squared test?
A statistical test used to determine if there is a significant association between two categorical variables.
What is ANOVA?
A statistical procedure that tests the null hypothesis that the population means of all groups are equal.
What is a p-value?
The probability of observing the measured signal, or one more extreme, if the null hypothesis were true.
What is a grouping variable?
A variable that places observations into groups so that group membership can be examined for an association with another variable.
What is the null hypothesis?
The hypothesis of no difference or no association (for example, that the population means of two groups are equal).
What is the t-distribution?
A symmetric distribution centered at 0, similar to the Z-distribution but wider and shorter, whose shape is defined by degrees of freedom.
What are degrees of freedom?
The number of values that are free to vary given some assumed outcome; the amount of data available to estimate variability.
What is the standard error of the mean?
A conservative estimate of variability in the sample mean, calculated from the sample standard deviation because the population standard deviation is unknown.
What is Chi-squared distribution?
The distribution of a sum of squared independent standard normal (Z) variables, defined by its degrees of freedom.
Study Notes
Module 2 Recap
- Module 2 covered understanding Null Hypothesis Significance Testing (NHST).
- Module 2 covered evaluating effect sizes and power in statistical tests.
- It also included recognizing errors and limitations in hypothesis testing.
Introduction to Module 3
- Module 3 explores and confirms multivariable associations and outcome distributions.
- Week 4 lecture covers associations and two statistical tests for comparing groups: t-tests and Chi-Squared tests.
- T-tests are for means of continuous variables and Chi-Squared tests are for categorical associations.
- Both tests play a crucial role in inferential statistics and serve as building blocks for more advanced statistical techniques.
- ANOVA will be previewed.
Bivariate Associations
- Today's lecture is about comparing two variables.
- Often, one variable places observations into groups, and the association between group membership and another variable is examined.
- An example of this is looking at PhD students' anxiety levels compared to undergrads, or seeing if men are more prone to binge drinking than women.
- Methods are needed to ask such questions.
Two Foundational Tests Overview
- The t-test and chi-squared test are inferential tests.
- They are used for comparing groups and making bivariate comparisons.
- The tests play roles such as determining the significance of regression coefficients.
- The goal is to understand these tests and their logic, and how to anticipate using them.
Student's T-Test
- The t-test applies when there is a normally distributed variable X and the goal is to determine whether group membership is associated with different values of X.
- Three common variations of the t-test are one sample t-test, independent two sample t-test, and paired t-test.
- The focus is on the independent two-sample t-test; the other variations are discussed afterward.
Independent Samples T-Test
- Have a normally distributed random variable X and two groups, G1 and G2.
- Goal is to determine if the population-level mean of X for G1 and G2 are the same or different.
- The t-test defines null and alternative hypotheses: H0: μG1=μG2, HA: μG1≠μG2.
- The null hypothesis is that the population means are equal and the alternate is that they are not.
Logic of the T-Test
- Hypotheses can be represented as H0: μG1−μG2=0 and HA: μG1−μG2≠0.
- Collect a sample of individuals, identify group G1/G2, and measure X for each person to run the study.
- Calculate sample mean values for each group: x̄G1 and x̄G2.
- Then calculate x̄G1 − x̄G2.
- When the null hypothesis is true, the most likely value for x̄G1 − x̄G2 is 0, values near 0 are more likely than values far from 0, and values above 0 are as likely as values below 0.
- In other words, the possible values for x̄G1 - x̄G2 appear normally distributed assuming the null is true.
Addressing Unknown Standard Deviation
- Normal distribution is defined by two population-level parameters, the mean μ and the standard deviation σ.
- While X is normally distributed, σ is often unknown.
- This is fairly common in drug use epidemiology, where populations are understudied or difficult to fully capture, such as people who inject drugs or undergraduates who vape.
- A new distribution was developed to address this.
The T-Distribution Defined
- The t-distribution is a variation of the standard normal distribution (Z-distribution).
- Similar to the Z-distribution, the t-distribution has a mean value of 0 and is symmetrical around the mean.
- The t-distribution is a little bit “wider” and a little bit “shorter” than the Z-distribution.
- Standard deviation is derived from the sample.
Degrees of Freedom
- The t-distribution is defined in terms of “degrees of freedom”.
- The more degrees of freedom exist to define the t-distribution, the more similar it becomes to the Z-distribution.
- Degrees of freedom represent the amount of data available to calculate the variability of data.
- Degrees of freedom refer to the number of parameters that are able to vary freely given some assumed outcome.
- If there are 100 participants and their mean age is 60 years old, there are infinite possibilities for how age can be distributed throughout this group.
- However, if 99 of their ages are known, the final person's age is fixed.
- To calculate the mean value, one observation cannot "vary freely".
- In this example, there are n = 100 observations, and 1 degree of freedom must be spent to calculate the mean.
- The normal distribution is defined by a mean value and a standard deviation.
- With n observations, one degree of freedom must be spent to identify the mean value.
- There are (n-1) degrees of freedom left to calculate the standard deviation.
- The t-distribution is defined by (n-1) degrees of freedom as standard deviation is calculated from the sample.
- The more observations, the more degrees of freedom to inform the t-distribution.
Degrees of Freedom and Distribution Certainty
- The t-distribution is intended to capture uncertainty in standard deviation measurement from a small sample.
- The fewer df, the less certain that measured standard deviation s represents population-level standard deviation σ.
- Therefore, the t-distribution is “shorter” and “wider” than the Z-distribution.
- Values further from 0 become more probable because there is less certainty about the standard deviation (compare the tail probabilities in the sketch below).
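As a rough illustration of the "wider and shorter" point, here is a minimal sketch (assuming Python with scipy, which these notes do not use themselves) comparing the probability of landing far from 0 under the t-distribution at several degrees of freedom versus the Z-distribution.

```python
# Sketch: how the t-distribution's heavier tails shrink toward the Z-distribution
# as degrees of freedom grow (scipy assumed).
from scipy.stats import norm, t

# Two-tailed probability of a value more extreme than +/-2 under each distribution
print(f"Z-distribution:        P(|Z| > 2) = {2 * norm.sf(2):.4f}")
for df in (2, 5, 30, 200):
    print(f"t-distribution df={df:>3}: P(|T| > 2) = {2 * t.sf(2, df):.4f}")
```

With few degrees of freedom, values beyond ±2 are noticeably more likely; as the degrees of freedom grow, the probabilities approach those of the Z-distribution.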
Mapping Tests
- The t-test is nearly identical to the z-test, but the signal is mapped onto the t(n-1)-distribution.
- First, calculate the signal x̄G1 − x̄G2.
- The signal must then be standardized by dividing it by the noise.
- The t-test divides the signal by the standard error of the mean.
Standard Error of the Mean Formula
- The standard error of the mean is a "conservative" estimate of the standard deviation because the population level standard deviation is unknown.
- SE = s * √(1/nG1 + 1/nG2)
- Where s is the standard deviation of X in the sample.
Calculating the T-Statistic
- The test statistic, t, is calculated as: t = (x̄G1 − x̄G2) / SE = (x̄G1 − x̄G2) / (s * √(1/nG1 + 1/nG2)).
- Dividing by the standard error of the mean maps the signal onto the t-distribution with nG1 + nG2 − 2 degrees of freedom.
- Degrees of freedom are nG1 − 1 to calculate the standard deviation for G1 and nG2 − 1 for G2.
T-Distribution Test Statistic Mapping
- A test statistic is mapped onto the appropriate t-distribution.
- Suppose G1 and G2 both have 100 people.
- In this case, the average for G1 is x̄G1 = 21, the average for G2 is x̄G2 = 22, and the pooled standard deviation is 3.
- Then t = (21 − 22) / (3 * √(1/100 + 1/100)) ≈ −2.36 (reproduced in the sketch below).
- Map this value to a t-distribution with 198 degrees of freedom.
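A minimal sketch of this worked example (Python with scipy assumed; not part of the original lecture) reproduces the t-statistic and obtains the two-tailed p-value from the t-distribution with 198 degrees of freedom.

```python
# Sketch: independent two-sample t-test from the summary statistics above
# (means 21 and 22, pooled SD 3, n = 100 per group). scipy assumed.
from math import sqrt
from scipy.stats import t

x_bar_g1, x_bar_g2 = 21, 22
s_pooled = 3
n_g1 = n_g2 = 100

se = s_pooled * sqrt(1 / n_g1 + 1 / n_g2)   # standard error of the mean difference
t_stat = (x_bar_g1 - x_bar_g2) / se         # signal divided by noise
df = n_g1 + n_g2 - 2                        # 198 degrees of freedom

p_value = 2 * t.sf(abs(t_stat), df)         # two-tailed p-value
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.3f}")   # t ≈ -2.36, p ≈ 0.02
```

scipy.stats.ttest_ind_from_stats gives the same result directly from these summary statistics.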
P-Value in T-Test
- If the calculated p < 0.05, then we consider this significant evidence against our null hypothesis.
- This indicates the signal (or a more extreme signal) is observed less than 5% of the time if the null were true.
- Provides evidence that the null hypothesis is not true.
Assumptions for T-Test to be Valid
- The variable of interest, X, must be measured on an ordinal or continuous scale
- Data must be drawn from a random sample and the two groups being compared must be independent
- X must be normally distributed. As sample size increases, this assumption becomes less critical; the t-test becomes more robust to violations of it.
- The variance of X in both groups must be the same. In other words, the standard deviation of X in both groups must be roughly equal (normality and equal variance can be checked with the Shapiro-Wilk and Levene tests, as sketched below).
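The Shapiro-Wilk and Levene tests mentioned in the questions above can check the normality and equal-variance assumptions. A minimal sketch, assuming Python with numpy/scipy and purely illustrative simulated data:

```python
# Sketch: checking t-test assumptions on illustrative simulated data
# (numpy and scipy assumed).
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind

rng = np.random.default_rng(0)
g1 = rng.normal(loc=21, scale=3, size=100)   # hypothetical group 1 values of X
g2 = rng.normal(loc=22, scale=3, size=100)   # hypothetical group 2 values of X

# Normality of X within each group (null hypothesis: data are normally distributed)
print("Shapiro-Wilk G1 p-value:", shapiro(g1).pvalue)
print("Shapiro-Wilk G2 p-value:", shapiro(g2).pvalue)

# Equality of variances across groups (null hypothesis: variances are equal)
print("Levene p-value:", levene(g1, g2).pvalue)

# If neither check rejects its null, proceed with the independent t-test
print("Independent t-test:", ttest_ind(g1, g2))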
One sample t-test
- Compares the mean of X for one group to some pre-defined level, y.
- Null hypothesis is that: H₀: μ = y.
- The t-score is then calculated as: t = (x̄ − y) / (s / √n).
- Compare to a t-distribution with n − 1 degrees of freedom (see the sketch below).
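A minimal sketch of the one-sample t-test, assuming Python with scipy; the sample and the comparison level y = 21 are hypothetical and illustrative only.

```python
# Sketch: one-sample t-test comparing a sample mean of X to a pre-defined level y
# (numpy and scipy assumed; data and y are hypothetical).
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
x = rng.normal(loc=23, scale=4, size=50)   # hypothetical measurements of X

y = 21                                     # pre-defined comparison level
result = ttest_1samp(x, popmean=y)         # uses n - 1 degrees of freedom internally
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```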
Paired Samples T-test
- A t-test can compare the mean of X for one group at time 1 versus at time 2.
- Null hypothesis is that H₀: d = 0.
- "d" represents the difference in measurement from time 1 and time 2.
- The sample mean d̄ and the sample standard deviation s of the differences are used to calculate the t-score: t = d̄ / (s / √n).
- Compare this to a t-distribution with n − 1 degrees of freedom (see the sketch below).
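A minimal sketch of the paired-samples t-test, assuming Python with scipy; the time 1 and time 2 scores are simulated purely for illustration.

```python
# Sketch: paired-samples t-test comparing X for the same people at time 1 and time 2
# (numpy and scipy assumed; data are simulated for illustration).
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2)
time1 = rng.normal(loc=30, scale=5, size=40)             # hypothetical baseline scores
time2 = time1 + rng.normal(loc=-1.5, scale=2, size=40)   # hypothetical follow-up scores

# Equivalent to a one-sample t-test of the differences d = time2 - time1 against 0
result = ttest_rel(time2, time1)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```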
Chi-Squared Test
- Assesses if two categorical variables, X and Y, are independent.
- The null hypothesis is that X and Y are independent; the alternative hypothesis is that X and Y are not independent.
- This test compares observed patterns in the distributions of X and Y to what is expected if the null hypothesis of independence is true.
- The frequencies of two categorical variables are examined at the same time (contingency table or crosstabs).
- For each cell, the number of people in that row is multiplied by the number of people in that column, then divided by the total number: (Nrow * Ncolumn) / n.
- The goal of the chi-squared test is to identify if the observed counts are similar or different than the expected counts.
- The greater the difference between the observed and expected counts, the higher the chi-squared score (and the lower the p-value); the sketch below works through a hypothetical crosstab.
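A minimal sketch of the test, assuming Python with scipy; the 4 x 2 crosstab counts (year of study by housing option) are hypothetical and only echo the example used later in these notes.

```python
# Sketch: chi-squared test of independence on a hypothetical 4 x 2 contingency table
# (rows: year of study, columns: two housing options). numpy and scipy assumed.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [70, 30],   # year 1
    [60, 40],   # year 2
    [45, 55],   # year 3
    [35, 65],   # year 4
])

chi2_score, p, df, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2_score:.2f}, df = {df}, p = {p:.6f}")  # df = (4-1)*(2-1) = 3
print("expected counts:\n", expected)   # (N_row * N_column) / n for each cell
```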
Chi-Squared Distribution Overview
- Normal distributions arise from certain natural phenomena.
- Chi-squared distribution with one degree of freedom is the square of the Z-distribution.
- This means that each value x drawn from the Z-distribution is mapped to x² on the chi-squared distribution.
Chi-Squared General Distribution
- Consider k random variables X1, X2, ..., Xk that are independent and each follow the Z-distribution. A new variable Y is defined as the sum of their squares: Y = ΣXi².
- It follows that Y is distributed according to the chi-squared distribution with k degrees of freedom (a simulation sketch follows below).
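A small simulation sketch (assuming Python with numpy/scipy) of this relationship: the sum of k squared independent Z draws matches the chi-squared distribution with k degrees of freedom.

```python
# Sketch: the sum of k squared independent standard normal (Z) variables follows
# a chi-squared distribution with k degrees of freedom (numpy and scipy assumed).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
k = 4
z = rng.standard_normal(size=(100_000, k))   # many draws of k independent Z variables
y = (z ** 2).sum(axis=1)                     # Y = sum of squares for each draw

# Simulated quantiles of Y line up with the theoretical chi-squared(k) quantiles
for q in (0.5, 0.9, 0.95):
    print(f"q = {q}: simulated {np.quantile(y, q):.2f} vs chi2({k}) {chi2.ppf(q, k):.2f}")
```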
Test Statistic as Sum of Squares
- The test calculates: χ² = Σ((Observed - Expected)² / Expected).
- This is a sum of squares, so it follows a chi-squared distribution.
- When such a test sums 8 squares (e.g., 4 years x 2 housing options), the squared terms are not all independent, so the appropriate degrees of freedom must be determined.
Degrees of Freedom for Test
- The value for the test is calculated as the difference between what is expected assuming the null versus what is actually observed.
- Degrees of freedom (df) is calculated as: df = (# of rows - 1) * (# of columns - 1)
- Imagine filling in such a table one entry at a time.
- The degrees of freedom are the number of cell entries that must be known before the entire rest of the table can be filled in.
- In that example, p < 0.00001, obtained by taking the area under the curve of the chi-squared distribution with 3 degrees of freedom.
Chi-Squared Test Assumptions
- The X and Y variables must be categorical.
- Levels of X and Y are mutually exclusive. Each participant must belong to one and only one level of each.
- Each observation must be independent (data is drawn from a random sample).
- The expected value for each cell should be 5 or greater for at least 80% of cells and must be at least 1 for every cell (this check is illustrated in the sketch below).
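The expected-count condition can be checked directly from the test output. A minimal sketch, assuming Python with scipy and a hypothetical 2 x 3 crosstab:

```python
# Sketch: verifying the expected-count assumption for a chi-squared test
# (numpy and scipy assumed; the 2 x 3 crosstab counts are hypothetical).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[12, 8, 5],
                     [9, 14, 7]])

_, _, _, expected = chi2_contingency(observed)

share_at_least_5 = np.mean(expected >= 5)   # should cover at least 80% of cells
all_at_least_1 = np.all(expected >= 1)      # every cell must be at least 1
print("expected counts:\n", np.round(expected, 1))
print(f"cells with expected >= 5: {share_at_least_5:.0%}; all cells >= 1: {all_at_least_1}")
```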
ANOVA
- ANOVA is touched upon.
- The F-distribution arises by taking the ratio of two chi-squared distributed variables (each divided by its degrees of freedom).
- ANOVA allows comparison of the means of three or more groups (extending the t-test) to determine whether they are all the same or differ in some way.
One-Way ANOVA
- A normal random variable X is measured across k groups G1, G2, ..., Gk; the test determines whether the mean value is the same in each group or differs.
- The null hypothesis is H₀: μG1 = μG2 = ... = μGk
- The alternative hypothesis is that the means are not all equal; this could mean all of them differ, or only one does.
Variance for One-Way ANOVA
- The logic of one-way ANOVA compares between-group variation with within-group variation.
Assumptions for ANOVA
- Each observation must be independent.
- X must be a normally distributed variable within each group.
- The distribution of X for each group must have the same variance (a sketch combining these ideas follows below).
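A minimal sketch of one-way ANOVA, assuming Python with numpy/scipy and simulated groups; it computes the between-group and within-group sums of squares by hand and confirms the F statistic against scipy's f_oneway.

```python
# Sketch: one-way ANOVA comparing the means of three groups
# (numpy and scipy assumed; group data are simulated for illustration).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(4)
groups = [rng.normal(loc=mu, scale=3, size=50) for mu in (20, 21, 23)]

# Between-group variation: how far each group mean sits from the grand mean
grand_mean = np.mean(np.concatenate(groups))
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group variation: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

k = len(groups)
n = sum(len(g) for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

f_scipy, p = f_oneway(*groups)
print(f"manual F = {f_manual:.2f}, scipy F = {f_scipy:.2f}, p = {p:.4f}")
```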