T-Tests: One-Sample vs. Independent Samples

Questions and Answers

In the context of t-tests, what is the primary difference that distinguishes a one-sample t-test from an independent two-sample t-test?

A one-sample t-test compares the mean of a single sample to a known value, while an independent two-sample t-test compares the means of two independent groups.

When conducting an independent samples t-test, what assumption about the data is particularly important regarding the distribution of the variable being analyzed, and why is this assumption important?

The variable should be approximately normally distributed within each group because the t-test relies on the normality assumption to ensure the validity of the p-value and confidence intervals.

Explain, in your own words, the null hypothesis ($H_0$) and the alternative hypothesis ($H_A$) of a two-tailed, independent samples t-test. Use the symbols $\mu_{G1}$ and $\mu_{G2}$ to denote the population means of group 1 and group 2, respectively.

The null hypothesis ($H_0$) states that the population means of groups 1 and 2 are equal ($\mu_{G1} = \mu_{G2}$). The alternative hypothesis ($H_A$) states that the population means of groups 1 and 2 are not equal ($\mu_{G1} \neq \mu_{G2}$).

A researcher is comparing anxiety scores between a group of PhD students and undergraduate students. What null and alternative hypotheses should the researcher use?

Null Hypothesis ($H_0$): There is no difference in the mean anxiety scores between PhD students and undergraduate students. Alternative Hypothesis ($H_A$): There is a difference in the mean anxiety scores between PhD students and undergraduate students.

In a statistical analysis comparing the effectiveness of a new drug, researchers obtain a p-value of 0.03. Using a significance level (alpha) of 0.05, what is the conclusion regarding the null hypothesis, and what does this imply about the drug's effectiveness?

Since the p-value (0.03) is less than the significance level (0.05), we reject the null hypothesis. This suggests that the drug has a statistically significant effect.

Why is it important to understand the tests that compare two variables?

Comparing two variables is essential for determining if group membership is related to another variable. For example, determining if a treatment is more effective than a placebo.

In the context of research, what does it mean to make a bivariate comparison, and can you provide an example of a research question that would involve such a comparison?

A bivariate comparison involves examining the relationship between two variables. An example research question is: "Is there a relationship between education level and income?"

A study aims to investigate whether there is a significant difference in the mean test scores between students who attended a review session and those who did not. If an independent samples t-test is used for this purpose, briefly explain the logic behind using this test.

The independent samples t-test is used to compare the means of two independent groups (those who attended the review session and those who did not) to determine if the observed difference in their mean test scores is statistically significant, indicating a real difference in the population rather than just a chance occurrence.
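
As a rough illustration, here is a minimal sketch of such a test in Python using SciPy; the scores and group sizes are made up for the example.

    from scipy import stats

    # Hypothetical test scores for each group.
    attended = [78, 85, 90, 72, 88, 95, 81, 79]
    did_not_attend = [70, 75, 80, 68, 74, 82, 71, 77]

    # Independent samples t-test: is the difference in means significant?
    t_stat, p_value = stats.ttest_ind(attended, did_not_attend)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # reject H0 if p < alpha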

Explain how the shape of the t-distribution changes as the degrees of freedom increase, and why this change occurs.

As the degrees of freedom increase, the t-distribution becomes more similar to the Z-distribution. This is because with more observations, the sample standard deviation becomes a more reliable estimate of the population standard deviation, reducing uncertainty.

Why is the standard error of the mean considered a 'conservative' estimate of standard deviation?

The standard error of the mean is 'conservative' because the population standard deviation is unknown and must be estimated from the sample. The SEM quantifies the uncertainty in estimating the population mean from a sample, so the test builds in this extra uncertainty rather than assuming the estimate is exact.

Briefly describe the difference between a z-test and a t-test, and explain when it is more appropriate to use a t-test.

A z-test uses the standard normal distribution and assumes the population standard deviation is known. A t-test uses the t-distribution and is used when the population standard deviation is unknown and estimated from the sample. A t-test is more appropriate with smaller sample sizes.

How do degrees of freedom relate to the calculation of the t-statistic when comparing two groups?

When comparing two groups, the degrees of freedom are calculated as $n_1 + n_2 - 2$, where $n_1$ and $n_2$ are the sample sizes of the two groups. These degrees of freedom are used to determine the appropriate t-distribution for assessing the significance of the difference between the group means.

Explain how the t-statistic standardizes the 'signal' (difference between means) in a t-test.

The t-statistic standardizes the 'signal' by dividing the difference between the sample means by the standard error of the mean difference. This scales the difference in means relative to the variability within the samples, mapping the signal onto the t-distribution.

Describe what the 'signal' represents in the context of calculating a t-statistic.

In the context of a t-statistic, the 'signal' represents the difference between the means of the groups being compared. It is the effect size that we are trying to measure and determine if it is statistically significant.

Explain, in one or two sentences, why more observations lead to a t-distribution that more closely resembles a Z-distribution.

More observations provide a more accurate estimate of the population standard deviation. This reduces the uncertainty, causing the t-distribution to converge towards the Z-distribution, which assumes a known population standard deviation.

What is the formula for calculating the Standard Error of the Mean (SEM), and which component of the formula accounts for sample variability?

The formula for SEM is $SE = s / \sqrt{n}$, where $s$ is the sample standard deviation and $n$ is the sample size. The sample standard deviation, $s$, accounts for sample variability.
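
A small sketch of this calculation in Python, using a made-up sample:

    import math

    scores = [12, 15, 14, 10, 13, 16, 11, 14]  # hypothetical sample
    n = len(scores)
    mean = sum(scores) / n

    # Sample standard deviation s (n - 1 in the denominator: one df is spent on the mean).
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

    sem = s / math.sqrt(n)  # SE = s / sqrt(n)
    print(f"s = {s:.3f}, SEM = {sem:.3f}")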

How does the t-distribution differ from the Z-distribution (standard normal distribution)?

The t-distribution is wider and shorter than the Z-distribution. The t-distribution also uses degrees of freedom to estimate the population standard deviation from the sample.

Explain the concept of 'degrees of freedom' in the context of statistical analysis.

Degrees of freedom refer to the number of values that are free to vary when estimating statistical parameters. For example, when calculating the mean of a sample, one degree of freedom is lost because the final data point is fixed once the mean and the other data points are known.

Why is the t-distribution particularly useful in fields like drug use epidemiology?

The t-distribution is useful because in drug use epidemiology, populations are often understudied or difficult to fully capture, leading to situations where the population standard deviation is unknown and must be estimated from a sample. The t-distribution accommodates the uncertainty introduced by estimating the standard deviation from a sample.

In a scenario with 50 participants, if you know the mean score on a test, how many degrees of freedom do you have when analyzing the data related to this mean?

There are 49 degrees of freedom. One degree of freedom is 'spent' to calculate the mean.

Explain why, as the degrees of freedom increase, the t-distribution becomes more similar to the Z-distribution.

As degrees of freedom increase, the sample provides a more accurate estimate of the population standard deviation, reducing the uncertainty. With a more reliable estimate of the population standard deviation, the t-distribution converges towards the Z-distribution, which assumes the population standard deviation is known.

Explain what the null hypothesis $H_0: \mu_{G1} - \mu_{G2} = 0$ implies about the likely values of $\mu_{G1} - \mu_{G2}$.

If the null hypothesis is true, it means that the population means of group 1 and group 2 are equal. Therefore, the most likely value for $\mu_{G1} - \mu_{G2}$ is 0, and values closer to 0 are more likely than values farther from 0. Values greater than 0 are just as likely as values less than 0.

What parameters define a normal distribution, and why is it important to know these when conducting statistical tests?

A normal distribution is defined by its mean ($\mu$) and standard deviation ($\sigma$). Knowing these parameters allows us to characterize and make inferences about the distribution of data, calculate probabilities, and perform statistical tests that assume normality.

How is the variability (i.e., standard deviation) of data related to the concept of degrees of freedom?

Degrees of freedom represent the amount of independent information available to estimate the variability (standard deviation) of data. When calculating variability using sample data, one or more degrees of freedom are 'lost' because certain parameters (like the sample mean) must first be estimated, which constrains the variability calculation.

In the context of the Chi-squared test, what does a higher Chi-squared score suggest about the observed and expected counts?

A higher Chi-squared score indicates a greater difference between the observed and expected counts.

If the total number of students (N) in a survey is 400, with 100 students from each year (1st, 2nd, 3rd, and 4th), and the total number of on-campus students is 252, what is the expected number of on-campus students for each year, assuming no association between year and housing?

The expected number of on-campus students for each year is 63.
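
The arithmetic behind this answer, shown as a short Python snippet:

    N = 400                 # total students
    row_total = 100         # students in each year
    col_total = 252         # total on-campus students

    expected = row_total * col_total / N  # (N_row * N_column) / N
    print(expected)  # 63.0, the same for every year since each row total is 100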

Explain how the Chi-squared test helps in determining the relationship between two categorical variables, such as year of study and housing preference (on-campus vs. off-campus).

The Chi-squared test compares observed frequencies with expected frequencies (assuming no association) to determine if there's a statistically significant association between the variables.

In the example, it's mentioned that 1st-year students are more likely to live on-campus compared to 4th-year students. How is this observation reflected in the Chi-squared test results, assuming the data supports this claim?

This observation would likely lead to a high Chi-squared score and a low p-value, indicating a significant association between the year of study and housing preference.

The formula for calculating the expected value for each cell is provided as $\frac{N_{row} * N_{column}}{N}$. Explain why this formula is used to find the expected value.

This formula calculates the expected value under the assumption of independence between row and column variables; it's the value we'd expect if the variables were unrelated.

If the observed number of 3rd-year students living off-campus is 49 and the expected number is 37, what does this difference contribute to the overall Chi-squared statistic, and how does it reflect on the initial hypothesis?

This difference contributes to the overall Chi-squared statistic by quantifying the deviation between the observed and expected frequencies. A substantial deviation suggests the initial hypothesis of no association may be incorrect.
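
The contribution of that single cell can be checked directly; a short Python sketch:

    observed, expected = 49, 37
    contribution = (observed - expected) ** 2 / expected  # (O - E)^2 / E
    print(round(contribution, 2))  # 12**2 / 37 ≈ 3.89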

Using the provided data, explain why the calculation for the expected number of on-campus students is the same for every row.

The calculation is the same because each row represents an equal number of students (100).

What would a Chi-squared test result with a high p-value (e.g., > 0.05) suggest about the relationship between the year of study and housing preference in this scenario?

A high p-value would suggest that there is no statistically significant relationship between the year of study and housing preference, meaning any observed differences are likely due to chance.

In the context of calculating a Chi-squared statistic, explain the meaning of the phrase 'degrees of freedom'.

Degrees of freedom refers to the number of independent pieces of information available to estimate a parameter. It reflects the number of values in the final calculation of a statistic that are free to vary.

Describe in your own words the relationship between the Z-distribution and the Chi-squared distribution with one degree of freedom.

The Chi-squared distribution with one degree of freedom is the distribution of the square of a standard normal (Z) random variable. Every value from the Z distribution is squared to get a corresponding value in the Chi-squared distribution.

In a Chi-squared test, why is it important to consider whether the variables being summed are independent?

Independence is important because the Chi-squared distribution assumes that the squared values being summed are derived from independent sources. If the variables are dependent, the resulting test statistic might not accurately follow a Chi-squared distribution, leading to incorrect conclusions. To determine the correct degrees of freedom, we have to account for any dependencies.

Explain the purpose of calculating a Chi-squared statistic. What type of question can it help you answer?

The Chi-squared statistic is used to test the independence of two categorical variables. It helps answer the question of whether the observed distribution of data differs significantly from what would be expected if the variables were unrelated.

For the housing data provided, explain the meaning of the numbers in parentheses (e.g. '(63)') and the numbers in brackets (e.g. '[18.3]').

The numbers in parentheses represent the expected values under the assumption of independence between year and housing choice, while the numbers in brackets represent the individual cell's contribution to the overall chi-squared statistic.

If you are performing a Chi-squared test and obtain a very small p-value (e.g., less than 0.05), what does this typically indicate regarding the null hypothesis?

A small p-value indicates strong evidence against the null hypothesis. This suggests that there is a statistically significant association between the variables being tested.

Describe a scenario where using a Chi-squared test would not be appropriate.

A Chi-squared test is not appropriate when dealing with continuous data. It is specifically designed for categorical data where you are comparing observed frequencies to expected frequencies. Also, if expected cell counts are too small (typically less than 5), the Chi-squared approximation may not be valid.

In the housing example provided, how would you calculate the degrees of freedom?

The degrees of freedom would be calculated as (number of rows - 1) * (number of columns - 1). In this example, we have 4 rows (years) and 2 columns (housing options), so the degrees of freedom are (4-1)*(2-1) = 3.

In a chi-squared test, what does the test statistic (e.g., 182.7 in the example) represent?

It represents the overall difference between the observed values and the expected values under the null hypothesis. A larger test statistic indicates a greater difference.

Explain why we need to calculate degrees of freedom when comparing a chi-squared test statistic to a distribution.

Degrees of freedom are needed to select the correct chi-squared distribution for comparison. Different degrees of freedom result in different chi-squared distributions, affecting the p-value and the conclusion of the hypothesis test.

In the context of degrees of freedom, what does it mean to say that you need a certain number of 'pieces of information' to fill out a table?

It means that once you know a specific number of cell values, you can deduce the remaining cell values based on the row and column totals. The degrees of freedom represent the number of independent values you need to know to complete the table.

Based on the example, describe one way to computationally determine the degrees of freedom for a chi-squared test.

The degrees of freedom can be determined by considering how many cell values in a contingency table can be freely chosen before the remaining values are fixed by the row and column totals. In the example, it is (number of rows - 1) * (number of columns - 1).

Explain the relationship between the chi-squared value and the likelihood of rejecting the null hypothesis.

A larger chi-squared value generally corresponds to a smaller p-value. If the p-value is below the significance level, we reject the null hypothesis.

Describe what the numbers in brackets '[ ]' represent under each of the observed values in the table.

The numbers in brackets represent the contribution of that particular cell to the overall chi-squared statistic. They are calculated as $\frac{(O-E)^2}{E}$, where O is the observed frequency and E is the expected frequency for that cell.

What information is needed, besides the chi-squared value, to determine if the null hypothesis should be rejected?

To determine whether to reject the null hypothesis, one needs both the chi-squared value and the degrees of freedom. These are used to find the p-value.

If the observed value is exactly equal to the expected value for a particular cell, what will be its contribution to the overall chi-squared statistic, and why?

The cell's contribution to the chi-squared statistic will be zero. This is because the formula for the contribution is $\frac{(O-E)^2}{E}$, and if O = E, then (O-E) = 0, and the entire term becomes zero.

Explain why the 'Total' row and column in the table are insufficient to calculate the degrees of freedom without knowing the internal cell values.

The 'Total' row and column only provide the marginal distributions. Degrees of freedom count how many internal cells are free to vary given those margins, so you must consider the table's internal structure of rows and columns of cells, not just the totals.

In the table, what does the null hypothesis imply about the relationship between year of study and housing choice (on-campus vs. off-campus)?

The null hypothesis implies that there is no association between the year of study and housing choice; the two variables are independent. The proportion of students living on-campus or off-campus is the same across all years.

Describe a scenario where a chi-squared test might be inappropriate, and an alternative statistical test would be more suitable.

A chi-squared test is inappropriate when expected cell counts are very small (e.g., less than 5). In such cases, Fisher's exact test is more suitable, especially for 2x2 contingency tables.

How would increasing the sample size (e.g., changing the total from 400 to 800) potentially affect the chi-squared statistic, assuming the proportions in each cell remain approximately the same?

Increasing the sample size would likely increase the chi-squared statistic. Even if the proportions remain the same, larger sample sizes amplify the deviations from the expected values, leading to a larger test statistic.

Explain the difference between 'observed' and 'expected' values in the context of the chi-squared test.

Observed values are the actual counts obtained from the sample data for each category. Expected values are the counts that would be anticipated in each category if there were no association between the variables being studied (i.e., under the assumption of the null hypothesis).

If the degrees of freedom for a chi-squared test is 3, and the critical value at a significance level of 0.05 is 7.815, interpret what it means if the calculated chi-squared statistic is 6.5.

If the calculated chi-squared statistic is 6.5, which is less than the critical value of 7.815, we fail to reject the null hypothesis at the 0.05 significance level. This suggests that there isn't sufficient evidence to conclude that there is an association between the variables.
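
A sketch of this comparison in Python with SciPy; the critical value and p-value are computed rather than assumed:

    from scipy import stats

    df, chi2_stat, alpha = 3, 6.5, 0.05

    critical = stats.chi2.ppf(1 - alpha, df)   # ≈ 7.815 for df = 3
    p_value = stats.chi2.sf(chi2_stat, df)     # area to the right of 6.5

    print(f"critical = {critical:.3f}, p = {p_value:.4f}")
    print("reject H0" if chi2_stat > critical else "fail to reject H0")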

Describe how changing the significance level (alpha) from 0.05 to 0.01 would affect the likelihood of rejecting the null hypothesis in a chi-squared test.

Changing the significance level from 0.05 to 0.01 would make it less likely to reject the null hypothesis. A lower significance level requires stronger evidence (a larger chi-squared statistic and a smaller p-value) to reject the null hypothesis.

Flashcards

Bivariate Comparison

Comparing two variables to see if there's a relationship between group membership and another variable.

T-test & Chi-squared

Two important tests used to compare groups and determine significance.

T-Test Use

Tests if group membership is associated with different values of a normally distributed variable.

One Sample T-Test

Tests the mean of one sample against a known or hypothesized mean.

Independent Samples T-Test

Tests the difference between the means of two independent groups.

Paired T-Test

Tests the difference between two related variables from the same subject.

T-Test Null Hypothesis (H0)

The population means of variable X for Group 1 and Group 2 are equal.

T-Test Alternative Hypothesis (HA)

The population means of variable X for Group 1 and Group 2 are not equal.

Null Hypothesis (H0)

States there is no difference between the means of two groups.

Alternative Hypothesis (HA)

States there is a difference between the means of two groups.

t-Distribution

A distribution similar to the standard normal distribution but wider and shorter.

Degrees of Freedom (df)

The number of values that can vary freely, given a known outcome.

Degrees of Freedom and t-Distribution

The t-distribution is defined by this value. As this gets larger, the t-distribution approaches the Z-distribution.

Degrees of Freedom & The Mean

We need to 'spend' one of these to identify this value in a normal distribution.

Normal Distribution Parameters

A normal distribution is defined by these two population-level parameters.

Degrees of Freedom - SD

The amount of data we have to calculate the variability of our data.

N_row

The total count of observations in a given row of a contingency table.

N_column

The total count of observations in a given column of a contingency table.

N

The total number of observations.

Expected Value

The value expected in a cell if there is no association between the variables.

Consistent Calculation

Calculation is the same for each category because the sample size is consistent.

Chi-squared Score

A measure of the difference between observed and expected counts.

Chi-squared Test Goal

To determine if observed counts are similar or different from expected counts.

P-value

The probability of observing results as extreme as, or more extreme than, the results actually observed, assuming that the null hypothesis is correct.

t-test and t-distribution Mapping

The t-test maps the signal onto the t(n-1)-distribution to account for uncertainty from small sample sizes.

Standard Error of the Mean

An estimate of the standard deviation of the sample mean. It quantifies the precision of the sample mean as an estimate of the population mean.

Standard Error Formula

The formula is: SE = s / sqrt(n), where 's' is the sample standard deviation and 'n' is the sample size.

Calculating the t-statistic

The test statistic is calculated by dividing the difference between the sample means by the Standard Error of the Mean: t = (mean1 - mean2) / SE

Degrees of freedom (two groups)

The degrees of freedom are calculated as n1 + n2 - 2.

Mapping t-statistic to t-distribution

Mapping the calculated t-value onto a t-distribution with relevant degrees of freedom allows for determining the p-value.

Chi-squared Calculation

Calculated cell-by-cell: (Observed - Expected)^2 / Expected, then sum for all cells.

Chi-squared Distribution

A distribution that arises in statistical tests, often from sums of squared differences.

Chi-squared vs. Z-distribution

The chi-squared distribution with one degree of freedom is the square of the Z-distribution

Chi-squared Area (df=1)

With 1 degree of freedom, 68% of the chi-squared distribution falls between 0 and 1.

Chi-squared (k degrees of freedom)

Sum of squares of k independent random variables, each following a standard normal (Z) distribution.

Test Statistic Form

The test statistic used in the described test is a sum of squares.

Degrees of Freedom

The number of independent pieces of information used to calculate a statistic.

Chi-squared Value

The Chi-squared test statistic is 182.7 in this example, representing the difference between expected and observed values.

DF Definition (More Formally)

Degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

DF Visual Explanation

Imagine filling a table; degrees of freedom is how many cells you must fill before the rest auto-fill.

Contingency Table Setup

Start with a blank contingency table that includes row and column totals.

Deducing Values in a Table

With some information filled, additional values can be determined without inputting them directly.

Enough Information Acquired

After filling 3 of the cell values, you can deduce all the remaining values from the row and column totals.

Minimum Data for Table Completion

With the total row and column values, the table can be completed when you know 3 of the on-campus year results.

Degrees of Freedom Formula

degrees of freedom (df) = (number of rows - 1) * (number of columns - 1)

DF Calculation Example

In this example, df = (4 rows - 1) * (2 columns - 1) = 3 * 1 = 3.

Comparing to Chi-Squared Distribution

The calculated chi-squared value is then compared to a chi-squared distribution with df = 3.

P-value Meaning

This indicates the probability (p-value) of observing such extreme results if the null hypothesis were true.

Rejecting the Null Hypothesis

If the p-value is small (typically less than 0.05), we reject the null hypothesis.

Meaning of Rejection

Rejecting the null hypothesis means there's a statistically significant association between the variables.

Purpose of Chi-Squared Test

The chi-squared test determines whether the observed data significantly differ from what would be expected under the null hypothesis.

Study Notes

  • Inferential tests compare two variables to see if group membership is associated with another variable.
  • T-tests and Chi-squared tests are important inferential tests.
  • These tests also determine the significance of regression coefficients.

Student's T-Test

  • Used for a normally distributed variable X
  • Determines if specific group membership is associated with different values of X
  • There are three common variations of the t-test: One sample t-test, Independent two sample t-test, Paired t-test.
  • The focus is on the independent two sample t-test.

Independent Samples T-Test

  • Uses a normally distributed random variable X and two groups, G1 and G2
  • Determines if the population level mean of X for G1 and G2 are the same or different
  • Null hypothesis (H0): μG1=μG2
  • Alternate hypothesis (HA): μG1 ≠ μG2
  • Null says the population means are equal
  • Alternate says the population means are not equal

Logic of the T-Test

  • To run a study:
  • Collect a sample of individuals
  • Identify their group (G1/G2)
  • Measure X for each person
  • Calculate the sample mean values for each group

Distribution Parameters

  • A normal distribution has two population-level parameters: the mean μ and the standard deviation σ
  • Even if X is normally distributed, σ is often unknown; this is common in understudied populations.
  • A new distribution was developed because of this.

T-Distribution

  • A variation of the standard normal distribution (Z-distribution)
  • Has a mean value of 0
  • Is symmetrical around the mean
  • Is “wider” and a bit “shorter” than the Z-distribution
  • Derives the standard deviation from the sample
  • Samples typically are made up of a small number of people

T-Distribution and Degrees of Freedom

  • Defined in terms of "degrees of freedom"
  • The more degrees of freedom to define a t-distribution, the more similar it becomes to the Z-distribution
  • Degrees of freedom represent the amount of data to calculate the variability (i.e., standard deviation) of data.
  • Degrees of freedom (df) are the number of parameters able to "vary freely" given some assumed outcome.
  • If there are 100 participants and the mean age is 60, there are infinitely many possible age distributions for the group, BUT once 99 ages are known, the final age is fixed.
  • To calculate the mean value, one observation cannot “vary freely”. Example: with n = 100 observations, it’s necessary to spend 1 df to calculate the mean.
  • A normal distribution is defined by a mean value and standard deviation.
  • If there are n observations, one degree of freedom must be spent to identify the mean value.
  • There are n – 1 degrees of freedom remaining to calculate the standard deviation.
  • The t-distribution is defined by n – 1 degrees of freedom because the standard deviation is derived from the sample.
  • More observations means more degrees of freedom to inform the t-distribution.
  • The t-distribution captures uncertainty in the measurement of the standard deviation from a small sample.
  • Fewer df means less certainty that the measured standard deviation s represents the population-level standard deviation σ.
  • The t-distribution is "shorter" and "wider" than the Z-distribution to capture this.
  • Values further from 0 become more probable when there is less certainty about the standard deviation.
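
A quick numeric illustration of this convergence using SciPy; the cutoff of 2 is arbitrary:

    from scipy import stats

    # P(T > 2) shrinks toward the Z-distribution's tail area as df grows.
    for df in (2, 5, 30, 198):
        print(f"df = {df:3d}: P(T > 2) = {stats.t.sf(2, df):.4f}")
    print(f"Z:        P(Z > 2) = {stats.norm.sf(2):.4f}")  # ≈ 0.0228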

Mapping the Test

  • The t-test is almost identical to the z-test
  • Map the signal onto the t(n-1)-distribution
  • The signal is standardized by dividing it by the noise, the standard error of the mean.
  • The standard error of the mean is the “conservative" estimate of the standard deviation
  • Standard error is used because the population level standard deviation is unknown.

Calculating the T-Statistic

  • Map the test statistic onto the appropriate t-distribution.
  • Example:
  • G1 has 100 people and G2 does too
  • The average for G1 is 21, with a pooled standard deviation of 3
  • This value is mapped onto a t-distribution with 100 + 100 − 2 = 198 degrees of freedom
  • If the calculated p < .05, there’s significant evidence against the null hypothesis:
  • the signal (or a more extreme signal) would be observed less than 5% of the time if the null were true.
  • If p < .05, we reject the null hypothesis.
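
A sketch of this example in Python; since the notes do not give the G2 mean, a value of 20 is assumed purely for illustration.

    import math
    from scipy import stats

    n1, n2 = 100, 100
    mean1, mean2 = 21, 20      # mean2 = 20 is an assumed value, not from the notes
    pooled_sd = 3

    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)  # SE of the mean difference
    t_stat = (mean1 - mean2) / se
    df = n1 + n2 - 2                             # = 198
    p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-tailed p-value

    print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")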

Assumptions of T-Test

  • Variable of interest X must be measured on an ordinal or continuous scale.
  • Data must be drawn from a random sample. The two groups being compared must be independent.
  • X must be normally distributed. The t-test becomes more robust to violations of this assumption as the sample gets larger.
  • The standard deviation or variance of X in both groups should be roughly equal.

Testing the Assumption of Normality

  • Can be done with the Shapiro-Wilk normality test
  • Check each group (see the code sketch after the Levene test below)

Testing the Assumption of Homogeneity

  • Can be done with the Levene Test
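
Both assumption checks are available in SciPy; a minimal sketch with made-up group data:

    from scipy import stats

    g1 = [21, 23, 19, 24, 22, 20, 25, 18]  # hypothetical data for group 1
    g2 = [20, 18, 22, 19, 21, 17, 23, 20]  # hypothetical data for group 2

    # Normality: Shapiro-Wilk, run within each group.
    for name, g in (("G1", g1), ("G2", g2)):
        w, p = stats.shapiro(g)
        print(f"{name}: W = {w:.3f}, p = {p:.3f}")  # small p suggests non-normality

    # Homogeneity of variance: Levene test across the groups.
    stat, p = stats.levene(g1, g2)
    print(f"Levene: W = {stat:.3f}, p = {p:.3f}")   # small p suggests unequal variances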

One Sample T-Test Variation

  • There’s a t-test comparing the mean of X for one group to some pre-defined level, y
  • The null hypothesis is that the population mean of X equals y
  • The sample mean x̄ and the sample standard deviation s are computed to calculate the t-score: t = (x̄ − y) / (s / √n)
  • This is compared to the t-distribution with n − 1 degrees of freedom
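
A sketch of the one-sample variation with SciPy; the sample values and the level y are made up.

    from scipy import stats

    sample = [12, 15, 14, 10, 13, 16, 11, 14]  # hypothetical measurements
    y = 12                                      # hypothetical pre-defined level

    t_stat, p_value = stats.ttest_1samp(sample, y)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # compared against t with n - 1 = 7 df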

Paired Samples T-Test Variation

  • Used to compare the mean of X for one group at time 1 versus at time 2
  • The null hypothesis is that the mean difference is zero,
  • where the difference is the change in measurement from time 1 to time 2
  • Compute the sample mean d̄ and sample standard deviation s of the differences and calculate the t-score: t = d̄ / (s / √n)
  • Then, compare to t-distribution with n-1 degrees of freedom
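
A sketch of the paired variation with SciPy, using made-up before/after scores for the same subjects:

    from scipy import stats

    time1 = [12, 15, 14, 10, 13, 16, 11, 14]  # hypothetical scores at time 1
    time2 = [14, 16, 15, 12, 13, 18, 12, 15]  # same subjects at time 2

    # Equivalent to a one-sample t-test of the differences against 0.
    t_stat, p_value = stats.ttest_rel(time1, time2)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # t-distribution with n - 1 = 7 df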

Chi-Squared test

  • Assesses if two categorical variables X and Y are independent
  • Null hypothesis (H0): X and Y are independent
  • Alternate hypothesis (HA): X and Y are not independent
  • Observed patterns in the distribution of X and Y are compared to what we would expect to observe if the null were true.
  • Expected values are calculated for each cell as if housing and year were independent.

Contingency Table Example: Frequency of Year of College and Housing Status

  • The goal of the chi-squared test is to identify whether the observed counts are similar to or different from the expected counts.
  • The greater the difference between the observed and expected counts, the higher the chi-squared score, and the lower the corresponding p-value.

Calculating Chi-Squared

  • To calculate the Chi-squared score, follow these two steps.
  • Calculate (O − E)² / E for each cell.
  • After calculating each squared term, add them across all cells to get the Chi-squared score: χ² = Σ (O − E)² / E
  • Degrees of freedom (df) can be defined
    • Starting with a blank table that has just the totals for each row and each column:
      • The df is the number of pieces of information needed to fill it out
  • Degrees of Freedom = 3
  • Then, the area under the curve of the chi-squared distribution with 3 degrees of freedom, beyond the calculated value, is found.
  • For this contingency table, p < 0.00001
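
A sketch of the whole procedure with SciPy. The notes give only some of the observed counts (row totals of 100, 252 on-campus overall, 49 third-year students off-campus), so the remaining cells below are assumed for illustration; with these made-up counts the statistic will not reproduce the 182.7 from the notes.

    import numpy as np
    from scipy import stats

    # Rows = years 1-4; columns = [on-campus, off-campus].
    observed = np.array([
        [95,  5],   # assumed
        [70, 30],   # assumed
        [51, 49],   # third-year off-campus = 49, as in the notes
        [36, 64],   # assumed (column totals then sum to 252 and 148)
    ])

    chi2, p, df, expected = stats.chi2_contingency(observed)
    print(f"chi2 = {chi2:.1f}, df = {df}, p = {p:.2g}")
    print(expected)  # 63 on-campus and 37 off-campus expected in every row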

Chi-Squared Distribution

  • The normal distribution arises from certain natural phenomena
  • The chi-squared distribution with one degree of freedom is the square of the Z-distribution.
  • Any value x from the Z-distribution gets mapped onto x² on the chi-squared distribution
  • If Y is the sum of squares of k independent standard normal variables, Y is distributed according to the chi-squared distribution with k degrees of freedom.

Assumptions of Chi-Squared Tests

  • X and Y are both Categorical
  • The levels of X and Y are mutually exclusive. In other words, each participant must belong to one and only one level of each
  • Each observation is independent – in other words, our data is drawn from a random sample
  • The expected value for each cell should be 5 or greater for at least 80% of cells, and must be at least 1 for every cell

One-Way ANOVA

  • Compares group means of three or more groups
  • Determines if they are all the same or differ in some way

One-Way Anova Hypotheses

  • Measures a normally distributed random variable X across k groups
  • Tests whether the mean value of X across the groups is the same or different.
  • Null hypothesis (H0): the group means are all the same
  • Alternate hypothesis (HA): the means do not all equal each other. This could mean all of them differ, or even just one.

One-Way ANOVA Assumptions

  • Each observation must be independent
  • X must be a normally distributed variable within each group
  • The distribution of X for each group must have the same variance
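
A minimal sketch of a one-way ANOVA in SciPy, with made-up data for three groups:

    from scipy import stats

    g1 = [21, 23, 19, 24, 22]  # hypothetical group scores
    g2 = [20, 18, 22, 19, 21]
    g3 = [25, 27, 24, 26, 23]

    f_stat, p_value = stats.f_oneway(g1, g2, g3)
    print(f"F = {f_stat:.3f}, p = {p_value:.4f}")  # small p: not all means are equal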

Description

Understand the differences between one-sample and independent samples t-tests. Learn about assumptions like data distribution in independent samples t-tests. Explore null and alternative hypotheses with practical examples.
