Statistics: Chi-Square & Sampling Distributions
45 Questions

Questions and Answers

In a multinomial distribution, what does $N_i$ represent?

  • The total number of trials.
  • The probability of outcome _i_.
  • The number of times outcome _i_ occurs. (correct)
  • The expected value of outcome _i_.

The degrees of freedom for a Chi-Square goodness-of-fit test, with k categories, is k.

False (B)

In the Chi-Square test for goodness of fit, what constitutes the null hypothesis ($H_0$) regarding the probabilities $\pi_i$?

$H_0: \pi_i = \pi_{i0}$ for every category $i = 1, \dots, k$

In a Chi-Square goodness-of-fit test, a large value of the test statistic $\chi^2$ suggests that you should ______ the null hypothesis.

reject

Match the following terms with their corresponding definitions in the context of the Chi-Square test for goodness of fit:

  • Observed_i = The actual count of individuals in category _i_.
  • Expected_i = The count of individuals predicted to be in category _i_ under the null hypothesis.
  • $\pi_{i0}$ = The hypothesized probability of an individual belonging to category _i_.
  • $\chi^2$ = The test statistic measuring the discrepancy between observed and expected counts.

Which of the following is NOT a topic covered under the binomial model?

Chi-squared test for independence (D)

A sample statistic's value remains constant from one sample to another.

False (B)

What is the term for the probability distribution of a sample statistic?

Sampling Distribution

If $Y_1, Y_2, \dots, Y_n$ are i.i.d. from a population with mean $\mu$, then the sample mean $\bar{Y} = (\sum_{i=1}^{n} Y_i)/n$ is a point ________ of $\mu$.

estimator

In Sampling Distribution Case A, what is assumed about the population distribution and variance?

Population distribution is normal; population variance is known. (A)

If $E(\bar{Y}) = \mu$, then $\bar{Y}$ is a biased estimator of $\mu$.

False (B)

In Sampling Distribution Case A, what is the mean of $\bar{Y}$, denoted $E(\bar{Y})$?

$\mu$

In Sampling Distribution Case A, what is the variance of $\bar{Y}$?

$\sigma^2 / n$ (D)

In the context of small sample tests comparing proportions, what distribution does N11 (number of successes in sample 1) follow under the null hypothesis, when conditioned on row and column totals?

Hypergeometric distribution (D)

In the example given about surgical mortality rates, N11 represents the total number of deaths across both emergency and other cases.

False (B)

In a hypergeometric distribution context, if $n_{1\cdot}$ represents the number of orange balls (sample 1) and $n_{2\cdot}$ represents the number of green balls (sample 2), what does $n_{\cdot 1}$ signify?

The total number of balls selected

The dhyper function in R calculates the ________ for a hypergeometric distribution.

pmf (probability mass function)

Why is N11, representing the number of successes in one sample, modeled using a hypergeometric distribution rather than a binomial distribution in this specific context?

Because we are sampling without replacement. (B)

Match the notation with the descriptions in the context of hypergeometric distribution:

  • $n_{1\cdot}$ = Number of orange balls (sample 1)
  • $n_{2\cdot}$ = Number of green balls (sample 2)
  • $n_{\cdot 1}$ = Total number of balls selected
  • $N_{11}$ = Number of orange balls among the selected

In the surgical mortality rate example, how is the P-value calculated?

Pr(observing 1 or more deaths among emergency surgeries, conditional on 8 total deaths) (C)
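
A minimal Python sketch of two quantities. The first is the hypergeometric pmf, which is what R's dhyper computes. The second is a one-sided P-value of the form above, Pr(N11 ≥ n11 | row and column totals). The function names are hypothetical. The surgical table's full counts are not given here, so the margins in the usage line are made up for illustration.

```python
from math import comb


def hyper_pmf(k, n1_dot, n2_dot, n_dot1):
    """P(N11 = k) given row totals n1., n2. and column total n.1.

    Same quantity as R's dhyper(k, n1_dot, n2_dot, n_dot1).
    """
    # Outside the support the probability is 0 (this also avoids comb() on negatives).
    if k < 0 or k > n1_dot or n_dot1 - k < 0 or n_dot1 - k > n2_dot:
        return 0.0
    total = n1_dot + n2_dot
    return comb(n1_dot, k) * comb(n2_dot, n_dot1 - k) / comb(total, n_dot1)


def upper_tail_p(n11, n1_dot, n2_dot, n_dot1):
    """One-sided P-value Pr(N11 >= n11), conditional on all margins."""
    upper = min(n1_dot, n_dot1)
    return sum(hyper_pmf(k, n1_dot, n2_dot, n_dot1) for k in range(n11, upper + 1))


# Hypothetical margins: 10 units in sample 1, 20 in sample 2, 8 successes in total.
print(upper_tail_p(1, 10, 20, 8))
```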

In calculating the P-value, conditioning on the row totals is irrelevant when using a hypergeometric distribution.

False (B)

In the nut allergy study, what are the appropriate null and alternative hypotheses to test if there is a difference in the proportion of nut allergies between children whose mothers consumed at least 5 servings of nuts per week during pregnancy and those who consumed less than 5 servings?

$H_0: \pi_1 = \pi_2$, $H_a: \pi_1 \neq \pi_2$ (D)

In hypothesis testing for the difference between two proportions, a one-tailed test is always more appropriate than a two-tailed test.

False (B)

In the nut allergy study, what are the sample sizes ($n_1$ and $n_2$) for each group?

$n_1 = 1366$, $n_2 = 6842$

The estimator for the difference between two population proportions ($\pi_1 - \pi_2$) is calculated as ______.

$p_1 - p_2$

Which of the following is used to estimate the standard error when conducting a large sample Z-test for two proportions?

$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$ (C)

What condition must be met to ensure that the approximation using the normal distribution for the difference of sample proportions is valid?

All cell counts ($n_{11}, n_{12}, n_{21}, n_{22}$) must be at least 10. (C)

Match the following terms with their definitions related to two-proportion problems:

  • $\pi_1$ = Population proportion of success in group 1
  • $\pi_2$ = Population proportion of success in group 2
  • $p_1$ = Sample proportion of success in group 1
  • $p_2$ = Sample proportion of success in group 2

What does the Central Limit Theorem (CLT) allow us to assume about the distribution of the difference between two sample proportions ($p_1 - p_2$) when the sample sizes are large?

Approximately normal

What does the P-value represent?

The probability of observing the sample data or more extreme data towards H1, assuming H0 is true. (C)

A small P-value indicates strong evidence in favor of the null hypothesis.

False (B)

In the context of hypothesis testing, what is the decision rule based on the P-value and significance level (alpha)?

Reject H0 if p-value < α

The P-value is the probability of observing the sample data or more extreme data towards H1, assuming ______ is true.

H0

What is a drawback of making decisions based solely on rejection regions?

It depends on the direction of H1 and α. (A)

Which of the following is NOT a correct interpretation of the P-value?

The probability that the null hypothesis happened by chance. (D)

According to the provided information, a P-value provides a 'degree of significance'.

False (B)

In a one-sample Z test for the mean (µ), given a scenario where a high blood pressure is defined as a systolic blood pressure level higher than 120 mmHg, what would be the null hypothesis (H0) in terms of µ?

µ ≤ 120 mmHg or µ = 120 mmHg

In hypothesis testing, what does the p-value represent?

The probability of observing a test statistic as extreme as, or more extreme than, the one computed if the null hypothesis is true. (B)

A one-sample z-test is appropriate when the population standard deviation is unknown and the sample size is small.

False (B)

For Jane's blood pressure measurements, the test statistic (z) was calculated to be 1. If the critical value for a one-sided test at α = 0.05 is 1.645, do you reject the null hypothesis that her blood pressure is not at risk?

No
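
A small Python sketch of the right-tailed one-sample z test, as a check on the arithmetic. It shows how z = (ȳ − µ0)/(σ/√n) is formed. It then compares z with the critical value and also computes the P-value 1 − Φ(z). Jane's raw measurements are not given, so only z = 1 from the question is used. The function names are illustrative.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def z_statistic(ybar, mu0, sigma, n):
    """One-sample z statistic; assumes the population SD sigma is known."""
    return (ybar - mu0) / (sigma / sqrt(n))


def right_tailed_z_test(z, alpha=0.05, z_crit=1.645):
    """H0: mu <= mu0 vs H1: mu > mu0.

    Returns (reject by rejection region, P-value, reject by P-value).
    """
    p_value = 1.0 - std_normal_cdf(z)
    return z > z_crit, p_value, p_value < alpha


reject_rr, p, reject_p = right_tailed_z_test(1.0)
print(reject_rr, round(p, 4), reject_p)  # False 0.1587 False
```

Both rules agree: z = 1 < 1.645 and P ≈ 0.159 > 0.05, so H0 is not rejected.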

The process of verifying that your data meets certain conditions before applying a statistical test is known as the ______ phase.

diagnosis

If α = 0.05, what is the probability of making a Type I error?

0.05 (D)

What is the purpose of inferential statistical methods?

To make generalizations and inferences about the population. (A)

In the given blood pressure example, what is the null hypothesis?

Jane's blood pressure level is not at risk of high blood pressure. (D)

Match the following terms with their corresponding definitions:

  • P-value = The probability of obtaining results as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct.
  • Alpha (α) = The probability of rejecting the null hypothesis when it is true (Type I error).
  • Test Statistic = A value calculated from sample data that is used to determine whether to reject the null hypothesis.
  • Null Hypothesis = A statement about a population parameter that is assumed to be true until there is convincing evidence to the contrary.

Flashcards

Multinomial Distribution

Describes the probability distribution of the counts in several categories. Nᵢ is the number of trials whose outcome falls in category i.

E(Nᵢ) in Multinomial

The expected value (average) for the number of outcomes in category i in a multinomial distribution. Calculated as the product of n (total trials) and πᵢ (probability of category i).

Chi-Square Goodness of Fit Test

A statistical test to assess if observed data fits a hypothesized distribution. It compares observed counts to expected counts.

Hypotheses for Goodness of Fit

H₀: each category's probability equals its hypothesized value (πᵢ = πᵢ₀ for all i). H₁: at least one category's probability differs from its hypothesized value.

Chi-Square Test Statistic

A measure of the difference between observed and expected values in a goodness-of-fit test. A large value suggests a poor fit.

Small Sample Test for Proportions

Tests if proportions are different between two groups when sample sizes are small.

N11 in Proportion Tests

The number of successes in sample 1, treated as a random variable conditional on the row and column totals.

Distribution of N11

Under the null hypothesis, N11 follows a hypergeometric distribution.

P-value in Small Sample Test

Pr(Observe result or more extreme | null hypothesis is true).

Hypergeometric Distribution

A distribution for sampling without replacement from a finite population.

What is n1·?

n1· is the size of sample 1 (its row total).

What is n2·?

n2· is the size of sample 2 (its row total).

What is n·1?

n·1 represents the total successes of both samples (column total).

π1 (Nut Allergy Study)

Proportion with allergy among children whose mothers consumed ≥ 5 servings of nuts/week.

π2 (Nut Allergy Study)

Proportion with allergy among children whose mothers consumed < 5 servings of nuts/week.

Estimator for (π1 − π2)

p1 − p2, the difference between the two sample proportions, estimates the difference between the population proportions (π1 − π2).

Estimator p1

p1 = n11 / n1·, where n11 is the number of successes in sample 1 and n1· is the size of sample 1.

Estimator p2

p2 = n21 / n2·, where n21 is the number of successes in sample 2 and n2· is the size of sample 2.

Expected Value of (p1 − p2)

E(p1 − p2) = (π1 − π2)

Variance of (p1 − p2)

Var(p1 − p2) = [π1(1 − π1)/n1·] + [π2(1 − π2)/n2·]

Large Sample Condition

All cell counts (n11, n12, n21, n22) are ≥ 10.
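
A minimal Python sketch of the large-sample Z test for H0: π1 = π2 against H1: π1 ≠ π2. It plugs p1 and p2 into the variance formula above, which gives the unpooled standard error from the quiz answer. It also applies the ≥ 10 cell-count check. The nut allergy sample sizes are used, but the allergy counts below are hypothetical. Many textbooks instead use a pooled proportion in the standard error under H0; that variant is not shown.

```python
from math import erf, sqrt


def two_prop_z_test(n11, n1_dot, n21, n2_dot):
    """Large-sample two-sided Z test of H0: pi1 = pi2.

    n11, n21: successes in samples 1 and 2; n1_dot, n2_dot: sample sizes.
    Returns (p1 - p2, z, two-sided P-value).
    """
    n12, n22 = n1_dot - n11, n2_dot - n21
    if min(n11, n12, n21, n22) < 10:
        raise ValueError("large-sample condition fails: a cell count is below 10")
    p1, p2 = n11 / n1_dot, n21 / n2_dot
    # Unpooled SE: sqrt(p1(1 - p1)/n1. + p2(1 - p2)/n2.)
    se = sqrt(p1 * (1 - p1) / n1_dot + p2 * (1 - p2) / n2_dot)
    z = (p1 - p2) / se
    phi = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))  # Phi(|z|)
    return p1 - p2, z, 2.0 * (1.0 - phi)


# n1 = 1366 and n2 = 6842 come from the study; 20 and 150 are made-up counts.
diff, z, p = two_prop_z_test(20, 1366, 150, 6842)
print(round(diff, 4), round(z, 2), round(p, 4))
```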

Sampling Distribution

A sample statistic's probability distribution.

Sample Statistic as a Random Variable

A variable whose value varies from sample to sample.

Goal of Sampling Distribution

Infer population parameters using sample data.

Sample Mean Formula

Sample mean: Ȳ = (Σ Yᵢ) / n

E(Ȳ) = µ

The sample mean Ȳ is an unbiased estimator of µ.

Sampling Distribution Case A Conditions

The population is normally distributed and the population variance is known.

E(Ȳ) in Case A

The mean of Ȳ (the sample mean) is the population mean µ.

Var(Ȳ) in Case A

Var(Ȳ) = σ²/n (population variance divided by sample size).
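
The Case A result can be checked by simulation. Below is a rough Python sketch with made-up values µ = 50, σ = 2, n = 10. It draws many normal samples and keeps each sample mean. It then compares the mean and variance of those sample means with µ and σ²/n = 0.4.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=1):
    """Draw `reps` samples of size n from N(mu, sigma^2); return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]


means = simulate_sample_means(mu=50.0, sigma=2.0, n=10, reps=20000)
print(statistics.fmean(means))      # close to mu = 50
print(statistics.pvariance(means))  # close to sigma**2 / n = 0.4
```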

P-value

The observed significance level in hypothesis testing.

P-value Interpretation

The likelihood of obtaining the observed data (or more extreme) if the null hypothesis is true.

P-value meaning

Observed significance level.

P-value Misconception

It is NOT the probability that the null hypothesis is true.

P-value fallacies

It is NOT the probability your decision is wrong.

P-value definition

It is the probability of observing your data (or more extreme) if the null hypothesis is true.

Decision rule using P-value

Compare the p-value to α (significance level).

Decision with p-value < α

Reject H0.

One Sample Z Test

A statistical test used to determine whether there is enough evidence to reject a null hypothesis about a population mean when the population standard deviation is known.

Alpha (α)

The pre-set threshold for determining statistical significance. If the p-value is less than alpha, we reject the null hypothesis.

Rejection Region (RR)

A pre-determined range of values for the test statistic where, if the calculated statistic falls within this region, we reject the null hypothesis.

Assumptions (Statistical Models)

Data requirements and conditions that should be met for the outcome of a statistical test to be valid.

Probability Model

A mathematical representation of the probability distribution of the data.

Statistical Analysis

Application of statistical methods to examine and draw conclusions from data.

Inferential Statistical Methods

Inferential statistical methods utilize tools such as hypothesis testing to make conclusions/inferences about the population/process.

Study Notes

STA 6176 Biostatistics Midterm Review

Review of Topics

  • Counting Data includes:
    • Binomial model, Binomial test (small sample) and Z test (large sample) for Binomial proportion (Topic 4).
    • Z test (large sample) and Fisher's exact test (small sample) for two proportions and Hypergeometric model (Topic 5).
    • Poisson model, CI of Binomial proportion by Poisson approximation (large n, small π) (Topic 6).
    • Multinomial model, $\chi^2$ test for goodness-of-fit (Topic 7).
  • Categorical Data includes:
    • $\chi^2$ test for association in two-way contingency table (Topic 8).
    • $\chi^2$ test for trend in 2 × k table (Topic 9).

Review by Types of Data

  • For one categorical variable:
    • When there are two outcomes, use the Binomial model.
    • When there are more than two outcomes, use the Multinomial model and goodness of fit.
  • For two categorical variables:
    • Use the Hypergeometric model and Fisher's test in a 2 × 2 table.
    • Analyse association in r × c table.
    • Analyse trend in 2 × k table.

One Variable, Two Outcomes

  • Using a Binomial model Y ~ Bin(n, π), where π = P(success).
  • To test π, use the null hypothesis $H_0: \pi = \pi_0$ and the alternative hypothesis $H_1: \pi < \pi_0$, $\pi > \pi_0$, or $\pi \neq \pi_0$.
  • For a small sample (n < 50), use the exact binomial test for π.
  • For a moderately large sample (n ≥ 50 and $n\pi_0(1 - \pi_0) \geq 10$), use the Z test for π with continuity correction.
  • For a large sample (n ≥ 50 and $n\pi_0(1 - \pi_0) \geq 100$), use the Z test for π.
  • If n is large and π is small (n ≥ 20, π ≤ 0.1, and observed y ≥ 5), use the Poisson-approximated CI for π.
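
A minimal Python sketch of two of the tests listed above, for a right-tailed alternative H1: π > π0. The first is the exact binomial P-value Pr(Y ≥ y). The second is the Z test with continuity correction, z = (y − nπ0 − 0.5)/√(nπ0(1 − π0)). The function names and example numbers are illustrative only.

```python
from math import comb, erf, sqrt


def binom_exact_upper_p(y, n, pi0):
    """Exact P-value Pr(Y >= y) for Y ~ Bin(n, pi0) (H1: pi > pi0)."""
    return sum(comb(n, k) * pi0**k * (1 - pi0) ** (n - k) for k in range(y, n + 1))


def z_cc_upper_p(y, n, pi0):
    """Normal-approximation P-value for H1: pi > pi0 with continuity correction."""
    z = (y - n * pi0 - 0.5) / sqrt(n * pi0 * (1 - pi0))
    return 1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Made-up data: 220 successes in 400 trials, H0: pi = 0.5 (n*pi0*(1 - pi0) = 100).
print(binom_exact_upper_p(220, 400, 0.5), z_cc_upper_p(220, 400, 0.5))
```

With nπ0(1 − π0) = 100, the two P-values should agree closely.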

One Variable, More Than Two Outcomes

  • Use the Multinomial model:
    • $(N_1, N_2, \dots, N_k) \sim \text{Multinomial}(n, \pi_1, \pi_2, \dots, \pi_k)$.
  • To test goodness-of-fit:
    • The null hypothesis is $H_0: \pi_1 = \pi_{10}, \dots, \pi_k = \pi_{k0}$, and the alternative hypothesis is $H_1$: at least one $\pi_i \neq \pi_{i0}$.
    • Calculate the expected values $E(N_i) = n\pi_{i0}$.
    • The degrees of freedom are DF = k − 1.
    • Requires a large sample.
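
A minimal Python sketch of the goodness-of-fit computation. It computes the expected counts nπi0, the χ² statistic, and DF = k − 1. It reuses the 1/4, 1/2, 1/4 genotype probabilities and n = 639 from the multinomial example in these notes; the observed counts are hypothetical. The P-value is the upper tail of χ²(k − 1). It can be read from a table or computed with, for example, R's pchisq or scipy.stats.chi2.sf.

```python
def gof_chi_square(observed, probs0):
    """Chi-square goodness-of-fit for H0: pi_i = pi_i0, i = 1..k.

    Returns (expected counts, chi-square statistic, degrees of freedom).
    """
    if abs(sum(probs0) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs0]  # E(N_i) = n * pi_i0
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat, len(observed) - 1  # DF = k - 1


# Hypothetical observed genotype counts (AA, Aa, aa) summing to 639.
expected, stat, df = gof_chi_square([150, 330, 159], [0.25, 0.5, 0.25])
print(expected)            # [159.75, 319.5, 159.75]
print(round(stat, 3), df)  # 0.944 2
```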

Topic 8 Chi-Square Test for Association

Categorical Data Analysis
  • Focuses on Chapter 7 (Categorical Data).
  • Studies the relationship between two categorical variables; each variable can have more than two levels (categories).
  • Example: smoking status vs. cancer status.
  • We count the number of occurrences under each pair of conditions and enter the counts into a (two-way) contingency table.
    • A contingency table is a generalization of a 2 × 2 table.
  • Generalized contingency table:
    • r = # of rows
    • c = # of columns
    • i = row index
    • j = column index
    • $n_{ij}$ = number of occurrences in the ith row and jth column

Two-Way Contingency Table
  • A two-way contingency table is an r × c table.
  • r is the number of rows.
  • c is the number of columns.
  • i indexes the row levels, i = 1, 2, ..., r.
  • j indexes the column levels, j = 1, 2, ..., c.
  • $n_{ij}$ is the number of occurrences at the ith row level and jth column level.

Two-Way Contingency Table Example
  • Example 7.1: Gastric freezing
    • A balloon was lowered into the patient's stomach.
    • Coolant was then placed in the balloon.
    • The goal was to see whether this would heal duodenal ulcers.
    • There were two conditions: freeze and sham.
      • sham = control condition: everything else is the same, including the tube in the mouth.
  • Example question: is there any difference between the treatment and control groups?
    • Equivalently, is there any association between the treatment and the endpoint (outcome)?

Probability Model in Two-Way Contingency Table
  • Take a sample of units from the population.
  • Observe two categorical variables on each unit (one defines the rows, the other the columns).
  • $\pi_{ij}$ = probability that the row variable takes level i and the column variable takes level j.
  • The probabilities sum to 1: $\sum_i \sum_j \pi_{ij} = 1$.
  • $\pi_{i\cdot} = \sum_j \pi_{ij} = P(\text{row} = i)$: summing over the columns gives the probability that the row variable equals level i.
  • $\pi_{\cdot j} = \sum_i \pi_{ij} = P(\text{column} = j)$: summing over the rows gives the probability that the column variable equals level j.
  • Think of $N_{ij}$ as a random variable; the analysis can condition on the row totals, the column totals, and the overall total $n_{\cdot\cdot}$.
  • $N_{ij} \sim \text{Bin}(n_{\cdot\cdot}, \pi_{ij})$, so $E(N_{ij}) = n_{\cdot\cdot}\,\pi_{ij}$, or equivalently $\pi_{ij} = E(N_{ij})/n_{\cdot\cdot}$.
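
Chi-Square Test for Association

For the row and column variables:

  • $H_0$: there is no association between the row and column variables.
  • $H_1$: there is an association.
  • Recall that events A and B are independent when $P(A \cap B) = P(A) \times P(B)$.
  • No association between the row and column variables means $P(\text{row} = i \text{ and column} = j) = P(\text{row} = i) \times P(\text{column} = j)$ for every i and j.
  • $H_0: \pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for all i, j
  • $H_1: \pi_{ij} \neq \pi_{i\cdot}\,\pi_{\cdot j}$ for at least one (i, j)
  • In a 2 × 2 table, $H_0$ means the success probabilities in the two rows are equal. Under $H_0$ (no association), $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ (a product, not a sum). The expected count is therefore $E_{ij} = n_{\cdot\cdot}\,\hat\pi_{i\cdot}\,\hat\pi_{\cdot j} = n_{i\cdot}\,n_{\cdot j} / n_{\cdot\cdot}$.
  • The test statistic is built as in the goodness-of-fit test, summing over every cell (i, j): $\chi^2 = \sum_i \sum_j (O_{ij} - E_{ij})^2 / E_{ij}$, where $O_{ij} = n_{ij}$ is the observed count.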
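
A minimal Python sketch of the test of no association for an r × c table. It computes the expected counts n_i. n_.j / n.., the χ² statistic summed over all cells, and the degrees of freedom, here taken as the standard (r − 1)(c − 1). The gastric-freezing counts are not reproduced in these notes, so the table below is hypothetical.

```python
def chi_square_association(table):
    """Chi-square test of no association for an r x c table of counts.

    Returns (expected counts, chi-square statistic, degrees of freedom).
    """
    row_totals = [sum(row) for row in table]        # n_i.
    col_totals = [sum(col) for col in zip(*table)]  # n_.j
    n = sum(row_totals)                             # n..
    # Under H0: E_ij = n_i. * n_.j / n..
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


# Hypothetical 2 x 2 table: rows = treatment groups, columns = outcome yes/no.
expected, stat, df = chi_square_association([[10, 20], [30, 40]])
print(expected)            # [[12.0, 18.0], [28.0, 42.0]]
print(round(stat, 4), df)  # 0.7937 1
```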

Chi-Square Test for Goodness of Fit
  • Used to determine whether observed sample data are consistent with a hypothesized distribution. If the observed and expected counts are close, the fit is good.
  • Hypotheses:
    • $H_0$: the data follow the specified distribution.
    • $H_1$: the data do not follow the specified distribution.
  • Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$
  • Sampling distribution:
    • Under $H_0$, $\chi^2$ approximately follows a chi-square distribution with k − 1 degrees of freedom.

Multinomial Model

  • Used for count data in which each trial falls into one of several categories.
  • Each trial has k outcomes, where k > 2 is the number of categories.
  • $P(\text{outcome } j) = \pi_j$, for j = 1, 2, ..., k.
  • There are n independent, identical trials.
  • The counts of the k outcomes have a multinomial distribution.
  • Notation: $(N_1, \dots, N_k) \sim \text{Multinomial}(n, \pi_1, \dots, \pi_k)$.
  • Mean values: $E(N_i) = n\pi_i$.

Example of Multinomial Models

  • A is the dominant allele and a is the recessive allele.
  • If both parents are Aa, an offspring is AA, Aa, or aa with probabilities 1/4, 1/2, 1/4 (a 1:2:1 ratio). Consider 639 offspring.

The expected values are:

$E(N_1) = 639 \times 1/4 = 159.75$, $E(N_2) = 639 \times 1/2 = 319.5$, $E(N_3) = 639 \times 1/4 = 159.75$.

Poisson Model

  • Used to model counts of events in time or space, e.g., the number of arrivals at an emergency room. Assumption: events occur independently, at a rate that does not depend on where in the time interval or region being modeled they occur.

  • Approximates the binomial model when n is large and π is small; the assumption is n ≥ 20 and π ≤ 0.1. Example: the number of cases of a disease.

  • Let $Y \sim \text{Poisson}(\lambda)$, with mean $\lambda$. Then $P(Y = k) = e^{-\lambda}\lambda^k / k!$ for k = 0, 1, 2, ...

  • From the observed count y, an approximate CI for $\lambda$ is $\left((\sqrt{y} - 1)^2, (\sqrt{y} + 1)^2\right)$.

  • For a binomial count X (large n, small π), first check that the observed x ≥ 5; the approximate CI for π is then $\left((\sqrt{x} - 1)^2 / n, (\sqrt{x} + 1)^2 / n\right)$.
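
A minimal Python sketch of the two approximate intervals above. The ±1 on the square-root scale comes from √Y being roughly normal with variance 1/4 for large λ. So ±1 ≈ ±1.96 × 0.5, and the interval has roughly 95% coverage. The example counts are made up.

```python
from math import sqrt


def poisson_ci_lambda(y):
    """Approximate CI for a Poisson mean from an observed count y."""
    return (sqrt(y) - 1) ** 2, (sqrt(y) + 1) ** 2


def binom_ci_via_poisson(x, n):
    """Approximate CI for a small binomial proportion pi (large n, small pi)."""
    if x < 5:
        raise ValueError("the approximation needs an observed count x >= 5")
    lower, upper = poisson_ci_lambda(x)
    return lower / n, upper / n


# Made-up example: 16 cases among n = 1000 people.
print(poisson_ci_lambda(16))           # (9.0, 25.0)
print(binom_ci_via_poisson(16, 1000))  # (0.009, 0.025)
```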

Description

Questions cover multinomial distributions, Chi-Square goodness-of-fit tests, null hypothesis testing, and sampling distributions. It also covers sample statistics and point estimators.
