Descriptive Statistics Overview
18 Questions

Questions and Answers

Which measure of central tendency is most appropriate for a continuous variable with a right-skewed distribution?

  • Mode
  • Standard Deviation
  • Mean
  • Median (correct)

What graphical representation is best suited for displaying the relationship between two continuous variables?

  • Bar Chart
  • Histogram
  • Scatterplot (correct)
  • Box Plot

What is a key assumption of the independent samples t-test?

  • Observations can be dependent
  • The data must be categorical
  • Variance does not need to be equal
  • Samples must be independent (correct)

When analyzing categorical variables, which test would you use to determine if there is a significant association between them?

  • Chi-Square Test (correct)

Which of the following statements is true regarding left skewed distributions?

  • Mean is less than median (correct)

Which analysis method is appropriate for ordinal data paired across two samples?

  • Sign Test (correct)

What is the main requirement for using the Wilcoxon signed-rank test?

  • Two paired samples (correct)

How should a low p-value be interpreted in hypothesis testing?

  • There is strong evidence against the null hypothesis (correct)

What does a p-value less than 0.05 indicate?

  • Reject the null hypothesis (correct)

Which factor does NOT influence the width of a 95% Confidence Interval?

  • Margin of error (correct; the margin of error is the half-width of the interval itself, not a factor that determines it)

What does it mean if the 95% Confidence Interval includes 0 for the difference in means?

  • There is no statistically significant difference between the groups (correct)

Which type of sampling ensures that each individual has an equal chance of being selected?

  • Simple random sampling (correct)

What is the main weakness of stratified sampling?

  • Requires a complete list of the population (correct)

In effect modification, how is the relationship between a risk factor and an outcome primarily identified?

  • By comparing odds ratios or relative risks across categories of the third variable (correct)

What does a 95% Confidence Interval for a relative risk (or odds ratio) that does not include 1 indicate?

  • A statistically significant difference in risk (correct)

What sampling method involves selecting every kth individual from a list after setting a random starting point?

  • Systematic sampling (correct)

Which condition describes confounding in research?

  • It distorts the effect of a risk factor on an outcome (correct)

What does the sample size formula n = (1.96 × SD / m)² calculate (m = margin of error, SD = standard deviation)?

  • The required sample size (correct)

Flashcards

Skewed Distribution

A distribution where one tail is longer than the other, pulling the mean away from the median.

Median

The middle value in a dataset when the values are arranged in order. Resistant to outliers.

Mean

The arithmetic average of all values in a dataset. Sensitive to outliers and skew.

Continuous Variable

A variable that can take on any value within a range.

Categorical Variable

A variable that falls into distinct categories. Examples include country of origin or rating on a scale.

Bar Chart

A graphical display to show the frequency of categories within a categorical variable.

Box Plot

A graphical display of the median, quartiles (spread), and any outliers of a continuous variable.

p-value

The probability of observing results as extreme as, or more extreme than, the observed results, if the null hypothesis is true.

Statistical Significance (p < 0.05)

Results this extreme would be unlikely if the null hypothesis were true; reject the null hypothesis.

No Statistical Significance (p ≥ 0.05)

Insufficient evidence against the null hypothesis; fail to reject (do not "accept") the null hypothesis.

95% Confidence Interval

Range of plausible values for a population parameter.

Confidence Interval Formula

Estimate ± margin of error

Confounding

Distorts the effect of a risk factor on an outcome.

Effect Modification

Effect of a risk factor varies depending on another variable.

Simple Random Sampling

Each member has an equal chance of being selected.

Stratified Sampling

Divide the population into groups (strata) and randomly sample from each.

Systematic Sampling

Select every kth member from a list, starting from a random point.

Cluster Sampling

Select entire groups (clusters) randomly.

Study Notes

Descriptive Statistics

  • Continuous Data Distributions: Continuous data are broadly either bell-shaped (approximately normal) or odd-shaped (skewed or otherwise non-normal).
  • Odd-Shaped Data: Use the median to describe centrality and the quartiles (interquartile range) to describe spread, because the mean is pulled toward the outliers and the long tail.
  • Bell-Shaped Data: Use the mean for central tendency and the standard deviation for spread.
  • Effect of skewness: Right (positive) skew: mean > median; left (negative) skew: mean < median.
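
The mean/median comparison above can be checked on small samples. This is a minimal sketch using Python's standard `statistics` module; the samples are hypothetical, and `skew_direction` is an illustrative helper that applies the mean-versus-median rule of thumb, not a formal measure of skewness.

```python
from statistics import mean, median


def skew_direction(data: list[float]) -> str:
    """Rough rule of thumb: compare the mean with the median to guess the skew."""
    m, md = mean(data), median(data)
    if m > md:
        return "right (positive)"
    if m < md:
        return "left (negative)"
    return "no clear skew"


# Hypothetical samples, each with one outlier in a long tail
right_skewed = [1, 2, 2, 3, 3, 4, 50]
left_skewed = [1, 47, 48, 48, 49, 49, 50]

for name, data in (("right_skewed", right_skewed), ("left_skewed", left_skewed)):
    # The median stays with the bulk of the data; the mean is pulled toward the tail
    print(name, round(mean(data), 2), median(data), skew_direction(data))
```

For `right_skewed`, the mean (about 9.29) sits far above the median (3), which is why the median is preferred for odd-shaped data.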

Categorical vs. Continuous Data

  • Categorical Data: Non-numeric responses that fall into more than two categories, either ordered (ordinal) or unordered (nominal). Examples include country of birth and Likert scales.
  • Continuous Data: Numeric data that can take any value within a given range, limited only by measurement precision. Examples include weight, age, and height.

Data Visualization

  • Categorical Data (1 variable): Bar Chart
  • Categorical Data (2 variables): Clustered Bar Chart
  • Continuous Data (1 variable): Box Plot or Histogram
  • Continuous Data (2 variables): Scatterplot
  • Continuous & Categorical Data: Two Box Plots or two Histograms (one per group)

Hypothesis Testing Assumptions

  • Independent Samples t-test:
    • The continuous variable is approximately normally distributed in each group (check histograms).
    • Independent observations, both within and between samples.
    • Equal variances in the two groups.
  • Chi-Square Test:
    • Categorical data.
    • Independent observations (individuals within/between groups).
    • Expected frequency of at least 5 in each category (cell).
  • Sign Test:
    • Ordinal or continuous data.
    • Independent pairs of observations.
    • Paired samples.
  • Mann-Whitney U test:
    • Two independent groups.
    • Rankable data (ordinal, or continuous but not normally distributed).
  • Wilcoxon Test (signed-rank):
    • Paired samples.
    • Ordinal categorical data (rankable data).
    • Symmetrical distribution of the differences between paired observations.
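
To make the sign test concrete, here is a minimal sketch of an exact two-sided sign test for paired data using only the standard library. It assumes the usual convention of dropping zero differences (ties) and relies on the fact that, under the null hypothesis, the number of positive differences follows a Binomial(n, 0.5) distribution. The function name `sign_test` and the before/after ratings are hypothetical.

```python
from math import comb


def sign_test(before: list[float], after: list[float]) -> tuple[int, int, float]:
    """Exact two-sided sign test for paired observations (hypothetical helper).

    Returns (number of positive differences, number of non-zero differences, p-value).
    Zero differences (ties) are dropped.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for a two-sided test, capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)


# Hypothetical 1-5 ratings for the same ten people before and after an intervention
before = [2, 3, 3, 2, 4, 1, 3, 2, 3, 2]
after = [4, 4, 3, 3, 5, 3, 4, 3, 2, 4]
print(sign_test(before, after))
```

Only the direction of each difference is used, which is why the sign test suits ordinal paired data; the Wilcoxon signed-rank test additionally ranks the sizes of the differences.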

Interpretation of p-values

  • p < 0.05: Reject the null hypothesis; the result is statistically significant.
  • p ≥ 0.05: Fail to reject (do not "accept") the null hypothesis; the result is not statistically significant.
  • Lack of significance: Does not demonstrate that there is no difference; the study may have had too little data to detect one. Consider increasing the sample size.
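
As an illustration of the decision rule, the sketch below converts a z statistic into a two-sided p-value using the standard normal distribution (via `math.erfc`) and then applies the 0.05 cut-off with "fail to reject" wording. It assumes a z-test setting; the helper names and the z values are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p_from_z(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. a two-sided p-value."""
    return erfc(abs(z) / sqrt(2))


def decide(p: float, alpha: float = 0.05) -> str:
    """Conventional decision rule; p >= alpha means 'fail to reject', not 'accept'."""
    if p < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"


# Hypothetical test statistics
for z in (0.5, 2.5):
    p = two_sided_p_from_z(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```

A z of about 1.96 gives p of about 0.05, which links this rule to the 1.96 used in 95% confidence intervals.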

Confidence Intervals (95%)

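  • Formula: Estimate ± (1.96 × Standard Error), often rounded to Estimate ± (2 × Standard Error).
  • Interpretation: We can be 95% confident that the true population value lies within the limits.
  • Factors impacting width: Sample size and standard deviation (a larger sample or a smaller standard deviation gives a narrower interval).
  • Consistency with p-values (difference in means): if the 95% CI includes zero, p ≥ 0.05 (no significant difference); if it does not include zero, p < 0.05 (significant difference).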
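
The Estimate ± 1.96 × SE formula can be sketched for a difference in means as follows. This assumes a large-sample normal approximation and an unpooled standard error, sqrt(SD_a²/n_a + SD_b²/n_b), which equals the pooled standard error when the groups are the same size; with small samples a t critical value would give a somewhat wider interval. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def diff_in_means_ci(group_a, group_b, z=1.96):
    """Approximate 95% CI for mean(group_a) - mean(group_b) (hypothetical helper).

    SE of the difference = sqrt(SD_a^2 / n_a + SD_b^2 / n_b), normal critical value z.
    """
    diff = mean(group_a) - mean(group_b)
    se = sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return diff - z * se, diff + z * se


def includes(ci, value=0.0):
    """True if value lies inside the interval (inclusive)."""
    lower, upper = ci
    return lower <= value <= upper


# Hypothetical measurements for two independent groups
group_a = [120, 122, 118, 121, 119]
group_b = [130, 132, 128, 131, 129]
ci = diff_in_means_ci(group_a, group_b)
print(ci, "includes 0:", includes(ci))
```

Here the difference is -10 with SE = 1, so the interval is about (-11.96, -8.04). It excludes 0, which corresponds to p < 0.05.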

Standard Deviation vs. Standard Error

  • Standard Error (SE): Measure of the precision of an estimate; it is the standard deviation of the estimate's sampling distribution and shrinks as n grows (for a mean, SE = SD/√n).
  • Standard Deviation (SD): Measure of the spread of the data themselves.
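
The sketch below contrasts the two quantities for a sample mean, assuming SE = SD/√n. `statistics.stdev` gives the sample SD; the helper names and data are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(data):
    """SE of the sample mean = SD / sqrt(n): precision of the estimate, not spread of the data."""
    return stdev(data) / sqrt(len(data))


def se_from_sd(sd, n):
    """Same formula given a known SD; shows how SE shrinks as n grows."""
    return sd / sqrt(n)


data = [4, 8, 6, 5, 7]  # hypothetical measurements
print("SD:", stdev(data), "SE:", standard_error_of_mean(data))
for n in (25, 100, 400):
    # Quadrupling n halves the SE, while the SD (spread of the data) stays the same
    print(n, se_from_sd(10, n))
```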

Sample Size Estimation

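  • Formula (estimating a mean): n = (1.96 × SD / m)², where m = desired margin of error and SD = standard deviation. This follows from rearranging m = 1.96 × SD/√n.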
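
A minimal sketch of the sample size formula for estimating a mean, assuming a planning value for the SD is available. The result is rounded up because n must be a whole number and rounding down would leave the margin of error slightly above m. The function name and the planning values are hypothetical.

```python
from math import ceil


def sample_size_for_mean(sd, margin, z=1.96):
    """n = (z * SD / m)^2, rounded up so the margin of error is at most m."""
    if sd <= 0 or margin <= 0:
        raise ValueError("sd and margin must be positive")
    return ceil((z * sd / margin) ** 2)


# Hypothetical planning values: SD of 10 units, desired margin of error of 2 units
print(sample_size_for_mean(sd=10, margin=2))  # prints 97
```

Halving the margin to 1 roughly quadruples the requirement, to 385 (384.16 rounded up).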

Confounding vs. Effect Modification

  • Confounding: Distortion of the effect of a risk factor on an outcome by a third variable that is associated with both the exposure and the outcome but is not on the causal pathway between them. Example: age confounding the relationship between smoking and lung cancer.
  • Effect Modification: The relationship between exposure and outcome differs depending on a third variable. Example: gender modifying the effect of a medicine on blood pressure.
  • Identification: Compare odds ratios or risk ratios across categories (strata) of the third variable to see whether the relationship changes.
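
The comparison of risk ratios across categories can be sketched as follows. The counts are hypothetical, and this sketch only computes the stratum-specific and crude risk ratios; judging whether the strata differ by more than chance would need a formal test, which is omitted here.

```python
def risk(cases, total):
    """Proportion of a group that develops the outcome."""
    return cases / total


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)


# Hypothetical counts: (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "female": (30, 100, 10, 100),
    "male": (12, 100, 10, 100),
}

for stratum, counts in strata.items():
    print(stratum, round(risk_ratio(*counts), 2))

# Crude (unstratified) risk ratio from the pooled counts
pooled = [sum(values) for values in zip(*strata.values())]
print("crude", round(risk_ratio(*pooled), 2))
```

Stratum-specific risk ratios of about 3.0 and 1.2 point to effect modification. If the stratum ratios were instead similar to each other but differed from the crude ratio, that pattern would suggest confounding.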

Probability Sampling

  • Simple Random Sampling: Each member of the population has an equal chance of selection. Example: random selection of names. Strengths: minimizes selection bias. Weaknesses: small subgroups may be underrepresented; requires a complete list of the population, which may be difficult to obtain.
  • Stratified Sampling: Randomly sampling from subgroups (strata) that share a characteristic. Example: a GPA study across degrees (English, Science, Engineering). Strengths: precise estimates, with every subgroup represented. Weaknesses: requires detailed information about the population; can be harder to carry out.
  • Systematic Sampling: Select every kth unit after a random starting point. Example: every 10th name on a list. Strengths: even spread across the list. Weaknesses: sensitive to periodic patterns in the list.
  • Cluster Sampling: Randomly selecting entire clusters (groups). Example: randomly selecting schools. Strengths: practical for large populations spread over a wide area. Weaknesses: larger sampling error.
  • Two-Stage Sampling: Two levels of selection (first clusters, then individuals within the chosen clusters). Example: a survey of streaming-platform users. Strengths: cost-effective, wide-area representation. Weaknesses: results can be biased if the population is unevenly distributed across clusters.
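
The five designs can be sketched with Python's `random` module. This is an illustration under simple assumptions: the population, strata (echoing the GPA example), and clusters (echoing the schools example) are hypothetical lists of labels, every stratum gets the same sample size, and a fixed seed keeps the run reproducible.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has the same chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Every kth member after a random start within the first k positions."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of n_per_stratum from each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: random clusters; stage 2: random members within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


rng = random.Random(2024)  # fixed seed so the illustration is reproducible
population = [f"person{i:03d}" for i in range(100)]  # hypothetical sampling frame
degrees = {  # hypothetical strata
    "English": population[:30],
    "Science": population[30:70],
    "Engineering": population[70:],
}
schools = {f"school{j}": population[j * 10:(j + 1) * 10] for j in range(10)}  # hypothetical clusters

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(degrees, 2, rng))
print(sorted(cluster_sample(schools, 2, rng)))
print(two_stage_sample(schools, 3, 2, rng))
```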

Identifying Statistical Significance with Difference in Means/Risk and Confidence Intervals

  • Difference in means: 95% CI does not include 0 → significant difference; includes 0 → no significant difference.
  • Risk ratio (relative risk or odds ratio): 95% CI does not include 1 → significant difference in risk; includes 1 → no significant difference in risk.
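
For ratio measures the null value is 1 rather than 0. The sketch below computes a risk ratio and an approximate 95% CI from a hypothetical 2x2 table, assuming a common large-sample approximation in which the interval is built on the log scale with SE(ln RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)). The function name and counts are illustrative only.

```python
from math import exp, log, sqrt


def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio and approximate 95% CI from a 2x2 table (hypothetical helper).

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    The CI is computed for ln(RR) and then transformed back with exp().
    """
    if a <= 0 or c <= 0:
        raise ValueError("needs at least one case in each exposure group")
    rr = (a / (a + b)) / (c / (c + d))
    se_log = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


# Hypothetical 2x2 tables
for label, table in (("study 1", (30, 70, 10, 90)), ("study 2", (12, 88, 10, 90))):
    rr, lower, upper = risk_ratio_ci(*table)
    if lower <= 1 <= upper:
        verdict = "includes 1: no significant difference in risk"
    else:
        verdict = "excludes 1: significant difference in risk"
    print(f"{label}: RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f}) -> {verdict}")
```

With these made-up counts, study 1 gives RR = 3.00 with a CI of roughly 1.55 to 5.80 (excludes 1), while study 2 gives RR = 1.20 with a CI of roughly 0.54 to 2.65 (includes 1).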

Description

This quiz covers the fundamentals of descriptive statistics, including data distributions, key differences between categorical and continuous data, and visualization techniques. Understand the appropriate measures of central tendency and spread for various data types, as well as how to effectively represent data visually.
