Questions and Answers
Which centrality measure is most appropriate for a continuous variable with a right skewed distribution?
- Mode
- Standard Deviation
- Mean
- Median (correct)
What graphical representation is best suited for displaying the relationship between two continuous variables?
- Bar Chart
- Histogram
- Scatterplot (correct)
- Box Plot
What is a key assumption of the independent sample T-Test?
- Observations can be dependent
- The data must be categorical
- Variance does not need to be equal
- Samples must be independent (correct)
When analyzing categorical variables, which test would you use to determine if there is a significant association between them?
Which of the following statements is true regarding left skewed distributions?
Which analysis method is appropriate for ordinal data paired across two samples?
What is the main requirement for using the Wilcoxon Test?
What do you interpret when the p-value is low, according to hypothesis testing?
What does a p-value less than 0.05 indicate?
Which factor does NOT influence the width of a 95% Confidence Interval?
What does it mean if the 95% Confidence Interval includes 0 for the difference in means?
Which type of sampling ensures that each individual has an equal chance of being selected?
What is the main weakness of stratified sampling?
In effect modification, how is the relationship between a risk factor and an outcome primarily identified?
What does a 95% Confidence Interval that does not include 1 indicate in terms of risk?
What sampling method involves selecting every kth individual from a list after setting a random starting point?
Which condition describes confounding in research?
What does a sample size formula n = 1.96 x m^2 calculate?
Flashcards
Skewed Distribution
A distribution where one tail is longer than the other, pulling the mean away from the median.
Median
The middle value in a dataset when arranged numerically. Not affected by outliers.
Mean
The average of all values in a dataset. Affected by outliers and skews.
Continuous Variable
Categorical Variable
Bar Chart
Box Plot
p-value
Statistical Significance (p < 0.05)
Statistical Significance (p > 0.05)
95% Confidence Interval
Confidence Interval Formula
Confounding
Effect Modification
Simple Random Sampling
Stratified Sampling
Systematic Sampling
Cluster Sampling
Study Notes
Descriptive Statistics
- Continuous Data Distributions: Continuous data can be odd-shaped (skewed) or bell-shaped (approximately normal).
- Odd-Shaped Data: Use the median to describe centrality and quartiles to describe spread, because the mean is pulled toward outliers and the long tail.
- Bell-Shaped Data: Use the mean for central tendency and the standard deviation for spread.
- Skewness impact: Right skew (positive): mean > median; Left skew (negative): mean < median.
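The effect of skew on the mean and median can be checked on a small dataset. The sketch below uses made-up values (the `incomes` list is purely illustrative) in which one large value creates a right skew.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is very large.
incomes = [22, 25, 27, 30, 31, 33, 35, 38, 40, 250]

avg = mean(incomes)    # pulled upward by the single large value
mid = median(incomes)  # middle of the sorted values, barely affected

print(f"mean = {avg:.1f}, median = {mid:.1f}")
print("right skew suggested" if avg > mid else "no right skew suggested")
```

Here the mean sits well above the median, which is the pattern described for right-skewed data, so the median is the better summary of a typical value.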
Categorical vs. Continuous Data
- Categorical Data: Two or more categories, usually non-numeric, either ordered (ordinal) or unordered (nominal). Examples include country of birth (nominal) or Likert scales (ordinal).
- Continuous Data: Numerical data that can take any value within a given range, with in principle unlimited resolution. Examples include weight, age, and height.
Data Visualization
- Categorical Data (1 variable): Bar Chart
- Categorical Data (2 variables): Clustered Bar Chart
- Continuous Data (1 variable): Box Plot or Histogram
- Continuous Data (2 variables): Scatterplot
- Continuous & Categorical Data: Side-by-side box plots or histograms, one per category
Hypothesis Testing Assumptions
- Independent Samples T-test:
  - Normal distribution of continuous variables (check histograms).
  - Independent samples (individuals within samples and between samples).
  - Equal variances.
- Chi-Square Test:
  - Categorical data.
  - Independent observations (individuals within/between groups).
  - Expected frequency in each cell of at least 5.
- Sign Test:
  - Ordinal or continuous data.
  - Paired samples.
  - Independent pairs (each pair is independent of the others).
- Mann-Whitney U test:
  - Two independent groups.
  - Rankable data (ordinal data, or continuous data that are not normally distributed).
- Wilcoxon Test:
  - Paired samples.
  - Rankable data (ordinal or continuous).
  - Symmetrical distribution of the differences between paired observations.
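The expected-frequency assumption of the chi-square test can be checked before running the test. The following is a minimal sketch using only the standard library; the 2x2 table of counts and the function names are hypothetical and chosen for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows = exposed / unexposed, columns = outcome yes / no.
observed = [[30, 20], [10, 40]]

expected = expected_counts(observed)
smallest_expected = min(e for row in expected for e in row)
statistic = chi_square_statistic(observed)

print("expected counts:", expected)
print("smallest expected count:", smallest_expected)  # compare against the threshold of 5
print(f"chi-square statistic: {statistic:.3f}")
```

If the smallest expected count falls below the threshold, the chi-square approximation becomes unreliable and an exact test is usually preferred.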
Interpretation of p-values
- p < 0.05: Reject the null hypothesis; statistically significant.
- p ≥ 0.05: Fail to reject the null hypothesis; not statistically significant.
- Lack of significance: Does not confirm equivalence, may indicate insufficient data. Consider increasing sample size.
Confidence Intervals (95%)
- Formula: Estimate ± (1.96 x Standard Error), often rounded to Estimate ± (2 x Standard Error)
- Interpretation: We can be 95% confident that the true population value lies within the limits.
- Factors impacting width: Sample size and standard deviation.
- Consistency with p-values: 95% CI includes zero, p ≥ 0.05 (lack of significant difference); CI does not include zero, p < 0.05 (significant difference).
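As a worked illustration of the interval formula, the sketch below computes a 95% confidence interval for a mean. It assumes a large-sample normal approximation with the multiplier 1.96 (roughly 2); the `weights` values and the function name `mean_ci95` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(values):
    """Approximate 95% CI for a mean: estimate +/- 1.96 * standard error."""
    n = len(values)
    estimate = mean(values)
    standard_error = stdev(values) / sqrt(n)  # SE of the mean = SD / sqrt(n)
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin


# Hypothetical sample of body weights in kg.
weights = [68.0, 72.5, 70.1, 65.3, 74.8, 69.9, 71.2, 67.4]

low, high = mean_ci95(weights)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

Because the standard error divides the standard deviation by the square root of n, a larger sample or a smaller standard deviation gives a narrower interval. For a sample this small, a t-based multiplier would normally replace 1.96.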
Standard Deviation vs. Standard Error
- Standard Error (SE): Measure of precision (of an estimate); the standard deviation of the estimates across repeated samples. For a mean, SE = SD / √n, so it shrinks as n grows.
- Standard Deviation (SD): Measure of spread (of the data itself).
Sample Size Estimation
- Formula: n = (1.96 x SD / m)² where m = margin of error and SD = standard deviation
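A short sketch of a sample size calculation follows. It assumes the standard formula for estimating a mean at 95% confidence, n = (1.96 x SD / m)², rounded up to a whole number; the function name and the input values are hypothetical.

```python
from math import ceil


def sample_size_for_mean(sd, margin, z=1.96):
    """Smallest n for which z * sd / sqrt(n) is no larger than the margin of error."""
    return ceil((z * sd / margin) ** 2)


# Hypothetical inputs: SD of 15 units, desired margin of error of 3 units.
n = sample_size_for_mean(sd=15, margin=3)
n_half_margin = sample_size_for_mean(sd=15, margin=1.5)

print(n)              # 97
print(n_half_margin)  # 385
```

Halving the margin of error roughly quadruples the required sample size, because the margin enters the formula squared.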
Confounding vs. Effect Modification
- Confounding: Distortion of the effect of a risk factor on an outcome by a third variable. The confounder is associated with both the exposure and the outcome but is not on the causal pathway between them. Example: Age confounding smoking & lung cancer.
- Effect Modification: The relationship between exposure and outcome differs depending on the level of a third variable. Example: Gender modifying the effect of a medicine on blood pressure.
- Identification: Compare stratum-specific odds ratios or risk ratios to see if the relationship changes across categories of the third variable.
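The comparison of ratios across categories can be sketched as follows. All counts are invented for illustration, and the stratum labels and function name are hypothetical.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


# Hypothetical counts per stratum:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "men": (30, 100, 10, 100),
    "women": (12, 100, 10, 100),
}

stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# Crude (unstratified) risk ratio: add the counts across strata first.
pooled = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*pooled)

for name, rr in stratum_rr.items():
    print(f"{name}: RR = {rr:.2f}")
print(f"crude: RR = {crude_rr:.2f}")
```

In this made-up example the risk ratio is about 3.0 in men and 1.2 in women, so the relationship changes with the category, which points to effect modification. If the stratum-specific ratios were similar to each other but different from the crude ratio, confounding would be the more likely explanation.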
Probability Sampling
- Simple Random Sampling: Each individual has an equal chance of selection. Example: Random name selection. Strengths: minimizes bias. Weaknesses: small subgroups can be under-represented by chance; requires a complete list of the population, which can be difficult to obtain.
- Stratified Sampling: Randomly sampling from subgroups/strata with shared characteristics. Example: GPA study across degrees (English, Science, Engineering). Strengths: precise representation, with every subgroup included. Weaknesses: Requires detailed population information, not easy to survey.
- Systematic Sampling: Select every kth unit after a random starting point. Example: Every 10th name. Strengths: even distribution/representation. Weaknesses: Sensitive to patterns within the population.
- Cluster Sampling: Randomly selecting entire clusters/groups. Example: Randomly selecting schools. Strengths: practical for large populations spread over a wide area. Weaknesses: increased sampling error.
- Two-Stage Sampling: Two levels of selection, for example randomly selecting clusters and then randomly sampling individuals within them. Example: survey of streaming platform users. Strengths: cost-effective, wide-area representation. Weaknesses: biased results when the population is unevenly distributed across clusters.
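Three of the sampling methods above can be sketched with Python's standard `random` module. The population of 100 student identifiers, the strata sizes, and the function names are all hypothetical; the seed is fixed only to make the run repeatable.

```python
import random


def simple_random_sample(population, n, rng):
    """Every individual has the same chance of being chosen."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random start in the first k units, then take every kth unit."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


rng = random.Random(42)
students = [f"student_{i:03d}" for i in range(100)]

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)

# Hypothetical strata: students grouped by degree.
by_degree = {
    "English": students[:30],
    "Science": students[30:70],
    "Engineering": students[70:],
}
stratified = stratified_sample(by_degree, 5, rng)

print(len(srs), len(systematic), {k: len(v) for k, v in stratified.items()})
```

The systematic sample depends on the order of the list, which is why a repeating pattern in the population can bias it. The stratified sample guarantees a fixed number from each degree regardless of how large that degree is.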
Identifying Statistical Significance with Difference in Means/Risk and Confidence Intervals
- Difference in Mean: 95% CI does not include 0 → significant difference; includes 0 → no significant difference.
- Risk (risk ratio or odds ratio): 95% CI does not include 1 → significant difference in risk; includes 1 → no significant difference in risk.
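The rule for ratios can be illustrated with a risk ratio and its 95% confidence interval. The sketch below assumes the common log-based interval for a risk ratio, a method the notes do not specify; the counts and function names are hypothetical.

```python
from math import exp, log, sqrt


def risk_ratio_ci95(a, n1, c, n0):
    """Risk ratio with a log-based 95% CI.

    a cases among n1 exposed individuals; c cases among n0 unexposed individuals.
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    low = exp(log(rr) - 1.96 * se_log_rr)
    high = exp(log(rr) + 1.96 * se_log_rr)
    return rr, low, high


def excludes(interval, null_value):
    """True when the null value (0 for differences, 1 for ratios) lies outside the interval."""
    low, high = interval
    return not (low <= null_value <= high)


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop the outcome.
rr, low, high = risk_ratio_ci95(30, 100, 10, 100)
print(f"RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
print("significant" if excludes((low, high), 1) else "not significant")

# A hypothetical difference in means with a CI of -0.5 to 2.0 includes 0.
print("significant" if excludes((-0.5, 2.0), 0) else "not significant")
```

The null value is 1 for ratios because a ratio of 1 means equal risk in both groups, whereas for a difference in means the null value is 0.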
Description
This quiz covers the fundamentals of descriptive statistics, including data distributions, key differences between categorical and continuous data, and visualization techniques. Understand the appropriate measures of central tendency and spread for various data types, as well as how to effectively represent data visually.