Questions and Answers
What does the null hypothesis represent?
A smaller p-value indicates insufficient evidence to reject the null hypothesis.
False
The p-value is the probability of observing a difference as extreme as what was observed, assuming the _______ is true.
null hypothesis
Match the following terms with their definitions:
Which of the following is NOT a type of categorical variable?
A box plot is best used for displaying normally distributed data.
What is the purpose of using the interquartile range (IQR) in data analysis?
The _____ is the most common value in a data set.
Which of the following describes a positive skew in a distribution?
When data is normally distributed, the mean and median will be different.
What are the two measures commonly used to describe central tendency?
In a graph representing normal distribution, approximately _____% of observations lie within +/- 2 standard deviations of the mean.
Which of the following best describes a bimodal distribution?
What is a point estimate?
Sampling distribution captures the fixed nature of population parameters.
What does standard error indicate?
A confidence interval provides a range of values which we are ____% confident contains the true population value.
Match the terms with their definitions:
What does a p-value represent in hypothesis testing?
A wider confidence interval indicates higher precision in estimating the population parameter.
What is the primary cause of bias in estimates?
Which of the following statements regarding the Central Limit Theorem (CLT) is true?
A sufficiently large sample size can result in a normal distribution from a skewed parent population.
What formula is used to calculate the standard error (SE)?
The z-value used for a hypothesis test at a 95% confidence level is ____.
What does a p-value inform us about in hypothesis testing?
If the 95% confidence interval does not contain the null value, the p-value is greater than 0.05.
What is the formula for the standard deviation when estimating a proportion?
The t-distribution is useful when the __________ standard deviation is unknown.
Match the following statistical terms with their descriptions:
What does a correlation coefficient (r) of 0 indicate?
Z-scores measure the distance of each observation from the median in units of standard deviation.
What is the primary purpose of a scatter plot?
The prevalence of a disease is defined as the proportion of people in a population that has the disease at a particular point in time, calculated as number of people with the disease divided by _____.
Which of the following is true about z-scores?
A positive skew indicates that the mean is less than the median.
What does the area under the curve in a standard normal distribution represent?
The ____ rule states that about 95% of observations fall within two standard deviations of the mean.
Which statement about log transformation is correct?
Conditional distribution can be represented as either row or column percentages in a contingency table.
What is the geometric mean used for?
The cumulative incidence is calculated as the number of new cases of a disease divided by the number of people initially _____.
What is indicated by an r value of -1?
Case control studies focus on groups based on whether they have the outcome of interest.
Study Notes
Defining Data
- Classify data as numerical or categorical
- Numerical data can be further classified as discrete or continuous
- Categorical data can be further classified as ordinal or nominal
- Binary/dichotomous data has two possible values
- Derived variables are created from existing variables, for example by splitting a numerical variable into categories using a threshold or cutoff
- Transformed variables involve transformations like log transformations or standardized scores
Outcome and Exposure
- Outcome variables are response or dependent variables (Y)
- Exposure variables are explanatory or independent variables (X)
- Case-control groups are defined by outcome (cases have the outcome, controls do not)
- Treatment groups are defined by exposure
- Predictor is another name for the exposure variable
Descriptive Statistics
- Frequency distributions show the frequency of each data value.
- Histograms are bar graphs showing the frequency of data within ranges.
- Bin width is the range of values covered by each bar in a histogram
- Frequency represents the number of data points in a bin
- Range is the difference between highest and lowest data values
- Mode is the most frequent data value
- Density is the frequency normalized so that the area under the chart equals one.
- Skewness describes the asymmetry of the distribution.
- Positive skew (right-hand skew) - data tailing off to the right
- Negative skew (left-hand skew) - data tailing off to the left
- Normal distribution has no skew.
- Modality describes how many peaks the graph has
- Unimodal (one peak)
- Bimodal (two peaks)
- Multimodal (multiple peaks)
- Uniform (all values equally likely) data has no peak and appears as a flat line
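As a rough illustration of how frequency, bin width and density relate, the sketch below bins a small set of made-up values. The data and the bin width of 2 are assumptions chosen for the example, not values from these notes.

```python
# Hypothetical data values, chosen only for illustration.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
bin_width = 2  # assumed bin width; bins are [0, 2), [2, 4), [4, 6), ...

# Frequency: number of data points falling in each bin.
n_bins = int(max(values) // bin_width) + 1
counts = [0] * n_bins
for v in values:
    counts[int(v // bin_width)] += 1

# Density: frequency rescaled so the total bar area equals one.
densities = [c / (len(values) * bin_width) for c in counts]
total_area = sum(d * bin_width for d in densities)

print(counts)      # frequency per bin
print(densities)   # density per bin
print(total_area)  # should be (approximately) 1
```

The bin with the highest count marks the peak of this unimodal sample, and the single value far to the right is what gives it a positive skew.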
Sample Statistics
- Central tendency measures the center of data.
- Mean is the average of all values (Σ X / N)
- Median is the middle value when sorted.
- Mode is the most frequent value.
- When data is skewed or otherwise not normally distributed, the median is preferred over the mean.
- Dispersion measures the spread of data.
- Variance calculates the average of squared differences from the mean.
- Standard deviation is the square root of variance.
- For normally distributed data, about 68.27% of observations fall within one standard deviation of the mean.
- About 95.45% of observations fall within two standard deviations of the mean.
- About 99.73% of observations fall within three standard deviations of the mean.
- Geometric mean is a better measure of central tendency for positively skewed data
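The measures above can be sketched with Python's standard `statistics` module. The sample values are hypothetical, and the variance here divides by N (the population form, matching the "average of squared differences" wording); a sample variance would divide by N - 1 instead.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(values)           # sum of values / number of values
median = statistics.median(values)       # middle value of the sorted data
mode = statistics.mode(values)           # most frequent value
variance = statistics.pvariance(values)  # average squared difference from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

print(mean, median, mode)
print(variance, std_dev)
```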
Measures of Dispersion
- Interquartile range (IQR) is the difference between the 75th and 25th percentiles (Q3 - Q1), covering the middle 50% of the data.
- Box plot visually represents distribution, showing the median, quartiles and outliers.
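A minimal sketch of the IQR calculation, using hypothetical values. The 1.5 x IQR fences used to flag outliers are a common box plot convention and an assumption here; these notes do not specify an outlier rule, and different software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical data with one unusually large value.
values = [1, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles: 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of the data

# Assumed convention: points beyond 1.5 x IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(q1, q2, q3, iqr)
print(outliers)
```

Because the IQR ignores the extremes, the value 30 barely affects it, which is why it is often reported alongside the median for skewed data.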
Categorical Summaries and Displays
- Bar charts are for categorical data
- Histograms are for continuous data
- Contingency tables present the relationship between two categorical variables
- Conditional distributions show the relative frequencies of one variable within each level of the other, as row percentages or column percentages
- Case-control studies group participants by whether they have the outcome.
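To make row and column percentages concrete, here is a small sketch on a made-up 2x2 contingency table. The counts and the labels ("exposed", "disease", and so on) are hypothetical.

```python
# Hypothetical counts: rows are exposure groups, columns are outcome groups.
table = {
    "exposed":   {"disease": 30, "no disease": 70},
    "unexposed": {"disease": 10, "no disease": 90},
}

# Row percentages: distribution of the outcome within each exposure group.
row_pct = {
    row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
    for row, cells in table.items()
}

# Column totals, then column percentages: distribution of exposure
# within each outcome group.
col_totals = {}
for cells in table.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
col_pct = {
    row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
    for row, cells in table.items()
}

print(row_pct["exposed"]["disease"])  # % of exposed people with the disease
print(col_pct["exposed"]["disease"])  # % of diseased people who were exposed
```

Row and column percentages answer different questions, so the choice depends on which variable defines the groups being compared.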
Scatter Plot
- Scatter plot shows relationship between two co-varying variables
- Evaluate relationship direction and strength
Correlation Coefficient
- Quantifies the linear relationship strength between two variables.
- r takes values from -1 to +1
- r = 1 perfect positive linear relationship
- r = -1 perfect negative linear relationship
- r = 0 no linear relationship
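The three reference values of r can be checked with a short sketch of the Pearson correlation coefficient. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises in step with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y falls in step with x
r_none = pearson_r(x, [1, 4, 9, 4, 1])       # symmetric curve, no linear trend

print(r_positive, r_negative, r_none)
```

The last case shows that r = 0 rules out a linear relationship only: y clearly depends on x there, just not along a straight line.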
Z-scores
- Linear transformation of a measurement
- centre and spread change, but shape doesn't change
- Compare scores from different normal distributions
- Reference range calculation
- A Z-score measures the distance from the mean, in standard deviation units.
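A minimal sketch of the z-score transformation on hypothetical scores. After the transformation the centre becomes 0 and the spread becomes 1, while the ordering and shape of the data are unchanged.

```python
import statistics

# Hypothetical test scores.
scores = [60, 70, 80, 90, 100]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD, an assumption for this sketch

# z = (observation - mean) / standard deviation
z_scores = [(s - mean) / sd for s in scores]

print(z_scores)
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```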
Standard Normal Distribution
- The area under the curve over a range gives the probability of observing a z-score in that range.
- Total area under the curve is one.
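Areas under the standard normal curve can be computed from the error function in Python's `math` module. The helper names below are illustrative assumptions; the sketch reproduces the familiar percentages for 1, 2 and 3 standard deviations.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_within(k):
    """Area between -k and +k standard deviations of the mean."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))

# About 95% of the area lies within 1.96 standard deviations.
print(round(area_within(1.96), 3))
```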
Logged Variables
- Logarithmic scales represent equal multiplicative change
- Pulls low values apart and high values together
- Log transformation reduces positive skew and eases analysis
- Back transformation converts log values back to original scale
- Geometric mean is suitable for positively skewed data.
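The link between log transformation, back transformation and the geometric mean can be sketched as follows. The values are hypothetical and base 10 is an arbitrary choice; any base works as long as the back transformation uses the same one.

```python
import math

# Hypothetical positively skewed values: each is 10 times the previous one.
values = [1, 10, 100, 1000]

# Log transformation: equal multiplicative steps become equal additive steps.
logs = [math.log10(v) for v in values]
mean_log = sum(logs) / len(logs)

# Back transformation of the mean log gives the geometric mean.
geometric_mean = 10 ** mean_log
arithmetic_mean = sum(values) / len(values)

print(logs)
print(geometric_mean, arithmetic_mean)
```

The arithmetic mean is pulled towards the largest value, while the geometric mean stays nearer the bulk of the data.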
Prevalence and Incidence
- Prevalence is the proportion of the population with a disease at a specific time.
- Incidence is the rate of new cases of a disease during a specified period.
- Cumulative incidence (also called risk) is the proportion of initially disease-free people who develop the disease in a specific time period.
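A worked sketch of the two proportions, using invented counts. It assumes that people who already have the disease at the start are not at risk of becoming new cases, so they are excluded from the cumulative incidence denominator.

```python
# Hypothetical counts for a population followed for one year.
population = 1000
existing_cases = 50  # people with the disease at the start
new_cases = 19       # people who develop the disease during the year

# Prevalence: proportion with the disease at a point in time.
prevalence = existing_cases / population

# Cumulative incidence: new cases divided by those initially at risk.
at_risk = population - existing_cases
cumulative_incidence = new_cases / at_risk

print(prevalence, cumulative_incidence)
```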
Observational and Experimental Designs
- Experimental studies (interventional) manipulate a variable to observe its effect.
- Observational studies observe natural variation in a population.
- Cohort and case-control studies are subtypes of observational studies
- Cross-sectional studies collect data at a single point in time.
Sample Size and Power/Regression/Systematic Review
- Sample size and power analysis inform the needed sample size to detect an effect.
- Regression models examine relationships between variables.
- Systematic review synthesizes findings from multiple studies.
Meta-analysis
- Meta-analysis combines results from multiple studies.
Hypothesis Testing
- Null hypothesis states no difference or association.
- Alternative hypothesis asserts a difference or association.
- P-value is the probability of observing results as extreme as, or more extreme than, those observed, if the null hypothesis is true.
- Test statistic measures how far the data are from the null.
- Degrees of freedom affect the shape of the distribution and are often related to sample size.
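To show how a test statistic turns into a p-value, here is a sketch of a two-sided one-sample z-test. The choice of test, the summary figures and the 0.05 threshold are assumptions for illustration; these notes do not prescribe a specific test.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical summary data.
sample_mean = 52.0
null_mean = 50.0      # value stated by the null hypothesis
population_sd = 10.0  # assumed known, which is what justifies a z-test
n = 100

# Test statistic: distance of the data from the null, in standard errors.
standard_error = population_sd / math.sqrt(n)
z = (sample_mean - null_mean) / standard_error

# Two-sided p-value: probability of a result at least this extreme
# in either direction, if the null hypothesis is true.
p_value = 2 * (1 - standard_normal_cdf(abs(z)))
reject_null = p_value < 0.05

print(z, round(p_value, 4), reject_null)
```

A smaller p-value means the observed data would be less likely under the null hypothesis, so it counts as stronger evidence against it.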
Point Estimates and Parameters
- Statistical inference uses samples to make statements about populations.
- Point estimates (sample values) are best guesses for population parameters (unknown values)
- Confidence interval estimates the range likely encompassing a specific population parameter.
Confidence Intervals
- Confidence intervals provide range of likely values for population parameter
- The interval widens as sample size decreases.
- A 95% CI means that, in repeated sampling, 95% of such intervals will contain the true value.
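A minimal sketch of a 95% confidence interval for a mean, assuming the normal approximation (mean ± 1.96 x standard error, with standard error = SD / √n). The function name and the summary figures are hypothetical; with a small sample and an unknown population SD, a t-distribution multiplier would replace 1.96.

```python
import math

def confidence_interval_95(mean, sd, n):
    """Approximate 95% CI for a mean using the normal multiplier 1.96."""
    standard_error = sd / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

# Same hypothetical mean and SD, two different sample sizes.
large_sample = confidence_interval_95(mean=52.0, sd=10.0, n=100)
small_sample = confidence_interval_95(mean=52.0, sd=10.0, n=25)

print(large_sample)
print(small_sample)
```

The smaller sample gives the wider, less precise interval, since the standard error grows as n shrinks.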
Sampling Considerations
- Random errors reflect variability in repeated sampling
- Standard error estimates how much point estimates vary around the population parameter across repeated samples (the standard deviation of the sampling distribution).
Statistical Assumptions and Methods
- Methods like t-tests have underlying assumptions like data normality, independence, and homogeneity of variances.
- Using correct statistical method and applying appropriate corrections is critical
- Non-parametric methods may be needed when assumptions cannot be met.
Chi-squared Test of Independence
- Assesses independence between two categorical variables.
- Examines if there's an association between variables.
- The test statistic sums (observed - expected)² / expected over all cells.
- Assumes no more than 20% of cells have an expected count less than 5.
- Degrees of freedom (df) are calculated as (rows - 1) x (columns - 1)
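The calculation can be sketched on a hypothetical 2x2 table. Expected counts are taken as row total x column total / grand total, the usual formula under independence. The critical value of 3.841 (df = 1, 5% level) is a standard table value, included as an assumption for the comparison.

```python
# Hypothetical observed counts (rows: exposure groups, columns: outcome groups).
observed = [[30, 70],
            [10, 90]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if the two variables were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Test statistic: sum of (observed - expected)^2 / expected over all cells.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption check: no more than 20% of expected counts below 5.
cells = [e for row in expected for e in row]
assumption_ok = sum(e < 5 for e in cells) / len(cells) <= 0.20

# Compare with the critical value for df = 1 at the 5% level.
significant = chi_squared > 3.841

print(chi_squared, df, assumption_ok, significant)
```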
Clinical Trials
- A clinical trial is a specific type of experimental study.
Risk Difference
- The difference in probabilities between two groups.
Risk Ratio
- The ratio of probabilities between two groups.
Odds Ratio
- The ratio of odds between two groups.
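The three measures can be compared side by side on one made-up 2x2 table. The counts are hypothetical; "risk" here means the proportion with the outcome in each group.

```python
# Hypothetical counts: outcome events and group sizes.
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

# Risk: probability of the outcome in each group.
risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

# Odds: outcome events divided by non-events in each group.
odds_exposed = exposed_cases / (exposed_total - exposed_cases)
odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)

risk_difference = risk_exposed - risk_unexposed
risk_ratio = risk_exposed / risk_unexposed
odds_ratio = odds_exposed / odds_unexposed

print(risk_difference, risk_ratio, odds_ratio)
```

With these counts the odds ratio is further from 1 than the risk ratio, which is typical when the outcome is not rare.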
Rank-Based Tests
- Rank-based non-parametric tests analyse the ranks of the observations rather than their raw values
Description
Test your knowledge on data classification and descriptive statistics. This quiz covers various types of data, including numerical and categorical, as well as concepts like outcome and exposure variables. Challenge yourself with questions about frequency distributions, histograms, and more!