Questions and Answers
What does the null hypothesis represent?
- No difference or no association (correct)
- A specific probability distribution
- There is a difference or association
- An alternative hypothesis
A smaller p-value indicates insufficient evidence to reject the null hypothesis.
False (B)
The p-value is the probability of observing a difference as extreme as what was observed, assuming the _______ is true.
null hypothesis
Match the following terms with their definitions:
Which of the following is NOT a type of categorical variable?
A box plot is best used for displaying normally distributed data.
What is the purpose of using the interquartile range (IQR) in data analysis?
The _____ is the most common value in a data set.
Which of the following describes a positive skew in a distribution?
When data is normally distributed, the mean and median will be different.
What are the two measures commonly used to describe central tendency?
In a graph representing normal distribution, approximately _____% of observations lie within +/- 2 standard deviations of the mean.
Which of the following best describes a bimodal distribution?
What is a point estimate?
Sampling distribution captures the fixed nature of population parameters.
What does standard error indicate?
A confidence interval provides a range of values which we are ____% confident contains the true population value.
Match the terms with their definitions:
What does a p-value represent in hypothesis testing?
A wider confidence interval indicates higher precision in estimating the population parameter.
What is the primary cause of bias in estimates?
Which of the following statements regarding the Central Limit Theorem (CLT) is true?
A sufficiently large sample size can result in a normal distribution from a skewed parent population.
What formula is used to calculate the standard error (SE)?
The z-value used for a hypothesis test at a 95% confidence level is ____.
What does a p-value inform us about in hypothesis testing?
If the 95% confidence interval does not contain the null value, the p-value is greater than 0.05.
What is the formula for the standard deviation when estimating a proportion?
The t-distribution is useful when the __________ standard deviation is unknown.
Match the following statistical terms with their descriptions:
What does a correlation coefficient (r) of 0 indicate?
Z-scores measure the distance of each observation from the median in units of standard deviation.
What is the primary purpose of a scatter plot?
The prevalence of a disease is defined as the proportion of people in a population that has the disease at a particular point in time, calculated as number of people with the disease divided by _____.
Which of the following is true about z-scores?
A positive skew indicates that the mean is less than the median.
What does the area under the curve in a standard normal distribution represent?
The ____ rule states that about 95% of observations fall within two standard deviations of the mean.
Which statement about log transformation is correct?
Conditional distribution can be represented as either row or column percentages in a contingency table.
What is the geometric mean used for?
The cumulative incidence is calculated as the number of new cases of a disease divided by the number of people initially _____.
What is indicated by an r value of -1?
Case control studies focus on groups based on whether they have the outcome of interest.
Flashcards
Data Variable Types
Data variables can be numerical (discrete or continuous) or categorical (ordinal or nominal). Binary/dichotomous variables have two categories.
Derived Variables
Calculated from other variables using thresholds or cutoffs.
Transformed Variables
Variables changed using methods like log transformations or standardized scores.
Frequency Distribution
Histogram
Skewness
Central Tendency (Mean)
Central Tendency (Median)
Standard Deviation
Interquartile Range (IQR)
Population Parameter
Null Hypothesis
Sample Estimate
P-value
Sampling Distribution
Test Statistic (Z-statistic)
p-value < 0.05
Standard Error
p-value > 0.05
Bias
Precision
Confidence Interval
Central Limit Theorem (CLT)
Normal Distribution of Sample Means
Skewed Population and Sample Means
Normal Approximation of Binomial
Error Factor (z x SE)
Confidence Interval (CI)
T-distribution
Degrees of Freedom (df)
Sample Standard Deviation (s)
Categorical data
Contingency table
Conditional distribution
Case-control study
Scatter plot
Correlation coefficient (r)
Z-score
Standard normal distribution
68-95-99.7 rule
Logarithmic scale
Log transformation
Prevalence
Incidence
Cumulative incidence
Study Notes
Defining Data
- Classify data as numerical or categorical
- Numerical data can be further classified as discrete or continuous
- Categorical data can be further classified as ordinal or nominal
- Binary/dichotomous data has two possible values
- Derived variables are calculated from other variables, for example by applying a threshold or cutoff to form categories
- Transformed variables involve transformations like log transformations or standardized scores
Outcome and Exposure
- Outcome variables are response or dependent variables (Y)
- Exposure variables are explanatory or independent variables (X)
- Case-control groups are defined by outcome status (cases have the outcome, controls do not)
- Treatment groups are exposure dependent
- Predictor is the exposure variable
Descriptive Statistics
- Frequency distributions show the frequency of each data value.
- Histograms are bar graphs showing the frequency of data within ranges.
- Bin width is the width of the value range covered by each bar in a histogram
- Frequency represents the number of data points in a bin
- Range is the difference between highest and lowest data values
- Mode is the most frequent data value
- Density is the frequency normalized so that the area under the chart equals one.
- Skewness describes the asymmetry of the distribution.
- Positive skew (right-hand skew) - data tailing off to the right
- Negative skew (left-hand skew) - data tailing off to the left
- Normal distribution has no skew.
- Modality describes how many peaks the graph has
- Unimodal (one peak)
- Bimodal (two peaks)
- Multimodal (multiple peaks)
- Uniform data (all values equally likely) appear as a roughly flat histogram with no peak
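The terms above (frequency, mode, range, bin width, density) can be sketched with a small calculation. This is a minimal illustration only: the blood pressure readings, the bin width of 10 and the starting bin of 110 are hypothetical choices, not values from these notes.

```python
from collections import Counter

# Hypothetical systolic blood pressure readings (mmHg); not from the notes.
values = [112, 118, 121, 121, 125, 127, 130, 134, 134, 134, 141, 158]

# Frequency distribution: count of each distinct value.
frequency = Counter(values)
mode = frequency.most_common(1)[0][0]   # most frequent value
data_range = max(values) - min(values)  # highest minus lowest

# Histogram counts: bins of width 10 starting at 110 (illustrative choices).
bin_width = 10
first_bin = 110
bin_counts = Counter(
    first_bin + (v - first_bin) // bin_width * bin_width for v in values
)

# Density: frequency divided by (n * bin width), so the bar areas sum to one.
density = {
    left: count / (len(values) * bin_width)
    for left, count in sorted(bin_counts.items())
}

# Text histogram: one '#' per observation in each bin.
for left, count in sorted(bin_counts.items()):
    print(f"[{left}, {left + bin_width}): {'#' * count}")
```

The long right-hand tail in the printed bars is what a positive skew looks like in a histogram.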
Sample Statistics
- Central tendency measures the center of data.
- Mean is the average of all values (Σ X / N)
- Median is the middle value when sorted.
- Mode is the most frequent value.
- When data are skewed or otherwise not normally distributed, the median is preferred because extreme values affect it less than the mean.
- Dispersion measures the spread of data.
- Variance calculates the average of squared differences from the mean.
- Standard deviation is the square root of variance.
- 68.27% of observations fall within one standard deviation of mean.
- 95.45% of observations fall within two standard deviations of mean.
- 99.7% of observations fall within three standard deviations of mean.
- Geometric mean is a better measure of central tendency for positively skewed data
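As a rough illustration of the measures above, the sketch below uses Python's standard `statistics` module on a hypothetical, positively skewed sample. `pvariance` divides by N, which matches the "average of squared differences" wording; `variance` divides by N - 1, the usual choice when working from a sample.

```python
import statistics

# Hypothetical positively skewed sample (one large value stretches the right tail).
data = [2, 3, 3, 4, 5, 7, 25]

mean = statistics.mean(data)      # sum of values / N
median = statistics.median(data)  # middle value when sorted
mode = statistics.mode(data)      # most frequent value

# Variance as the average squared difference from the mean (divides by N).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)  # square root of that variance

# Sample variance divides by N - 1 instead, so it is slightly larger.
sample_variance = statistics.variance(data)

print(mean, median, mode)
```

The single large value pulls the mean above the median, which is why the median is the safer summary for skewed data.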
Measures of Dispersion
- Interquartile range (IQR) is the difference between the 75th and 25th percentiles (Q3 - Q1), covering the middle 50% of the data.
- Box plot visually represents distribution, showing the median, quartiles and outliers.
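A minimal sketch of the IQR and of the outlier fences a box plot typically draws. Two assumptions are not from these notes: the data are hypothetical, and the 1.5 x IQR fence is the common Tukey convention. Quartile values also depend on the interpolation method; this uses the default of `statistics.quantiles`.

```python
import statistics

# Hypothetical sample with one extreme value.
data = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 40]

# Quartiles: 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

# Assumed convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}, outliers={outliers}")
```

Unlike the range, the IQR is unchanged if the extreme value 40 is replaced by any other value above Q3.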
Categorical Summaries and Displays
- Bar charts are for categorical data
- Histograms are for continuous data
- Contingency tables present the relationship between two categorical variables
- Conditional distribution
- Relative frequency distribution can use row percentages or column percentages
- Case-control studies group participants by whether they have the outcome of interest.
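Row and column percentages can be sketched from a small contingency table. The 2x2 counts below (exposure by disease status) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical 2x2 contingency table: rows = exposure, columns = outcome.
table = {
    "exposed": {"disease": 30, "no disease": 70},
    "unexposed": {"disease": 10, "no disease": 90},
}

# Row percentages: distribution of the outcome within each exposure group.
row_pct = {
    row: {col: 100 * n / sum(cells.values()) for col, n in cells.items()}
    for row, cells in table.items()
}

# Column percentages: distribution of exposure within each outcome group.
col_totals = {
    col: sum(table[row][col] for row in table) for col in table["exposed"]
}
col_pct = {
    row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
    for row, cells in table.items()
}

print(row_pct["exposed"]["disease"])  # % of the exposed who have the disease
print(col_pct["exposed"]["disease"])  # % of the diseased who were exposed
```

The two percentages answer different questions, so the choice between row and column percentages should follow how the groups were sampled.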
Scatter Plot
- A scatter plot shows the relationship between two numerical variables measured on the same subjects
- Evaluate relationship direction and strength
Correlation Coefficient
- Quantifies the linear relationship strength between two variables.
- r takes values from -1 to +1
- r = 1 perfect positive linear relationship
- r = -1 perfect negative linear relationship
- r = 0 no linear relationship
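The three reference values of r can be checked with a short hand-written Pearson correlation. The function name `pearson_r` and the data are illustrative assumptions. The last case shows that r = 0 only rules out a linear relationship: y depends exactly on x there, but along a symmetric curve.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2 * v + 1 for v in x])   # points on a rising line
r_negative = pearson_r(x, [-v for v in x])          # points on a falling line
r_curved = pearson_r(x, [(v - 3) ** 2 for v in x])  # symmetric U-shape

print(r_positive, r_negative, r_curved)
```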
Z-scores
- Linear transformation of a measurement
- centre and spread change, but shape doesn't change
- Compare scores from different normal distributions
- Reference range calculation
- A Z-score measures the distance from the mean, in standard deviation units.
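A small sketch of the z-score calculation, using hypothetical exam scores. The helper `z_score` is an illustrative name. The first part compares scores from two different distributions; the second shows that standardising a whole data set gives mean 0 and standard deviation 1.

```python
import statistics


def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd


# Comparing scores from two hypothetical exams with different scales.
z_exam_a = z_score(80, mean=70, sd=10)  # 1 SD above the mean
z_exam_b = z_score(65, mean=50, sd=5)   # 3 SDs above the mean

# Standardising a whole data set changes centre and spread, not shape.
scores = [50, 60, 70, 80, 90]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)
z_scores = [z_score(x, mean, sd) for x in scores]

print(z_exam_a, z_exam_b, z_scores)
```

Although 80 is the higher raw score, 65 on the second exam is the more unusual result relative to its own distribution.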
Standard Normal Distribution
- The area under the curve over a range gives the probability of observing a z-score in that range.
- Total area under the curve is one.
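Areas under the standard normal curve can be computed with `statistics.NormalDist` from the Python standard library. The sketch below reproduces the 68-95-99.7 figures and the z-value that leaves 2.5% in each tail; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Probability that a z-score lies between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within = {k: area_within(k) for k in (1, 2, 3)}

# z-value with 97.5% of the area below it (2.5% in each tail).
z_95 = std_normal.inv_cdf(0.975)

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
print(f"z for 95%: {z_95:.2f}")
```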
Logged Variables
- Logarithmic scales represent equal multiplicative change
- Pulls low values apart and high values together
- Log transformation reduces positive skew and eases analysis
- Back transformation converts log values back to original scale
- Geometric mean is suitable for positively skewed data.
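The link between log transformation, back transformation and the geometric mean can be sketched as follows. The values are hypothetical and base 10 is an arbitrary choice; any base gives the same geometric mean after back transformation.

```python
import math
import statistics

# Hypothetical positively skewed values, each 10 times the previous one.
values = [1, 10, 100, 1000]

# Log transformation: equal multiplicative steps become equal additive steps.
logs = [math.log10(v) for v in values]

# Mean on the log scale, then back-transform to the original scale.
mean_log = statistics.mean(logs)
geometric_mean = 10 ** mean_log

arithmetic_mean = statistics.mean(values)

print(logs, geometric_mean, arithmetic_mean)
```

The back-transformed mean of the logs is the geometric mean, which sits far below the arithmetic mean here because it is not dominated by the largest value.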
Prevalence and Incidence
- Prevalence is the proportion of the population with a disease at a specific time.
- Incidence is the rate of new cases of a disease during a specified period.
- Cumulative incidence (also risk) is the proportion getting the disease in a specific time period.
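A worked example of the two proportions, with hypothetical counts. It assumes that people who already have the disease at the start are not at risk of becoming new cases, so they are removed from the denominator of cumulative incidence.

```python
# Hypothetical population followed for one year.
population = 1000
existing_cases = 50  # people with the disease at the start
new_cases = 19       # people who develop the disease during the year

# Prevalence: proportion with the disease at a point in time.
prevalence = existing_cases / population

# Cumulative incidence (risk): new cases among those initially at risk.
at_risk = population - existing_cases
cumulative_incidence = new_cases / at_risk

print(f"prevalence = {prevalence:.3f}")
print(f"cumulative incidence = {cumulative_incidence:.3f}")
```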
Observational and Experimental Designs
- Experimental studies (interventional) manipulate a variable to observe its effect.
- Observational studies observe natural variation in a population.
- Cohort and case-control studies are subtypes of observational studies
- Cross-sectional studies collect data at a single point in time.
Sample Size and Power/Regression/Systematic Review
- Sample size and power analysis inform the needed sample size to detect an effect.
- Regression models examine relationships between variables.
- Systematic review synthesizes findings from multiple studies.
Meta-analysis
- Meta-analysis combines results from multiple studies.
Hypothesis Testing
- Null hypothesis states no difference or association.
- Alternative hypothesis asserts a difference or association.
- P-value is the probability of observing results as extreme as, or more extreme than, those observed, if the null hypothesis is true.
- Test statistic measures how far the observed estimate is from the null value, typically in standard error units.
- Degrees of freedom affect the shape of the distribution and are often related to sample size.
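The pieces above can be put together in a one-sample z-test. This is a sketch under stated assumptions: the numbers are hypothetical, the population standard deviation is treated as known, and the test is two-sided. When the standard deviation is estimated from a small sample, a t-distribution would be used instead.

```python
import math
from statistics import NormalDist


def two_sided_z_test(sample_mean, null_mean, sigma, n):
    """Return the z-statistic and two-sided p-value for a one-sample z-test."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a result at least this extreme if the null is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical study: null mean 100, observed mean 103, sigma 15, n = 100.
z, p_value = two_sided_z_test(sample_mean=103, null_mean=100, sigma=15, n=100)

print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A p-value below 0.05 is conventionally taken as evidence against the null hypothesis; it is not the probability that the null hypothesis is true.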
Point Estimates and Parameters
- Statistical inference uses samples to make statements about populations.
- Point estimates (sample values) are best guesses for population parameters (unknown values)
- Confidence interval estimates the range likely encompassing a specific population parameter.
Confidence Intervals
- Confidence intervals provide range of likely values for population parameter
- Widens with decreasing sample size.
- 95% CI means that, in repeated sampling, 95% of such estimated ranges will contain true value.
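A confidence interval for a mean can be sketched as estimate ± z x SE. The function `mean_ci` and the numbers are illustrative, and the normal z-value is used on the assumption of a large sample; with a small sample and an estimated standard deviation, a t-value would replace z.

```python
import math
from statistics import NormalDist


def mean_ci(sample_mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Same hypothetical mean and SD, two different sample sizes.
low_100, high_100 = mean_ci(sample_mean=103, sd=15, n=100)
low_25, high_25 = mean_ci(sample_mean=103, sd=15, n=25)

print(f"n=100: ({low_100:.2f}, {high_100:.2f})")
print(f"n=25:  ({low_25:.2f}, {high_25:.2f})")
```

Cutting the sample size to a quarter doubles the width of the interval, because the standard error scales with 1/sqrt(n).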
Sampling Considerations
- Random errors reflect variability in repeated sampling
- Standard error is the standard deviation of the sampling distribution: it estimates how much point estimates vary around the population parameter across repeated samples.
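Repeated sampling can be simulated to see what the standard error describes. The sketch below is an illustration with assumed settings: a skewed exponential population with mean 1 and standard deviation 1, samples of size 50, 2000 repetitions and a fixed random seed. The spread of the sample means should be close to sigma / sqrt(n).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

sample_size = 50
n_samples = 2000

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

# Standard deviation of the sample means = observed standard error.
observed_se = statistics.stdev(sample_means)

# Theoretical standard error: population SD / sqrt(n), with SD = 1 here.
theoretical_se = 1.0 / math.sqrt(sample_size)

print(f"observed SE = {observed_se:.3f}, theoretical SE = {theoretical_se:.3f}")
```

Even though the population is strongly skewed, a histogram of `sample_means` would look roughly normal, as the Central Limit Theorem suggests for sufficiently large samples.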
Statistical Assumptions and Methods
- Methods like t-tests have underlying assumptions like data normality, independence, and homogeneity of variances.
- Using correct statistical method and applying appropriate corrections is critical
- Non-parametric methods may be needed when assumptions cannot be met.
Chi-squared Test of Independence
- Assesses independence between two categorical variables.
- Examines if there's an association between variables.
- Calculations involve (observed - expected)²/ expected for all cells.
- Assumes no more than 20% of cells have an expected count below 5.
- Degrees of freedom (df) are calculated as (rows - 1) x (columns - 1)
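The calculation above can be sketched on a hypothetical 2x2 table. No continuity correction is applied. The p-value line relies on the fact that, with 1 degree of freedom only, a chi-squared statistic is the square of a standard normal z; larger tables need a chi-squared distribution function, which the standard library does not provide.

```python
import math
from statistics import NormalDist

# Hypothetical observed counts: rows = exposure, columns = outcome.
observed = [[30, 70], [10, 90]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption check: share of cells with an expected count below 5.
n_cells = len(observed) * len(observed[0])
share_small = sum(e < 5 for row in expected for e in row) / n_cells

# Valid for df = 1 only: chi-squared is the square of a standard normal z.
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```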
Clinical Trials
- A clinical trial is a specific type of experimental study.
Risk Difference
- The difference in probabilities between two groups.
Risk Ratio
- The ratio of probabilities between two groups.
Odds Ratio
- The ratio of odds between two groups.
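The risk difference, risk ratio and odds ratio can all be computed from the same 2x2 table. The counts below are hypothetical and the variable names are illustrative.

```python
# Hypothetical counts: outcome by exposure group.
exposed_cases, exposed_non_cases = 30, 70
unexposed_cases, unexposed_non_cases = 10, 90

# Risk: probability of the outcome within each group.
risk_exposed = exposed_cases / (exposed_cases + exposed_non_cases)
risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_non_cases)

risk_difference = risk_exposed - risk_unexposed
risk_ratio = risk_exposed / risk_unexposed

# Odds: cases divided by non-cases within each group.
odds_exposed = exposed_cases / exposed_non_cases
odds_unexposed = unexposed_cases / unexposed_non_cases
odds_ratio = odds_exposed / odds_unexposed

print(f"RD = {risk_difference:.2f}, RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")
```

The odds ratio is further from 1 than the risk ratio here because the outcome is fairly common; the two are close only when the outcome is rare.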
Rank-Based Tests
- Rank-based non-parametric tests replace observations with their ranks, so extreme values have less influence
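Ranking can be sketched with a short helper. The function name `average_ranks` is illustrative, and giving tied values the average of the ranks they span is a common convention assumed here, not something stated in these notes.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


# Hypothetical data with a tie and one extreme value.
ranks = average_ranks([10, 20, 20, 1000])
print(ranks)
```

The value 1000 receives the same rank it would get if it were 21, which is why rank-based tests are less sensitive to outliers and skew.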