Questions and Answers
What is the median in a data set?
Which measure of spread is defined as the square root of the variance?
Which of the following statements about outliers is true?
What does the interquartile range (IQR) represent?
What is true about the mode of a dataset?
What does a p-value less than 0.05 in a Shapiro-Wilk test indicate?
According to the central limit theorem, what happens to the means of repeated random samples?
What defines the standard error (SE) in the context of the central limit theorem?
In a Q-Q plot, what does it indicate if the points lie close to the diagonal line y = x?
What are the components of the distribution of a variable in the sample?
What symbols are used to represent the sample mean and sample variance?
In what type of data are median and quantiles considered more appropriate?
What does a probability distribution function f(x) specify?
What is a characteristic of the Normal distribution?
Which statement about z-scores is correct?
What percentage of probability is included between -1σ and 1σ in a Normal distribution?
What type of variable is a random variable?
What is the empirical description of probability distributions based on?
What does a relative frequency table provide?
In what situation are contingency tables primarily used?
Which of the following is NOT appropriate for frequency tables?
When using a bar plot to illustrate categorical variables, which is essential?
What does the presence of marginal totals in a contingency table indicate?
What is the key distinction between a histogram and a bar plot?
In a right skewed histogram, which of the following is true?
Cumulative frequencies can be applied to which type of variable?
What does the Wilcoxon-Mann-Whitney test assess?
What is represented by a p-value less than 0.05 in hypothesis testing?
Which statement accurately describes the null hypothesis (H0)?
What is the formula used to calculate the 95% Confidence Interval for a mean?
What does a difference between population means of $d = 0$ indicate?
What is the z-value associated with a 95% confidence level?
In the context of proportions, what is crucial about the ratio X/Y?
Which of the following assumptions must hold for the t-test?
When calculating the standard error for the mean, what is the formula used?
What should be used instead of the z-distribution for smaller samples?
Which of the following approaches can be used to test the assumption of normality?
What can be stated about the confidence interval in relation to the true mean µ?
What happens when the assumptions for the t-test are not satisfied?
Study Notes
Biostatistics Lectures 1-4 Summary
- Biostatistics is the collection, classification, analysis, and interpretation of data from biomedical research. It helps create medical knowledge.
- Science is empirical, relying on observations and experiences. Inductive reasoning draws general conclusions from specific observations.
- Basic research and clinical research are interconnected. Clinical research uses randomized studies to avoid biased results.
- In biostatistics, samples are studied because measuring an entire population is rarely feasible. The sample is a subset of the population; the population, not the sample itself, is the main focus.
- Samples are used to make inferences about larger populations. Larger samples are more likely to reflect the population accurately. Small samples are more prone to random error, while biased samples misrepresent the population systematically, whatever their size.
- Random error (sampling error) is the chance difference between a sample estimate, such as the sample mean, and the corresponding population value.
- Sample quantities are known, measured values. Population quantities are unknown and have to be estimated.
- Clinical research involves study design, data collection, data processing, and data analysis.
- Statistical software, like R, facilitates reproducible research, which is a crucial aspect of science. Reproducible research means the results can be verified and repeated by others.
- Reproducible research requires using code throughout the process.
- Rectangular data is used in most studies. This is a tabular structure where rows represent cases (observations, records) and columns represent characteristics or variables.
- A primary key is a unique identifier for each case.
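As a minimal sketch of the rectangular layout described above, the snippet below stores each case as a row and each variable as a column, then checks whether a column can act as a primary key. The data, the column names, and the helper functions `is_primary_key` and `is_rectangular` are hypothetical and only serve as illustration.

```python
# Hypothetical rectangular data: one dict per case (row), one key per variable (column).
rows = [
    {"patient_id": 1, "age": 54, "sex": "F", "blood_type": "A"},
    {"patient_id": 2, "age": 61, "sex": "M", "blood_type": "O"},
    {"patient_id": 3, "age": 47, "sex": "F", "blood_type": "B"},
]


def is_primary_key(rows, column):
    """True if every case has a distinct value in the given column."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def is_rectangular(rows):
    """True if every case records exactly the same variables."""
    return all(row.keys() == rows[0].keys() for row in rows)


print(is_rectangular(rows))                # every row has the same columns
print(is_primary_key(rows, "patient_id"))  # unique per case
print(is_primary_key(rows, "sex"))         # repeated values, so not a key
```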
Types of Variables
- Categorical variables:
  - Nominal: No inherent order (e.g., blood type, sex).
  - Ordinal: Inherent order (e.g., educational level, satisfaction).
  - Dichotomous: Only two categories (e.g., yes/no, diseased/healthy).
- Numerical variables:
  - Continuous: Measured on a continuum with infinitely many possible values (e.g., temperature).
  - Discrete: Counted, taking only distinct, typically whole-number values (e.g., number of children).
Frequency Tables
- Frequency tables present the number or percentage of participants in each category.
- Useful for categorical variables and grouped numeric data like "age group".
- Can also be used for ordinal variables.
- Cumulative frequencies can be displayed when the categories have a natural order and their number is limited.
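The following sketch shows one way to build such a table with counts, percentages, and cumulative percentages. The age-group data are made up, and the category order is supplied by hand because cumulative values only make sense for ordered categories.

```python
from collections import Counter

# Hypothetical ordinal variable: age group of 10 participants (made-up data).
age_groups = ["<30", "30-49", "50+", "30-49", "<30",
              "30-49", "50+", "30-49", "<30", "30-49"]
order = ["<30", "30-49", "50+"]  # explicit category order for cumulative values

counts = Counter(age_groups)
n = len(age_groups)

table = []
cumulative = 0
for category in order:
    cumulative += counts[category]
    table.append({
        "category": category,
        "count": counts[category],
        "percent": 100 * counts[category] / n,
        "cumulative_percent": 100 * cumulative / n,
    })

for row in table:
    print(f"{row['category']:>6}  n={row['count']}  "
          f"{row['percent']:.0f}%  cumulative={row['cumulative_percent']:.0f}%")
```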
Contingency Tables
- Contingency tables (cross-tabulations) are used to explore the association between two categorical variables (e.g., exposure and outcome).
- If the category-specific proportions are similar to each other and to the marginal proportions, the variables are probably not associated.
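A small sketch of a 2 x 2 contingency table with its marginal totals follows. The exposure and outcome data are invented for illustration; comparing the proportion diseased in each exposure group with the overall (marginal) proportion is the informal check described above.

```python
from collections import Counter

# Hypothetical (exposure, outcome) pairs for 20 participants (made-up data).
records = (
    [("exposed", "diseased")] * 6 + [("exposed", "healthy")] * 4 +
    [("unexposed", "diseased")] * 3 + [("unexposed", "healthy")] * 7
)

cells = Counter(records)
exposures = ["exposed", "unexposed"]
outcomes = ["diseased", "healthy"]

# Marginal totals: row sums, column sums, and the grand total.
row_totals = {e: sum(cells[(e, o)] for o in outcomes) for e in exposures}
col_totals = {o: sum(cells[(e, o)] for e in exposures) for o in outcomes}
grand_total = sum(row_totals.values())

# Proportion diseased within each exposure group versus overall.
risk = {e: cells[(e, "diseased")] / row_totals[e] for e in exposures}
overall_risk = col_totals["diseased"] / grand_total

print(risk, overall_risk)
```

Here the group-specific proportions (0.6 and 0.3) differ from the marginal proportion (0.45), which hints at an association; whether it is more than chance is a question for a formal test.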
Plotting Categorical Variables
- Pie charts and bar plots illustrate relative frequencies. Pie charts show each category's share of the whole (percentages); bar plots can show either counts or percentages.
- Bar plots, arranged horizontally or vertically, are useful for showing categorical data.
Plotting Numeric Variables
- Histograms and box plots are used for numeric data.
- Histograms show the distribution of data across different ranges of values or classes, graphically depicting data frequency patterns.
- Box plots visualize the spread of the data and flag outliers, i.e. values that lie far from the bulk of the data (by a common convention, more than 1.5 × IQR beyond the quartiles).
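To make the idea of a box plot outlier concrete, here is a sketch of the widely used 1.5 × IQR rule. The rule itself is an assumption here, since the notes do not name a specific cut-off, and both the data and the helper `boxplot_outliers` are hypothetical.

```python
import statistics


def boxplot_outliers(data, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]


# Hypothetical lengths of hospital stay in days, with one extreme value.
stays = [2, 3, 3, 4, 5, 6, 7, 60]
print(boxplot_outliers(stays))
```

Different software computes quartiles slightly differently, so the fences can shift a little between tools.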
Measures of Location
- Mean: Average of a set of values. Sensitive to outliers.
- Median: The middle value when data is sorted. Not sensitive to outliers.
- Quantiles: Values that divide the data into segments based on proportions (e.g., 10th percentile, median = 50th percentile, quartiles).
- Mode: Most frequent value.
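The measures above can be computed with Python's standard `statistics` module, as in this sketch. The data are made up; adding a single extreme value shows why the mean is called sensitive to outliers and the median is not.

```python
import statistics

# Hypothetical lengths of hospital stay in days (made-up data).
stays = [2, 3, 3, 4, 5, 6, 7]
with_outlier = stays + [60]  # the same data plus one extreme value

mean_before = statistics.mean(stays)
mean_after = statistics.mean(with_outlier)      # pulled strongly upward
median_before = statistics.median(stays)
median_after = statistics.median(with_outlier)  # barely moves
most_frequent = statistics.mode(stays)

# Quartiles: the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(stays, n=4, method="inclusive")

print(mean_before, mean_after)
print(median_before, median_after)
print(most_frequent, (q1, q2, q3))
```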
Measures of Spread
- Variance: The average of squared deviations from the mean (the sample variance divides by n − 1 rather than n).
- Standard deviation: Square root of the variance.
- Range: Difference between maximum and minimum values.
- Interquartile range (IQR): Difference between the 75th and 25th percentiles.
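A short sketch of these four measures on made-up data follows. It assumes the sample versions of variance and standard deviation, which is what `statistics.variance` and `statistics.stdev` compute.

```python
import statistics

values = [4, 8, 6, 5, 3, 7, 9]  # hypothetical measurements

sample_variance = statistics.variance(values)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(values)           # square root of the variance
value_range = max(values) - min(values)

q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                                  # 75th minus 25th percentile

print(sample_variance, sample_sd, value_range, iqr)
```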
Distribution as a Concept
- A probability distribution describes how likely the different values of a variable are.
- In a density plot, the total area under the curve equals 1 (100%).
Population versus Sample
- The population is the entire group or collection of data of interest. Its mean and variance (commonly written μ and σ²) are unknown.
- Samples are parts of populations. Sample means and sample variances (commonly written x̄ and s²) are known and are used to estimate their population counterparts.
Describing Numerical Variables
- Median and quantiles are useful for skewed data.
Probability Distributions
- Probability distributions are used to describe variation in numeric data, giving the probability (or, for continuous variables, the probability density) of each possible numerical outcome.
- Probability distributions are typically empirically described by mean, standard deviation, median, and quantiles.
The Normal Distribution
- Defined by its mean and standard deviation (i.e., μ and σ).
- It is symmetrical, which makes it useful for calculations.
- Many variables approximately follow this curve, making it a useful analytical tool.
- About 68%, 95%, and 99.7% of the probability falls within 1, 2, and 3 standard deviations of the mean, respectively.
- Used in many biostatistical calculations and tests to create a standardized framework.
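The share of probability within 1, 2, and 3 standard deviations can be checked numerically with `statistics.NormalDist`. This sketch also computes a z-score, i.e. the distance of a value from the mean in units of σ; the blood-pressure figures are invented for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def probability_within(k):
    """Probability that a Normal variable lies within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * probability_within(k), 1))  # 68.3, 95.4, 99.7

# Hypothetical example: systolic blood pressure with mu = 120 and sigma = 15.
mu, sigma = 120, 15
z = (150 - mu) / sigma  # z-score of an observation of 150
print(z)                # 2.0, i.e. two standard deviations above the mean
```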
Normal Distribution Tests
- Contextual knowledge: knowing how the variable typically behaves.
- Shapiro-Wilk test: Determines whether a variable's distribution deviates significantly from normal. A low p-value (p < 0.05) suggests deviation from normality.
- Q-Q plots (quantile-quantile plots): Compare the quantiles of the variable to the quantiles of a normal distribution. Points lying close to the straight diagonal line indicate an approximately normal distribution.
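The coordinates behind a normal Q-Q plot can be computed without any plotting library, as sketched below. The helpers `qq_points` and `max_gap`, the plotting positions (i − 0.5)/n, and the simulated samples are all assumptions made for illustration; the Shapiro-Wilk test is not shown because it is not part of Python's standard library (SciPy's `scipy.stats.shapiro` is one implementation).

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    m, s = mean(ordered), stdev(ordered)
    # Standardize the data so that normal data fall near the line y = x.
    observed = [(x - m) / s for x in ordered]
    # Plotting positions (i - 0.5) / n are one common convention.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def max_gap(points):
    """Largest vertical distance between the points and the line y = x."""
    return max(abs(obs - theo) for theo, obs in points)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
normal_sample = [rng.gauss(50, 10) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

normal_gap = max_gap(qq_points(normal_sample))
skewed_gap = max_gap(qq_points(skewed_sample))
print(normal_gap, skewed_gap)  # the right-skewed sample strays further from y = x
```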
Importance of the Normal Distribution
- Many variables are approximately normally distributed.
- Central limit theorem: The means of repeated random samples are approximately normally distributed when the samples are large enough, even if the underlying distribution isn't normal.
- The standard error (SE) is the standard deviation of the sampling distribution of the mean, estimated as s/√n. It measures how precisely a sample mean estimates the population mean; a smaller SE means greater precision.
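A small simulation can illustrate the central limit theorem and the standard error. The sketch below draws repeated samples from a right-skewed exponential distribution (an assumption chosen for illustration, with mean 1 and standard deviation 1) and compares the spread of the sample means with σ/√n.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

sample_size = 50   # n, the size of each sample
n_samples = 2000   # number of repeated samples

# Mean of each repeated sample from an exponential "population" (mean 1, SD 1).
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

mean_of_means = statistics.mean(sample_means)  # close to the population mean
sd_of_means = statistics.stdev(sample_means)   # empirical standard error
theoretical_se = 1.0 / sample_size ** 0.5      # sigma / sqrt(n)

print(mean_of_means, sd_of_means, theoretical_se)
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are strongly skewed.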
The Three Distributions
- Population distribution: The entire group, with its mean and standard deviation; unknown.
- Sample distribution: A portion of the population, with its mean and standard deviation; known.
- Sampling distribution: The distribution of sample means, with its mean and standard error.
Confidence Intervals
- Confidence intervals provide a range within which the true population mean is likely to fall.
- A 95% CI means that, if the study were repeated many times, about 95% of the resulting intervals would contain the true value.
- For small samples, the t-distribution is used instead of the normal (z) distribution to account for the extra uncertainty from estimating the standard deviation from the sample. A t-value replaces the z-value (1.96 for 95% confidence); t-values are larger for smaller samples, which widens the interval.
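As a sketch of the large-sample case, the function below computes mean ± z × SE using only the standard library. The function name `mean_ci_z` and the data are hypothetical; for small samples a t-value would replace z, which the standard library does not provide as far as I know (SciPy's `scipy.stats.t.ppf` is one option).

```python
from statistics import NormalDist, mean, stdev


def mean_ci_z(data, confidence=0.95):
    """Large-sample confidence interval for a mean, using a z-value."""
    n = len(data)
    se = stdev(data) / n ** 0.5                     # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    centre = mean(data)
    return centre - z * se, centre + z * se


# Hypothetical sample of 49 measurements cycling through 117..123.
data = [120 + (i % 7) - 3 for i in range(49)]
low, high = mean_ci_z(data)
print(round(low, 3), round(high, 3))
```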
Assumptions of t-tests
- Data comes from normally distributed populations.
- Sample data must be independent.
- Populations have equal standard deviations.
Comparing Two Independent Sample Means
- Sample means often differ because of random error alone, but the difference may also reflect a real difference between the population means.
- Testing sets a null hypothesis (H0) that the population means are equal, i.e. their difference is d = 0, against an alternative hypothesis (H1) that the population means differ.
P-values and Hypothesis Testing
- P-value: The probability of observing the data or more extreme results if the null hypothesis is true.
- A small p-value (typically less than 0.05) means the observed data would be unlikely if the null hypothesis were true, so we reject it in favor of the alternative hypothesis.
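The definition of a p-value can be illustrated with a permutation test, sketched below. Note that this is a swapped-in technique, not the t-test from the notes: it shuffles the group labels to see how often a difference in means at least as large as the observed one arises when H0 (no difference) holds. The function name and the data are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided p-value for a difference in means under H0: no difference."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Share of shuffles at least as extreme as the observed difference.
    return extreme / n_permutations


# Hypothetical measurements in two independent groups (made-up data).
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.9, 6.2]
control = [4.2, 4.5, 4.0, 4.8, 4.4, 4.1, 4.6, 4.3]

p_value = permutation_p_value(treated, control)
print(p_value)  # small: such a difference is rare when H0 is true
```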
Proportions
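- Proportion: Part of the whole; a fraction of a total, in which the numerator is included in the denominator.
- Proportion values are limited to the range 0 to 1.
- Methods designed for means of continuous data are not directly applicable to proportions; methods based on the binomial distribution are used instead.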
Binomial Distribution
- Used for discrete variables with binary outcomes (success/failure).
- Defined by n, the number of trials, and p, the probability of success. Its shape is skewed unless p is close to 0.5 or n is large.
- It gives the probability that an outcome occurs exactly x times in n trials; summing these probabilities gives the probability of at least (or at most) x occurrences.
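The binomial probabilities can be written out directly, as in this sketch. The function names and the example (10 patients, each responding with probability 0.5) are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def prob_at_least(x, n, p):
    """Probability of x or more successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(x, n + 1))


# Hypothetical example: 10 patients, each responding with probability 0.5.
print(binomial_pmf(5, 10, 0.5))   # 252 / 1024
print(prob_at_least(8, 10, 0.5))  # (45 + 10 + 1) / 1024
```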
Confidence Intervals for Proportions
- Confidence intervals for proportions estimate the range in which the true population proportion lies, with a stated level of confidence (e.g., 95%) over repeated sampling.
- Exact methods are used when sample sizes are small, while larger sample sizes allow the use of the normal distribution methodology.
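For the large-sample case, a minimal sketch of the normal-approximation interval (often called the Wald interval) is shown below. The function name and the counts are hypothetical, and exact methods for small samples are not shown.

```python
from statistics import NormalDist


def proportion_ci_normal(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    se = (p_hat * (1 - p_hat) / n) ** 0.5           # standard error of p_hat
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    # Clip to the valid range of a proportion, 0 to 1.
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical example: 40 of 100 participants have the outcome.
low, high = proportion_ci_normal(40, 100)
print(round(low, 3), round(high, 3))
```

The approximation is known to behave poorly when n is small or the proportion is close to 0 or 1, which is where exact methods are preferred.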
Description
This quiz provides a summary of the first four lectures on biostatistics, covering key concepts such as data collection, classification, and interpretation in biomedical research. Focus is placed on the importance of sample sizes and the impact of random error on research outcomes. Understanding these foundational elements is crucial for advancing medical knowledge through scientific inquiry.