Biostatistics Lectures 1-4 Summary

Questions and Answers

What is the median in a data set?

  • The difference between the highest and lowest value
  • The average of all values divided by the number of values
  • The value that separates the lower half from the upper half (correct)
  • The most frequently occurring value
Which measure of spread is defined as the square root of the variance?

  • Interquartile range
  • Standard deviation (correct)
  • Mean absolute deviation
  • Range
Which of the following statements about outliers is true?

  • Outliers are calculated as any values further from the mean.
  • Outliers are values within the interquartile range.
  • Outliers lie beyond the whiskers in a box plot. (correct)
  • Outliers must always be removed from the dataset.
What does the interquartile range (IQR) represent?

  • The range of the middle 50% of the data (correct)

    What is true about the mode of a dataset?

      • It can be more than one value if there are multiple most frequent values. (correct)

    What does a p-value less than 0.05 in a Shapiro-Wilk test indicate?

      • There is a significant deviation from normality. (correct)

    According to the central limit theorem, what happens to the means of repeated random samples?

      • They are approximately normally distributed when the sample size is large enough, regardless of the population distribution. (correct)

    What defines the standard error (SE) in the context of the central limit theorem?

      • The standard deviation of the distribution of sample means. (correct)

    In a Q-Q plot, what does it indicate if the points lie close to the diagonal line y = x?

      • The variable is likely normally distributed. (correct)

    What are the components of the distribution of a variable in the sample?

      • Mean x̄, standard deviation s. (correct)

    What symbols are used to represent the sample mean and sample variance?

      • x̄ and s² (correct)

    In what type of data are median and quantiles considered more appropriate?

      • Skewed or non-normal data (correct)

    What does a probability distribution function f(x) specify?

      • The probabilities of different values of a random variable (correct)

    What is a characteristic of the Normal distribution?

      • Mean, median, and mode are the same (correct)

    Which statement about z-scores is correct?

      • They indicate the number of standard deviations a value is from the mean (correct)

    What percentage of probability is included between -1σ and 1σ in a Normal distribution?

      • 68% (correct)

    What type of variable is a random variable?

      • Its values are determined by random phenomena (correct)

    What is the empirical description of probability distributions based on?

      • Measures used for frequency distributions (correct)

    What does a relative frequency table provide?

      • The percentage of participants in each category (correct)

    In what situation are contingency tables primarily used?

      • To analyze the relationship between two categorical variables (correct)

    Which of the following is NOT appropriate for frequency tables?

      • Continuous variables with many categories (correct)

    When using a bar plot to illustrate categorical variables, which is essential?

      • Bars should be the same width with space in between (correct)

    What does the presence of marginal totals in a contingency table indicate?

      • The row and column totals of each variable; the variables appear unassociated only when the category-specific proportions are similar to the marginal proportions (correct)

    What is the key distinction between a histogram and a bar plot?

      • Histograms have no space between bars while bar plots do (correct)

    In a right skewed histogram, which of the following is true?

      • Mean is greater than the median (correct)

    Cumulative frequencies can be applied to which type of variable?

      • Discrete numeric and ordinal variables (correct)

    What does the Wilcoxon-Mann-Whitney test assess?

      • Whether two samples come from the same distribution (correct)

    What is represented by a p-value less than 0.05 in hypothesis testing?

      • There is evidence against the null hypothesis, suggesting significance (correct)

    Which statement accurately describes the null hypothesis (H0)?

      • Sample means differ due to random error alone (correct)

    What is the formula used to calculate the 95% Confidence Interval for a mean?

      • CI = x̄ ± z × SE, with z = 1.96 for a 95% interval (correct)

    What does a difference between population means of $d = 0$ indicate?

      • Both population means are the same (correct)

    What is the z-value associated with a 95% confidence level?

      • 1.96 (correct)

    In the context of proportions, what is crucial about the ratio X/Y?

      • The numerator should not be included in the denominator (correct)

    Which of the following assumptions must hold for the t-test?

      • Both samples must be independent. (correct)

    When calculating the standard error for the mean, what is the formula used?

      • SE = s/√n (correct)

    What should be used instead of the z-distribution for smaller samples?

      • T-distribution (correct)

    Which of the following approaches can be used to test the assumption of normality?

      • Q-Q plots and Shapiro-Wilk test (correct)

    What can be stated about the confidence interval in relation to the true mean µ?

      • It gives a range that is expected to contain µ with high confidence: about 95% of intervals built this way from repeated samples would contain µ. (correct)

    What happens when the assumptions for the t-test are not satisfied?

      • Alternative statistical methods should be considered. (correct)

    Study Notes

    • Biostatistics is the collection, classification, analysis, and interpretation of data from biomedical research. It helps create medical knowledge.
    • Science is empirical, relying on observations and experiences. Inductive reasoning draws general conclusions from specific observations.
    • Basic research and clinical research are interconnected. Clinical research uses randomized studies to avoid biased results.
    • In biostatistics, samples are studied because they are subsets of populations. The sample itself is not the main focus; the population is.
    • Samples are used to make inferences about larger populations. Larger samples have a higher likelihood of accurately reflecting the population. Small samples are more prone to random error, and biased samples misrepresent the population regardless of their size.
    • Random error (sampling error) is the difference between the sample mean and the population mean due to sampling.
    • Sample quantities are known, measured values. Population quantities are unknown and have to be estimated.
    • Clinical research involves study design, data collection, data processing, and data analysis.
    • Statistical software, like R, facilitates reproducible research, which is a crucial aspect of science. Reproducible research means the results can be verified and repeated by others.
    • Reproducible research requires using code throughout the process.
    • Rectangular data is used in most studies. This is a tabular structure where rows represent cases (observations, records) and columns represent characteristics or variables.
    • A primary key is a unique identifier for each case.

    Types of Variables

    • Categorical variables:
      • Nominal: No inherent order (e.g., blood type, sex).
      • Ordinal: Inherent order (e.g., educational level, satisfaction).
      • Dichotomous: Only two categories (e.g., yes/no, diseased/healthy).
    • Numerical variables:
      • Continuous: Measured on a continuum with infinitely many possible values (e.g., temperature).
      • Discrete: Counted, taking only separate, countable values (e.g., number of children).

    Frequency Tables

    • Frequency tables present the number or percentage of participants in each category.
    • Useful for categorical variables and grouped numeric data like "age group".
    • Can also be used for ordinal variables.
    • Cumulative frequencies can be displayed for ordered categories (ordinal or discrete numeric variables) when the number of categories is limited.
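
As a minimal sketch of the points above, the following Python snippet builds a frequency table with counts, percentages, and cumulative percentages. The `age_groups` data and the category order are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical ordinal variable: age group of ten participants.
age_groups = ["<30", "30-49", "30-49", "50+", "<30",
              "30-49", "50+", "30-49", "<30", "30-49"]
order = ["<30", "30-49", "50+"]  # categories must be ordered to cumulate

counts = Counter(age_groups)
n = len(age_groups)

table = []
cumulative = 0.0
for category in order:
    relative = 100 * counts[category] / n  # relative frequency in percent
    cumulative += relative                 # running total of percentages
    table.append((category, counts[category], relative, cumulative))

for category, count, relative, cum in table:
    print(f"{category:>6}  n={count}  {relative:5.1f}%  cumulative {cum:5.1f}%")
```

The cumulative column only makes sense because the categories have a natural order; for a nominal variable such as blood type it would be dropped.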

    Contingency Tables

    • Contingency tables (cross-tabulations) are used to explore the association between two categorical variables (e.g., exposure and outcome).
    • Marginal totals are the row and column totals of the table.
    • When the category-specific proportions are similar to the marginal proportions, the variables appear not to be associated.
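
A small sketch of a 2×2 contingency table may help here. It uses hypothetical exposure and outcome data (the names `records`, `risk`, and `marginal_risk` are illustrative assumptions) and compares the proportion diseased within each exposure group with the overall, marginal proportion.

```python
from collections import Counter

# Hypothetical (exposure, outcome) pairs for 20 participants.
records = ([("exposed", "diseased")] * 6 + [("exposed", "healthy")] * 4
           + [("unexposed", "diseased")] * 3 + [("unexposed", "healthy")] * 7)

cells = Counter(records)
rows = ["exposed", "unexposed"]
cols = ["diseased", "healthy"]

# Marginal totals: sums over each row and each column.
row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}
n = sum(row_totals.values())

# Proportion diseased within each exposure group vs. the marginal proportion.
risk = {r: cells[(r, "diseased")] / row_totals[r] for r in rows}
marginal_risk = col_totals["diseased"] / n

print(risk, marginal_risk)
```

In this made-up table the group-specific proportions (0.6 and 0.3) differ clearly from the marginal proportion (0.45), which is the pattern that points toward an association.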

    Plotting Categorical Variables

    • Pie charts and bar plots illustrate the frequencies of categories. Pie charts show each category's share of the whole; bar plots can show either counts or percentages.
    • Bar graphs, arranged horizontally or vertically, are useful to show categorical data.

    Plotting Numeric Variables

    • Histograms and box plots are used for numeric data.
    • Histograms show the distribution of data across different ranges of values or classes, graphically depicting data frequency patterns.
    • Box plots visualize the median, the quartiles, and the spread of the data. Outliers are the values that lie beyond the whiskers, that is, values that differ markedly from most of the data.

    Measures of Location

    • Mean: Average of a set of values. Sensitive to outliers.
    • Median: The middle value when data is sorted. Not sensitive to outliers.
    • Quantiles: Values that divide the data into segments based on proportions (e.g., 10th percentile, median = 50th percentile, quartiles).
    • Mode: Most frequent value.
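
The measures of location can be sketched with Python's standard `statistics` module. The `values` list is hypothetical and includes one deliberate outlier to show how the mean and the median react differently.

```python
import statistics

values = [2, 3, 3, 4, 5, 6, 40]  # hypothetical data; 40 is an outlier

mean = statistics.mean(values)        # pulled upward by the outlier
median = statistics.median(values)    # middle value of the sorted data
modes = statistics.multimode(values)  # a list, since there can be several modes
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles; q2 is the median

print(mean, median, modes, (q1, q2, q3))
```

Here the mean is 9 while the median is 4, so a single extreme value more than doubles the mean but leaves the median untouched.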

    Measures of Spread

    • Variance: The average of squared deviations from the mean. For a sample, the sum of squared deviations is divided by n − 1 rather than n.
    • Standard deviation: Square root of the variance.
    • Range: Difference between maximum and minimum values.
    • Interquartile range (IQR): Difference between 75th and 25th percentile.
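
The same module covers the measures of spread. This is a sketch on hypothetical data; note that `statistics.variance` divides by n − 1 (sample variance), while `statistics.pvariance` divides by n.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

sample_variance = statistics.variance(values)       # divides by n - 1
population_variance = statistics.pvariance(values)  # divides by n
sd = statistics.stdev(values)                       # sqrt of sample variance
value_range = max(values) - min(values)
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                                       # 75th minus 25th percentile

print(sample_variance, population_variance, sd, value_range, iqr)
```

Different software uses slightly different quantile conventions, so the IQR computed elsewhere may differ a little from this one.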

    Distribution as a Concept

    • Probability distributions describe the probability of the different values a variable can take.
    • In a density plot, the total area under the curve equals 1 (100%).

    Population versus Sample

    • The population is the entire group or collection of data of interest. Its mean and variance are unknown.
    • Samples are parts of populations. Sample means and sample variances are known and are used to estimate their population counterparts.

    Describing Numerical Variables

    • Median and quantiles are useful for skewed data.

    Probability Distributions

    • Probability distributions are used to describe variation in numeric data, giving the probability of each possible numerical outcome.
    • Probability distributions are typically empirically described by mean, standard deviation, median, and quantiles.

    The Normal Distribution

    • Defined by its mean and standard deviation (μ and σ).
    • It is symmetrical around the mean, so the mean, median, and mode coincide, which simplifies calculations.
    • A large volume of data approximately follows this curve, making it a useful analytical tool.
    • About 68%, 95%, and 99.7% of the probability falls within 1, 2, and 3 standard deviations of the mean, respectively.
    • Used in many biostatistical calculations and tests to create a standardized framework.
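
The probabilities within 1, 2, and 3 standard deviations, and the z-score, can be checked with `statistics.NormalDist`. The helper names `prob_within` and `z_score` and the blood pressure figures are hypothetical and used only for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # standard normal distribution

def prob_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))  # roughly 0.68, 0.95, 0.997

# Hypothetical example: systolic blood pressure with mean 120 and SD 15.
print(z_score(150, mu=120, sigma=15))  # 2.0
```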

    Normal Distribution Tests

    • Contextual knowledge: knowing how the variable typically behaves.
    • Shapiro-Wilk test: Determines whether a variable's distribution deviates significantly from normal. A low p-value (p < 0.05) suggests a deviation from normality.
    • Q-Q plots (quantile-quantile plots): Compare the quantiles of the variable to the quantiles of a normal distribution. Points lying close to the straight line suggest a normal distribution.
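
To make the Q-Q idea concrete, the sketch below computes the pairs of points that such a plot would show, using only the standard library. The function name `qq_points`, the sample values, and the plotting position (i + 0.5)/n are assumptions; other conventions exist. For the Shapiro-Wilk test itself, SciPy offers `scipy.stats.shapiro`, which is not used here.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with its expected normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    # Reference normal distribution with the sample's own mean and SD.
    reference = NormalDist(mean(ordered), stdev(ordered))
    # Plotting position (i + 0.5) / n is one common convention (assumed here).
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.9, 6.4]  # hypothetical measurements
for expected, observed in qq_points(sample):
    print(f"{expected:6.2f}  {observed:6.2f}")
```

If the variable is roughly normal, the two columns track each other closely, which is what points near the line y = x look like in the plot.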

    Importance of the Normal Distribution

    • Many variables approximately follow a normal distribution.
    • Central limit theorem: The means of repeated random samples are approximately normally distributed when the sample size is large enough, even if the underlying distribution isn't normal.
    • The standard error (SE = s/√n) is the standard deviation of the distribution of sample means. It measures how precisely a sample mean estimates the population mean; a smaller SE indicates greater precision.
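
A short simulation can illustrate both the central limit theorem and the standard error. This sketch assumes a strongly right-skewed population (exponential with mean 1 and SD 1); the sample size, number of repeats, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

n = 50          # size of each sample
repeats = 5000  # number of repeated samples

# Exponential population with rate 1: right-skewed, mean 1, SD 1.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(repeats)
]

mean_of_means = statistics.mean(sample_means)
observed_se = statistics.stdev(sample_means)  # SD of the sample means
expected_se = 1.0 / n ** 0.5                  # sigma / sqrt(n)

print(round(mean_of_means, 3), round(observed_se, 3), round(expected_se, 3))
```

The sample means cluster around the population mean, and their standard deviation comes out close to σ/√n, even though the individual observations are far from normal.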

    The Three Distributions

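    • Population distribution: The distribution of the variable in the entire group of interest; its mean (μ) and standard deviation (σ) are unknown.
    • Sample distribution: The distribution of the variable in the sample, a portion of the population; its mean (x̄) and standard deviation (s) are known.
    • Sampling distribution: The distribution of sample means over repeated samples; its standard deviation is the standard error (SE).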

    Confidence Intervals

    • Confidence intervals provide a range within which the true population mean is likely to fall.
    • 95% CI means that 95% of repeated intervals would contain the true value if the experiment were repeated many times.
    • For small samples, the t-distribution is used instead of the z-distribution to account for the extra uncertainty from estimating the standard deviation from the sample. The t-value replaces the z-value and is larger for smaller samples, which widens the interval.
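
The formula CI = x̄ ± z × SE can be sketched as follows. The function name `mean_ci` and the sample are hypothetical. The sketch uses the z-value throughout for simplicity; for a sample as small as this one, a t-value would be more appropriate and would give a somewhat wider interval.

```python
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Large-sample interval: mean +/- z * SE, with SE = s / sqrt(n)."""
    n = len(sample)
    se = stdev(sample) / n ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    centre = mean(sample)
    return centre - z * se, centre + z * se

# Hypothetical measurements.
sample = [5.1, 4.9, 5.6, 5.0, 5.3, 4.7, 5.2, 5.4, 5.0, 4.8]
low, high = mean_ci(sample)
print(round(low, 3), round(high, 3))
```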

    Assumptions of t-tests

    • Data comes from normally distributed populations.
    • Sample data must be independent.
    • Populations have equal standard deviations.

    Comparing Two Independent Sample Means

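    • Sample means often differ because of random error alone, but a difference may also reflect an actual difference between the population means.
    • Testing compares a null hypothesis (H0), under which the population means are equal and the sample means differ due to random error alone, with an alternative hypothesis (H1), under which the population means differ.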

    P-values and Hypothesis Testing

    • P-value: The probability of observing the data, or more extreme results, if the null hypothesis is true.
    • A small p-value (typically less than 0.05) is evidence against the null hypothesis, and we reject it in favor of the alternative hypothesis.
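
The definition of a p-value can be illustrated with an exact permutation test, which is a different technique from the t-test described in these notes and is used here only because it needs nothing beyond the standard library. The function name `permutation_p_value` and the data are hypothetical. Under H0 the group labels are interchangeable, so the p-value is the share of all possible regroupings whose difference in means is at least as extreme as the observed one.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    total = 0
    as_extreme = 0
    # Enumerate every way of assigning len(group_a) values to the first group.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total

p = permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8])
print(p)  # 2 of the 70 possible splits are as extreme, so p = 2/70
```

Full enumeration is only feasible for small groups; with larger samples a random subset of regroupings, or a parametric test such as the t-test, would normally be used.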

    Proportions

    • Proportion: Part of the whole; a fraction of a total.
    • Proportion values are limited to 0-1.
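    • The methods used for means of numeric variables cannot be applied directly to proportions; proportions need their own methods, such as those based on the binomial distribution.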

    Binomial Distribution

    • Used for discrete variables with binary outcomes (success/failure).
    • Defined by p, the probability of success, and n, the number of trials. Its shape is symmetric when p = 0.5 and skewed otherwise, with the skew fading as n grows.
    • It gives the probability that an outcome occurs exactly x times, or at least x times, in n trials.
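
The binomial probabilities can be written out directly from the formula P(X = x) = C(n, x) p^x (1 − p)^(n − x). The function names and the patient example are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def prob_at_least(x, n, p):
    """Probability of x or more successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(x, n + 1))

# Hypothetical example: 10 patients, each responding with probability 0.3.
print(round(binomial_pmf(3, 10, 0.3), 4))   # exactly 3 responders
print(round(prob_at_least(3, 10, 0.3), 4))  # 3 or more responders
```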

    Confidence Intervals for Proportions

    • Confidence intervals for proportions estimate the range in which the true population proportion lies, with a given probability for repeated trials.
    • Exact methods are used when sample sizes are small, while larger sample sizes allow the use of the normal distribution methodology.
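
For larger samples, the normal-distribution approach can be sketched as p̂ ± z × √(p̂(1 − p̂)/n). The function name `proportion_ci` and the counts are hypothetical, and this is the simple normal approximation only; the exact methods for small samples are not shown.

```python
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation interval for a proportion."""
    p_hat = successes / n
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Clip to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

# Hypothetical example: 50 of 100 participants have the outcome.
low, high = proportion_ci(50, 100)
print(round(low, 3), round(high, 3))  # 0.402 0.598
```

The approximation breaks down when n is small or p̂ is close to 0 or 1, which is where the exact methods mentioned above come in.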


    Description

    This quiz provides a summary of the first four lectures on biostatistics, covering key concepts such as data collection, classification, and interpretation in biomedical research. Focus is placed on the importance of sample sizes and the impact of random error on research outcomes. Understanding these foundational elements is crucial for advancing medical knowledge through scientific inquiry.
