Biostatistics Lectures 1-4 Summary

Questions and Answers

What is the median in a data set?

The difference between the highest and lowest value
The average of all values divided by the number of values
The value that separates the lower half from the upper half (correct)
The most frequently occurring value

Which measure of spread is defined as the square root of the variance?

Interquartile range
Standard deviation (correct)
Mean absolute deviation
Range

Which of the following statements about outliers is true?

Outliers are calculated as any values further from the mean.
Outliers are values within the interquartile range.
Outliers lie beyond the whiskers in a box plot. (correct)
Outliers must always be removed from the dataset.

What does the interquartile range (IQR) represent?

The range of the middle 50% of the data (D)

What is true about the mode of a dataset?

It can be more than one value if there are multiple most frequent values. (D)

What does a p-value less than 0.05 in a Shapiro-Wilk test indicate?

There is a significant deviation from normality. (C)

According to the central limit theorem, what happens to the means of repeated random samples?

They will be approximately normally distributed when the sample size is large enough, regardless of the population distribution. (D)

What defines the standard error (SE) in the context of the central limit theorem?

The standard deviation of the distribution of sample means. (D)

In a Q-Q plot, what does it indicate if the points lie close to the diagonal line y = x?

The variable is likely normally distributed. (D)

What are the components of the distribution of a variable in the sample?

Mean x̄, standard deviation s. (A)

What symbols are used to represent the sample mean and sample variance?

x̄ and s² (C)

In what type of data are median and quantiles considered more appropriate?

Skewed or non-normal data (C)

What does a probability distribution function f(x) specify?

The probabilities of different values of a random variable (C)

What is a characteristic of the Normal distribution?

Mean, median, and mode are the same (D)

Which statement about z-scores is correct?

They indicate the number of standard deviations a value is from the mean (C)

What percentage of probability is included between -1σ and 1σ in a Normal distribution?

68% (D)

What type of variable is a random variable?

Its values are determined by random phenomena (B)

What is the empirical description of probability distributions based on?

Measures used for frequency distributions (B)

What does a relative frequency table provide?

The percentage of participants in each category (A)

In what situation are contingency tables primarily used?

To analyze the relationship between two categorical variables (A)

Which of the following is NOT appropriate for frequency tables?

Continuous variables with many categories (A)

When using a bar plot to illustrate categorical variables, which is essential?

Bars should be the same width with space in between (C)

What does the presence of marginal totals in a contingency table indicate?

The row and column totals, which summarize each variable on its own; they do not by themselves show whether the variables are independent (B)

What is the key distinction between a histogram and a bar plot?

Histograms have no space between bars while bar plots do (A)

In a right skewed histogram, which of the following is true?

Mean is greater than the median (B)

Cumulative frequencies can be applied to which type of variable?

Discrete numeric and ordinal variables (B)

What does the Wilcoxon-Mann-Whitney test assess?

Whether two samples come from the same distribution (C)

What is represented by a p-value less than 0.05 in hypothesis testing?

There is evidence against the null hypothesis, suggesting significance (D)

Which statement accurately describes the null hypothesis (H0)?

Sample means differ due to random error alone (D)

What is the formula used to calculate the 95% Confidence Interval for a mean?

CI = x̄ ± z × SE (D)

What does a difference between population means of $d = 0$ indicate?

Both population means are the same (D)

What is the z-value associated with a 95% confidence level?

1.96 (C)

In the context of proportions, what is crucial about the ratio X/Y?

The numerator should not be included in the denominator (D)

Which of the following assumptions must hold for the t-test?

Both samples must be independent. (D)

When calculating the standard error for the mean, what is the formula used?

SE = s/√n (D)

What should be used instead of the z-distribution for smaller samples?

T-distribution (D)

Which of the following approaches can be used to test the assumption of normality?

Q-Q plots and Shapiro-Wilk test (C)

What can be stated about the confidence interval in relation to the true mean µ?

It gives a range that is expected to contain µ with a stated level of confidence (e.g., 95%). (D)

What happens when the assumptions for the t-test are not satisfied?

Alternative statistical methods should be considered. (A)

Flashcards

Frequency Table

Displays the number of participants in each category of a variable.

Relative Frequency Table

Shows the percentage of participants in each category.
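
As a rough illustration of the two table types above, the following sketch builds a frequency table and a relative frequency table for one categorical variable. The blood-group data and the variable names are made up for the example.

```python
from collections import Counter

# Hypothetical blood-group observations for six participants
blood_groups = ["A", "B", "A", "O", "A", "B"]

# Frequency table: number of participants in each category
counts = Counter(blood_groups)

# Relative frequency table: percentage of participants in each category
total = sum(counts.values())
relative = {category: 100 * n / total for category, n in counts.items()}

for category in sorted(counts):
    print(f"{category}: n = {counts[category]}, {relative[category]:.1f}%")
```

The percentages in a relative frequency table should add up to 100, which is a quick check that no category was missed.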

Contingency Table

Table showing the relationship between two categorical variables.

Categorical Variables

Variables whose values fall into distinct categories rather than numeric measurements (e.g., colors, types).

Pie Chart

Visualizes relative frequencies with slices representing percentages.

Bar Plot

Displays categorical frequency using bars.

Histogram

Graphical display for numerical data with bins.

Skewed Distribution

A distribution that is not symmetrical, with a longer tail on one side (e.g., right-skewed).

Mean

The average of a set of values, calculated by summing all values and dividing by the count.

Median

The middle value when a dataset is ordered. It's less affected by outliers than the mean.

Interquartile Range (IQR)

The difference between the 75th and 25th percentiles. A measure of data spread.
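
A minimal sketch of the median, the quartiles, and the IQR, using Python's standard `statistics` module on made-up values. The 1.5 × IQR rule for flagging outliers is an assumed box-plot convention and is not stated in the lesson; quartile definitions also vary slightly between software packages.

```python
import statistics

# Hypothetical measurements with one unusually large value
values = [2, 4, 4, 5, 7, 9, 30]

median = statistics.median(values)
# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3)
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed convention: whiskers reach at most 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(median, q1, q3, iqr, outliers)
```

Because the median and the IQR depend only on the ordering of the middle values, the single large value barely affects them, whereas it would pull the mean upward.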

Outlier

A data point significantly different from other data points in a dataset.

Frequency Distribution

A visual or mathematical representation of data. Shows how often data items fall within specific ranges.

Normal Distribution

A common probability distribution where data points cluster around a central value, forming a bell shape.

Shapiro-Wilk test

A statistical test used to determine if a dataset comes from a normally distributed population.

Q-Q Plot

A graphical tool used to assess whether a dataset is normally distributed. The sample quantiles are plotted against the theoretical quantiles of a normal distribution.

Central Limit Theorem

Sampling distributions of sample means will be approximately normally distributed, even if the original population is not normally distributed.

Standard Error (SE)

The standard deviation of the sampling distribution of sample means, indicating the variability of sample means around the population mean.
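
The central limit theorem and the standard error can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, and the 2000 repetitions are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable

n = 50             # size of each sample (assumed)
replicates = 2000  # number of repeated samples (assumed)

# Right-skewed population: exponential with mean 1 and standard deviation 1
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = 1.0 / n ** 0.5               # sigma / sqrt(n)

print(round(mean_of_means, 3), round(observed_se, 3), round(theoretical_se, 3))
```

Even though the population is strongly skewed, the sample means should cluster around the population mean, with a spread close to σ/√n.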

Population Mean

The average value of a variable in an entire population; symbolized by µ.

Population Variance

A measure of how spread out the values in a population are from the population mean. Symbolized by σ².

Sample Mean

The average value of a variable calculated from a sample of the population; symbolized by x̄.

Sample Variance

A measure of how spread out the values in a sample are from the sample mean. Symbolized by s².

Normal Distribution

A symmetrical probability distribution described by its mean (µ) and standard deviation (σ), often used to model continuous variables in populations.

z-score

A value that tells you how many standard deviations a data point is from the mean in a normal distribution, calculated as (X-µ)/σ.
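
The z-score formula (X-µ)/σ can be sketched in a few lines. The function name `z_score` and the example numbers are hypothetical; `NormalDist` comes from Python's standard `statistics` module and is used here to recover the roughly 68% of probability that lies within one standard deviation of the mean.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical example: a value of 130 where the mean is 100 and the SD is 15
z = z_score(130, 100, 15)

# Probability within one standard deviation of the mean in a normal distribution
standard_normal = NormalDist(mu=0, sigma=1)
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(z, round(within_one_sd, 4))
```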

Probability Distribution

A function that describes the probabilities of different outcomes for a random variable.

Random Variable

A variable whose value depends on the outcome of a random experiment or phenomenon. It's often symbolized with an uppercase letter like X.

Confidence Interval (CI)

A range of values that likely contains the true value of a population parameter (like the mean), with a specified level of confidence (e.g., 95%).

Standard Error (SE)

The standard deviation of the sampling distribution of a sample statistic (like the mean).

95% Confidence Interval

An interval constructed so that, across repeated samples from a population, about 95 out of 100 such intervals are expected to contain the true mean value.
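
A minimal sketch of a 95% confidence interval for a mean, computed as the sample mean ± z × SE with SE = s/√n. The sample values are made up. The z-based interval is shown because it matches the formula used in this lesson; for a sample this small the t-distribution would normally replace z, but Python's standard library does not provide t quantiles, so that step is left out.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of ten measurements
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.2, 5.7, 5.4]

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)     # sample standard deviation
se = s / n ** 0.5                # SE = s / sqrt(n)

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% confidence level
ci_lower = x_bar - z * se
ci_upper = x_bar + z * se

print(f"mean = {x_bar:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

A larger sample shrinks SE and therefore narrows the interval, which is the sense in which sample size influences the width of the confidence interval.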

t-distribution

Used to estimate population parameters for smaller sample sizes, when the population standard deviation is unknown.

One-sample t-test

A statistical method that uses the t-distribution to test whether a population mean equals a hypothesized value; the same t-distribution gives the 95% confidence interval for that mean.

Sample size (n)

The number of observations in a sample, influencing the width of the confidence interval.

Assumptions of t-tests

Requirements for the validity of a t-test, typically normality, independence, and equal variances in respective populations.

Z-value

A critical value for a given confidence level from the standard normal distribution.

Wilcoxon-Mann-Whitney Test

A non-parametric test to determine if two samples come from the same distribution, unlike the parametric t-test which assumes normality.
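
To give a feel for what the test measures, the sketch below computes only the Mann-Whitney U statistic by comparing every pair of values across the two samples. The function name and data are hypothetical, and no p-value is computed; a statistics package would normally handle that step.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): counts of pairwise wins, with ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical samples with no overlap: every value in group_b exceeds group_a
group_a = [1, 2, 3]
group_b = [4, 5, 6]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

When the two samples come from the same distribution, U_a and U_b tend to be similar; a very lopsided split, as in this example, points toward a difference between the groups.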

Null Hypothesis (H0)

The assumption that there's no difference between two population means.

Alternative Hypothesis (H1)

The claim that there is a real difference between two population means.

P-value

The probability of observing results as extreme as, or more extreme than, the ones obtained, IF the null hypothesis is true.
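
As an illustration of this definition, the sketch below turns a test statistic into a two-sided p-value under a standard normal approximation. This is an assumption made for simplicity: it applies to large samples, and the function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_at_critical = two_sided_p_value(1.96)  # statistic at the 95% critical value
p_at_zero = two_sided_p_value(0.0)       # statistic showing no difference at all
print(round(p_at_critical, 3), p_at_zero)
```

A statistic of 1.96 sits right at the conventional threshold, giving a p-value of about 0.05, which ties the z-value for a 95% confidence level to the p<0.05 significance cutoff.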

Statistical Significance (p<0.05)

The probability value (p) is less than 0.05, which indicates the observed results are unlikely to occur if the null hypothesis were true, leading to rejection of the null.

Study Notes