Probability: Sampling Distributions

A sampling distribution is the probability distribution of a statistic (such as the sample mean) across all possible samples of a given size drawn from a population.
The sampling distribution is used to make inferences about the corresponding population parameter.
Key probability concepts:
- Law of Large Numbers (LLN): the sample mean of independent observations converges to the population mean as the sample size increases.
- Central Limit Theorem (CLT): for a sufficiently large sample of independent observations from a population with finite variance, the distribution of the sample mean is approximately normal, even if the population distribution is not normal.
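
Both results can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample sizes, and the seed are assumptions chosen for the example, not part of these notes.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from `n` independent draws."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
# It is strongly skewed, so it is clearly not normal.
draw = lambda r: r.expovariate(1.0)

# LLN: the mean of one sample tends toward 1 as the sample size grows.
for n in (10, 1_000, 100_000):
    print(n, round(statistics.fmean([draw(rng) for _ in range(n)]), 3))

# CLT: means of many samples of size 50 center on the population mean,
# with spread close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
means = sample_means(draw, n=50, reps=5_000, rng=rng)
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

A histogram of `means` would look roughly bell-shaped, although each individual draw comes from a skewed distribution.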

Random sampling is a method of selecting a sample from a population by chance, so that the sample tends to be representative and selection bias is avoided.
Types of random sampling:
- Simple Random Sampling: every individual in the population, and every possible sample of the chosen size, has an equal chance of being selected.
- Stratified Random Sampling: the population is divided into subgroups (strata), and a random sample is selected from every subgroup.
- Cluster Random Sampling: the population is divided into clusters, a random sample of whole clusters is selected, and the individuals in the selected clusters are included.
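
The three schemes differ mainly in what is randomized. A minimal sketch, assuming a made-up population of 12 labelled individuals (the labels, group sizes, and sample sizes are hypothetical):

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 12 people, each tagged with a group label.
population = [(i, "A" if i < 6 else "B") for i in range(12)]

# Simple random sampling: draw 4 individuals without replacement.
srs = rng.sample(population, 4)

# Stratified sampling: split by group label, then sample within EVERY stratum.
strata = {}
for person in population:
    strata.setdefault(person[1], []).append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, 2)]

# Cluster sampling: split into clusters of 3, randomly pick 2 WHOLE clusters,
# and keep everyone inside the chosen clusters.
clusters = [population[i:i + 3] for i in range(0, len(population), 3)]
chosen = rng.sample(clusters, 2)
cluster_sample = [p for cluster in chosen for p in cluster]

print(srs)
print(stratified)
print(cluster_sample)
```

Note that the stratified sample always contains members of both groups, while the cluster sample may miss a group entirely if no selected cluster contains it.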

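A confidence interval is a range of values, computed from a sample, that is likely to contain the population parameter.
Confidence level: the long-run proportion of intervals, built the same way from repeated samples, that contain the population parameter (e.g., at a 95% confidence level, about 95% of such intervals will contain the parameter). It is not the probability that one particular computed interval contains the parameter.
Margin of error: the half-width of the confidence interval, i.e., the largest amount by which the sample statistic is expected to differ from the population parameter at the stated confidence level.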
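
The repeated-sampling meaning of the confidence level can be illustrated by simulation. This is a sketch under stated assumptions: a normal population with mean 10 and standard deviation 3, samples of size 40, and a normal-approximation interval (a t-based interval would be slightly wider at this sample size). The helper name `mean_ci` is made up for the example.

```python
import math
import random
import statistics

def mean_ci(data, level=0.95):
    """Normal-approximation confidence interval for the mean.

    Returns (lower, upper, margin_of_error).
    """
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    center = statistics.fmean(data)
    margin = z * statistics.stdev(data) / math.sqrt(len(data))
    return center - margin, center + margin, margin

rng = random.Random(2)  # fixed seed so the run is repeatable
true_mean = 10.0        # assumed population mean
trials = 2_000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, 3.0) for _ in range(40)]
    low, high, _ = mean_ci(sample)
    hits += low <= true_mean <= high  # True counts as 1

coverage = hits / trials
print(coverage)  # fraction of intervals that contain the true mean
```

The printed coverage should land near the nominal 0.95, which is what the confidence level describes: the behaviour of the procedure across many samples.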

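Bias: a systematic error in an estimator, measured as the difference between the expected value of the sample statistic and the population parameter.
Bias estimation and correction techniques:
- Bootstrapping: resampling the data with replacement, recomputing the statistic on each resample, and comparing the average of those values with the original estimate to estimate the bias.
- Jackknifing: recomputing the statistic with one observation left out at a time and using the leave-one-out values to estimate the bias.
In both cases the estimated bias is subtracted from the original estimate to obtain a bias-corrected estimate.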
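
As an illustration, both techniques can be applied to the plug-in variance (the version that divides by n), which is known to underestimate the population variance. This is a minimal sketch; the data, the number of resamples, and the helper names `jackknife_bias` and `bootstrap_bias` are assumptions made for the example.

```python
import random
import statistics

def jackknife_bias(data, stat):
    """Jackknife bias estimate: (n - 1) * (mean of leave-one-out stats - full stat)."""
    n = len(data)
    full = stat(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    return (n - 1) * (statistics.fmean(leave_one_out) - full)

def bootstrap_bias(data, stat, reps, rng):
    """Bootstrap bias estimate: mean of resampled stats - full stat."""
    full = stat(data)
    boots = [stat(rng.choices(data, k=len(data))) for _ in range(reps)]
    return statistics.fmean(boots) - full

rng = random.Random(3)  # fixed seed so the run is repeatable
data = [rng.gauss(0.0, 2.0) for _ in range(20)]
plug_in = statistics.pvariance  # divides by n, so it is biased downward

jack = jackknife_bias(data, plug_in)
boot = bootstrap_bias(data, plug_in, reps=2_000, rng=rng)

print("plug-in estimate:     ", plug_in(data))
print("jackknife bias:       ", jack)
print("bootstrap bias:       ", boot)
print("jackknife-corrected:  ", plug_in(data) - jack)
print("unbiased (n - 1) form:", statistics.variance(data))
```

For this particular statistic the jackknife correction reproduces the usual n - 1 sample variance, which makes it a convenient sanity check; the bootstrap estimate is random and only approximates the same correction.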

A hypothesis test is a statistical procedure for assessing whether sample data provide enough evidence to reject a hypothesis about a population parameter. It does not prove a hypothesis true or false.
Null hypothesis (H0): a statement of no effect or no difference.
Alternative hypothesis (H1): a statement of an effect or a difference.
Test statistic: a numerical value computed from the sample and used to decide whether to reject or fail to reject the null hypothesis.
P-value: the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually observed.
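
A small worked case may help connect these terms. The sketch below assumes a hypothetical experiment, 60 heads in 100 coin flips, with H0: the coin is fair (p = 0.5) and H1: it is not. The test statistic is the number of heads, and the helper `binom_two_sided_p` is a made-up name for an exact binomial calculation.

```python
from math import comb

def binom_two_sided_p(k, n, p0=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: p = p0.

    Sums the probability of every outcome that is no more likely
    than the observed one (i.e. at least as extreme).
    """
    pmf = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-12)  # small tolerance for floating-point ties
    return sum(q for q in pmf if q <= cutoff)

alpha = 0.05                       # assumed significance level
p_value = binom_two_sided_p(60, 100)
reject = p_value < alpha

print(round(p_value, 4), "reject H0" if reject else "fail to reject H0")
```

Here the p-value is slightly above 0.05, so at that significance level the data do not justify rejecting H0. That is a statement about the strength of evidence, not a demonstration that the coin is fair.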

A percentile is a value below which a given percentage of the data falls.
Commonly used percentiles (the quartiles):
- 25th percentile (Q1): the value below which 25% of the data falls.
- 50th percentile (median): the value below which 50% of the data falls.
- 75th percentile (Q3): the value below which 75% of the data falls.
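
Quartiles can be computed with Python's standard library. The data below are made up for illustration. Different interpolation conventions give slightly different answers on small data sets, so the sketch shows both methods that `statistics.quantiles` supports.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data set

# n=4 asks for the three cut points that split the data into quarters.
# "inclusive" treats the data as the whole population (min and max included).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# "exclusive" (the default) treats the data as a sample from a larger population.
q1_ex, median_ex, q3_ex = statistics.quantiles(data, n=4, method="exclusive")

print(q1, median, q3)
print(q1_ex, median_ex, q3_ex)
print(statistics.median(data))  # the 50th percentile agrees under both methods
```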
