Statistics: Descriptive and Inferential

Descriptive Statistics

Definition: Summarizes and describes the features of a data set.
Key Measures:
- Central Tendency: Indicates the center of a data set.
  - Mean: Average of all data points.
  - Median: Middle value when data points are ordered (the average of the two middle values when the count is even).
  - Mode: Most frequently occurring value.
- Dispersion: Describes the spread of data.
  - Range: Difference between the maximum and minimum values.
  - Variance: Average squared deviation from the mean (divide by n for a population, by n - 1 for a sample).
  - Standard Deviation: Square root of the variance; the typical distance of data points from the mean, in the same units as the data.
- Shape of Distribution:
  - Skewness: Measure of asymmetry in data distribution.
  - Kurtosis: Measure of the "tailedness" of the distribution.
Visual Representation:
- Histograms: Adjacent bars showing how many data points fall into each interval (bin) of a numeric variable.
- Box Plots: Visual representation of the median, quartiles, and outliers.
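
The measures of central tendency and dispersion listed above can be sketched with Python's standard `statistics` module. This is a minimal illustration, not a prescribed method; the data values are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # sum of values divided by their count
median = statistics.median(data)      # averages the two middle values when n is even
mode = statistics.mode(data)          # most frequently occurring value
data_range = max(data) - min(data)    # maximum minus minimum

# Population formulas divide by n; sample formulas divide by n - 1.
pop_variance = statistics.pvariance(data)
pop_std = statistics.pstdev(data)
sample_variance = statistics.variance(data)

print(mean, median, mode, data_range)
print(pop_variance, pop_std, sample_variance)
```

For this data the mean is 5, the median 4.5, the mode 4, the range 7, the population variance 4, and the population standard deviation 2. The sample variance is slightly larger (32/7) because it divides by n - 1.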

Inferential Statistics

Definition: Makes predictions or inferences about a population based on a sample.
Key Concepts:
- Population vs. Sample:
  - Population: Entire group of interest.
  - Sample: Subset of the population used for analysis.
- Estimation:
  - Point Estimation: Single value estimate of a population parameter.
  - Interval Estimation: Range of values, usually expressed as a confidence interval.
- Hypothesis Testing:
  - Null Hypothesis (H0): Statement of no effect or no difference.
  - Alternative Hypothesis (H1): Statement indicating the presence of an effect or difference.
  - Type I Error: Rejecting a true null hypothesis.
  - Type II Error: Failing to reject a false null hypothesis.
- P-Value: Probability of obtaining results at least as extreme as the observed results, under the assumption that the null hypothesis is true.
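- Confidence Levels: Commonly used levels are 90%, 95%, and 99%. The level is the long-run proportion of intervals, built the same way from repeated samples, that would contain the true parameter; it is not the probability that one particular computed interval contains it.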
Techniques:
- T-tests: Compare means between two groups.
- ANOVA: Compare means among three or more groups.
- Regression Analysis: Examines relationships between variables.
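
As one possible illustration of comparing two group means, the sketch below computes the test statistic for Welch's t-test, a variant that does not assume equal variances. The notes above do not say which t-test variant is meant, so Welch's is an assumption here, and the function name `welch_t` and the two groups are hypothetical. Turning the statistic into a p-value needs the t distribution, which the standard library does not provide, so that step is left out.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb                                  # squared standard error
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.4, 4.9, 4.6]

t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)
```

A large absolute value of the statistic, relative to the t distribution with `df` degrees of freedom, is evidence against the null hypothesis of equal means.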

Descriptive Statistics

Summarizes and describes features of a data set, allowing for an understanding of its main characteristics.
Central Tendency measures:
- Mean: Average calculated by summing all data points and dividing by the number of points.
- Median: The middle value when data points are arranged in order, dividing the data set into two equal halves.
- Mode: The value that appears most frequently in the data set.
Dispersion measures the spread of the data:
- Range: The difference between the maximum and minimum values, indicating how far apart values are.
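- Variance: Average of the squared deviations from the mean; larger values mean the data are more spread out. A population variance divides by n, a sample variance by n - 1.
- Standard Deviation: The square root of the variance, expressing the typical distance of data points from the mean in the same units as the data.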
Shape of Distribution includes:
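- Skewness: Quantifies the asymmetry of the data distribution. Positive skew means a longer right tail, negative skew a longer left tail, and a value near zero suggests symmetry.
- Kurtosis: Measures the "tailedness" of the distribution, that is, how heavy its tails are. High kurtosis indicates that extreme values (outliers) are more likely; it is not a measure of how sharp the peak is.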
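
Skewness and kurtosis can be computed from standardized moments. The sketch below uses the population (moment-based) definitions and reports excess kurtosis, which subtracts 3 so that a normal distribution scores 0. Other software may apply sample corrections and give slightly different numbers. The function names and data are hypothetical.

```python
import statistics

def skewness(data):
    """Population skewness: the third standardized moment."""
    n = len(data)
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum((x - mu) ** 3 for x in data) / (n * sigma ** 3)

def excess_kurtosis(data):
    """Population excess kurtosis: the fourth standardized moment minus 3."""
    n = len(data)
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum((x - mu) ** 4 for x in data) / (n * sigma ** 4) - 3

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The symmetric data set has skewness 0 and negative excess kurtosis (light tails), while the single large value in the second data set produces positive skewness.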
Visual Representations of data distributions:
- Histograms: Adjacent bars that display the frequency distribution of the data, showing how many data points fall into each interval (bin).
- Box Plots: Illustrate median, quartiles, and potential outliers, providing a visual summary of the data's distribution.
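
The numbers behind both plots can be computed without a plotting library. The sketch below derives the quartiles and outlier fences a box plot would draw, then counts values per bin as a histogram would. The 1.5 × IQR fence rule is a common convention assumed here, not something the notes above specify, and the data and bin width are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical data; 30 is a deliberate outlier.
data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 30]

# Box plot ingredients: quartiles and the 1.5 * IQR fences.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Histogram ingredients: how many values fall into each equal-width bin.
bin_width = 5
counts = Counter((x // bin_width) * bin_width for x in data)

print("quartiles:", q1, q2, q3, "outliers:", outliers)
for start in sorted(counts):
    print(f"[{start}, {start + bin_width}): {'#' * counts[start]}")
```

Different quartile conventions exist, so `q1` and `q3` may differ slightly between tools; the middle quartile always equals the median.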

Inferential Statistics

Enables predictions or inferences about a broader population based on analysis of a sample.
Population vs. Sample distinction:
- Population: The complete group of individuals or items being studied.
- Sample: A smaller subset selected from the population, used for analysis to draw conclusions about the population.
Estimation techniques:
- Point Estimation: A single value estimate of a population parameter, providing a best guess.
- Interval Estimation: A range of values (confidence interval) constructed from the sample so that it captures the true population parameter at a stated confidence level.
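
Point and interval estimation for a population mean can be sketched as follows. The interval uses a normal approximation with a z critical value; for small samples a t critical value would usually be preferred, but the standard library has no t distribution, so the normal version is an assumption made for simplicity. The function name and sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Point estimate and normal-approximation interval for the population mean."""
    n = len(sample)
    point_estimate = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)

# Hypothetical sample.
sample = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9, 12.2, 12.5, 11.7, 12.1]

estimate, (low, high) = mean_confidence_interval(sample)
print(estimate, low, high)
```

Raising the confidence level widens the interval: a 99% interval computed from the same sample is wider than the 90% one.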
Hypothesis Testing framework:
- Null Hypothesis (H0): Assumes no effect or difference exists; it is the claim the test looks for evidence against.
- Alternative Hypothesis (H1): Proposes that there is an effect or difference.
- Type I Error: Incorrectly rejecting a true null hypothesis (false positive).
- Type II Error: Failing to reject a false null hypothesis (false negative).
P-Value: The probability of observing results at least as extreme as the ones obtained, assuming the null hypothesis is true. A p-value below the chosen significance level (commonly 0.05) is taken as statistically significant.
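
One way to make the p-value definition concrete is an exact permutation test, a technique not named in these notes and used here only as an illustration. If the null hypothesis of no difference holds, group labels are interchangeable, so the p-value is the fraction of all relabelings whose difference in means is at least as extreme as the observed one. The function name and data are hypothetical, and the approach is practical only for small samples.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    observed = abs(fmean(a) - fmean(b))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of assigning len(a) of the pooled values to group a.
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        count += 1
        # A small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / count

# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.4, 4.9, 4.6]

p_value = permutation_p_value(group_a, group_b)
print(p_value)
```

The observed labeling always counts as one of the extreme cases, so the p-value can never be exactly zero.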
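Confidence Levels: Common levels are 90%, 95%, and 99%.
- The level is the long-run proportion of intervals, built the same way from repeated samples, that would contain the population parameter. It is not the probability that one particular computed interval contains it.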
Statistical Techniques for analysis include:
- T-tests: Used to compare means between two groups, testing whether the difference is statistically significant.
- ANOVA (Analysis of Variance): Used for comparing means among three or more groups.
- Regression Analysis: Investigates relationships between variables, helping in prediction and modeling.
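
The core calculations behind one-way ANOVA and simple linear regression can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and data are hypothetical, the ANOVA part stops at the F statistic (a p-value needs the F distribution, which the standard library lacks), and the regression is ordinary least squares with a single predictor.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """F statistic: between-group variance divided by within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: three groups for ANOVA, one (x, y) series for regression.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = simple_linear_regression(xs, ys)

print(f_stat, slope, intercept)
```

Here the F statistic is 13, and because the points lie exactly on a line the fit recovers slope 2 and intercept 1. An F statistic near 0 would indicate group means that barely differ relative to the variation within groups.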