CHS 729 Module 1 Review - Inferential Statistics - Lecture Notes

Summary

This document reviews Module 1, focusing on inferential statistics and the role of probability in understanding data. It highlights the normal distribution and the assumptions that should be checked when analyzing data. It also covers key concepts of the Frequentist and Bayesian approaches to probability.

Full Transcript

Foundations -- Inference & Theory
=================================

Why Do We Use Statistics?
-------------------------

### Human Bias & Common-Sense Fallibility

- People struggle to evaluate evidence impartially due to **belief bias**: we tend to accept arguments that align with our prior beliefs, even if they are **logically invalid**.
- Statistics acts as a **safeguard** against these biases by providing objective, quantitative methods to test claims.

### The Limits of Common Sense

- Our **intuitive reasoning** is shaped by language, culture, and evolutionary pressures, not by the need to conduct rigorous scientific analyses.
- Without statistics, we risk **confirmation bias**, seeing patterns that aren't real, and making decisions based on flawed logic.

### Cautionary Tale: Simpson's Paradox

- Aggregated statistics can **mislead** if we don't account for underlying factors.
- **1973 UC Berkeley admissions case**: aggregate figures suggested a bias against women applicants, but the gap largely disappeared when admissions were examined department by department, because women tended to apply to more competitive departments.
- **Lesson**: Always question how data are grouped. Summary statistics alone can't tell the full story.

Inferential Statistics: Estimating the Unknown
----------------------------------------------

### Why Inferential Statistics?

- We use samples because **recruiting an entire population is impractical** (e.g., studying the smoking behaviors of everyone in a country is not feasible, so we rely on a sample).
- **The goal**: Determine whether observed effects (signals) in our sample reflect **true population patterns** or are simply **random fluctuations (noise)**.

### The Logic of Falsification in Statistics

- Three-Step Process of Statistical Inference:
  1. **Assume no effect exists (null hypothesis: H₀).**
  2. **Measure the observed signal and noise in the sample** (e.g., mean, difference, or association).
  3. **Assess how probable data at least this extreme would be under H₀.**
- If the probability is **very low**, we may reject H₀, suggesting a real effect exists.
- **Coin Flip Example**:
  1. If a coin is flipped 100 times and lands on **heads 55 times**, is it truly biased?
  2. We assume **H₀: the coin is fair** and calculate how **probable** a result at least as extreme as 55 heads is **by random chance**.
  3. If the probability is low, we might **reject the null hypothesis** and suspect bias.

### The Role of Probability in Inferential Statistics

- **Probability vs. Statistics**:
  - **Probability**: Starts with a known model (e.g., fair coin) and predicts event likelihoods.
  - **Statistics**: Starts with data and **infers** the model that likely produced them.
- **Key Probability Concepts**:
  - **Frequentist View**: Probability is the **long-run frequency** of an event (e.g., as a fair coin is flipped more and more times, the proportion of heads approaches 50%).
  - **Bayesian View**: Probability represents a **degree of belief**, which can be updated with new evidence.

### The Normal Distribution

- Many natural variables approximately follow a **normal distribution**.
- Properties:
  - **Mean (μ)**: The central tendency.
  - **Standard Deviation (σ)**: The spread of data.
- **Empirical Rule**:
  - ~68% of values fall within **1σ of the mean**.
  - ~95% within **2σ**.
  - ~99.7% within **3σ**.
- The normal distribution is foundational for **hypothesis testing** and many statistical methods.

### Understanding Area Under the Curve (AUC)

- Probability as Area: If data follow a normal distribution, we can calculate the probability that a value falls in a given range by **measuring the area under the curve** over that range.
- Example:
  - If someone is **6'6" tall**, how rare is that?
  - **Using the normal distribution**, we can determine how often people exceed that height: it is the area under the curve to the right of 6'6".

### Statistical Assumptions and Their Pitfalls

- **Randomness & Representativeness**
  - A study's conclusions are only as good as the **sampling method**.
  - Example: If a poll only surveys **people who answer landlines**, it may **over-represent** older demographics and bias the estimate toward their views.
- **The Role of Distributions**
  - We assume certain distributions (e.g., normal, binomial) to make statistical inferences.
  - **Not all data fit a normal distribution**, so checking assumptions is **critical**.
- **The Importance of Context**
  - Statistics provide tools, but **interpretation requires critical thinking**.
  - Example: A **low p-value** does **not** always mean a result is meaningful. It could reflect **sample size effects** (with a very large sample, even a trivially small effect can produce a low p-value) or **violations of assumptions**.

Key Takeaways
-------------

- **Inferential statistics** allow us to estimate population patterns from samples, accounting for uncertainty.
- **Probability** underpins statistical inference, helping us assess how likely results are under different assumptions.
- **Frequentist and Bayesian** approaches offer different ways of thinking about probability.
- **Data must be interpreted carefully**: statistical tools help, but human reasoning and context are essential.
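
The coin-flip example, the empirical rule, and the height question from these notes can each be checked numerically. The following is a minimal Python sketch, not part of the original lecture. The function names are made up for illustration, the p-value is computed as an exact two-sided binomial tail (one reasonable reading of "how probable it is by random chance"), and the height mean and standard deviation are hypothetical placeholder values, not real population figures.

```python
import math


def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5).

    Adds up the probability of every outcome that is at least as far
    from the expected count (flips / 2) as the observed number of heads.
    """
    expected = flips / 2
    distance = abs(heads - expected)
    # Under a fair coin, each specific sequence has probability 1 / 2**flips,
    # and there are comb(flips, k) sequences with exactly k heads.
    extreme = sum(
        math.comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= distance
    )
    return extreme / 2 ** flips


def normal_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def normal_upper_tail(x: float, mu: float, sigma: float) -> float:
    """Area under a normal(mu, sigma) curve to the right of x."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


# Coin-flip example: 55 heads in 100 flips.
print(f"two-sided p-value: {binomial_two_sided_p(55, 100):.3f}")

# Empirical rule: areas within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sigma: {normal_within(k):.4f}")

# Height example: 6'6" is 78 inches. ASSUMED (hypothetical) parameters:
# mean 70 inches, standard deviation 3 inches.
print(f"share taller than 6'6\": {normal_upper_tail(78, 70, 3):.4f}")
```

For 55 heads in 100 flips the two-sided p-value comes out at roughly 0.37, so this result alone gives little reason to reject the hypothesis that the coin is fair. The empirical-rule lines reproduce the ~68%, ~95%, and ~99.7% figures, and the height line only shows the mechanics of a tail area, since its inputs are placeholders.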
