Statistics in a Nutshell PDF
UCAM Universidad Católica de Murcia
Fernando Cánovas
Summary
This document is a lecture covering biostatistics, epidemiology, and public health, focusing on statistical inference. It explores the normal distribution, central limit theorem, hypothesis contrast, and confidence intervals.
Full Transcript
Statistics in a nutshell
Biostatistics, Epidemiology and Public Health
Fernando Cánovas
Bachelor's Degree in Dentistry

Contents
- The normal distribution
- Repeated sampling
- Central Limit Theorem
- Hypothesis contrast
- P-value
- Confidence intervals

Variables: standardization
Transformation of multivariate quantities to highlight or reduce dimension:
$Z = \dfrac{x - \mu}{\sigma}$

The normal distribution: standard normal
[Figure: the standard normal curve.]

The normal distribution: generalities
- Also called the Gaussian, Gauss, Laplace–Gauss or bell curve.
- Often used in the natural and social sciences.
- Mean, median and mode coincide.
- Symmetrical about the mean.
- Density: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$

The normal distribution: understanding quantities
Three quantities that follow a normal distribution:
- weight of a population of tigers ($\mu = 500$, $\sigma = 100$)
- IQ of the Spanish population ($\mu = 160$, $\sigma = 10$)
- hardness of lunar rocks ($\mu = 20$, $\sigma = 2$)

For $Z = 0.5$, $P(Z > 0.5) = 0.31$, so 31%
- of tigers will weigh more than 550 kg
- of Spaniards will score higher than 165
- of rocks will have a hardness higher than 21

Exercise: for $Z = 3$, what is $P(Z > 3)$, and above which weight, IQ score and hardness does that fraction of tigers, Spaniards and rocks lie?
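The standardization formula and the Z = 0.5 figures above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the helper names `upper_tail` and `threshold` are hypothetical, and the (mu, sigma) pairs are the ones given on the slide.

```python
from statistics import NormalDist


def upper_tail(z: float) -> float:
    """Return P(Z > z) for a standard normal variable Z."""
    return 1.0 - NormalDist().cdf(z)


def threshold(mu: float, sigma: float, z: float) -> float:
    """Return the value x located z standard deviations above the mean.

    This inverts Z = (x - mu) / sigma, giving x = mu + z * sigma.
    """
    return mu + z * sigma


# (mu, sigma) pairs as stated on the slide
populations = {
    "tiger weight (kg)": (500, 100),
    "IQ score": (160, 10),
    "lunar rock hardness": (20, 2),
}

z = 0.5
p = upper_tail(z)
print(f"P(Z > {z}) = {p:.2f}")
for name, (mu, sigma) in populations.items():
    print(f"{name}: {p:.0%} lie above {threshold(mu, sigma, z):g}")
```

Changing `z` to another value reuses the same two steps: one tail probability shared by all three quantities, and one threshold per quantity.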
The normal distribution: understanding quantities
For $Z = 3$, $P(Z > 3) = 0.0013$, so 0.13% (13 in 10,000)
- of tigers ($\mu = 500$, $\sigma = 100$) will weigh more than 800 kg
- of Spaniards ($\mu = 160$, $\sigma = 10$) will score an IQ higher than 190
- of lunar rocks ($\mu = 20$, $\sigma = 2$) will have a hardness higher than 26

Statistics: inference
Statistical inference goes from the SAMPLE (statistics: $M$, $p$, $S^2$) up to the POPULATION (parameters: $\mu$, $\pi$, $\sigma^2$). Its two tools are the hypothesis contrast and the confidence interval.

Statistics: notes on sampling
Statistical inference is based on distributions in which repeated samples of the same size are compared. Sampling can be done
- with replacement
- without replacement

Statistics: inference
[Figure sequence: difference between distributions (population vs. samples), mean $\mu = 25.5$. With sample size $n = 4$, the spread of the sample means is the standard error ($S = SE$); the same comparison is then shown for $n = 100$.]

Inference: Central Limit Theorem
When independent random variables are added, their properly normalized sum tends toward a normal distribution, even if the original variables themselves are not normally distributed.
[Figure sequence: difference between distributions (population vs. samples), sample size $n = 30$, mean $\mu = 25.5$.]

Inference: hypothesis statement
A hypothesis is an unproved theory that is formulated as a starting point for an investigation.
Example: patients who take treatment A heal earlier than non-treated patients, or, put briefly, treatment A is effective.

Inference: sceptical attitude
$H_0$, the null hypothesis. Example: treatment A is NOT effective.

Inference: hypothesis statement
Healing time without treatment is 30 days. Is treatment A effective? ⇒ Will the patient heal earlier?

Inference: hypothesis contrast
Are the data compatible with the hypothesis?
[Figure sequence: observed data placed against the distribution expected under the hypothesis.]
The P-value (probability value or asymptotic significance) is the probability, for a given statistical model and assuming that the null hypothesis is true, that the statistical summary would be greater than or equal to the actually observed result.
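The P-value definition above can be made concrete by simulating the null hypothesis directly: draw many samples from a population in which H0 holds and count how often the sample mean is at least as large as the observed one. The sketch below is an illustration, not the lecture's own code. The inputs (µ0 = 25.5, σ = 9, n = 36, observed mean 26.7) are illustrative values chosen to match the lecture's first worked example, while the function names, the 20,000 iterations and the seed are assumptions of this sketch.

```python
import random
from statistics import NormalDist, fmean


def simulated_p_value(observed_mean, mu0, sigma, n, iterations=20_000, seed=1):
    """Estimate P(M >= observed_mean | mu = mu0) by repeated sampling.

    Each iteration draws n values from a normal population in which the
    null hypothesis is true and records whether the sample mean reaches
    the observed mean.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        sample_mean = fmean([rng.gauss(mu0, sigma) for _ in range(n)])
        if sample_mean >= observed_mean:
            hits += 1
    return hits / iterations


def normal_p_value(observed_mean, mu0, sigma, n):
    """Same upper-tail probability from the normal model of the sample mean."""
    se = sigma / n ** 0.5              # standard error of the mean
    z = (observed_mean - mu0) / se     # standardized observed mean
    return 1.0 - NormalDist().cdf(z)


p_sim = simulated_p_value(26.7, mu0=25.5, sigma=9, n=36)
p_norm = normal_p_value(26.7, mu0=25.5, sigma=9, n=36)
print(f"simulated P-value:    {p_sim:.3f}")
print(f"normal-model P-value: {p_norm:.3f}")
```

The two numbers should agree to within simulation noise; raising `iterations` narrows the gap at the cost of run time.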
Inference: hypothesis contrast
Hypothesis: the hidden animal (A) is a tiger.
- A has 'wings' ⇒ rejection. The data are incompatible: state that it is not a tiger.
- A has '4 legs' ⇒ no rejection. The data are compatible: do not state that it is not a tiger; it is probably a tiger.
- A weighs '900 kg' ⇒ rejection. The data are incompatible: state that it is not a tiger.
- A weighs '290 kg' ⇒ no rejection. The data are compatible: do not state that it is not a tiger; it is probably a tiger.

Hypothesis testing: Example 1
- Treatment + exercises A: $\mu = 25.5$ and $\sigma = 9$ days
- Treatment + exercises B: $M = 26.7$
- Hypothesis under study: exercises B increase time to symptomatology (means are not equal)
- Contrast under $H_0$: $P(M > 26.7 \mid \mu = 25.5)$
Procedure: simulate a population with $\mu = 25.5$ and $\sigma = 9$, then sample with $n = 36$ over 1,000,000 iterations.
The sample means are centred on $M = 25.5$ with $SE = \dfrac{9}{\sqrt{36}} = 1.5$.
$P\text{-value} = P(M > 26.7 \mid \mu = 25.5) = P(Z > 0.8) = 0.212$ (too high!)
⇒ The data are compatible with the hypothesis ⇒ the means are equal, or time to symptomatology is the same.

Hypothesis testing: the transposed conditional fallacy
$\Pr(\text{observation} \mid \text{hypothesis}) \neq \Pr(\text{hypothesis} \mid \text{observation})$
The probability of observing a result given that some hypothesis is true is not equivalent to the probability that a hypothesis is true given that some result has been observed.

Hypothesis testing: Example 2
- Treatment + exercises A: $\mu = 25.5$ and $\sigma = 9$ days
- Treatment + exercises B: $M = 28.7$
- Hypothesis under study: exercises B increase time to symptomatology (means are not equal)
- Contrast under $H_0$: $P(M > 28.7 \mid \mu = 25.5)$
Procedure: simulate a population with $\mu = 25.5$ and $\sigma = 9$, then sample with $n = 36$ over 1,000,000 iterations.
The sample means are centred on $M = 25.5$ with $SE = \dfrac{9}{\sqrt{36}} = 1.5$.
$P\text{-value} = P(M > 28.7 \mid \mu = 25.5) = P(Z > 2.13) = 0.02$ (very low!)
⇒ The data are incompatible with the hypothesis ⇒ the means are not equal; moderate evidence that time is higher.

Hypothesis testing: Example 3
- Treatment + exercises A: $\mu = 25.5$ and $\sigma = 3$ days
- Treatment + exercises B: $M = 27.5$
- Hypothesis under study: exercises B increase time to symptomatology (means are not equal)
- Contrast under $H_0$: $P(M > 27.5 \mid \mu = 25.5)$
Procedure: simulate a population with $\mu = 25.5$ and $\sigma = 3$, then sample with $n = 36$ over 1,000,000 iterations.
The sample means are centred on $M = 25.5$ with $SE = \dfrac{3}{\sqrt{36}} = 0.5$.
$P\text{-value} = P(M > 27.5 \mid \mu = 25.5) = P(Z > 4) = 0.00003$ (extremely low!)
⇒ The data are incompatible with the hypothesis ⇒ the means are not equal; strong evidence that time is higher.

Hypothesis testing: establishing conclusions
Statement: we performed an experiment to find out whether there was an effect; according to the results, was there an effect?
- The lower the P-value, the stronger the evidence.
- Reporting the P-value is better than establishing a threshold (significance).

Confidence intervals: around the statistic
$95\%\,CI = \text{statistic} \pm 2 \cdot SE$, with $SE = \dfrac{S}{\sqrt{n}}$

Confidence intervals: around the mean
How many of those $M \pm 2 \cdot SE$ intervals will contain the true value of the population mean $\mu$?
95% (5% of the sampled means will lie more than $2 \cdot SE$ away from $\mu$).
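The coverage claim for M ± 2·SE intervals can be explored with a small simulation: draw many samples from a population with a known mean, build the interval from each sample, and count how often it contains that mean. This is a rough sketch under stated assumptions. The population values (µ = 25.5, σ = 9) and n = 36 are borrowed from the lecture's worked examples, the population is assumed normal, and the function name `coverage`, the 5,000 repetitions and the seed are hypothetical choices.

```python
import random
from statistics import fmean, stdev


def coverage(mu, sigma, n, repetitions=5_000, seed=1):
    """Fraction of M +/- 2*SE intervals that contain the true mean mu.

    SE is computed from each sample as S / sqrt(n), where S is the
    sample standard deviation, following the slide's formula.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = fmean(sample)
        se = stdev(sample) / n ** 0.5
        if m - 2 * se <= mu <= m + 2 * se:
            covered += 1
    return covered / repetitions


rate = coverage(mu=25.5, sigma=9, n=36)
print(f"intervals containing mu:     {rate:.1%}")
print(f"intervals not containing mu: {1 - rate:.1%}")
```

The multiplier 2 is a rounded value, so the simulated rate is expected to land close to, rather than exactly on, 95%.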
Confidence intervals: around the mean
How many of those $M \pm 2 \cdot SE$ intervals will not contain the true value of the population mean $\mu$?
5% of the sampled intervals; 1 in 20.

Hypothesis testing: Example 3 (with confidence interval)
- Treatment + exercises A: $\mu = 25.5$ and $\sigma = 3$ days
- Treatment + exercises B: $M = 27.5$
- Hypothesis under study: exercises B increase recovery time (means are not equal)
- Contrast under $H_0$: $P(M > 27.5 \mid \mu = 25.5)$
Procedure: simulate a population with $\mu = 25.5$ and $\sigma = 3$, then sample with $n = 36$ over 1,000,000 iterations.
The sample means are centred on $M = 25.5$ with $SE = \dfrac{3}{\sqrt{36}} = 0.5$.
$P\text{-value} = P(M > 27.5 \mid \mu = 25.5) = P(Z > 4) = 0.00003$ (extremely low!)
⇒ The data are incompatible with the hypothesis ⇒ the means are not equal; strong evidence that time is higher.
$95\%\,CI = M \pm 2 \cdot SE = 27.5 \pm 2 \cdot 0.5 = [26.5, 28.5]$, an interval that does not contain $\mu = 25.5$.

Bibliography
Detels R., Gulliford M., Abdool Karim Q., and Tan C. (2017). Oxford textbook of global public health (6th ed.). Oxford University Press. DOI: 10.1093/med/9780199661756.001.0001