# Statistical Formulae and Distributions

## Discrete Distributions

### Bernoulli Distribution

* $P(X = x) = p^x(1-p)^{1-x}$, $x = 0, 1$
* $E(X) = p$
* $Var(X) = p(1-p)$

### Binomial Distribution

* $P(X = x) = {n \choose x} p^x(1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$
* $E(X) = np$
* $Var(X) = np(1-p)$

### Poisson Distribution

* $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, 2, \ldots$
* $E(X) = \lambda$
* $Var(X) = \lambda$

## Continuous Distributions

### Uniform Distribution

* $f(x) = \frac{1}{b-a}$, $a \leq x \leq b$
* $E(X) = \frac{a+b}{2}$
* $Var(X) = \frac{(b-a)^2}{12}$

### Exponential Distribution

* $f(x) = \lambda e^{-\lambda x}$, $x \geq 0$
* $E(X) = \frac{1}{\lambda}$
* $Var(X) = \frac{1}{\lambda^2}$

### Normal Distribution

* $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, $-\infty < x < \infty$
* $E(X) = \mu$
* $Var(X) = \sigma^2$

## Sampling Distributions

### Sample Mean

* $E(\bar{X}) = \mu$
* $Var(\bar{X}) = \frac{\sigma^2}{n}$
* $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$ (exact when the population is normal; approximate for large $n$ by the central limit theorem)

### Sample Proportion

* $E(\hat{p}) = p$
* $Var(\hat{p}) = \frac{p(1-p)}{n}$
* $\hat{p} \sim N(p, \frac{p(1-p)}{n})$ (approximate, for large $n$)

## Confidence Intervals

### Mean (σ Known)

* $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

### Mean (σ Unknown)

* $\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$

### Proportion

* $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

## Hypothesis Testing

### Test Statistic for Mean (σ Known)

* $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

### Test Statistic for Mean (σ Unknown)

* $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, with $n - 1$ degrees of freedom

### Test Statistic for Proportion

* $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$

## Chi-Square Tests

### Goodness-of-Fit Test

* $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
* $df = k - 1$, where $k$ is the number of categories

### Test of Independence

* $\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
* $df = (r - 1)(c - 1)$, where $r$ and $c$ are the numbers of rows and columns

## Regression Analysis

### Simple Linear Regression Model

* $y = \beta_0 + \beta_1 x + \epsilon$

### Least Squares Estimators

* $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$
* $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

### ANOVA Table (Simple Linear Regression)

| Source | DF | Sum of Squares (SS) | Mean Square (MS) | F |
| :--- | :--- | :--- | :--- | :--- |
| Regression | $1$ | $SSR = \hat{\beta}_1^2 S_{xx}$ | $MSR = SSR / 1$ | $MSR/MSE$ |
| Error | $n-2$ | $SSE = \sum (y_i - \hat{y}_i)^2$ | $MSE = SSE / (n-2)$ | |
| Total | $n-1$ | $SST = \sum (y_i - \bar{y})^2 = SSR + SSE$ | | |

### Coefficient of Determination

* $R^2 = \frac{SSR}{SST}$

### Standard Error of the Estimate

* $s_e = \sqrt{MSE}$

## Notation

* $X$: Random variable
* $x$: Observed value of $X$
* $n$: Sample size
* $\mu$: Population mean
* $\sigma$: Population standard deviation
* $\bar{x}$: Sample mean
* $s$: Sample standard deviation
* $p$: Population proportion
* $\hat{p}$: Sample proportion
* $\lambda$: Rate parameter (Poisson, Exponential)
* $z_{\alpha/2}$: z-score with an area of $\alpha/2$ in the upper tail
* $t_{\alpha/2, df}$: t-score with an area of $\alpha/2$ in the upper tail and $df$ degrees of freedom
* $O_i$: Observed frequency in category $i$
* $E_i$: Expected frequency in category $i$
* $O_{ij}$: Observed frequency in cell $(i, j)$
* $E_{ij}$: Expected frequency in cell $(i, j)$
* $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$
* $S_{xx} = \sum (x_i - \bar{x})^2$
* $SSR$: Regression sum of squares
* $SSE$: Error sum of squares
* $SST$: Total sum of squares
* $MSR$: Regression mean square
* $MSE$: Error mean square
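
## Worked Example

As an illustration of how some of the formulas above fit together, the following minimal Python sketch computes a confidence interval for a mean with σ known and the quantities in the simple linear regression ANOVA table. The function names (`z_confidence_interval`, `simple_linear_regression`) and the sample data are hypothetical and chosen only for this example; the code uses nothing beyond the Python standard library.

```python
import math
from statistics import NormalDist, mean


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Interval xbar ± z_{alpha/2} * sigma / sqrt(n), for sigma known."""
    # z_{alpha/2} leaves an area of alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def simple_linear_regression(x, y):
    """Least squares estimates and ANOVA quantities for y = b0 + b1*x + error."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)

    # S_xy and S_xx as defined in the Notation section.
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_xx = sum((xi - xbar) ** 2 for xi in x)

    b1 = s_xy / s_xx          # slope estimate
    b0 = ybar - b1 * xbar     # intercept estimate

    ssr = b1 ** 2 * s_xx                                           # regression SS
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error SS
    sst = sum((yi - ybar) ** 2 for yi in y)                        # total SS

    msr = ssr / 1
    mse = sse / (n - 2)
    # A perfect fit gives MSE = 0; report F as infinite in that case.
    f_stat = msr / mse if mse > 0 else math.inf

    return {
        "b0": b0, "b1": b1,
        "SSR": ssr, "SSE": sse, "SST": sst,
        "MSR": msr, "MSE": mse, "F": f_stat,
        "R2": ssr / sst,
        "s_e": math.sqrt(mse),
    }


# Hypothetical data, for illustration only.
ci_low, ci_high = z_confidence_interval(xbar=50.0, sigma=10.0, n=25, alpha=0.05)
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

For the σ unknown case, the same interval structure applies with $t_{\alpha/2, n-1}$ and $s$ in place of $z_{\alpha/2}$ and $\sigma$; the standard library has no t quantile function, so that case is left out of this sketch.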