Module 0 PDF
Summary
This document introduces statistics from the point of view of an engineer or scientist. It covers descriptive versus inferential statistics, the aims of data analysis, computational statistics, samples and populations, and a review of basic probability concepts (random variables, central moments, standard distributions).
Full Transcript
A couple of quotes

Preface
statistics (descriptive or exploratory vs inferential) from a user (engineer or scientist) point of view
– role of statistics in engineering/mechanics/science
– ‘big’ data vs ‘small’ data: data analytics
  ∗ machine (statistical) learning (AI) vs traditional inference
aims of data analysis
– prediction vs understanding
– correlation vs causation
– statistical significance vs practical significance; role of theory
– data analysis for design
– Occam’s razor: simplest (parsimonious) model for the purpose
survey of statistical techniques and tests
– judicious use and interpretation of results, understanding limitations
– underlying/recurring concepts
prerequisites? undergraduate statistics, calculus, matrix algebra, programming

Computational statistics
software environments such as R facilitate traditional (and non-traditional) analyses
– understanding physical relationships as manifested in data, data visualization
computer simulations
– testing assumptions of statistical models
statistical techniques made practical by computers
– maximum likelihood, bootstrapping, Bayesian approaches
– machine (statistical) learning (artificial intelligence)
– questioning of traditional approaches

Preliminaries
problem statement(s): what can be learned (or inferred) from a set of (sampled) data? what is the uncertainty in quantitative estimates? is there sufficiently strong empirical support for a given hypothesis?
statistics as means of making sense of random phenomena: what is randomness?
– unpredictability
– modeling physical phenomena and relationships: deterministic and random elements
  $y = f(x; \beta) + \epsilon$
  where $y$ is a scalar (single) dependent variable (also termed response variable), $x$ is a vector of (multiple) independent variables, $\beta$ a vector of model parameters, and $\epsilon$ is a random variable
  ∗ $f(x; \beta)$ is ‘deterministic’ model based on physics/mechanics (maybe), $\epsilon$ is the statistical model; $x$ usually considered perfectly known, $\beta$ ‘true’ parameters to be estimated (fitted) to data
  ∗ statistical model often makes a distribution assumption (parametric model), but could be distribution-free (non-parametric)

Sample and Population
population: all possible realizations of an experiment
– statistical homogeneity (or stationarity)?
– population parameters (characteristics): not random (at least conceptually), but usually unknown or even unknowable
what does given data represent?
– sample: subset of (usually much smaller than but possibly identical to) a population
  ∗ representative sample: relevant statistics of sample statistically similar to population characteristics
  ∗ (simple) random sample: each point in population has an equal chance of being sampled
  ∗ implications of samples that are not necessarily representative nor random – bias
– which population is the ‘target’ of sample data?
  ∗ effects of scale (or possibly other external effects)
  ∗ dimensionless (or normalized) characteristics to minimize effects of scale and to broaden population

Descriptive and Inferential Statistics
descriptive statistics (also exploratory data analysis)
– focus on sample, no (immediate) concern for relationship to any population
– appropriate when understanding of data is low
– possible preliminary to inferential statistics
inferential statistics
– going beyond the sample: relationship(s) between sample statistics and population parameters
– some understanding of data available; meaningful formulation and testing of hypotheses
  ∗ plausible physical and statistical model assumptions can be made about data
– random aspects $\Rightarrow$ relationships between sample and population characteristics must be framed in probabilistic terms

Review: Basic probability concepts and terminology I
random variable, $X$, that takes on a value, $x$, (or between values, $x - (dx/2)$ and $x + (dx/2)$) in a random manner
– different types: numerical (discrete, continuous), categorical (ordered and non-ordered)
– function of random variables must be random variable $\Rightarrow$ sample statistic (a function of random variables) is a random variable
probability statements about (continuous) $X$: $P\left(x - \frac{dx}{2} < X < x + \frac{dx}{2}\right) = f(x)\,dx$ […]
– […] $m > 1$, central moments about $\mu_x$: $E[(X - \mu_x)^m] = \int (x - \mu_x)^m f(x)\,dx$
– variance, $\sigma_x^2 = E[(X - \mu_x)^2]$ ($\sigma_x$ = standard deviation is a possibly dimensional scale or range parameter, $\sigma_x/\mu_x$ = coefficient of variation)
– normalized third central moment, $g_{Sk} = E[(X - \mu_x)^3]/\sigma_x^3$; for distributions symmetric about $\mu_x$ like the normal, $g_{Sk} = 0$
– normalized fourth central moment, $g_{Ku} = E[(X - \mu_x)^4]/\sigma_x^4$; for normal distribution $g_{Ku} = 3$, so excess kurtosis $= g_{Ku} - 3 > 0$ implies greater (than normal) importance of distribution tails
(normalized) higher-order ($> 2$) moments give more information about and are more influenced by distribution tails, but sample estimates are correspondingly more uncertain
standard distributions defined by limited number (usually