Review Weeks 1-4 MD115 PDF

Introduction – Basic concepts and types of data; formulating a plan for analysis
MD115
Evi Farazi, PhD
Week 1

What is Biostatistics?
o Statistics: deals with the collection, classification, analysis and interpretation of data.
o Biostatistics: deals with the collection, classification, analysis and interpretation of biological and biomedical data.

Populations and Samples
o Sample characteristics are known and can be measured (e.g. the mean of a sample characteristic).
o Population characteristics are not known but can be inferred from the sample (e.g. an estimate of the mean of a population characteristic).
o To make accurate inferences, appropriate sample selection and accurate measurement of the characteristic of interest are important; otherwise bias occurs.
o If there is no bias, data analysis provides meaningful results (i.e. how likely the results are to reflect real differences or to be due to chance (random error)).

Biostatistics in Medicine
Helps generate medical knowledge:
o Understand natural and experimental observations in science and medicine.
o Design, manage and analyze clinical trials.
o Determine how diseases develop, progress and spread.

The Steps of Clinical Research
1. Study Design: types of variables and analysis, sample size
2. Data Collection: collect and enter data in database/spreadsheet
3. Data Processing: data quality checks, generate variables, merge datasets
4. Data Analysis: descriptive, univariate, multivariate analyses
5. Output Results: generate tables and figures
Can use statistical software, e.g. the R statistical environment.

Reproducible Research
o A research study should be reproducible
o Code should be used throughout the steps of a research study:
  o Processing of raw data
  o Data analysis
  o Results presentation

Data in Tabular (Rectangular) Form

Patient ID   Age   Sex      Interventional drug   Disease improvement
1            45    Male     Yes                   Yes
2            54    Female   No                    No
3            67    Female   Yes                   No
…            …     …        …                     …

o Unit of observation: unit described by the data (e.g. patient)
o Observation (Record): rows of the table.
  A set of values that refer to a particular unit of observation
o Variable (Field): columns of the table.
  A set of values that reflect a particular characteristic (e.g. age)
o Primary key: observation ID.
  A variable that uniquely defines a unit of observation (e.g. patient ID)

Types of Variables
Categorical (non-numerical)
o Nominal: without inherent ordering (e.g. sex, occupation)
o Ordinal: with inherent ordering (e.g. educational level)
o Dichotomous: binary, only has two possible values (e.g. survive or die)
Numerical
o Continuous: measurement on a continuous scale (e.g. weight)
o Discrete: limited number of discrete values (e.g. number of times a child got sick during the year)
Different types of data require different statistical approaches

Types of Variables
Can convert between different types of variables
  Numerical: Age
  Categorical (non-numerical): Age group (e.g.

  > length(a)
  [1] 3
  > class(a)
  [1] "numeric"

NA is a special value in R that represents a missing value

  > a <- c(1, 2, NA, 4)
  > a
  [1]  1  2 NA  4
  > is.na(a)
  [1] FALSE FALSE  TRUE FALSE

Vector Indexing
Indexing = selecting elements from a vector or other object
Use the indexing operator [ ], in one of four ways
1. Positive integer vector
   Vector positions of the elements we "keep"
2. Negative integer vector
   Vector positions of the elements that we exclude
3. Logical vector
   Of equal length to main vector (or recycled)
   Keep positions with TRUE, exclude positions with FALSE
4. Character vector
   The names of our data vector (if defined)
Indexing is ubiquitous in R!

Loading Datasets into R (reading them into data.frames)
R can import many kinds of data (CSV files, other statistical packages, databases, Excel worksheets, etc.)
Probably the simplest way: from Excel files

  > install.packages("readxl")
  > library(readxl)
  > dat
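
The four indexing styles listed under Vector Indexing can be sketched in a few lines of base R. This is a minimal illustration, not taken from the slides: the vector `ages`, its element names and its values are hypothetical, and one value is set to NA so the logical-vector case can reuse `is.na()`.

```r
# Hypothetical named vector (names and values are illustrative assumptions).
ages <- c(anna = 45, ben = 54, cara = NA, dan = 67)

# 1. Positive integer vector: keep the elements at positions 1 and 4.
keep_pos <- ages[c(1, 4)]

# 2. Negative integer vector: exclude the element at position 3.
drop_pos <- ages[-3]

# 3. Logical vector: keep positions where the condition is TRUE
#    (here, the values that are not missing).
not_missing <- ages[!is.na(ages)]

# 4. Character vector: select elements by name.
by_name <- ages[c("ben", "dan")]

print(keep_pos)
print(drop_pos)
print(not_missing)
print(by_name)
```

In this sketch the negative-index and logical-index selections return the same three elements, because the excluded position is the one holding the missing value.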
