HS 2801 Notes (Week 10) - Descriptive & Comparative Statistics PDF
Document Details
Uploaded by ProlificWormhole4480
Summary
These notes cover descriptive and comparative statistics, including univariate, bivariate, and multivariable analysis. They also explain different types of variables (nominal, ordinal, quantitative) and measures like mean, median, and mode. The notes are likely part of a larger course, probably in a health or life science field.
Full Transcript
Week 10: Descriptive and Comparative Statistics
- Biostatistics: the science of analyzing data and interpreting the results so that they can be applied to solving problems related to biology, health, or related fields
- Univariate analysis: describes one variable in a data set using simple statistics like counts (frequencies), proportions, and averages
- Bivariable analysis: uses rate ratios, odds ratios, and other comparative statistical tests to examine the associations between two variables (mostly exposure and outcome)
- Multivariable analysis: encompasses statistical tests such as multiple regression models that examine the relationships among three or more variables

What is a variable
- Any quantity that varies from one entity to another (sometimes within an entity over time)
- Any attribute, phenomenon, or event that can have different values
- Describes a characteristic of a person, place, thing, or idea
- We measure variables when an experiment is carried out or an observation is made

Types of variables
- Nominal
  - No intrinsic or logical order or value
  - University programs
  - Countries
  - Types of fruits; we can assign them different numbers, but the numbers do not have any other numeric properties
- Ordinal
  - Intrinsic value but with no clear or equal differences between levels
  - A set of ordered categories
  - Primary vs. secondary vs. university education
  - Mild vs. moderate vs. severe pain
  - Rating scales (assigning numbers)
  - Legitimate to say: 1 ≠ 2; 5 > 4 > 3 > 2 > 1
  - Cannot say that 4 is 2 times larger than 2, or that (4-3) = (3-2) = (2-1)
- Displaying nominal (categorical) or ordinal (ranked) data
  - Can use a pie chart or bar chart
- Quantitative
  - Takes numeric values; depending on the nature of the variable, values can be expressed in decimals
  - Meaningful numeric scales
  - Age, blood pressure, # of friends, temperature
  - Assigned numbers have total mathematical meaning
  - Continuous
    - Can take any value
    - Can be plotted as a line
    - Blood pressure or temperature
  - Discrete
    - Can take a finite or limited number of values
    - Can be plotted as lines
    - Age, number of drinks
- Interval vs. ratio
  - Ratio variable
    - Ratio is meaningful
    - Zero means absence of the characteristic
    - 0 m = no height
    - Blood pressure, age, income, weight
  - Interval variable
    - Difference is meaningful
    - No natural zero
    - A value of 0 does not indicate total absence of the characteristic

Mean: a sample mean is calculated by adding up all the values for a particular variable and dividing that sum by the total number of individuals with a value for the variable (= arithmetic average)

Median: list the numbers in ascending or descending order; the median divides the data into 2 equal parts, so find the number that is in the middle. If there are 2 numbers in the middle, add them and divide by 2.
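As a rough illustration of the mean and median rules just described, here is a minimal Python sketch. The blood pressure readings and the helper names `manual_mean` and `manual_median` are hypothetical and do not come from the notes.

```python
# Hypothetical systolic blood pressure readings (mmHg); not data from the notes.
readings = [118, 125, 110, 132, 120, 128]


def manual_mean(values):
    """Add up all the values and divide by the number of values."""
    return sum(values) / len(values)


def manual_median(values):
    """Sort the values and take the middle one, or average the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single value sits in the middle.
        return ordered[mid]
    # Even count: add the two middle values and divide by 2.
    return (ordered[mid - 1] + ordered[mid]) / 2


sample_mean = manual_mean(readings)
sample_median = manual_median(readings)
print(sample_mean, sample_median)
```

With an even number of readings, the median is the average of the two middle values after sorting, which is the "add them and divide by 2" case.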
Mode: the most frequently occurring value for a particular variable in a data set

Measures of variability (spread, dispersion)
- Range: the span of a variable between its minimum (lowest) and maximum (highest) values
- Quartiles mark the three values that divide a data set into 4 equal parts
- Interquartile range (IQR): captures the middle 50% of values for a numeric variable
  - IQR = Q3 - Q1
  - Q2 is the median
  - Minimum value (lower limit; values below it are treated as outliers) = Q1 - 1.5 × IQR
  - Maximum value (upper limit; values above it are treated as outliers) = Q3 + 1.5 × IQR

Variance
- The extent of deviation from the average value of that variable in the data set
- Calculated by adding together the squares of the differences between each observation and the sample mean (x̄) and then dividing by the total number of observations (n); when the variance is estimated from a sample, the divisor is usually n - 1
- Standard deviation: square root of the variance
- Standard error: standard deviation / square root of n

Confidence intervals (CI)
- Provide information about the expected value of a measure in a source population based on the measured value in a study population
- A larger sample will yield a narrower confidence interval
- A 95% confidence interval is usually reported for statistical estimates, which means that 5% of the time the confidence interval is expected to miss capturing the true value of a measure in the source population
- Example: the mean systolic blood pressure of a sample is 120 mmHg; 95% CI: 110-130
  - We are 95% confident that the real average (in the source population) is between 110 and 130; there is a 5% chance that the true value of the mean is either larger than 130 or smaller than 110

Comparative statistics
- Comparing main factors between exposed and unexposed in cohort studies
  - Average age of exposed = average age of unexposed
  - % male in exposed = % male in unexposed
- Testing if randomization was effective in experimental studies
- Comparing the outcome status
- We can NOT just look at the calculated values (these are estimates from samples, subject to random sampling error)

Inferential statistics
- Techniques that use statistics from a random sample of a population to make evidence-based assumptions (inferences) about the values of parameters in the population as a whole
- Decisions about parameters based on information obtained from a sample are made via hypothesis testing

Hypothesis testing
- Tests an explicit statement or "hypothesis" about a population parameter
- The null hypothesis (H0): there is no difference between the two or more values being compared
- The alternative hypothesis (Ha): there is a difference between the two or more populations being compared; there is an association between exposure and outcome
1. Take a random sample from the population of interest
2. Set up two competing hypotheses (based on the research question)
   - Null hypothesis: no effect, no difference between sample and original population
   - Alternative hypothesis: there is an effect, a difference between the two populations
3. Use sample statistics (mean, frequency) to decide whether to support or reject the null
   a. By calculation of a test statistic
4. Determine how likely the observed sample statistic would be if the null hypothesis were really true

Idea of probability: the p-value
- Introduced by Fisher to determine whether the observed sample supports the null
- Between 0.1 and 0.9: no reason to suspect the null is false, no association
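
Returning to the measures of variability and the confidence interval described earlier in these notes, the following Python sketch works through them on a small sample. The readings are hypothetical. The 1.96 multiplier for the 95% interval is an assumption (a normal approximation) that the notes do not state. Quartile conventions also differ between software packages, so the quartiles from `statistics.quantiles` may not match a hand calculation exactly.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg); not data from the notes.
readings = [118, 125, 110, 132, 120, 128, 115, 122]

n = len(readings)
sample_mean = statistics.mean(readings)

# Quartiles: three cut points that divide the sorted data into four parts.
# statistics.quantiles uses the "exclusive" method by default.
q1, q2, q3 = statistics.quantiles(readings, n=4)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr  # values below this are treated as outliers
upper_limit = q3 + 1.5 * iqr  # values above this are treated as outliers

# Variance: sum of squared differences between each observation and the mean.
squared_devs = sum((x - sample_mean) ** 2 for x in readings)
variance_n = squared_devs / n                # divide by n, as written in the notes
variance_n_minus_1 = squared_devs / (n - 1)  # usual divisor for a sample

# Standard deviation and standard error, based on the n - 1 variance.
sd = math.sqrt(variance_n_minus_1)
se = sd / math.sqrt(n)

# Approximate 95% CI for the mean. ASSUMPTION: normal approximation (1.96).
ci_low = sample_mean - 1.96 * se
ci_high = sample_mean + 1.96 * se

print(f"mean={sample_mean:.2f}, IQR={iqr:.2f}, SD={sd:.2f}, SE={se:.2f}")
print(f"approximate 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

Because the standard error divides by the square root of n, a larger sample with the same spread gives a smaller standard error and therefore a narrower interval.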