Biostatistics Midterm Notes PDF
Uploaded by CorgiLover
Summary
These notes cover fundamental concepts in biostatistics, including descriptive and inferential statistics, different types of variables, and ways to visualize data. They provide a summary of key terms and methods.
Full Transcript
Descriptive = used to describe or depict data
Inferential = used to generalize results from a sample to a population through hypothesis testing
Graph = a visual display of the data distribution
Central tendency = a summary measure that describes a whole set of data with a single value representing the middle or centre of its distribution
Dispersion = the extent to which numerical data are likely to vary about an average value
Independent = the variable you manipulate or vary in an experimental study to explore its effects. It's called "independent" because it is not influenced by any other variables in the study.
Dependent = the variable that changes as a result of manipulating the independent variable. It's the outcome you're interested in measuring, and it "depends" on your independent variable.
Discrete = variables that assume only a finite number of values, for example, race categorized as non-Hispanic white, Hispanic, black, Asian, other. Discrete variables may be further subdivided into:
Dichotomous = only two options, e.g., yes or no, lived or died, male or female
Nominal = similar to ordinal variables except that the responses are unordered. Race/ethnicity is an example of a nominal (categorical) variable, for which response options might be white, black, Hispanic, American Indian or Alaskan native, Asian or Pacific Islander, or other. Another example of a categorical variable is blood type, with response options A, B, AB and O.
Ordinal = variables with ordered values, such as self-reported health status with response options "Excellent, Very Good, Good, Fair, Poor." While the response options are ordered, they are not necessarily equally spaced.
Continuous = sometimes called quantitative or measurement variables; they can take on any value within a range of plausible values.
For example, total serum cholesterol level, height, weight and systolic blood pressure are continuous variables.
Interval = variables that can be measured along a continuum and have a numerical value (for example, temperature measured in degrees Celsius or Fahrenheit), so the difference between 20°C and 30°C is the same as the difference between 30°C and 40°C. However, temperature measured in degrees Celsius or Fahrenheit is NOT a ratio variable.
Ratio = interval variables with the added condition that 0 (zero) indicates there is none of that variable. Temperature in degrees Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is no temperature, whereas temperature in Kelvin is a ratio variable because 0 Kelvin (absolute zero) indicates no temperature whatsoever. Other examples of ratio variables include height, mass and distance. The name "ratio" reflects the fact that ratios of measurements are meaningful: a distance of 10 metres is twice the distance of 5 metres.
Histogram = the representation of numerical data by rectangles (or bars) of equal width and varying height, used to summarize discrete or continuous data that are measured on an interval scale
Box plot terms:
Minimum: the smallest value in the dataset
First quartile (Q1): the median of the lower half of the dataset
Median: the middle value of the dataset, which divides it into two equal parts; the median is also the second quartile (Q2)
Third quartile (Q3): the median of the upper half of the dataset
Maximum: the largest value in the dataset
Interquartile range (IQR): the difference between the third and first quartiles, i.e., IQR = Q3 - Q1
Outlier: a value that falls far to the left or right of the rest of the ordered data. Generally, outliers lie more than a specified distance (commonly 1.5 × IQR) below Q1 or above Q3.
Positively skewed: if the distance from the median to the maximum is greater than the distance from the median to the minimum, the box plot is positively skewed.
Negatively skewed: if the distance from the median to the minimum is greater than the distance from the median to the maximum, the box plot is negatively skewed.
Symmetric: the box plot is symmetric if the median is equidistant from the maximum and minimum values.
Stem-and-leaf plot = a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution
A method to summarize a set of data that is measured using an interval scale
Frequency table = simply a "T-chart" or two-column table that outlines the possible outcomes and the frequency of each outcome observed in a sample
Mode = the value that occurs most often in a dataset
Mean = the average: the sum of a collection of numbers divided by the count of numbers in the collection
Standard deviation = the spread of a group of numbers around the mean; it is the square root of the variance.
Variance = the average squared deviation of each data point from the mean (for a sample, the sum of squared deviations is divided by n - 1).
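The box plot terms and summary measures above can be computed directly. This Python sketch is illustrative only: the helper names (`median`, `describe`) and the sample data are hypothetical, the quartiles follow the notes' "median of the lower/upper half" rule (statistics software may use a different quartile method and give slightly different values), the variance divides by n - 1 (sample variance), and the 1.5 × IQR outlier cutoff is an assumed convention, since the notes only say "a specified distance".

```python
import statistics


def median(values):
    """Middle value of the data; the mean of the two middle values when the count is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def describe(values):
    """Hypothetical helper: summary measures from the notes (needs at least two values)."""
    s = sorted(values)
    n = len(s)
    mean = sum(s) / n
    # Sample variance: sum of squared deviations from the mean divided by n - 1
    variance = sum((x - mean) ** 2 for x in s) / (n - 1)
    sd = variance ** 0.5  # standard deviation = square root of the variance

    # Quartiles as medians of the lower and upper halves;
    # the middle value is left out of both halves when n is odd
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    med = median(s)
    iqr = q3 - q1

    # Assumed outlier rule: more than 1.5 * IQR below Q1 or above Q3
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in s if x < low_fence or x > high_fence]

    # Skew rule from the notes: compare median-to-max and median-to-min distances
    upper_dist, lower_dist = s[-1] - med, med - s[0]
    if upper_dist > lower_dist:
        skew = "positive"
    elif lower_dist > upper_dist:
        skew = "negative"
    else:
        skew = "symmetric"

    return {
        "mean": mean,
        "median": med,
        "modes": statistics.multimode(s),  # most frequent value(s)
        "variance": variance,
        "sd": sd,
        "min": s[0],
        "q1": q1,
        "q3": q3,
        "max": s[-1],
        "iqr": iqr,
        "outliers": outliers,
        "skew": skew,
    }


summary = describe([2, 4, 4, 5, 7, 9, 25])  # hypothetical data
print(summary["mean"], summary["median"], summary["iqr"], summary["outliers"], summary["skew"])
# → 8.0 5 5 [25] positive
```

In this made-up dataset, 25 lies beyond the upper fence (9 + 1.5 × 5 = 16.5), and the median sits much closer to the minimum than to the maximum, so by the notes' rule the box plot is positively skewed.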
Range = the difference between the highest and lowest values
Interquartile range = the difference between the upper and lower quartile values in a set of data; commonly referred to as IQR and used as a measure of spread and variability in a dataset
In a normal curve, 68% of scores fall between 1 standard deviation below and 1 standard deviation above the mean.
In a normal curve, 95% of scores fall between 2 standard deviations below and 2 standard deviations above the mean.
In a normal curve, 99.7% of scores fall between 3 standard deviations below and 3 standard deviations above the mean.
Z score = the number of standard deviations a value lies above or below the mean: z = (x - x̄) / SD, where x̄ is the mean. (A z test applies the same idea to compare a sample mean with the population mean.)
Example: a student scores 95 on a quiz. The average score was 80 and the standard deviation was 5. z = (95 - 80) / 5 = 15 / 5 = 3
Hypothesis = a testable statement that predicts the relationship between variables and translates the research problem into a question
Null hypothesis: there is NO relationship between the variables
Research hypothesis: there IS a relationship between the variables
Directional: more specific (e.g., subjects reporting lower levels of pain will heal faster)
Non-directional: less specific (e.g., we think pain will affect healing time but we aren't sure how)
Level of significance = the criterion used to determine statistical significance, set BEFORE collecting data. It is the % of time the researcher will conclude that there is a statistically significant difference between groups or relationship between variables when there truly isn't (type 1 error). For this class, unless told otherwise, the level of significance = 0.05.
TYPE 1 ERROR = the probability of concluding there IS a difference between groups or relationship between variables when there truly isn't; the probability of incorrectly rejecting the null hypothesis
TYPE 2 ERROR = the probability of concluding that there is no difference between groups or no relationship between variables when there truly is; the probability of accepting (failing to reject) the null when it isn't true
Choosing a statistical test depends on the number of variables, the level of measurement of the variables, and the assumptions of the statistical test.
p value = a probability: the relative likelihood that a certain event will or will not occur relative to some other event. For example, p = 0.02 means that, if there were truly no difference, a difference between groups at least this extreme would appear just by chance about 2 times out of 100.
Compare the calculated p value to the level of significance:
If the calculated p value is LESS than the level of significance = REJECT the null hypothesis
If the calculated p value is GREATER than the level of significance = FAIL TO REJECT (aka accept) the null hypothesis
REJECT the null = there IS a statistically significant difference between groups or relationship between the variables
FAIL TO REJECT the null = there is NOT a statistically significant difference between groups or relationship between the variables
EXAMPLE: A researcher is studying differences in statistics test scores between male and female students.
Null hypothesis: there will be no difference between the test scores of male students and female students
Research hypothesis: there will be a difference between the test scores of male students and female students
Level of significance = 0.05
p = 0.01
Type of research hypothesis: nondirectional, because it does not state which group will score higher
0.01 is less than 0.05, so we reject the null hypothesis: there is a statistically significant difference between the test scores of male and female students.
EXAMPLE: A researcher is studying differences in statistics test scores between male and female students.
Null hypothesis: there will be no difference between the test scores of male students and female students
Research hypothesis: there will be a difference between the test scores of male students and female students
Level of significance = 0.05
p = 0.08
Type of research hypothesis: nondirectional, because it does not state which group will score higher
0.08 is greater than 0.05, so we fail to reject the null hypothesis: there is not a statistically significant difference between the test scores of male and female students.
t test for independent groups = compares the means of 2 independent groups of subjects. To use this test, the variables must have:
Independence: two unrelated groups
Normality: normally distributed
Homogeneity of variance: the variances of the two groups are similar
Dependent variable level of measurement: interval/ratio (you need to be able to calculate a mean)
EXAMPLE: A researcher hypothesized that drinking orange juice daily would increase memory. The researcher compared the mean score on a memory test between 2 groups: one that never drank OJ and one that drank OJ.
Null Hypothesis: there will be no difference in memory score
Research Hypothesis: there will be a difference in memory score
Level of Significance: 0.05
Test: t test for independent groups
Results: p value = 0.037; 0.037 < 0.05, so REJECT the null
How to write results: On average, memory scores of orange juice drinkers (M = 11.42, SD = 2.07) were significantly higher than those of participants who didn't drink orange juice (M = 9.5, SD = 2.15), t(22) = 2.225, p = 0.037.
Paired t test = compares the means of 2 paired groups of subjects; also called the t test for dependent groups. To use this test, the variables must have:
Normality
Dependent variable level of measurement: interval/ratio
EXAMPLE: A researcher hypothesized that attending preschool would result in a change in IQ compared with staying at home before kindergarten. He enrolled 12 sets of identical twins; 1 twin attended preschool the year before kindergarten and the other stayed home. Later he measured their IQs.
Null Hypothesis: there will be no difference in IQ between the preschool and home groups
Research Hypothesis: there will be a difference
Level of Significance: 0.05
Test: paired t test
Results: p value = 0.059, so FAIL TO REJECT the null
How to write results: On average, the IQ of kids who attended preschool the year before kindergarten (M = 103, SD = 4.17) was not significantly different from the IQ of kids who stayed home (M = 104.9, SD = 3.95), t(11) = 2.110, p = 0.059.
ANOVA (analysis of variance) = compares the means of 3 or more groups of subjects. To use this test, the variables must have:
Independence
Normality
Homogeneity of variances
Dependent variable level of measurement: interval/ratio
The research hypothesis is that at least one group is different.
Post hoc testing is completed if a significant difference is found (only done if you reject the null).
Degrees of freedom (k = number of groups, n = subjects per group, N = total subjects):
df (between) = k - 1
df (within) = nk - k, or N - k
df (total) = nk - 1, or N - 1
EXAMPLE: A researcher wanted to study whether there would be a difference in knowledge recalled between patients who received printed discharge instructions only, verbal discharge instructions only, or a combination.
Null Hypothesis: there will be no difference
Research Hypothesis: at least 1 group will have a different knowledge recall
Level of Significance: 0.05
Test: ANOVA
RESULTS p value =