BMS 511 Biostatistics & Statistical Analysis PDF

Marian University

2018

Guang Xu

Summary

This document contains lecture notes on biostatistics and statistical analysis, focusing on relationships, scatterplots, and correlation. The material covers concepts like bivariate data, interpreting scatterplots, and types of relationships. The content also includes examples and figures.

BMS 511 Biostatistics & Statistical Analysis
Chapter 3 Relationships: Scatterplots and correlation
Guang Xu, PhD, MPH
Assistant Professor of Biostatistics and Public Health
College of Osteopathic Medicine
Marian University

Previous Learning Objectives
Describing distributions with numbers
– Measures of center: mean and median
– Measures of spread: quartiles and standard deviation
– The five-number summary and boxplots
– IQR and outliers
– Dealing with outliers
– Choosing among summary statistics
– Organizing a statistical problem
Copyright © 2018 W. H. Freeman and Company

Learning Objectives
Demonstrate Relationships: Scatterplots and correlation
– Bivariate data
– Scatterplots
– Interpreting scatterplots
– Adding categorical variables to scatterplots
– The correlation coefficient r
– Facts about correlation

Bivariate data (1 of 2)
For each individual studied, we record data on two variables. We then examine whether there is a relationship between these two variables: do changes in one variable tend to be associated with specific changes in the other variable?
Here we have two quantitative variables recorded for each of 16 students:
1. how many beers they drank
2. their resulting blood alcohol content (BAC)

Bivariate data (2 of 2)
Student ID   Number of Beers   Blood Alcohol Content
1            5                 0.1
2            2                 0.03
3            9                 0.19
4            8                 0.12
5            3                 0.04
6            7                 0.095
7            3                 0.07
8            5                 0.06
9            3                 0.02
10           5                 0.05
11           4                 0.07
12           6                 0.1
13           5                 0.085
14           7                 0.09
15           1                 0.01
16           4                 0.05

Scatterplots
A scatterplot displays quantitative bivariate data. Each variable makes up one axis, and each individual is a point on the graph.

Explanatory and response variables
A response (dependent) variable measures an outcome of a study.
An explanatory (independent) variable may explain or influence changes in a response variable. When there is an obvious explanatory variable, it is plotted on the x (horizontal) axis of the scatterplot.

Scaling a scatterplot
The same data are displayed in all four plots; the only difference between the plots is the range of the scales. Both variables should be given a similar amount of space, so the plot is roughly square, and the points should occupy the whole plot area (no large blank regions).

Interpreting scatterplots
After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for:
– Form: linear, curved, clusters, no pattern
– Direction: positive, negative, no direction
– Strength: how closely the points fit the “form”
We also look for clear deviations from that pattern:
– Outliers of the relationship

Types of relationships (1 of 6)

Types of relationships (2 of 6)
Weak or no relationship

Types of relationships (3 of 6)
The form of the relationship between two quantitative variables refers to the overall pattern.

Types of relationships (4 of 6)
Positive association: High values of one variable tend to occur together with high values of the other variable.

Types of relationships (5 of 6)
Negative association: High values of one variable tend to occur together with low values of the other variable.

Types of relationships (6 of 6)
The strength of the relationship between two quantitative variables refers to how much variation, or scatter, there is around the main form.
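A scatterplot can be sketched without plotting software. The Python sketch below draws a rough character-grid scatterplot of the beer/BAC data from the Bivariate data (2 of 2) slide. The explanatory variable (number of beers) goes on the x axis and the response (BAC) on the y axis. Each axis is scaled from its minimum to its maximum value, so the points fill the plot area as the Scaling a scatterplot slide recommends. The helper name `text_scatterplot` and the default grid size are hypothetical choices made for illustration. They are not part of the course material, which uses SPSS for plotting.

```python
# Beer/BAC data from the "Bivariate data (2 of 2)" slide, listed by student ID 1-16.
BEERS = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
BAC = [0.1, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.1, 0.085, 0.09, 0.01, 0.05]


def text_scatterplot(x, y, width=40, height=12):
    """Hypothetical helper: return the rows of a character-grid scatterplot.

    x goes on the horizontal axis and y on the vertical axis. Each axis is
    scaled from its minimum to its maximum so the points fill the plot area.
    Points that land in the same cell are drawn as a single '*'.
    Assumes x (and y) are not all equal, otherwise the scaling divides by zero.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is printed first, so flip the row index to put large y at the top.
        grid[height - 1 - row][col] = "*"
    body = ["|" + "".join(cells) for cells in grid]  # "|" draws the y axis
    return body + ["+" + "-" * width]               # last row draws the x axis


if __name__ == "__main__":
    print("BAC (response, y axis) vs. number of beers (explanatory, x axis)")
    for line in text_scatterplot(BEERS, BAC):
        print(line)
```

When printed, the points rise from lower left to upper right, which is a positive, roughly linear pattern.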
Outliers in scatterplots
An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside the overall pattern of the relationship.

Adding categorical variables to scatterplots (1 of 2)
Two or more relationships can be compared on a single scatterplot when we use different symbols for groups of points on the graph. The graph compares the association between thorax length and longevity of male fruit flies that are allowed to reproduce (green) or not (purple). The pattern is similar in both groups (linear, positive association), but male fruit flies not allowed to reproduce tend to live longer than reproducing male fruit flies of the same size.

Adding categorical variables to scatterplots (2 of 2)

Example: adding categorical variables
Energy expended as a function of running speed for various treadmill inclines.
If we ignore the categorical variable “Incline,” the scatterplot shows little to no association. However, for each value of “Incline,” we see a strong, positive association, and it is stronger for the steeper inclines.

The correlation coefficient: r (1 of 2)
The correlation coefficient r measures the direction and strength of a linear relationship between two quantitative variables. It is calculated using the mean and the standard deviation of both the x and y variables. Example variables: time to swim (x) and pulse rate (y).

The correlation coefficient: r (2 of 2)

The roles of the variables in r
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
r treats x and y symmetrically. “Time to swim” is the explanatory variable here and belongs on the x axis. However, r is the same in either plot (r = −0.75).
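The formula for r can be checked numerically. The Python sketch below translates r = (1/(n − 1)) Σ z_x z_y directly into code. It uses the sample mean and the sample standard deviation (divisor n − 1) from the standard-library `statistics` module, applied to the beer/BAC data from the Bivariate data slide. It also illustrates two facts from this chapter. First, swapping x and y leaves r unchanged. Second, multiplying a variable by a positive constant (a change of units) also leaves r unchanged, because r has no units. The function name `pearson_r` and the scale factor 1000 are illustrative assumptions and do not come from the slides.

```python
import math
import statistics

# Beer/BAC data from the "Bivariate data (2 of 2)" slide, listed by student ID 1-16.
BEERS = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
BAC = [0.1, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.1, 0.085, 0.09, 0.01, 0.05]


def pearson_r(x, y):
    """r = 1/(n - 1) * sum of z_x * z_y, with z-scores from the sample SD."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with n >= 2")
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # divisor n - 1
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


r_xy = pearson_r(BEERS, BAC)
r_yx = pearson_r(BAC, BEERS)                             # roles of x and y swapped
r_rescaled = pearson_r(BEERS, [1000 * v for v in BAC])  # hypothetical change of units

print(f"r(beers, BAC) = {r_xy:.3f}")  # prints r(beers, BAC) = 0.894
print("symmetric:", math.isclose(r_xy, r_yx))
print("unit-free:", math.isclose(r_xy, r_rescaled))
```

An r of about 0.89 is consistent with a strong, positive, linear association between the number of beers and BAC. On Python 3.10 or later, `statistics.correlation(BEERS, BAC)` provides a library cross-check for the same value.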
r has no units (1 of 2)

r has no units (2 of 2)
The two scatterplots yield the same correlation, even though the left plot is measured in minutes and the right plot in hours.

−1 ≤ r ≤ +1
Strength is indicated by the absolute value of r.
Direction is indicated by the sign of r (+ or −).

r is not resistant (1 of 2)
Correlations are calculated using means and standard deviations and are therefore NOT resistant to outliers. Moving just one point away from the linear pattern here weakens the correlation from −0.91 to −0.75 (closer to zero).

r is not resistant (2 of 2)

Software: SPSS
Student Discount: https://www.ibm.com/products/spss-statistics-gradpack/details#product-header-top
Regular: https://www.ibm.com/products/spss-statistics/pricing#section-heading-7

Variance
Another measure of dispersion: the mean sum of squares.
Population variance:
$$\sigma^2 = \frac{\sum_{i}(X_i - \mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum_{i}(X_i - \bar{X})^2}{n - 1}$$
Standard deviation: the positive square root of the variance.

Application of SPSS
Group 1        Group 2        Group 3        Group 4
X1     Y1      X2     Y2      X3     Y3      X4     Y4
10     8.04    10     9.14    10     7.46    8      6.58
8      6.95    8      8.14    8      6.77    8      5.76
13     7.58    13     8.74    13     12.74   8      7.71
9      8.81    9      8.77    9      7.11    8      8.84
11     8.33    11     9.26    11     7.81    8      8.47
14     9.96    14     8.1     14     8.84    8      7.04
6      7.24    6      6.13    6      6.08    8      5.25
4      4.26    4      3.1     4      5.39    19     12.5
12     10.84   12     9.13    12     8.15    8      5.56
7      4.82    7      7.26    7      6.42    8      7.91
5      5.68    5      4.74    5      5.73    8      6.89

Scatterplot
A scatterplot is a useful summary of a set of bivariate data (two variables). It:
– Gives a good visual picture of the relationship between two variables
– Shows, through its pattern, the type and strength of the relationship between two variables
– Can show a nonlinear relationship between two variables
– Can show outliers in the data
– Aids in interpretation of the correlation coefficient or regression model

Scatterplot examples (figures; the first has a y axis labeled “test” running from −2 to 2 and an x axis running from 0 to 200)
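The four groups in the Application of SPSS table match the data set widely known as Anscombe's quartet. The Python sketch below computes the following for each group:
– the means
– the sample variances (divisor n − 1, as in the Variance slide)
– r

It also illustrates that r is not resistant, by dropping Group 3's single outlying point (13, 12.74). The slides carry out this analysis in SPSS. This standard-library version is only an independent cross-check under those formulas, and the helper names are hypothetical.

```python
import math

# Data from the "Application of SPSS" table. X1, X2 and X3 are identical.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
GROUPS = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def sample_variance(values):
    """s^2 = sum((X - X_bar)^2) / (n - 1)."""
    x_bar = mean(values)
    return sum((v - x_bar) ** 2 for v in values) / (len(values) - 1)


def pearson_r(x, y):
    """r = 1/(n - 1) * sum of standardized x times standardized y."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = math.sqrt(sample_variance(x)), math.sqrt(sample_variance(y))
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


for group, (x, y) in GROUPS.items():
    print(f"Group {group}: mean x = {mean(x):.2f}, s^2 x = {sample_variance(x):.2f}, "
          f"mean y = {mean(y):.2f}, s^2 y = {sample_variance(y):.2f}, "
          f"r = {pearson_r(x, y):.2f}")

# r is not resistant: drop Group 3's outlying point (x = 13, y = 12.74).
x3, y3 = GROUPS[3]
kept = [(xi, yi) for xi, yi in zip(x3, y3) if (xi, yi) != (13, 12.74)]
r3_without = pearson_r([p[0] for p in kept], [p[1] for p in kept])
print(f"Group 3 without (13, 12.74): r = {r3_without:.4f}")
```

All four groups print nearly the same summaries (mean x = 9.00, s² x = 11.00, mean y = 7.50, r = 0.82), although the groups differ greatly when plotted. This is why a scatterplot aids in interpreting r. Removing the one outlier from Group 3 moves r from about 0.82 to nearly 1.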
