NUTR 551 Starting Points in Data Analysis Lecture 3 PDF


Summary

This lecture introduces data analysis. It focuses on graphical methods for assessing the normality of data, such as histograms with normal curves and quantile-quantile (Q-Q) plots, and on numerical measures such as skewness and kurtosis.

Full Transcript

NUTR 551 Starting Points in Data Analysis

Assignment
Posted on myCourses with grading rubric
Student groups and topics will be posted next week

Assessing normality
https://builtin.com/data-science/empirical-rule

Quartiles (Q1, Median, Q3)
If we divide the data into 4 groups, there are 3 quartiles: the number of cut points is always one less than the number of groups.
First Quartile (Q1): divides the lowest 25% of the data from the highest 75%
◦ 25th percentile or Lower Quartile
Second Quartile (Q2): divides data in half
◦ 50th percentile or median
Third Quartile (Q3): divides the highest 25% of the data from the lowest 75%
◦ 75th percentile or Upper Quartile

Boxplot
(Figure labels: Q1, IQR, Q3, whiskers)
If the median is not in the middle of the box, the distribution is not symmetric and therefore not normal. If it were normal, the median would be in the middle, where the mean would be as well.
Standardized format of displaying a distribution based on quartiles
◦ First quartile, median, third quartile + whiskers
IQR (interquartile range) = Q3 - Q1
◦ Sometimes referred to as the "middle 50%"
◦ Measure of variability
◦ Median and IQR are often reported instead of mean and SD for variables that are not normally distributed
Whiskers: reach at most to Q1 - (1.5*IQR) and Q3 + (1.5*IQR)
Le, Chap T. Introductory Biostatistics

Outliers
Extreme observations in your variable of interest
◦ What might cause outliers to arise in your data?
(Figure axis label: kcal)
SPSS identifies outliers according to Tukey's fences method
◦ Values below Q1 - (1.5*IQR) or above Q3 + (1.5*IQR) → these are marked with a ○
◦ Values below Q1 - (3*IQR) or above Q3 + (3*IQR) → these are marked with an *
The 3*IQR fences are the broader, more conservative way to determine outliers: it is harder to conclude that a particular value is an outlier.
Andrysiak et al. EURASIP Journal on Wireless Communications and Networking. 2016:245

Visual assessments of normality
Histogram with normal curve

Quantile-Quantile (Q-Q) Plot
Determines if a variable comes from a specified distribution, that is, it tells whether the data follow this distribution
SPSS also outputs a detrended normal Q-Q plot (will see in lab)
Q-Q plot: compares the distribution of the data to a standard distribution. If the data were perfectly normally distributed, all points would track the line.
Detrended normal Q-Q plot: a straight horizontal line with dots around it. It shows how far each point is from its expected value, that is, how far we are from the expected distribution. It is redundant; we will use the Q-Q plot.

Stem & Leaf Plot
Data: 2030, 2175, 2198, 2270, 2275, 2290, 2340, 2375

Frequency  Stem  Leaf
1          20.   30
2          21.   75 98
3          22.   70 75 90
2          23.   40 75

Stem = first digit(s)
Leaf = last digit(s)
Displays the frequency at which certain classes of values appear in the data
Can be used to examine the distribution of the data as well as extreme values

% kcal from fat Stem-and-Leaf Plot
Frequency    Stem & Leaf
54.00 Extremes (==55)
Stem width: 10.00
Each leaf: 10 case(s)
The Extremes row tells the amount of extremes (difference of box [...]

Statistical tests for normality
Supplementary to graphical assessments, because they are sensitive to values in the tails
Shapiro-Wilk test: data normally distributed → p > 0.05; if we reject it (p [...]

[...] 90% of the graph is the yellow part
http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/what-is-a-z-score-what-is-a-p-value.htm

p-value
Probability value
◦ Based on the assumption that H0 is true
◦ Gives the probability of obtaining results at least as extreme as those observed simply by chance
A p-value of 0.03 means that, if H0 were true, results like these would arise by chance alone about 3% of the time; less than 5% is conventionally accepted.
A p-value of 0.01 means that, if H0 were true, there would be a 1% chance of the result coming up as a fluke.
The p-value is not the probability that the null hypothesis is true.
α (significance level): probability of rejecting the null hypothesis when the null is true
◦ 1%: p < 0.01
◦ 5%: p < 0.05
◦ 10%: p < 0.10
The p-value comes from the statistical test. Can do a one-tailed test.

One-tailed vs. two-tailed tests
One-tailed test
◦ Testing the possibility of a relationship in one direction only (either µ > µ0 or µ < µ0)
◦ Statistical power is greater to detect an effect
◦ Reject H0 if Z < -1.645 (for µ < µ0); reject H0 if Z > 1.645 (for µ > µ0)
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_HypothesisTest-Means-Proportions/BS704_HypothesisTest-Means-Proportions3.html
Two-tailed test
◦ Testing the possibility of a relationship in either direction (µ ≠ µ0)
◦ Reject H0 if Z < -1.960 or if Z > 1.960
Most of the time, two-tailed tests are used.

Thought experiment
Think of a scenario where a researcher could conduct a one-tailed test.
Ex: 2 drugs with one cheaper, can do a one-tailed test
What about a scenario where they could not?
Ex: new drug, needs a two-tailed test to see if the new drug is better or worse than the other one
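The Tukey's fences rule from the Outliers slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tukey_fences` and `flag_outliers` are made up here, the `kcal` values are hypothetical (the stem-and-leaf data plus two invented high values), and the quartiles use Python's "inclusive" convention, which may differ slightly from the one SPSS applies.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    # method="inclusive" is an assumption; SPSS may compute quartiles differently.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values):
    """Label each value 'ok', 'outlier' (beyond 1.5*IQR) or 'extreme' (beyond 3*IQR)."""
    inner_lo, inner_hi = tukey_fences(values, k=1.5)
    outer_lo, outer_hi = tukey_fences(values, k=3.0)
    labels = []
    for v in values:
        if v < outer_lo or v > outer_hi:
            labels.append((v, "extreme"))   # SPSS marks these with *
        elif v < inner_lo or v > inner_hi:
            labels.append((v, "outlier"))   # SPSS marks these with a circle
        else:
            labels.append((v, "ok"))
    return labels

# Hypothetical kcal intakes; 2700 and 4100 were added for illustration only.
kcal = [2030, 2175, 2198, 2270, 2275, 2290, 2340, 2375, 2700, 4100]

for value, label in flag_outliers(kcal):
    print(value, label)
```

With these numbers, 2700 falls between the 1.5*IQR and 3*IQR fences and 4100 lies beyond the 3*IQR fence, which shows why the wider fences make it harder for a value to be flagged.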
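The stem-and-leaf table from the Stem & Leaf Plot slide can be rebuilt programmatically. The sketch below assumes whole-number data and treats the last two digits as the leaf, as in the slide; the helper names `stem_and_leaf` and `render` are hypothetical, not SPSS output.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_digits=2):
    """Group integer values by stem (leading digits) and collect their leaves (last digits)."""
    base = 10 ** leaf_digits
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, base)   # e.g. 2175 -> stem 21, leaf 75
        rows[stem].append(leaf)
    return dict(rows)

def render(rows, leaf_digits=2):
    """Format the rows as a Frequency / Stem / Leaf text table."""
    lines = ["Frequency  Stem  Leaf"]
    for stem, leaves in rows.items():
        leaf_text = " ".join(f"{leaf:0{leaf_digits}d}" for leaf in leaves)
        lines.append(f"{len(leaves):<9}  {stem}.   {leaf_text}")
    return "\n".join(lines)

# Data taken from the slide.
data = [2030, 2175, 2198, 2270, 2275, 2290, 2340, 2375]
print(render(stem_and_leaf(data)))
```

The frequency column is simply the number of leaves on each stem, which is why the plot doubles as a rough histogram.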
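The cut-offs 1.645 and 1.960 quoted for one-tailed and two-tailed tests correspond to α = 0.05 on the standard normal distribution. A minimal sketch, assuming a Z test and using only Python's standard library, shows where they come from; `critical_z` and `p_value` are illustrative names, and the test statistic z = 1.8 is an invented example.

```python
from statistics import NormalDist

def critical_z(alpha=0.05, two_tailed=True):
    """Critical Z value: alpha sits in one tail, or is split across both tails."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def p_value(z, tail="two"):
    """p-value of a Z statistic, computed under the assumption that H0 is true."""
    std_normal = NormalDist()
    if tail == "upper":   # H1: mu > mu0
        return 1 - std_normal.cdf(z)
    if tail == "lower":   # H1: mu < mu0
        return std_normal.cdf(z)
    if tail == "two":     # H1: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

print(round(critical_z(0.05, two_tailed=False), 3))  # one-tailed cut-off, about 1.645
print(round(critical_z(0.05, two_tailed=True), 3))   # two-tailed cut-off, about 1.960

# Invented example statistic: the same z gives different conclusions.
z = 1.8
print(p_value(z, "upper"), p_value(z, "two"))
```

For z = 1.8 the one-tailed p-value falls below 0.05 while the two-tailed one does not, which is the sense in which a one-tailed test has more power to detect an effect in the chosen direction.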
