NUTR 551 Starting Points in Data Analysis Lecture 3 PDF

Document Details

Uploaded by ExaltedCanyon98

School of Human Nutrition

Tags

data analysis, statistics, normality, introduction to data analysis

Summary

This lecture introduces data analysis. It focuses on graphical methods for assessing normality, such as histograms with normal curves and quantile-quantile (Q-Q) plots, and on numerical measures such as skewness and kurtosis.

Full Transcript

NUTR 551 Starting Points in Data Analysis

Assignment
◦ Posted on myCourses with grading rubric
◦ Student groups and topics will be posted next week

Assessing normality
https://builtin.com/data-science/empirical-rule

Quartiles
Note: if we divide the data into 4 parts, we need 3 quartiles (the number of cut points is always the number of parts minus 1)
Q1, Median, Q3
First Quartile (Q1): divides the lowest 25% of the data from the highest 75%
◦ 25th percentile or Lower Quartile
Second Quartile (Q2): divides data in half
◦ 50th percentile or median
Third Quartile (Q3): divides the highest 25% of the data from the lowest 75%
◦ 75th percentile or Upper Quartile

Boxplot
Note: if the median is not in the middle of the box, this suggests the distribution is not normal; if it were normal, the median would be in the middle, where the mean would be as well
Figure labels: Q1, IQR, Q3, whiskers
Standardized format of displaying a distribution based on quartiles
◦ First quartile, median, third quartile + whiskers
IQR (interquartile range) = Q3 – Q1
◦ Sometimes referred to as the “middle 50%”
◦ Measure of variability
◦ Median and IQR are often reported instead of mean and SD for variables that are not normally distributed
Le, Chap T. Introductory Biostatistics
Whiskers: Q1 – (1.5*IQR) and Q3 + (1.5*IQR)

Outliers
Extreme observations in your variable of interest
◦ What might cause outliers to arise in your data?
Figure axis label: kcal
SPSS identifies outliers according to Tukey’s fences method
◦ Values below Q1 – (1.5*IQR) or above Q3 + (1.5*IQR) -> these are marked with a ○
◦ Values below Q1 – (3*IQR) or above Q3 + (3*IQR) -> these are marked with an *
Note: a broader, more conservative way to determine outliers; it is harder to conclude that a particular value is an outlier
Andrysiak et al. EURASIP Journal on Wireless Communications and Networking. 2016:245

Visual assessments of normality
Histogram with normal curve

Quantile-Quantile (Q-Q) Plot
Determines if a variable comes from a specified distribution
SPSS also outputs detrended normal Q-Q Plot (will see in lab)
Notes:
◦ Tells if the data follow this distribution
◦ Q-Q plot: compares the distribution of the data to a standard distribution
◦ If perfectly normally distributed, all points would track the line
◦ Detrended normal Q-Q plot: a straight horizontal line with dots around it; shows how far each point is from its expected value -> how far we are from the expected distribution -> redundant, will use the Q-Q plot

Stem & Leaf Plot
Data: 2030, 2175, 2198, 2270, 2275, 2290, 2340, 2375

Frequency   Stem   Leaf
1           20.    30
2           21.    75 98
3           22.    70 75 90
2           23.    40 75

Stem = first digit(s)
Leaf = last digit(s)
Displays the frequency at which certain classes of values appear in the data
Can be used to examine distribution of data as well as extreme values

% kcal from fat Stem-and-Leaf Plot
Frequency Stem & Leaf
Note: tells the amount of extremes (difference of box
54.00 Extremes (==55)
Stem width: 10.00
Each leaf: 10 case(s)

Statistical tests for normality
Supplementary to graphical assessments, because sensitive to values in the tails
Shapiro-Wilk test
Note: data normally distributed -> p > 0.05; if we reject it (p […]

[…] 90% of the graph is the yellow part
http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/what-is-a-z-score-what-is-a-p-value.htm

Notes on the p-value:
◦ p-value of 0.03 = if H0 were true, there would be a 3% chance of getting results at least this extreme just by luck -> less than 5% is conventionally accepted
◦ p-value of 0.01 = if H0 were true, there would be a 1% chance of such a result arising as a fluke
◦ The p-value is not the probability that the null hypothesis is true

p-value
Probability value
◦ Based on the assumption that H0 is true
◦ Gives the probability that results at least as extreme as those observed arose simply by chance
α (significance level): probability of rejecting the null hypothesis when the null is true
◦ 1%: p < 0.01
◦ 5%: p < 0.05
◦ 10%: p < 0.10
Note: the p-value comes from the statistical test; can do a one-tailed test

One-tailed vs. two-tailed tests
One-tailed test
◦ Testing the possibility of a relationship in one direction only (either µ > µ0 or µ < µ0)
◦ Statistical power to detect an effect in the tested direction is greater
Lower-tailed: reject H0 if Z < -1.645
Upper-tailed: reject H0 if Z > 1.645
http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_HypothesisTest-Means-Proportions/BS704_HypothesisTest-Means-Proportions3.html

Two-tailed test
◦ Testing the possibility of a relationship in either direction (µ ≠ µ0)
Reject H0 if Z < -1.960 or if Z > 1.960
Note: most of the time, two-tailed tests are used

Thought experiment
Think of a scenario where a researcher could conduct a one-tailed test.
Example: 2 drugs, one of which is cheaper; can do a one-tailed test
What about a scenario where they could not?
Example: a new drug; needs a two-tailed test to see if the new drug is better or worse than the other one
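
The rejection cut-offs quoted above (1.645 for a one-tailed test, 1.960 for a two-tailed test) can be reproduced from the standard normal distribution. The sketch below is only an illustration using Python's standard library, not the SPSS procedure used in the course. It assumes α = 0.05, which is the level those cut-offs correspond to; the function names `critical_z` and `p_value` and the example statistic Z = 1.80 are hypothetical and chosen just for this example.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1), the reference for Z tests.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def critical_z(alpha: float, tails: int) -> float:
    """Positive critical Z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between the two tails.
    return STD_NORMAL.inv_cdf(1 - alpha / tails)


def p_value(z: float, alternative: str = "two-sided") -> float:
    """p-value for an observed Z, computed under the assumption that H0 is true."""
    if alternative == "greater":      # H1: mu > mu0 (upper-tailed)
        return 1 - STD_NORMAL.cdf(z)
    if alternative == "less":         # H1: mu < mu0 (lower-tailed)
        return STD_NORMAL.cdf(z)
    if alternative == "two-sided":    # H1: mu != mu0
        return 2 * (1 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


alpha = 0.05  # assumed significance level
print("one-tailed critical Z:", round(critical_z(alpha, tails=1), 3))
print("two-tailed critical Z:", round(critical_z(alpha, tails=2), 3))

z_observed = 1.80  # hypothetical test statistic, for illustration only
print("one-tailed p:", round(p_value(z_observed, "greater"), 4))
print("two-tailed p:", round(p_value(z_observed, "two-sided"), 4))
```

With this hypothetical Z, the upper-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is one way to see why a one-tailed test has greater power in the direction it tests.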
