MED106_1c Types of Frequency Distributions PDF
Document Details
University of Nicosia Medical School
Avgis Hadjipapas
Summary
This presentation introduces frequency distributions, focusing on normal distributions. It explains how skewness and outliers can affect measures of central tendency and dispersion. The document also includes graphical representations (histogram, boxplots).
Full Transcript
Introduction to measurement II: frequency distributions and the normal distribution
Avgis Hadjipapas, Professor in Neuroscience and Research Methods

Session LOBs
LOB4: Outline the normal distribution and its statistical qualities and calculate probabilities based on these.
LOB5: Recognise deviations from normality in a variable distribution and outline skewness.
LOB6: Describe how skewness and outliers affect measures of central tendency and dispersion, and decide which summary statistics are applicable for different types of distributions.

Frequency distribution (histograms)
When describing the distribution of a numeric variable, we look for the overall pattern and for striking deviations from that pattern.
We can describe the overall pattern of a histogram by its shape, center, and spread.
[Figure: two histograms. Left: a line connecting each column ➔ too detailed. Right: a smoothed curve highlighting the overall pattern of the distribution.]

Types of distribution for numeric variables
Unimodal = one peak, bimodal = two peaks, multimodal …
[Figure: example histograms. Symmetric (normal, unimodal) distribution; skewed (unimodal) distribution; bimodal distribution (number of customers in a restaurant by time of day).]
Not all distributions have a simple overall shape, especially when there are few observations.

Types of distribution for numeric variables (continued)
A distribution is said to be symmetrical (or normal) if the right and left sides of the histogram are approximately mirror images of each other (also called bell-shaped or Gaussian).
A distribution is skewed to the right (or positively skewed) if the right side of the histogram extends much further out than the left side (i.e. it has a right 'tail').
A distribution is skewed to the left (or negatively skewed) if the left side of the histogram extends much further out than the right side (i.e. it has a left 'tail').

Assessing skewness in distributions

Effect of the distribution on measures of central tendency
VERY IMPORTANT: in NORMAL DISTRIBUTIONS, mean = median = mode. This means that the data are distributed around the mean, which is also the distribution's most common ('expected') value.

Impact of skewed data on mean and median
Years until death after diagnosis with stomach cancer: normal distribution, mean x̄ = 3.4, median M = 3.4. Mean and median are the same.
Years until death after diagnosis with multiple myeloma: positively skewed distribution, mean x̄ = 3.4, median M = 2.5. The mean is pulled toward the skew.

Outliers
In addition to skewness, another important kind of deviation is an outlier.
Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to interpret them!
[Figure: histogram of hours of physical activity per month (vertical axis: frequency). Two individuals are marked: one with very low activity and one with very high activity compared to the overall distribution.]
The overall pattern is fairly symmetrical except for 2 individuals who lie outside the main distribution.
A large gap in the distribution is typically a sign of an outlier.
In this case, despite being extreme, these outliers seem valid, so they should not be excluded.

Impact of outliers on mean and median
[Figure: histograms of the number of cigarettes smoked per day (vertical axes: frequency as number of people, and percent of people), without and with the outliers.]
Without the outliers: mean x̄ = 3.4, median M = 3.4.
With the outliers: mean x̄ = 4.2, median M = 3.6.
The mean is pulled to the right (in this case!) by the outliers (from 3.4 to 4.2).
The median, on the other hand, remains largely unaffected (from 3.4 to 3.6).

Identifying skewness and outliers from boxplots
Boxplots are an alternative graphical method for evaluating normality of the distribution and identifying skewness or outliers.
[Figure: box plots comparing a normally distributed variable and a right (positively) skewed variable. Vertical axis: years until death, 0 to 15.]

How does the distribution affect our choice of summary statistic? (overview)
The distribution of numeric variables should always be checked using a histogram and/or a box plot.
Since the mean is affected by skewness and/or outliers, it should only be used for variables that are normally distributed and do not have outliers.
If the sample is large (i.e. >500 individuals), a few outliers will not materially affect the mean, so it can be used. Skewness, however, always affects the mean no matter how large the sample is.
The standard deviation, like the mean, is affected by skewness and outliers and thus, like the mean, is used only when the variable is normally distributed and no outliers are present.
If a variable is not normally distributed (i.e. it is skewed) or has outliers, then the median and the interquartile range should be used as measures of central tendency and dispersion, respectively, rather than the mean and standard deviation.
The mode is used only infrequently in scientific research.

Distributions and probability
[Figure: histogram of height showing a normal distribution.]
The mean and standard deviation of height in this sample are ..

How can we use the standard deviation to estimate a range of values in a given distribution?
mean + 1 × s.dev = 1.774 + 0.147 = 1.92 m
mean − 1 × s.dev = 1.774 − 0.147 = 1.627 m
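The mean ± s.dev range for the height example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `sd_range` and its parameter `k` (the number of standard deviations) are hypothetical and not part of the original slides; the mean (1.774 m) and standard deviation (0.147 m) are the values used in the slides.

```python
def sd_range(mean, sd, k=1.0):
    """Return the (lower, upper) bounds of mean ± k standard deviations."""
    return mean - k * sd, mean + k * sd


# Height example from the slides: mean 1.774 m, standard deviation 0.147 m.
lower, upper = sd_range(1.774, 0.147, k=1)
print(f"{lower:.3f} m to {upper:.3f} m")  # prints "1.627 m to 1.921 m"
```

Passing a different `k` widens or narrows the range around the mean in the same way.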
About 68% of the values in the sample fall within this range (1.627 m to 1.92 m).

How do we calculate a range that covers 95% of the values in the sample?
1.96 is the magic number!
mean + 1.96 × s.dev = 1.774 + 0.288 = 2.06 m
mean − 1.96 × s.dev = 1.774 − 0.288 = 1.486 m
95% of the values in the sample fall within this range.

Can we use the standard deviation to predict the probability of values occurring?
If we choose a random person from this sample, the probability that this person has a height between 1.63 m and 1.92 m is 68% (check slide 18).
If we choose a random person from this sample, the probability that this person has a height between 1.49 m and 2.06 m is 95% (check slide 20).
Note: The above holds exactly only for distributions that are perfectly normal.

HOMEWORK 1
If we choose a random person from the distribution of height presented in the previous slides (assuming a perfect normal distribution), what is the probability that this person will:
1. have a height 1.92m?

HOMEWORK 2
In a sample of 600 individuals, body mass index (BMI) shows a perfect normal distribution and, for the purposes of further investigation, a team of doctors wants to select only people who are underweight. For this purpose, they decided to label as 'underweight' anyone who is 2 standard deviations below the mean BMI. How many people do you expect to be selected?

Further reading (optional)
Petrie A. & Sabin C. Medical Statistics at a Glance, 3rd Edition, Chapters 3, 7, 8 [ISBN: 978-1-4051-8051-1]
Kirkwood B. & Sterne J. Essential Medical Statistics, 2nd Edition, Chapter 5 [ISBN: 978-1-118-30096-1]
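
The 68% and 95% figures quoted in this session for ranges of 1 and 1.96 standard deviations around the mean can be checked numerically. The sketch below assumes a perfectly normal distribution and uses only Python's standard library; the function name `coverage_within` is hypothetical and does not come from the slides.

```python
import math


def coverage_within(k):
    """Probability that a normally distributed value lies within k SDs of the mean."""
    # For a standard normal variable Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96):
    print(f"within {k} s.dev of the mean: {coverage_within(k):.4f}")
# prints:
# within 1 s.dev of the mean: 0.6827
# within 1.96 s.dev of the mean: 0.9500
```

The result depends only on the number of standard deviations, not on the particular mean or standard deviation, which is why the same 68% and 95% figures apply to any perfectly normal variable.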