MED106 1b Types of Variables and Basic Summary Statistics - IB PDF
Document Details
Uploaded by AppreciableDouglasFir
University of Nicosia
IB
Avgis Hadjipapas
Summary
This IB lecture provides an introduction to calculating basic summary statistics, such as the mean, median, standard deviation and interquartile range. It covers measures of central tendency (mode, median, mean) and measures of dispersion (range, quartiles, interquartile range, variance, standard deviation). It also shows how to represent numeric data graphically using boxplots and histograms.
Full Transcript
Introduction to measurement IB: basic summary (descriptive) statistics
Avgis Hadjipapas, Professor for Neuroscience and Research Methods, [email protected]

Session LOBs
LOB3: Calculate basic summary statistics, such as mean, median, standard deviation, interquartile range, proportions.

Basic summary statistics for numeric variables
What is a single, central, typical value of a given variable around which the other values cluster (the centre or location of the distribution)? => measures of central tendency
Measures of central tendency: mean, median, mode
What is the extent of spread of the values of a given variable, especially with respect to the central value of the distribution? => measures of dispersion
Measures of dispersion: variance, standard deviation, interquartile range

Measures of central tendency: the mode
Ask the question: what value can I expect, i.e. what is the most common value of this variable? This is known as the mode.
Example: drinks on a night out in a small sample of students:
0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 6
'3' is the modal (most common) value.

Measures of central tendency: the median
The median is the middle point of a distribution, i.e. the value such that half of the observations are smaller and half of the observations are larger.
Example (blood pressure, BP): with a median of 122 mmHg, 50% of people have BP lower than 122 mmHg and 50% of people have BP higher than 122 mmHg.

Measures of central tendency: the mean
The mean (or arithmetic mean, or average): add all values together and then divide by the number of values.
Note: the mean plays a very important role in the normal distribution, where most values are clustered around the mean. In a normal distribution, mean = mode = median, i.e. the arithmetic average also happens to be the most common value and the centre of the distribution.
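The three measures of central tendency can be checked on the drinks example with a minimal sketch. It uses Python's standard-library `statistics` module. Python is an assumption here; the session itself works in Excel.

```python
from statistics import mean, median, mode

# Drinks on a night out in a small sample of students (values from the mode example)
drinks = [0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 6]

modal_value = mode(drinks)      # most common value
median_value = median(drinks)   # middle of the sorted values (average of the two middle ones when n is even)
mean_value = mean(drinks)       # sum of all values divided by the number of values

print("mode:", modal_value)      # → mode: 3
print("median:", median_value)   # → median: 3.0
print("mean:", mean_value)       # → mean: 2.75
```

In this small sample the mean (55 / 20 = 2.75) is slightly below the median and the mode. This shows that the three measures need not coincide when the data are not normally distributed.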
The role of the mean in the normal distribution will be revisited later.

Homework 2 (mean and median)
Using Excel, please attempt the following:
1. What is the mean height?
2. What is the median height?
Note: for calculating the mean and median, use the Excel functions 'AVERAGE' and 'MEDIAN', respectively.

Measures of dispersion: the range and quantiles
Dispersion: how much do I expect my variable to vary (spread) around its central location?
Range = largest value (max) - smallest value (min)
The range is easy to compute, but not very informative. It considers only two observations (the smallest and the largest), and is therefore highly affected by extreme values.
Quantiles solve this problem: first sort the values from min to max, then split them into equal-sized parts, for example:
– tertiles (splitting a numeric variable into 3 categories)
– quartiles (splitting a numeric variable into 4 categories)
– quintiles (splitting a numeric variable into 5 categories)

Quartiles and percentiles
[Figure: blood pressure [mmHg] in a sample of 101 patients, plotted against the observations sorted from smallest to largest, with Q1 (25th percentile), Q2 (50th percentile, median) and Q3 (75th percentile) marked.]
Q1 = 25th percentile: 25% of observations are smaller than this value
Q2 = 50th percentile = median: 50% of observations are smaller than this value
Q3 = 75th percentile: 75% of observations are smaller than this value

Interquartile Range (IQR)
[Figure: the same blood pressure data for 101 patients, sorted from smallest to largest, with the interquartile range between Q1 and Q3 marked.]
IQR = Q3 - Q1
The IQR is the width of the range of values that contains the central 50% of the data.

Measures of dispersion: the standard deviation
In a normal distribution, values are clustered around the mean (revisited later). However, values can be more or less spread out around the mean.
The standard deviation 's' is used to quantify the typical ('standard') spread, variation or dispersion around the mean.
Just for info: s is the square root of the variance, which is
another measure of dispersion.

Standard deviation explained graphically
1) Calculate the mean
2) Calculate how much each data point deviates from the mean
3) Calculate the average deviation from the mean (strictly, the squared deviations are averaged, dividing by n - 1 for a sample, and the square root is taken at the end, because positive and negative deviations would otherwise cancel out)
4) Standard deviation: how much individual data points/values differ ('deviation') from the mean on average, i.e. typically ('standard')

Presenting numeric data with graphs
Box-plot
Histogram

Boxplot (5-number summary)
[Figure: boxplot of years until death for Disease X, shown next to the sorted observations. Largest = max = 6.1; Q3 = third quartile = 4.35; M = median = 3.4; Q1 = first quartile = 2.2; Smallest = min = 0.6.]
Five-number summary: min, Q1, median, Q3, max

Histogram
A summary graph for a single numeric variable, very useful to understand the pattern of variability in the data.
The range of values that a variable can take is divided into intervals of equal size.
The histogram shows the number of individual data points that fall in each interval.

Frequency distribution table
Height (m)    Number of people
1.45-1.49     2
1.50-1.54     4
1.55-1.59     3
1.60-1.64     13
1.65-1.69     23
1.70-1.74     23
1.75-1.79     12
1.80-1.84     17
1.85-1.89     18
1.90-1.94     15
1.95-1.99     14
2.00-2.04     5
2.05-2.09     5
2.10-2.14     3
2.15-2.19     1
Total         160

[Figure: histogram of height. x-axis: height in m (1.5 to 2.2); y-axis: number of observations (0 to 25). The first column represents the number of people with height between 1.45 and 1.49 m, the next between 1.50 and 1.54 m, then 1.55-1.59 m, and so on up to 2.15-2.19 m.]

Homework 3 (standard deviation)
Using Excel, please attempt the following:
1. What is the standard deviation of height?
Now assume a hypothetical scenario where 3 persons of short stature and 3 persons of very tall stature were part of this dataset (let's add them to our dataset!):
Replace the first 3 height values in our dataset with: 1.36, 1.21, 1.28
Replace the last 3 height values in our dataset with: 2.20, 2.24, 2.31
2. How are the mean, median and standard deviation affected?
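The dispersion measures above and the Homework 3 scenario can be explored with a rough Python sketch. Python's standard-library `statistics` module is an assumption here; the session itself uses Excel.

Some details of the sketch:
- **Data:** the heights are HYPOTHETICAL, made up for illustration, not the course dataset. They are sorted, so the replaced values are the smallest and largest ones.
- **Helpers:** `sd_by_steps` and `summarise` are hypothetical helpers. `sd_by_steps` follows the four graphical steps for the standard deviation.
- **Quartiles:** these use `method="inclusive"`, assumed to mirror Excel's QUARTILE.INC. Quartile conventions differ slightly between software and textbooks, so results can differ a little from, for example, the boxplot values above.

```python
import math
import statistics


def sd_by_steps(values):
    """Sample standard deviation computed step by step (hypothetical helper)."""
    m = statistics.mean(values)                          # 1) mean
    deviations = [x - m for x in values]                 # 2) deviation of each point from the mean
    variance = sum(d * d for d in deviations) / (len(values) - 1)  # 3) average squared deviation (n - 1)
    return math.sqrt(variance)                           # 4) square root: back to the original units


def summarise(values):
    """Central tendency and dispersion for one numeric variable (hypothetical helper)."""
    # 'inclusive' quartiles, assumed to mirror Excel's QUARTILE.INC
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    return {
        "mean": statistics.mean(values),
        "median": med,
        "sd": statistics.stdev(values),      # sample SD (n - 1), as Excel's STDEV
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "five_number": (min(values), q1, med, q3, max(values)),
    }


# HYPOTHETICAL heights in metres, sorted (not the course dataset)
heights = [1.50, 1.55, 1.58, 1.60, 1.62, 1.65, 1.67, 1.70,
           1.72, 1.74, 1.76, 1.79, 1.82, 1.85, 1.90, 1.95]

# Homework 3 scenario: replace the first 3 and the last 3 values with extreme heights
extreme = [1.36, 1.21, 1.28] + heights[3:-3] + [2.20, 2.24, 2.31]

before = summarise(heights)
after = summarise(extreme)
for key in ("mean", "median", "sd", "range", "iqr"):
    print(f"{key:>6}: {before[key]:.3f} -> {after[key]:.3f}")
```

On this made-up data, the median and IQR stay the same after the replacement, while the mean, standard deviation and range change. The median and quartiles depend only on values away from the extremes, whereas the mean, SD and range use the extreme values directly. The course dataset may behave somewhat differently, depending on which values are replaced.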
Note: for calculating the standard deviation, use the Excel function 'STDEV'.

Further reading (optional)
Petrie A. & Sabin C. Medical Statistics at a Glance, 3rd Edition, Chapters 1, 4, 5, 6 [ISBN: 978-1-4051-8051-1]
Kirkwood B. & Sterne J. Essential Medical Statistics, 2nd Edition, Chapters 3, 4, 15 [ISBN: 978-1-118-30096-1]