Unit 6.1 Descriptive Stats PDF
University of KwaZulu-Natal - Westville
Dr Z White
Summary
This document provides detailed information about descriptive statistics, covering topics such as study outcomes, data analysis procedures, and statistical measures. It's designed for an undergraduate-level course on data analysis and quantitative methods.
Full Transcript
Unit 6.1 Data analysis – Quantitative Part 1: Descriptive statistics
Dr Z White

Study outcomes
After completing this unit, you will be able to:
- demonstrate an understanding of the key concepts and describe them in your own words (refer to concepts in study guide)
- understand the steps in data processing
- demonstrate knowledge of the broad classification of statistics
- demonstrate knowledge of the different measures of central tendency and variability
- demonstrate knowledge of and interpret inferential statistics
- demonstrate the ability to develop a frame of analysis (Data management) for a research proposal
- demonstrate the ability to perform descriptive statistics in Microsoft Excel
- demonstrate knowledge of methods of communicating and displaying analysed data (RHC400)
- demonstrate the ability to compile tables and graphs as part of a research report (RHC400)

This study unit will enable you to complete Protocol template section 13 of your research proposal.

"...data analysis must be planned before data are collected. Without a plan, data which are unsuitable, insufficient or excessive may be collected."

Section should include:
- detailed description of the data collection methods
- data management techniques
- statistical tools
- software used for analysis

Choosing appropriate statistical procedures
- Aim and objective of the study
- Type and distribution of the data used
- Nature of the observations (paired/unpaired)

Simple descriptive statistics
- Ratio = A / B
- Incidence rate = (Number of new cases over period) / (Population at risk)
- Prevalence rate = (Total number of cases of the disease at the time) / (Population at risk)
- Reported as per 100 000
Refer to p170-171 for examples

Descriptive Statistics
Explain and summarise data:
- How many?
- What is the midpoint or average?
- How is the data spread?
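The incidence and prevalence formulas above can be sketched in Python. This is a minimal illustration only: the function name `rate_per_100k` and all the counts are hypothetical, and scaling the incidence rate per 100 000 in the same way as the prevalence rate is an assumption.

```python
def rate_per_100k(cases: int, population_at_risk: int) -> float:
    """Return cases divided by the population at risk, scaled per 100 000."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return cases * 100_000 / population_at_risk


# Hypothetical numbers, for illustration only (not from any study).
new_cases_over_period = 150    # numerator for the incidence rate
total_cases_at_time = 1_200    # numerator for the prevalence rate
population_at_risk = 500_000

incidence = rate_per_100k(new_cases_over_period, population_at_risk)
prevalence = rate_per_100k(total_cases_at_time, population_at_risk)

print(f"Incidence rate:  {incidence:.1f} per 100 000")
print(f"Prevalence rate: {prevalence:.1f} per 100 000")
```

The two rates share a denominator and differ only in the numerator: new cases over a period for incidence, all existing cases at one point in time for prevalence.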
Level of measurement of variables dictates the type of statistical analysis that will be used
- It is therefore important for you to clearly indicate your variables in your proposal and indicate the level of measurement.
- This will enable you to select the most appropriate statistical (descriptive) analysis to answer your research question/objective.

Deciding How to Categorize a Variable
[Diagram not recovered. Surviving labels: Normal distribution; Skewed distribution; Median; Range, IQR]

Frequency distributions
- Number of times a result occurs
- Systematic arrangement of the lowest to the highest scores linked with the number of times the score occurs
- NB: categories should not overlap
- Frequency counts: listing or counting of each observation in a category (nominal/ordinal)
- *Interval & ratio data: frequencies can be used, but for continuous data, mean/median/mode are mostly used to report data

Frequency distributions
- Counts the number of times a particular value occurs in a data set
- How many respondents have a certain characteristic (frequency, n)
- What percentage of the sample has that characteristic (relative frequency, %)
- Groups respondents into the subcategories into which a variable can be divided
See examples p168-169 in text book

Grundlingh N, Zewotir TT, Roberts DJ, Manda S. Assessment of prevalence and risk factors of diabetes and pre-diabetes in South Africa. J Health Popul Nutr. 2022 Mar 2;41(1):7. doi: 10.1186/s41043-022-00281-2. PMID: 35236427; PMCID: PMC8889060.
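A frequency distribution as described above (frequency n and relative frequency %) can be sketched in Python. This is a minimal illustration, not a prescribed method: the helper `frequency_table` and the pain-level responses are hypothetical and are not taken from the cited study.

```python
from collections import Counter


def frequency_table(values):
    """Map each category to (frequency n, relative frequency %)."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (n, 100 * n / total)
        for category, n in counts.items()
    }


# Hypothetical ordinal data, for illustration only.
pain_levels = [
    "mild", "moderate", "mild", "severe", "moderate",
    "mild", "mild", "severe", "moderate", "mild",
]

table = frequency_table(pain_levels)

# One row per category: the count (n) and the share of the sample (%).
for category, (n, percent) in table.items():
    print(f"{category:<10} n = {n:<3} {percent:.1f}%")
```

Because each response belongs to exactly one category, the frequencies add up to the sample size and the percentages add up to 100%, which is a quick check that the categories do not overlap.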
Measures of central tendency
- Indication of the value around which data is dispersed
- Can only be used for continuous data
- Mode: Most frequently occurring observation in the dataset / distribution
- Median: Midpoint score or value in a group of data ranked from lowest to highest
- Mean: Sum of all the observations divided by the number of observations
- NB: Extreme values may shift the mean artificially upwards or downwards, so the median is a truer measure of central tendency for skewed distributions/data
- Mean = used for normally distributed data
Refer to p172-173 for examples

Normal distributions/data
- Bell shaped
- Mean is located at the centre of the distribution and the curve is symmetrical about the mean

Measures of variability / dispersion / spread
- Indicators of variability: how widespread values or scores are in a distribution
- Degree to which the data are spread out around the mean or median
- Range: Difference between the smallest and largest values in a distribution
- Interquartile range (IQR): The range within which the middle 50% of the scores (data/distribution) fall

Standard deviation (SD / s.d.)
[variance is used to calculate the SD]:
- How values vary about the mean of the distribution (spread around the mean)
- Measure of the average distance of the individual data from their mean
- Wider the spread = larger SD

Empirical rule - Normally distributed data
- ~68% of all values will lie within 1 SD either side of the mean
- ~95% of the values will lie no further than 2 SD either side of the mean
- ~99% of values will lie within 3 SD either side of the mean

Interpretation of SD
- SD = Reported with mean for normally distributed data
- EXAMPLE: Baseline birth weight: 1.18 ± 0.04 kg
- Interpretation = ~95% of all babies in this study weighed between 1.10 and 1.26 kg OR 68% of all babies in this study weighed between 1.14 and 1.22 kg
- TIP: use 68%, then add and subtract one (1) SD from the mean to get the spread, for example: 1.18 (mean) - 0.04 (SD) = 1.14 AND 1.18 (mean) + 0.04 (SD) = 1.22
- NB: always remember the unit (e.g. kg)

Interquartile range (IQR)
Quartiles divide the distribution into quarters (25% each) from the lowest to the highest value:
- 1st quartile = point below which 25% of observations lie
- 2nd quartile = point below which 50% of observations lie (median)
- 3rd quartile = point below which 75% of observations lie

Interquartile range (IQR)
- IQR is the difference (in value) between the third quartile (at the 75th percentile) and the first quartile (at the 25th percentile)
- Measure of the difference between the value below which a quarter of the values lie (first quartile) and the value above which a quarter of the values lie (third quartile)
- Most appropriate measure of spread when the median is used in skewed data (insensitive to the presence of outliers)
- EXAMPLE: Median (IQR) pain = 44 (25.3-68)
- Interpretation: The median level of pain in this group was 44. The IQR is from 25.3 to 68, so 25% had pain less than 25.3 and 25% more than 68.

Exercises from test/exam
Question: Interpret "Age" reported as a continuous variable for the intervention group.
Answer: The participants in the intervention group had a mean age of 40.5 years, with ages ranging from 26-55 years. 68% of all the participants in the intervention group in this study had an age between 31.9 and 49.1 years.

Exercises from test/exam
Question: Interpret "Age" reported as a categorical variable for the intervention group.
Answer: Among participants in the intervention group, 2 were between the ages of 20-29, 3 between the ages of 30-39, 7 between the ages of 40-49 and 2 above the age of 50 years.

More examples of questions from tests/exam
Identify one of each of the following from abstract/Table (1 mark each):
a) Continuous variable
b) Categorical variable
c) Measure of central tendency
d) Measure of variability/dispersion
e) Nominal data/scale (level of measurement)
f) Interval data/scale (level of measurement)

Doing descriptive stats in Excel
Look at the next few "screenshots" slides on how to do descriptive stats in Excel using the following example (I have also included some video links for more detailed instructions).

Research example
Research objective: To determine the current academic performance of 3rd-year students enrolled in the RHC300 module
Concept: Academic performance
Indicator: Module grades
Variable: Semester mark
Decision level (working definitions):
- Report variable as continuous variable (mean/median + measure of variability)
- Report variable as categorical variable (frequency distribution):
≥70% = Good
50-69% = Average
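The two reporting options in the research example (continuous and categorical) can also be sketched in Python as a cross-check on the Excel steps. This is a minimal sketch under stated assumptions: the semester marks are hypothetical, and the label for marks below 50 is a placeholder because no label for that band appears in the working definitions above.

```python
import statistics
from collections import Counter

# Hypothetical semester marks (%), for illustration only.
marks = [45, 52, 58, 61, 64, 67, 70, 72, 75, 81]

# Option 1: report the variable as continuous.
mean = statistics.mean(marks)
median = statistics.median(marks)
sd = statistics.stdev(marks)                   # sample standard deviation
q1, _, q3 = statistics.quantiles(marks, n=4)   # quartile cut points
iqr = q3 - q1
value_range = max(marks) - min(marks)


# Option 2: report the variable as categorical.
def categorise(mark):
    """Assign a mark to exactly one band, so categories do not overlap."""
    if mark >= 70:
        return "Good"
    if mark >= 50:
        return "Average"
    return "Below 50 (placeholder label, not given in the source)"


counts = Counter(categorise(mark) for mark in marks)

print(f"Mean (SD): {mean:.1f} ({sd:.1f})")
# If the marks are roughly normal, about 68% lie within mean +/- 1 SD.
print(f"Mean - SD to mean + SD: {mean - sd:.1f} to {mean + sd:.1f}")
print(f"Median (IQR): {median:.1f} ({q1:.1f}-{q3:.1f}), range {value_range}")
for band, n in counts.items():
    print(f"{band}: n = {n} ({100 * n / len(marks):.0f}%)")
```

Quartile conventions differ between tools, so the quartiles printed here may differ slightly from the values Excel reports for the same marks.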