Summarising Data Notes PDF
Document Details
Uploaded by StableEpilogue
King's College London
Summary
These notes explain the different types of data: continuous, discrete, ordinal and categorical. They also explain how to summarise quantitative data, including measures of central tendency such as the mean and median, and measures of variability such as the standard deviation. Examples and visual representations are included.
Full Transcript
Summarising data
NOTES TO ACCOMPANY ONLINE LECTURES

Why summarise data?
Data quality monitoring
Data checking and data cleaning
Baseline data in a study
Before doing a complex analysis

Quantitative data

Definition
Quantitative data are data which can be measured numerically and may be continuous or discrete:

Continuous data lie on a continuum and so can take any value between two limits. The only limitation is that imposed by the accuracy of the method of measurement, so some continuous data may be recorded as integers although that is an approximation to the true value.

Discrete data do not lie on a continuum and can only take certain values, usually counts (integers).

Examples
Weight is a continuous variable. It is measured using weighing scales, a person's weight lies on a continuum, and the only limitation is the accuracy of the scales.

The number of previous pregnancies in a pregnant woman is discrete data since it is counted and only whole numbers are possible.

Ordinal data
Quantitative data are always ordinal: the data values can be arranged in numerical order from the smallest to the largest. Questionnaire scale data are often ordinal and are often counts, such as when adding the number of positive responses to a set of questions to get a total score. Categorical data may also have an inherent ordering and so be ordinal, such as stage of disease.

Notes
In practice, continuous data may look discrete because of the way they are measured and/or reported. For example, gestational age of babies is often reported in whole weeks, such as 38 weeks, and so appears to be discrete. It is, however, continuous because it could be reported to a greater degree of accuracy, for example as a decimal, such as 38.5 weeks.

All continuous measurements are limited by the accuracy of the instrument used to measure them, and many quantities such as age and height are reported in whole numbers for convenience.

Categorical data

Definition
Categorical data are data where individuals fall into a number of separate categories or classes. For example:

gender: male or female = 2 classes
disease status: alive or dead = 2 classes
stage of cancer: I, II, III or IV = 4 classes
marital status: married, single, divorced, widowed or legally separated = 5 classes

Ordering
Different categories of categorical data may be assigned a number for coding purposes, and if there are several categories there may be an implied ordering, such as with stage of cancer where stage I is the least advanced and stage IV the most advanced. This means that such data are ordinal. The codes are labels rather than measured quantities, however, so calculating a mean stage of cancer for a group of individuals is unlikely to be helpful.

Dichotomous data
This is where there are only 2 classes and all individuals fall into one or other of the classes. These data are also known as binary data.

Categorising continuous data
It is possible to re-classify continuous data into groups, perhaps for ease of reporting. For example, it is common to report birthweight in bands, giving the numbers of babies who fall into each birthweight band.

Example: categorising birthweight
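
As a rough illustration of how banding might be done in practice, here is a minimal Python sketch that assigns each birthweight to a band and counts the babies in each one. The band boundaries, the band labels, the function names and the list of weights are all made up for illustration; none of them are taken from the notes.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical band boundaries in grams (an assumption, not from the notes).
CUT_POINTS = [1500, 2500, 3500, 4500]
# One label per band: there is always one more band than cut points.
LABELS = ["<1500", "1500-2499", "2500-3499", "3500-4499", ">=4500"]


def birthweight_band(weight_g):
    """Return the label of the band that a birthweight in grams falls into."""
    # bisect_right counts how many cut points are <= weight_g, so a weight
    # exactly equal to a cut point is placed in the higher band.
    return LABELS[bisect_right(CUT_POINTS, weight_g)]


def count_by_band(weights_g):
    """Return (label, count) pairs in band order, including empty bands."""
    counts = Counter(birthweight_band(w) for w in weights_g)
    return [(label, counts.get(label, 0)) for label in LABELS]


# Made-up birthweights in grams, for illustration only.
weights = [3420, 2890, 1480, 4010, 3650, 2500, 4720, 3100]

for label, n in count_by_band(weights):
    print(f"{label:>10} g: {n}")
```

Every baby falls into exactly one band, so the band counts add up to the number of babies. The banded variable is ordinal categorical data, and the exact weights cannot be recovered from the band counts alone.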