Organization and Presentation of Data
Kyambogo University
Ndahura Nicholas Bari (PhD)
Summary
This document provides a detailed overview of methods for organizing and presenting data, focusing particularly on the various table types (frequency, grouped, relative frequency, and cumulative frequency tables). It covers concepts, features, and examples. The document is part of a university lecture series in biostatistics covering data presentation techniques.
ORGANIZATION AND PRESENTATION OF DATA
GVN 7106: BIOSTATISTICS AND DATA ANALYSIS
Ndahura Nicholas Bari (PhD)
Kyambogo University

Intended Learning Outcomes (ILOs)
By the end of this session, you should be able to:
1. List the various tables used in data presentation
2. Describe each table discussed for data presentation
3. Mention the features of a good table
4. Construct all the tables discussed
5. Know the appropriate table to use for specific data

Lists
A list is the simplest form of a table and consists of two columns. The first column is the observational unit, giving an identity number to each variable value. The second column is the value of the variable for that unit.

Frequency table
A frequency table can be an ungrouped data table or a grouped data table. It has three columns and summarizes the data by showing the frequency of each value of the variable measured.
The ungrouped data table is a technique for systematically arranging the data to show the frequency of each variable value. This reduces the size of the data, especially when the raw data set is large. The resulting distribution can be represented as a frequency table or a histogram. The columns are as follows:
1. The first column is the observational unit, giving an identity number to each variable value.
2. The second column is the value of the variable.
3. The third column is the corresponding frequency of each value.

Grouped data frequency table
A grouped data frequency table is used when the variable is continuous or takes a large number of distinct values, so that listing every value separately becomes impractical and the values need to be grouped. The summary table is produced by distributing the data into classes (categories) of a fixed width and counting how many values fall into each class. A grouped data frequency table also has three columns:
1. The first column contains the identity number of each class interval.
2. The second column gives the class interval containing the variable values.
3. The third column is the total frequency of values in each class interval.
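The list and the two kinds of frequency table described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the function names, the sample data (hypothetical household sizes and blood pressure readings), and the choice of half-open classes [lower, upper) are all assumptions made for the example.

```python
from collections import Counter


def ungrouped_frequency_table(values):
    """Rows of (id, value, frequency), one per distinct value, in ascending order."""
    counts = Counter(values)
    return [(i, value, counts[value])
            for i, value in enumerate(sorted(counts), start=1)]


def grouped_frequency_table(values, start, width, n_classes):
    """Rows of (id, (lower, upper), frequency) for equal-width classes.

    Each class is the half-open interval [lower, upper); values outside
    [start, start + width * n_classes) are not counted.
    """
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        rows.append((i + 1, (lower, upper), freq))
    return rows


# Hypothetical raw data kept as a simple list: number of children in 10 households
children = [2, 3, 2, 1, 4, 2, 3, 0, 2, 1]
# Prints (1, 0, 1), (2, 1, 2), (3, 2, 4), (4, 3, 2), (5, 4, 1), one row per line
for row in ungrouped_frequency_table(children):
    print(row)

# Hypothetical systolic blood pressure readings (mmHg) for 12 patients
bp = [118, 125, 132, 141, 119, 128, 135, 150, 122, 139, 144, 127]
# Frequencies 2, 4, 3, 2, 1 for the classes 110-120, 120-130, ..., 150-160
for row in grouped_frequency_table(bp, start=110, width=10, n_classes=5):
    print(row)
```

Each row mirrors the three columns described above: an identity number, the value (or class interval), and its frequency. Using half-open classes means a reading that falls exactly on a boundary, such as 120, is counted in exactly one class.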
Relative frequency
Relative frequency is the proportion or percentage of the total number of data points that fall within a specific category or range. It is calculated by dividing the frequency of a particular category by the total number of observations.
When to use:
1. Comparing categories: to compare the proportion of observations in different categories relative to the total number of observations.
2. Understanding distribution: to understand the distribution of data in terms of proportions or percentages rather than absolute counts.
3. Visualizing data: when creating graphs such as pie charts or bar charts that show the percentage in each category.
4. Statistical analysis: in inferential statistics, where proportions or probabilities are needed to perform calculations.
Example: if 30 out of 100 people in a survey prefer a particular brand, the relative frequency is 30/100 = 0.30, or 30%.

Relative frequency table
The relative frequency table adds a fourth column to either the grouped or the ungrouped frequency table, giving the proportion of the total frequency in each row. It is calculated by dividing the class frequency by the total frequency for all the classes and is expressed as a percentage. The table becomes a relative frequency distribution when the frequency column is replaced by the relative frequency column.
Relative frequency can help you compare the frequencies of different values or categories across data sets that have different sizes or scales.
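As a sketch under assumed data, the fourth (relative frequency) column can be computed by dividing each class frequency by the total and multiplying by 100. The same proportions multiplied by 360 degrees give the sector angles of a pie chart. The function names and the grouped blood pressure counts below are hypothetical and serve only to illustrate the calculation.

```python
def relative_frequency_table(rows):
    """Add a fourth column: each row's frequency as a percentage of the total."""
    total = sum(freq for _, _, freq in rows)
    return [(i, value, freq, round(100 * freq / total, 1))
            for i, value, freq in rows]


def pie_chart_angles(rows):
    """Sector angle in degrees for each row: (frequency / total) * 360."""
    total = sum(freq for _, _, freq in rows)
    return [360 * freq / total for _, _, freq in rows]


# Hypothetical grouped frequency table: (id, class interval in mmHg, frequency)
rows = [
    (1, "110-119", 2),
    (2, "120-129", 4),
    (3, "130-139", 3),
    (4, "140-149", 2),
    (5, "150-159", 1),
]
# Relative frequencies (%): 16.7, 33.3, 25.0, 16.7, 8.3
for row in relative_frequency_table(rows):
    print(row)
print(pie_chart_angles(rows))  # [60.0, 120.0, 90.0, 60.0, 30.0]

# The survey example from the text: 30 of 100 people prefer the brand
survey = [(1, "prefers brand", 30), (2, "other", 70)]
print(relative_frequency_table(survey)[0])  # (1, 'prefers brand', 30, 30.0)
```

Because each percentage is rounded to one decimal place, the relative frequency column may not add up to exactly 100 for every data set; the unrounded proportions always sum to 1.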
It can also help you visualize the distribution of data in a pie chart or a relative frequency histogram.

Cumulative frequency table
The relative frequency table can be further transformed into a cumulative frequency table, for either grouped or ungrouped data. It has a column of frequencies accumulated from the first row, so that the cumulative frequency at a value is the sum of the frequencies of all values less than or equal to that value.
Cumulative frequency can help you find the median, quartiles, and percentiles of a data set. It can also help you visualize the distribution of data in a cumulative frequency table, a cumulative frequency polygon, or an ogive.

Cumulative frequency
Cumulative frequency is the sum of the frequencies for all categories up to and including a given category. It shows how many observations fall at or below a certain value.
When to use:
1. Analyzing data distribution: to determine how many observations fall below a certain value, or to understand how the data are distributed across categories.
2. Creating ogives: to construct ogive graphs, which are useful for visualizing cumulative data.
3. Finding percentiles and quartiles: when you need to calculate percentiles, quartiles, or other measures that depend on the position of data in the distribution.
4. Summarizing data: to get a sense of the cumulative total up to a certain point, especially in grouped data.
Example: in a data set of test scores, to find how many students scored below 70, you would sum the frequencies of all score ranges below 70, which is the cumulative frequency of the last range below 70.

Properties of a good table
1. Should have a title indicating at least who and what, i.e. the persons studied and the variable being measured or presented.
2. Should be drawn to scale.
3. Should have a key.
4. Should be appropriately labelled.
5. Should have uniform features.
6. Should not be clumsy, but simple and self-explanatory.
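The cumulative frequency column, the ogive plotting points, and the use of cumulative frequency to locate the median class can be illustrated with a short sketch. It assumes hypothetical test scores grouped into half-open classes of width 10 ([40, 50), [50, 60), and so on); the function names are illustrative only and are not taken from the lecture.

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Running totals: each entry is the sum of the frequencies up to and including that row."""
    return list(accumulate(freqs))


def ogive_points(lower_bounds, width, freqs):
    """(upper class boundary, cumulative frequency) pairs for an ogive,
    starting at (first lower boundary, 0). Classes are assumed to be
    half-open intervals [lower, lower + width)."""
    points = [(lower_bounds[0], 0)]
    for lower, cf in zip(lower_bounds, cumulative_frequencies(freqs)):
        points.append((lower + width, cf))
    return points


def median_class(lower_bounds, width, freqs):
    """Class interval containing the median: the first class whose
    cumulative frequency reaches n/2."""
    half = sum(freqs) / 2
    for lower, cf in zip(lower_bounds, cumulative_frequencies(freqs)):
        if cf >= half:
            return (lower, lower + width)
    raise ValueError("empty frequency list")


# Hypothetical test scores grouped into classes [40, 50), [50, 60), ..., [80, 90)
lower_bounds = [40, 50, 60, 70, 80]
freqs = [3, 7, 12, 9, 4]

cum = cumulative_frequencies(freqs)
print(cum)  # [3, 10, 22, 31, 35]
# Students below 70 = cumulative frequency of the class [60, 70)
print("Scored below 70:", cum[lower_bounds.index(60)])  # Scored below 70: 22
print(ogive_points(lower_bounds, 10, freqs))
# [(40, 0), (50, 3), (60, 10), (70, 22), (80, 31), (90, 35)]
print("Median class:", median_class(lower_bounds, 10, freqs))  # Median class: (60, 70)
```

With 35 students, n/2 = 17.5, and the first cumulative frequency to reach it is 22, so the median lies in the class 60 to 70. Plotting the ogive points with straight lines joining them gives the cumulative frequency polygon.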
Summary
Tabular presentation of data is one of the methods used in descriptive biostatistics to highlight the characteristics of the spread of population data. There are several forms of tables, ranging from a simple list to frequency and cross-tabulated tables. Tables give a clear overview of the proportions of the values taken by the variables in the population data being presented.

DIAGRAMMATIC PRESENTATION OF DATA
By the end of this session, you should be able to:
1. State the various types and categories of diagrams used in descriptive biostatistics
2. State the properties of a good diagram
3. Identify the appropriate diagrammatic presentation for various categories of variables
4. Successfully construct the various types of diagrams

Bar charts
Bar charts are graphic presentations of categorical variables drawn as rectangles whose lengths are proportional to the frequencies or magnitudes of the variable represented. Two or more bars can be drawn adjacent to each other for comparison. When the frequencies are expressed as percentages, the individual bars should sum to 100 percent. The bars are compared by their frequencies and can be distinguished by different colours or shades.

Types of bar charts
Single/basic/simple bar chart: represents categorical data with rectangular bars, the length of each bar being proportional to the value or count of the category it represents.
Use: this chart is ideal for comparing the frequency or count of categories. For example, it can display the number of patients in different age groups or the count of occurrences of each category.

Stacked/segmented bar chart: the bars are divided into segments, each representing a different sub-category within the main category. The length of the bar is the sum of its segments.
Use: this type of chart helps in understanding the composition of each category and comparing the total and relative sizes of sub-categories. It is useful for seeing the proportion of different sub-groups within each category.

Divergent bar chart: single bars are drawn in opposite directions, i.e. above and below the zero line, in increasing and decreasing order respectively, so that the frequency of each bar can be compared with the frequency of its opposite bar.
Example: imagine you are comparing the performance of different departments in a company against a target goal. Each department's performance might be represented as a bar extending from the central baseline (the target). Departments exceeding the target show bars extending in one direction, while those falling short of the target show bars extending in the opposite direction.

Grouped/clustered bar chart: displays multiple bars for each category, allowing comparisons between sub-categories within each main category.
Use: useful for comparing sub-groups within each main category. For instance, if you want to compare the average blood pressure levels of different age groups across genders, you can use a grouped bar chart.

Histogram: a graphical representation of the distribution of numerical data. It consists of contiguous (touching) bars that represent the frequency (count) of data points falling within specific intervals, called bins or classes.
Use: visualizing distribution. Histograms help in understanding the shape of the distribution of data, for example whether it is skewed or symmetric.

Pie chart: consists of a circle whose area represents the total frequency. The total area is divided proportionally into segments representing the various variables of interest.
Use: pie charts are effective for showing the relative proportions of different categories within a whole.

Pictogram: also known as a picture diagram. This involves the use of drawings or symbols to represent the various variables diagrammatically. A unit value of the variable should be represented by a standard symbol or drawing to depict its magnitude or frequency.

MAPS AND GRAPHS USED IN BIOSTATISTICS
Intended Learning Outcomes (ILOs)
By the end of this session, you should be able to:
1. Define and describe various types of diagrams and graphs used in presenting data
2. Correctly identify the appropriate diagrams and graphs for presenting different data
3. Construct various types of graphs
4. State the differences between various graphs

Relative frequency graph
A relative frequency graph displays data to show the proportion or percentage of occurrences of each value or category in a data set, relative to the total number of observations. Unlike a simple frequency graph, which shows the raw count of occurrences, a relative frequency graph normalizes these counts into proportions or percentages. This makes it easier to compare the data across different groups or data sets and gives insight into the distribution of the data.

Frequency polygon
A frequency polygon is a line graph used to show the distribution of a data set consisting of numerical measurements: the frequency of each class is plotted against the class midpoint, and the points are joined by straight lines. Several frequency polygons can be drawn on the same axes to compare distributions, for example the heights of two groups. Frequency polygons are commonly used when analysing statistical data that has been grouped into classes. This type of graph is usually drawn over a histogram but can also be drawn without one.
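The frequency polygon described above can be sketched by computing its plotting points. This is an illustrative helper, not the author's method: it assumes equal-width half-open classes and follows the common convention of adding a zero-frequency point one class width beyond each end, so that the polygon closes on the x-axis. The parameter `relative=True` is an added assumption that gives the points of a relative frequency polygon instead.

```python
def frequency_polygon_points(lower_bounds, width, freqs, relative=False, close=True):
    """Return (class midpoint, height) pairs for a frequency polygon.

    Heights are raw frequencies, or proportions of the total if relative=True.
    If close=True, a zero-height point is added one class width before the
    first midpoint and after the last, so the polygon meets the x-axis.
    """
    total = sum(freqs)
    heights = [f / total for f in freqs] if relative else list(freqs)
    points = [(lower + width / 2, h) for lower, h in zip(lower_bounds, heights)]
    if close and points:
        points = ([(points[0][0] - width, 0)] + points
                  + [(points[-1][0] + width, 0)])
    return points


# Hypothetical blood pressure classes (mmHg): [110, 120), [120, 130), ..., [150, 160)
lower_bounds = [110, 120, 130, 140, 150]
freqs = [2, 4, 3, 2, 1]

points = frequency_polygon_points(lower_bounds, 10, freqs)
print(points)
# [(105.0, 0), (115.0, 2), (125.0, 4), (135.0, 3), (145.0, 2), (155.0, 1), (165.0, 0)]
```

The resulting pairs can be passed to any plotting library; with matplotlib, for example, `xs, ys = zip(*points)` followed by `plt.plot(xs, ys, marker="o")` draws the polygon, and the same call can be made on top of a histogram of the same classes.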
While a histogram is a graph of rectangular bars with no spaces between them, a frequency polygon is a line graph that represents the same frequency distribution data.

Ogive
An ogive (cumulative frequency polygon) is a type of frequency polygon that shows cumulative frequencies. An ogive plots cumulative frequency on the y-axis against the upper class boundaries on the x-axis. It resembles a histogram of cumulative frequencies, except that instead of rectangles, each class is marked by a single point where the top right corner of its rectangle would be, and the points are joined by lines.
Percentage ogive: a cumulative relative frequency polygon, with the cumulative frequencies expressed as percentages of the total, is called a percentage ogive.

Scatter diagram or dot graphs/plots
Dot plots are used to display the distribution of sample data for a continuous variable. Dots are stacked above the values on the horizontal x-axis to represent the frequencies of different values, so more dots indicate greater frequency. Each dot represents a set number of observations.

Scatter diagram/plot
A scatter plot identifies a possible relationship between two different variables by plotting each pair of observations as a point. It provides a visual means of judging the direction and strength of the relationship between the two variables. A one-dimensional scattergram plots data points along just one dimension, typically on a single axis.

Summary
Diagrams are used in descriptive statistics to condense data and highlight the outstanding features of the population. Different diagrams are appropriate for different types of variables, and this should be taken into consideration: the nature of the data collected and its variables determine the type of diagram used to describe the population spread. Bar charts, pie charts, pictograms (picture diagrams), and map diagrams (spot maps) are used for qualitative or categorical variables.
Line charts, cumulative frequency diagrams (ogives), frequency polygons, percentage ogives, histograms, and dot graphs or scatter diagrams are used for quantitative or continuous variables.

Levels of Analysis
1. Univariate (uni-variable) analysis
2. Bivariate (bi-variable) analysis
3. Multivariate (multi-variable) analysis

Univariate analysis
Univariate analysis is a statistical technique used to examine and summarize the distribution, central tendency, and dispersion of a single variable. It focuses on one variable at a time, providing insights into its characteristics without considering relationships with other variables. It involves descriptive statistics and general descriptive information about the data, gives a summary of the key characteristics of the data, and lays the foundation for further analysis.

Bivariate analysis
In bivariate analysis, two variables are analysed together and examined for any possible significant association between them. It is done after univariate analysis. It involves a presumed dependent variable (the outcome or effect, which is influenced by or predicted from changes in another variable) and an independent (predictor or explanatory) variable (the variable that is manipulated or categorized to observe its effect on another variable).
E.g. the relationship between blood pressure readings and salt intake. Which is the dependent variable? Which is the independent variable?

Multivariate analysis
Multivariate analysis explores relationships and interactions among three or more variables. It helps to identify patterns and make predictions based on multiple factors. Its uses include:
1. Understanding relationships: determining how multiple variables are related to one another and to the outcome variable(s).
2. Grouping of data: grouping observations into categories or clusters based on similarities among variables.
3. Predictive modelling: building models that predict outcomes from multiple predictor variables.

Further, data analysis can be of two types: descriptive and inferential.
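The univariate analysis described above, summarizing the central tendency and dispersion of a single variable, can be sketched with Python's standard `statistics` module. The choice of measures and the sample data (hypothetical numbers of children per household) are assumptions for illustration. Quartiles are computed with the module's default "exclusive" method, and other statistical software may use a slightly different quartile rule.

```python
import statistics


def univariate_summary(values):
    """Summarize one variable: size, central tendency, and dispersion."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "sample_sd": statistics.stdev(values),  # divides by n - 1
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical data: number of children in 10 households
children = [2, 3, 2, 1, 4, 2, 3, 0, 2, 1]
for name, value in univariate_summary(children).items():
    print(name, value)
```

For these data the mean, median, and mode all equal 2 and the quartiles are 1 and 3, which suggests a roughly symmetric distribution; a bivariate analysis would then examine this variable together with a second one.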