Applied Business Statistics PDF

## Applied Business Statistics

### Chapter 2 - Summarising Data: Summary Tables and Graphs

### 2.1 Introduction

Managers can only benefit from statistical findings if the information can easily be interpreted and effectively communicated to them. Summary tables and graphs are commonly used to convey descriptive statistical results. A table or a graph can convey information much more quickly and vividly than a written report. For graphs in particular, there is much truth in the adage 'a picture is worth a thousand words'. In practice, an analyst should always consider using summary tables and graphical displays ahead of written texts, in order to convey statistical information to managers.

Summary tables and graphs can be used to summarise (or profile) a single random variable (e.g. most-preferred TV channel by viewers or pattern of delivery times) or to examine the relationship between two random variables (e.g. between gender and newspaper readership). The choice of a summary table and graphic technique depends on the data type being analysed (i.e. categorical or numeric).

The sample dataset in Table 2.1 of the shopping habits of 30 grocery shoppers will be used to illustrate the different summary tables and graphs.
### Table 2.1 Sample data set of grocery shoppers

| Customer | Store Preference | Visits | Spend | Family Size | Age | Gender |
|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 946 | 1 | 26 | 2 |
| 2 | 2 | 4 | 1842 | 1 | 45 | 1 |
| 3 | 1 | 3 | 885 | 1 | 32 | 1 |
| 4 | 1 | 3 | 1332 | 1 | 33 | 2 |
| 5 | 1 | 2 | 744 | 1 | 65 | 1 |
| 6 | 2 | 2 | 963 | 2 | 56 | 2 |
| 7 | 2 | 2 | 589 | 1 | 58 | 2 |
| 8 | 1 | 4 | 1026 | 2 | 32 | 1 |
| 9 | 2 | 5 | 1766 | 2 | 45 | 2 |
| 10 | 1 | 3 | 1232 | 2 | 30 | 1 |
| 11 | 1 | 3 | 588 | 1 | 35 | 1 |
| 12 | 2 | 4 | 1137 | 2 | 46 | 1 |
| 13 | 3 | 4 | 2136 | 1 | 50 | 1 |
| 14 | 1 | 2 | 964 | 1 | 23 | 2 |
| 15 | 1 | 2 | 685 | 1 | 31 | 1 |
| 16 | 2 | 3 | 1022 | 1 | 45 | 2 |
| 17 | 1 | 3 | 754 | 2 | 36 | 1 |
| 18 | 1 | 2 | 456 | 1 | 62 | 1 |
| 19 | 2 | 4 | 1125 | 2 | 46 | 2 |
| 20 | 2 | 3 | 1486 | 2 | 38 | 2 |
| 21 | 2 | 3 | 945 | 1 | 43 | 1 |
| 22 | 1 | 2 | 856 | 1 | 69 | 2 |
| 23 | 2 | 4 | 1636 | 2 | 34 | 1 |
| 24 | 1 | 3 | 1268 | 1 | 34 | 2 |
| 25 | 1 | 2 | 966 | 1 | 26 | 2 |
| 26 | 1 | 2 | 1445 | 1 | 28 | 2 |
| 27 | 1 | 3 | 1032 | 1 | 30 | 1 |
| 28 | 1 | 1 | 776 | 2 | 33 | 2 |
| 29 | 2 | 4 | 1070 | 1 | 64 | 2 |
| 30 | 2 | 3 | 642 | 2 | 23 | 1 |

### 2.2 Summarising Categorical Data

#### Single Categorical Variable

##### Categorical Frequency Table

A categorical frequency table summarises data for a single categorical variable. It shows how many times each category appears in a sample of data and measures the relative importance of the different categories.

Follow these steps to construct a categorical frequency table:

* List all the categories of the variable (in the first column).
* Count and record (in the second column) the number of occurrences of each category.
* Convert the counts per category (in the third column) into percentages of the total sample size. This produces a percentage categorical frequency table.

It is always a good idea to express the counts as percentages because this makes them easy to understand and interpret. In addition, it makes the comparisons between samples of different sizes easier to explain.

A categorical frequency table can be displayed graphically either as a bar chart or a pie chart.

#### Bar Chart

To construct a bar chart, draw a horizontal axis (x-axis) to represent the categories and a vertical axis (y-axis) scaled to show either the frequency counts or the percentages of each category. Then construct vertical bars for each category to the height of its frequency count (or percentage) on the y-axis.

Note that the sum of the frequency counts (or percentages) across the bars must equal the sample size (or 100%). The bars must be of equal width to avoid distorting a category's importance. Beyond that, neither the order of the categories on the x-axis nor the particular width chosen for the bars matters: only the bar heights convey the importance of each category.

#### Pie Chart

To construct a pie chart, divide a circle into category segments. The size of each segment must be proportional to the count (or percentage) of its category. The sum of the segment counts (or percentages) must equal the sample size (or 100%).

#### Example 2.1 Grocery Shoppers Survey

A market research company conducted a survey amongst grocery shoppers to identify their demographic profile and shopping patterns. A random sample of 30 grocery shoppers was asked to complete a questionnaire that identified:

* at which grocery store they most preferred to shop
* the number of visits to the grocery store in the last month
* the amount spent last month on grocery purchases
* their age, gender and family size.

The response data to each question is recorded in Table 2.1. Each column shows the 30 responses to each question and each row shows the responses of a single grocery shopper to all six questions.

Refer to the 'store preference' variable in Table 2.1.

1. Construct a percentage frequency table to summarise the store preferences of the sample of 30 grocery shoppers.
2. Show the findings graphically as a bar chart and as a pie chart.

#### Management Questions

* Which grocery store is most preferred by shoppers?
* What percentage of shoppers prefer this store?
* What percentage of shoppers prefer to shop at Spar grocery stores?

#### Solution

1. For the categorical variable 'store preference' there are three categories of grocery stores that shoppers use: 1 = Checkers; 2 = Pick n Pay; 3 = Spar. To construct the percentage frequency table, first count the number of shoppers that prefer each store: there are 10 ones (Checkers), 17 twos (Pick n Pay) and 3 threes (Spar). Then convert the counts into percentages by dividing the count per store by 30 (the sample size) and multiplying the result by 100 (i.e. Checkers = 10/30 × 100 = 33.3%; Pick n Pay = 17/30 × 100 = 56.7%; Spar = 3/30 × 100 = 10%). The percentage frequency table of grocery store preferences is shown in Table 2.2.

#### Table 2.2 Percentage frequency table for grocery store preference of shoppers

| Preferred store | Count | Percentage |
|---|---|---|
| 1 = Checkers | 10 | 33.3% |
| 2 = Pick n Pay | 17 | 56.7% |
| 3 = Spar | 3 | 10.0% |
| Total | 30 | 100% |

2. The frequency table can be displayed graphically, either as a bar chart or a pie chart. The relative importance of each category of the frequency table is represented by a bar in a bar chart (see Figure 2.1) or by a segment of a circle in a pie chart (see Figure 2.2).

#### Figure 2.1 Bar chart of grocery shoppers' store preferences

(Bar chart: the x-axis shows the three stores, Checkers, Pick n Pay and Spar; the y-axis shows % of shoppers, scaled from 0 to 60.)

#### Figure 2.2 Pie chart of grocery shoppers' store preferences

(Pie chart segments: Checkers 10, 33.3%; Pick n Pay 17, 56.7%; Spar 3, 10%.)

#### Management Interpretation

* The grocery store most preferred by shoppers is Pick n Pay.
* More than half of the sampled shoppers (56.7%) prefer to shop at Pick n Pay for their groceries.
* Only 10% of the sampled shoppers prefer to do their grocery shopping at Spar.

Charts and graphs must always be clearly and adequately labelled with headings, axis titles and legends to make them easy to read and to avoid any misrepresentation of information. The data source must, where possible, also be identified to allow a user to assess the credibility and validity of the summarised findings.

Bar charts and pie charts display the same information graphically. In a bar chart, the importance of a category is shown by the height of a bar, while in a pie chart this importance is shown by the size of each segment (or slice). The differences between the categories are clearer in a bar chart, while a pie chart conveys more of a sense of the whole. A limitation of both the bar chart and the pie chart is that each displays the summarised information on only one variable at a time.

#### Two Categorical Variables

##### Cross-tabulation Table

A cross-tabulation table (also called a contingency table) summarises the joint responses of two categorical variables. The table shows the number (and/or percentage) of observations that jointly belong to each combination of categories of the two categorical variables. This summary table is used to examine the association between two categorical measures.

Follow these steps to construct a cross-tabulation table:

* Prepare a table with m rows (m = the number of categories of the first variable) and n columns (n = the number of categories of the second variable), resulting in a table with (m × n) cells.
* Assign each pair of data values from the two variables to an appropriate category-combination cell in the table by placing a tick in the relevant cell.
* When each pair of data values has been assigned to a cell in the table, count the number of ticks per cell to derive the joint frequency count for each cell.
* Sum each row to give row totals per category of the row variable.
* Sum each column to give column totals per category of the column variable.
* Sum the column totals (or row totals) to give the grand total (sample size).

These joint frequency counts can be converted to percentages for easier interpretation. The percentages could be expressed in terms of the total sample size (percent of total), or of row subtotals (percent of rows) or of column subtotals (percent of columns).

To determine whether an association exists between two categorical variables, compare the overall percentage profile of one of the categorical variables to the percentage profile of this same variable for each level of the second categorical variable. If the overall percentage profile is the same as (or very similar to) each level's percentage profile, then there is no association. If at least one level's percentage profile differs significantly from the overall percentage profile, then an association exists.

The cross-tabulation table can be displayed graphically either as a stacked bar chart (also called a component bar chart) or a multiple bar chart. An inspection of the charts can also reveal evidence of an association or not. The more similar the chart profiles are (based on either row percentage-wise or column percentage-wise computations), the less likely there is an association and vice versa.

##### Stacked Bar Chart

Follow these steps to construct a stacked bar chart:

* Choose, say, the row variable, and plot the frequency of each category of this variable as a simple bar chart.
* Split the height of each bar in proportion to the frequency count of the categories of the column variable.

This produces a simple bar chart of the row variable with each bar split proportionately into the categories of the column variable. The categories of the column variable are 'stacked' on top of each other within each category bar of the row variable.
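The conversion of joint counts into percent of rows, percent of columns and percent of total, described above, can be sketched in a few lines of Python. This is only an illustrative sketch: the counts are the joint frequencies that Example 2.2 reports in Table 2.3, while the variable names and the `pct` helper are assumptions introduced here, not part of the text.

```python
# Joint frequency counts as reported in Table 2.3 (store preference by gender).
counts = {
    "Checkers": {"Female": 7, "Male": 3},
    "Pick n Pay": {"Female": 10, "Male": 7},
    "Spar": {"Female": 2, "Male": 1},
}
genders = ["Female", "Male"]

# Row totals, column totals and the grand total (sample size).
row_totals = {store: sum(row.values()) for store, row in counts.items()}
col_totals = {g: sum(row[g] for row in counts.values()) for g in genders}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Hypothetical helper: percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Percent of rows: gender split within each store.
row_pct = {s: {g: pct(counts[s][g], row_totals[s]) for g in genders} for s in counts}
# Percent of columns: store split within each gender.
col_pct = {s: {g: pct(counts[s][g], col_totals[g]) for g in genders} for s in counts}
# Percent of total: each cell as a share of the whole sample.
total_pct = {s: {g: pct(counts[s][g], grand_total) for g in genders} for s in counts}

# Overall gender profile, used as the benchmark when judging association.
overall_gender = {g: pct(col_totals[g], grand_total) for g in genders}

for store in counts:
    print(store, "row %:", row_pct[store], "vs overall:", overall_gender)
```

Comparing each store's row percentages with `overall_gender` mirrors the profile comparison described in the text; deciding whether a difference is 'significant' is left to the analyst and is not tested here.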
##### Multiple Bar Chart

Follow these steps to construct a multiple bar chart:

* For each category of, say, the row variable, plot a simple bar chart constructed from the corresponding frequencies of the categories of the column variable.
* Display these categorised simple bar charts next to each other on the same axes.

The multiple bar chart is similar to a stacked bar chart, except that the stacked bars are displayed next to rather than on top of each other. The two charts convey exactly the same information on the association between the two variables. They differ only in how they emphasise the relative importance of the categories of the two variables.

#### Example 2.2 Grocery Shoppers Survey - Store Preferences by Gender

Refer to the 'store preference' variable and the 'gender' variable in Table 2.1.

1. Construct a cross-tabulation table of frequency counts between 'store preference' (as the row variable) and 'gender' (as the column variable) of shoppers surveyed.
2. Display the cross-tabulation as a stacked bar chart and as a multiple bar chart.
3. Construct a percentage cross-tabulation table to show the percentage split of gender for each grocery store.

#### Management Questions

* How many shoppers are male and prefer to shop at Checkers?
* What percentage of all grocery shoppers are females who prefer Pick n Pay?
* What percentage of all Checkers' shoppers are female?
* Of all male shoppers, what percentage prefer to shop at Spar for their groceries?
* Is there an association between gender and store preference (i.e. does store preference differ significantly between male and female shoppers)?

#### Solution

1. The row categorical variable is 'store preference': 1 = Checkers; 2 = Pick n Pay; 3 = Spar. The column categorical variable is 'gender': 1 = female; 2 = male.

#### Table 2.3 Cross-tabulation table - grocery store preferences by gender

| Store | Female | Male | Total |
|---|---|---|---|
| 1 = Checkers | 7 | 3 | 10 |
| 2 = Pick n Pay | 10 | 7 | 17 |
| 3 = Spar | 2 | 1 | 3 |
| Total | 19 | 11 | 30 |

To produce the cross-tabulation table, count how many females prefer to shop at each store (Checkers, Pick n Pay and Spar) and then count how many males prefer to shop at each store (Checkers, Pick n Pay and Spar). These joint frequency counts are shown in Table 2.3. The cross-tabulation table can also be completed using percentages (row percentages, column percentages or as percentages of the total sample).

2. Figure 2.3 and Figure 2.4 show the stacked bar chart and multiple bar chart respectively for the cross-tabulation table of joint frequency counts in Table 2.3.

#### Figure 2.3 Stacked bar chart - grocery store preferences by gender

(Stacked bar chart: the x-axis shows the three stores, Checkers, Pick n Pay and Spar; the y-axis shows the number of shoppers, scaled from 0 to 18.)

#### Figure 2.4 Multiple bar chart - grocery store preferences by gender

(Multiple bar chart: the x-axis shows the three stores, Checkers, Pick n Pay and Spar; the y-axis shows the number of shoppers, scaled from 0 to 12.)

3. Table 2.4 shows, for each store separately, the percentage split by gender (row percentages), while Table 2.5 shows, for each gender separately, the percentage breakdown by the grocery store preferred (column percentages).

#### Table 2.4 Row percentage cross-tabulation table (store preferences by gender)

| Store | Female | Male | Total |
|---|---|---|---|
| 1 = Checkers | 70% | 30% | 100% |
| 2 = Pick n Pay | 59% | 41% | 100% |
| 3 = Spar | 67% | 33% | 100% |
| Total | 63% | 37% | 100% |

From Table 2.4, of those shoppers who prefer Checkers, 70% are female and 30% are male. Similarly, of those who prefer Pick n Pay, 59% are female and 41% are male. Finally, 67% of customers who prefer to shop at Spar are female, while 33% are male. Overall, 63% of grocery shoppers are female, while only 37% are male.

#### Table 2.5 Column percentage cross-tabulation table (store preferences by gender)

| Store | Female | Male | Total |
|---|---|---|---|
| 1 = Checkers | 37% | 27% | 33% |
| 2 = Pick n Pay | 53% | 64% | 57% |
| 3 = Spar | 11% | 9% | 10% |
| Total | 100% | 100% | 100% |

From Table 2.5, of all female shoppers, 37% prefer Checkers, 53% prefer Pick n Pay and 11% prefer to shop for groceries at Spar. For males, 27% prefer Checkers, 64% prefer Pick n Pay and the balance (9%) prefer to shop at Spar for their groceries. Overall, 33% of all shoppers prefer Checkers, 57% prefer Pick n Pay and only 10% prefer Spar for grocery shopping.

#### Management Interpretation

* Of the 30 shoppers surveyed, there are only three males who prefer to shop at Checkers.
* 33.3% (10 out of 30) of all shoppers surveyed are females who prefer to shop at Pick n Pay.
* 70% (7 out of 10) of all Checkers shoppers are female. (Refer to the row percentages in Table 2.4.)
* Only 9% (1 out of 11) of all males prefer to shop at Spar. (Refer to the column percentages in Table 2.5.)
* Since the percentage breakdown between male and female shoppers across the three grocery stores is reasonably similar to the overall gender profile regardless of store preference (i.e. 63% female and 37% male), it can be concluded that gender and store preference are not statistically associated.

### 2.3 Summarising Numeric Data

Numeric data can also be summarised in table format and displayed graphically. The table is known as a numeric frequency distribution and the graph of this table is called a histogram.

From Table 2.1, the numeric variable 'age of shoppers' will be used to illustrate the construction of a numeric frequency distribution and its histogram.

#### Single Numeric Variable

##### Numeric Frequency Distribution

A numeric frequency distribution summarises numeric data into intervals of equal width. Each interval shows how many numbers (data values) fall within the interval.

Follow these steps to construct a numeric frequency distribution:

* Determine the data range.
  Range = Maximum data value - Minimum data value
  For the age of grocery shoppers, the age data range is 69 - 23 = 46 years.
* Choose the number of intervals (k). While there is no strict formula to find k, each one of the following rules can be used as a guide based on sample size (n): Sturges' rule (k = 1 + 3.322 × log10(n)); Rice's rule (k = 2 × ∛n) and the Square-root rule (k = √n). X-Static uses Rice's rule. As a general rule, choose between 5 and 10 intervals, depending on the sample size: the smaller the sample size, the fewer the number of intervals, and vice versa. For n = 30 shoppers, choose five intervals.
* Determine the interval width.
  Interval width = Data range / Number of intervals
  Use this as a guide to determine a 'neat' interval width. For the 'age' variable, the approximate interval width is 46/5 = 9.2 years. Hence choose an interval width of 10 years.
* Set up the interval limits. The lower limit for the first interval should be a value smaller than or equal to the minimum data value and should be a number that is easy to use. Since the youngest shopper is 23 years old, choose the lower limit of the first interval to be 20. The lower limits for successive intervals are found by adding the interval width to each preceding lower limit. The upper limits are chosen to avoid overlaps between adjacent interval limits.

  | Lower limit | Upper limit |
  |---|---|
  | 20 | <30 (or 29) |
  | 30 | <40 (or 39) |
  | 40 | <50 (or 49) |
  | 50 | <60 (or 59) |
  | 60 | <70 (or 69) |

  The format of <30 (less than 30) should be used if the source data is continuous, while an upper limit such as 29 can be used if the data values are discrete.
* Tabulate the data values. Assign each data value to one, and only one, interval. A count of the data values assigned to each interval produces the summary table, called the numeric frequency distribution.
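The first four steps above can be sketched in Python for the 'age' variable. This is a minimal sketch, not a prescribed method: Sturges' rule is taken in its usual form k = 1 + log2(n), Rice's rule as twice the cube root of n, and the rounding of the width up to a multiple of 10 (`neat_step`) is an assumption made here to mimic the choice of a 'neat' width.

```python
import math

n = 30                      # sample size (grocery shoppers)
min_age, max_age = 23, 69   # youngest and oldest shopper, as stated in the text

# Step 1: data range.
data_range = max_age - min_age

# Step 2: guides for the number of intervals k.
k_sturges = 1 + math.log2(n)   # equivalent to 1 + 3.322 * log10(n)
k_rice = 2 * n ** (1 / 3)      # twice the cube root of n
k_sqrt = math.sqrt(n)
k = 5                          # the text's choice for n = 30

# Step 3: approximate width, then a 'neat' width.
approx_width = data_range / k
neat_step = 10                 # assumption: round up to a multiple of 10
width = math.ceil(approx_width / neat_step) * neat_step

# Step 4: lower limits, starting at a neat value at or below the minimum.
first_lower = (min_age // width) * width
lower_limits = [first_lower + i * width for i in range(k)]

print(f"range = {data_range}, approximate width = {approx_width:.1f}")
print(f"Sturges = {k_sturges:.2f}, Rice = {k_rice:.2f}, square root = {k_sqrt:.2f}")
print(f"width = {width}, lower limits = {lower_limits}")
```

All three rules suggest roughly five to six intervals for n = 30, which is consistent with the choice of five intervals in the text.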
When constructing a numeric frequency distribution, ensure that:

* the interval widths are equal in size
* the interval limits do not overlap (i.e. intervals must be mutually exclusive)
* each data value is assigned to only one interval
* the intervals are fully inclusive (i.e. cover the data range)
* the frequency counts sum to the sample size, n, or the percentage frequencies sum to 100%.

The frequency counts can be converted to percentages (or proportions) by dividing each frequency count by the sample size. The resultant summary table is called a percentage (or relative) frequency distribution. It shows the percentage (or proportion) of data values within each interval.

##### Histogram

A histogram is a graphic display of a numeric frequency distribution.

Follow these steps to construct a histogram:

* Arrange the intervals consecutively on the x-axis from the lowest interval to the highest. There must be no gaps between adjacent interval limits.
* Plot the height of each bar (on the y-axis) over its corresponding interval, to show either the frequency count or percentage frequency of each interval.

The area of a bar (width × height) measures the density of values in each interval.

#### Example 2.3 Grocery Shoppers Survey - Profiling the Ages of Shoppers

Refer to the 'age of shoppers' variable in Table 2.1.

1. Construct a numeric frequency distribution for the age profile of grocery shoppers.
2. Compute the percentage frequency distribution of shoppers' ages.
3. Construct a histogram of the numeric frequency distribution of shoppers' ages.

#### Management Questions

* How many shoppers are between 20 and 29 years of age?
* What is the most frequent age interval of shoppers surveyed?
* What percentage of shoppers belong to the most frequent age interval?
* What percentage of shoppers surveyed are 60 years or older?
* What is the maximum age for the youngest 20% of shoppers surveyed?
#### Solution

1 and 2. The numeric and percentage frequency distributions for the ages of grocery shoppers are shown in Table 2.6, and are based on the steps shown above.

#### Table 2.6 Numeric (and percentage) frequency distribution - age of shoppers

| Age (years) | Count | Percentage | Relative |
|---|---|---|---|
| 20-29 | 6 | 20% | 0.2 |
| 30-39 | 9 | 30% | 0.3 |
| 40-49 | 8 | 27% | 0.27 |
| 50-59 | 4 | 13% | 0.13 |
| 60-69 | 3 | 10% | 0.1 |
| Total | 30 | 100% | |

3. Figure 2.5 shows the histogram of the numeric frequency distribution for shoppers' ages.

#### Figure 2.5 Histogram - age of shoppers

(Histogram: the x-axis shows the age intervals in years, 20-29, 30-39, 40-49, 50-59 and 60-69; the y-axis shows the number of shoppers, scaled from 0 to 12.)

#### Management Interpretation

* There are six shoppers between the ages of 20 and 29 years.
* The most frequent age interval is between 30 and 39 years.
* 30% of shoppers surveyed are between 30 and 39 years of age.
* 10% of shoppers surveyed are 60 years or older.
* The youngest 20% of shoppers are no older than 29 years.

If the numeric data are discrete values in a limited range (5-point rating scales, number of children in a family, number of customers in a bank queue, for example), then the individual discrete values of the random variable can be used as the 'intervals' in the construction of a numeric frequency distribution and a histogram. This is illustrated in Example 2.4 below.

#### Example 2.4 Grocery Shoppers Survey - Profiling the Family Size of Shoppers

Refer to the random variable 'family size' in the database in Table 2.1. Construct a numeric and percentage frequency distribution and histogram of the family size of grocery shoppers surveyed.

#### Management Questions

* Which is the most common family size?
* How many shoppers have a family size of three?
* What percentage of shoppers have a family size of either three or four?
#### Solution

Family size is a discrete random variable. The family sizes range from 1 to 5 (see data in Table 2.1). Each family size can be treated as a separate interval. To tally the family sizes, count how many shoppers have a family size of one, two, three, four and five. Table 2.7 and Figure 2.6 show the numeric and percentage frequency table and histogram for the discrete numeric data of family size of grocery shoppers.

#### Table 2.7 Numeric (and percentage) frequency distribution - family size of shoppers

| Family size | Count | Percentage |
|---|---|---|
| 1 | 4 | 13.3% |
| 2 | 11 | 36.7% |
| 3 | 8 | 26.7% |
| 4 | 5 | 16.7% |
| 5 | 2 | 6.6% |
| Total | 30 | 100% |

#### Figure 2.6 Histogram - family size of shoppers

(Histogram: the x-axis shows family size, 1 to 5; the y-axis shows the number of shoppers, scaled from 0 to 14.)

#### Management Interpretation

* The most common family size of grocery shoppers is two.
* There are eight shoppers that have a family size of three.
* 43.4% (26.7% + 16.7%) of shoppers surveyed have a family size of either three or four.

##### Cumulative Frequency Distribution

Data for a single numeric variable can also be summarised into a cumulative frequency distribution. A cumulative frequency distribution is a summary table of cumulative frequency counts which is used to answer questions of a 'more than' or 'less than' nature.

Follow these steps to construct a less than cumulative frequency distribution from a numeric frequency distribution:

* For each interval, starting with the lowest interval, ask the question: 'How many data values are below this interval's upper limit?'
* The answer is: the sum of all frequency counts (or percentages, or proportions) that lie below this current interval's upper limit.
* This is repeated until the last interval is reached.
* A shortcut method to find each successive less than cumulative frequency count (or percentage, or proportion) is to add each interval's frequency count to the cumulative frequency immediately preceding it.
* The last interval's cumulative frequency count must always equal the sample size, n (or 100% or 1).

To find the more than cumulative frequency distribution, start from the highest interval's lower limit by asking the question: 'How many data values are above this interval's lower limit?' Then work back to the first interval by adding each successive interval's frequency to the preceding cumulative frequency.

##### Ogive

An ogive is a graph of a cumulative frequency distribution.

Follow these steps to construct an ogive:

* On a set of axes, mark the interval limits on the x-axis.
* On the y-axis, plot a cumulative frequency of zero opposite the lower limit of the first interval. Thereafter, plot each cumulative frequency count (or cumulative percentage or cumulative proportion) opposite the upper limit of its interval.
* Join these cumulative frequency points to produce a line graph.
* The cumulative line graph always starts at zero at the lower limit of the first interval.
* The cumulative line graph always ends at the upper limit of the last interval.

This ogive graph can now be used to read off cumulative answers to questions of the following type:

* How many (or what percentage) of observations lie below (or above) this value?
* What data value separates the data set at a given cumulative frequency (or cumulative percentage)?

Note: The ogive graph can provide answers for both less than and more than type of questions from the same graph.

#### Example 2.5 Grocery Shoppers Survey - Analysis of Grocery Spend

Refer to the numeric variable 'spend' (amount spent on groceries last month) in Table 2.1.

1. Compute the numeric frequency distribution and percentage frequency distribution for the amount spent on groceries last month by grocery shoppers.
2. Compute the cumulative frequency distribution and its graph, the ogive, for the amount spent on groceries last month.

#### Management Questions

* What percentage of shoppers spent less than R1 200 last month?
* What percentage of shoppers spent R1 600 or more last month?
* What percentage of shoppers spent between R800 and R1 600 last month?
* What was the maximum amount spent last month by the 20% of shoppers who spent the least on groceries? Approximate your answer.
* What is the approximate minimum amount spent on groceries last month by the top-spending 50% of shoppers?

#### Solution

1. The numeric frequency distribution for amount spent is computed using the construction steps outlined earlier. The range is R2 136 - R456 = R1 680. Choosing five intervals, the interval width can be set to a 'neat' width of R400 (based on R1 680 / 5 = R336). The lower limit of the first interval is set at a 'neat' limit of R400, since the minimum amount spent is R456. The numeric and percentage frequency distributions are both shown in Table 2.8.

#### Table 2.8 Numeric (and percentage) frequency distributions - grocery spend

| Grocery spend (R) | Count | Percentage |
|---|---|---|
| 400-<800 | 7 | 23.3% |
| 800-<1200 | 14 | 46.7% |
| 1200-<1600 | 5 | 16.7% |
| 1600-<2000 | 3 | 10.0% |
| 2000-<2400 | 1 | 3.3% |
| Total | 30 | 100% |

2. The cumulative frequency distribution (ogive) for amount spent on groceries last month is computed using the construction guidelines outlined above for the ogive. Based on the numeric frequency distribution in Table 2.8, with R400 being the lower limit of the first interval, the following cumulative counts are derived:

* 7 shoppers spent up to R800
* 21 (= 7 + 14) shoppers spent up to R1 200
* 26 (= 21 + 5) shoppers spent up to R1 600
* 29 (= 26 + 3) shoppers spent up to R2 000
* all 30 shoppers (= 29 + 1) spent no more than R2 400 on groceries last month.
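The shortcut of adding each interval's count to the preceding cumulative total can be sketched in Python using the counts from Table 2.8. The `spend_at` helper is a hypothetical addition: it reads a value off the ogive by assuming straight-line segments between the plotted points, which is one reasonable way to approximate what is read from the graph by eye.

```python
from itertools import accumulate

# Interval limits and counts taken from Table 2.8 (grocery spend, in rand).
limits = [400, 800, 1200, 1600, 2000, 2400]
counts = [7, 14, 5, 3, 1]
n = sum(counts)

# Less-than cumulative counts: add each count to the running total.
cum_counts = list(accumulate(counts))
cum_pct = [100 * c / n for c in cum_counts]

# Ogive points: 0% at the first lower limit, then one point per upper limit.
ogive = list(zip(limits, [0.0] + cum_pct))


def spend_at(pct):
    """Approximate spend below which `pct` percent of shoppers fall.

    Hypothetical helper: assumes straight lines between ogive points.
    """
    for (x0, p0), (x1, p1) in zip(ogive, ogive[1:]):
        if p0 <= pct <= p1:
            return x0 + (pct - p0) / (p1 - p0) * (x1 - x0)
    raise ValueError("pct must be between 0 and 100")


for upper, c, p in zip(limits[1:], cum_counts, cum_pct):
    print(f"below R{upper}: {c} shoppers ({p:.1f}%)")
print(f"20% of shoppers spent less than about R{spend_at(20):.0f}")
```

'More than' answers follow from the same figures by subtraction, for example 100% minus the cumulative percentage at R1 600 gives the share of shoppers who spent R1 600 or more.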
#### Table 2.9 Cumulative frequency distributions (count and percentage) - grocery spend

| Grocery spend (R) | Count | Percentage | Cumulative count | Cumulative percentage |
|---|---|---|---|---|
| 400-<800 | 7 | 23.3% | 7 | 23.3% |
| 800-<1200 | 14 | 46.7% | 21 | 70.0% |
| 1200-<1600 | 5 | 16.7% | 26 | 86.7% |
| 1600-<2000 | 3 | 10.0% | 29 | 96.7% |
| 2000-<2400 | 1 | 3.3% | 30 | 100.0% |
| Total | 30 | 100% | | |

Figure 2.7 shows the percentage ogive graph. Note that the % cumulative frequency is 0% at R400 (the lower limit of the first interval) and 100
