STA111 Descriptive Statistics Lecture Notes PDF

Summary

These notes cover descriptive statistics, focusing on the presentation of data using tables, including frequency tables for grouped and ungrouped data. The material discusses data description and organization, with a worked example based on the weights of the heads of farming households.

Full Transcript


STA111: Descriptive Statistics
Topic: Presentation of Data: Tables; Frequency Tables (grouped and ungrouped data)
By Dr. S.A. Aderoju

DATA DESCRIPTION

One principal aim of any statistical enquiry is to understand and describe the population of interest. For example, a farm survey aims to estimate current crop output and to evaluate the impact of various government policies. A consumer survey is interested in how much of a product is being consumed and in the chance of increasing production if some action is taken. A researcher who wants to study the number of infants with a high level of bilirubin in a specific geographic area over the past several years must gather the data from various doctors, hospitals, or health departments. Thus, the first task of a student of statistics is to organize the data in a form in which their salient characteristics can be easily observed.

Suppose that in your enumeration area 35 farming households were sampled. The weights of the heads of household in kilograms (to the nearest whole number), as obtained from the field, are shown below:

70, 66, 60, 55, 61, 63, 72, 68, 60, 60, 63, 60, 75, 68, 59, 71, 53, 76, 64, 64, 52, 64, 64, 68, 64, 66, 67, 63, 64, 70, 69, 68, 63, 59, 57

These data are what we call raw data, that is, data as obtained from the field. In this form, very little information can be obtained about the population. The first thing we can do is to put the data into what we call an array. An array is an arrangement of the values in ascending or descending order of magnitude. For example, arranging the data in an ascending array gives:

52 53 55 57 59 59 60 60 60 60 61 63 63 63 63 64 64 64 64 64 64 66 66 67 68 68 68 68 69 70 70 71 72 75 76

Frequency Distribution Tables

A frequency distribution is the organization of raw data in table form, using classes and frequencies.
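The two steps described so far can be sketched in Python. The first is arranging the raw data into an ascending array. The second is counting how often each weight occurs, which is the basis of an ungrouped frequency table. This is only an illustration, and the variable names are our own.

```python
from collections import Counter

# Raw data: weights (kg, nearest whole number) of the 35 sampled heads of household
raw_weights = [70, 66, 60, 55, 61, 63, 72, 68, 60, 60, 63, 60, 75, 68, 59, 71, 53, 76,
               64, 64, 52, 64, 64, 68, 64, 66, 67, 63, 64, 70, 69, 68, 63, 59, 57]

# Array: the same values arranged in ascending order of magnitude
ascending_array = sorted(raw_weights)
print(*ascending_array)

# Ungrouped frequency distribution: number of times each distinct weight occurs
freq = Counter(ascending_array)
for weight, f in sorted(freq.items()):
    print(f"{weight:>3} kg  f = {f}")

print("Total:", sum(freq.values()))         # equals the number of households
print("Most common:", freq.most_common(1))  # [(64, 6)]
```

Sorting alone already exposes the smallest (52 kg) and largest (76 kg) weights. Counting the repeated values then turns the array into a frequency table.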
Ungrouped Distribution

Table 1: The frequency table of farmers having specific weights

Sample No.   Weight (kg)   Number of farmers having such weight
1            52            1
2            53            1
3            55            1
4            57            1
5            59            2
6            60            4
7            61            1
8            63            4
9            64            6
10           66            2
11           67            1
12           68            4
13           69            1
14           70            2
15           71            1
16           72            1
17           75            1
18           76            1
             Total         35

NB: This classification tells us more about the sample. For example, we can see that:
(i) The weights vary from farmer to farmer: 18 different weights occur among the 35 farmers.
(ii) The most popular (or common) weight of a household head is 64 kg.

This is an example of an ungrouped frequency distribution, and the display is called a frequency table. Note that the total of the frequencies should equal the number of households.

Definition
FREQUENCY: The number of farmers having a certain weight is called the frequency of that weight. In general, the number of times a particular value occurs is called its frequency, denoted by f. For example, in the classification above the frequency of 67 is 1, that of 68 is 4, and so on.

Grouped Distribution

One serious disadvantage of the classification above is that the table may become too long. Consider, for example, the weights of a sample of 200 households: a table in the form of the preceding section becomes too cumbersome and uninformative. A more convenient way of summarizing a large mass of raw data is to group the observations (in this case, weights) into categories and find out how many household heads belong to each category. For example, we might ask how many household heads have weights of 52 kg – 53 kg, 54 kg – 55 kg, 56 kg – 57 kg, etc. Each of these categories is called a class interval.

A simple counting procedure is the Tally Score Method. This method consists of making a stroke in the proper class for each observation and summing the strokes in each class to obtain the frequency. For convenience in counting, it is customary to draw every fifth stroke through the preceding four, as in the grouped frequency table for the weights.
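The tally score method amounts to counting, for each class, the observations that lie between its class limits. Below is a minimal Python sketch of that count. It assumes the classes used in these notes, of width 4 starting at 52 (52 – 55, 56 – 59, up to 76 – 79). The helper `tally_string` is hypothetical, and so is the plain-text rendering `////\` of a bundle of five, where the backslash stands in for the crossing fifth stroke.

```python
# Raw data: weights (kg) of the 35 sampled heads of household
raw_weights = [70, 66, 60, 55, 61, 63, 72, 68, 60, 60, 63, 60, 75, 68, 59, 71, 53, 76,
               64, 64, 52, 64, 64, 68, 64, 66, 67, 63, 64, 70, 69, 68, 63, 59, 57]

# Assumed classes: width 4, lower limits 52, 56, ..., 76 (i.e. 52-55 up to 76-79)
classes = [(lower, lower + 3) for lower in range(52, 80, 4)]

def tally_string(count):
    # Hypothetical helper: each complete group of five is written as four strokes
    # plus a backslash (the crossing fifth stroke), then single strokes for the rest
    fives, rest = divmod(count, 5)
    groups = ["////\\"] * fives
    if rest:
        groups.append("/" * rest)
    return " ".join(groups)

freqs = []
for lower, upper in classes:
    # One stroke per observation lying within the class limits
    f = sum(1 for w in raw_weights if lower <= w <= upper)
    freqs.append(f)
    print(f"{lower} - {upper}   {tally_string(f):<11} {f}")

print("Total:", sum(freqs))
```

The printed frequencies should agree with the "No. of farmers" column of the grouped table, and the class frequencies should add up to the 35 households.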
Weight (kg)   Tally        No. of farmers (f)   Class boundaries
52 – 55       ///          3                    51.5 – 55.5
56 – 59       ///          3                    55.5 – 59.5
60 – 63       ////\ ////   9                    59.5 – 63.5
64 – 67       ////\ ////   9                    63.5 – 67.5
68 – 71       ////\ ///    8                    67.5 – 71.5
72 – 75       //           2                    71.5 – 75.5
76 – 79       /            1                    75.5 – 79.5

Definitions
Class limits: the end numbers of each class, e.g. 52 – 55, 56 – 59, etc.
Lower class limit: the smallest data value that can be included in the class, e.g. the lower class limit of the second class is 56.
Upper class limit: the largest data value that can be included in the class, e.g. the upper class limit of the sixth class is 75.
Class boundaries: numbers used to separate the classes so that there are no gaps in the frequency distribution. The gaps arise because consecutive class limits (e.g. 55 and 56) are whole numbers one unit apart. The boundaries are obtained as follows:
Lower class boundary = lower class limit − 0.5, e.g. 52 − 0.5 = 51.5
Upper class boundary = upper class limit + 0.5, e.g. 55 + 0.5 = 55.5
Class size (width, i): the difference between the upper and lower class boundaries, e.g. for the class 72 – 75, 75.5 − 71.5 = 4.
Class mark (midpoint): the midpoint of the class interval, defined as

$$\text{Class mark} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2} = \frac{52 + 55}{2} = 53.5$$

or

$$\text{Class mark} = \frac{\text{Lower class boundary} + \text{Upper class boundary}}{2} = \frac{51.5 + 55.5}{2} = 53.5$$

Constructing a Frequency Distribution
There is no hard and fast rule for constructing a frequency distribution, but the following procedures may be followed:
(i) Try to use equal class widths. This is useful for comparative purposes and makes calculations easier.
(ii) The number of classes should be neither too many nor too few. A rough guideline for the number of classes k for sample data is the integer value of k given by

$$k = 1 + 3.322\log_{10}(n)$$

where n is the sample size. This is known as Sturges' rule.
(iii) The class width i is computed as

$$i = \frac{\text{Range}}{\text{number of classes}} = \frac{\text{Highest value} - \text{Lowest value}}{k}$$

e.g.
in the example above,

$$k = 1 + 3.322\log_{10}(35) = 1 + 3.322 \times 1.544 = 6.13 \approx 7$$

$$i = \frac{\text{Range}}{\text{number of classes}} = \frac{76 - 52}{7} = 3.43 \approx 4$$

Note that 6.13 and 3.43 were rounded up to 7 and 4 respectively. This preserves all the information in the raw data, since the classes together then cover every observation.

Other Forms of Frequency Distribution

Relative Frequency

We may be interested in the proportion of our sample or population that falls in a certain class. In this case, we make use of the relative frequency. The relative frequency of a class is its frequency divided by the total frequency of all classes. Multiplying the result by 100 expresses it as a percentage.

Weight (kg)   Tally        No. of farmers (f)   Class boundaries   Relative frequency (%)
52 – 55       ///          3                    51.5 – 55.5        3/35 × 100 = 8.57
56 – 59       ///          3                    55.5 – 59.5        3/35 × 100 = 8.57
60 – 63       ////\ ////   9                    59.5 – 63.5        9/35 × 100 = 25.71
64 – 67       ////\ ////   9                    63.5 – 67.5        9/35 × 100 = 25.71
68 – 71       ////\ ///    8                    67.5 – 71.5        8/35 × 100 = 22.86
72 – 75       //           2                    71.5 – 75.5        2/35 × 100 = 5.71
76 – 79       /            1                    75.5 – 79.5        1/35 × 100 = 2.86
Total                      35                                      100

The relative frequency is mostly useful for easy comparison of two or more frequency distributions. A biological example is comparing the number of seeds germinating in two varieties of a plant. The example below relates to the age distribution of school pupils (boys and girls) in Gabon in 1962. These frequencies cannot be compared directly because there are more boys than girls in school, so the figures for boys are expected to be larger than those for girls. To compare the two distributions, we first convert both sets of frequencies into relative frequencies.
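Converting frequencies into relative frequencies takes one division per class. The Python sketch below applies this to the grouped weights and to the Gabon boys' and girls' frequencies shown in the table that follows. The function name `relative_frequency` and its `decimals` parameter are our own choices for illustration.

```python
def relative_frequency(freqs, decimals=2):
    # Relative frequency (%) = class frequency / total frequency x 100
    total = sum(freqs)
    return [round(100 * f / total, decimals) for f in freqs]

# Grouped weights of the 35 heads of household (classes 52-55 up to 76-79)
weight_freqs = [3, 3, 9, 9, 8, 2, 1]
print(relative_frequency(weight_freqs))

# Gabon pupils, 1962, by age group (10-11 up to 22-23)
boys = [6, 119, 210, 169, 34, 12, 2]
girls = [5, 49, 102, 75, 4, 0, 0]
both = [b + g for b, g in zip(boys, girls)]  # the "Total" column

print(relative_frequency(boys, 1))
print(relative_frequency(girls, 1))
print(relative_frequency(both, 1))
```

Each list of percentages sums to about 100 whatever the group size, which is what makes the boys' and girls' columns comparable. For boys aged 12 – 13 the sketch gives 21.6 (119/552 = 21.56 %), which differs in the last digit from the 21.5 printed in the table. This is a rounding difference that does not affect the comparison.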
                  Frequency                 Relative frequency (%)
Age (years)       Boys   Girls   Total      Boys   Girls   Total
10 – 11           6      5       11         1.1    2.1     1.4
12 – 13           119    49      168        21.5   20.9    21.3
14 – 15           210    102     312        38.0   43.4    39.6
16 – 17           169    75      244        30.6   31.9    31.0
18 – 19           34     4       38         6.2    1.7     4.8
20 – 21           12     0       12         2.2    0.0     1.5
22 – 23           2      0       2          0.4    0.0     0.3
Total             552    235     787        100    100     100

The results of the above analysis suggest the following:
a) The Gabonese government should encourage more girls to attend school.
b) The percentage age distributions of boys and girls are close, except for the age groups 14 – 15 (difference = 5.4 %) and 18 – 19 (difference = 4.5 %).
c) More boys than girls stay in school at older ages.

Cumulative Frequency Distributions

Suppose that, for the weight data above, we are interested in questions such as:
How many household heads weigh less than 52 kg?
How many household heads weigh 52 kg or more?
Such questions are best answered through cumulative frequency distributions.

Weight (kg)   Tally        No. of farmers (f)   Class boundaries   Cumulative frequency
52 – 55       ///          3                    51.5 – 55.5        3
56 – 59       ///          3                    55.5 – 59.5        6
60 – 63       ////\ ////   9                    59.5 – 63.5        15
64 – 67       ////\ ////   9                    63.5 – 67.5        24
68 – 71       ////\ ///    8                    67.5 – 71.5        32
72 – 75       //           2                    71.5 – 75.5        34
76 – 79       /            1                    75.5 – 79.5        35
Total                      35

To answer the questions using the cumulative frequency distribution:
1. How many household heads weigh less than 52 kg?
✓ The first class, 52 – 55, starts at 52 kg, and its lower class boundary is 51.5.
✓ No class lies below this boundary, so the cumulative frequency below 52 kg is 0.
✓ Hence 0 household heads weigh less than 52 kg.
2. How many household heads weigh 52 kg or more?
✓ The cumulative frequency of the last class (76 – 79) is 35, the total number of farmers.
✓ Subtracting those below 52 kg gives 35 (total) − 0 (below 52 kg) = 35 household heads.
✓ The grouped table cannot separate weights of exactly 52 kg from the rest of the first class. The array shows one head weighing exactly 52 kg, so the number weighing strictly more than 52 kg is 34.

Moreover, the following information can be read from the cumulative frequencies:
Number of farmers whose weights are less than 52 kg = 0
Number of farmers whose weights are less than 56 kg = 3
Number of farmers whose weights are less than 60 kg = 6
Number of farmers whose weights are less than 64 kg = 15
Number of farmers whose weights are less than 68 kg = 24
Number of farmers whose weights are less than 72 kg = 32
Number of farmers whose weights are less than 76 kg = 34
Number of farmers whose weights are less than 80 kg = 35
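The cumulative frequency column is a running total of the class frequencies. Each "less than" statement reads that total off at the lower limit of the next class. Below is a short Python sketch using `itertools.accumulate`. The variable names are our own, and the class width of 4 is taken from the grouped table.

```python
from itertools import accumulate

# Grouped weights: lower class limits, class width and class frequencies
lower_limits = [52, 56, 60, 64, 68, 72, 76]
width = 4
freqs = [3, 3, 9, 9, 8, 2, 1]

# Cumulative frequency: running total of the class frequencies
cum_freqs = list(accumulate(freqs))
print(cum_freqs)  # [3, 6, 15, 24, 32, 34, 35]

# "Less than" counts: nothing lies below the first lower limit; below the
# lower limit of the next class, the count is the cumulative frequency so far
print(f"less than {lower_limits[0]} kg: 0")
for lower, cf in zip(lower_limits, cum_freqs):
    print(f"less than {lower + width} kg: {cf}")
```

The last cumulative frequency always equals the total number of observations (35 here). This is a quick check that no farmer has been left out of the tally.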
