Summary

This document is a module on data management, covering topics such as introduction to data management, measures of central tendency, measures of dispersion, and more. It also discusses primary and secondary data and different ways to present data. This module is suitable for undergraduate studies.


Module 4: Data Management

Introduction to Data Management
Measures of Central Tendency
Measures of Dispersion
Measures of Relative Position
Probabilities and Normal Distribution
Simple Linear Regression and Correlation

Learning Outcomes
At the end of the module, the students will be able to:
1. Understand and be knowledgeable about the language used in statistics;
2. Interpret statistical evidence from gathered data correctly and objectively, and make inferences from it;
3. Convert and transform normally distributed data into standardized form;
4. Use and apply the concept of the normal distribution in their fields of specialization;
5. Appreciate the value of statistical analysis, know its impact and apply it in daily life;
6. Practice and display diligence, patience, honesty, accuracy and precision in solving statistical problems.

Lesson 1. Introduction to Data Management
When we talk about data management, we deal with statistics. Statistics is the art and science of collecting, organizing, presenting, analyzing and interpreting data. In fields such as medicine, agriculture, education, business, economics, politics and technology, information translated into data gives medical practitioners, educators, managers and decision makers a better understanding of their environments. It also enables them to make more informed and sound decisions. Statistics plays a vital role in our society today, especially during the COVID-19 pandemic, when everyone should be included, counted and accounted for. No one should be left behind.

Because statistics is useful in almost every field of endeavor, some cautions should also be considered. Impressive figures can be blown out of proportion to their real or imagined importance. Unscrupulous people with vested interests may make improper or unethical use of statistical methods.
Questionable and even conflicting claims backed up with “statistics” can be accepted as true, which leads some to believe that anything can be proven statistically. Moreover, faulty research may be slanted to produce a particular outcome; that is, the statistical analyses are chosen to produce that outcome. For these reasons, it is most important that statistics users and researchers clearly understand the statistical tools and techniques used in their research. Thus, in this module, careful attention will be given to the role of statistics as a tool in research.

Science is based on the empirical method of making observations to obtain information systematically. Observations are the empirical “stuff” of science. Statistics, as defined above, is the art and science of collecting, organizing, presenting, analyzing and interpreting data. It is a set of concepts, rules and procedures that help us do three things:
1. Collect, organize and present numerical information in the form of tables, graphs and charts.
2. Understand and analyze the statistical techniques underlying decisions that affect our lives and well-being.
3. Interpret data and make informed decisions.

Statistics is divided into two categories, or branches: descriptive statistics and inferential statistics. We can differentiate the two using the definition of statistics:
Descriptive statistics: collecting, organizing and presenting data
Inferential statistics: analyzing and interpreting data

Since we are dealing with statistical inference, we should be very careful with every piece of information we take and use. Many situations require information about a large group of people, and we also have to consider time, cost and other constraints. Data can therefore be collected from a small portion of the group. Population refers to the group of elements, or set of individuals, of interest in a particular study.
The smaller group, the sample, is a set of individuals selected from a population, usually intended to represent the population in a study.

PARAMETER
A parameter is a value, usually numerical, that describes a population. It may be obtained from a single measurement, or it may be derived from a set of measurements from the population (µ: population mean; σ: population standard deviation).

STATISTIC
A statistic is a value, usually numerical, that describes a sample. It may be obtained from a single measurement, or it may be derived from a set of measurements from the sample (x̄: sample mean; s: sample standard deviation).

VARIABLE
A variable is any characteristic that differs from one member of a population or sample to another. It is the characteristic of interest for the elements. In Table 1.1, weight (kg) is the variable.

Table 1.1 Weights of Randomly Selected Grade IV Pupils in AES, 1st Quarter of 2020
Section   Weights (kg)
IV-1      50  41  36  34  54  60  51  37
IV-2      22  39  42  42  45  38  38  40
IV-3      38  28  32  44  42  47  37  28
IV-4      27  27  40  41  39  32  36  24
IV-5      40  39  33  33  27  30  31  45

An element is an entity on which data are collected. Here, each pupil whose weight is included in the data set is an element. The measurements collected on each variable for every element in a study provide the data. The set of measurements obtained for a particular element is called an observation.

In Table 1.1, the observations for the pupils in section IV-1 are 50, 41, 36, 34, 54, 60, 51 and 37. For section IV-2 they are 22, 39, 42, 42, 45, 38, 38 and 40, and so on. A data set with 40 elements contains 40 observations. Table 1.1 is such a data set, with 8 pupils in each of 5 sections.

CONSTANT
A constant is information about the population or sample that is true for all members. Examples of constants include:
1. The value of pi.
2. The constants used to convert temperatures between Celsius and Fahrenheit.
3. The number of days in a week.
4. Unit equivalences such as 12 inches = 1 foot.
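The parameter/statistic distinction can be made concrete with the data in Table 1.1. The Python sketch below computes the sample mean x̄ and the sample standard deviation s of the 40 weights using the standard-library statistics module. It treats the pupils as a sample, as the table title says they were randomly selected. The variable names are illustrative only. The pstdev line is included purely for contrast: it shows the divisor-n formula that would apply if these 40 pupils were the entire population.

```python
import statistics

# Table 1.1: weights (kg) of randomly selected Grade IV pupils, by section.
weights_by_section = {
    "IV-1": [50, 41, 36, 34, 54, 60, 51, 37],
    "IV-2": [22, 39, 42, 42, 45, 38, 38, 40],
    "IV-3": [38, 28, 32, 44, 42, 47, 37, 28],
    "IV-4": [27, 27, 40, 41, 39, 32, 36, 24],
    "IV-5": [40, 39, 33, 33, 27, 30, 31, 45],
}

# Each pupil is one element; each recorded weight is one observation of the variable.
weights = [w for section in weights_by_section.values() for w in section]
n = len(weights)

# The pupils were randomly selected, so these values are statistics (x-bar and s).
x_bar = statistics.mean(weights)  # sample mean
s = statistics.stdev(weights)     # sample standard deviation, divisor n - 1

# For contrast only: the divisor-n formula used for a population standard
# deviation (sigma), which would apply only if these pupils were the whole population.
sigma_if_population = statistics.pstdev(weights)

print(f"n = {n}")
print(f"sample mean = {x_bar:.3f} kg")
print(f"sample standard deviation = {s:.2f} kg")
print(f"population-formula value = {sigma_if_population:.2f} kg")
```

For these data, n = 40 and x̄ = 1509/40 = 37.725 kg. Because s divides by n − 1 instead of n, it comes out slightly larger than the population-formula value.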
Data is a collection of facts, such as numbers, words, measurements, observations or simply descriptions of things. Data are classified into two categories: qualitative and quantitative data.

1. Qualitative Data
Qualitative data describe qualities or characteristics. They are mostly non-numerical and descriptive in nature, and they often, but not always, capture emotions, feelings and subjective perceptions of something. Qualitative research is characterized by the following:
It uses open-ended questions that aim to address the ‘how’ and ‘why’ of an event. It also uses unstructured methods of data collection to fully explore the topic.
It relies more heavily on interviews, with more interaction between the researcher and the respondents.
Its findings cannot be generalized to any specific population. However, they can produce evidence that can be used to look for general patterns in other studies dealing with different issues.
Qualitative data can be collected through in-depth interviews, observation methods and document review. Here are some examples:
1. color of hair, eyes and skin
2. home address and phone number
3. experiences of a person taken from diaries

2. Quantitative Data
Quantitative data deal with things that are measurable and can be expressed in numbers and figures. They are usually expressed in numerical form and can be mathematically computed. Quantitative data can be collected using the following methods:
1. Experiments or clinical trials.
2. Observing and recording well-defined objects, such as the number of cars that participated in a motorcade.
3. Surveys with closed-ended questions.
4. Paper-and-pencil questionnaires.
Examples:
1. number of siblings
2. height and weight
3. temperature in degrees Celsius
Quantitative data can be either:
a. Discrete data: data that take separate, countable values and cannot be broken down into smaller parts. This type of data consists of integers. The number of siblings (0, 1, 2, 3, …) is an example.
b. Continuous data: data that can be broken down into infinitely smaller parts, that is, data that can take decimal values. Examples are height and weight (1.37 meters and 72.6 kilograms).

For example, a description of a house can be either qualitative or quantitative:
Qualitative                             Quantitative
The house is located in Baguio City.    The house is 8.5 meters high.
The house is mostly made of cement.     The house has 3 bedrooms.
The color of the house is green.        The house’s floor area is 125 square meters.
The door is made of oak.

Data Levels of Measurement
1. Nominal data
This level of data is categorical in nature. No category is greater than or less than another, and the categories are not in any particular order. The categories are also exclusive and exhaustive, meaning a response can be neither ‘both’ nor ‘neither’. Example: sex (male or female), civil status (single, married, divorced, separated, widowed)
2. Ordinal data
Ordinal data must also be exclusive and exhaustive, but the responses are ranked, or ordered. Here, you can say that one response is higher or better than another. Example: academic rank (Instructor, Professor), socioeconomic status (rich, middle class, poor)
3. Interval data
Here, intervals of equal length signify equal differences in the data. Differences make sense, but ratios do not: 30 °C is not twice as hot as 15 °C. There is also no ‘true zero’ starting point, so zero does not signify the absence of the measured quantity. For example, 0 °C does not mean that there is no temperature. Example: temperature
4. Ratio data
At this level, both differences and ratios are meaningful. For example, 4 liters of water is twice as much as 2 liters of water. There is also a ‘true zero’ starting point, where zero means the absence of the measured quantity: zero liters of water means there is no water.
Example: weight, height, number of children

In summary, data are either qualitative (nominal or ordinal) or quantitative (interval or ratio).

Data can also be classified according to who collected them, as primary data or secondary data.
Primary data: data collected first-hand. They are more authentic, reliable and objective than secondary data. Primary data can be obtained through experiments, surveys, questionnaires, interviews and observations.
Secondary data: data collected from sources that have already been published in some form. The review of literature in research is based on secondary sources. The advantage of secondary data is that the researcher does not need to go through the hassle of collecting data that are already available and published. This saves time, effort and money. Secondary data can be collected from books, records, magazines, research articles, newspapers, biographies, databases, etc.

Data Presentation
1. Textual Presentation
In textual or descriptive presentation, the data are presented using text or paragraphs. This is usually done when the amount of data is not too large. For example:
The population of Region I as of May 1, 2020 is 5,301,139, based on the 2020 Census of Population and Housing (2020 CPH). This accounts for about 4.86 percent of the Philippine population in 2020. The 2020 population of the region is 275,011 higher than the 2015 population of 5.03 million, and 552,767 higher than the 2010 population of 4.75 million. Moreover, it is 1,100,661 higher than the 2000 population of 4.20 million. (psa.gov.ph)

2. Tabular Presentation
In tabular presentation, data are presented in tables, which can represent even a large amount of data in a form that is engaging and easy to read. The data are arranged in rows (horizontal) and columns (vertical). Tabular presentation avoids unnecessary details and repetition of data, and it reveals patterns that cannot be seen when the data are presented in textual form.
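The increases quoted in the textual example for Region I can be recomputed directly from the census totals. The Python sketch below does this subtraction. It assumes the 2010 total is 4,748,372, which is the province total given in Table 2 and the value consistent with the quoted increase of 552,767. The dictionary name and layout are illustrative only.

```python
# Census totals for Region I, from the PSA figures quoted in this lesson.
# Assumption: the 2010 total is taken as 4,748,372, the province sum in Table 2.
region1_population = {
    2000: 4_200_478,
    2010: 4_748_372,
    2015: 5_026_128,
    2020: 5_301_139,
}

# Increase of the 2020 population over each earlier census year.
increases = {
    year: region1_population[2020] - population
    for year, population in region1_population.items()
    if year != 2020
}

for year, increase in sorted(increases.items(), reverse=True):
    # Earlier totals are also shown in millions, as in the textual example.
    in_millions = region1_population[year] / 1_000_000
    print(f"2020 vs {year} ({in_millions:.2f} million): +{increase:,}")
```

Running it prints the three increases quoted in the text: 275,011 over 2015 (5.03 million), 552,767 over 2010 (4.75 million) and 1,100,661 over 2000 (4.20 million).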
In presenting data using a table, take note of the following:
A table must have a table number and a title.
Subtitles are properly given in the column and row headers.
The contents of the table are clearly defined.
Units of measurement are clearly stated whenever necessary.
Legends for symbols and abbreviations, and the sources of the data, are indicated in a footnote.
The data are logically arranged in the table.

Here is an example of a table presenting the population of Region I for the years 2000 to 2020.

Table 1. Total Population in Region I
Census Year   Census Reference Date   Total Population
2000          May 1, 2000                    4,200,478
2010          May 1, 2010                    4,748,372
2015          August 1, 2015                 5,026,128
2020          May 1, 2020                    5,301,139
Source: Philippine Statistics Authority

Table 2. Population of Region I by Province
Province               2000        2010        2015        2020
Ilocos Norte        514,241     568,017     593,081     609,588
Ilocos Sur          594,206     658,587     689,668     706,009
La Union            657,945     741,906     786,653     822,352
Pangasinan        2,434,086   2,779,862   2,956,726   3,163,190
Total             4,200,478   4,748,372   5,026,128   5,301,139
Source: Philippine Statistics Authority

3. Diagrammatical or Graphical Presentation
This type of presentation uses graphs or diagrams such as bar graphs, pie graphs, line graphs and scatter diagrams. Diagrams give a bird’s-eye view of the data and can be understood at a glance. Some of the commonly used charts and graphs are the following:
1. Pie chart
The following pie graph illustrates the population of Region I by province for the year 2020, using the data in Table 2.
Figure 1. Population of Region I by province for the year 2020
The graph shows that Pangasinan constitutes about 60% of the total population of Region I (3,163,190 of 5,301,139, or 59.7%).
2. Bar graph
The following bar graph compares the populations of the provinces in Region I in 2000, 2010, 2015 and 2020, as seen in Figure 2.
Figure 2.
Population by province in Region I (Bar Graph)
The bar graph shows that Pangasinan dominates the population of Region I from 2000 to 2020, and that Ilocos Norte is the province with the smallest population.
3. Column chart
This column graph is similar to the bar graph in Figure 2.
Figure 3. Population by province in Region I (Column Graph)
4. Line graph
The following line graph compares the populations of the provinces in Region I, using the data in Table 2. Region I is composed of four provinces, namely Ilocos Norte, Ilocos Sur, La Union and Pangasinan.
Figure 4. Population by Province in Region I
The line graph illustrates that the population of each province in Region I continuously increased from 2000 to 2020.
5. Scatterplot
The following scatter plot depicts the population of the Philippines from 1990 to 2021.
Figure 5. Philippine Population from 1990 to 2021
The scatter plot shows an almost perfect linear relationship between the year and the population of the Philippines.

As these examples show, diagrams are mostly used as visual aids. They cannot be considered alternatives to numerical data, because diagrams and graphs are not as accurate as tabular data. Only tabular data can be used for further analysis.

MODULE IV: Data Management
Learning Activity 1: Introduction to Data Management

Name: ______________________________________________________
Course, Year and Section: _____________________________________

Classify each of the following data. For each item, give its Type of Data (Qualitative or Quantitative) and its Level of Measurement (Nominal, Ordinal, Interval or Ratio).

1. Test questions classified as easy, average or difficult
2. Years of important historical events (e.g. 1941, 1980, 2000)
3. Flavor of ice cream
4. Age of students enrolled in GECC 103
5. Amount of money in your savings account
6. Religion
7. Contact number
8. Home address
9. Number of minutes allocated for reviewing before you sleep
10. IQ
