Descriptive Statistics Instructional Material PDF
Polytechnic University of the Philippines
2020
Katrina Elizon
Summary
This document is instructional material for a Descriptive Statistics course (STAT 20013) at the Polytechnic University of the Philippines, taught in 2020. It covers topics such as definitions and terminology, data collection, and descriptive measures. The material aims to provide students with a foundational understanding of statistical concepts.
Full Transcript
Instructional Materials in STAT 20013
DESCRIPTIVE STATISTICS

For the sole noncommercial use of the Faculty of the Department of Mathematics and Statistics
Polytechnic University of the Philippines
2020

Contributor: Elizon, Katrina

Republic of the Philippines
POLYTECHNIC UNIVERSITY OF THE PHILIPPINES
COLLEGE OF SCIENCE
Department of Mathematics and Statistics

Course Title : DESCRIPTIVE STATISTICS
Course Code : STAT 20013
Course Credit : 3 UNITS
Pre-Requisite :
Course Description : This course is all about basic statistics concepts, statistical measurement, levels of measurement, statistical notations, collection, organization and presentation of data, measures of central tendency, location, dispersion, skewness, kurtosis, boxplots and stem-and-leaf displays, histogram, normal distribution, t-distribution, and the central limit theorem.

COURSE LEARNING PLAN

Week      Dates            Topics and Subtopics
Week 1    9/14 – 9/20      Definitions and Terminology; Process of Statistics
Week 2    9/21 – 9/27      Qualitative and Quantitative; Discrete and Continuous; Levels of Measurement
Week 3    9/28 – 10/4      Data Collection; Sources of Data; Experimental and Observation Study Design
Week 4    10/5 – 10/11     Determining the Sample Size
Week 5    10/12 – 10/18    Basic Sampling Design; Sources of Error in Sampling
Week 6    10/19 – 10/25    Textual and Tabular Presentation
Week 7    10/26 – 10/31    Graphical Presentation
Week 8    11/3 – 11/8      Measures of Central Tendency
Week 9    11/9 – 11/15     Measures of Relative Position
Week 10   11/16 – 11/22    Measures of Variation
Week 11   11/23 – 11/27    Shape of Distribution; Skewness and Kurtosis
Week 12   12/1 – 12/6      Normal Distribution; Areas Under Standard Normal Curve
Week 13   12/7 – 12/13     Sampling Distribution of Sample Mean
Week 14   12/14 – 12/20    Central Limit Theorem; T-Distribution

COURSE GRADING SYSTEM

The final grade will be based on the weighted average of the student’s scores on each test assigned at the end of each lesson.
The final SIS grade equivalent will be based on the following table according to the approved University Student Handbook.

Class Standing (CS) = ((Weighted Average of all the Activities) x 50) + 50
Midterm and/or Final Exam (MFE) = ((Weighted Average of the Midterm and/or Final Tests) x 50) + 50
Final Grade = (70% x CS) + (30% x MFE)

SIS Grade   Final Grade Equivalent   Description
1.00        97.00-100                Excellent
1.25        94.00-96.99              Excellent
1.50        91.00-93.99              Very Good
1.75        88.00-90.99              Very Good
2.00        85.00-87.99              Good
2.25        82.00-84.99              Good
2.50        79.00-81.99              Satisfactory
2.75        77.00-78.99              Satisfactory
3.00        75.00-76.99              Passing
5.00        65.00-74.99              Failure
INC                                  Incomplete
W                                    Withdrawn

Prepared by:
Katrina D. Elizon
Faculty Member, Department of Mathematics and Statistics
College of Science

Contents

1 Introduction to Statistical Concepts
1.1 Definitions and Terminology
1.2 Process of Statistics
1.3 Qualitative and Quantitative Variables
1.4 Discrete and Continuous Variables
1.5 Levels of Measurement

2 Data Collection and Basic Concepts in Sampling Design
2.1 Data Collection
2.2 Sources of Data
2.3 Methods of Collecting Primary and Secondary Data
2.4 Sample Size Determination
2.5 Basic Sampling Design
2.6 Sources of Errors in Sampling

3 Organizing and Summarizing Data
3.1 Textual Presentation
3.2 Tabular Presentation
3.3 Graphical Presentation

4 Descriptive Measures
4.1 Measures of Central Tendency
4.2 Measures of Relative Position
4.3 Measures of Variation or Dispersion
4.4 Shape of Distribution
4.5 Karl Pearson’s Measure of Skewness
4.6 Bowley’s Measure of Skewness
4.7 Kelly’s Measure of Skewness
4.8 Percentile Coefficient of Kurtosis
4.9 Normal Distribution
4.10 Areas Under a Standard Normal Curve
4.11 Sampling Distribution of Sample Mean
4.12 Central Limit Theorem
4.13 T-Distribution

MODULE 1: INTRODUCTION TO THE STATISTICAL CONCEPTS

Objectives:
After successful completion of this module, you should be able to:
- Define statistics.
- Enumerate the importance and limitations of statistics.
- Explain the process of statistics.
- Know the difference between descriptive and inferential statistics.
- Distinguish between qualitative and quantitative variables.
- Distinguish between discrete and continuous variables.
- Determine the level of measurement of a variable.

DEFINITION OF STATISTICS

Statistics plays a major role in many aspects of our lives. It is used in sports, for example, to help a general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about various policies. And statistics is used in medicine to help determine the effectiveness of new drugs. Used appropriately, statistics can enhance our understanding of the world around us. Used inappropriately, it can lend support to inaccurate beliefs. Understanding statistical methods will provide you with the ability to analyze and critique studies and the opportunity to become an informed consumer of information. Understanding statistical methods will also enable you to distinguish solid analysis from bogus “facts.”

Many people say that statistics is numbers. After all, we are bombarded by numbers that supposedly represent how we feel and who we are. Certainly, statistics has a lot to do with numbers, but this definition is only partially correct. Statistics is also about where the numbers come from (that is, how they were obtained) and how closely the numbers reflect reality.

Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.

Let’s break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.

Statistics is important because it enables people to make decisions based on empirical evidence. Statistics provides us with tools needed to convert massive data into pertinent information that can be used in decision making. Statistics can provide us information that we can use to make sensible decisions.

What information is referred to in the definition?

The information referred to in the definition is the data. According to the Merriam-Webster dictionary, data are “factual information used as a basis for reasoning, discussion, or calculation”. Data can be numerical, as in height, or nonnumerical, as in gender. In either case, data describe characteristics of an individual.

Field of Statistics
A. Mathematical Statistics - The study and development of statistical theory and methods in the abstract.
B. Applied Statistics - The application of statistical methods to solve real problems involving randomly generated data and the development of new statistical methodology motivated by real problems. Example branches of Applied Statistics: psychometrics, econometrics, and biostatistics.

Limitation of Statistics
1. Statistics is not suitable to the study of qualitative phenomena.
2. Statistics does not study individuals.
3. Statistical laws are not exact.
4. Statistical tables may be misused.
5. Statistics is only one of the methods of studying a problem.

Definitions:

Universe is the set of all entities under study.

A population is the total or entire group of individuals or observations from which information is desired by a researcher. Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.

An individual is a person or object that is a member of the population being studied.

A statistic is a numerical summary of a sample.

A sample is a subset of the population.

Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.

Inferential statistics uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result.

A parameter is a numerical summary of a population.

Example: Consider the scenario.

You are walking down the street and notice that a person walking in front of you drops PHP100. Nobody seems to notice the PHP100 except you. Since you could keep the money without anyone knowing, would you keep the money or return it to the owner?

Suppose you wanted to use this scenario as a gauge of the morality of students at your school by determining the percent of students who would return the money. How might you do this? You could attempt to present the scenario to every student at the school, but this would be difficult or impossible if the student body is large. A second possibility is to present the scenario to 50 students and use the results to make a statement about all the students at the school.

In the PHP100 study presented, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.

Suppose 39 of the 50 students stated that they would return the money to the owner. We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78%. This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population. So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.

If we extend the results of our sample to the population, we are performing inferential statistics. The generalization contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to account for the variability in our results. One goal of inferential statistics is to use statistics to estimate parameters.

PROCESS OF STATISTICS

1. Identify the research objective.

A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.

2. Collect the information needed to answer the questions.

Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.

Example:
A research objective is presented. For each research objective, identify the population and sample in the study.

1. The Philippine Mental Health Association contacts 1,028 teenagers who are 13 to 17 years of age and live in Antipolo City and asks whether or not they had been prescribed medications for any mental disorders, such as depression or anxiety.
Population: Teenagers 13 to 17 years of age who live in Antipolo City
Sample: 1,028 teenagers 13 to 17 years of age who live in Antipolo City

2. A farmer wanted to learn about the weight of his soybean crop. He randomly sampled 100 plants and weighed the soybeans on each plant.
Population: Entire soybean crop
Sample: 100 selected soybean plants

3. Organize and summarize the information.

Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.

4. Draw conclusions from the information.

In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Take Note!
If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Example:
For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.
1. A badminton player wants to know his average score for the past 10 games. (Descriptive Statistics)
2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries. (Inferential Statistics)
3. Janine wants to determine the variability of her six exam scores in Algebra. (Descriptive Statistics)
4. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years. (Inferential Statistics)
5. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)

DISTINCTION BETWEEN QUALITATIVE AND QUANTITATIVE VARIABLES

Variables are the characteristics of the individuals within the population. For example, recently my mother and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My mom noted that the tomatoes had different weights even though they came from the same plant. She discovered that variables such as weight may vary.

If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability so that we can learn to grow plants that yield the best tomatoes.

It is helpful to divide variables into different types, as different statistical methods are applicable to each. The main division is into qualitative (or categorical) and quantitative (or numerical) variables.

Variables can be classified into two groups:
1. A qualitative (categorical) variable is a variable that yields categorical responses. It is a word or a code that represents a class or category.
2. A quantitative (numeric) variable takes on numerical values representing an amount or quantity.

Example:
Determine whether the following variables are qualitative or quantitative.
1. Hair color (Qualitative)
2. Temperature (Quantitative)
3. Stages of breast cancer (Qualitative)
4. Number of hamburgers sold (Quantitative)
5. Number of children (Quantitative)
6. Zip code (Qualitative)
7. Place of birth (Qualitative)
8. Degree of pain (Qualitative)

DISTINCTION BETWEEN DISCRETE AND CONTINUOUS

Quantitative variables may be further classified into:
1. A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. If you count to get the value of a quantitative variable, it is discrete.
2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.

Example:
Determine whether the following quantitative variables are discrete or continuous.
1. The number of heads obtained after flipping a coin five times. (Discrete)
2. The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M. and 1:00 P.M. (Discrete)
3. The distance a 2005 Toyota Prius can travel in city conditions with a full tank of gas. (Continuous)
4. Number of words correctly spelled. (Discrete)
5. Time of a runner to finish one lap. (Continuous)

LEVELS OF MEASUREMENT

It is important to know which type of scale is represented by your data since different statistics are appropriate for different scales of measurement. A characteristic may be measured using nominal, ordinal, interval and ratio scales.

1. Nominal Level - They are sometimes called categorical scales or categorical data. Such a scale classifies persons or objects into two or more categories. Whatever the basis for classification, a person can only be in one category, and members of a given category have a common set of characteristics.

Example:
- Method of payment (cash, check, debit card, credit card)
- Type of school (public vs. private)
- Eye color (blue, green, brown)

2. Ordinal Level - This involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. An ordinal scale not only classifies subjects but also ranks them in terms of the degree to which they possess a characteristic of interest. In other words, an ordinal scale puts the subjects in order from highest to lowest, from most to least. Although ordinal scales indicate that some subjects are higher or lower than others, they do not indicate how much higher or how much better.

Example:
- Food preferences
- Stage of disease
- Social economic class (first, middle, lower)
- Severity of pain

3. Interval Level - This measurement level not only classifies and orders the measurements, but also specifies that the distances between each interval on the scale are equivalent along the scale from low interval to high interval. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.

Example:
- Temperature on a Fahrenheit/Celsius thermometer
- Trait anxiety (e.g., high anxious vs. low anxious)
- IQ (e.g., high IQ vs. average IQ vs. low IQ)

4. Ratio Level - A ratio scale represents the highest, most precise, level of measurement. It has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

Example:
- Height and weight
- Time
- Time until death

[Table: Operations that make sense for variables of different scales.]

Both interval and ratio data involve measurement. Most data analysis techniques that apply to ratio data also apply to interval data. Therefore, in most practical aspects, these types of data (interval and ratio) are grouped under metric data. In some other instances, these types of data are also known as numerical discrete and numerical continuous.

Example:
Categorize each of the following as nominal, ordinal, interval or ratio measurement.
1. Ranking of college athletic teams. (Ordinal)
2. Employee number. (Nominal)
3. Number of vehicles registered. (Ratio)
4. Brands of soft drinks. (Nominal)
5. Number of car passers along C5 on a given day. (Ratio)
6. Zip code (Nominal)
7. Degree of pain (Ordinal)

ACTIVITIES/ASSESSMENTS:
Read each item carefully. Write the answer on the yellow paper. Answers only.

I. A research objective is presented. For each, identify the (A) population and (B) sample in the study.

1. A polling organization contacts 2141 male university graduates who have a white-collar job and asks whether or not they had received a raise at work during the past 4 months.
A. ______________________________
B. ______________________________

2. Every year the PSA releases the Current Population Report based on a survey of 50,000 households. The goal of this report is to learn the demographic characteristics, such as income, of all households within the Philippines.
A. ______________________________
B. ______________________________

3. Researchers want to determine whether or not higher folate intake is associated with a lower risk of hypertension (high blood pressure) in women (27 to 44 years of age). To make this determination, they look at 7373 cases of hypertension in these women and find that those who consume at least 1000 micrograms per day of total folate had a decreased risk of hypertension compared with those who consume less than 200.
A. ______________________________
B. ______________________________

II. Indicate whether the following statements require the use of descriptive or inferential statistics.

______________ 1. A teacher wants to know the attitudes of all students towards abortion.
______________ 2. A market analyst of a sales firm draws a chart showing the sales figures of a given product for the period 2006-2007.
______________ 3. A forecaster predicts the results of an election using the number of votes cast in 15 out of 25 barangays.
______________ 4. Men are better in math than women.
______________ 5. Forty percent of the employees of an organization were recorded tardy for at least 15 working days.
______________ 6. There are very few gender-related occupations.
______________ 7. An accountant predicts the accuracy rate of a client’s financial resources.
______________ 8. A quality control manager wishes to check production output.
______________ 9. Records indicated that 75% of the faculty in the graduate school are doctoral degree holders.
______________ 10. There is no relationship between educational qualification of parents and academic achievement of their children.

III. Identify the qualitative and quantitative variables and indicate the highest level of measurement required in each. If quantitative, classify whether discrete or continuous.

______________ 1. Occupation
______________ 2. Number of government officials
______________ 3. Favorite color
______________ 4. Temperature in degrees Celsius
______________ 5. Type of school
______________ 6. Volume of mineral water sold daily
______________ 7. Employee number
______________ 8. Civil status
______________ 9. Equity accounts
______________ 10. Brands of soft drinks
______________ 11. Socioeconomic status
______________ 12. Employment status
______________ 13. Number of missing teeth
______________ 14. Number of vehicles registered
______________ 15. Jersey number
______________ 16. Number of employees collecting retirement benefits from GSIS
______________ 17. Duration of a seizure
______________ 18. Cause of death
______________ 19. Dividends
______________ 20. Current assets list
______________ 21. Number of heart attacks
______________ 22. Account receivable
______________ 23. Clothing size
______________ 24. Blood type
______________ 25. Ethnic group

REFERENCES:
Statistics: Informed Decisions Using Data by Michael Sullivan, III. Fifth Edition
Sampling: Design and Analysis by Sharon L. Lohr. Second Edition

MODULE 2: DATA COLLECTION AND BASIC CONCEPTS IN SAMPLING DESIGN

Objectives:
After successful completion of this module, you should be able to:
- Determine the sources of data (primary and secondary data).
- Distinguish the different methods of data collection under primary and secondary data.
- Determine the appropriate sample size.
- Differentiate various sampling techniques.
- Know the sources of errors in sampling.

DATA COLLECTION

Everybody collects, interprets and uses information, much of it in numerical or statistical form, in day-to-day life. It is a common practice that people receive large quantities of information every day through conversations, television, computers, the radio, newspapers, posters, notices and instructions. It is just because there is so much information available that people need to be able to absorb, select and reject it. In everyday life, in business and industry, certain statistical information is necessary and it is important to know where to find it and how to collect it.

Analysis of data can lead to powerful results. Data can be used to offset anecdotal claims, such as the suggestion that cellular telephones cause brain cancer. Anecdotal means that the information being conveyed is based on casual observation, not scientific research. Because data are powerful, they can be dangerous when misused. The misuse of data usually occurs when data are incorrectly obtained or analyzed. For example, radio or television talk shows regularly ask poll questions for which respondents must call in or use the Internet to supply their vote. Most likely, the individuals who are going to call in are those who have a strong opinion about the topic. This group is not likely to be representative of people in general, so the results of the poll are not meaningful. Whenever we look at data, we should be mindful of where the data come from.

Even when data tell us that a relation exists, we need to investigate. For example, a study showed that breast-fed children have higher IQs than those who were not breast-fed. Does this study mean that a mother who breast-feeds her child will increase the child’s IQ? Not necessarily. It may be that some factor other than breast-feeding contributes to the IQ of the children. In this case, it turns out that mothers who breast-feed generally have higher IQs than those who do not. Therefore, it may be genetics that leads to the higher IQ, not breast-feeding.

Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Without proper planning for data collection, a number of problems can occur. If the data collection steps and processes are not properly planned, the research project can ultimately end up with a data set that does not serve the purpose for which it was intended. For example, if more than one person is involved in the data collection, but data collectors do not follow consistent data collection practices, they can end up with data with different units, collection processes, and variable names.

Consequences from Improperly Collected Data
- Inability to answer research questions accurately.
- Inability to repeat and validate the study.
- Distorted findings resulting in wasted resources.
- Misleading other researchers to pursue fruitless avenues of investigation.
- Compromising decisions for public policy.
- Causing harm to human participants and animal subjects.

Steps in Data Gathering
1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Choosing of Method of Data Collection

Decision-makers need information that is relevant, timely, accurate and usable. The cost of obtaining, processing and analyzing these data is high. The challenge is to find ways that lead to information that is cost-effective, relevant, timely and important for immediate use. Some methods pay attention to timeliness and reduction in cost. Others pay attention to accuracy and the strength of the method in using scientific approaches.

The statistical data may be classified under two categories, depending upon the sources: Primary Data and Secondary Data.

SOURCES OF DATA

Whether conducting research in the social sciences, humanities, arts, or natural sciences, the ability to distinguish between primary and secondary sources is essential.

Primary Sources - Provide a first-hand account of an event or time period and are considered to be authoritative. They represent original thinking, reports on discoveries or events, or they can share new information. Often these sources are created at the time the events occurred but they can also include sources that are created later. They are usually the first formal appearance of original research.

Primary Data - are data documented by the primary source. The data collectors documented the data themselves.

The first-hand information obtained by the investigator is more reliable and accurate since the investigator can extract the correct information by removing doubts, if any, in the minds of the respondents regarding certain questions. High response rates might be obtained since the answers to various questions are obtained on the spot. It permits explanation of questions concerning difficult subject matter.

Secondary Sources - offer an analysis, interpretation or a restatement of primary sources and are considered to be persuasive. They often involve generalisation, synthesis, interpretation, commentary or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to describe or explain primary sources.

Secondary Data - are data documented by a secondary source. The data collectors had the data documented by other sources.

In secondary data, data are primary data for the agency that collected them, and become secondary for someone else who uses these data for his own purposes.

Secondary data are less expensive to collect both in money and time. These data can also be better utilized and sometimes the quality of such data may be better because these might have been collected by persons who were specially trained for that purpose.

On the other hand, such data must be used with great care, because such data may also be full of errors due to the fact that the purpose of the collection of the data by the primary agency may have been different from the purpose of the user of these secondary data. Secondly, there may have been bias introduced, the size of the sample may have been inadequate, or there may have been arithmetic or definition errors. Hence, it is necessary to critically investigate the validity of the secondary data.

The primary data can be collected by the following five methods:

1. Direct personal interviews - The researcher has direct contact with the interviewee. The researcher gathers information by asking questions to the interviewee.

2. Indirect/Questionnaire Method - This method of data collection involves sourcing and accessing existing data that were originally collected for the purpose of the study.

Designing good “questioning tools” forms an important and time consuming phase in the development of most research proposals. Once the decision has been made to use these techniques, the following questions should be considered before designing our tools:
- What exactly do we want to know, according to the objectives and variables we identified earlier? Is questioning the right technique to obtain all answers, or do we need additional techniques, such as observations or analysis of records?
- Of whom will we ask questions and what techniques will we use? Do we understand the topic sufficiently to design a questionnaire, or do we need some loosely structured interviews with key informants or a focus group discussion first to orient ourselves?
- Are our informants mainly literate or illiterate? If illiterate, the use of self-administered questionnaires is not an option.
- How large is the sample that will be interviewed? Studies with many respondents often use shorter, highly structured questionnaires, whereas smaller studies allow more flexibility and may use questionnaires with a number of open-ended questions.

Key Design Principles of a Good Questionnaire
1. Keep the questionnaire as short as possible.
2. Decide on the type of questionnaire (Open Ended or Closed Ended).
3. Write the questions properly.
4. Order the questions appropriately.
5. Avoid questions that prompt or motivate the respondent to say what you would like to hear.
6. Write an introductory letter or an introduction.
7. Write special instructions for interviewers or respondents.
8. Translate the questions if necessary.
9. Always test your questions before taking the survey. (Pre-test)

An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent’s own words.

Example:
- Can you describe exactly what the traditional birth attendant did when your labor started?
- What do you think are the reasons for a high drop-out rate of village health committee members?

A closed-ended question is a type of question that includes a list of response categories from which the respondent will select his answer. It is useful if the range of possible responses is known. This type of question is usually appropriate for collecting objective data.

Example:
Did you eat any of the following foods yesterday?
Fish or meat      Yes   No
Eggs              Yes   No
Milk or cheese    Yes   No

Take Note!
Question wording and question order have a large effect on the responses obtained.

Example:
Two surveys were taken in late 1993/early 1994 about Elvis Presley.

One survey asked: “In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don’t you think so?”

The second survey asked: “A recent television show examined various theories about Elvis Presley’s death. Do you think it is possible that Elvis is alive or not?”

8% of the respondents to the first question said it is possible that Elvis is still alive and 16% of respondents to the second question said it is possible that Elvis is still alive.

3. A focus group is a group interview of approximately six to twelve people who share similar characteristics or common interests. A facilitator guides the group based on a predetermined set of topics.

4. Experiment is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.

Bear in mind that the experimental method has several limitations that you should be aware of.
- Ethical, moral, and legal concerns
- Unrealistic controlled environments
- Inability to control for all variables

5. Observation is a technique that involves systematically selecting, watching and recording behaviors of people or other phenomena and aspects of the setting in which they occur, for the purpose of getting (gaining) specified information. It includes all methods from simple visual observations to the use of high level machines and measurements, sophisticated equipment or facilities such as:
- Radiographic
- Biochemical
- X-ray machines
- Microscope
- Clinical examinations
- Microbiological examinations

It gives relatively more accurate data on behavior and activities, but it is subject to the investigator’s or observer’s own biases, prejudices, and desires, and it needs more resources and skilled human power when high level machines are used.

The secondary data can be collected by the following five methods:
1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of the government departments.
5. Information from official publications.

Take Note!
Always investigate the validity and reliability of the data by examining the collection method employed by your source. Do not use inappropriate data for your research.

The choice of methods of data collection is largely based on the accuracy of the information they yield.

SAMPLE SIZE

“How many participants should be chosen for a survey?”

One of the most frequent problems in statistical analysis is the determination of the appropriate sample size.

[...] size can produce accuracy of results. Moreover, the results from the small sample size will be questionable. A sample size that is too large will result in wasting money and time because enough sample will normally give an accurate result.

The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here and it can vary in different research settings. However, all else being equal, a large sized sample leads to increased precision in estimates of various properties of the population.

Take Note!
- Representativeness, not size, is the more important consideration.
- Use no less than 30 subjects if possible.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).

[Figure: Representative Sample]
One may ask why sample size is so important. The answer to this is that an appropriate sample size is required for validity. If the sample size it too small, it will not yield valid results. An appropriate sample Desired Confidence Z - Score Level 80% 1.28 85% 1.44 90% 1.65 95% 1.96 99% 2.58 3. Degree of Variability Depending upon the target population and attributes under consideration, the degree of variability varies considerably. The more Choosing of sample size depends on non- heterogeneous a population is, the larger the statistical considerations and statistical sample size is required to get an optimum level considerations. of precision. Methods in Determining the Sample Size Non-statistical considerations – It may include availability of resources, man power, Estimating the Mean or Average budget, ethics and sampling frame. The sample size required to estimate the Statistical considerations – It will include population mean µ to with a level of confidence the desired precision of the estimate. with specified margin of error e, given by 2 ( e ) Three criteria need to be specified to Zσ determine the appropriate sample size: n≥ 1. Level of Precision where: Also called sampling error, the level of precision, is the range in which the true value Z is the z-score corresponding to level of of the population is estimated to be. confidence. 2. Confidence Interval e is the level of precision. It is statistical measure of the number of times Take Note: out of 100 that results can be expected to be within a specified range. For example, a If When σ is unknown, it is common practice to confidence interval of 90% means that results conduct a preliminary survey to determine s of an action will probably meet expectations and use it as an estimate of σ or use results 90% of the time. from previous studies to obtain an estimate of σ. When using this approach, the size of the To find the right z – score to use, refer to the sample should be at least 30. 
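The inequality $n \ge (Z\sigma/e)^2$ can be checked numerically with a short script. The following is only a minimal sketch: the function name `sample_size_for_mean` and the input values (σ = 15, e = 2) are illustrative assumptions, not part of the module, and the z-scores are copied from the confidence-level table in this section.

```python
import math

# z-scores copied from the confidence-level table in this section
Z_SCORES = {80: 1.28, 85: 1.44, 90: 1.65, 95: 1.96, 99: 2.58}


def sample_size_for_mean(confidence_pct, sigma, e):
    """Smallest whole number n satisfying n >= (Z * sigma / e) ** 2.

    confidence_pct: confidence level in percent (must be a key of Z_SCORES)
    sigma: population standard deviation (or an estimate of it)
    e: level of precision (margin of error), in the same units as sigma
    """
    z = Z_SCORES[confidence_pct]
    # A sample size must be a positive integer, so always round up.
    return math.ceil((z * sigma / e) ** 2)


# Hypothetical inputs chosen only for illustration:
# (1.96 * 15 / 2) ** 2 = 216.09, which rounds up to 217.
print(sample_size_for_mean(95, sigma=15, e=2))
```

Rounding up rather than to the nearest integer keeps the inequality satisfied, which is why a computed value such as 216.09 is reported as 217.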
The formula for the sample standard deviation s is

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Example:
A soft drink machine is regulated so that the amount of drink dispensed is approximately normally distributed with a standard deviation equal to 0.5 ounce. Determine the sample size needed if we wish to be 95% confident that our sample mean will be within 0.03 ounce of the true mean.

Solution: The z-score for confidence level 95% in the z-table is 1.96.

$n \ge \left(\frac{1.96(0.5)}{0.03}\right)^2 = 1067.11$

We need a sample of 1,068 for our study.

Estimating Proportion (Infinite Population)

The sample size required to obtain a confidence interval for p with specified margin of error e is given by

$n \ge \left(\frac{Z}{e}\right)^2 p(1 - p)$

Where:
Z is the z-score corresponding to the level of confidence.
e is the level of precision.
p is the population proportion.

There is a dilemma in this formula: it depends on

$p = \frac{x}{N}$

which we know only after we have taken the sample.

There are two ways to solve this dilemma:

1. We could determine a preliminary value for p based on a pilot study or an earlier study.

Example:
If last month 37% of all voters thought that state taxes are too high, then it is likely that the proportion with that opinion this month will not be dramatically different, and we would use the value 0.37 for p in the formula.

2. Simply replace p in the formula by 0.5. When p = 0.5, p(1 - p) reaches its maximum value of 0.25. This is called the most conservative estimate, since it gives the largest possible estimate of n.

The conservative formula is

$n \ge \frac{1}{4}\left(\frac{Z}{e}\right)^2 \approx 385$

Where:
The confidence level is 95%.
The level of precision is 0.05.

Example:
Suppose we are doing a study on the inhabitants of a large town, and want to find out how many households serve breakfast in the mornings. We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability. So p = 0.5. We want 99% confidence and at least 1% precision.

Solution: The z-score for confidence level 99% in the z-table is 2.58.

$n \ge \left(\frac{2.58}{0.01}\right)^2 0.5(1 - 0.5) = 16,641$

We need a sample of 16,641 for our study.

Slovin’s Formula

Slovin’s formula is used to calculate the sample size n given the population size and error. It is computed as

$n \ge \frac{N}{1 + Ne^2}$

Where:
N is the total population.
e is the level of precision.

Example:
A researcher plans to conduct a survey about the food preference of BS Stat students. If the population of students is 1000, find the sample size if the error is 5%.

Solution:

$n \ge \frac{1000}{1 + 1000(0.05)^2} = 285.71$

The researcher needs to survey 286 BS Stat students.

Finite Population Correction

If the population is small, then the sample size can be reduced slightly:

$n \ge \frac{n_0}{1 + \frac{n_0 - 1}{N}}$

Where:
$n_0$ is Cochran’s sample size recommendation.
N is the population size.

These are links to online sample size calculators:
https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/
https://www.calculator.net/sample-size-calculator.html

BASIC SAMPLING DESIGN

The goal in sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.

Reason for Sampling
- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If the sample is not representative, it is biased. You cannot generalize to the population from your statistical data.

Some definitions are needed to make the notion of a good sample more precise.

Definitions:

Observation unit - An object on which a measurement is taken. This is the basic unit of observation, sometimes called an element. In studying human populations, observation units are often individuals.

Target population - The complete collection of observations we want to study.

Sampled population - The collection of all possible observation units that might have been chosen in a sample; the population from which the sample was taken.

Sample - A subset of a population.

Sampling unit - A unit that can be selected for a sample. We may want to study individuals, but do not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.

Sampling frame - A list, map, or other specification of sampling units in the population from which a sample may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street addresses.

Sampling technique/Sampling Strategies - It is a plan you set forth to be sure that the sample you use in your research study represents the population from which you drew your sample.

Sampling Bias - This involves problems in your sampling, which reveal that your sample is not representative of your population.

The following examples indicate some ways in which selection bias can occur:
- Deliberately or purposively selecting a “representative” sample.
- Misspecifying the target population.
- Failing to include all of the target population in the sampling frame, called undercoverage.
- Including population units in the sampling frame that are not in the target population, called overcoverage.
- Having multiplicity of listings in the sampling frame.
- Substituting a convenient member of a population for a designated member who is not readily available.
- Failing to obtain responses from all of the chosen sample. (Nonresponse)
- Allowing the sample to consist entirely of volunteers.

Advantage of Sampling Over Complete Enumeration
- Less Labor
- Reduced Cost
- Greater Speed
- Greater Scope
- Greater Efficiency and Accuracy
- Convenience
- Ethical Considerations

Two Types of Samples

1. Probability Sample
- Samples are obtained using some objective chance mechanism, thus involving randomization.
- They require the use of a complete listing of the elements of the universe called the sampling frame.
- The probabilities of selection are known.
- They are generally referred to as random samples.
- They allow drawing of valid generalizations about the universe/population.

2. Non-probability Sample
- Samples are obtained haphazardly, selected purposively or are taken as volunteers.
- The probabilities of selection are unknown.
- They should not be used for statistical inference.

Sampling Procedure
- Identify the population.
- Determine if the population is accessible.
- Select a sampling method.
- Choose a sample that is representative of the population.
- Ask the question, can I generalize to the general population from the accessible population?

Sampling techniques can be grouped by how the selection of items is made: probability sampling and non-probability sampling.

Basic Sampling Technique of Probability Sampling

Simple Random Sampling
- Most basic method of drawing a probability sample.
- Assigns equal probabilities of selection to each possible sample.
- Results in a simple random sample.

Advantage: It is very simple and easy to use.

Disadvantage: The sample chosen may be distributed over a wide geographic area.

When to use: This is preferable if the population is not widely spread geographically. Also, this is more appropriate if the population is more or less homogeneous with respect to the characteristics under study.

[Figure: Simple Random Sampling]

Systematic Random Sampling
- It is obtained by selecting every kth individual from the population.
- The first individual selected corresponds to a random number between 1 and k.

Obtaining a Systematic Random Sample

1. Decide on a method of assigning a unique serial number, from 1 to N, to each one of the elements in the population.

2. Compute the sampling interval

$k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}}$

3. Select a number i, from 1 to k, using a randomization mechanism. The element in the population assigned to this number is the first element of the sample. The other elements of the sample are those assigned to the numbers i + k, i + 2k, i + 3k, and so on until you get a sample of size n.

Example:
We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked from the sampling frame.

Solution:

$k = \frac{500}{50} = 10$

We start the sample at unit i and take every kth unit after it. Suppose the random number i is 6; then we select units 6, 16, 26, 36, 46, ...

Advantage: Drawing of the sample is easy. It is easy to administer in the field, and the sample is spread evenly over the population.

Disadvantage: May give poor precision when unsuspected periodicity is present in the population.

When to use: This is advisable to use if the ordering of the population is essentially random and when stratification with numerous data is used.

[Figure: Systematic Random Sampling]

Stratified Random Sampling
- It is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum.
- The individuals within each stratum should be homogeneous (or similar) in some way.

Example:
A sample of 50 students is to be drawn from a population consisting of 500 students belonging to two institutions A and B. The number of students in institution A is 200 and in institution B is 300. How will you draw the sample using proportional allocation?

Solution: There are two strata in this case.

Given:
$N_1 = 200$
$N_2 = 300$
$N = 500$
$n = 50$

$n_1 = \left(\frac{n}{N}\right) N_1 = \left(\frac{50}{500}\right) 200 = 20$

$n_2 = \left(\frac{n}{N}\right) N_2 = \left(\frac{50}{500}\right) 300 = 30$

The sample sizes are 20 from A and 30 from B.
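The proportional allocation above, followed by a simple random draw inside each stratum, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, students are assumed to be numbered 1 to N_h within each institution, and the simple rounding used here is not guaranteed to make the stratum samples add up to n when the shares are not whole numbers.

```python
import random


def proportional_allocation(strata_sizes, n):
    """Split a total sample of size n across strata: n_h = (n / N) * N_h."""
    N = sum(strata_sizes.values())
    # Rounded shares may not sum to n in general; that adjustment is
    # deliberately left out of this sketch.
    return {name: round(n * size / N) for name, size in strata_sizes.items()}


def draw_stratified_sample(strata_sizes, n, seed=None):
    """Draw a simple random sample (without replacement) in every stratum.

    Units in each stratum are assumed to carry serial numbers 1..N_h.
    """
    rng = random.Random(seed)
    allocation = proportional_allocation(strata_sizes, n)
    return {
        name: sorted(rng.sample(range(1, strata_sizes[name] + 1), k))
        for name, k in allocation.items()
    }


# Figures from the worked example: institutions A and B, n = 50.
sizes = {"A": 200, "B": 300}
print(proportional_allocation(sizes, 50))   # {'A': 20, 'B': 30}

sample = draw_stratified_sample(sizes, 50, seed=1)
print(len(sample["A"]), len(sample["B"]))   # 20 30
```

Fixing `seed` only makes the draw repeatable for checking; in an actual survey the randomization mechanism would not be seeded this way.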
Then the units from each institution are to be selected by simple random sampling.

Advantage: Stratification of respondents is advantageous in terms of precision of the estimates of the characteristics of the population. Sampling designs may vary by stratum to adjust for the differences in the conditions across strata. It is easy to use as a random sampling design.

Disadvantage: Values of the stratification variable may not be easily available for all units in the population, especially if the characteristic of interest is homogeneous. It is possible that one or two strata have no representatives. Also, transportation costs can be high if the population covers a wide geographic area.

When to use: If the population is such that the distribution of the characteristics of the respondents under consideration is concentrated in small and widely spread segments of the population. Thus, this is preferred if precise estimates are desired for stratified parts of the population and if sampling problems differ in the various strata of the population.

[Figure: Stratified Random Sampling]

Cluster Sampling
- You take the sample from naturally occurring groups in your population.
- The clusters are constructed such that the sampling units are heterogeneous within the cluster and homogeneous among the clusters.

When to use: This is preferable if the population can be grouped into clusters where individual population elements are known to be different with respect to the characteristics under study.

Obtaining a Cluster Sample

1. Divide the population into non-overlapping clusters.
2. Number the clusters in the population from 1 to N.
3. Select n distinct numbers from 1 to N using a randomization mechanism. The selected clusters are the clusters associated with the selected numbers.
4. The sample will consist of all the elements in the selected clusters.

[Figure: Cluster Sampling]

Example:
A researcher wants to survey the academic performance of high school students in MIMAROPA.
1. The researcher can divide the entire population into different clusters.
2. Then the researcher selects a number of clusters, depending on the research, through simple or systematic random sampling.
3. Then, from the selected clusters, the researcher can either include all the high school students as subjects or select a number of subjects from each cluster through simple or systematic random sampling.

Advantage: There is no need to come up with a list of units in the population; all that is needed is a list of the clusters. It is also less costly since the elements are physically closer together.

Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households distantly apart.

Multi-Stage Sampling
- Selection of the sample is done in two or more steps or stages, with sampling units varying in each stage.
- The population is first divided into a number of first-stage sampling units from which a sample is drawn. Smaller units, called the secondary sampling units, comprising the selected first-stage units then serve as the sampling units for the next stage. If needed, additional stages may be added until the units of observation for the survey are clearly identified. The units comprising the samples selected from the previous stage constitute the frame for the next stage.

Obtaining a Multi-Stage Sample

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.
2. Select a sampling technique for each stage.
3. Systematically apply the sampling technique to each stage until the unit of analysis has been selected.

Example:
Suppose we wish to study the expenditure patterns of households in NCR. We can select a sample of households for this study using simple three-stage sampling.
- First, NCR is divided into cities/municipalities and a random sample of these cities/municipalities is drawn.
- Second, a random sample of smaller areas such as barangays is taken from within each of the cities/municipalities chosen in the first stage.
- Third, a random sample of even smaller units such as households is taken from within each of the areas chosen in the second stage.

Advantage: It is easier to generate adequate sampling frames. Transportation costs are greatly reduced since there is some form of clustering among the ultimate or final samples; i.e., they are in the sampled lower-stage units.

Disadvantage: Its complexity in theory may be difficult to apply in the field. Estimation procedures may be difficult for non-statisticians to follow.

When to use: If no population list is available and if the population covers a wide area.

[Figure: Multi-Stage Sampling]

Take Note!
Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.

Basic Sampling Technique of Non-Probability Sampling

Accidental Sampling - There is no system of selection; the sample consists only of those whom the researcher or interviewer meets by chance.

Quota Sampling - A specified number of persons of certain types is included in the sample. The researcher is aware of categories within the population and draws samples from each category. The size of each categorical sample is proportional to the proportion of the population that belongs in that category.

Convenience Sampling - It is a process of picking out people in the most convenient and fastest way to get reactions immediately. This method can be done by telephone interview to get the immediate reactions of a certain group on a certain issue.

Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. It is used to determine the target population of those who will be taken for the study.

Judgement Sampling - It selects the sample in accordance with an expert’s judgment.
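Of the non-probability techniques, quota sampling is the one with a mechanical rule: respondents of each type are accepted until that type's quota is filled. The sketch below illustrates that rule only. The function name `fill_quotas`, the respondent identifiers and the quota values are hypothetical, and the quotas are assumed to have been worked out beforehand from the population proportions.

```python
def fill_quotas(respondents, quotas):
    """Accept respondents in the order they are met until every quota is full.

    respondents: iterable of (identifier, category) pairs, in encounter order
    quotas: dict mapping category -> number of persons wanted in the sample
    """
    remaining = dict(quotas)          # copy so the caller's dict is untouched
    accepted = []
    for identifier, category in respondents:
        if remaining.get(category, 0) > 0:
            accepted.append((identifier, category))
            remaining[category] -= 1
        if not any(remaining.values()):
            break                     # all quotas met, stop interviewing
    return accepted


# Hypothetical stream of people met by the interviewer.
met = [("r1", "female"), ("r2", "female"), ("r3", "male"), ("r4", "female"),
       ("r5", "female"), ("r6", "male"), ("r7", "male")]

# Assumed quotas: 3 females and 2 males.
print(fill_quotas(met, {"female": 3, "male": 2}))
```

Selection here depends on who happens to be met first, not on a chance mechanism, which is why the selection probabilities are unknown and the result should not be used for statistical inference.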
Cases wherein Non-Probability Sampling is Useful
- Only a few are willing to be interviewed.
- Extreme difficulties in locating or identifying subjects.
- Probability sampling is more expensive to implement.
- Cannot enumerate the population elements.

Sources of Errors in Sampling

1. Non-sampling Error
- Errors that result from the survey process.
- Any errors that cannot be attributed to the sample-to-sample variability.

Sources of Non-Sampling Error
1. Non-responses
2. Interviewer Error
3. Misrepresented Answers
4. Data entry errors
5. Questionnaire Design
6. Wording of Questions
7. Selection Bias

2. Sampling Error
- Error that results from taking one sample instead of examining the whole population.
- Error that results from using sampling to estimate information regarding a population.

ACTIVITIES/ASSESSMENTS:

I. Determine if the source would be a primary or a secondary source.

______________1. Government Records
______________2. Dictionary
______________3. Artifact
______________4. A TV show explaining what happened in the Philippines.
______________5. Autobiography about Rodrigo Duterte.
______________6. Enrile's diary describing what he thought about World War II.
______________7. Audio and video recordings
______________8. Speeches
______________9. Newspaper
______________10. Review Articles

II. Determine the sample size of the following problems. Show your solution.

1. A dermatologist wishes to estimate the proportion of young adults who apply sunscreen regularly before going out in the sun in the summer. Find the minimum sample size required to estimate the proportion with precision of 3%, and 90% confidence.

2. The administration at a college wishes to estimate the proportion of all its entering freshmen who graduate within four years, with 95% confidence. Estimate the minimum size sample required. Assume that the population standard deviation is σ = 1.3 and precision level is 0.05.

3. A government agency wishes to estimate the proportion of drivers aged 16-24 who have been involved in a traffic accident in the last year. It wishes to make the estimate to within 1% error and at 90% confidence. Find the minimum sample size required, using the information that several years ago the proportion was 0.12.

4. An internet service provider wishes to estimate, to within one percentage error, the current proportion of all email that is spam, with 85% confidence. Last year the proportion that was spam was 71%. Estimate the minimum size sample required if the total email that is spam is 10,000.

III. Determine the type of sampling. (ex. Simple Random Sampling, Purposive Sampling)

______________1. To determine customer opinion of its boarding policy, Southwest Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flights.

______________2. A member of Congress wishes to determine her constituency’s opinion regarding estate taxes. She divides her constituency into three income classes: low-income households, middle-income households, and upper-income households. She then takes a simple random sample of households from each income class.

______________3. The presider of a guest-lecture series at a university stands outside the auditorium before a lecture begins and hands every fifth person who arrives, beginning with the third, a speaker evaluation survey to be completed and returned at the end of the program.

______________4. 24 Hour Fitness wants to administer a satisfaction survey to its current members. Using its membership roster, the club randomly selects 40 club members and asks them about their level of satisfaction with the club.

______________5. A radio station asks its listeners to call in their opinion regarding the use of U.S. forces in peacekeeping missions.

______________6. A tax auditor selects every 1000th income tax return that is received.

______________7. For a survey, a sample of municipalities was selected from every province in the country and included all child laborers in the selected municipalities.

______________8. To determine his DSL Internet connection speed, Shawn divides up the day into four parts: morning, midday, evening, and late night. He then measures his Internet connection speed at 5 randomly selected times during each part of the day.

______________9. A college official divides the student population into five classes: freshman, sophomore, junior, senior, and graduate student. The official takes a simple random sample from each class and asks the members their opinions regarding student services.

______________10. In the game of lotto, 6 balls are selected from a container with 42 balls.

IV. Using proportional allocation, determine the sample size needed for every school. The total population of students is 10,679, and the minimum sample is 2,450.

School                                  Population    Sample per School
Antipolo National High School           3,360
Bagong Nayon National High School       2,540
Dela Paz National High School           2,122
Sta. Cruz National High School          1,290
Tubigan National High School            1,367
Total                                   10,679

REFERENCES:

Statistics: Informed Decisions Using Data by Michael Sullivan, III. Fifth Edition

Sampling: Design and Analysis by Sharon L. Lohr. Second Edition

http://www.economicsdiscussion.net/statistics/sampling/advantages-of-sampling-over-completeenumeration-in-statistics/11980

http://www.natco1.org/research/files/SamplingStrategies.pdf

https://data36.com/statistical-bias-types-explained/

MODULE 3: Organizing and summarising data

DATA PRESENTATION

Once data has been collected, it has to be classified and organised in such a way that it becomes easily readable and interpretable, that is, converted to information. Before the calculation of descriptive statistics, it is sometimes a good idea to present data as tables, charts, diagrams or graphs.
Objectives:
After successful completion of this module, you should be able to:
- Understand various techniques for presentation of data.
- Know the different parts of the table.
- Construct and interpret simple diagrams including bar graph, histogram, stem-and-leaf diagrams, pie charts, scatter diagrams, box plot.
- Differentiate between histograms and bar charts.
- Know when a bar graph or a pie chart should be used.
- Choose appropriate diagrams/graphs to present a given set of data.
- Understand the advantage of using textual, tabular and graphical presentation.

Most people find “pictures” much more helpful than “numbers” in the sense that, in their opinion, they present data more meaningfully.

Data are usually collected in a raw format and thus the inherent information is difficult to understand. Therefore, raw data need to be summarized, processed, and analyzed to usefully derive information from them. However, no matter how well manipulated, the information derived from the raw data should be presented in an effective format; otherwise, it would be a great loss for both authors and readers. Planning how the data will be presented is essential before appropriately processing raw data.

Presentation of data refers to an exhibition or putting up of data in an attractive and useful manner such that it can be easily interpreted.

The three main forms of presentation of data are:
- Textual Presentation
- Tabular Presentation
- Graphical Presentation

Textual Presentation
- All the data is presented in the form of text, phrases, or paragraphs.
- It involves enumerating important characteristics, emphasizing significant figures and identifying important features of data.
- Text is the principal method for explaining findings, outlining trends, and providing contextual information.

Example:
A researcher is asked to present the performance of a section in the statistics test. The following are the test scores:

34 42 20 50 17  9 34 43 50 18
35 43 50 23 23 35 37 38 38 39
39 38 38 39 24 29 25 26 28 27
44 44 49 48 46 45 45 46 45 46

The data presented in textual form would be like this:

In the statistics class of 40 students, 3 obtained the perfect score of 50. Sixteen students got a score of 40 and above, while only 3 got 19 and below. Generally, the students performed well in the test, with 23 or 57.5% getting a passing score of 38 and above.

Advantage of Textual Presentation
- The data can be given a fuller interpretation.
- Can help in emphasizing some important points in data.
- Small sets of data can be easily presented.

Take Note!
- Keep your paragraphs simple and short.
- Always make sure that the readers are provided with additional explanations about the relevance of the figures and their implications.

Tabular Presentation
- It is a systematic and logical arrangement of data in the form of rows and columns with respect to the characteristics of data.
- A table is best suited for representing individual information and represents both quantitative and qualitative information.

Simple or One-Way Table

Example:
[Figure: example of a simple table]

Optionally, the table may also include totals or percentages.

Compound Table

A compound table is just an extension of a simple table in which there is more than one variable distributed among its attributes (sub-variables). An attribute is just a quality, property or component of a variable according to which it can be differentiated with respect to other variables.

We may refer to a compound table as a cross tabulation or even as a contingency table, depending on the context in which it is used.

Example:
[Figure: Compound Table]

Preparing Tables

The making of a compact table is itself an art. The table should contain all the information needed within the smallest possible space. What the purpose of tabulation is and how the tabulated information is to be used are the main points to be kept in mind while preparing a statistical table. An ideal table should consist of the following main parts:

Title: The title must tell as simply as possible what is in the table. It should answer the questions:
- Who? White females with breast cancer, black males with lung cancer.
- What are the data? Counts, percentage distributions, rates.
- Where are the data from? One hospital, or the entire population covered by your registry.
- When? A particular year, time period.

Boxhead: The boxhead contains the captions or column headings. The heading of each column should contain as few words as possible, yet explain exactly what the data in the columns represent.

Stub: The row captions are known as the stub. Items in the stub should be grouped to facilitate interpretation of the data. For example, rows may stand for score classes and columns for data related to the sex of students. In the process, there will be many rows for score classes but only two columns for male and female students.

Footnotes: Footnotes are given at the foot of the table for explanation of any fact or information included in the table which needs some explanation. Thus, they are meant for explaining or providing further details about the data that have not been covered in the title, captions and stubs. A footnote usually applies to a specific cell(s) within the table, and a symbol, such as ‘*’ or ‘#’, is used to key the cell to the footnote. If several footnotes are required, it is better to use small letters rather than numbers. Footnote numbers might be confused with the numbers within the table.

Sources of Data: We should also mention the source of information from which data are taken. This may preferably include the name of the author, volume, page and the year of publication. This should also state whether the data contained in the table is of ‘primary or secondary’ nature. For example: Axtell, L.M., Mire, A.J. & Myers, M.H.: Cancer Patient Survival, Report Number 5. DHEW Publication No. (NIH) 77-992

[Figure: Parts of the Table]

Construction of Data Tables
- The title should be in accordance with the objective of study
- Comparison
- Alternative location of stubs
- Headings
- Footnote
- Size of columns
- Use of abbreviations
- Units of measurement

Advantage of Tabular Presentation
- More information may be presented.
- Exact values can be read from a table to retain precision.
- Flexibility is maintained without distortion of data.
- Less work and less cost are required in the preparation.

Organize Quantitative Variable in Table

Classes are categories into which data are grouped. When a data set consists of a large number of different discrete data values or when a data set consists of continuous data, we create classes by using intervals of numbers.

Make sure that the classes do not overlap. This is necessary to avoid confusion as to which class a data value belongs. Also, make sure that the class widths are equal for all classes.

The class width is the difference between consecutive lower class limits.

Determine the class width by computing:

$cw = \frac{x_{\max} - x_{\min}}{nc}$

Where:
cw is the class width
nc is the number of classes

Round this value up to a convenient number.

One exception to the requirement of equal class widths occurs in open-ended tables. A table is open ended if the first class has no lower class limit or the last class has no upper class limit.

Take Note!
Creating the classes for summarizing continuous data is an art form. There is no such thing as the correct frequency distribution. However, there can be less desirable frequency distributions. The larger the class width, the fewer classes a frequency