BIOEPI Prelims PDF
University of Baguio
LESSON 1 INTRODUCTION TO BIOSTATISTICS: BASIC CONCEPTS AND TERMINOLOGIES

LEARNING OUTCOMES
At the end of the lesson, you should be able to:
Cognitive:
1. define terminologies and explain concepts of biostatistics;
2. describe the concepts of biostatistics as they relate to dentistry;
Affective:
3. explain the importance of becoming familiar with the terms used in scientific inquiry;
Psychomotor:
4. list constants and variables, differentiate between them, and classify variables according to their use in a study.

Statistics: Its Meaning and Role in Dentistry
Meaning: the term "statistics" is defined as follows:
- Manner of standing or position
- Body of methods for the handling and analysis of data:
  - Collection
  - Organization of numerical data
  - Analysis
  - Interpretation
- Collection of powerful and efficient tools that can be used with available information in order to communicate and make decisions with precision and ease
- A mass of observations or a bulk of data

A Self-regulated Learning Module 1

Role of Statistics in Dentistry
- Guide in the critical evaluation of literature or written reports
- Basis for decision making
- Understanding of epidemiologic terms and applications
- Ability to conduct or join scientific studies

Biostatistics
- The scientific discipline concerned with the application of statistical methods to problems in the health sciences.

Two main branches of statistics
1. Descriptive statistics
2. Inferential statistics

Descriptive statistics
Involved in:
- Summarizing data
- Converting summarized data to useful information
The primary goal is to have a useful, clear, and informative description of a mass of numerical data.

Inferential statistics
Concerned with making estimates, predictions, generalizations and conclusions about a population based on information from a sample. Two aspects are involved:
1. Estimation
2. Hypothesis testing

Estimation
- Predicting the value of a certain occurrence or the change in measurement from the start of a study to its endpoint.
Hypothesis testing
- Comprises the procedures by which it is decided whether or not to reject an assumption or hypothesis about a certain population.

Phenomenon of Variation
The tendency of a count or a measurement to change from one individual to another, or from one instant of time to another in the same individual.

Constant
A measure or a characteristic whose value remains the same and is true for everybody.

Variable
A measure or a characteristic whose outcome cannot be predicted with certainty.

Classification of Variables
According to the nature and type of measurement:
1. Qualitative variables
2. Quantitative variables
 2.1. Discrete
 2.2. Continuous

Qualitative variable – categories simply distinguish one group from another. There is no reason for saying that one is greater or less, higher or lower, or better or worse than the other. e.g.: sex/gender (male or female)

Quantitative variable – categories can be ordered according to quantity or amount, or values can be expressed numerically. Values of quantitative variables are usually derived from measurements.

Discrete quantitative variable – a variable that can assume only integral values (whole numbers) and not fractions of integers, e.g.: number of teeth in the oral cavity.

Continuous quantitative variable – a variable that can assume any value within a range (both whole numbers and decimals), e.g.: cervico-incisal length of primary cuspids in millimeters, volume of saliva in cubic centimeters.

According to their interrelationship with each other in a study:
Independent variable – an entity that is presumed to cause, affect or influence the outcome.
Dependent variable – refers to the output, outcome, result or response variable.
Control variables – variables which in themselves may produce changes that may be mistaken for the effect of the independent variable.
Hence, they need to be controlled, held constant, or randomized to neutralize their effects.

Example: in a study whose objective is "To compare the apical leakage and technical quality of root canal fillings in teeth obturated by lateral condensation of gutta percha to teeth obturated with Thermafil obturators":
- Independent variable – method of condensation
- Dependent variable – amount of apical leakage
- Control variable – pressure

Note: Review your notes from your previous research class.

Measurement Scales
The scale of measurement and how variables will be transformed into numerical values are important for purposes of analysis and summarization of data. The following are the types of measurement scale:

Scale of Measurement | Description | Example
Nominal | Categorizes subjects into one of a number of classes | sex
Ordinal | Similar to nominal; in addition, the classes can be ranked | socio-economic status
Interval | The exact distance between two categories can be determined, but the zero point is arbitrary | temperature
Ratio | Similar to interval, but the zero point is fixed | cervico-incisal length

Activity
Do further readings about the topics/concepts given. Refer to the list of references in the syllabus and other reliable online sources. Identify examples or situations to further describe the terms/concepts.

ASSESSMENT
To be done during the face-to-face session.

LESSON 2 DATA COLLECTION

LEARNING OUTCOMES
At the end of the lesson, you should be able to:
Cognitive:
1. Describe the two main categories of data
2. Specify where to get data
3. Enumerate and explain the different methods of data collection
Affective:
4. Understand why there is a need to collect data
Psychomotor:
5. Properly collect data using the different methods of data collection

Data
Data (singular, also called a datum) refers to the measurement(s) of the response variable(s) for one element of a population or sample. Example: freeway space – 8 mm; intercanine dimension – 32 mm.
Data (plural) refers to the set of measurements collected for the response variable from each of the elements belonging to the sample. Example: DMFT (decayed, missing, and filled teeth) rates of 42 subjects

Why collect data?
- Reliable and systematic decision making
- Helps in planning and analysis
- Helps in prioritizing groups
- Proper allocation of resources
- Guide in implementation and monitoring

What type of data do we need to collect? Where? The answer to these questions largely depends on the objectives and type of study.

Two main categories of data (according to source):
A) Primary data
- Collected first hand by the investigator
- Originally collected and used by the person or agency that collected the data
B) Secondary data
- Derived data
- Already in existence (previously collected)
- Need to be checked for accuracy and validity
*Note: Do your further readings and identify as many examples of each category of data as you can.

Guidelines in checking the quality and utility of secondary data:
A) Under what conditions were the data collected and for what purpose/s?
B) How reliable were the collectors or informants?
C) Were the data gathered on the basis of first-hand knowledge or from hearsay?
D) Are the data suitable for answering the questions under current investigation?

Desired characteristics of statistical data:
A) Timeliness (updated)
B) Completeness
C) Accuracy (closeness of a measurement to its true value)
D) Precision (repeatability, stability, consistency)

Methods of Data Collection
A. Methods of collecting primary data:
- Questionnaire
- Interview schedule/guide
- Observation
- Experimental method

Questionnaire:
- Self-administered or completed by the respondent
- All instructions contained in it are meant for the respondent
- Questions are usually standardized and tested in a similar group to determine whether the expected answers will be elicited from the respondents.
Advantages of the questionnaire:
- Wide coverage – it can be administered simultaneously to many respondents
- Economical – less time and personnel investment
- Respondents' anonymity can be maintained
- Less pressure on the respondent to answer
- Particularly useful when questions are of the fixed-alternative type (i.e., response categories follow the questions, and the respondent selects the categories appropriate to his/her experience)

Limitations of the questionnaire:
- The respondent might not answer the questions himself or herself
- Low response rate (e.g., 10% of the targeted population); hence, follow-ups are necessary
- Questions may be misinterpreted by the respondent; it is important that respondents are reasonably literate
- It is difficult to check the reliability of responses
- Needs a long study period, as the return of responses could be slow
- If respondents don't like the questions or feel antagonized, they may just leave them blank

Interview Schedule:
Involves a personal conference between a respondent and an interviewer, who uses a uniformly worded list of questions and helps the respondent answer them. The interview schedule is used by the interviewer to establish the sequence of survey questions asked of the respondent. It may contain prompting information, skip instructions or other information of use to the interviewer but not read to the respondent.
Advantages of the interview schedule:
- It can be applied to different types of respondents (including the illiterate)
- Higher response rate (as it is easier to obtain the cooperation of respondents)
- There is a chance to rephrase difficult questions
- The interviewer can observe the social setting of the interview
- It is more powerful in generating responses to open-ended questions; follow-up questions are possible

An interview schedule may not be preferred for the following reasons:
- More costly
- Human factors may distort the returns; answers to questions will be highly dependent on the interviewer's interpretation
- Requires proper training and supervision of interviewers
- Time of field work has to be adjusted to maximize the chances of catching the desired respondent

Observation:
- A classic form of data collection
- Used when informants are unable to provide information (e.g., babies or experimental animals)
- A record form is used, which is more concise than a questionnaire or an interview schedule
- In general, only the labels or names of the variables being observed are indicated in the form instead of a list of questions to be answered

Advantages of observation:
- First-hand account of behavior; hence, less distortion due to recall
- Appropriate for investigating areas where there is resistance to interviews

Limitations of observation:
- Limited to the time of observation
- Fails to capture inner (or more in-depth) feelings

Experimental Method:
- The occurrence of the event is induced deliberately and artificially.
- All factors are equalized and controlled except the one being studied.
- It is said to be the most powerful method of data collection because by varying conditions, we may obtain many different answers.
- Fewer subjects are needed and results are obtained within a shorter period of time.

B. Methods of collecting secondary data:
- Documentary sources
- Statistical records
- Personal documents

Overall advantages of collecting secondary data:
- Economical
- Can be retrieved anytime at the convenience of the researcher

Disadvantage of collecting secondary data:
- Limited by the method and focus of the original researcher

Documentary Sources:
- Use of records and reports, e.g., birth registries, school records, dental records
- Necessitates the permission of record keepers or administrators

Statistical Records:
- Information that is collected repeatedly over time
- Enables the conduct of trend/longitudinal studies
- These sources do not require the cooperation of the individuals who are the primary sources of information

Personal Documents:
- Diaries, biographies, letters
- Provide richness and detail not achievable by standardized methods
- Give insight into the personal characteristics, experiences and beliefs of the respondent
- Disadvantages:
  a) Tend to be biased; limited to the range of interest of the reporter
  b) Unrepresentative of the target population and limited only to those who are articulate
  c) Data may not be accessible

*To help us decide which data collection method is more appropriate for a study, the following questions should be answered:
1. How much time have I got?
2. How extensive are my human and non-human resources (including funding)?
3. How soon do I need to produce the results of the study?

Whether we are preparing to do an interview or to use a questionnaire, we can make use of the following guidelines in the development and formatting of questions:
1. Remember that the aim of inquiry design is to obtain complete and accurate information that is relevant to the survey's objectives.
2. Remember that the respondent is doing you a favor by participating in the survey. Do not exploit these good intentions.
3. Justify the relevance of each question placed in the list. Avoid extraneous or irrelevant questions, and make sure that each question has some bearing on some specified or planned part of the analysis.
4. Be sensitive to concerns respondents may have regarding their privacy.
5. Think as a respondent when developing the questions (that is what we call "empathy"!).

Guidelines on Question Wording:
1. Avoid questions that require respondents to recall events or facts that occurred some time in the past. e.g. "How many times during the last five years did you experience bleeding of the gums?"
2. Use simple, generally familiar words that respondents might use in daily conversation. Avoid technical jargon, formal language or colloquial terms. e.g. "Have you ever used a Schwarz appliance in your mouth?"
3. Avoid ambiguous questions and an inadequate frame of reference. e.g. "Are you generally comfortable with your dentist?"
4. Avoid needless elaboration.
5. Avoid double-barrelled questions. e.g. "Is your tooth painful when you drink cold beverages and is it a dull or a stabbing kind of pain?"
6. Avoid "leading" questions, which push the respondent towards a certain answer. e.g. "Don't you agree that an hour spent in a movie is more enjoyable than an hour in the dental office?"
7. Avoid emotionally charged words, which might arouse positive or negative feelings that overshadow the specific content of the question. e.g. traumatic extraction, lousy operator, subversive
8. Avoid threats or appeals to the respondent's "self-esteem". e.g. "Do you have a sexually transmitted disease?"
9. Avoid personalized questions, which find out what the respondent personally thinks rather than what he or she feels is best for society (as impersonalized questions do). e.g. "Do you think it is desirable for the dentists to increase dental fees?"

Guidelines in Formatting of Questions:
1. When producing the sequence of questions, start with questions that are easy to administer and answer.
The first questions should be an attempt to:
a. Create interest and motivation
b. Build the respondent's confidence in the survey
2. Questions should be grouped according to subject areas to avoid an unnatural flow.
3. Respondents should be eased into embarrassing or sensitive questions by a series of lead-in questions.

In summary, a good questionnaire/schedule is:
1. Easy to understand – emphasize words or phrases by underlining them or printing them in italics. USE CAPITAL LETTERS FOR INSTRUCTIONS TO INTERVIEWERS.
2. Easy to fill out – there should be minimal writing by the interviewer, and there should be enough space to record answers.
3. Easy to follow – adequate instructions, sequential numbering and indentation.

Note: Consider data processing when developing the questionnaire's format and the listing of response options for closed-response questions. Open-ended questions that require indefinite answers are more difficult to analyze later.

Activity
Review your lecture notes from your previous research subject regarding data gathering. Do further readings. Use the list of references in the syllabus, other reliable sources online, and also utilize the UB Library online and face-to-face services.

ASSESSMENT
To be given during the face-to-face session.
*Further assessment on the attainment of the objectives of the lesson is done in the worksheets.

LESSON 3 BASIC SAMPLING DESIGNS

LEARNING OUTCOMES
At the end of the lesson, you should be able to:
1. Define sampling and specify its advantages when doing research
2. Distinguish non-probability from probability sampling designs
3. Discuss the procedures for doing the different probability sampling designs
4. Follow the proper procedures in sampling

In generating new ideas or concepts, it would be ideal to get measurements or observations from all elements in the population if at all possible. However, limitations in both time and funding necessitate that a researcher study only a segment of the intended population to which the results of the study will be applied. In dental public health, sampling is done in the hope of generalizing to a bigger population and thereby answering questions or solving problems, whether basic or applied.

Sampling
It is the act of drawing or selecting a part of the population through some acceptable method.
Intention: whatever findings we get from the sample, we will generalize to the total population.
Sampling is used in everyday life:
- In the workplace
- In school
- In our dealings with other people

Advantages of Sampling
- Lower cost
- Shorter time
- Better quality of information collected
- More comprehensive data may be obtained
- It is the only method when the procedure to be used is destructive/invasive

Definition of Terms
Population – the entire group of individuals or items of interest in a study.
Types of population:
- Target population – the group from which representative information is desired and to which the results of a study will be applied
- Sampling population – the population from which a sample will actually be drawn
Sample – subset or subgroup of the population
Elementary unit – an object or a person on which a measurement is actually taken or an observation is made
Sampling unit – non-overlapping collection of elements or elementary units
Sampling frame – collection of all sampling or elementary units from which a sample is drawn
Sampling error – difference between an estimate of a population value obtained from a sample and the actual population value (estimates from different samples of the same population will also differ from one another)

Conditions where sampling may be inadvisable
- The relevant information for every member of the population may be required for non-statistical purposes (e.g., birth and death certificates, medical records).
- Data may be required for certain subdivisions of the whole population containing few individuals. Sampling may then fail to provide precise information (e.g., factors affecting rare diseases can be adequately determined only by studying all known cases).
- There are occasions when sampling in a community will create a feeling of discrimination (e.g., when sampled individuals are given special health services like X-rays, food assistance or fluoridation).

Criteria for a good sampling design
1. The sample to be obtained should be representative of the population.
2. The sample should be adequate so that reliable generalizations about the population can be made.
3. Feasibility and practicality of the sampling procedure.
4. Economy and efficiency of the sampling design.

Types of Sampling Designs
Non-probability sampling designs
- The probabilities of selection are not specified for the individual units of the population.
- There is no objective way of assessing the reliability of the sample results.
Probability sampling designs
- The probability of selection of each unit of the population is known.
- Every unit has a non-zero probability of being included in the sample (in simple random sampling, every unit has an equal chance of being included).
- The design specifies rules and procedures for both sample selection and estimation.

Non-probability sampling designs:
1. Haphazard or accidental sampling
2. Judgement or purposive sampling
3. Quota sampling

Haphazard or accidental sampling – volunteers are asked, or whatever items come to hand are used as samples.
Judgement or purposive sampling – a "representative" sample of the population is selected in accordance with an expert's subjective judgement.
Quota sampling – "quotas" are set up for different strata of the population, and the field workers are instructed to keep picking items or respondents until the quotas are filled.

Probability Sampling Designs
1. Simple random sampling
2. Systematic sampling
3. Stratified random sampling
4. Cluster sampling
5. Multi-stage sampling

Simple Random Sampling
- Every element has an equal chance of selection, e.g.: table of random numbers, lottery or fishbowl technique.

Systematic Sampling
- The first sample is obtained by randomly selecting one element from the first "k" elements in the frame. Every kth element thereafter is included in the sample.

Stratified Random Sampling
- The population is divided into non-overlapping groups called strata (e.g., section, zone, gender, year level, occupation).
- A simple random sample is then drawn from each stratum.
- Reasons for stratification:
  - to spread out the sample over the field of survey
  - to provide reasonably accurate estimates for the various subsections of the population
  - for physical or administrative purposes
  - to increase precision (over that of a simple random sample of the same size) for overall estimates of the population under study

Cluster Sampling
- The population is first subdivided into "A" sampling units (clusters), and a sample of such units is then selected.
- Every element in each selected sampling unit is included in the study.

Multi-stage Sampling
- The population is first divided into a set of primary or first-stage sampling units.
- A sample of these units is selected.
- Each primary unit selected is further subdivided into secondary or second-stage sampling units, from which a sample is taken.
- The procedure goes on until the desired stage is reached.

Activity
The "activity" serves as a guide in studying sampling and is not to be submitted. This activity will help you in accomplishing the worksheets.
Using any reliable reference in biostatistics (online or offline) and the list of references in the syllabus:
1. Study the use of the table of random numbers (e.g., how to determine the starting point, how to draw samples using the numbers based on the starting point, etc.). Practice picking household numbers using the table of random numbers provided in the module.
[Table of random numbers: not reproduced in this transcript]
2. Find and study the formula on how to determine the "kth" or "nth" element for systematic sampling. Practice getting samples from a sampling frame (e.g., a class list).
3. Do further readings, study each of the different sampling designs/methods, and identify examples/situations for each, including the advantages and disadvantages of each method.
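The probability sampling designs above can be sketched in Python with the standard `random` module. This is a minimal illustration under stated assumptions, not the module's prescribed procedure: the function names, the 40-name class list, the year-level strata and the sections are all hypothetical, and the systematic-sampling interval is assumed to be k = N/n rounded down (a common textbook choice). A seeded `random.Random` generator stands in for the table of random numbers so that a draw can be repeated.

```python
import random

def simple_random_sample(frame, n, rng):
    """Simple random sampling: every element has an equal chance (drawn without replacement)."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Systematic sampling: random start among the first k elements, then every kth element."""
    k = len(frame) // n              # assumed sampling interval k = N / n, rounded down
    start = rng.randrange(k)         # 0-based position of the random start within the first k
    return frame[start::k][:n]       # every kth element, trimmed to n if N / n is not whole

def stratified_random_sample(strata, sizes, rng):
    """Stratified random sampling: a simple random sample from each non-overlapping stratum."""
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: select whole clusters at random and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [element for name in chosen for element in clusters[name]]

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Multi-stage (two-stage) sampling: sample primary units, then sample within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

if __name__ == "__main__":
    rng = random.Random(2024)        # fixed seed so the draw can be reproduced
    # Hypothetical sampling frame: a class list of 40 students.
    class_list = [f"Student {i:02d}" for i in range(1, 41)]
    # Hypothetical strata (year levels) and clusters (sections) built from the same list.
    strata = {"Year 1": class_list[:20], "Year 2": class_list[20:]}
    sections = {f"Section {s}": class_list[i:i + 10] for s, i in zip("ABCD", range(0, 40, 10))}

    print("SRS:", simple_random_sample(class_list, 5, rng))
    print("Systematic:", systematic_sample(class_list, 5, rng))
    print("Stratified:", stratified_random_sample(strata, {"Year 1": 3, "Year 2": 2}, rng))
    print("Cluster:", cluster_sample(sections, 1, rng))
    print("Two-stage:", two_stage_sample(sections, 2, 3, rng))
```

With N = 40 and n = 5, the assumed interval is k = 8, so the systematic sample is a random start among the first 8 students followed by every 8th student. Because `rng.sample` never repeats an element, every design here samples without replacement.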
ASSESSMENT (To be submitted individually)
Define/Describe the following terms: (1 point each)
1. Sampling "with" replacement
2. Sampling "without" replacement
3. Sampling error
4. Sampling bias
5. Snowball method of sampling
6. Power analysis for determining acceptable sample size
7. Population parameters
8. Fishbowl method of sampling
9. Random sampling
10. Non-random sampling
*Further assessment will be done through the worksheets.

LESSON 4 DESCRIPTIVE STATISTICS: TYPES/METHODS OF PRESENTATION OF DATA

LEARNING OBJECTIVES
At the end of the lesson, you should be able to:
1. Define and explain the various aspects of descriptive statistics
2. Enumerate and discuss the four types of data presentation
3. Identify and illustrate the parts of a statistical table and describe their uses
4. Enumerate the different types of graphs
5. Choose the appropriate graph for presenting a given set of data

Descriptive Statistics
Transforming data into useful information is what descriptive statistics is all about. There are four main types of summarizing and/or presenting data, which can be used singly or in combination:
1. Textual
2. Tabular
3. Graphical
4. Numerical
 4.1. Measures of Central Tendency
 4.2. Measures of Dispersion and Location

Non-numerical Presentation of Data:
Textual Presentation – summarizes data in the form of a story or in paragraph form. It is preferred when only a few numbers are to be presented and the data are simple; the facts are incorporated in the main text.
Example: The prevalence of maxillary midline diastema was found to be highest in the 6- to 7-year-old age group. This observation occurred concomitantly with the eruption of the second incisors. The lowest prevalence of midline diastema was found in the 12- to 14-year-old age group, at about the time the canines have completely erupted in the oral cavity.
Tabular Presentation – a more concise and better way of presenting a large mass of quantitative data. This type of presentation aims to reduce the bulk of the data and make computation easier. Properly constructed tables show trends, comparisons and interrelationships of variables that are difficult to show in textual presentation. The form of the table depends on the purpose for which it is designed as well as on the complexity of the data. As in any aspect of research, it is important to keep any presentation simple, direct and clear.

Parts of a Table:
1. Table number
2. Title
3. Column headings
4. Row headings or "stubs"
5. Body
6. Footnotes
7. Source of data

Types of Tables
According to number of variables:
1. One-way table
2. Two-way table
3. Multi-way table
According to function:
1. Master table
2. Dummy table
3. Correlation table
4. Summary table

Prime considerations in constructing tables:
1. Simplicity – clean, professional and uniform look
2. Clarity – the table should agree with the textual discussion; this includes internal clarity of the data presented: clear, concise headings or captions, uncluttered footnotes, a minimum of variables presented, well-spaced columns or rows, etc.
3. Directness – only the necessary data should be included

Graphical Presentation – a pictorial presentation of data that has the advantages of simplicity and appeal to readers. By using figures, colors and other types of computer graphics, which are readily produced, it is easier to highlight significant aspects of research results. When large sets of information are needed, the tabular type is still the presentation of choice.
Graphs are preferred over tables when precision is not required, as graphs present an overview of the data.
Advantages:
- Simplicity
- Appeal to readers
- Easier to highlight significant aspects of research results
- Preferred over tables when precision is not required

Types of Graphs
Bar Graph – compares the distribution of subjects according to the categories of a qualitative or a discrete quantitative variable. Frequencies of occurrence among/between groups are represented by bars or rectangles, and comparison is made on the basis of the lengths of these bars. The bars may be drawn vertically or horizontally according to the nature of the variable.
Types of bar graph:
- Horizontal bar graph
- Vertical bar graph

Pie Chart – presents data in the form of a circle divided into wedge-shaped components that correspond to the different categories of the event being studied. Preferred when there are six or fewer categories, as in budget presentations, allocation of funds, or distribution of subjects according to sex.

Component Bar Diagram – an alternative to the pie chart, as it also shows how the total number of subjects is divided into specified categories. It is useful when the aim is to compare two or more different groups.

Histogram – demonstrates the frequency distribution of a continuous quantitative variable.
- Rectangular bars are used to depict counts, absolute or relative.
- The horizontal scale shows the units of measurement of the variable being considered.
- The vertical scale shows the frequencies.
- Rectangles are drawn over the true limits (class boundaries) of the groupings.

Frequency Polygon – compares two or more distributions.
- The frequencies are plotted at the class midpoints.
- It is a closed figure (i.e., the line touches the X-axis at both ends).
- The points where the ends of the line touch the X-axis are the midpoint of the class before the lowest class interval and the midpoint of the class after the highest class interval.

Line Graph – portrays trends or changes in a variable through time.

Scatterplot – shows the relationship between two quantitative variables and gives a rough estimate of the correlation of the two variables under study.

Pictogram – a popular method of presenting data to the man on the street and to those who cannot understand orthodox charts. Small pictures or symbols are used to present the data.

SUMMARY
Types of graphs and their applicability according to the nature of variables:
Type | Nature of Variable | Function
Bar graph (horizontal or vertical) | Qualitative or discrete quantitative | Comparison of absolute or relative frequencies between categories of a qualitative or a discrete quantitative variable
Pie chart / component bar graph | Qualitative | Composition of a group or total, where the number of categories is not too large (≤ 6)
Histogram / frequency polygon | Quantitative, continuous | Frequency distribution of a continuous variable or measurements, including age groups
Line graph | Quantitative | Trend data (time series; change with time or age or with respect to some other variable)
Scatterplot / scattergraph | Quantitative | Correlation data for two quantitative events or conditions

Activity
Using the list of references in the syllabus and other reliable sources/references online or offline, do further study/readings about the concepts presented. Look for figures/images for each type of graph and describe them based on the concepts given in this lesson.

ASSESSMENT
Answer the following questions and follow the instructions for submission to be given.
1. Describe the parts of a Table and indicate the use/purpose of each part.
(14 points = 2 pts per item if both correct description and use/purpose are presented; 1 pt per item if only the description is given or only the use/purpose is given) 2. Describe the types of table according to number of variables. (3 pts = 1 pt each) 3. Describe the types of table according to function. (4 pts = 1 pt each) *Assessment for other concepts will be done in the worksheets. A Self-regulated Learning Module 5 LEARNING OBJECTIVES ly At the end of the lesson, you should be able to: On 1. Describe frequency distribution as a system to facilitate the description of data 2. Point out the importance of frequency distribution in descriptive statistics 3. Make a frequency distribution based on given data se Frequency Distribution yU Frequency distribution is one system used to facilitate the description of important features of the data. Raw data (data not yet arranged) are distributed into classes or categories. r tist A frequency distribution is a tabular arrangement of data by class with their corresponding frequency. It is one of the simplest ways to present data. en - Appropriate for reporting all levels of data (nominal, ordinal, interval, ratio). - All values or scores are listed, and the number of times each one appears is recorded. BD Important terms: Class intervals – refers to the grouping defined by a lower limit and an upper limit. e.g. 40 – 45: 40 is the lower limit and 45 is the upper limit rU *open class interval – no upper limit e.g. 40 and above Fo Class boundaries – also called true class limits; these are more accurate expressions of the class limits by at least 0.5 (for continuous data) e.g. In the class interval 5-9, the lower class boundary is 4.5 and the upper class boundary is 9.5 A Self-regulated Learning Module 1 Class marks – the middle value or midpoint of a class interval. It is obtained by getting the average of the lower class limit and the upper class limit. e.g. 
The class mark or midpoint of the class interval 40 – 45 is (40 + 45) / 2 = 42.5.
Class size – also called class width; the difference between the upper class boundary and the lower class boundary of a class interval.
e.g. For the class boundaries 4.5 and 9.5, the class size is 9.5 – 4.5 = 5.
*The class size can also be obtained as the difference between two successive lower class limits.
Class frequency – the number of observations that fall within a class interval.
e.g.:
Class Interval | Frequency
23 – 28 | 5
17 – 22 | 9
11 – 16 | 10
5 – 10 | 3

Example of a frequency distribution:
Scores of 60 students in a statistics test:
Highest Score (HS) = 98
Lowest Score (LS) = 10
Range (R) = HS – LS = 98 – 10 = 88
n = 60
Number of classes: K = 1 + 3.3 log n = 1 + (3.3)(1.78) = 1 + 5.874 = 6.874 ≈ 7
*For the final answer, round off K to a whole number.
Note: in the formula for K, "1" and "3.3" are constants and n is the number of items; get the value of log n with your scientific calculator. In this example, n = 60 and log 60 ≈ 1.78. For our BIOEPI1 class, round off log n to the nearest hundredth (two decimal places), following the rules of rounding off. Determine log n first, then multiply by 3.3, then add 1.
Interval (i) = R / K = 88 / 7 = 12.57 ≈ 13
i = 13

*Rules in BIOEPI1 Class, so that we will all have the same answer key for checking:
Round off the interval (i) to a whole number, following the rules in rounding off numbers.
In the frequency distribution table, place the highest class interval in the first (top) row and the lowest class interval in the last (bottom) row. In the given example, 88 – 100 is the highest class interval and 10 – 22 is the lowest class interval.
Start by determining the lowest class interval: identify the lowest score (LS) in the data set and use it as the lower limit of the lowest class interval. In this example, the lowest score is 10.
To determine the lower limit of the next class interval, add the interval (i) to the lowest score. In this example, i = 13, so 10 + 13 = 23, and the lower limit of the next class interval is 23. The upper limit of the lowest class interval is therefore 22, one less than 23.
Keep adding the interval to the lower limits until you reach the highest class interval.
The highest class interval is the one that includes the highest score (HS). In this example, the highest score is 98, so the highest class interval is 88 – 100. In other words, the lowest score must fall in the lowest class interval and the highest score must fall in the highest class interval. Do not add another class interval once you have reached the one that includes the highest score.
To determine the upper limits, start from the upper limit of the lowest class interval and keep adding the interval (i = 13): 22 + 13 = 35, 35 + 13 = 48, and so on until you reach the highest class interval. Make sure the class width is the same for all class intervals.
Add columns to the table for the tally (t) and the frequency (f). Be accurate in tallying the scores. The frequencies (f) should add up to the number of items (in this example, 60 scores). However, a correct total does not guarantee a correct tally: if you are careless, you might tally a score into the wrong class interval and still get the same total, but the frequencies would be wrong. So be very careful in tallying.
If the frequencies are wrong, all later computations based on them, such as the mean, will also be wrong.

The following table shows the frequency distribution of the given data:

Class | t | f
88 - 100 | lllll-l | 6
75 - 87 | lllll-lllll-lllll-lllll | 20
62 - 74 | lllll-lllll-llll | 14
49 - 61 | lllll-lll | 8
36 - 48 | lllll | 5
23 - 35 | llll | 4
10 - 22 | lll | 3
i = 13 | | N = 60

Activity
Study the concepts presented about the frequency distribution and do further readings. Practice making a frequency distribution using the following data (the practice output is not to be submitted, but you must do it to prepare for the worksheet).
Data: DMFT (Decayed, Missing and Filled Teeth) scores of 30 children:
10 11 5 8 10 4
6 9 4 5 8 10
6 7 5 12 11 5
8 10 12 4 7 8
7 12 5 8 10 7
*The answer will be given after you perform the practice activity.

ASSESSMENT
Answer the questions and submit the assessment following the instructions and deadline of submission to be announced by your instructor. Since you will be utilizing references, each question is worth 3 points: 3 pts = question answered completely; 2 pts = lacks 1 or 2 keywords; 1 pt = lacks more than 2 keywords. Total = 15 points
1. What is raw data?
2. What is the difference between "ungrouped" and "grouped" data?
3. What is an array?
4. What is the rank of a number in statistics?
5. What is the important role of the frequency distribution in the presentation of data?
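The construction steps in this lesson (K, the interval i, the class limits, boundaries, class marks, and the tally) can be collected into a short Python sketch for checking hand work. It assumes whole-number scores, so boundaries sit 0.5 beyond each limit, and it follows the BIOEPI1 rounding rules (log n to two decimal places; K and i to whole numbers, with a 5 always rounding up). The function names are hypothetical, chosen for this illustration only.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(x, places=0):
    """Round as done by hand (a 5 always rounds up). Python's built-in
    round() sends halves to the nearest even digit, e.g. round(2.5) == 2."""
    step = Decimal(1).scaleb(-places)              # 1, 0.1, 0.01, ...
    return Decimal(str(x)).quantize(step, rounding=ROUND_HALF_UP)


def number_of_classes(n):
    """K = 1 + 3.3 log n, with log n kept to two decimal places."""
    log_n = round_half_up(math.log10(n), 2)        # log 60 -> 1.78
    k = 1 + Decimal("3.3") * log_n                 # 1 + 5.874 = 6.874
    return int(round_half_up(k))                   # whole number of classes


def class_intervals(lowest, highest, n):
    """Equal-width class intervals, listed from the lowest class up."""
    k = number_of_classes(n)
    i = int(round_half_up(Decimal(highest - lowest) / k))  # i = R / K
    i = max(i, 1)                                  # guard: all scores equal
    classes = []
    lower = lowest                                 # LS is the first lower limit
    while True:
        upper = lower + i - 1                      # one less than next lower limit
        classes.append((lower, upper))
        if upper >= highest:                       # HS is now included: stop
            break
        lower += i
    return i, classes


def describe(classes):
    """Boundaries, class mark and class size for each (lower, upper) pair."""
    rows = []
    for lower, upper in classes:
        lb, ub = lower - 0.5, upper + 0.5          # true class limits
        mark = (lower + upper) / 2                 # midpoint of the interval
        rows.append((lower, upper, lb, ub, mark, ub - lb))
    return rows


def tally(scores, classes):
    """Frequency of each class; the total must equal the number of scores."""
    freq = [sum(lower <= s <= upper for s in scores) for lower, upper in classes]
    if sum(freq) != len(scores):
        raise ValueError("some scores fall outside every class interval")
    return freq


if __name__ == "__main__":
    # Values from the worked example: n = 60, LS = 10, HS = 98
    i, classes = class_intervals(10, 98, 60)
    print("i =", i)
    # Highest class interval on the top row, as the BIOEPI1 rules require
    for lower, upper, lb, ub, mark, size in reversed(describe(classes)):
        print(f"{lower:>3} - {upper:<3} boundaries {lb}-{ub} "
              f"mark {mark} size {size}")
```

Run on the worked example (n = 60, LS = 10, HS = 98), it prints i = 13 and the seven class intervals from 88 - 100 down to 10 - 22, the same classes as in the frequency distribution table. The `decimal` module is used because the built-in `round()` does not follow the hand rounding rules for halves. `tally()` needs the raw scores, which are not listed for the 60-student example; it can be used to check the DMFT practice table after you have built it by hand.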