Biostatistics Lecture Notes PDF
CEU San Pablo University
Summary
These lecture notes provide an overview of biostatistics, including topics such as data types, inferential statistics, and random samples. Biostatistics is a field of study that combines statistical methods with biological concepts to analyze and interpret data in various healthcare settings.
Full Transcript
07/11/2024
OVERVIEW OF BIOSTATISTICS

‘There are three kinds of lies: lies, damned lies and statistics’ (Benjamin Disraeli)

MAIN USES OF DATA IN DENTAL HEALTH
For designing a health care program or facility
For evaluating the effectiveness of an oral hygiene education program
For determining the treatment needs of a specific population
For proper interpretation of the scientific literature

TWO MAJOR DIVISIONS OF STATISTICS
Descriptive statistics
Inferential statistics

Descriptive statistical techniques
Enable researchers to describe and summarize a set of data numerically.

Inferential statistical techniques
Provide a basis for testing hypotheses and applying statistical results to the group of individuals or objects that form the population of interest.

POPULATION VS SAMPLE
A population is any entire group of items (objects, materials, people, etc.) that possess at least one basic defining characteristic in common.
In cases in which it is impossible to collect data on the entire population, complete and reliable information can be collected from a representative portion of the population, termed a sample.
Note: a population is a group of items that has at least one characteristic in common. A sample is a representative PORTION of a population, used when it is impossible to collect data from the entire population.

Statistics is a science that describes data for the purpose of making inferences about the population from which the data are obtained.

When we collect a specific piece of information (data) from each member of a population (e.g. gender, age), we obtain a characteristic of the population termed a parameter.
When we collect a specific piece of information from each member of a sample, we obtain a characteristic of the sample termed a statistic.
Because most studies are conducted by using samples, statistics are most commonly used. Using statistics (characteristics of a sample), we try to infer what the parameters (characteristics of a population) will be.
Note: a specific piece of information that is a characteristic of a population is a PARAMETER; a specific piece of information that is a characteristic of a sample is a statistic. It is more common to study samples, so we mostly see statistics. Statistics are used to try to infer the parameters of the population from which the sample was drawn.

RANDOM SAMPLES
One in which every element in the population has an equal and independent chance of being selected.
Random sampling is the procedure of choice whenever possible.
It prevents the possibility of selection bias on the part of the researcher.
Probability sampling methods / sufficiently large sample

Random samples?

RANDOM SAMPLES – Example 1
Assume a population of 5000 seniors in the predental program at 50 universities. Each senior class has 100 predental students divided into five equal sections of 20 students each. The objective is to determine the grade point average (GPA) of each predental student by selecting a representative sample of 1000 students (i.e., a sampling ratio of one fifth, or 20%).

RANDOM SAMPLES – Example 2
A similar procedure may be applied by using a table of random numbers.

STRATIFIED SAMPLING
Sometimes, a simple random sample may not ensure representation of the entire population. It may be necessary to select individuals according to certain characteristics to reduce the chance of sampling fluctuation.
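The sampling ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GPA values are simulated, the choice of the university as the stratifying characteristic is hypothetical, and the notes do not prescribe this particular procedure. It draws a simple random sample and a stratified sample of 1000 from a population of 5000, then compares each sample mean (a statistic) with the population mean (a parameter).

```python
import random

# Hypothetical population: 5000 predental seniors, 100 at each of 50 universities.
# GPA values are simulated purely for illustration; they are not real data.
rng = random.Random(42)
population = [
    {"university": u, "gpa": round(rng.uniform(2.0, 4.0), 2)}
    for u in range(50)
    for _ in range(100)
]

# Parameter: a characteristic of the whole population.
population_mean = sum(s["gpa"] for s in population) / len(population)

# Simple random sample: 1000 of the 5000 students, chosen without replacement.
simple_sample = rng.sample(population, 1000)

# Stratified sample (assumed strata: universities): 20 students drawn at random
# within each university, so every university contributes one fifth of its class.
stratified_sample = []
for u in range(50):
    stratum = [s for s in population if s["university"] == u]
    stratified_sample.extend(rng.sample(stratum, 20))

# Statistic: a characteristic of a sample, used to estimate the parameter.
simple_mean = sum(s["gpa"] for s in simple_sample) / len(simple_sample)
stratified_mean = sum(s["gpa"] for s in stratified_sample) / len(stratified_sample)

print(f"population mean (parameter): {population_mean:.3f}")
print(f"simple random sample mean:   {simple_mean:.3f}")
print(f"stratified sample mean:      {stratified_mean:.3f}")
```

Both samples have the same size; the difference is that the stratified sample guarantees that every university is represented in proportion to its size, whereas the simple random sample leaves that to chance.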
Note: in stratified sampling, specific items are selected because a randomized sample alone would not be enough to avoid fluctuations in the results (e.g. if a random sample is mostly composed of males, more females need to be added to it).

SYSTEMATIC SAMPLE
It is not a true random sample because everyone may not have an independent chance of being selected (e.g. even-numbered).
Note: not all items of the population have the same chance of being selected.

JUDGEMENT SAMPLE
Someone with knowledge of the population may select a sample arbitrarily to represent the population. Non-random sample. Some degree of bias. (e.g. summa cum laude students)
Note: the items are SELECTED by someone; non-randomized.

CONVENIENCE SAMPLE
A group is chosen because it happens to be convenient and may represent the population. Results may be valid but, when generalized to include the larger population, their reliability is questionable.
Note: the items are selected based on convenience. The sample may be representative, but it is not an accurate representation of the population.

What are data?

Data are any information that can be collected. Not all data are represented by numbers.
Before determining the appropriate methods for summarizing and displaying data, it is necessary to understand the nature of the variable of interest (its scale of measurement).
The type of data also plays an important role in deciding which statistical procedures to apply in a test of a hypothesis.

TWO MAJOR SCALES OF MEASUREMENT
1. Categorical data (enumeration)
2. Continuous data (measurements)

CATEGORICAL DATA
Enumeration data are data that are represented by mutually exclusive categories. These data are qualitative (descriptive). They are classified into two types:
a) Nominal scale
b) Ordinal scale
Note: categorical data are data in which each item belongs exclusively to one category.
Note: categorical data are not numerical but qualitative (color, name, gender). They can be nominal or ORDINAL.

CATEGORICAL DATA – Nominal scale
Characterized by named categories having no particular order. Within each of these scales, an individual subject may belong to only one level, and one level does not mean something greater than any other level.
Note: an item can fall into only one nominal level, and the levels have no particular order (e.g. female or male).

Examples of categorical data – Nominal scale
Patient gender (male/female)
Reason for dental visit (check-up, routine treatment, emergency treatment)
Use of fluoridated water (yes/no)

CATEGORICAL DATA – Ordinal scale
Variables whose categories possess a meaningful order.
Note: the categories have a particular order.

Examples of categorical data – Ordinal scale
Severity of periodontal disease (0=none, 1=mild, 2=moderate, 3=severe)
Length of time spent in a dental office waiting room (1= less than 15 minutes, 2= 15 to less than 30 minutes, 3= 30 minutes or more)

CONTINUOUS DATA
Numerical values are assigned according to a systematic rule and exist on a continuum (for any two points on the scale, an intermediate value exists, at least theoretically).
Examples: blood pressure, body weight, head circumference, and number of minutes to relief of pain

DISPLAYING DATA
Frequency distribution tables
Graphs
Tables

DISPLAYING DATA – FREQUENCY DISTRIBUTION TABLE
This type of data display shows each value that occurs in the data set and how often each value occurs.
Provides a sense of the shape of a variable’s distribution.
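As a minimal sketch of what such a table contains, the Python snippet below counts how often each value occurs in a small list of scores. The scores are hypothetical and invented for illustration only; a running (cumulative) count and percentage are included as extra columns.

```python
from collections import Counter

# Hypothetical examination scores, invented for illustration only.
scores = [78, 82, 82, 85, 85, 85, 88, 90, 90, 95]

counts = Counter(scores)   # maps each distinct value to its frequency
n = len(scores)

# Build rows of (value, frequency, cumulative frequency, cumulative percent).
table = []
cumulative = 0
for value in sorted(counts):
    cumulative += counts[value]
    table.append((value, counts[value], cumulative, 100 * cumulative / n))

print(f"{'Score':>5} {'Freq':>5} {'Cum.':>5} {'Cum. %':>7}")
for value, freq, cum, cum_pct in table:
    print(f"{value:>5} {freq:>5} {cum:>5} {cum_pct:>7.1f}")
```

Listing every distinct value in sorted order makes an impossible entry (for example, a score above the maximum possible mark) easy to spot.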
These displays provide the researcher with an opportunity to screen the data values for incorrect or impossible values, a first step in the process known as “cleaning the data”.
Routinely, data analysts generate a frequency distribution table for every variable that is recorded in a research project.

DISPLAYING DATA – FREQUENCY DISTRIBUTION TABLE
A group of 33 dental students has taken part one of the national board examinations. Their examination scores have been recorded. The dean of the dental school wishes to summarize these scores at the next school faculty meeting.
UNGROUPED FREQUENCY DISTRIBUTION TABLE
CUMULATIVE FREQUENCY DISTRIBUTION TABLE
GROUPED FREQUENCY DISTRIBUTION TABLE

DISPLAYING DATA – GRAPHS
Allow rapid assimilation of findings by the reader.
General rule for graphs:
Y axis: usually represents the frequency of scores occurring along the scale of measurement
X axis: represents the scale that measures the variable of interest

DISPLAYING DATA – GRAPHS: BAR CHART
Two dimensions
Categorical scale (nominal or ordinal)
Each category – separate bar
Height of the bar (number or percent)
The bars do not touch each other
Order

DISPLAYING DATA – GRAPHS: HISTOGRAM
Formed directly from a frequency distribution table.
Used to display a continuous measurement variable.
X axis: continuous number line that represents the measurement scale of the variable of interest (grouped into equal intervals)
Y axis: number of observations in each interval

DISPLAYING DATA – SUGGESTIONS FOR GRAPHS
Include, below the figure, a title providing all relevant information
Be referred to as figures in the text
Identify figure axes by the variables under analysis
Quote the source which provided the data, if required
Demonstrate the scale being used
Be self-explanatory

DISPLAYING DATA – SUGGESTIONS FOR TABLES
Be self-explanatory
Present values with the same number of decimal places
Include a title informing what is being described and where, as well as the number of observations (N) and when data were collected
Have a structure formed by three horizontal lines, defining the table heading and the end of the table at its lower border
Not have vertical lines at its lateral borders
Provide additional information in the table footer, when needed
Be inserted into a document only after being mentioned in the text
Be numbered by Arabic numerals

NUMERICAL SUMMARY OF DATA
A more formal numerical summary of the variable is usually required for the full presentation of a data set. To adequately describe a variable’s values, three summary measures are needed:
1. The sample size
2. A measure of central tendency or location
3. A measure of dispersion or spread

THE SAMPLE SIZE
Total number of observations in the group
Symbolized by N or n

MEASURES OF CENTRAL TENDENCY OR LOCATION
Describe the middle (or typical) value in a data set (mode, median, mean).
Provide useful information about the typical performance for a group of data.
MEASURES OF CENTRAL TENDENCY OR LOCATION – mode
The value that occurs with greatest frequency.
It is possible for a distribution to have more than one mode.
Ease of computation
Convenience as a quick indicator of the central value in a distribution.
Its statistical uses are extremely limited.

MEASURES OF CENTRAL TENDENCY OR LOCATION – median
Value that divides the distribution of data points into two equal parts (50% lie above and 50% lie below).

MEASURES OF CENTRAL TENDENCY OR LOCATION – mean
The most common measure of central tendency used to describe a set of data.
It is often used when making statistical decisions.
Unlike the median, the mean is sensitive to any change in any score in the distribution.

MEASURES OF DISPERSION OR SPREAD
Quantify the degree to which values in a group vary from one another (range, standard deviation).
How the members of the data set arrange themselves about the central or typical value.
How spread out are the data points?
How stable are the values in the group?
Report variability

MEASURES OF DISPERSION OR SPREAD – range
Difference between the largest value and the smallest value in the group (Range = Maximum - Minimum)
Alternatively, simply the statement of the minimum and maximum values (Range = (minimum, maximum))
It quickly lends perspective regarding the variable’s distribution.
Usually reported along with the sample median.
It can be deemed unstable because it is affected by one extremely high score or one extremely low value.
Only two values are considered and these happen to be the extreme scores of the distribution.
The measure of dispersion known as standard deviation addresses this disadvantage of the range.
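The summary measures described above can be computed with Python's standard `statistics` module. The sketch below uses hypothetical scores invented for illustration and adds a single extreme value to show how each measure reacts. The notes do not give a formula for the standard deviation, so the sample standard deviation (denominator n - 1) is assumed here.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [70, 75, 75, 80, 85]
with_outlier = scores + [150]  # the same data plus one extremely high value


def summarize(values):
    """Return sample size, central tendency and spread for a list of numbers."""
    return {
        "n": len(values),
        "mode": statistics.multimode(values),  # list: there may be several modes
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # assumed: sample SD, denominator n - 1
    }


for label, values in [("original", scores), ("with outlier", with_outlier)]:
    print(label, summarize(values))
```

With the extreme value added, the range jumps from 15 to 80 and the mean rises from 77 to about 89.2, while the median only moves from 75 to 77.5 and the mode stays at 75. This is the sensitivity to extreme scores that the notes attribute to the mean and the range.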
MEASURES OF DISPERSION OR SPREAD – Standard deviation (SD)
Measure of the variability among the individual values within a group.

THE NORMAL DISTRIBUTION / GAUSSIAN DISTRIBUTION
Bell-shaped curve
One of the most frequently occurring distributions in biomedical and dental research.
Population frequency distribution.

INFERENTIAL STATISTICS
Provide a basis for testing hypotheses and applying statistical results to the group of individuals or objects that form the population of interest.
The process of generalizing sample results to a population is termed statistical inference and is the product of formal statistical hypothesis testing.

THE NULL HYPOTHESIS (H0)
Statistical tests of a hypothesis begin with the statement of the hypothesis itself, but stated in the form of a null hypothesis:
No difference
No effect
No association
Serves as a reference point for the statistical test.

STATISTICAL SIGNIFICANCE
If p ≤ 0.05 the null hypothesis is rejected. The results are “statistically significant”.
If p > 0.05 the null hypothesis is not rejected (often loosely described as “accepted”) and the results are called “not statistically significant”.

BIBLIOGRAPHY
Jong’s Community Dental Health.
Dental Public Health & Research (Christine Nielsen Nathe). Fourth edition, 2017.
Pereira Duquia R, Luiz Bastos J, Rangel Bonamigo R, González-Chica DA, Martínez-Mesa J. Presenting data in tables and charts. An Bras Dermatol. 2014;89(2):280-5.