Fundamentals of Biostatistics Lecture Note PDF

Summary

This document provides a lecture note on fundamentals of biostatistics, covering definitions, classifications, and methods of data collection.


Fundamentals of Biostatistics Lecture Note Stat4101

CHAPTER ONE: INTRODUCTION

Chapter Outline:
1. INTRODUCTION
1.1. Definitions and Classification of Statistics
1.2. Stages in Statistical Investigation
1.3. Definitions of Some Terms
1.4. Applications, Uses and Limitations of Statistics
1.5. Scales of Measurement
1.6. Introduction to Methods of Data Collection

Learning objectives: After completing this chapter, students will be able to:
▪ Define statistics
▪ Distinguish between descriptive statistics and inferential statistics
▪ Identify the levels of measurement
▪ Identify the different types of data and understand why we need to classify variables
▪ Understand the applications and limitations of statistics

1. INTRODUCTION

1.1. Definition and Classification of Statistics
The word statistics can be defined in two ways, depending on whether it is used in the plural or the singular sense.

1. Plural sense (layman's definition)
✓ Statistics are collections of numerical facts or figures.
✓ Statistics are the raw data themselves, such as statistics of births, deaths, students, imports and exports, etc.
Remark: Statistics are aggregates of facts. A single, isolated figure is not a statistic, because it cannot be compared with anything and is unrelated to other figures. For example, "the average mark of students in the statistics course is 70%" is a statistic, whereas "Aster got 90% in the statistics course" is not.

2. Singular sense (formal definition)
✓ Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data in order to make decisions.

If the data are related to biological or medical science, the subject is called biostatistics. Biostatistics is the branch of statistics responsible for the proper interpretation of scientific data generated in biology and related fields.
It is the application of statistical principles in biology, medicine and public health: the science of collecting and analyzing biological data using statistical methods. Both biostatistics and statistics involve data collection and interpretation. Statistics is a broad approach to collecting and analyzing data; the key distinction is that biostatistics uses statistical methods to answer questions in biology.

Classifications of Statistics: Statistics is broadly divided into two categories, based on how the collected data are used.

1. Descriptive Statistics: Deals with describing the collected data without drawing any further conclusions.
▪ It is used to summarize, organize and describe the characteristics of a data set.
▪ It consists of the collection, organization and presentation of data.
▪ It is concerned with graphs, charts and tables.
In general, descriptive statistics are brief summary measures of a given data set, which may represent either the entire population or a sample of it. They are broken down into the frequency distribution, measures of central tendency and measures of variability (spread). Measures of central tendency include the mean, median and mode; measures of variability include the standard deviation, the variance, and the minimum and maximum values; skewness and kurtosis describe the shape of the distribution. A frequency distribution describes how often each value occurs within the data set (counts).
Example: Suppose that the marks of six 4th-year Biology students in Molecular Biology are 75, 80, 84, 86, 88 and 90.
▪ The average (mean) mark of the six students is 503 ÷ 6 ≈ 83.83.
▪ The median mark of the six students is 85, the average of the two middle marks, 84 and 86.
These are examples of descriptive statistics.

2. Inferential Statistics: Deals with making inferences or drawing conclusions about a population based on a representative sample.
It consists of performing hypothesis tests, determining relationships among variables and making predictions.
▪ It is important because statistical data usually arise from samples.
▪ Statistical techniques based on probability theory are required.
In the above example, if we say that the average Molecular Biology mark of all 4th-year Biology students is 83.83, we are using inferential statistics: we draw a conclusion about the population from the sample observations.

1.2. Stages in Statistical Investigation
There are five stages in any statistical investigation: collection, organization, presentation, analysis and interpretation of data.
1. Collection of data: This is the process of measuring, gathering and assembling data. Data can be collected in a variety of ways; one of the most common is the survey. Surveys can in turn be conducted in different ways; three of the most common are:
▪ Telephone survey
▪ Questionnaire
▪ Personal interview
2. Organization of data:
▪ Summarizing the data in a meaningful, organized form, e.g., a table.
▪ Correcting any apparent inconsistencies, ambiguities and recording errors that occurred during data collection.
3. Presentation of data: The process of re-organizing, classifying, compiling and summarizing the data in a condensed manner. Data can be presented in tables, charts, diagrams and graphs. The main purpose of data presentation is to facilitate understanding as well as statistical analysis.
4. Analysis of data: This is the stage where we critically study the data to draw conclusions about population parameters. The purpose of data analysis is to extract useful information for decision-making.
This analysis may simply extract relevant information from summarized data to draw meaningful conclusions about a parameter, or it may involve highly complex and sophisticated mathematical techniques.
5. Interpretation of data: This is the final stage, where we draw valid conclusions from the results of the data analysis. Correct interpretation leads to valid conclusions and thus aids further decision-making.

1.3. Definitions of Some Basic Terms
A. Population: The totality of all individuals, objects or items under consideration. Examples: all students of Bahir Dar University (BDU), all clients of a telephone company, etc. A population may be finite or infinite (an imaginary collection of units).
B. Sample: A part of the population selected for study.
Note: A major use of statistics is to collect sample data and use them to draw conclusions about populations.
C. Sampling: The process or method of selecting a sample from the population using some statistical technique.
D. Sampling frame: A list of all units (items) of the population from which the sample is taken.
E. Survey: An investigation of a population to assess its characteristics. It may be a census or a sample survey.
F. Census survey: A complete enumeration of the population under study.
G. Sample survey: The process of collecting data from a representative part of a population.
H. Parameter: A characteristic or summary value calculated from a population. (Greek letters are usually used to represent parameters, e.g., the population mean µ and the population standard deviation σ.)
I. Statistic: A characteristic or summary value calculated from a sample. (Latin letters are usually used to represent statistics, e.g., the sample mean x̄ and the sample standard deviation S.)
J. Sample size: The number of elements or observations included in the sample.
K. Element: A member of the sample or population. It is the specific subject or object (for example, a person or an item) about which information is collected.
L. Observation (measurement): The value of a variable for an element.
M. Variable: An item of interest that can take numerical or non-numerical values for different elements.

1.4. Applications, Uses and Limitations of Statistics
Applications of statistics: Statistics can be applied in any field of study that seeks quantitative evidence, for instance engineering, economics and the natural sciences.
a) Engineering: Statistics has wide application in engineering:
▪ To compare the breaking strength of two types of materials.
▪ To determine the reliability of a product.
▪ To control the quality of products in a production process.
▪ To compare the improvement in yield due to additives such as fertilizers, herbicides, etc.
b) Economics: Statistics is widely used in economic study and research:
▪ To measure and forecast Gross National Product (GNP).
▪ Statistical analyses of population growth, inflation, poverty, unemployment, rural-urban population shifts and so on influence much of economic policymaking.
▪ Financial statistics are necessary in money and banking, including consumer savings and credit availability.
c) Statistics and research: There is hardly any advanced research that does not use statistics in one form or another.
▪ Statistics is used extensively in medical, pharmaceutical and agricultural research.
▪ In daily life, almost everyone deals with numerical facts, e.g., about prices.
▪ It is applied in processes such as the development of new drugs and assessing the extent of environmental pollution.
▪ It is used in industry, especially in quality control.
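The descriptive example in Section 1.1 (the six Molecular Biology marks) can be reproduced with Python's standard `statistics` module. This is a minimal sketch for illustration only; the helper name `describe` is hypothetical and not part of the lecture note, and the standard deviation uses the sample (n - 1) divisor, which is one common convention.

```python
import statistics

def describe(data):
    """Return a few descriptive statistics for a list of numbers (hypothetical helper)."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),      # sum of the values divided by n
        "median": statistics.median(data),  # middle value; average of the two middle values when n is even
        "sd": statistics.stdev(data),       # sample standard deviation (divisor n - 1)
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
    }

# Marks of six 4th-year Biology students in Molecular Biology (Section 1.1)
marks = [75, 80, 84, 86, 88, 90]
summary = describe(marks)

print(f"Mean:   {summary['mean']:.2f}")  # 503 / 6
print(f"Median: {summary['median']}")    # (84 + 86) / 2
print(f"SD:     {summary['sd']:.2f}")
print(f"Range:  {summary['range']}")
```

Running it prints a mean of 83.83 and a median of 85.0, matching the text. Reporting these numbers for the six students is descriptive; treating 83.83 as the average mark of all 4th-year Biology students would be an inference from the sample mean x̄ to the population mean µ.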
Uses of statistics: The main function of statistics is to enlarge our knowledge of complex phenomena. Some uses of statistics are:
▪ Estimating the relationship between a dependent variable and one or more independent variables.
▪ Formulating and testing hypotheses.
▪ Presenting facts in a definite and precise form.
▪ Data reduction.
▪ Measuring the magnitude of variation in data.
▪ Providing a technique for comparison.
▪ Estimating unknown population characteristics.
▪ Studying the relationship between two or more variables.
▪ Forecasting future events.

Limitations of statistics: As a science, statistics has its own limitations. Some of them are:
❖ It cannot deal with a single value; it deals with a set of data.
❖ Complete accuracy is impossible.
❖ It cannot deal with qualitative data directly; it deals only with data that can be quantified. For example, it does not deal with marital status (married, single) as such, but with the number of married people, the number of single people, etc.
❖ Statistical results are true only on average. Conclusions drawn from the analysis of a sample may differ from those that would be drawn from the entire population; for this reason, statistics is not an exact science.
❖ It can be misused: statistics cannot be used to full advantage without a proper understanding of the subject matter.
❖ Statistical data are only approximately, not mathematically, correct.

Types of Variables and Measurement Scales
Types of Variables: A variable is a characteristic of an object that can take different possible values. Based on the values they assume, variables can be classified into two types: quantitative and qualitative.
A. Quantitative variables: Numerical variables that can be quantified and measured, i.e., that assume numeric values. Examples: height, area, income, temperature, etc.
Quantitative variables can be further classified as:
◆ Discrete variables: Variables whose values are obtained by counting. Their possible values are whole numbers (0, 1, 2, …). E.g., number of students, number of households, number of pages in a book.
◆ Continuous variables: Variables that can take any value between two numbers. Their values are obtained by measuring. E.g., weight, height, temperature, etc.
B. Qualitative variables: Variables that cannot be quantified directly. They are non-measurable, non-quantifiable variables (they do not assume numeric values). Examples: color, sex, location, marital status. Qualitative variables are also called categorical variables. For a quantitative variable, operations such as addition or averaging make sense; for a qualitative variable they do not. A categorical variable is also known as an attribute, whereas a quantitative variable is often referred to simply as a variable.

1.5. Scales of Measurement
A measurement scale refers to the properties of the values assigned to the data, based on the properties of order, distance and a fixed zero. There are four levels (scales) of measurement: nominal, ordinal, interval and ratio. These scales go from the lowest level to the highest. Data are classified according to the highest level they fit, and each additional level adds a property that the previous levels lack.
I. Nominal Scales
“Nominal” comes from the Latin word nomen, meaning “name”. This is a scale for grouping individuals into different categories. The values of a nominal attribute are just different names, i.e., nominal attributes provide only enough information to distinguish one object from another. Such data consist of names, labels and categories, with no numerical or quantitative value.
The nominal level of measurement classifies data into mutually exclusive, all-inclusive categories on which no order or ranking can be imposed.
▪ In this scale, one value is merely different from another.
▪ Arithmetic operations (+, -, *, ÷) are not applicable.
▪ Order comparisons (<, >, ≤, ≥) are not meaningful; only equality (=, ≠) can be checked.
Examples:
o Political party preference (Republican, Democrat or Other)
o Sex (Male or Female)
o Marital status (married, single, widowed, divorced)
o Country code (+251, +1, …)
o Regional differentiation of Ethiopia
o ID number, ethnicity, color, …
II. Ordinal Scales
Ordinal variables are qualitative variables whose values can be ordered and ranked, but the differences between the values are meaningless. Data consisting of an ordering or ranking of measurements are said to be on an ordinal scale of measurement; that is, the values of an ordinal scale provide enough information to order objects.
▪ One value is different from, and greater (better) or less than, another.
▪ Arithmetic operations (+, -, *, ÷) are not meaningful, but comparison (relational) operations (<, >, =, ≠, etc.) are possible.
▪ Differences between ranks are not meaningful; ordering is the only property the ordinal scale adds.
Examples:
o Letter grades (A, B, C, D, F)
o Rating scales (Excellent, Very good, Good, Fair, Poor)
o Military rank (general, colonel, lieutenant, etc.)
o Academic qualification (B.Sc., M.Sc., Ph.D.)
o Health status (very sick, sick, cured)
Ordinal scale data contain and convey more information than nominal scale data, because relative magnitudes are known; however, quantitative comparisons are impossible.
III. Interval Scales
This measurement scale shares the ordering property of the ordinal scale. In addition, the distance (magnitude) between two values is known and meaningful. However, there is no true zero point (the zero point is arbitrary), so ratios are meaningless.
Interval scales are measurement systems that possess the properties of order and distance, but not the property of a fixed zero.
▪ Addition and subtraction are meaningful, but multiplication and division are not.
▪ Relational operations (<, >, =, ≤, ≥, ≠) are also possible.
▪ Interval scale data convey more information than nominal and ordinal scale data.
▪ In this scale, zero is an arbitrary point rather than an indication of "nothing".
Examples:
o IQ
o Temperature in °F
IQ is an example of such a variable: there is a meaningful difference of 1 point between an IQ of 109 and an IQ of 110. Temperature is another example of interval measurement, since there is a meaningful difference of 1°F between each unit, such as between 72°F and 73°F. One property is lacking in the interval scale: there is no true zero. For example, IQ tests do not measure people who have no intelligence, and for temperature, 0°F does not mean no heat at all.
IV. Ratio Scales
This is the highest level of measurement. It shares the labeling, ordering and meaningful-distance properties of the interval scale and, in addition, has a true (meaningful) zero point. The existence of a true zero makes the ratio of two measurements meaningful.
▪ This level of measurement classifies data that can be ranked, whose differences are meaningful, and that have a true zero.
▪ True ratios exist between the different units of measure.
▪ All arithmetic and relational operations are applicable.
Examples:
o Weight
o Height (e.g., the ratio of Aster's height to Marta's height is 1.32, which is not possible with interval scales)
o Number of students
o Age
Note: Nominal and ordinal scales apply to qualitative variables, whereas interval and ratio scales apply to quantitative variables.
The following is a list of different attributes and rules for assigning numbers to objects. Try to classify each measurement system into one of the four types of scales.
(Exercise)
1. Your checking account number as a name for your account.
2. Your checking account balance as a measure of the amount of money you have in that account.
3. Your score on the first statistics test as a measure of your knowledge of statistics.
4. Your score on an individual intelligence test as a measure of your intelligence.
5. A response to the statement "Abortion is a woman's right", where "Strongly Disagree" = 1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4 and "Strongly Agree" = 5, as a measure of attitude toward abortion.
6. Times for swimmers to complete a 50-meter race.
7. Months of the year (Meskerem, Tikimit, …).
8. Socioeconomic status of a family when classified as low, middle or upper class.
9. Blood type of individuals: A, B, AB and O.
10. Pollen counts given as numbers between 1 and 10, where 1 implies there is almost no pollen and 10 that it is rampant, but for which the values do not represent an actual count of pollen grains.
11. Region numbers of Ethiopia (1, 2, 3, etc.).
12. The number of students in a college.
13. The net wages of a group of workers.
14. The height of the men in the same town.

1.6. Introduction to Methods of Data Collection
Data: Data are a collection of numerical facts, or the results of measurements or observations of some objects. They consist of information coming from observations, counts, measurements or responses. Not every aggregate of numbers can be called statistical data. An aggregate of numbers is statistical data when the numbers are:
▪ Comparable
▪ Meaningful, and
▪ Collected for a well-defined objective
Raw data: Collected data that have not been organized numerically. Example: 25, 10, 32, 78, 6, 93, 4
An array: An arrangement of raw numerical data in ascending or descending order of magnitude.
It enables us to find the range of the data set easily, and it also gives some idea of the general characteristics of the distribution.
Any scientific investigation requires data related to the study. The required data can be obtained from either a primary source or a secondary source.

1.6.1 Sources of Data
There are two sources of data: primary and secondary.
A. Primary sources: Sources where the data are measured or collected by the investigator or user directly from the source.
B. Secondary sources: Sources where the data are not measured or collected by the investigator or user directly from the source.

1.6.2 Types of Data
Based on the source, data can be categorized into two types:
I. Primary Data
▪ Data originally collected for the immediate purpose at hand.
▪ Data measured or collected by the investigator or user directly from the source.
▪ Primary data are more expensive than secondary data.
Two activities are involved: planning and measuring.
a) Planning:
▪ Identify the source and elements of the data.
▪ Decide whether to use a sample or a census.
▪ If sampling is preferred, decide on the sample size, selection method, etc.
▪ Decide on the measurement procedure.
▪ Set up the necessary organizational structure.
b) Measuring: There are different options:
▪ Focus group
▪ Telephone interview
▪ Mail questionnaires
▪ Door-to-door survey
▪ New product registration
▪ Personal interview
▪ Experiments
These are some of the sources for collecting primary data.
II. Secondary Data
❖ Data gathered or compiled from published and unpublished sources or files.
❖ When the source is secondary data, check:
▪ The type and objective of the situation.
▪ That the purpose for which the data were collected is compatible with the present problem.
▪ That the nature and classification of the data are appropriate to our problem.
▪ That there are no biases or misreporting in the published data.
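The raw data and array definitions at the start of Section 1.6 can be illustrated with a short Python sketch: sorting the raw values produces the array, and the range then follows directly from its two ends. The variable names are illustrative only.

```python
# Raw data from Section 1.6: collected values that have not been organized
raw_data = [25, 10, 32, 78, 6, 93, 4]

# An array: the same values arranged in order of magnitude
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# In an ascending array the smallest value is first and the largest is last,
# so the range is simply last minus first.
data_range = ascending[-1] - ascending[0]

print("Ascending array: ", ascending)
print("Descending array:", descending)
print("Range:", data_range)
```

For these seven values the ascending array is 4, 6, 10, 25, 32, 78, 93, so the range is 93 - 4 = 89.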
Methods of data collection
In statistics, data collection is the process of gathering information from all relevant sources to find a solution to the research problem. It helps to evaluate the outcome of the problem, and the data collection methods allow a researcher to arrive at an answer to the relevant question. Many organizations use data collection methods to make predictions about future probabilities and trends. Once the data are collected, they need to go through the data organization process.
Collection of primary data: To collect primary data, one or more of the following methods may be used:
▪ Direct personal investigation (interview)
▪ Indirect investigation: used when direct sources do not exist or do not respond for some reason
▪ Information obtained from correspondents and local reports
▪ Questionnaires
▪ Field trials
▪ Laboratory experiments
▪ Diaries
Collection of secondary data: Secondary data may be in published form (e.g., various publications of government, local bodies and chambers of commerce, journals, newspapers, etc.) or in unpublished form (e.g., manuscripts, records of government and other bodies, business houses, etc.).
Note: Data that are primary for one user may be secondary for another.
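As a recap of Section 1.5, the cumulative nature of the four measurement scales can be sketched in Python: each level keeps the operations of the level below and adds one more property (order, then distance, then a true zero). This is an illustrative summary, not a standard library API; the names `Scale` and `meaningful_operations` are hypothetical. Equality checks (=, ≠) are included for every scale because even nominal data can be tested for belonging to the same category. The temperature lines show why ratios fail on an interval scale: changing the unit from °C to °F changes the ratio, while converting heights from centimeters to inches (a ratio scale) does not.

```python
import math
from enum import IntEnum

class Scale(IntEnum):
    """Levels of measurement, ordered from lowest (1) to highest (4)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(scale):
    """Operations that make sense at a given level (hypothetical helper)."""
    ops = {"=", "!="}                  # same category or not
    if scale >= Scale.ORDINAL:
        ops |= {"<", ">", "<=", ">="}  # order
    if scale >= Scale.INTERVAL:
        ops |= {"+", "-"}              # meaningful distances
    if scale >= Scale.RATIO:
        ops |= {"*", "/"}              # true zero, so ratios are meaningful
    return ops

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# Interval scale: "20 degrees is twice 10 degrees" depends on the unit chosen
ratio_c = 20 / 10                                                # 2.0 in °C
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50 in °F

# Ratio scale: a height ratio is the same in centimeters and inches
ratio_cm = 180 / 150
ratio_in = (180 / 2.54) / (150 / 2.54)

for level in Scale:
    print(level.name, sorted(meaningful_operations(level)))
print("Temperature ratio, °C vs °F:", ratio_c, round(ratio_f, 2))
print("Height ratio unchanged by unit:", math.isclose(ratio_cm, ratio_in))
```

The heights (180 cm and 150 cm) and temperatures (10 °C and 20 °C) are made-up illustrative values, not data from the lecture note.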
