Biostatistics PDF

Document Details

Uploaded by MonumentalMeerkat

Juliano C. Brosas Elementary School

Tags

biostatistics statistics data analysis mathematics

Summary

These notes provide an introduction to biostatistics, a field that involves the collection, analysis, and interpretation of quantitative data.  The document discusses important terms and concepts like descriptive statistics, inferential statistics, and various types of variables. It also touches on fundamental data collection methods like surveys and the different levels of measurement.

Full Transcript

BIOSTATS

LESSON 1

STATISTICS
- A collection of quantitative data
- A science which deals with the collection, presentation, analysis, and interpretation of quantitative data
- A tool that helps us develop general and meaningful conclusions that go beyond the original data

Nature of Statistics
- Two areas of interest:
  o Descriptive Statistics
    ▪ Deals with the methods of organizing, summarizing, and presenting a mass of data so as to yield meaningful information
  o Inferential Statistics
    ▪ Deals with making generalizations about a body of data where only a part of it is examined
    ▪ Comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data

SOME BASIC STATISTICAL TERMS

Population
- The set of all individuals or entities under consideration or study
- May be a finite or infinite collection of objects, events, or individuals, with specified class or characteristics under consideration

Variable
- A characteristic of interest measurable on each and every individual in the universe
- TYPES OF VARIABLE:
  o Qualitative Variable
    ▪ Consists of categories or attributes, which have non-numerical characteristics
  o Quantitative Variable
    ▪ Consists of numbers representing counts or measurements
    ▪ CLASSIFICATION:
      ▪ Discrete Quantitative Variable
        o Results from either a finite number of possible values or a countable number of possible values
      ▪ Continuous Quantitative Variable
        o Results from infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions

Sample
- A part of the population or a sub-collection of elements drawn from a population

Parameter
- A numerical measurement describing some characteristic of a population

Statistic
- A numerical measurement describing some characteristic of a sample

Survey
- Often conducted to gather opinions or feedback about a variety of topics
  o Census Survey
    ▪ Most often simply referred to as census
    ▪ Conducted by gathering information from the entire population
  o Sampling Survey
    ▪ Most often simply referred to as survey
    ▪ Conducted by gathering information only from a part of the population

LEVEL OF MEASUREMENT
- Determines the algebraic operations that can be performed and the statistical tools that can be applied to the data set
- Level 1: NOMINAL
  o Characterized by data that consists of names, labels, or categories only
  o Data cannot be arranged in an ordering scheme
- Level 2: ORDINAL
  o Data may be arranged in some order, but differences between data values either cannot be determined or are meaningless
- Level 3: INTERVAL
  o Like the ordinal level, with the additional property that meaningful amounts of differences between data can be determined
  o There is no inherent or natural zero starting point
- Level 4: RATIO
  o The interval level modified to include the inherent zero starting point
  o Differences and ratios are meaningful

LESSON 2

CHARACTERISTICS OF A GOOD QUESTION
1. Unbiased
  o Questions must not be worded in a manner that will influence the respondent to answer in a certain way, that is, to favor a certain response or to be against it
  o Stated in neutral language, with no element of pressure
2. Clear and Simply Stated
  o A question that is simple and clear will be easier to understand and more likely to be answered truthfully
3. Precise
  o Question must not be vague
  o Question should indicate clearly the manner in which the answers must be given
4. Lend themselves to Easy Analyses
  o Open question
    ▪ Allows a free response
  o Closed question
    ▪ Allows only a fixed response

TYPES OF DATA

Primary Data
- Information collected from an original source of data, which is first-hand in nature

Secondary Data
- Information collected from published or unpublished sources like books, newspapers, and theses

METHODS OF DATA COLLECTION
1. Direct or Interview Method
  o Will use at least two persons (an interviewer and an interviewee) exchanging information
  o Will give precise and consistent information because clarifications can be made
  o Time consuming, expensive, and has limited field coverage
2. Indirect or Questionnaire Method
  o Written answers are given to prepared questions
  o Requires less time and is inexpensive since the questionnaires can simply be mailed or hand-carried
  o Will give the respondent a sense of freedom in honestly answering the questions because of anonymity
3. Registration Method
  o Enforced by certain laws
4. Observation Method
  o Observes the behavior of individuals or organizations in the study
  o Also used when the respondents cannot read or write
5. Experiment Method
  o Used when the objective of the study is to determine the cause and effect of certain phenomena or events

CONCEPT OF SAMPLING
- The process of selecting units, like people, organizations, or objects, from a population of interest in order to study and fairly generalize the results back to the population from which the sample was chosen

ADVANTAGES OF SAMPLING
1. Reduced Cost
  o If data is gathered from a small fraction of the population, expenditures are smaller than if a complete count is performed
  o With large populations, results accurate enough to be useful can be obtained from samples that represent only a small fraction of the population
2. Greater Speed
  o Data can be collected and summarized faster with a sample than with a complete count
  o A very important consideration when the information is urgently needed
3. Greater Scope
  o Surveys that rely more on sampling have more scope and flexibility regarding types of information that can be obtained
4. Greater Accuracy
  o Sampling may produce more accurate results because personnel of higher quality can be employed and given intensive training and because more careful supervision of the fieldwork and processing of results become feasible when the volume of work is reduced

SOME DEFINITIONS

Target Population
- The entire group a researcher is interested in
- The group about which the researcher wishes to gather information and draw conclusions

Sampled Population
- The collection of elements from which the sample is actually taken
- Should coincide with the target population
- Judgment about the extent to which these conclusions will also apply to the target population must depend on other sources of information

Frame
- Population must be divided into parts that are called the sampling units or simply units
- Units must cover the whole of the population and they must not overlap, in the sense that every element in the population belongs to one and only one unit
- Population frame
  o A listing of all the individual units in the population
- Sampling frame
  o The list of sampling units

TYPES OF SAMPLING

Probability Sampling (Representative Sampling)
- Provides the most valid or credible results because the samples reflect the characteristics of the population from which they are selected
- Two Types:
  o Random Sampling
    ▪ Each individual in the population of interest has an equal likelihood of selection
    ▪ There is no bias involved in the selection of the sample
    ▪ Any variation between the sample characteristics and the population characteristics is only a matter of chance
  o Stratified Random Sampling
    ▪ Mini-reproduction of the population
    ▪ Population is divided into characteristics of importance for the research, then the population is randomly sampled within each category or stratum
    ▪ Stratified samples require a fairly detailed advance knowledge of the population characteristics, and therefore are more difficult to construct

Quota Sampling
- Researcher deliberately sets the proportions of levels or strata within the sample
- Generally done to ensure the inclusion of a particular segment of the population
- Proportions may or may not differ dramatically from the actual proportion in the population
- Researcher sets a quota, independent of population characteristics
  o Stratum
    ▪ Refers to a sub-population or level within a population
  o Quota
    ▪ A proportional part or share
    ▪ Share assigned to each in a division or to each member of a body

Non-Probability Sampling (Non-Representative Sampling)
- Researcher may not be able to obtain a random or stratified sample, or it may be too expensive

Purposive Sampling
- A non-representative subset of some larger population, constructed to serve a very specific need or purpose
  o Snowball Sample
    ▪ Achieved by asking a participant to suggest someone else who might be willing or appropriate for the study
    ▪ Particularly useful in hard-to-track populations

Convenience Sampling
- A matter of taking what you can get
- An accidental sample
- Volunteers would constitute a convenience sample

METHODS OF DATA PRESENTATION

Textual Method
- A narrative description of the data gathered

Tabular Method
- A systematic arrangement of information into columns and rows

Graphical Method
- An illustrative description of the data

FREQUENCY DISTRIBUTION TABLE (FDT)
- A statistical table showing the frequency or number of observations contained in each of the defined classes or categories

Parts of a Statistical Table
- Table Heading
  o Includes the table number and the title of the table
- Body
  o Main part of the table that contains the information or figures
- Stubs or Classes
  o Classification or categories describing the data, usually found at the leftmost side of the table
- Caption
  o Designations or identifications of the information contained in a column, usually found at the top of the column

TYPES OF FDT

Qualitative or Categorical FDT
- The data are grouped according to some qualitative characteristics
- Data are grouped in non-numerical categories

Table 2. Frequency Distribution of the Gender of Respondents of a Survey

Quantitative FDT
- The data are grouped according to some numerical or quantitative characteristics

Table 3. Frequency Distribution for the Weights of 50 Pieces of Luggage

STEPS IN THE CONSTRUCTION OF A FDT
1. Determine the RANGE (R)
   $R = \text{highest value} - \text{lowest value}$
2. Determine the NUMBER OF CLASSES (k)
   $k = \sqrt{N}$
   Where: N = total number of observations in the data set
3. Determine the CLASS SIZE (c) by calculating first the preliminary class size c'
   $c' = \dfrac{R}{k}$
   Conditions for the actual c:
   a. Should have the same number of decimal places as in the raw data
   b. Should be odd in the last digit
4. Enumerate the classes or categories
5. Tally the observations
   Note: sometimes the number of classes (k) is not followed. Extra classes will be added to accommodate the highest observed value in the data set, and a class will be deleted if it turns out to be empty
6. Compute the values in other columns of the FDT as deemed necessary
   a. True Class Boundary (TCB), where LL and UL are the lower and upper limits of the class
      i. Lower True Class Boundary (LTCB)
         $LTCB = LL - \tfrac{1}{2}(\text{unit of measure})$
      ii. Upper True Class Boundary (UTCB)
         $UTCB = UL + \tfrac{1}{2}(\text{unit of measure})$
   b. Class Mark (CM)
      ▪ Midpoint of the class interval where the observations tend to cluster about
         $CM = \tfrac{1}{2}(LL + UL) = \tfrac{1}{2}(LTCB + UTCB)$
   c. Relative Frequency (RF)
      ▪ The proportion of observations falling in a class; may be expressed as a percentage
         $RF = \dfrac{\text{frequency}}{N}$
         $\%RF = \dfrac{\text{frequency}}{N} \times 100\%$
   d. Cumulative Frequency (CF): accumulated frequency of the classes
      i. Less Than CF (<CF): total number of observations whose values do not exceed the upper limit of the class
      ii. Greater Than CF (>CF): total number of observations whose values are not less than the lower limit of the class
   e. Relative Cumulative Frequency (RCF)
      i. Less Than RCF (<RCF)
      ii. Greater Than RCF (>RCF)

GRAPHICAL PRESENTATION OF DATA
- A GRAPH or a CHART is a device for showing numerical values or relationships in pictorial form

Advantages of a Graph or a Chart
- Main features and implications of a body of data can be seen at once
- Can attract attention and hold the readers' interest
- Simplifies concepts that would otherwise have been expressed in so many words
- Can readily clarify data and frequently bring out hidden facts and relationships

QUALITIES OF A GOOD GRAPH

Accurate
- Care should be taken so as not to create any optical illusion

Clear
- Can be easily read and understood
- Should focus on the message it is trying to communicate
- Must be able to aid the reader in the interpretation of facts

Simple
- Basic design of a graph should be simple and straightforward, not loaded with irrelevant, superfluous, or trivial symbols and ornamentation
- There should be no distracting elements in a chart that inhibit effective visual communication

Good Appearance
- Designed and constructed to attract or catch attention by holding a neat, dignified, and professional appearance
- Must be artistic

COMMON TYPES OF GRAPH

Scatter Graph
- Used to present measurements or values that are thought to be related

Line Graph
- Especially useful for showing trends over a period of time

Pie Graph
- Useful in showing how a total quantity is distributed among a group of categories

Column & Bar Graph
- Applicable only to grouped data
- Should be used for discrete, grouped data of ordinal or nominal scale

GRAPHICAL PRESENTATION OF THE FDT

Frequency Histogram
- Bar graph that displays the classes on the horizontal axis and the frequencies of the classes on the vertical axis

Relative Frequency Histogram
- Graph that displays the classes on the horizontal axis and the relative frequencies on the vertical axis

Frequency Polygon
- Line chart that is constructed by plotting the frequencies at the class mark and connecting the plotted points by means of straight lines
- Closed by considering an additional class at each end; the ends of the lines are brought down to the horizontal axis at the midpoints of the additional classes

Ogives
- Graphs of the cumulative frequency distribution
  o <CF plotted against UTCB
  o >CF plotted against LTCB

LESSON 3

MEASURE OF CENTRAL TENDENCY
- Any single value that is used to identify the "center" of the data or the typical value
- Often referred to as the average

THREE KINDS OF AVERAGES

Mean (Arithmetic Mean)
- Most common average
- Sum of all the values of the observations divided by the number of observations

Population Mean
   $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$

Sample Mean
   $\bar{x} = \dfrac{\sum_{i=1}^{n} X_i}{n}$

Median
- Positional middle of an array
- In an array, one-half of the values precede the median and one-half follow it
- First step in calculating the median, denoted by Md, is to arrange the data in an array
  o Let $X_i$ be the ith observation in the array
  o If N is odd, the median position equals $\frac{N+1}{2}$, and the value of the $\left(\frac{N+1}{2}\right)$th observation in the array is taken as the median
  o If N is even, the mean of the two middle values in the array is the median
- FORMULA:
   $Md = X_{(N+1)/2}$ if N is odd
   $Md = \dfrac{X_{N/2} + X_{N/2+1}}{2}$ if N is even

Mode
- Observed value that occurs most frequently
- Locates the point where the observation values occur with the greatest density
- Does not always exist, and if it does, it may not be unique
- Not affected by extreme values
- Can be used for qualitative as well as quantitative data
  o Unimodal: there is only one mode
  o Bimodal: there are two modes
  o Trimodal: there are three modes
  o Etc.

MEASURES OF LOCATION OR FRACTILES
- Values below which a specified fraction or percentage of the observations in a given set must fall

THREE TYPES OF LOCATION

PERCENTILE
- Values that divide a set of observations in an array into 100 equal parts
- FORMULA: $i(n + 1)$ ...

Median
- Possible only if it can be assumed that the values of the observations falling in the median class are evenly spaced throughout the class
- Median class is the class containing the median
  o Locate the class with
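
The FDT construction steps in Lesson 2 (range, number of classes, class size, class limits, tally, then the derived columns) can be sketched in Python. This is a minimal illustration, not the notes' own procedure in code: the function name `build_fdt` and the sample data are hypothetical, the data are assumed to be whole numbers (unit of measure = 1), and the rounding choices (rounding k and c' to the nearest integer, then bumping an even class size up by one to make it odd) are assumptions, since the notes do not say how to round. The less-than CF is taken as the running total up to each class and the greater-than CF as the running total from each class onward.

```python
import math

UNIT = 1  # unit of measure; this sketch assumes whole-number data


def build_fdt(data):
    """Build a quantitative FDT as a list of row dictionaries."""
    n = len(data)
    lowest, highest = min(data), max(data)

    value_range = highest - lowest            # Step 1: R = highest - lowest
    k = max(1, round(math.sqrt(n)))           # Step 2: k = sqrt(N), rounded (assumption)
    c_prelim = value_range / k                # Step 3: preliminary class size c'
    c = max(UNIT, round(c_prelim))            # same decimal places as the raw data
    if c % 2 == 0:                            # last digit must be odd;
        c += 1                                # rounding up is an assumption

    rows = []
    ll = lowest
    # Steps 4-5: enumerate classes until the highest value is covered,
    # so k is only a guide (extra classes appear if needed).
    while ll <= highest:
        ul = ll + c - UNIT
        freq = sum(1 for x in data if ll <= x <= ul)
        rows.append({
            "LL": ll,
            "UL": ul,
            "LTCB": ll - UNIT / 2,            # Step 6a
            "UTCB": ul + UNIT / 2,
            "CM": (ll + ul) / 2,              # Step 6b
            "f": freq,
            "RF": freq / n,                   # Step 6c
        })
        ll += c

    # Step 6d: cumulative frequencies
    running = 0
    for row in rows:
        running += row["f"]
        row["<CF"] = running
    running = 0
    for row in reversed(rows):
        running += row["f"]
        row[">CF"] = running
    return rows


if __name__ == "__main__":
    sample = [12, 15, 17, 20, 22, 23, 25, 28, 30]  # made-up data
    for row in build_fdt(sample):
        print(row)
```

Generating classes until the highest value is covered is one way to honor the note in step 5: the loop adds an extra class when k classes are not enough and never produces a trailing empty class.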

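The three averages from Lesson 3 can be illustrated for ungrouped data with a short Python sketch. The function names are hypothetical, and one convention is assumed rather than taken from the notes: when every distinct value occurs equally often (and there is more than one distinct value), the data set is treated as having no mode.

```python
from collections import Counter


def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Positional middle of the array (the data sorted in order)."""
    array = sorted(values)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        # N odd: the ((N + 1) / 2)th observation (1-based) is index mid (0-based)
        return array[mid]
    # N even: mean of the two middle observations
    return (array[mid - 1] + array[mid]) / 2


def modes(values):
    """Return all most-frequent values; an empty list means no mode."""
    counts = Counter(values)
    # Assumed convention: no mode if all distinct values are equally frequent
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


if __name__ == "__main__":
    scores = [4, 1, 3, 2, 3]  # made-up data
    print(mean(scores), median(scores), modes(scores))
```

Returning a list from `modes` covers the unimodal, bimodal, and trimodal cases with one function, and it also works for qualitative data such as category labels.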