Research Methodology, Biostatistics & Evidence Based Medicine PDF

Summary

This document provides an overview of research methodology, biostatistics, and evidence-based medicine. It covers topics such as research ethics, sampling techniques, statistical analysis, and various study designs. The content is relevant to students and professionals in the field of medicine and related disciplines.

Full Transcript

Research Methodology, Biostatistics and Evidence Based Medicine 2024
FACULTY OF MEDICINE
Level 2 - Semester 4
Staff members of Public Health & Community Medicine
Department of Community Medicine
2024

Table of Contents
Introduction to research & Research ethics
Sampling
Descriptive studies
Survey study
Analytical Studies: Case control study
Analytical Studies: Cohort study
Experimental (Interventional) studies
Role of screening tests in disease diagnosis
Medical Statistics I
Medical Statistics II
Common Frequency Measures to Summarize Information
Inferential Statistics
Scientific writing
Critical appraisal
Measurements of morbidity
Measurements of mortality
References

Introduction to Research Methodology & Research Ethics

Definition of research:
• The word “research” originated from the Old French word “recerchier”, meaning to search and search again.
• Research is a scientific approach to answering a research question, solving a problem or generating new knowledge through the systematic and orderly collection, organization, and analysis of information, with the ultimate goal of making the research useful in decision-making.

Steps of research:
Systematic research in any field involves the following basic steps:
1. Asking the research question.
2. Understanding the nature of the problem to be studied and identifying the related area of knowledge.
3. Reviewing the literature to understand how others have approached or dealt with the problem.
4. Data collection: observing, measuring, and recording information.
5. Data analysis: arranging and organizing the collected data so that we can find out what their significance is and generalize about them (testing the hypotheses).
6.
Report writing: It is an inseparable part and the final outcome of a research study (drawing conclusions and making generalizations). Its purpose is to convey the information it contains to readers or an audience.

Difference between research problem and research question:
A research problem is an issue, difficulty, or gap in knowledge that the research addresses, whereas a research question is a statement in the form of a question that aims to study, learn, examine, and explore more about the research topic. Both are important aspects of a research study. Although some people assume that they are the same, they are not.

Types of Research:
1. According to type of data: Qualitative, quantitative & action research: do the data take the form of words or numbers?
A. Epidemiological quantitative research
Aims at:
1. Measurement of disease frequency
2. Disease determinants
3. Evaluation of screening tests
4. Measurement of morbidity and mortality
5. Determination of prognostic factors
6. Testing new treatments or new vaccines
B. Qualitative behavioral / socioeconomic and cultural research
Aims at: understanding the underlying behavior behind a phenomenon.
Example: A quantitative study may reveal findings about the prevalence of tobacco smoking. A qualitative study can explore why people smoke and what they know about the risks of smoking.
C. Action research / health system research
Aims at: identifying priority problems, and designing and evaluating policies and programs that yield the greatest benefits with optimal use of resources.
Example: Research to identify differences in immunization coverage between various centers.
2. According to source of data: Primary vs.
secondary: will you collect original data yourself, or will you use data that has already been collected by someone else?
3. According to objective: Observational vs. experimental: will the study take measurements of something as it is, or will you perform an experiment?
4. According to application: Basic vs. applied.
A. Basic research: this type of research is related to new discoveries and basic molecular science, e.g. genome studies.
B. Applied research: this type of research is regarded as having the potential to improve health or quality of life.

Research Ethics

Definition of research ethics:
Research ethics are the moral principles that a person must follow, irrespective of place or time. Behaving ethically involves doing the right thing at the right time. Research ethics should include protection of human and animal subjects.

What makes research ethical?
1. Social value: leads to improvements in health.
2. Scientific validity: complete description of the purpose of the study.
3. Fair subject selection: what are the inclusion and exclusion criteria (healthy volunteers or patients)?
4. Favorable risk-benefit ratio: balance between the risks and benefits of the research.
5. Informed consent.
6. Qualification of investigators.

Informed consent
Definition of informed consent: The patient agrees to the medical activities. For children or incapacitated persons, consent is given by the parent or guardian.

Content of informed consent:
1. Name of Institution
2. Title of Project
3. Principal Investigator
4. Other Investigators
5. Participant’s Names
6. Purpose of Research
7. Procedures (main & alternatives)
8. Time and Duration of Procedures
9. Discomforts and Risks
10. Potential Benefits
11. Statement of Confidentiality
12. Costs for Participation
13. Compensation for Participation
14. Research Funding
15. Voluntary Participation
16.
Contact Information for Questions or Study Concerns

Principles of Biomedical Ethics:
1. Respect for autonomy
2. Beneficence
3. Non-maleficence
4. Justice
5. Fidelity

1. Autonomy: Respect for the individual and their ability to make decisions with regard to their own health and future.
2. Beneficence: Actions intended to benefit the patient or others through good care in different medical fields; prevention and amelioration of health problems in the community.
3. Non-maleficence: Actions intended not to harm or bring harm to the patient and others. Calculated risk: when harm may occur, there must be compensation and a compelling possibility of benefit to justify the action.
4. Justice: Being fair or just to the wider community in terms of the consequences of an action.
a) Procedural justice: distribution according to established rules (first come, first served).
b) Distributive justice: especially with limited resources.
c) Compensatory justice: as punishment to combat discrimination.
5. Fidelity: The duty to observe the action made by the profession.
a) Truthfulness: telling the patient and his relatives the truth so that they can decide without direct paternalism.
b) Confidentiality: access to the patient’s record is legally permitted to health professionals but should be limited, and fake names should be used when presenting cases in conferences and teaching situations.

Sampling

Definition: Sampling is a procedure to obtain information about only a part of a population. A sample is a subset of the population that is used to gain information about the entire population. A good sample will represent the population well.

Advantages of sampling:
• Lower cost.
• Saves time.
• Provides more intensive and accurate investigations and information.
• It eliminates bias.
Difference between target population and sample:
• The target population is the total group of individuals from which the sample might be drawn.
• A sample is a subset of the population that is used to gain information about the entire population.

Precautions in Sampling
1. The sample must be well chosen (representative of the parent population).
2. The sample must be sufficiently large to minimize sampling variation.
3. Coverage of the sample must be adequate to avoid sample bias.

Methods of sampling

I. Non-probability sample

1. Convenience sampling:
Convenience sampling is a method of collecting a sample by taking units that are conveniently available at a location or through an Internet service. It aims to explore, e.g. to get the feel of the situation.
Examples:
1. First 10 patients in the clinic.
2. Students in the library.

2. Purposive samples:
• Aim to serve a very specific purpose.
Example: You want to know more about the opinions and experiences of disabled students at your university, so you purposefully select a number of students with different support needs in order to gather a varied range of data on their experiences.

3. Quota samples:
• This type of non-probability sample is used in sampling public opinion.
• The population is first divided into sub-groups, as in stratified sampling.
• Then judgment is used to select subjects or units from each sub-group based on a specified proportion, e.g., an interviewer may be told to sample 200 females & 300 males between the ages of 45 & 60. The 2nd step makes it non-probability sampling, as selection of the sample is non-random & the sample may be biased because not everyone gets a chance of selection.

4. Snowball sampling:
• Also known as chain sampling or sequential sampling, it is used where one respondent identifies other respondents (from his/her friends or relatives).
• This kind of sampling is adopted in situations where it is difficult to identify the members of a sample.
• As a snowball grows on adding more snow, the sample grows in this technique until we collect enough data to analyze. Hence, it is named snowball sampling.

II. Probability sample (Random Sample)
Every unit in the sampled population has an equal probability or chance of being selected. It is the recommended method; generalization can be made to the parent population.

Types of random samples
1. Simple Random Sample
2. Systematic Random Sample
3. Stratified Random Sample
4. Cluster Sample
5. Multistage Random Sample

1. Simple Random Sample:
• The population should be uniform or homogeneous.
• Each unit in the population has an equal & independent chance of being selected.
• List all units in the population (sampling frame).
• Select the required number using a random number table, a lottery, or computer programs that are based entirely on chance.
• Advantage: Simple to conduct.
• Disadvantages:
1. Requires a sampling frame.
2. Unsuitable for a heterogeneous population (age, gender, etc.)
Figure (1): Simple random sampling

2. Systematic Sampling:
• Systematic selection from the sampling frame.
• The selection process starts by picking a random point in the list, and then every nth element is selected until the desired number is secured.
• To select 100 subjects from a sampling frame of 1000, we should select 1 subject from each 10.
• The 1st subject is selected randomly, then the other subjects are selected at regular intervals, e.g., if the 1st subject is the 5th, then the selected subjects will be the 5th, 15th, 25th, ….
• Advantage: Easy to conduct.
• Disadvantages:
1. Requires a sampling frame.
2. Only the 1st subject is selected randomly.
Figure (2): Systematic Random Sample

3. Stratified random sampling:
• Used when population is heterogeneous, i.e.
people in the population differ based on the relevant characteristic (e.g. gender, age range, income, job).
• Divide the heterogeneous population into homogeneous strata.
• Then select a simple random sample from each stratum.
• Proportional allocation should be respected.
• Advantage: The most suitable procedure to ensure representativeness.
• Disadvantage: More expensive & time consuming.
Figure (3): Stratified Random Sample

4. Cluster Sample:
• The sample unit is a group, not an individual (family, school class, ….)
• Groups are selected randomly from all groups of the same type.
• All members of the selected group are included in the study.
• Advantages: Field work is concentrated (simpler & cheaper).
• Disadvantages: It can cause errors if the studied disease or variable is clustered in the population.
Figure (4): Cluster Sample

5. Multistage Sample:
• Sampling in 2 or more stages, e.g. a 2-stage sample of school children:
  - 1st stage: a random sample of schools.
  - 2nd stage: a random sample of children from each selected school.
• May have more than 2 stages, e.g., select towns, districts, streets & finally houses.
• Advantages:
  - Concentrates resources.
  - Needs no sampling frame for the whole population.
  - Only a list of the 1st-stage units is required.
  - A frame is needed only for the selected last-stage units.
Figure (5): Multistage Sample

Descriptive studies

Health: The World Health Organization (WHO) defines health as: a state of complete physical, mental & social well-being and not merely absence of disease or infirmity and the ability to lead a socially and economically productive life.

What is Epidemiology?
“Epidemiology is the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems.”

Uses of Epidemiology:
1.
Identify the disease pattern in the population (increasing, decreasing or stationary).
2. Identify risk determinants of diseases (discover the agent, host, and environmental factors that affect health and disease).
3. Determine the relative importance of causes of illness, disability, and death.
4. Identify those segments of the population that have the greatest risk from specific causes of ill health.
5. Evaluate the effectiveness of health programs and services in improving population health.

Epidemiologic Research Methodology
Epidemiological studies can be classified as either observational or experimental.

1. Observational studies
Observational studies allow nature to take its course: the investigator measures but does not intervene. They include studies that can be called descriptive or analytical:
A descriptive study is limited to a description of the occurrence of a disease in a population and is often the first step in an epidemiological investigation.
An analytical study goes further by analyzing relationships between health status and other variables.

2. Experimental studies
Experimental or intervention studies involve an active attempt to change a disease determinant, such as an exposure or a behavior, or the progress of a disease through treatment, and are similar in design to experiments in other sciences.

Figure (1): Types of epidemiological study design

I. Observational Studies
A. Descriptive studies:
1. Case report
2. Case series
3. Correlation studies
4. Cross-sectional studies
B. Analytic studies:
1. Case control
2. Cohort study
II. Experimental Studies
1. Treatment trials
2. Prevention trials
3. Diagnostic trials
4. Screening trials
5.
Quality of life trials

Quality of research studies from weak to strong according to evidence-based medicine:
1. Descriptive studies: Case reports, case series, ecological studies, cross-sectional studies.
2. Analytical studies: Case-control studies, cohort studies.
3. Clinical trials.
4. Systematic reviews.
5. Meta-analysis studies.

A. Descriptive studies:
• First phase in the epidemiological investigation.
• Describe the disease distribution in the population.
• Give data about:
  - When the disease occurs (Time)
  - Where the disease occurs (Place)
  - Who is getting the disease (Person)

Types of descriptive studies:

1-Case Reports:
• Presentation of a single case that is newly reported or has a unique finding, e.g.
• Undescribed disease
• Explain or describe a link between diseases
• Report a new therapeutic effect

2-Case Series:
• A case series describes the characteristics of a number of patients with a given disease.
• It also aims to describe unusual variations of a disease.
• It can also generate a hypothesis.

3-Ecological studies:
• Ecological (or correlational) studies are useful for generating hypotheses.
• Community based: the units of analysis are groups of people rather than individuals.
• They compare populations in different places at the same time or, in a time series, compare the same population in one place at different times.
• Their data are extracted from different sources such as vital statistics, censuses and national health surveys.

4-Cross-Sectional Studies (Prevalence studies):
• An “observational” design that surveys exposures and disease status at a single point in time (a cross-section of the population). For this reason, you cannot determine whether exposure really preceded disease or not.
• It measures the prevalence of health outcomes or determinants of health, or both, in a population at a point in time.
Figure (3): Cross sectional study design

Examples of cross-sectional studies:
• Investigate the presence of diabetes in relation to obesity.
• Investigate the presence of musculoskeletal problems among radiologists.

Table (1): Advantages and disadvantages of Cross-sectional Studies:

Advantages of Cross-sectional Studies
• Used to study conditions that are relatively frequent with long duration (chronic conditions)
• Good for generating hypotheses about the cause of disease
• Can estimate overall and specific disease prevalence rates
• Can estimate exposure proportions in the population
• Relatively easy, quick and inexpensive
• It is the first step to develop evidence for causal association

Disadvantages of Cross-sectional Studies
• It is not useful for studying acute diseases, diseases with seasonal variations or highly fatal diseases
• Impractical for rare diseases
• It gives no information about the rate of occurrence of new cases (incidence rate)
• It gives very little information about the natural history of diseases
• Does not allow determining which came first
• It does not provide solid evidence for causal association, as it does not determine whether exposure really preceded disease or not

Prevalence Rate:
Definition: The prevalence is the frequency of existing cases in a defined population at a given point in time.

Prevalence equation:
Prevalence (P) of a disease is calculated as follows:

\[ P = \frac{\text{Number of people with the disease or condition at a specified time (old + new cases)}}{\text{Total number of examined population in the same locality and time}} \times 10^{n} \]

Prevalence is often expressed as cases per 100 (percentage) or per 1000 population. In this case, P has to be multiplied by the appropriate factor 10^n (n = 2 for a percentage, n = 3 for a rate per 1000).
There are several factors that determine prevalence.
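
The prevalence equation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `prevalence`, its parameter names, and the survey counts are hypothetical, and the `per` argument plays the role of the multiplier 10^n.

```python
def prevalence(existing_cases: int, population_examined: int, per: int = 100) -> float:
    """Point prevalence: existing (old + new) cases divided by the
    population examined at the same time and place, scaled by `per`
    (100 for a percentage, 1000 for a rate per 1000)."""
    if population_examined <= 0:
        raise ValueError("population_examined must be positive")
    if not 0 <= existing_cases <= population_examined:
        raise ValueError("existing_cases must lie between 0 and population_examined")
    return existing_cases / population_examined * per


# Hypothetical survey: 150 existing cases among 1200 people examined.
print(prevalence(150, 1200))        # 12.5  (cases per 100, i.e. 12.5%)
print(prevalence(150, 1200, 1000))  # 125.0 (cases per 1000)
```

The same count gives 12.5 per 100 or 125 per 1000; only the multiplier changes, not the underlying proportion.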
Training Exercises:

(1) A study involved 2000 students in Horus College of Medicine to assess the prevalence of hypertension. They found that 500 students were hypotensive, 1300 were normotensive and the remaining students were hypertensive. The calculated prevalence of hypertension among students is:
a) 10%
b) 20%
c) 25%
d) 65%

(2) A study involved 500 students in Mansoura College of Medicine. The objective was to assess the prevalence of obesity. They found that 50 students were underweight, 350 were normal weight and 100 students were obese.
A. State the study design.
………………………………………………………………
B. Calculate the prevalence rate of obesity.
………………………………………………………………

(3) Look at the table, and then answer the following questions.
Table: Prevalence of HBV infection among surgeons in different dental centers at Mansoura, 2000.

HBV infection    Center 1 (n = 85)    Center 2 (n = 32)    Center 3 (n = 50)
                 No.     %            No.     %            No.     %
Present          5       6.0          20      62.5         30      60.0
Absent           80      94.0         12      37.5         20      40.0

1. What is the epidemiologic study used to obtain these findings?
2. What are the advantages of this study?
3. Mention the numerator and the denominator of this rate (prevalence rate) from the above table.

Application on Cross sectional studies (Survey study)

Definition:
A survey study is a descriptive field study carried out to investigate community problems or particular diseases.

Classification:
According to target population:
1. Local survey: confined to a local community, e.g. a school, camp or factory.
2. National survey: covers the whole country.
According to purpose:
1. One-purpose survey: investigates a certain disease or health problem.
2. Multipurpose survey: investigates more than one disease or health problem.

Objectives:
1. Case finding:
• Pre-clinical phase: screening procedures can reveal early pathological changes.
• Clinical phase with symptoms & signs.
• Complicated cases with more advanced pathological damage.
2. To determine the magnitude of the problem in the community.
3. To determine the characteristic features of the disease or health problem according to ecological factors of host, agent, & environment.
4. To use the collected data for planning & evaluation of prevention & control programs.

Steps of conducting a survey study:
(1) Preliminary stage
a) Clarifying the purpose
b) Reviewing the literature
c) Ethical considerations
d) Formulating the topic
(2) Planning
Define:
a) Objectives of the study.
b) Type of study: descriptive, analytical, etc.
c) Target population (total population, sampling & sample size).
d) Timetable.
(3) Preparation of the survey requirements (manpower, material & money):
• Work team
• Equipment & apparatus needed
• Study sheet for the questionnaire
• Facilitators of the work
• Transport
• Finance
• Approach to community leaders
(4) Collection of data (interviews, questionnaires, investigations).
(5) Data tabulation & analysis.
(6) Writing reports, which include: results, interpretation, discussion, recommendations.

Analytical Studies: 1. Case Control study

In analytic epidemiology we investigate the relation between the exposure and the outcome (disease):
Exposure → Outcome (disease)
Figure (1): Relation between exposure and outcome
• These studies are used to test an etiologic hypothesis such as smoking and oral cancer, excess carbohydrates and dental caries, etc.

Types of analytical studies:
1. Case-control (case-referent, retrospective) studies
It is an “observational” study in which we assess the frequency of exposure to a specific risk factor (suspected etiological factor) in patients who have developed a disease and compare it with that of controls or referents who do not have the disease.
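
The comparison described above, exposure frequency among cases versus controls, can be sketched in Python. This is an illustrative sketch only: the function name `exposure_comparison` and the counts are hypothetical. The cell labels a, b, c, d follow the 2x2 layout this document uses for case-control data (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls), and the odds ratio ad/bc is the summary measure the document gives for this design.

```python
def exposure_comparison(a: int, b: int, c: int, d: int):
    """Compare exposure between cases and controls in a 2x2 table.

    a: exposed cases        b: exposed controls
    c: unexposed cases      d: unexposed controls
    Returns (proportion exposed among cases,
             proportion exposed among controls,
             odds ratio = ad / bc).
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    if a + c == 0 or b + d == 0:
        raise ValueError("need at least one case and one control")
    if b * c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    p_cases = a / (a + c)        # exposed fraction of all cases
    p_controls = b / (b + d)     # exposed fraction of all controls
    odds_ratio = (a * d) / (b * c)
    return p_cases, p_controls, odds_ratio


# Hypothetical counts: 50 cases (30 exposed) and 50 controls (10 exposed).
p_cases, p_controls, odds_ratio = exposure_comparison(30, 10, 20, 40)
print(f"Exposed among cases:    {p_cases:.1%}")
print(f"Exposed among controls: {p_controls:.1%}")
print(f"Odds ratio:             {odds_ratio:.2f}")
```

An odds ratio above 1 means exposure was more frequent among cases than among controls; the sketch refuses to divide by zero rather than applying a continuity correction, which is a simplifying assumption.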
Figure (2): Case Control Study Design

• Case-control studies provide a relatively simple way to investigate the causes of diseases, especially rare diseases.
• The investigator looks backward from the disease to a possible cause (retrospectively) by direct questioning and/or by extracting information from clinical records.

Steps to conduct a case control study:
1. Identify the study group (cases)
Define the criteria for diagnosis and the criteria for inclusion and exclusion of cases.
2. Identify controls (needed for comparison):
• Controls must be free from the studied disease.
• Matching of controls: selection of controls must consider proper matching with cases for certain characteristics that are known to influence the outcome of the disease (confounding factors), e.g. age, sex, social class.
• The number of controls should be equal to the number of cases or 2-4 times as many.

Sources of controls:
1. General population
2. Hospital controls
3.
Special control series (friends, neighbours, fellow employees or peers, family members)

Table (1): Data are summarized in a 2x2 table:

                                   Disease status
Exposure   Cases                            Controls                             Total
Yes (+)    (a) Diseased with exposure       (b) Not diseased with exposure       (a+b) Total exposed
No (-)     (c) Diseased without exposure    (d) Not diseased without exposure    (c+d) Total non-exposed
Total      (a+c) Total cases                (b+d) Total controls                 (a+b+c+d) Grand total

The calculated measures are:
(1) P1, the proportion of the exposed among cases: P1 = a / (a + c)
(2) P2, the proportion of the exposed among controls: P2 = b / (b + d)
(3) Relative contribution = P1 - P2
It represents the relative contribution of the suspected cause to the total frequency of the disease.
(4) Odds ratio (OR)
Odds of exposure to the risk factor among cases (R1) = a / c
Odds of exposure to the risk factor among controls (R2) = b / d
Odds ratio = R1 / R2 = (a / c) / (b / d) = ad / bc
• It is an indirect estimate of the risk.

Interpretation of the Odds Ratio
OR = 1, i.e. the outcome or disease is not associated with the exposure.
OR > 1, i.e. cases had a higher frequency of exposure than controls.
OR < 1, i.e. cases had a lower frequency of exposure than controls.

Example: Excess carbohydrate consumption and development of diabetes. Results from a case control study.

Exposure                           Diabetic cases    Controls
Excess carbohydrate consumption    32                16
Low carbohydrate consumption       4                 48
Total                              36                64

P1 = a / (a + c) = 32/36 x 100 = 88.9%
P2 = b / (b + d) = 16/64 x 100 = 25%
Relative contribution = P1 - P2 = 63.9%
It represents the relative contribution of excess carbohydrate consumption to the total frequency of diabetic cases.
Odds ratio = ad / bc = (32 x 48) / (16 x 4) = 24
Interpretation of OR: The odds of excess carbohydrate consumption among diabetic cases are 24 times the odds among controls.

Advantages of case control studies:
1. Easy to conduct.
2. Quick and cheap.
3.
Allows the study of several risk factors.
4. Useful in the study of diseases with a long latency.
5. Can be used in rare diseases.
6. Does not require large samples.
7. Can prove hypothesis (exposure & disease are related).
8. Can estimate risk (odds ratio).

Disadvantages of Case Control Study:
1. Cannot calculate rates (incidence, prevalence rate).
2. Not useful in rare exposures.
3. Liable to bias.
b. Recall bias: depends on memory or records for exposure.
c. Selection bias: includes bias in identifying cases or non-cases.
4. Not easy to estimate the time elapsed between exposure and disease occurrence.

Analytical Studies: 2. Cohort (follow-up or incidence) studies

• An “observational” design. It begins with a group of people who are free of disease and who are classified into subgroups according to exposure to a potential cause of disease or outcome. Variables of interest are specified and measured, and the whole cohort is followed up to see how the subsequent development of new cases of the disease (or other outcome) differs between the groups with and without exposure.
• Cohort: a group sharing a common characteristic.

It is the best type of observational study, as:
• Investigators ensure that the exposure to the risk factors being studied occurs before the development of the disease.
• The incidence of the disease is measured directly.
• They provide the best information about the causation of disease.
• They give the most direct measurement of the risk of developing disease.
• It requires long periods of follow-up, since disease may occur a long time after exposure, e.g. the induction period for oral cancer caused by cigarette smoking is many years, so it is necessary to follow up study participants for a long time.

Figure (1): Cohort study design

Steps to conduct a cohort study:
1.
First we exclude cases of the disease under investigation.
2. The disease-free cohort is divided into 2 groups:
a) Exposed group: individuals exposed to the risk factor.
b) Control group: individuals not exposed to this factor.
3. Both groups are followed up over a sufficient period of time. Therefore the cohort should be stable, cooperative & accessible to the investigator.
4. If the incidence of disease among the exposed group is higher than its incidence among the non-exposed group, this supports the etiological hypothesis.

Table (4): Analysis of cohort studies:

                                   Disease status
Exposure   Present                               Absent                                   Total
Yes (+)    (a) With exposure, with disease       (b) With exposure, without disease       (a+b) Total exposed
No (-)     (c) Without exposure, with disease    (d) Without exposure, without disease    (c+d) Total non-exposed
Total      (a+c) Total with disease              (b+d) Total without disease              (a+b+c+d) Grand total

Calculated measures are:
(1) Overall incidence = (a + c) / (a + b + c + d)
(2) Ie, incidence rate among the exposed = a / (a + b)
(3) In, incidence rate among the non-exposed = c / (c + d)
(4) Relative risk (RR) = Incidence among exposed (Ie) / Incidence among non-exposed (In)
RR answers the question: "How many times is an exposed person at risk of developing the disease compared to a non-exposed person?"
(5) Attributable risk (AR) = Ie - In
AR answers the question: "How much of the studied disease can be attributed to the exposure?"

Interpretation of the Relative Risk (RR)
RR = 1, i.e. the exposure is not associated with the outcome or disease.
RR > 1, i.e. the exposed group had a higher frequency of disease than the non-exposed group (increased exposure is accompanied by increased outcome).
RR < 1, i.e. the exposed group had a lower frequency of disease than the non-exposed group (increased exposure is accompanied by decreased outcome, so it is a protective factor).

Advantages of Cohort Studies:
1. Can study multiple outcomes from a single exposure.
2.
You are sure that exposure happened before outcome.
3. Can calculate incidence rates, relative risk and attributable risk.
4. Suitable for rare exposures.
5. Less bias in control selection.

Disadvantages of Cohort Studies:
1. They are inefficient for the evaluation of rare diseases.
2. Prolonged follow-up can cause drop-out of cases.
3. Expensive and time consuming.
4. Involve large samples.
5. Study subjects may change habits over time, e.g. stop smoking or change occupation.
6. Diagnostic methods may change over time.

Example: Association between cigarette smoking and lung cancer

Exposure       Develop lung cancer    Do not develop lung cancer    Total
Smokers        100                    2900                          3000
Non-smokers    100                    4900                          5000

• Incidence in smokers = 100/3000 = 33.3/1000
• Incidence in non-smokers = 100/5000 = 20/1000
• Relative risk = 33.3/20 = 1.67
Interpretation: The risk of developing lung cancer among smokers is 1.67 times the risk among non-smokers.
• Attributable risk = 33.3/1000 - 20/1000 = 13.3/1000
Interpretation: 13.3 lung cancer cases per 1000 smokers (exposed group) are attributable to their smoking.

Comparison of Advantages and Disadvantages of Case-Control and Cohort Studies

Case-Control Studies
1. Begin with the disease and investigate the exposure.
2. Compare diseased and non-diseased individuals.
3. Retrospective.

Cohort Studies
1. Begin with the exposure and investigate the development of the disease.
2. Compare exposed and non-exposed individuals.
3. Prospective.

Advantages

Cohort studies:
1. Can study multiple effects from a single exposure.
2. You are sure that exposure happened before outcome.
3. Can calculate incidence rates, relative risk and attributable risk.
4. Less bias in control selection.

Case-control studies:
1. Easy to carry out.
2. Quick and cheap.
3. Can be used in rare diseases.
4. Allows the study of several risk factors.
5. Useful in the study of diseases with a long latency.
6.
Does not require large samples.
7. Can test the hypothesis that exposure and disease are related.
8. Can estimate risk (odds ratio).

Disadvantages of case-control studies:
1. Cannot calculate rates.
2. Not useful in rare exposure.
3. Liable to bias.
4. Not easy to estimate the time elapsed between exposure and disease occurrence.

Disadvantages of cohort studies:
1. They are inefficient for the evaluation of rare diseases.
2. Prolonged follow-up can cause drop-out of subjects.
3. Expensive.
4. Time consuming.
5. Involve large samples.
6. Study subjects may change habits over time.
7. Diagnostic methods may change over time.

Training exercise (1):
Look at the table, and then answer the following questions.

Table: Incidence of iron deficiency anemia among male and female primary school children at New Damietta, 2021.

Group | Iron deficiency anemia: Yes | Iron deficiency anemia: No | Total
Female children | 46 | 1438 | 1484
Male children | 18 | 1401 | 1419

1. What is the type of this epidemiologic study?
………………………………………………………………………………..
2. Calculate the following:
• Incidence rate of iron deficiency anemia among females
……………………………………………………………………………
• Incidence rate of iron deficiency anemia among males
……………………………………………………………………………
• Relative risk of iron deficiency anemia for females versus males
……………………………………………………………………………
• What does the calculated relative risk signify?
………………………………………………………………………………..

Experimental (Interventional) studies

Definition of Clinical trials
• Clinical trials are research activities that involve the application of a therapeutic or preventive regimen to humans to evaluate its safety and efficacy.
• The purpose of these studies is to find better ways to prevent, detect, or treat diseases or to improve care for people with diseases.

Uses:
• Confirm an etiological hypothesis: Experimental studies are the best epidemiological study design to prove causation.
• It is regarded as the strongest epidemiologic study design.
• Assess the effectiveness and safety of a preventive or curative measure (new treatments: pharmaceutical agents, devices, surgical procedures).
• Before an intervention becomes standard practice, its efficacy and safety should be assessed in comparison with standard therapy.

Types of clinical trials:
1. Therapeutic Trials: They test
• New treatments
• New combinations of drugs
• New approaches to surgery
• New radiation therapy
• Dietary therapy
2. Prevention Trials: They try to find better ways to prevent disease in people and to prevent disease recurrence using
• Medicines
• Vaccines
• Vitamins
• Minerals
• Life style changes
3. Diagnostic Trials: trials of the validity of diagnostic measures such as radiologic or laboratory measures.
• To find better tests for diagnosis of a disease
• To find better procedures for diagnosis of a disease
4. Screening Trials: To find out the best way to detect certain diseases or conditions early.
5. Life Style Trials: In chronically ill patients, to explore ways to improve comfort and quality of life.

Stages and Phases of clinical trials:
• Stage I: in vitro and animal studies.
• Stage II: testing the drug in humans; it has four phases.

Phase-I Trials: first-in-human study
Carried out on a small group of 20-80 healthy volunteers, within a short period (about two months).
Purpose:
1. Gain basic safety information
2. Determine a safe dosage range
3. Gain information about drug side effects and toxicity

Phase-II Trials
Carried out on a larger group of 100-300 volunteer patients suffering from the disease that is treated by the studied drug or procedure. It takes a longer time than phase I.
Purpose:
1. Evaluation of safety
2. Preliminary evaluation of efficacy
3. 
Evaluation of side effects and toxicity in treated patients

Phase-III Trials (Classical Phase)
Applied to a large group of 1000-3000 people.
Purpose:
1. Confirm drug efficacy
2. Monitor side effects
3. Compare with commonly used treatments
4. Gather information regarding safe use

Phase-IV Trials
Post-marketing studies in the field after the drug becomes available and widely used.
Purpose: Re-assess
1. Drug effectiveness
2. Drug safety and long-term side effects
3. Drug acceptability

Steps for conducting Randomized clinical trials:
1. Select the study population that will participate in the study.
2. Get informed consent from the participants.
3. Allocate the study population randomly into an experiment group and a control group.
• The experiment group will receive the tested drug or procedure.
• The control group receives either:
- Placebo (agent without effect)
- Another drug to be compared with the tested drug (Positive control)
- Nothing (Passive control)
4. Follow up strictly for the specified period.
5. Assess the outcome of the study (blood pressure control, blood glucose control, pain relief, …….).
6. Compare outcome results between both groups.
To avoid bias in the study, we should ensure blinding (masking).

Blinding in clinical trials (Masking):
Definition: Hiding the knowledge of the treatment received:
1) Single-blind trial: Subjects in the study population are unaware whether they are in the experiment or the control group, so the treatments used should appear alike in all aspects (taste, color, smell, shape).
2) Double-blind trial: The subjects and the observers are unaware of the group allocation.
3) Triple-blind trial: The subjects, the observers and the data analyst are unaware of the group allocation.

Advantages of Randomized Clinical trials:
1. More accurate than any other type of research
2. 
Less biased in selection due to randomization of the study population
3. Blinding minimizes analysis bias

Disadvantages of Randomized Clinical trials:
1. Expensive
2. Time consuming
3. Ethical difficulties (in obtaining approval)
4. Loss of cases during follow-up
5. Non-compliance with the treatment regimen

Training: Read the following and answer the questions.
In a day care nursery in Mansoura city, children aged 18 months were randomly allocated to two groups. One group was given the commonly used influenza vaccine and the other group was given a new type of influenza vaccine. Parents were asked to record any side effects on a card and mail it back after 2 weeks.
1. What is the type of the study?
…………………………………………………………………………………….
2. What are the phases of testing a new drug in human subjects?
…………………………………………………………………………………….
3. What are the disadvantages of this type of study design?

Role of screening tests in disease diagnosis

• Definition: Screening is the investigation of apparently healthy individuals to detect unrecognized cases or individuals at high risk of developing a disease, so that intervention can be made to prevent the occurrence of the disease or to improve its prognosis when it develops.
• Aim of screening:
(1) Not to diagnose, but to sort the population into those who are and those who are not suspected to suffer from the disease or the defect of interest.
(2) To intervene in the natural history of the disease for prevention and control of some chronic diseases, especially:
• Diseases in which pathology develops long before symptoms and signs, and in which damage becomes irreversible or difficult to treat with time (e.g. phenylketonuria).
• Diseases in which early intervention will decrease the risk of developing pathology (e.g. 
control of hypertension will decrease the risk of stroke; detection and treatment of carcinoma in situ will decrease the risk of cervical cancer).
• High risk groups in which intervention will decrease the subsequent risk (e.g. erythroblastosis fetalis).
Therefore, the disease can be detected before it is clinically manifest, and the pathology can be reversed, arrested, retarded or alleviated.

• Screening test: a simple test applied to a large number of people to exclude those free from disease and to pick up those possibly suffering from it, who are then subjected to detailed investigation (i.e. a reference test) to prove or disprove the diagnosis.
• Difference between screening test & diagnostic test:
• Nature of screening tests: Screening tests may be:
• A clinical step (e.g., breast palpation)
• A laboratory test (e.g., glucose tolerance test for diabetes mellitus)
• Another investigation (e.g., mammography)
• Types of Screening:
(1) Mass screening: offered to all individuals, irrespective of the presence of a particular risk for the disease in question. This is not a useful preventive measure unless it is backed up by treatment and follow-up facilities for those who screen positive.
(2) High risk screening: offered to those with special risk, e.g., screening of close relatives of known diabetics (a greater number of cases can be identified at less cost).
(3) Multiphase screening: screening for a variety of diseases at one time. This is a well-established procedure in antenatal care and school examinations.
(4) Case-finding or opportunistic screening: applied to patients who consult a health practitioner for some other purpose.

• Requirements of a screening program regarding the disease:
1. 
Importance of the disease: The disease should be an important health problem, i.e., of high frequency and/or with bad sequelae. For example, congenital hypothyroidism, although rare, should be detected early because it has serious sequelae if untreated and because it is treatable.
2. Adequate understanding of the natural history of the disease: to identify the points at which the disease can be detected by screening and effectively treated before irreversible damage occurs, and to evaluate the effectiveness of any intervention.
3. A recognized latent period or asymptomatic stage.
4. Can be detected before the onset of symptoms and signs.
5. At-risk individuals can be identified and screened.
6. Available facilities for diagnosis and treatment.
7. Agreed policy on whom to treat as patients.
8. An effective, available and acceptable treatment.
9. Benefits of early detection exceed risks and costs (money, manpower and equipment).

• Requirements of a screening program regarding the screening test:
A. General requirements:
1. Simple: not too many steps involved, to avoid errors.
2. Inexpensive for mass application.
3. Least time consuming.
4. Not painful.
5. Objective rather than subjective.
6. Acceptable to the population.
B. Special requirements:
1. Precise and reliable: gives the same results when repeated under standard conditions.
2. Valid (sensitive and specific): the test is accurate, giving true rather than false readings (+ve or -ve).

• Validity and reliability of screening test:
• Validity: the test is valid if it correctly categorizes people into groups with and without disease, as measured by its sensitivity, specificity, and predictive values.
• Validity is the extent to which a test measures what it aims to measure, i.e., it is the capacity of a test to give true results.
• Validity includes:
1. Sensitivity: The ability of the test to identify correctly those who have the disease, i.e., it gives few false negative results.
(When the disease is present, how often does the test detect it?)
2. Specificity: The ability of the test to identify correctly those who do not have the disease, i.e., it gives few false positive results. (When the disease is absent, how often does the test provide a negative result?)
3. Predictive values of a screening test: the proportion of individuals correctly labeled as diseased (positive predictive value) or non-diseased (negative predictive value) by the test.
• A false positive test can lead to needless anxiety, exposes individuals to the costs and risks of further investigation and perhaps unnecessary treatment, and imposes economic burdens on the health-care system that would better be avoided.
• A false negative screening test result could have disastrous consequences if persons suffering from early cancer are incorrectly reassured that there is nothing wrong with them and so miss the early start of treatment.

Table (I): Validity of screening test

Screening test | Disease present (GS +) | Disease absent (GS -) | Total
Positive | a (true positive) | b (false positive) | a+b
Negative | c (false negative) | d (true negative) | c+d
Total | a+c | b+d | a+b+c+d

GS: Gold standard

Sensitivity = probability of a positive test in people with the disease = a / (a+c)
Specificity = probability of a negative test in people without the disease = d / (b+d)
Positive predictive value = probability of the person having the disease when the test is positive = a / (a+b)
Negative predictive value = probability of the person not having the disease when the test is negative = d / (c+d)
Accuracy = (a+d) / (a+b+c+d)

Reliability (Repeatability):
• It is the level of agreement between repeated measurements; a reliable technique will give the same values on repeated application to the same individual.
• A test is reliable if it provides consistent results when repeated under standard conditions.
• It is affected by variation in both:
1) Subject (biological) variation, either random or systematic.
2) Observer (measurement) variation, either random within the same observer or systematic between different observers.
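The validity formulas above can be checked with a few lines of code. The following is a minimal sketch, assuming the cell labels implied by those formulas (a = true positives, b = false positives, c = false negatives, d = true negatives). The function name `screening_validity` and the counts used are hypothetical and chosen only for illustration; they are not data from this course.

```python
def screening_validity(a, b, c, d):
    """Validity measures of a screening test from a 2x2 table.

    Assumed layout (gold standard in columns, screening test in rows):
      a = test positive, disease present (true positive)
      b = test positive, disease absent  (false positive)
      c = test negative, disease present (false negative)
      d = test negative, disease absent  (true negative)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive predictive value": a / (a + b),
        "negative predictive value": d / (c + d),
        "accuracy": (a + d) / (a + b + c + d),
    }


# Hypothetical counts for 1000 screened persons (illustration only)
result = screening_validity(a=90, b=30, c=10, d=870)

for measure, value in result.items():
    print(f"{measure}: {value:.1%}")
```

With these illustrative counts the predictive values depend on how common the disease is among those screened (here 100 of 1000), whereas sensitivity and specificity are calculated within the diseased and non-diseased columns separately.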
Medical Statistics I

Definition: Medical statistics encompasses the methods of collecting and summarizing data (Descriptive Statistics) and of analyzing and drawing conclusions from them (Inferential Statistics) in the field of medicine.

Importance of Medical Statistics:
1. Presenting facts
2. Simplifying masses of figures and reducing the volume of data
3. Facilitating comparison
4. Helping in formulating and testing hypotheses
5. Helping in prediction, planning and administration
6. Helping in the formulation of suitable policies
7. Serving in measuring the standard of health

Basic Terms:
• Population: A collection, or set, of individuals, objects or events whose properties are to be analyzed.
• Sample: A subset of the population.
• Data (raw): The value of the variable associated with elements of a population or sample. This value may be a number, a word, or a symbol. Data consist of discrete observations of variables that carry little or no meaning when considered alone. Data need to be processed into information.
• Information: Data that have been given meaning by way of relational connection.
• Parameter: A numerical value summarizing the data of an entire population.
• Statistic: A numerical value summarizing the sample data.

Types of Variables
Definition: A variable is a quantity that is measured or observed in an individual and which varies from person to person. For example, age, sex and blood type are variables. Data are the values you get when you measure a variable, for example 30 years (for the variable age) or female (for the variable sex).

Types of variables:
A. Quantitative (Metric) variables:
Definition: Quantitative variables are variables that can be measured numerically and may be continuous or discrete.
1. 
Continuous variables lie on a continuum, so they can take any value between two limits, and they have units of measurement.
• Weight is a continuous variable because it is measured (using weighing scales) and can take any value within a range.
2. Discrete variables do not lie on a continuum and can only take certain values, usually counts (integers), and they have counted units.
• The number of previous pregnancies in a pregnant woman is a discrete variable since it is counted and only whole numbers are possible.

B. Qualitative (Categorical) variables:
Definition: Qualitative variables are variables where individuals fall into a number of separate categories or classes; they do not have any units of measurement.
1. Nominal variables: the categories are not ordered but simply have names. Any ordering of the categories is completely arbitrary; in other words, the categories cannot be ordered in any meaningful way.
• Examples include blood group (A, B, AB, and O) and marital status (married/widowed/single etc.). We cannot say that being in any particular category is better, or shorter, or quicker, or longer, than being in any other category.
2. Ordinal variables: the categories are ordered in some meaningful way, not arbitrarily as with nominal variables. However, the difference between any pair of adjacent scores is not necessarily the same as the difference between any other pair of adjacent scores. Examples include disease staging systems (advanced, moderate, mild, none) and degree of pain (severe, moderate, mild, none).

Steps of Medical Statistics:
I. Data Collection:
Data collection is a key part of the research process, and the collection method will affect the later statistical analysis of the data. The sources of data include:
1. Population surveys.
2. Surveys of providers, such as physicians, hospitals and nursing homes, are also an important source of information.
3. Vital statistics, drawn from the records of births, deaths, marriages and divorces.
4. Registers of diseases, such as communicable and non-communicable diseases, show the incidence, prevalence and outcomes of these threats.
5. Administrative records, such as those compiled during a hospital stay or at outpatient clinics or physicians' offices.
6. Research work (cross-sectional study, case-control study, cohort study, clinical trial).

II. Data Summarizing:
One of the first things that you may wish to do when you have entered your data is to summarize them. This can be done by producing diagrams, tables or summary statistics.
• Diagrams are often powerful tools for conveying information about the data and for providing simple summary pictures; they also serve as useful visual aids for describing the data to others.

General guidelines for Data Presentation
• Give a meaningful title that explains what data are included.
• State the number of subjects or data points.
• Label the rows and columns (tables) or axes (graphs) clearly.
• State any units used, e.g. systolic blood pressure (mmHg).

• Tables: A table is a set of data arranged in rows and columns. Almost any quantitative information can be organized into a table. The most basic table is a simple frequency distribution.

Frequency distributions:
• A frequency distribution provides a way of organizing a collection of measurements.
• The measurements are grouped into well-defined classes (first column of the table).
• It helps us to determine which levels are common and which are rare.
• The second column shows the number in each class (absolute frequency).
• The third column shows the numbers as a percent of the total number (relative frequency).
• It can be presented in tabular or graphic form.

Table (1): Frequency distribution table of length of hospital stay of 120 patients in the intensive care unit.
Length of hospital stay | Number of patients in ICU | % of patients in ICU
1- | 4 | 3.3
8- | 60 | 50.0
15- | 29 | 24.2
22- | 20 | 16.7
29-36 | 7 | 5.8
Total | 120 | 100

• Graphs
A graph is a way to show quantitative data visually. It comprises two lines that intersect at a right angle: one horizontal (the x-axis) to show the values of the independent variable and one vertical (the y-axis) to show the dependent variable.
• The most common types of graphical presentation for qualitative or discrete data are bar and pie charts.
• Histograms, frequency polygons and smooth curves are used for continuous data or grouped data arranged in a frequency distribution.

1. The pie chart
• A pie chart is a simple, easily understood chart in which the size of the "slices" shows the proportional contribution of each component part.
• Each segment (slice) of a pie chart should be proportional to the frequency of the category it represents.
• A disadvantage of a pie chart is that it can only represent one variable. You will therefore need a separate pie chart for each variable you want to chart.

Figure (1): Causes of death worldwide (2014)

2. The simple bar chart
• An alternative to the pie chart for nominal variables is the bar chart.
• This is a chart with frequency on the vertical axis and category on the horizontal axis.
• A bar is drawn for each category, its length being proportional to the frequency in that category.
• The bars are separated by small gaps to indicate that the data are categorical or discrete.
• The simple bar chart is appropriate if only one variable is to be shown.

Figure (2): Stool examination in patients with chronic diarrhea (bar chart; vertical axis: %, from 0 to 70; categories: presence of amebic cysts, presence of undigested food, presence of fecal fat, positive FOBT, presence of giardiasis)

3. 
The multiple (grouped) bar chart
• If you have more than one group you can use the grouped bar chart.
• A grouped bar chart is used to illustrate data from two-variable or three-variable tables.
• Bars within a group are usually adjacent. The bars must be illustrated distinctively and described in a legend.

Figure (3): Multiple bar chart showing count of medicines dispensed by Mansoura University Hospitals (2012)

4. The component (stacked) bar chart
• Instead of appearing side by side, the bars are now stacked on top of each other.
• Stacked bar charts are appropriate if you want to compare the total number of subjects in each group.
• Categories of a second variable can be shown as components of the bars that represent the first variable.

Figure (4): Component bar chart showing blood groups of residents of some Egyptian cities

5. The histogram
• A histogram is a graph of the frequency distribution of a continuous variable.
• It uses adjoining columns to represent the number of observations for each class interval in the distribution.
• The area of each column is proportional to the number of observations in that interval.
• A histogram looks like a bar chart but without any gaps between adjacent bars, to emphasize the continuous nature of the variable.
• One limitation of the histogram is that it can represent only one variable at a time (like the pie chart).

Figure (5): Histogram for systolic BP among patients

6. Frequency polygon: The midpoints of the upper bases of the rectangles are connected together by a series of straight lines.

Figure (6): Frequency polygon of age frequency

7. 
Smooth curves: Smoothing the angles of the frequency polygon results in a smooth curve, which may be:
a) Symmetric distribution curve: normal (bell-shaped) curve
b) Skewed (asymmetric) distribution curve

8. Cartogram:
• Maps are used to show the geographic location of events or attributes.
• They can be used to show rates of disease or other health conditions in different areas by using different shades or colors.
• When choosing shades or colors for each category, ensure that the intensity of shade or color reflects increasing disease burden.

Figure (7): Cartogram for antibiotic resistance worldwide

9. Line graph: a line representing changes over time.

Figure (8): Line graph for admitted cases of COVID-19

10. Scatter diagram: represents the correlation between two continuous variables.

Medical Statistics II

III. Data Analysis
• Qualitative data are described as frequencies and proportions.
• Quantitative data need to be described by central tendency and dispersion.

• Measures of Central Tendency:
We calculate a measure of central location when we need a single value to summarize a set of epidemiological data. The three commonly used measures are: mean, median, and mode.

A. The arithmetic mean or average
• It is calculated by summing the values of the observations in the sample and then dividing the sum by the number of observations in the sample.
• It is the measure that is frequently reported for continuous variables: age, blood pressure, pulse, body mass index (BMI).
• In formulas, the arithmetic mean is usually represented as $\bar{x}$, read as "x-bar". The formula for calculating the mean from individual data is:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $x_i$ are the individual observations and $n$ is the number of observations.
• Although the mean is often an excellent summary measure of a set of data, the data must be approximately homogeneous.
• One disadvantage of the mean is that it is affected by the presence of one or a few observations with extremely high or low values.
• The mean is "pulled" away from the median in the direction of the extreme values.
• If the mean is higher than the median, the distribution of data is skewed to the right.
• If the mean is lower than the median, the distribution is skewed to the left.

Figure (1): Effect of skewness on the mean, median, and mode

B. The median
• It is the value that divides a set of data into two halves, with one half of the observations being larger than the median value and one half smaller, after all of the observations have been ordered from least to greatest.
• It is most useful for data with extreme values.
• For data sets with an odd number of observations, the position of the central observation is determined with the following formula:
$$\text{Median position} = \frac{n+1}{2}$$
where n = number of observations.
• For data sets with an even number of observations, the median is the average of the values of the observations at the following two positions:
$$\frac{n}{2} \quad \text{and} \quad \frac{n}{2}+1$$
where n = number of observations.
• The advantage of the median is that it is not affected by a few extremely high or low observations. Therefore, when a set of data is skewed, the median is more representative of the data than the mean.

C. The mode
• It is the most commonly occurring value among all the observations in the dataset.
• The mode is most useful in nominal or categorical data.
• The mode is the least useful measure of central location. Some sets of data have no mode; others have more than one.

• Example 1: A patient records his systolic blood pressure every day for one week. The values he records are as follows: Day 1: 98 mmHg, Day 2: 140 mmHg, Day 3: 130 mmHg, Day 4: 120 mmHg, Day 5: 130 mmHg, Day 6: 102 mmHg, Day 7: 160 mmHg.
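The three measures of central tendency for these seven readings can be cross-checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and are not part of the course material.

```python
import statistics

# Systolic blood pressure readings (mmHg) for days 1-7, taken from Example 1
readings = [98, 140, 130, 120, 130, 102, 160]

mean_bp = statistics.mean(readings)      # sum of the values / number of values
median_bp = statistics.median(readings)  # middle value after sorting
mode_bp = statistics.mode(readings)      # most frequently occurring value

print(f"Mean:   {mean_bp:.1f} mmHg")
print(f"Median: {median_bp} mmHg")
print(f"Mode:   {mode_bp} mmHg")
```

The script reports a median and a mode of 130 mmHg, which agrees with the hand calculation for this example; the mean is lower because of the two low readings (98 and 102 mmHg).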
*The arithmetic mean or average for these 7 observations is calculated as follows:
$$\bar{x} = \frac{98 + 140 + 130 + 120 + 130 + 102 + 160}{7} = \frac{880}{7} \approx 125.7 \text{ mmHg}$$
*In calculating the median, the values must be ordered from least to greatest: 98, 102, 120, 130, 130, 140, and 160. There are 7 observations in this dataset, an odd number. Therefore, the formula (n+1)/2 = (7+1)/2 = 4 is used to determine that the fourth observation will be the median. The value of the fourth observation is 130 mmHg. Therefore, the median is 130 mmHg.
*In the example, the mode is also 130 mmHg. This is the only value that occurs more than once; hence it is the most commonly occurring value.
It is interesting to note that if the dataset consists of a continuous variable with a unimodal and symmetric distribution, then the mean, median, and mode are the same.

• Measures of Dispersion (Spread)
These measures provide information regarding the relative position of other data points in the sample, i.e. they describe how much spread there is in the distribution. Such measures include the following: range, inter-quartile range, and standard deviation.
1. Range
• It is the difference between the largest (maximum) and smallest (minimum) values.
• In the statistical world, the range is reported as a single number, while in the epidemiologic community the range is often reported as two numbers (minimum and maximum).
• Example: If we have five persons aged 30, 34, 32, 36 and 28 years, the range is 36 - 28 = 8 years, or, reported as two numbers, 28-36 years.
2. Mean deviation
3. Variance (S²)
4. Standard Deviation (SD)
The most commonly used measures of dispersion are the variance and its related function, the standard deviation, both of which provide a summary of spread around the mean. The square root of the variance is called the standard deviation.

• Other Measures of Location: Percentiles, Quartiles, and Interquartile Range
Percentiles are the values which divide an ordered set of data into 100 equal-sized groups.
We can think of a percentile as the value in a set of data that has a given percentage of the observations at or below it; the maximum value, for example, has 100% of the observations at or below it and is the 100th percentile.
• From this same perspective, the median, which has 50% of the observations at or below it, is the 50th percentile.
• Suppose you have birth weights for 1200 infants, which you have put in ascending order. If you identify the birth weight that has 1 per cent of the birth weight values below it and 99 per cent above it, then this value is the 1st percentile. Similarly, the birth weight which has 2 per cent of the birth weight values below it and 98 per cent above it is the 2nd percentile.
• Sometimes, epidemiologists group data into four equal parts, or quartiles. Each quartile includes 25% of the data.
• The most commonly used percentiles are the 25th percentile (first quartile) and the 75th percentile (third quartile).
• The interquartile range represents the central portion of the distribution, and is calculated as the difference between the third quartile and the first quartile.

Figure (2): Percentiles, Quartiles, and Interquartile Range

Table (1): Advantages, disadvantages, and type of variables used with averages.
Type of average | Advantages | Disadvantages | Type of variable
Mean | Uses all the data values; algebraically defined and so mathematically manageable; known sampling distribution | Distorted by outliers; distorted by skewed data | Metric continuous; metric discrete
Median | Not distorted by outliers; not distorted by skewed data | Ignores most of the information; not algebraically defined; complicated sampling distribution | Ordinal; metric discrete and metric continuous if the distribution is markedly skewed
Mode | Easily determined for categorical data | Ignores most of the information; not algebraically defined; unknown sampling distribution | Nominal; ordinal; metric discrete

Table (2): Advantages, disadvantages, and type of variables used with spread.

Measure of spread | Advantages | Disadvantages | Type of variable
Range | Easily determined | Uses only two observations; distorted by outliers; tends to increase with increasing sample size | Ordinal; metric
Variance | Uses every observation; algebraically defined | Units of measurement are the square of the units of the raw data; sensitive to outliers; inappropriate for skewed data | Metric
Standard deviation | Same advantages as the variance; units of measurement are the same as those of the raw data; easily interpreted | Sensitive to outliers; inappropriate for skewed data | Metric

The Normal (Gaussian) distribution
Definition of normal curve: A normal distribution is the symmetrical clustering of values around a central location. This distribution represents most normal human biological phenomena and has the following properties:
1. Bell-shaped
2. The mean, median, and mode are equal and located at the center of the distribution
3. Uni-modal (only 1 mode)
4. Symmetrical about the mean
5. The curve is continuous, i.e., there are no gaps or holes.
For each value of X, there is a corresponding value of Y
6. The curve never touches the x-axis but gets increasingly closer to it
7. The total area under the normal distribution curve is equal to 100%
8. 50% of the area lies to the left of the mean and 50% of the area lies to the right of the mean
9. The x-axis is divided in units of standard deviation, covering approximately 3 standard deviations on either side of the mean
10. Mean ± 1 standard deviation covers 68.27% of the area, mean ± 2 standard deviations covers 95.45%, and mean ± 3 standard deviations covers 99.73%
Examples with approximate Normal distributions: Height, Weight, IQ scores, Body temperature
Figure (3): The relationship of the standard deviation and the mean to the normal curve
Figure (4): Normal curve, positive skewed and negative skewed curves

Common Frequency Measures to Summarize Information
We may encounter a number of other types of data in the medical field. These include ratios, proportions, and rates. All three frequency measures have the same basic form: (numerator / denominator) x 10^n
1. Ratio
Definition of ratio: A ratio is the relative magnitude of two quantities or a comparison of any two values. The numerator and denominator need not be related.
Method for calculating a ratio: (Number of events, items, persons,... in one group) / (Number of events, items, persons,... in another group)
For example, body mass index (BMI) is calculated as an individual's weight (kg) divided by his/her height squared (m²).
2. Proportion
Definition of proportion: A proportion is the comparison of a part to the whole.
It is a ratio in which the numerator is a subset (or part) of the denominator.
Method for calculating a proportion: (Number of persons or events with a particular characteristic / Total number of persons or events of which the numerator is a subset) x 10^n
For a proportion, 10^n is usually 100 (n = 2), so the result is expressed as a percentage; other multipliers such as 1,000 or 100,000 may also be used.
− "Per cent" means per hundred, so a percentage describes a proportion of 100. For example, 50% is 50 out of 100, or as a fraction 1⁄2. Other common percentages are 25% (25 out of 100, or 1⁄4) and 75% (75 out of 100, or 3⁄4).
− A proportion has no units.
− A proportion may be expressed as a fraction (takes on values between 0 and 1).
3. Rate
Definition of rate: In epidemiology, a rate is a measure of the frequency with which an event occurs in a defined population over a specified period of time. Rates are particularly useful for comparing disease frequency in different locations, at different times, or among different groups of persons with potentially different sized populations; that is, a rate is a measure of risk.
− A rate is a proportion per time.
• Example: Incidence rate of disease = (Number of new cases of disease during specified period / Population at risk during the specified period) x 10^n
N.B. All these variables can be treated as continuous variables for most analyses.

Exercise 1: Calculate the following measures:
1. A village of 40,000 persons has 20 pharmacies. Calculate the ratio of pharmacies per person.
2. Of the patients admitted to the surgical ward at Mansoura University Hospital during year 2014, 3,040 were non-diabetic and 190 were diabetic. Calculate the ratio of non-diabetic to diabetic patients.
3. Calculate the proportion of patients admitted to the surgical ward at Mansoura University Hospital during year 2014 who were diabetics.
Answer
1. The ratio of pharmacies per person = 20 / 40,000 x 10^4 = 0.0005 x 10^4 = 5 pharmacies per 10,000 persons
2. Ratio = 3,040 / 190 x 1 = 16:1
3. Proportion = (190 / 3,230) x 100 = 5.9%

Exercise 2: Calculate the following measure
In year 2005, there were an estimated 500,000 new cases of parasitic infections among 15,000,000 primary school children in some rural Egyptian villages. Calculate the incidence rate of new cases of parasitic infections.
Model Answer
The incidence rate = 500,000 / 15,000,000 = 0.0333 = 3.33%. In that year, about 33 of every 1,000 primary school children developed parasitic infections.

Exercise 3: In the National Health and Nutrition Examination Survey (NHANES), 7,000 persons were enrolled in a follow-up study. Enrollees were documented to have anemia or not. 500 participants were anemic; 300 of the anemic participants were females.
1. Calculate the proportion of anemic males among all participants in the study.
2. Calculate the ratio of males to females who were anemic.

Inferential Statistics
Hypothesis testing:
• The "null hypothesis" is a concept: the test method assumes (hypothesizes) that there is no (null) difference between the groups.
• The result of the statistical test either rejects or fails to reject that hypothesis.
• The null hypothesis is generally the opposite of what we are actually interested in finding out.
• If we are interested in whether there is a difference between two treatments, then the null hypothesis would be that there is no difference, and we would try to disprove this.
• Null hypothesis (H0): assumes that no differences exist between groups.
• Alternative hypothesis (H1): assumes that differences exist between groups.
Examples: What is the null hypothesis in a study?
(a) To find out whether use of a new surgical technique reduces rates of wound infection.
Answer:
• Null hypothesis: Rates of wound infection are the same with the new surgical technique as with the old surgical technique.
• Alternative hypothesis: Rates of wound infection are lower with the new surgical technique as compared to the old surgical technique.

The P (probability) value:
The P value is the probability of observing a difference at least as large as the one found if chance alone were operating, i.e., if the null hypothesis were true. It is used to judge how compatible the data are with the null hypothesis.
• P = 0.5 means that the probability of a difference this large having happened by chance is 0.5 in 1, or 50:50. A result is "statistically not significant" if P is more than 0.05.
• P = 0.05 means that the probability of a difference this large having happened by chance is 0.05 in 1 (1 in 20); this is the level usually quoted as being "statistically significant".
• P = 0.01 is often considered to be "highly significant". It means that a difference this large would have happened by chance only 1 in 100 times.
• P = 0.001 is usually considered to be "very highly significant". It means that a difference this large would have happened by chance only 1 in 1000 times.
• The lower the P value, the less likely it is that the difference happened by chance, and so the higher the significance of the finding.

Test of significance:
• In order to test a statistical hypothesis, tests of significance are used.
• These are statistical tests that enable us to decide whether or not to reject a hypothesis. The choice of test depends on whether the data are normally distributed (parametric tests) or not normally distributed (non-parametric tests).
• To select the proper test of significance, decide:
1. The type of variable used for comparison. If the variable is continuous, test for normality.
2. Whether the analysis is directed to compare groups or to correlate variables.
3. The number of groups to be compared.
4. Whether the studied variables are compared between different groups or within the same group.
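The four decisions above can be sketched as a small lookup. This is only an illustrative sketch: the test names are the ones listed in this section's summary table for quantitative data, while the function name `choose_test`, its argument names, and the `COMPARISON_TESTS` dictionary are hypothetical and introduced here for illustration. Whether the data count as normally distributed is left to the caller.

```python
# Hypothetical helper: maps the four decisions to a test name.
# Test names follow this section's summary table for quantitative data.

# Key: (more than two groups/periods?, measurements within the same group?)
# Value: (parametric test, non-parametric test)
COMPARISON_TESTS = {
    (False, False): ("Independent t test (Student's t test)", "Mann Whitney U test"),
    (False, True): ("Paired t test", "Wilcoxon signed rank test"),
    (True, False): ("One Way ANOVA test", "Kruskal Wallis test"),
    (True, True): ("Repeated Measures ANOVA", "Friedman test"),
}


def choose_test(normal, goal="compare", n_groups=2, within_same_group=False):
    """Return the name of a test of significance for a quantitative variable.

    normal            -- True if the data are treated as normally distributed
    goal              -- "compare" (groups) or "correlate" (two variables)
    n_groups          -- number of groups, or of periods within the same group
    within_same_group -- True for paired / repeated measurements
    """
    if goal == "correlate":
        return "Pearson correlation" if normal else "Spearman correlation"
    if goal != "compare":
        raise ValueError("goal must be 'compare' or 'correlate'")
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups or periods")
    parametric, non_parametric = COMPARISON_TESTS[(n_groups > 2, within_same_group)]
    return parametric if normal else non_parametric


# Two different groups, skewed data
print(choose_test(normal=False, n_groups=2))
# Three different groups, normally distributed data
print(choose_test(normal=True, n_groups=3))
```

The lookup covers quantitative variables only; for qualitative variables the section lists the Chi square and McNemar tests separately.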
Example: Suppose we have some length of hospital stay data with a mean stay of 10 days and an SD of 8 days. Are the mean and SD appropriate measures to use? No. As a rule of thumb, for data to be treated as parametric the SD should be less than 1/3 of the mean; here the SD (8 days) is well above 1/3 of the mean (about 3.3 days), so the data are not parametric.

Test of significance
1. Quantitative data:
Type of data | Normal data (SD < 1/3 of mean) | Non-normal, skewed data (SD > 1/3 of mean)
Summary | Mean ± SD | Median (Min-Max) / Median (Range) / Median (IQR)
Test used | Parametric tests | Non-parametric tests
• Two different groups | Independent t test (Student's t test) | Mann Whitney U test
• Two paired measurements within the same group | Paired t test | Wilcoxon signed rank test
• More than two different groups | One Way ANOVA test | Kruskal Wallis test
• More than two periods within the same group | Repeated Measures ANOVA | Friedman test
• Correlation | Pearson correlation | Spearman correlation

2. Qualitative data:
Summary: Frequency (proportion) / no (%)
Name of test | Application
Chi square test (χ²) | Measures association between qualitative variables.
McNemar test | Used for paired comparison for dichotomous variables.

Training:
Serum cholesterol was determined for 2 groups. The mean of the first group was 220 mg% and S.D. 10. The mean of the second group was 180 mg% and S.D. 15. After using a suitable test, the calculated p value was 0.03.
1. Define the type of variable (serum cholesterol).
2. Mention the method of summarizing data for this variable.
3. Determine a suitable graphical presentation.
4. Define the null hypothesis and the alternative hypothesis.
5. Define the appropriate test of significance.
6. Comment on the p value.
In a case control study examining the association between breast cancer and family history, decide the appropriate test of significance to compare family history of breast cancer among the breast cancer and control groups.
A group of anemic children were given suitable supplements and assessed for reduction of anemia after 6 months. Decide a suitable test to compare the presence of anemia before and after treatment.
Look at the following table and answer the questions:
Demographic data | Iron deficiency anemia group (n=130) | Control group (n=130) | Test of significance | P value
Age (years), Mean ± SD | 15.89±3.20 | 16.60±2.37 | ………… | 0.106
Gender: Male | 73 (56.2%) | 38 (54.3%) | …………. | 0.800
Gender: Female | 57 (43.8%) | 32 (45.7%) | |
• Mention the test used and comment on the p value.

Table (2): Serum ferritin level among patients and control groups
Variables | Iron deficiency anemia group (n=130) | Control group (n=130) | Test of significance | P value
Ferritin | 9.42±5.96 | 44.16±31.12 | ………… | 0.00
• Is it right to use Mean ± SD for summary of the data or not?
• Mention a suitable test to be used for comparison.
• Comment on the p value.
• If we want to correlate ferritin level with hemoglobin level, what is the appropriate test to be used?

Table (3): HGB and Lymphocytic count among the studied groups
Variable | A/A (n=240) | A/G (n=136) | G/G (n=24) | Test of significance | P value
HGB, Mean ± SD | 12.15±1.75 | 12.48±2.01 | 11.78±1.2 | …………… | 0.17
Lymphocyte, Median (IQR) | 1200 (400-22000) | 1500 (572-3600) | 1795 (572-2600) | …………… | 0.048

Table (4): Correlation between Cholesterol level and other variables
Variable | Cholesterol level ………. | P value
Fasting blood sugar | 0.130 | 0.06
HBA1c | 0.757 | 0.002
Ca level | -0.640 | 0.003
• Mention the test used and comment on the p value.

Scientific Writing
What is a scientific paper?
• A scientific research paper is a written and published report describing the results of scientific research.
Motivation (Why publish?)
1. Dissemination of knowledge to many audiences.
2. Scientific research is a mandatory requirement for promotion in academic jobs.
3. Career development and ego (self-esteem).
4. Improved funding.
5. Patent protection (copyright).
Key Elements of Publishing:
1. Ethical issues.
2. Writing style and language.
3. Structure and components of the paper.
4. Journal selection and article submission.
5. Publishing process and peer review.
A standard structure of the article:
Section | Purpose
Title | Clearly describes contents
Author & affiliations | Ensure recognition for the writer
Abstract | Summarizes what was done
Key words | Ensure that the article is correctly identified in abstracting and indexing services
Introduction | Explains the problem (What was the question asked?)
Methods | Explain how the data were collected and the study was conducted
Results | Describe what was discovered (What were the findings?)
Discussion | Discusses and explains the findings and their implications
Conclusion | A "take home message" to the readers
Acknowledgement | Thanks those who helped in the research
References | Ensure the recognition of previously published work
Figures and Tables | Objectively present the study findings

I- Title:
The title is the key part of the article. It should be designed to advertise the paper and engage the reader's attention at first sight, and it should be written after the main article has been completed. It describes the paper's content clearly and precisely, including the keywords. It should be specific, brief, clear, simple, precise, and catchy. The title generally should not exceed 150 characters or 12–16 words. It includes information that helps in the electronic retrieval of the article.
• Avoid abbreviations/numerical parameters in the title: as a rule, abbreviations are not used in the title, but if for some reason commonly used abbreviations are used in the title, they should be defined in the abstract.
• The following format can be used as a guide for writing a title:
- Research question + research design + population + geographic area of study (what, how, with whom, where).
- There is no full stop at the end of the title.
- For example, “Prevalence of iron deficiency anemia before and after food fortification with iron in a rural community in North India, a randomized controlled trial” (23 words, 147 characters with spaces).

Types of titles:
1- Declarative Titles: These types are the most appropriate for research articles; they declare the main findings, reveal the conclusion stated in the paper, and convey the most information.
For example, “Food fortification decreases the prevalence of iron deficiency anemia in rural India”
2- Descriptive Titles: These types are commonly used; they describe the subject of the research without revealing the conclusions. A descriptive title includes the relevant information of the research hypothesis that is studied, e.g., participant, intervention, control, and outcome (PICO).
A descriptive title has certain advantages:
- The reader will get snapshot information about the contents of the article.
- It contains important “keywords,” which increases the probability of the article being discovered by the search engines.
- Unlike a declarative title, the conclusions are not revealed, which helps to sustain a reader’s curiosity.
For example, “Effect of food fortification on the prevalence of iron deficiency anemia in rural India”
3- Interrogative Titles: An interrogative title poses the subject of research as a question. These titles are more appropriate for literature reviews.
For example, “Does food fortification decrease the prevalence of iron deficiency anemia in rural India?”

Running Title
Short title, which should not exceed 60 characters (including spaces).
Main Title: “Prevalence of hookworm infestation among school-going children in rural North India” (11 words, 86 characters with spaces).
Running title: “Hookworm infestation among school-going children in rural India” (8 words, 59 characters with spaces).

II- Authors
Author names and order of authorship: Full and accurate names of all the authors.
Author affiliations: Each author’s highest academic designation, department, and institution.
The corresponding author: He/she must be identified, with his/her e-mail, fax, mailing address, and telephone number. Only the corresponding author has the right to withdraw, correct, or make changes in the manuscript. Also, the corresponding author is the one responsible for responding to the readers’ queries o
