Epidemiology Study Designs
Summary
This document outlines the three main study designs used in epidemiology: cohort, case-control, and cross-sectional studies. Each design is described, including its purpose, advantages, and limitations. The document also explains how these designs can help researchers understand the development and distribution of diseases in a population.
CHAPTER 13
Epidemiology: the study of the distribution of disease.

Three main study designs in epidemiology:

1. Cohort – Begin with exposure and follow for outcome
○ Participants are classified as exposed or not exposed and then followed over time.
○ Usually gives you the incidence rate. You can also estimate prevalence, but that is less common.
○ Can use past (historical, retrospective) data or future (prospective) data.
○ Can handle multiple outcomes for the same exposure. For example, exposure: smoking; possible outcomes: lung cancer, chronic obstructive pulmonary disease (COPD), heart disease, stroke.
○ Best suited for rare exposures.
○ You can get INCIDENCE DATA and RELATIVE RISK from a cohort study.
Limitations:
○ Expensive and time-consuming, particularly for prospective studies.
○ Loss to follow-up can introduce bias.
○ Not efficient for rare outcomes (a large sample size is needed).
○ Requires careful control of confounding factors.
Study Design for Cohort Study:
1. Identify individuals based on exposure status (exposed and non-exposed groups).
2. Follow them over time to observe the development of outcomes (e.g., a disease or condition).
3. Record the number of new cases in each group.
4. Compare the incidence rates and calculate the relative risk (RR) to determine the strength of association between exposure and outcome.
Cohort study → incidence rate → relative risk

2. Case-Control – Begin with outcome and look back for exposure
○ Backward in time.
○ Relies on memory, which can be fuzzy.
○ Example: a drug developed for nausea (thalidomide) caused birth defects in children, whose limbs were short or not fully formed. This was investigated with a case-control study.
○ Start with people who have the disease (cases) and people who do not (controls), and find out whether each was exposed.
○ Best suited for rare outcomes.
○ You can calculate the ODDS RATIO with a case-control study.
○ Can study multiple risk factors (exposures) for the same disease.
Study Design: Identify individuals with the disease (cases) and compare them to individuals without the disease (controls). Determine past exposure status for both groups.
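Both the cohort and the case-control designs end up summarized as counts in a 2x2 table (disease in the columns, exposure in the rows, as set up later in these notes). The sketch below shows that tallying step. It assumes a hypothetical record format, a list of `(exposed, diseased)` boolean pairs; the function name `two_by_two` is illustrative, not from any library.

```python
# A minimal sketch: tally individual-level study records into the 2x2 cells.
# The (exposed, diseased) record format is a hypothetical illustration.

def two_by_two(records):
    """Return the cell counts (A, B, C, D).

    Rows are exposure (exposed first); columns are disease (disease first):
        A = exposed, disease      B = exposed, no disease
        C = unexposed, disease    D = unexposed, no disease
    In a cohort study 'diseased' means a NEW case during follow-up;
    in a case-control study it means the person is a case.
    """
    a = b = c = d = 0
    for exposed, diseased in records:
        if exposed and diseased:
            a += 1
        elif exposed:
            b += 1
        elif diseased:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical toy data: 3 exposed people (2 became cases), 4 unexposed (1 case)
toy = [(True, True), (True, True), (True, False),
       (False, True), (False, False), (False, False), (False, False)]
print(two_by_two(toy))  # (2, 1, 1, 3)
```

The same function serves both designs; what differs is how people were selected (by exposure in a cohort, by disease status in a case-control study), which is why only a cohort table supports incidence and relative risk.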
Limitations:
○ Relies on recall (fuzzy memory) or past records, which can introduce recall bias.
○ Difficult to establish causation due to its retrospective nature.
○ Cannot directly measure incidence or prevalence.
Analysis: The odds ratio (OR) is the primary measure of association. It compares the odds of exposure among cases with the odds of exposure among controls, estimating the strength of association between exposure and outcome.

3. Cross-Sectional – Assess exposure and outcome at the same time
○ Quick: you survey a sample of the population, people with and without the disease, at a single point in time. It is often the first study done.
Advantages: Quick and inexpensive. Useful for generating hypotheses.
Limitations:
○ Cannot establish causation (the temporal sequence is unclear). Because exposure (e.g., smoking) and outcome (e.g., lung disease) are measured at a single point in time, you don't know whether the exposure preceded the outcome or vice versa, so you can't determine a cause-and-effect relationship.
○ Measures prevalence, not incidence.
Use: Often the first step in research; if associations are found, more in-depth studies (e.g., cohort or case-control) are conducted.

Case-control example: thalidomide and birth defects
This design is suitable because birth defects are rare outcomes.
○ Identify cases (children with birth defects such as shortened or absent limbs).
○ Identify controls (children without birth defects).
○ Investigate the mothers' drug use during pregnancy (exposure).
○ Outcome: A strong association was found between thalidomide use and birth defects.
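Returning to the cross-sectional design: it measures prevalence, not incidence. Prevalence counts everyone who currently has the condition (old and new cases) at the time of the survey, whereas incidence counts only new cases among people at risk. A minimal sketch, with a hypothetical function name and made-up survey numbers used only for illustration:

```python
# A minimal sketch of point prevalence, the main measure from a
# cross-sectional study. Function name and inputs are illustrative.

def prevalence_percent(existing_cases, population_surveyed):
    """Point prevalence (%): everyone who currently has the condition,
    old and new cases alike, divided by everyone surveyed."""
    if population_surveyed <= 0:
        raise ValueError("population_surveyed must be positive")
    if not 0 <= existing_cases <= population_surveyed:
        raise ValueError("existing_cases must be between 0 and population_surveyed")
    return existing_cases * 100 / population_surveyed


# Hypothetical survey: 30 of 600 respondents currently have the condition.
print(prevalence_percent(30, 600))  # 5.0
```

Because a snapshot cannot separate new cases from long-standing ones, a cross-sectional study cannot yield incidence, and therefore cannot yield relative risk.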
KEY DIFFERENCES

| Feature | Cross-Sectional Study | Case-Control Study | Cohort Study |
|---|---|---|---|
| Starting point | Assess exposure and outcome at the same time | Start with the outcome (disease) and look back for exposure | Start with the exposure and follow for outcomes |
| Direction of study | Snapshot in time (no time element) | Retrospective (backward in time) | Prospective (forward in time) or retrospective |
| Key measure | Prevalence, Odds Ratio (OR) | Odds Ratio (OR) | Incidence, Relative Risk (RR), Attributable Risk (AR) |
| Best for | Estimating disease prevalence or generating hypotheses | Rare outcomes (e.g., rare diseases) | Rare exposures; studying multiple outcomes for one exposure |
| Example | Survey on smoking and lung cancer prevalence at a point in time | Thalidomide use during pregnancy and birth defects | Smoking and the risk of developing lung cancer over 10 years |
| Population | Entire population or sample | Cases (disease) and controls (no disease) | Exposed and non-exposed groups |
| Advantages | Quick and inexpensive; provides a snapshot of disease burden | Efficient for rare diseases; can study multiple risk factors for one disease | Directly measures incidence and risk; can study multiple outcomes for one exposure |
| Disadvantages | Cannot establish causation; limited to prevalence (not incidence) | Recall and selection bias; cannot directly measure incidence | Time-consuming and expensive; requires a large sample size for rare outcomes |
| Cost/time | Low | Moderate | High |

Five main statistics we will calculate:
1. Incidence
2. Relative Risk
3. Attack Rate
4. Attributable Risk
5. Odds Ratio

Incidence: The number of new cases in a particular time frame.
Incidence rate = (number of new cases during the time frame / population at risk) x 100
The only study design that gives you incidence data is a cohort study. You can calculate a relative risk when you have incidence data.
Key Strengths of Cohort Studies for Incidence:
○ Tracks the population at risk to directly observe the development of new cases.
○ Allows assessment of the temporal relationship (exposure precedes outcome).
○ Can calculate both absolute risk and relative risk.

First step: Create a 2x2 table

| | Disease | No Disease | Totals |
|---|---|---|---|
| Exposed | A = 40 | B = 7 | A+B = 47 |
| Not exposed | C = 5 | D = 153 | C+D = 158 |
| Totals | A+C = 45 | B+D = 160 | A+B+C+D = 205 |

How you set up this table is critical: put the disease outcome in the columns, with disease first, and the exposure in the rows, with exposed first.

Calculating Incidence
Incidence rate = (A+C) / (A+B+C+D) x 100

Determining Incidence: Cohort study 2x2
Remember, cohort studies do not include anyone who already has the outcome of interest, so those diagnosed with the disease during the course of the study are INCIDENCE cases. Therefore, those in boxes A and C are the incidence cases in this example. The incidence rate for the study is ((40 + 5) / 205) x 100 = 21.95%. This tells you that about 22% of the study population developed the outcome of interest during the study.

Relative Risk:
○ Uses a cohort study.
○ Question: how much more (or less) likely is the outcome in the exposed group than in the unexposed group?
Relative risk is a number that tells us how much more (or less) likely a certain outcome (like getting a disease) is in people who are exposed to something (like smoking) compared with those who are not exposed. It is also called the risk ratio.

$$RR = \frac{\text{Incidence rate in the exposed group}}{\text{Incidence rate in the unexposed group}} = \frac{A/(A+B)}{C/(C+D)}$$

Interpretation of RR:
○ RR = 1: No difference in risk between the exposed and non-exposed groups (no association).
○ RR > 1: The exposure increases the risk of the outcome (positive association). Example: RR = 2 means the exposed group is 2 times as likely to develop the outcome as the non-exposed group.
○ RR < 1: The exposure decreases the risk of the outcome (protective effect). Example: RR = 0.5 means the exposed group is 50% less likely to develop the outcome than the non-exposed group.
Why Do We Need It?
○ To see whether an exposure increases or decreases the risk of an outcome.
○ Helps identify harmful behaviors (e.g., smoking increases lung cancer risk) or protective factors (e.g., exercise lowers heart disease risk).
○ Guides decisions in healthcare and public health policy.
Example: If smokers are 4 times as likely to get lung cancer as non-smokers, the relative risk is 4. This tells us smoking is strongly linked to lung cancer.

Calculating RR from a 2x2 table from a cohort study

| | Disease | No Disease | Totals |
|---|---|---|---|
| Exposed | 40 | 7 | 47 |
| Not exposed | 5 | 153 | 158 |
| Totals | 45 | 160 | 205 |

RR = incidence rate in the exposed group / incidence rate in the unexposed group

$$RR = \frac{A/(A+B)}{C/(C+D)} = \frac{40/47}{5/158} \approx \frac{0.851}{0.0316} \approx 26.9$$

Round only at the end: rounding the two incidences to 0.85 and 0.03 first would give 0.85 / 0.03 = 28.3.

What does this mean?
RR ≈ 27. If your p-value is significant, this means the exposed group was about 27 times as likely to get sick as the unexposed group. Because you have incidence data, you can also say the exposure occurred before the disease. In other words, a person exposed to this risk factor is about 27 times as likely to develop the outcome of interest (disease). This exposure would then be considered a risk factor for the disease, and the incidence data help establish a stronger link to causality.
With incidence data from a cohort study, you have stronger evidence for causality because:
○ Exposure precedes the outcome.
○ There is a strong association (RR ≈ 27).
○ A significant p-value indicates the association is unlikely to be due to chance alone.
In a cohort study, we use incidence data (new cases), and the exposure is observed before the outcome. This establishes a temporal relationship and supports an association between the exposure and the outcome.
In a case-control study, we start with the outcome and look back at exposure history. Because the outcome is already present, it is harder to determine whether the exposure caused the outcome or whether other factors were involved, which makes causation difficult to establish.
Why the p-value is important in relative risk:
○ It indicates whether the observed RR is likely to reflect a real association rather than chance.
○ Without a significant p-value, the observed RR may simply be due to chance.
○ A significant p-value, along with a high RR (e.g., RR = 4), strengthens the evidence for causation.
RR less than 1 and p-value less than alpha: reject the null hypothesis (the exposure is a significant protective factor).
If RR = 0.5 and p […]
OR > 1 indicates risk; OR < 1 indicates protection.
Significance: Confidence intervals determine whether the association is statistically significant.

Summary Table for Odds Ratio and p-Value Scenarios:

| Odds Ratio | p-Value (significant if < 0.05) | Interpretation |
|---|---|---|
| OR > 1 | Significant | Increased odds of the outcome; meaningful association. |
| OR > 1 | Not significant | Increased odds, but possibly due to chance. |
| OR < 1 | Significant | Reduced odds of the outcome; protective association. |
| OR < 1 | Not significant | Reduced odds, but possibly due to chance. |
| OR ≈ 1 | Significant | No practical difference, but statistically significant. |
| OR ≈ 1 | Not significant | No association between exposure and outcome. |

95% Confidence Intervals
95% confidence limits (CL) give the range within which the researcher is 95% confident the population parameter falls. For example, suppose a study reports an RR of 28 (95% CL 24-29.2). The sample indicates an RR of 28, and the researcher is 95% confident that the true RR in the population is between 24 and 29.2.
For differences (e.g., Mean Difference, Risk Difference), significance is determined by whether the CI includes 0.
○ Why? For differences, 0 means no difference between groups: there is no effect or no change between the groups being compared.
○ If the CI does not include 0, the result is statistically significant.
The CI also reflects the precision of the estimate:
○ Narrow CI: the estimate is precise, with less variability.
○ Wide CI: the estimate is less precise, with more variability.
Determining significance of RR/OR from Confidence Limits
○ If the full range of the CL is above 1, the exposure is a significant risk factor for the disease (e.g., asbestos).
○ If the full range of the CL is below 1, the exposure is a significant protective factor for the disease (e.g., a vaccine).
○ If the CL includes 1 anywhere in its range, the association is not significant, because an RR/OR of 1 means the disease occurs at the same level in the exposed group as in the unexposed group (e.g., 95% CL 0.89-2.45). This means the p-value is > alpha (not significant).
Statistical Significance: If the CI does not include 1 (for RR or OR), the result is statistically significant. In the RR = 28 example above (95% CL 24-29.2), the whole CI is well above 1, so the association between the exposure and the outcome is significant.

Example from the Nurses' Health Study
The overall relative risk of major coronary disease in women currently taking estrogen was 0.56 (95 percent confidence interval, 0.40 to 0.80). What does this mean? The whole interval lies below 1, so in this study current estrogen use was associated with a significantly lower risk (about 44% lower) of major coronary disease.

Simplified Steps to Decide Which Rule to Follow
1. What are you comparing?
○ If it's a ratio (e.g., Relative Risk, Odds Ratio): use 1.
○ If it's a difference (e.g., Mean Difference, Risk Difference): use 0.
2. Check the confidence interval (CI):
○ For ratios (RR, OR): if the CI does not include 1, the result is significant.
○ For differences: if the CI does not include 0, the result is significant.
3. Examples:
○ Ratio question: "Does smoking increase the risk of lung cancer?" Rule: the CI should NOT include 1. RR = 3.5, 95% CI: 2.5-4.2 → significant (doesn't include 1).
○ Difference question: "Does a new drug lower blood pressure?" Rule: the CI should NOT include 0. Mean Difference = -10, 95% CI: -12 to -8 → significant (doesn't include 0).
Quick Rule to Remember:
○ Ratios (RR, OR): check for 1.
○ Differences: check for 0.
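The confidence-interval rules above are mechanical enough to write as a small function. This is a sketch, assuming the interval limits are already computed; the function name `ci_verdict` and its labels are illustrative. An interval whose limit sits exactly on the null value is treated here as including it (not significant). The examples are the intervals quoted in these notes.

```python
# A minimal sketch of the decision rule: ratios (RR, OR) check for 1,
# differences (mean difference, risk difference) check for 0.

NULL_VALUE = {"ratio": 1.0, "difference": 0.0}


def ci_verdict(lower, upper, kind):
    """Classify a confidence interval using the notes' rules."""
    if kind not in NULL_VALUE:
        raise ValueError("kind must be 'ratio' or 'difference'")
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    null = NULL_VALUE[kind]
    if lower <= null <= upper:          # interval contains the "no effect" value
        return "not significant"
    if kind == "ratio":
        # Entirely above 1 = risk factor; entirely below 1 = protective factor
        return "significant risk factor" if lower > null else "significant protective factor"
    return "significant difference"


examples = [
    ("RR 3.5, smoking and lung cancer", 2.5, 4.2, "ratio"),
    ("RR 0.56, estrogen and coronary disease", 0.40, 0.80, "ratio"),
    ("RR/OR with 95% CL 0.89-2.45", 0.89, 2.45, "ratio"),
    ("Mean difference -10, new drug", -12, -8, "difference"),
]
for label, lo, hi, kind in examples:
    print(f"{label}: {ci_verdict(lo, hi, kind)}")
```

Keying the null value on the kind of comparison, rather than on the measure's name, mirrors step 1 of the simplified steps: decide whether it is a ratio or a difference first.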
Summary of Formulas

$$\text{Incidence} = \frac{A+C}{A+B+C+D} \times 100$$

$$\text{Relative Risk} = \frac{A/(A+B)}{C/(C+D)}$$

$$\text{Attack Rate} = \frac{A}{A+B} \times 100$$

$$\text{Attributable Risk}_{\text{exposed}} = \frac{A}{A+B} - \frac{C}{C+D}$$

$$\text{Odds Ratio} = \frac{A/C}{B/D} = \frac{AD}{BC}$$

Important things to remember
- The purpose of the odds ratio is to measure the relationship between an exposure and an outcome, particularly in studies where incidence or relative risk cannot be calculated directly. It is especially valuable in case-control studies and for analyzing rare outcomes.
- RR is interpreted by checking whether it is lower than 1 (protective factor), higher than 1 (risk factor), or equal to 1 (the factor has no effect on the outcome).
- A confidence interval for an RR or OR is interpreted by checking whether it includes 1.
- A confidence interval for any kind of mean difference (e.g., from a t-test) is interpreted by checking whether it includes 0.
- Cohort study → incidence data → relative risk, attack rate, attributable risk (all of these use incidence data).
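To tie the formulas to the worked example, the sketch below computes all five statistics from the cohort 2x2 table in these notes (A = 40, B = 7, C = 5, D = 153). The function names are hypothetical; each body follows the formula summary, with the attack rate generalized to take any group's cases and number at risk. The odds ratio is computed on this cohort table purely as arithmetic. Because the outcome here is common (about 22% incidence), the OR (about 175) is far larger than the RR (about 27); the OR approximates the RR only when the outcome is rare.

```python
# A minimal sketch of the five statistics in "Summary of Formulas".
# Function names are hypothetical; A, B, C, D follow the notes' layout
# (exposure in rows, exposed first; disease in columns, disease first).

def incidence_percent(a, b, c, d):
    """New cases / total population at risk x 100 (cohort data)."""
    return (a + c) / (a + b + c + d) * 100


def relative_risk(a, b, c, d):
    """Incidence in the exposed divided by incidence in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def attack_rate_percent(cases, at_risk):
    """Share of a group that became ill x 100 (exposed row: A and A+B)."""
    return cases / at_risk * 100


def attributable_risk(a, b, c, d):
    """Excess risk in the exposed: exposed incidence minus unexposed incidence."""
    return a / (a + b) - c / (c + d)


def odds_ratio(a, b, c, d):
    """(A/C) / (B/D), which simplifies to the cross-product AD / BC."""
    return (a * d) / (b * c)


# Counts from the cohort 2x2 table in these notes
A, B, C, D = 40, 7, 5, 153

print(f"Incidence:            {incidence_percent(A, B, C, D):.2f}%")  # 21.95%
print(f"Relative risk:        {relative_risk(A, B, C, D):.1f}")       # 26.9
print(f"Attack rate, exposed: {attack_rate_percent(A, A + B):.1f}%")  # 85.1%
print(f"Attack rate, unexp.:  {attack_rate_percent(C, C + D):.1f}%")  # 3.2%
print(f"Attributable risk:    {attributable_risk(A, B, C, D):.3f}")   # 0.819
print(f"Odds ratio:           {odds_ratio(A, B, C, D):.1f}")          # 174.9

# Rounding the two incidences before dividing inflates the RR:
print(f"{round(A / (A + B), 2) / round(C / (C + D), 2):.1f}")  # 28.3
```

The attributable risk of about 0.819 means roughly 82 extra cases per 100 exposed people beyond what would be expected without the exposure.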