HS2801A Session 7: Cohort, Case-control Studies
Western
2024
HS2801A
Dr. Afshin Vafaei
Summary
These notes cover cohort and case-control studies from a session on November 1, 2024. The document also includes information about the midterm and final exams.
Full Transcript
Session 7
Design 2: Cohort, Case-control Studies
November 1, 2024
HS2801A: Research Methods in Health Sciences, Fall 2024
Dr. Afshin Vafaei
Department of Health Studies

Announcement
Your midterm marks are on OWL:
- Total out of 75
- Fill in the blanks/Calculation (out of 25)
- MCQ (out of 50)
We will review the exam questions today from 12 to 12:30, in the class.
If you have specific questions: in-person office hours today from 1 to 3 (see the announcement), HSB Room 215.
Online meetings can't be arranged.

Final Exam
Friday, December 13, 2024, 7:00 pm to 10:00 pm
Location, last name from, to:
Alumni Hall 15     ABOATIAAFAUZY ASHBY
Alumni Hall 201    ATTARIA ZHANG
The common make-up exam date: TBD (January 2025)

Today's Class
Epidemiologic studies (population health & clinical)
  Observational studies
    Descriptive studies: case reports, case series, ecological, cross-sectional
    Analytic studies: ecological, cross-sectional, case-control, cohort
  Experimental studies: randomized controlled trial, community trial
Biological (basic medical) studies: physiological, direct cell observation, test tube

Observational Studies
Cohort Studies

Cohort Studies
Involve the formation of a cohort, which is a group of individuals followed over time.
1. Those who enter the cohort should be outcome free
2. Selecting exposed and comparison groups
3. Following up participants
4. Determining outcome status
(Origin of the word "cohort": one of the ten divisions in an ancient Roman legion.)
NOTE: The exposure is determined before the outcome happens.

The Big Picture of a Cohort Study
[Diagram: an exposed cohort and an unexposed cohort are followed from the start of the study into the future, and the incidence of disease is compared. Timeline: Past, Start of Study, Future.]
- More expensive, time consuming
- Not efficient for diseases with long latent periods
- Better exposure and confounder data
- Less vulnerable to information bias

Schematic of a Cohort Study
[Figure]

Using a Real Study to Explore General Features of Cohort Design
Purpose: To examine the relationship between working night shifts and breast cancer.
Participants: 78,562 female nurses aged 30-55 years with no history of cancer.
Exposure: Years of night shift working obtained by questionnaire in 1988.
Outcome: Incident cases of breast cancer between 1988 and 1998. Non-fatal cases were identified using medical records. Fatal cases were identified in the National Death Index and by contacting next of kin. Deaths due to other causes were also identified.
Source: Schernhammer et al. Rotating Night Shifts and Risk of Breast Cancer in Women Participating in the Nurses' Health Study. Journal of the National Cancer Institute 2001;93:1563-8

Example # 2
[Timeline figure; dates shown: Nov. 1, 2022; Nov. 1, 2034; Nov. 1, 2014; Nov. 1, 2024]

Prospective vs. Retrospective
In both, the exposure is determined before the outcome happens.
Advantage of retrospective vs. prospective?
Advantage of prospective vs. retrospective?

Retrospective Cohort Study
[Diagram: exposed and unexposed cohorts compared on incidence of disease. Timeline: Past, Start of Study, Future.]
- Cheaper, faster, efficient for diseases with a long latent period
- Exposure and confounder data may be inadequate
- More vulnerable to bias
- Needs an established recording system (military, occupational, birth cohorts)

Two Questions: Does an Occupational Chemical Exposure Cause Rash? Cause Cancer?
Assume the occupational health office of a factory records (and keeps) history of exposure and rashes.
Available in records

Incidence (rate) Ratio: Relative Risk in Cohort Studies
A comparison of the incidence of a characteristic in two independent populations (or independent subpopulations), calculated by taking the ratio of their incidence rates
- Men versus women populations
- Exposed versus unexposed
A measure of the risk of the exposure: the association between exposure and outcome

                              Developed Disease
                              Yes    No
Exposed to risk factor        a      b
Not exposed to risk factor    c      d

Incidence rate ratio = [a / (a + b)] / [c / (c + d)]

Example #1: Risk of respiratory infection in plant workers exposed to pesticide

                              Developed Disease
                              Yes       No         Total
Exposed to risk factor        a = 48    b = 206    254
Not exposed to risk factor    c = 9     d = 112    121

Incidence rate ratio = [a / (a + b)] / [c / (c + d)] = [48/(48+206)] / [9/(9+112)] = 0.189 / 0.074 = 2.54
(the risk in exposed workers is about 2.5 times the risk in unexposed workers, i.e. 254% of it, or a 154% increase)

Time of measurement of the outcome is a crucial feature of cohort studies

Common Baseline Time Point, Common Outcome Assessment Time
[Figure: participants followed from 2014 to 2024; at the common assessment time each is classified as "breast cancer" or "no outcome".]

Common Baseline Time Point, Outcome Measured Throughout Follow-Up
[Figure: participants followed from 2014 to 2024; follow-up ends as "no outcome", "developed breast cancer", "no outcome but died", "no outcome but lost to follow-up", or "developed breast cancer, died".]

Rolling Entry Point (variable baseline), Outcome Measured Throughout Follow-Up
[Figure: participants enter at different times between 2014 and 2024, with the same end-of-follow-up categories as above.]
Concept of person-time:
- People are in the study for varied durations
- Different times under exposure
- Contribute differentially to the study

Example: A prospective cohort study with a rolling entry that started in 2014 and finished in 2024.
What are the person-years of follow-up for breast cancer for each participant, and the total PTU?

Participant #                          1       2       3       4       5
Year Entered Study                     2014    2014    2014    2015    2016
Year B.C. Developed                    N/A     2018    2017    N/A     N/A
Year Died                              N/A     N/A     2019    2020    N/A
Person-years of follow-up for B.C.     10      4       3       5       8

Total PTU = 30 years

Cohort studies
Advantages:
- Valuable when exposure is rare
- Can examine multiple effects of a single exposure
- Easier to determine the temporal relationship between exposure and outcome
- Allows measurement of incidence
Disadvantages:
- Validity affected by losses to follow-up (selection bias)
- Other factors may not be distributed evenly between exposure groups (confounding)
- Inefficient for evaluation of rare diseases
- Can be expensive and time-consuming (need for large numbers and long follow-up)
- If retrospective, they require good records

Observational Studies
Case-control Studies

Recall Cohort Study Limitations
An excellent observational design with many strengths but also certain limitations:
- May need large numbers of subjects to be followed for long periods of time, so logistically difficult, time consuming, expensive (especially prospective cohort studies)
- Loss to follow-up has potential to undermine validity
- Not good for rare diseases or those with long latency
- Not good when exposure data are expensive to obtain
What to do in these situations? Conduct a case-control study!

My Thoughts …
You can answer a study question about a relationship between an exposure and an outcome very efficiently, including for rare outcomes. A prospective cohort study might take years of follow-up and be very expensive. A case-control design can answer the same question, but in a shorter time and with fewer resources.

The Design of a Case-Control Study
Classifying participants according to their outcome status, then asking for past exposures.
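Before moving on to case-control data, the two cohort calculations worked earlier in this session (the incidence rate ratio in Example #1 and the person-years table) can be checked with a short script. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and it assumes follow-up for breast cancer stops at the earliest of diagnosis, death, or the end of the study in 2024. The incidence rate at the end is not computed on the slides and is shown only to illustrate how person-time is used as a denominator.

```python
def relative_risk(a, b, c, d):
    """Ratio of cumulative incidence in exposed (a, b) vs. unexposed (c, d)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def person_years(entry_year, bc_year=None, death_year=None, study_end=2024):
    """Years at risk: follow-up stops at diagnosis, death, or study end (assumption)."""
    stop_candidates = [y for y in (bc_year, death_year, study_end) if y is not None]
    return min(stop_candidates) - entry_year


# Example #1: pesticide exposure and respiratory infection
rr = relative_risk(a=48, b=206, c=9, d=112)

# Person-years example: (year entered, year B.C. developed, year died)
participants = [
    (2014, None, None),
    (2014, 2018, None),
    (2014, 2017, 2019),
    (2015, None, 2020),
    (2016, None, None),
]
years = [person_years(*p) for p in participants]
total_person_years = sum(years)

# Illustration only (not on the slides): cases per person-year of follow-up
cases = sum(1 for _, bc_year, _ in participants if bc_year is not None)
incidence_rate = cases / total_person_years

print(f"RR = {rr:.2f}")
print(f"Person-years per participant = {years}, total = {total_person_years}")
print(f"Incidence rate = {incidence_rate:.3f} cases per person-year")
```

Note that participant 3 contributes 3 years, not 5: once breast cancer develops in 2017 the person is no longer at risk of the outcome, so the later death in 2019 does not extend follow-up.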
The Design of a Case-Control Study
[Diagram: a source population of exposed and unexposed people. The cohort study uses the whole source population; the case-control study uses the cases (ideally, all cases) and a control group.]

Physical Activity and the Risk of Lung Cancer in Canadian Adults
- 2128 Cases obtained from National Cancer Registry
- 2128 Controls matched for age and sex
- Prior physical activity (20 years ago) assessed by questionnaire
- Determined whether being physically active 20 years ago predicts current disease status (e.g., Case or Control)
[Figure: odds ratio for lung cancer (y-axis, 0 to 1) by physical activity level 20 years prior (Sedentary, Low, Moderate, High).]
Source: Mao et al., American Journal of Epidemiology, 2003

Why Conduct a Case-Control Study? Cohort seems more like real life
Hypothesis: Compared to women with low pesticide exposure, women with high pesticide exposure have an increased risk of breast cancer.
Methods: Nurses' Health Study, a prospective cohort study of 89,949 women aged 34-59 years
- Exposure assessed at beginning of study in 1976
- Blood collected for all 89,949 women
- Level of pesticides in blood characterized as "high" or "low"
- Women followed for 30 years for the incidence of breast cancer
Results:
- 15% have high exposure levels, and 85% have low exposure levels
- 1,439 incident cases of breast cancer over 30-year follow-up

Full Nurses' Cohort
Outcome Status
Breast Cancer       1,439
No Breast Cancer    88,510
Total               89,949

Practical Problem: Quantifying pesticide levels in the blood is very expensive ($1,000 x 89,949 = nearly $90 million). It is not practical to analyze all 89,949 blood samples.

To be efficient, we can choose to analyze blood for all cases (N=1,439), but only a SAMPLE of the women who did not develop breast cancer.
For this example, let's choose to sample 2x the number of cases (N=2,878).

Case-Control Study nested in Nurses' Cohort
Outcome Status
Breast Cancer       1,439
No Breast Cancer    2,878
Total               4,317
Reduced size of study greatly reduces costs. With proper sampling, we'll get the correct value for the measure of association (more next session).

Cohort vs. Case-control

Example: Protein Deficiency and Kwashiorkor in Kenyan Children
Case-control study that includes 300 Cases with Kwashiorkor and 300 age- and sex-matched Controls. 212 of the Cases had protein-deficient diets. 72 of the Controls had protein-deficient diets.
The odds ratio for protein deficiency in Cases relative to the Controls was ____.

                              Disease Present
                              Yes    No
Exposed to risk factor        a      b
Not exposed to risk factor    c      d

Example: Protein Deficiency and Kwashiorkor in Kenyan Children
In case-control studies we are only able to estimate odds of occurrence (happening versus not happening).
Odds Ratio = (odds of disease in exposed) / (odds of disease in unexposed)
Why odds? "You had asked us just to accept and also do some 'odd' calculations in midterm. Why!!"

                              Disease Present
                              Yes        No
Exposed to risk factor        a = 212    b = 72
Not exposed to risk factor    c = 88     d = 228

Odds Ratio = odds of disease in exposed (a/b) / odds of disease in unexposed (c/d) = (212/72) / (88/228) = 2.944 / 0.386 = 7.63
Interpretation similar to other relative measures

Case-Control Data Table
             Cases    Controls                                 Total
Exposed      20       depends on number of controls sampled    depends on number of controls sampled
Unexposed    30
Total        50
E.g.:
Controls are 20% of the non-cases (36/180):

Case-Control Data Table
             Cases    Controls (20% of the non-cases)    Total
Exposed      20       12                                 32
Unexposed    30       24                                 54
Total        50       36                                 86

Cannot calculate measures of occurrence (risks and rates). Why?
- No longer have the entire denominator of the population at risk (because controls represent a selected SAMPLE of the total population with an arbitrary size)
- 20/32 is NOT the prevalence of disease in the exposed; we decided to select 20% (36 controls)
- No idea what the real prevalence is in exposed and unexposed, hence no information on risk
- Not a defect, but a good feature: efficiency
- Can only calculate the odds of exposure in cases and controls; therefore, only the ODDS RATIO is the proper measure of association

Only 'Odds' can be Estimated from Case-Control Studies
                 Outcome
                 Yes    No
Exposure Yes     A      B
Exposure No      C      D
The column totals are fixed (n controls for each case). They are artificial: not real incidence or prevalence.
Odds of outcome in the exposed group: A/B
Odds of outcome in the not-exposed group: C/D
OR of exposure: (A/C)/(B/D) = (A/B)/(C/D) = OR of outcome

Selection of Cases
- Very clear case definition required: who exactly are the cases of disease in your study?
- Ideally, case selection will involve direct sampling of cases within a source population
  - All people in the source population who develop the disease of interest will be included as cases
  - A random sample if there is a large number of cases, or logistical constraints
- If someone in the source population developed the disease of interest, would they (meet the case definition and) be included as a case in the study?
Question: which types of diseases would be more likely to meet the above criteria?
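Returning to the case-control data table above, a short sketch can show why the odds ratio survives the sampling of controls while a "risk" computed from the same table does not. The non-case counts for the full source population (60 exposed, 120 unexposed) are an assumption: they are back-calculated from the slide's 12 and 24 controls being a 20% sample of 180 non-cases. The function names are hypothetical. The last lines reuse the Kwashiorkor numbers to confirm that the odds ratio of exposure equals the odds ratio of outcome.

```python
import math


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table: (a/b) / (c/d) = ad / bc."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk in exposed divided by risk in unexposed; only valid with full denominators."""
    return (a / (a + b)) / (c / (c + d))


# Cases, as on the slide
cases_exposed, cases_unexposed = 20, 30

# ASSUMPTION: full source population of non-cases, back-calculated from the 20% sample
noncases_exposed, noncases_unexposed = 60, 120

# Controls are a 20% (1 in 5) sample of non-cases, independent of exposure
controls_exposed = noncases_exposed // 5      # 12, as on the slide
controls_unexposed = noncases_unexposed // 5  # 24, as on the slide

or_full = odds_ratio(cases_exposed, noncases_exposed, cases_unexposed, noncases_unexposed)
or_sampled = odds_ratio(cases_exposed, controls_exposed, cases_unexposed, controls_unexposed)

rr_full = risk_ratio(cases_exposed, noncases_exposed, cases_unexposed, noncases_unexposed)
rr_naive = risk_ratio(cases_exposed, controls_exposed, cases_unexposed, controls_unexposed)

print(f"OR, full population = {or_full:.3f}; OR, sampled controls = {or_sampled:.3f}")
print(f"RR, full population = {rr_full:.3f}; 'RR' from sampled table = {rr_naive:.3f}")

# Kwashiorkor example: odds ratio of outcome vs. odds ratio of exposure
or_outcome = (212 / 72) / (88 / 228)   # odds of disease, exposed vs. unexposed
or_exposure = (212 / 88) / (72 / 228)  # odds of exposure, cases vs. controls
print(f"Kwashiorkor OR = {or_outcome:.2f} (outcome), {or_exposure:.2f} (exposure)")
```

Under these assumed counts the odds ratio is the same with full or sampled controls, while the ratio of "risks" taken from the sampled table differs from the true one, which is the point of the 20/32 remark on the slide.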
Controls
Definition = A sample of the source population that produced the cases
Purpose = To estimate the exposure distribution in the source population that produced the cases
Knowing the exposure prevalence among cases and controls is what allows us to measure the association between an exposure and outcome.

Selection of Controls
Trickiest bit of case-control studies!
Two Necessary Requirements (This is really, really important.)
1. Controls must come from the same source population as the cases. Random selection is necessary to obtain a representative sample of the source population. Representativeness is a very important consideration.
2. Controls must be selected independently of exposure. This means that their exposure status does not influence their selection.

The "Would Criterion"
If the controls had experienced the outcome, would they have been identified as cases in your study?
A philosophical/conceptual question

Where to Find Controls
1. Population-based controls (preferably random selection)
2. Nested controls from a cohort population/study
3. Hospital- or clinic-based controls
4. Family or friend controls
Review the next four slides in your own time. Yes! (answer to your question about being in exams) :)

Population-Based Controls
Definition: Controls selected from the general population; most suitable when cases are from a well-defined geographic area
Sources: Random digit dialing, cell-phone or Internet subscribers, residence lists, tax lists, voter registration lists, driver's license holders
Example: Case-control study of vitamin A and lung cancer. Cases come from Massachusetts Cancer Registry, and controls come from the roster of registered voters in Massachusetts. (Is this a good control group?)
Good News: Controls often come from a well-defined source population.
Bad News: VERY time consuming, harder to inspire participation, may not recall past exposures as well as cases

Nested Controls
Definition: Controls selected from an existing cohort population.
Controls represent a sub-set of the full source population.
Example: Studying pesticide exposure and risk of breast cancer in Nurses' Health Study Cohort
Good News: Controls come from a clearly defined source population; already enrolled → willing participants
Bad News: Restricted to members of existing cohort; may limit the hypotheses that can be studied

Hospital- or Clinic-Based Controls
Definition: Controls selected from among patients at a hospital or clinic
- Choose control patients with diseases (often more than one) other than the case's disease
- Typically used when cases are identified from a hospital
Requirements:
- Same source population as cases: must consider the "would criterion" and the referral pattern
- Illness should be unrelated to, that is, independent of the exposure under study
(1) Illnesses that have the same catchment area as the cases. This is what we mean when we say that controls should come from the same source population as cases.
(2) Illnesses that have no known relation to the risk factor(s) under study. This is what we mean when we say selection of controls should be independent of exposure.

Hospital- or Clinic-Based Controls
Example: A case-control study of smoking and the risk of myocardial infarction. Cases come from Boston Medical Center. Controls are other patients at Boston Medical Center. But which patients/illnesses would make good controls?
- Emphysema patients?
- Diabetes patients?
- Appendicitis patients?
- Patients injured in car accidents?
Good News: Easy to identify and access → less time and money, accuracy of exposure recall comparable to cases, more willing to participate
Bad News: These controls are not randomly selected. This means that hospital-based controls must be carefully selected to accurately represent the exposure history in the source population.
The matched design
Definition/Purpose
- Dictionary of epidemiology: the process of making a study group and a comparison group comparable with respect to extraneous factors
- Making groups as similar as possible to account for confounding
- Like randomization in experimental studies (last session). Recall: only difference is exposure (intervention) status; making exposure groups similar in other factors
- Matching cases to controls: only difference is outcome status; making outcome groups similar in other factors

Matching in Case-control Studies
- After selection of the matching factor, for each case a control with the same characteristics will be selected
- Characteristics with the highest possibility of being a confounder (sex, age, setting)
- Impractical to match for many factors
- Like randomization in experimental studies, the only difference remains in exposure status; making exposure groups similar in other factors, to account for confounding
- Analysis: the unit is not a person but a pair (the case and its matched control)
  - Paired analysis
  - Statistical models are well developed

Types of Matching
- Individual matching: performed participant by participant
- Frequency matching: providing similar distributions of confounders in groups

Matching in Cohort Studies
- Exposed matched to unexposed (an effort to mimic randomization)
- Less common, much less!
  - Expensive
  - Sometimes impractical
  - May require control (competing risks, loss to F/U)

The Unmatched Contingency Table
                       Cases                   Controls
                       (Metabolic Syndrome)    (No Metabolic Syndrome)    Total
High fat (exposed)     A                       B                          Total
Low fat (unexposed)    C                       D                          Total
Total (no.)            Total                   Total                      Total
OR: (A/B)/(C/D) = AD/BC

Odds Ratio: Unmatched Situation
                       Cases                   Controls
                       (Metabolic Syndrome)    (No Metabolic Syndrome)    Total
High fat (exposed)     300                     100                        400
Low fat (unexposed)    200                     400                        600
Total (no.)            500                     500                        1000
OR: (300*400) / (100*200) = 6.0

Matched Example
                       Control Exposed    Control Not Exposed
Case Exposed           Both exposed       Discordant
Case Not Exposed       Discordant         Both unexposed
We measure the 'exposure' status within the pairs. Each cell holds a number of pairs.
OR: ratio of discordant pairs (only cases exposed / only controls exposed)

Matched Example
                       Control Exposed    Control Not Exposed
Case Exposed           100                100
Case Not Exposed       80                 120
There are 400 pairs in this study (400 cases, 400 matched controls).
OR: ratio of discordant pairs (only cases exposed / only controls exposed) = 100/80 = 1.25

Case-Control Studies
Advantages:
- More efficient than a cohort study (in terms of time, money, effort)
- Suited to diseases with a long latent period
- Optimal for rare diseases
- Can examine multiple exposures
Disadvantages:
- Exposure is assessed after development of the outcome
  - May be unsure about temporal sequence between exposure and disease
  - Recall bias
- Also prone to selection bias in control choice, especially if response rates are low
- Can usually only study one disease or outcome
- Inefficient for rare exposures
- Cannot calculate absolute measures of association

Correct Measures of Association for Various Designs
Design                                          Measure of association
Cross-sectional                                 Prevalence rate ratio
Case-control                                    Odds ratio
Cohort (prospective, retrospective)             Incidence (cumulative or person-time) rate ratio
Cohort (if time-to-event data are available)    Hazard ratio (not covered)

Midterm Review
Odds = D+/D-; proportion = D+/total
Proportion to odds: Proportion = 0.99 = 99/100, so Odds = 99/1 = 99
Odds to proportion: Odds = 0.5 = 50 (D+)/100 (D-), so Proportion = 50/150

Consider a class with 100 enrolled students. None of the students were ill at the beginning of the school year (September 1st). On September 30, a total of 5 students reported having gastroenteritis.
All 5 continued to be ill on October 1, but all 5 recovered within 3 days. On October 14, another 3 students developed gastroenteritis. All of these students continued to be ill on October 15, but all 3 recovered 5 days later. In this example, assume that a person cannot get gastroenteritis more than once.

Date       Sep 1    Sep 30    Oct 1    Oct 4    Oct 14-Oct 15    Oct 18
Disease             5         5        0        3                0
At risk    100      95

Calculate the prevalence of gastroenteritis in the class on October 1. (1 mark)
Answer: 5/100
Calculate the prevalence of gastroenteritis in the class on October 30. (1 mark)
Answer: 0/100 (total population remains 100)
Calculate the incidence of gastroenteritis in the class during the month of October. (2 marks)
Answer: 3/(100-5) = 3/95

A cross-sectional study is conducted to investigate the relationship between infant exposure to second-hand smoke and risk of development of asthma by age 4. If the investigators were able to measure the exposure with perfect accuracy, 20% of cases would be exposed and 15% of the healthy children would be exposed. The prevalence of exposure in the total population was 16%.

a) Fill out the following 2 by 2 table, in a population of 1000. Calculate the relative risk (effect estimate) for the perfect accuracy scenario. (2 marks)

               Developed Asthma    No Asthma    Total
Exposed        40                  120          160
Not Exposed    160                 680          840
Total          200                 800          1000

RR = (40/160)/(160/840) = 1.31

b) However, the investigators use an approach based on participant self-report to assess exposure. A previous validity study of this approach showed that in people with asthma there will be a 20% measurement error: 20% of exposed will be classified as unexposed and 20% of unexposed will be classified as exposed. The level of misclassification is equal to 10% among healthy children, i.e. a 10% measurement error: 10% of exposed will be classified as unexposed and 10% of unexposed will be classified as exposed.

Comment on the nature of misclassification in this study. (1 mark)
Answer: Differential misclassification.
What will be the misclassified (and really observed) effect estimate? Show your calculation. (3 marks)

Developed Asthma:
  Exposed:     40-(40*0.2)+(160*0.2) = 40-8+32 = 64
  Not Exposed: 160-(160*0.2)+(40*0.2) = 160-32+8 = 136
  Total:       200
No Asthma:
  Exposed:     120-(120*0.1)+(680*0.1) = 120-12+68 = 176
  Not Exposed: 680-(680*0.1)+(120*0.1) = 680-68+12 = 624
  Total:       800

               Developed Asthma    No Asthma    Total
Exposed        64                  176          240
Not Exposed    136                 624          760
Total          200                 800          1000

Misclassified incidence rate ratio = (64/240)/(136/760) = 1.49

Next Session (Nov. 8, 2024)
- Experimental studies
- Chapter 12 of the textbook (will be in the quiz)
- Quiz 3: the same format as quiz 1 & 2, open book
- In-class (11:45 to 12:15), on paper; bring a pen and calculator
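As a self-check on the midterm review, the odds/proportion conversions and the misclassification question can be reproduced with a few lines of Python. This is a sketch for study purposes only: the function names are hypothetical, and it assumes, as the exam answer does, that a fixed fraction of each exposure group is swapped into the other group within each outcome column.

```python
import math


def odds_from_proportion(p):
    """Odds = D+ / D-, where p = D+ / total."""
    return p / (1 - p)


def proportion_from_odds(odds):
    """Proportion = D+ / (D+ + D-)."""
    return odds / (1 + odds)


def misclassify(exposed, unexposed, error):
    """Move a fraction `error` of each exposure group into the other group."""
    observed_exposed = exposed - exposed * error + unexposed * error
    observed_unexposed = unexposed - unexposed * error + exposed * error
    return observed_exposed, observed_unexposed


def risk_ratio(a, b, c, d):
    """(a / (a + b)) / (c / (c + d)) for a 2x2 table."""
    return (a / (a + b)) / (c / (c + d))


# Odds and proportions from the review slide
odds_99 = odds_from_proportion(0.99)        # expected to be close to 99
proportion_half = proportion_from_odds(0.5)  # expected to be close to 50/150

# Part a): perfect measurement
true_rr = risk_ratio(40, 120, 160, 680)

# Part b): 20% error among children with asthma, 10% among healthy children
a_obs, c_obs = misclassify(40, 160, error=0.20)   # asthma column
b_obs, d_obs = misclassify(120, 680, error=0.10)  # no-asthma column
observed_rr = risk_ratio(a_obs, b_obs, c_obs, d_obs)

print(f"True RR = {true_rr:.2f}, observed RR = {observed_rr:.2f}")
```

Because the error rate differs by outcome status (20% vs. 10%), the misclassification is differential, and here it moves the estimate away from the true value rather than toward 1.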