HLSC 2P27 Regression Analysis
Brock University
Mostafa (Mo) Shokoohi
This document is a set of lecture slides on regression analysis in healthcare research. It covers linear, logistic, and Cox proportional hazards regression, and it explains how these models are used to evaluate risk factors, treatment effects, and outcomes. It also introduces independent variables, dependent variables, and covariates, as well as confounding and effect modification.
Mostafa (Mo) Shokoohi, Ph.D., Epidemiology and Biostatistics
Assistant Professor, Department of Health Sciences, Brock University

REGRESSION MODELING
Why use regression models?
Predictions: Regression models help us make predictions (e.g., risk of disease).
Associations: They let us explore relationships between variables (e.g., smoking and lung cancer risk).
Adjusting for confounding: They control for variables that could otherwise bias results, moving beyond bivariable analysis to multivariable analysis.
Regression models are used to evaluate risk factors, treatment effects, and outcomes in healthcare research.
Examples: predicting patient outcomes, analyzing treatment effects.

VARIABLES IN REGRESSION MODELS
Independent variable
Other terms: exposure, main predictor, main risk factor.
Used to predict the value of the dependent variable.
Dependent variable
Other terms: outcome, endpoint.
Represents the outcome being studied; it is predicted by the independent variable.
Covariates
Other terms: confounders, third variables.
Controlled for when assessing the effect of the independent variable on the dependent variable.

VARIABLES IN REGRESSION MODELS
Diagram: independent variable (exposure, treatment; e.g., smoking) and dependent variable (outcome; e.g., lung cancer), with covariates (e.g., age, family history).

TYPES OF REGRESSION MODELS
The type of study outcome (dependent variable) determines the type of regression model.
Linear regression: for continuous outcomes (ratio or interval variables), e.g., predicting systolic blood pressure.
Logistic regression: for binary (binomial) outcomes, e.g., the probability of having a disease.
Survival analysis and Cox proportional hazards models: for time-to-event outcomes, e.g., time until patient recovery or death.

SIMPLE LINEAR REGRESSION
Model basics
Equation: $Y = \beta_0 + \beta_1 X + \epsilon$
$Y$: outcome variable (e.g., weight)
$X$: predictor variable (e.g., exercise hours)
$\beta_0$ (also written $\alpha$): intercept / constant (where the line crosses the Y-axis)
$\beta_1$: slope (change in $Y$ per one-unit increase in $X$)
$\epsilon$: error term
Example: If $\beta_1 = -0.5$, each additional hour of exercise corresponds to an average decrease in weight of 0.5 kg.
A negative $\beta_1$ means that as $X$ increases, $Y$ decreases on average by $|\beta_1|$ per unit of $X$.
A positive $\beta_1$ means that as $X$ increases, $Y$ increases on average by $\beta_1$ per unit of $X$.

MULTIPLE LINEAR REGRESSION
Can give more accurate predictions and adjusts for confounding (e.g., weight predicted by both exercise and diet).
Model equation: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \epsilon$
Each predictor has its own slope ($\beta$).
Example: predicting a patient's blood pressure from age, weight, and activity level:
$\text{Blood pressure} = \beta_0 + \beta_1(\text{age}) + \beta_2(\text{weight}) + \beta_3(\text{activity}) + \epsilon$

SIMPLE LOGISTIC REGRESSION
Simple logistic regression basics
Used for binary outcomes such as disease/no disease. The modeled probability of the outcome follows an S-shaped (logistic) curve.
Commonly used in case-control studies.
The odds ratio (OR) is the measure of association reported by this model. It tells how much higher (or lower) the odds of the outcome are for different predictor values.
Model equation: $\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1$, where $p$ is the probability of the outcome.
Exponentiating $\beta_1$ gives the odds ratio: $OR = e^{\beta_1}$.

SIMPLE LOGISTIC REGRESSION
Example: smoking and lung cancer risk
$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1$
$\operatorname{logit}(\text{probability of lung cancer}) = 0.15 + 0.75(\text{smokers})$
$e^{0.75} \approx 2.12$
An odds ratio of 2.12 means that the odds of developing lung cancer among smokers are 2.12 times the odds among non-smokers.

MULTIPLE LOGISTIC REGRESSION
Model equation: $\operatorname{logit}(p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots$
Example: controlling for exercise and family history when exploring the association between smoking and lung cancer.
Independent variable: smoking
Dependent variable: lung cancer
Covariates: exercise and family history
$\operatorname{logit}(\text{probability of lung cancer}) = \beta_0 + \beta_1(\text{smoking}) + \beta_2(\text{exercise}) + \beta_3(\text{family history})$

TIME-TO-EVENT OUTCOMES
A time-to-event outcome (or survival outcome) measures the time until a specific event occurs (e.g., death, disease progression, recovery). The event may or may not occur during the study period.
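In a time-to-event analysis, each participant contributes a follow-up time and an event indicator. This slide's cancer trial example uses the coding 1 = event observed, 0 = censored. The pure-Python sketch below shows, roughly, how a Kaplan-Meier curve is built from such pairs using the product-limit estimator. The function names and follow-up data are hypothetical, and a real analysis would normally rely on a survival analysis package.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times  -- follow-up time for each participant
    events -- 1 if the event was observed, 0 if the participant was censored
    Returns a list of (event time, estimated survival probability) steps.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)  # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at[t]  # Counter returns 0 for times with no events
        if d > 0:
            # Survival drops by the fraction of at-risk participants with the event
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Participants with an event or censored at t are no longer at risk after t
        at_risk -= leaving_at[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data (months); 1 = progression observed, 0 = censored
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print("Median survival:", median_survival(curve))
```

With these made-up data, the estimated survival steps down at months 2, 3, 5, 8, and 12, to roughly 0.875, 0.75, 0.6, 0.4, and 0. The estimated median survival is therefore 8 months. The participant censored at month 3 still counts as at risk at month 3, which is the usual convention for ties between events and censoring.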
Such outcomes are often analyzed using survival analysis methods (e.g., Kaplan-Meier curves, the Cox proportional hazards model).
E.g., in a clinical trial of a cancer treatment, researchers track the time from treatment start until disease progression.
Patients without progression by the end of the study: time is recorded until their last follow-up (censored = 0).
Patients with progression: the exact time of progression is recorded as the event (experienced event = 1).

SURVIVAL ANALYSIS AND KAPLAN-MEIER CURVES
Analyzing time-to-event data
Outcomes such as survival time or time to recovery.
Kaplan-Meier curve: shows the probability of surviving over time.
Useful for visualizing survival differences between groups (e.g., treatment vs. no treatment).
Used to report median survival across study groups.
https://towardsdatascience.com/kaplan-meier-curves-c5768e349479
Example: comparing survival times in patients receiving different treatments.

COX PROPORTIONAL HAZARDS REGRESSION
Analyzes time-to-event outcomes, e.g., overall survival, progression-free survival.
The model:
Simple Cox PH model: $\ln\left(\frac{h(t)}{h_0(t)}\right) = \beta_1 X_1$
Multiple Cox PH model: $\ln\left(\frac{h(t)}{h_0(t)}\right) = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$
where $h(t)$ is the hazard at time $t$ and $h_0(t)$ is the baseline hazard.
Measure of association: the hazard ratio (HR), which compares the hazard (instantaneous rate) of the event between groups at any time point. The model assumes this ratio is constant over time (proportional hazards).
The HR is obtained by exponentiating the $\beta$ coefficients: $HR = e^{\beta}$.

WHY MULTIPLE REGRESSION MODELS?
Health outcomes are rarely influenced by a single factor.
Multiple regression allows us to adjust for confounding variables, isolating the effect of each predictor.
Example: in studying the impact of exercise on heart health, factors such as age, diet, and smoking status also influence heart health and should be included.

WHAT IS A CONFOUNDER?
A confounder is a variable that is associated with both the independent variable and the dependent variable, potentially distorting the observed association between them.
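The following pure-Python sketch makes the definition concrete with simulated, hypothetical data. Age (the confounder) is associated with both exercise (the independent variable) and systolic blood pressure (the dependent variable). The sketch compares the simple linear regression slope for exercise with the exercise slope from a multiple linear regression that also includes age. All effect sizes, variable ranges, and function names are made up for illustration; a real analysis would use a statistics package.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simple_slope(x, y):
    """OLS slope of y on x alone: cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def adjusted_slope(x1, x2, y):
    """OLS coefficient of x1 when y is regressed on x1 and x2 (with an intercept)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    # Solve the centered 2x2 normal equations for the x1 coefficient (Cramer's rule)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


def simulate(n=5000, seed=1):
    """Hypothetical data: older age lowers exercise and raises blood pressure."""
    rng = random.Random(seed)
    age, exercise, sbp = [], [], []
    for _ in range(n):
        a = rng.uniform(20, 80)                        # age in years
        e = 10 - 0.1 * a + rng.gauss(0, 1)             # exercise hours per week
        y = 100 + 0.6 * a - 0.5 * e + rng.gauss(0, 5)  # systolic BP; true exercise effect = -0.5
        age.append(a)
        exercise.append(e)
        sbp.append(y)
    return age, exercise, sbp


age, exercise, sbp = simulate()
print("Unadjusted slope for exercise:", round(simple_slope(exercise, sbp), 2))
print("Age-adjusted slope for exercise:", round(adjusted_slope(exercise, age, sbp), 2))
```

In this simulation, the unadjusted slope should come out near -5 mmHg per weekly hour of exercise, about ten times the true effect built into the data (-0.5). The inflation happens because older participants both exercise less and have higher blood pressure. Adding age to the model should bring the exercise estimate back close to -0.5.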
Example: Age could be a confounder in a study of physical activity (independent variable) and heart disease (dependent variable), because older age is related to both less physical activity and higher heart disease risk.

CONFOUNDING BIAS
Confounding bias occurs when the effect of the exposure on the outcome is mixed with the effect of the confounder.
This can lead to overestimating or underestimating the association, or even to seeing an association where none exists.
Example: in a study of alcohol consumption and liver disease, suppose age is associated with alcohol consumption and also independently increases the risk of liver disease. If age is not controlled for, we may mistakenly conclude that alcohol has a stronger effect than it actually does.

PRESENTATION OF RESULTS
Studying the association between smoking status and risk of heart disease, adjusting for age, sex, and BMI.

Predictor | Unadjusted odds ratio (95% CI) | Adjusted odds ratio (95% CI)
Smoking status (smokers vs. non-smokers) | 2.50 (1.80–3.20) | 1.80 (1.40–2.30)
Age (per unit) | - | 1.05 (1.03–1.08)
Sex (female) | - | 0.90 (0.70–1.10)
BMI (per unit) | - | 1.02 (1.01–1.03)

Unadjusted model: smokers have an OR of 2.50 for heart disease compared with non-smokers, suggesting higher odds of heart disease.
Adjusted model: after adjusting for age, sex, and BMI, the OR for smoking decreases to 1.80. This indicates that part of the unadjusted association was explained by these other factors.

3 FEATURES OF A CONFOUNDER
A confounder (third variable):
1. Should be associated with the independent variable (exposure).
2. Should be associated with the dependent variable (outcome).
3. Should not be an effect of the exposure (i.e., not part of the causal pathway).

EFFECT MODIFICATION
Effect modification occurs when the strength or direction of the association between an exposure and an outcome changes depending on the level of a third variable.
Effect modification vs. confounding
Unlike confounders, effect modifiers reveal how an exposure's effect varies across subgroups.
Confounders distort the association, while effect modifiers change how the association behaves across groups.
Example: in a study on the effect of exercise on heart health, gender may be an effect modifier if exercise affects heart health differently in males and females.

EXAMPLES OF EFFECT MODIFICATION
Physical activity and diabetes prevention (body weight as effect modifier)
Independent variable: physical activity
Dependent variable: risk of developing diabetes
Effect modifier: baseline body weight (overweight vs. normal weight)
Results:
Normal weight group: relative risk (RR) of diabetes with regular physical activity = 0.8
Overweight group: RR of diabetes with regular physical activity = 0.5
Interpretation: regular physical activity is more protective against diabetes in the overweight group (RR = 0.5) than in the normal weight group (RR = 0.8).

EXAMPLES OF EFFECT MODIFICATION
Smoking and lung cancer risk (genetic susceptibility as effect modifier)
Independent variable: smoking
Dependent variable: risk of lung cancer
Effect modifier: genetic susceptibility (e.g., presence of a high-risk gene variant)
Results:
Without high-risk gene: odds ratio (OR) for smoking and lung cancer = 2.5
With high-risk gene: OR for smoking and lung cancer = 6.0
Interpretation: smoking increases lung cancer risk more substantially in individuals with the high-risk gene (OR = 6.0) than in those without it (OR = 2.5).
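One way to summarize these two examples is to divide one stratum-specific measure by the other. The short Python sketch below does this using the RRs and ORs reported on the two slides above. It also shows how the same ratio would appear as an interaction (product) term in a logistic regression. The helper name and the logistic model coefficients are hypothetical, chosen only so that the model reproduces the slide's ORs.

```python
import math


def ratio_of_measures(stratum_b, stratum_a):
    """Ratio of two stratum-specific ratio measures (RR or OR).

    A ratio of exactly 1 would mean the association is the same in both strata
    on the multiplicative scale; values away from 1 point to effect modification.
    """
    return stratum_b / stratum_a


# Stratum-specific measures reported on the effect modification slides
weight_ratio = ratio_of_measures(0.5, 0.8)  # RR: overweight vs. normal weight
gene_ratio = ratio_of_measures(6.0, 2.5)    # OR: high-risk gene vs. no high-risk gene
print("Ratio of RRs (body weight):", round(weight_ratio, 3))
print("Ratio of ORs (gene):", round(gene_ratio, 3))

# Hypothetical logistic model with an interaction (product) term:
#   logit(p) = b0 + b1*smoking + b2*gene + b3*(smoking * gene)
# The OR for smoking is exp(b1) without the gene and exp(b1 + b3) with it,
# so exp(b3) equals the ratio of the two stratum-specific ORs.
b1 = math.log(2.5)         # hypothetical, chosen to give OR = 2.5 without the gene
b3 = math.log(gene_ratio)  # hypothetical, chosen to give OR = 6.0 with the gene
print("OR without gene:", round(math.exp(b1), 2))
print("OR with gene:", round(math.exp(b1 + b3), 2))
```

In practice, whether a ratio like 2.4 reflects real effect modification or chance would be judged with a statistical test, such as a test of the interaction term.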

IDENTIFY A CONFOUNDER AND EFFECT MODIFIER
Confirm that (1) is statistically significant.
Confirm that (2) is statistically significant.
Calculate these measures of association (e.g., OR) for (3):
OR crude (OR unadjusted) for the exposure-outcome association
OR adjusted for the exposure-outcome association
$OR_1$ for the first stratum for the exposure-outcome association
$OR_2$ for the second stratum for the exposure-outcome association
No effect means neither confounding nor effect modification is present.
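For a binary exposure, a binary outcome, and a third variable with two strata, the comparison on this slide can be sketched in Python. The sketch makes three assumptions not stated on the slide:

1. It uses the Mantel-Haenszel summary OR as the "adjusted" OR. A regression-adjusted OR from logistic regression could be used instead.
2. It uses a 10% tolerance in place of the significance checks and judgment described on the slide.
3. The counts and function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all counts must be > 0).
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR across strata, one (a, b, c, d) tuple per stratum."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def assess(strata, tolerance=0.10):
    """Compare crude, adjusted, and stratum-specific ORs.

    The 10% tolerance is an assumed rule of thumb, not a statistical test.
    """
    crude = odds_ratio(*(sum(cells) for cells in zip(*strata)))  # collapse strata
    adjusted = mantel_haenszel_or(strata)
    stratum_ors = [odds_ratio(*s) for s in strata]
    if max(stratum_ors) / min(stratum_ors) > 1 + tolerance:
        verdict = "effect modification: report stratum-specific ORs"
    elif abs(crude - adjusted) / adjusted > tolerance:
        verdict = "confounding: report the adjusted OR"
    else:
        verdict = "neither confounding nor effect modification"
    return crude, adjusted, stratum_ors, verdict


# Hypothetical counts (a, b, c, d) stratified by age group
younger = (10, 90, 20, 180)
older = (60, 40, 30, 20)
crude, adjusted, stratum_ors, verdict = assess([younger, older])
print(f"Crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}, stratum ORs = {stratum_ors}")
print(verdict)
```

With these made-up counts, the OR is 1.0 in both age groups and the Mantel-Haenszel OR is also 1.0. Collapsing the strata, however, gives a crude OR of about 2.15. This happens because the older group is both more often exposed and more often a case, so age creates an association where none exists within strata.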