Cox Model with Multiple Predictors PDF
Document Details
Uploaded by FondMonkey75
King Khalid University, Abha
Tags
Summary
This document describes the Cox proportional hazards model, including how to use R to fit and interpret Cox models. The document discusses creating models to identify variables influencing survival time in statistical modeling, and explores prediction of survival using the Cox model, specifically addressing patients with advanced lung cancer.
Full Transcript
Cox model with multiple predictors The Cox is easily extended to accommodate multiple predictors, each of whose effects is assumed to be proportional overtime. h(t|Xi, X2, .. • Xp) = ho(t)exp(biXi + bzXz-}-... -\-bpXp^ Each coefficient bi can be exponentiated to calculate a hazard ratio. Fitting...
Cox model with multiple predictors The Cox is easily extended to accommodate multiple predictors, each of whose effects is assumed to be proportional overtime. h(t|Xi, X2, .. • Xp) = ho(t)exp(biXi + bzXz-}-... -\-bpXp^ Each coefficient bi can be exponentiated to calculate a hazard ratio. Fitting the Cox model with the survival package New data set for Cox modeling We will be using the 1 ung data set form the survival package for Cox modeling. The data describe survival of patients with advanced lung cancer. Variables: ■ ti me: survival time in days ■ status: i=censored, 2=deadf ■ age: age in years (assessed at beginning) I x I i ■ sex: i=male, 2=female ■ wt.loss: weight loss (pounds) in last 6 months t Remember that the Surv() function accepts a status variable with i=censored and 2=event * We would normally recommend that binary variables be coded 0/1 so that the intercept is interpretable; however, there is no intercept in the Cox model, so 1/2 coding is equivalent. Fitting a Cox model Use coxph(formula,data=) THe formula resembles a typical R regression formula: Surv(time, event) ~ xl + x2... where xl + x2... is a list of one or more predictor variables separated by +. # fit cox modeL and save resuLts lung.cox <- coxph(Surv(time, status) ~ age + sex + wt.loss, data=lung) # summary of resuLts summary(lung.cox) ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## Call: coxph(formula = Surv(time, status) ~ age + sex + wt.loss. data = lung) n= 214, number of events= 152 (14 observations deleted due to missingness) coef age 0.0200882 sex -0.5210319 Wt.loss 0.0007596 Signif. codes: age sex wt.loss exp(coef) 1.0202913 0.5939074 1.0007599 se(coef) z Pr(>|z|) 0.0096644 2.079 0.0377 * 0.1743541 -2.988 0.0028 ** 0.0061934 0.123 0.9024 0 '***' 0.001 '**' 0.01 '*' 0.05 0.1 ' ' 1 exp(coef) exp(-coef) lower .95 upper .95 1.0203 0.9801 1.0011 1.0398 0.5939 1.6838 0.4220 0.8359 1.0008 0.9992 0.9887 1.0130 Concordances 0.612 (se = 0.027 ) Likelihood ratio test= 14.67 on 3 df, Wald test =13.98 on 3 df, Score (logrank) test = 14.24 on 3 df, p=0.002 p=0.003 p=0.003 Notice: ■ sample size (214), number of events (152), number of observations dropped due to missingness ■ coef: log hazard ratio coefficients • generally, we interpret positive coefficients as increasing the log-hazard and lowering survival, and negative coefficients as decreasing the log-hazard and increasing survival ■ exp(coef): hazard ratios (exponentiated coefficients) • HRage = 1.0203, for each additional year of age at baseline, the hazard increases by 2.03%, or by a factor of 1.0203 • HRsex = .5939, females have 60% the hazard of males, or a 40% decrease • HRwtioss = 1.008, for each additional pound of weight loss, the hazard increases by 0.08% ■ se (coef): standard error of log hazard ratios ■ Pr (> | z | ): p-value for test of log hazard ratio = 0, or equivalently, hazard ratio = 1 ■ lower .95, upper .95: 95% confidence interval for hazard ratio • the data are consistent with hazard ratio estimates for sex between .422 and .836 • the confidence interval for wt.loss indicates that we cannot be sure of the direction of its effect, if any ■ Conco rdance: proportion of pairs that are concordant, a goodness-of-fit measure ■ Li kel i hood rati o, Wal d, and Score tests: 3 tests of the null hypothesis that all coefficients equal zero Tidy coxph ( ) results We can ti dy O the coxph () results and store them in a tibble data.frame. # save summarized resuLts as data.frame # exponentiate=T returns hazard ratios lung.cox.tab <- tidy(lung.cox, exponentiate=T, conf.int=T) # dispLay tabLe lung.cox.tab ## ## ## ## ## ## # A tibble : 3 x 7 term estimate std.error statistic p.value conf.low conf.high <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 age 1.02 0.00966 2.08 0.0377 1.00 1.04 2 sex 0.594 0.174 -2.99 0.00280 0.422 0.836 3 wt.loss 1.00 0.00619 0.123 0.902 0.989 1.01 Storing the Cox model results as a data.frame makes it easy to use ggpl ot2 to create plots of the hazard ratios and confidence intervals. # pLot of hazard ratios and 95% Cis ggplot(lung.cox.tab, aes(y=term, x=estimate, xmin=conf.low, xmax=conf.high)) + geom_pointrange() + # pLots center point (x) and range (xmin, xmax) geom_vline(xintercept=l, color="red") + # vertical. Line at HR=1 labs(x="hazard ratio", title="Hazard ratios and 95% Cis") + theme_classic() Hazard ratios and 95% Cis wt.loss E q sex age 0.4 0.6 0.8 hazard ratio 1.0 Predicting survival with Cox estimates We often want to estimate and compare survival functions for subjects with different sets of covariates. Because the Cox model does not estimate survival directly, we first use non-parameteric methods (similar to KM-estimator) to estimate the baseline survival function, So (t), the survival function for a subject with zero on all covariates. S0(t) = S(i|Xi = 0,X2 = 0,..., Xp = 0) After we estimate the baseline survival function, So(f), we can then estimate the survival function for a subject with non-zero covariate values using the regression coefficients estimated from the Cox model and this relation: S(t|Xi = 251, X2 = x2,... ,XP = xp) = s0(t)exp^Xi+b2X2+ "+b^ Because So(t) is estimated non-parametrically, survival functions estimated after coxph () will again be step functions that change values only at event times observed in the data. Predicting survival after coxph( ) with survfit ( ) The su rvf 110 function, when supplied a coxph model object, performs all of the calculations to predict survival. If no covariate values are supplied to survfi t(), then a survival function will be estimated for a subject with mean values on all model covariates. We can then use tidyO to produce a table of the survival function estimated by survfi tO: # predict survival, function for subject with means on aLL covariates surv.at.means <- survfit(lung.cox) #tabLe of survival, function tidy(surv.at.means) ## # A tibble: 179 x 8 ## time n.risk n .event n,.censor estimate std.error conf.high conf.low ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 5 214 1 0 0.996 0.00443 1 0.987 ## 2 11 2 0.987 0.00772 1 0.972 213 0 ## 3 12 211 1 0.982 0.00894 0 1.00 0.965 ## 4 13 210 2 0 0.973 0.0110 0.995 0.953 ## 5 15 208 1 0 0.969 0.0120 0.992 0.946 ## 6 0 0.964 0.989 0.940 26 207 1 0.0128 ## 7 30 206 1 0 0.960 0.0137 0.986 0.935 ## 8 1 0.929 31 205 0 0.955 0.0145 0.983 ## 9 204 2 0.917 53 0 0.946 0.0160 0.976 ## 10 54 202 1 0.942 0.0167 0.912 0 0.973 ## # ... with 169 more rows Plotting survival curves We can also use plotQ on the survfi t() object to plot a predicted survival curve. # pLot of predicted survival for subject at means of covariates plot(surv.at.means, xlab-"days", ylab="survlval probability") Predicting survival at specific covariate values Predicting survival for a subject at the mean of all covariates may not make sense, particularly if one or more of the covariates are factors (categorical). Instead, it is recommended to always supply a data.frame of covariate values at which to predict the survival function to the newdata= option of su rvf 1t (): First we create the new data set. # create new data for plotting: 1 row for each sex # and mean age and wt.Loss for both rows plotdata <- data.frame(age=mean(lung$age), sex=i:2, wt.loss=mean(lung$wt.loss, na.rm=T)) # Look at new data plotdata ## age sex wt.loss ## 1 62.44737 1 9.831776 ## 2 62.44737 2 9.831776 Then we supply the new data to su rvf11, along with the model object created by coxph O • We ti dyO the survfi t object to produce a table of predicted survival functions. Note the column suffixes . 1 and . 2 that differentiate the survival, standard error, and confidence interval estimates between the 2 sexes. ■ for example, esti mate. 1 is the survival function estimates for sex=l, males, while esti mate. 2 is for females. # get survival function estimates for each sex surv.by.sex <- survfitClung.cox, newdata=plotdata) # one function for each sex # tidy results tidy(surv.by.sex) ## # A tibble : 179 x 12 ## time n .risk n.event n,.censor estimate.1 estimate.2 std.error.l std.error.2 <dbl> <dbl> ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 5 214 1 0 0.995 0.997 0.00546 0.00327 ## 2 11 213 2 0 0.984 0.990 0.00953 0.00577 ## 3 12 211 1 0.987 0.0111 0.00674 0 0.978 ## 4 2 0.967 0.0137 0.00844 13 210 0 0.980 ## 5 15 208 1 0 0.962 0.977 0.0148 0.00921 ## 6 26 207 1 0 0.956 0.974 0.0159 0.00995 ## 7 30 206 1 0 0.951 0.971 0.0170 0.0107 # # # # # # 8 # 9 # 10 # # # # 31 205 1 0 0.945 0.967 0.0180 53 204 2 0 0.934 0.960 0.0199 54 202 1 0 0.929 0.957 0.0208 with 169 more rows, and 4 more variables: conf.high.1 <dbl>, conf.high.2 <dbl>, conf.low.1 <dbl>, conf.low.2 <dbl> 0.0113 0.0127 0.0133 Plotting multiple predicted survival functions We can also pl ot 0 the su rvf 11 object to graph the predicted survival functions. We must again request confidence intervals because we are plotting more than one curve. # plot survival estimates plot(surv.by.sex, xlab="days", ylab="survival probability' conf.int=T, col=c("bluenJ "red")) days Formore control over plots of predicted survival functions, we can use ggsurvplotO from survmi ne r again. If a dataset was specified as newdata= in su rvf 110 to generate survival function estimates, then that dataset must be specified as data= in ggsurvplotO as well. # data= is the same data used in survfit() # censor=F removes censoring symbol.s ggsurvplot(surv.by.sex,, data=plotdata, censor=F, legend.labs=c("male"j "female")) Strata ■ male — female The hazard ratio comparing males to females was HRsex = .594, meaning that females have about 60% the hazard rate that males do. Females thus have better overall survival than males, depicted in the graph above. Assessing the proportional hazards assumption As with any statistical model, the plausibility of the model assumptions affects our confidence in the results. Several methods have been developed to assess the proportional hazards assumption of the Cox model. Here we discuss 2 tools developed by Grambsch and Therneau (1994). A chi-square test based on Schoenfeld residuals is available with cox. zph () to test the hypothesis: Ho' covariate effect is constant (proportional) overtime Ha: covariate effect changes overtime The null hypothesis of proportional hazards is tested for each covariate individually and jointly as well. cox.zph(lung.cox) ## ## ## ## ## age sex wt.loss GLOBAL chisq df P 0.5077 1 0.48 2.5489 1 0.11 0.0144 1 0.90 3.0051 3 0.39 No strong evidence of violation of proportional hazards for any covariate, though some suggestion that sex may violate. Another tool used to assess proportional hazards is a plot of a smoothed curve over the Schoenfeld residuals. To create this plot, pl ot O the object returned by cox. zph O • These plots depict an estimate of the coefficient (labelled “Beta(t)” in the plot) (y-axis) across time (x-axis). Proportional hazards is indicated by flat line. plot(cox.zph(lung.cox)) Time Beta(t) for age Time Beta(t) for sex Again we see some evidence of non-proportional hazards for sex, as the effect of sex seems to increase with time. The effect of sex seems to be strongest at the beginning of follow-up, but then trends toward zero as time passes. Dealing with proportional hazards violations Many strategies have been proposed to account for violation of PH. We discuss two here: ■ stratify by the non-PH variable ■ add an interaction of the non-PH variable with time to the model If the change in the coefficient is not large enough to be clinically meaningfully, it can perhaps be ignored as well. Imagine we are interested in the effect of weight loss (wt .loss) on survival, but are also concerned that possible PH violation by sex in our Cox models may bias estimates of the wt.loss effect. We can perform a sensitivity analysis to show how our inferences regarding the wt.loss effect change depending on whether we address the PH violation by sex or not. In the standard Cox model assuming PH for sex, there was not much evidence that wt.loss was strongly predictive of survival, and we cannot even be confident about the direction of the effect. # reprinting original. modeL resuLts summary(lung.cox) ## Call: ## coxph(formula = Surv(time, status) ~ age + sex + wt.loss, data = lung) ## ## n= 214, number of events= 152 ## (14 observations deleted due to missingness) ## ## coef exp (coef) se(coef) z Pr(>|z|) ## age 0.0200882 1.0202913 0.0096644 2.079 0.0377 * tttt sex -0.5210319 0.5939074 0.1743541 -2.988 0.0028 ** ## wt.loss 0.0007596 1.0007599 0.0061934 0.123 0.9024 ## — ## Signif. codes: ## ## ## ## ## ## ## ## ## ## age sex wt.loss 0 ' ***' 0.001 '**' 0.01 ' *' 0.05 exp(coef) exp(-coef) lower .95 upper .95 1.0203 0.9801 1.0011 1.0398 0.5939 1.6838 0.4220 0.8359 1.0008 0.9992 0.9887 1.0130 Concordance= 0.612 (se = 0.027 ) Likelihood ratio test= 14.67 on 3 df, Wald test = 13.98 on 3 df, Score (logrank) test = 14.24 on 3 df, p=0.002 p=0.003 p=0.003 0.1 ' ' 1 Stratified Cox model Stratification provides a general approach to control for the effects of a variable, even if it violates PH. Drawback: we cannot quantify the effect of the stratification variable on survival (i.e., no coefficient will be estimated). In the stratified Cox model: ■ the Cox model is estimated separately in each stratum ■ the baseline hazard function, ho(t)f is allowed to be different across strata • this can accommodate the non-proportional effects of the stratification variable ■ the parameter estimates are then averaged across strata to generate one final set of estimates Put the stratification variable inside st rata O within the coxph () model formula: lung.strat.sex <- coxph(Surv(time, status) ~ age + wt.loss + strata(sex), datalung) summary(lung.strat.sex) # # # # # # # # # # # # ## # # # ## # # # # # # # # # # # # # # # # Call: coxph(formula = Surv(time, status) ~ age + wt.loss + strata(sex), data = lung) n= 214, number of events= 152 (14 observations deleted due to missingness) coef exp(coef) se(coef) z Pr(>|z|) age 0.0192190 1.0194049 0.0096226 1.997 0.0458 * wt.loss 0.00014121.0001412 0.0062509 0.023 0.9820 --Signif. codes: 0 '***' 0.001 0.01 0.05 1.' 0.1 1 1 1 # exp(coef) exp(-coef) lower .95 upper .95 # age 1.019 0.9810 1.000 1.039 # wt.loss 1.000 0.9999 0.988 1.012 # # # # Concordances 0.561 (se = 0.027 ) Likelihood ratio test= 4.09 on 2 df, p=0.1 Wald test = 3.99 on 2 df, p=0.1 Score (logrank) test =4 on 2 df, p=0.1 We see no coefficients for sex, the stratification variable. Although the estimates have changed a bit, our inferences regarding wt.loss (and age) are similar to those from the model where we assume PH for sex. Modeling time-varying coefficients The Cox model can be extended to allow the effects of a covariate (coefficients) to change over time, by interacting that covariate with some function of time. However, unlike other regression models, we cannot create this interaction term by simply multiplying the covariate by the time variable (unless the data are in a special structure, see survSpl 11 ()). Instead, we strongly recommend the use of the time-transform function,tt (), to avoid these easily-made mistakes. ■ Specify the nonPH covariate both by itself and inside of tt O in the model formula ■ Define the function tt () within coxph () • always start with tt = function(x,t,...) • then define the relationship between the covariate x and time t. For example: 0 x*t will allow coefficient to change with time linearly ° x*l og (t) allows coefficient to change with log (magnitudes) of time 0 x*(t>100) allows coefficient to take on 2 different values, one value when t <= 100 and another value t > 100. Below we specify an interaction of sex with time itself, so that the effect of sex is allowed to change linearly with time. # notice sex and tt(sex) in model, formuLa lung.sex.by.time <- coxph(Surv(time, status) ~ age + wt.loss + sex + tt(sex), # sex and tt(sex) in formuLa data=lung, tt=function(x,t,...) x*t) # Linear change in effect of sex summary(lung.sex.by.time) ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## Call: coxph(formula = Surv(time, status) ~ age + wt.loss + sex + tt(sex), data = lung, tt = function(x, t, ...) x * t) n= 214, number of events= 152 (14 observations deleted due to missingness) coef exp(coef) se(coef) z Pr(>|z|) age 0.0194343 1.0196244 0.0096522 2.013 0.0441 * wt.loss 0.0001260 1.0001261 0.0062502 0.020 0.9839 sex -0.9417444 0.3899470 0.3224791 -2.920 0.0035 ** tt(sex) 0.0013778 1.0013787 0.0008581 1.606 0.1084 --Signif. codes: 0 ' ***' 0.001 '**' 0.01 ' *' 0.05 0.1 ' ## exp(coef) exp(-coef) lower .95 upper .95 ## age 1.0196 0.9808 1.0005 1.0391 ## wt.loss 1.0001 0.9999 0.9879 1.0125 0.3899 0.7337 ## sex 2.5645 0.2073 ## tt(sex) 1.0014 0.9986 1.0031 0.9997 ## ## ## ## ## Concordance= 0.613 (se = 0.027 ) Likelihood ratio tests 17.23 on4 df, Wald test =15.86 on4 df, Score (logrank) test = 16.44 on4 df, p=0.002 p=0.003 p=0.002 The coef estimated for sex is the log-hazard-ratio at day t = 0, bsex = —.942, corresponding to a hazard ratio of HRsex = exp(bsex) = -39. The coef for tt (sex) is the change in the log-hazard-ratio for each additional day that passes. These estimates match the graph of the smoothed Schoenfeld residuals for sex. At the beginning of follow-up, the coefficient is close to -1, and it increases gradually overtime. plot(cox.zph(lung.cox), var="sex") Again, our inferences regarding wt.loss are mostly unchanged.