Cox Proportional Hazards with Ties

There are several versions of the Cox partial likelihood when ties are present in death times (multiple deaths at the same time). These are discussed in Section 8.4 of the textbook. We review the Breslow method (8.4.1) here.

Let \(t_1,\ldots,t_D\) denote the distinct death times, \(d_i\) the number of deaths at time \(t_i\), and \(D_i\) the set of individuals who died at time \(t_i\). Let \(s_i = \sum_{j \in D_i} Z_j\). Then the partial likelihood, derived by profiling out the non-parametric component of the hazard, is

\[ L_1(\beta) = \prod_{i=1}^D \frac{exp(\beta^T s_i)}{\left(\sum_{j \in R(i)} exp(\beta^T Z_j)\right)^{d_i}} \]

The maximum (partial) likelihood estimator is \(\widehat{\beta} = argmax_{\beta} L_1(\beta)\). As with the case with no ties, this likelihood cannot be analytically maximized. However we can numerically maximize the likelihood through a grid search, Newton’s method, or other optimization techniques. These optimization techniques are typically applied to \(\log L_1\) (since the log is a monotonic transform, maximizer of \(L_1\) is the same as the maximizer of \(\log L_1\)).

The book discusses two other partial likelihood for cases with ties.

Assessment of Cox Model Fit with Categorical Predictors

Idea

A general approach to assess the quality of model \(A\) is to do the following:

  • Fit a model \(B\) which is more flexible than model \(A\) (e.g. contains model \(A\) as a submodel)
  • Assess whether the fit with model \(B\) is similar to the fit with model \(A\). If yes, then model \(A\) seems reasonable.

For the Cox proportional hazards model, the baseline hazard is unspecified, but the relationship between hazards is controlled by the \(\beta\) parameter. For the case with a categorical covariate, we can fit a model with separate non-parametric harards to each group and assess whether the relationship between these hazards is consistent with the Cox Proportional hazards assumption.

The mathematical implementation of this idea is as follows. Suppose we have a single covariate \(Z\) with two possible values \(0\) or \(1\) and we fit a Cox PH model of the standard form

\[ \lambda(t|Z) = \lambda_0(t)e^{\beta Z} \] This implies a cumulative hazard function of \[ \Lambda(t|Z) = e^{\beta Z} \int_{s=0}^t \lambda_0(s) ds \] So the the log cumulative hazard ratio (under the Cox PH model) between the two groups \(Z=1\) and \(Z=0\) is a constant

\[ \log \frac{\Lambda(t|Z=1)}{\Lambda(t|Z=0)} = \beta \] To determine if this assumption is reasonable, we fit Nelson–Aalen models to estimate \(\Lambda(t|Z=1)\) and \(\Lambda(t|Z=0)\) and compute their log ratio. If the resulting function appears approximately constant, then the Cox PH model may be reasonable.

Example Data Set

This is example 8.4 in the textbook.

library(KMsurv)
library(survival)
data(kidney,package="KMsurv")
head(kidney)
##   time delta type
## 1  1.5     1    1
## 2  3.5     1    1
## 3  4.5     1    1
## 4  4.5     1    1
## 5  5.5     1    1
## 6  8.5     1    1

We fit the Cox proportional hazards model using the Breslow method for partial likelihood.

?coxph
fit <- coxph(Surv(time,delta)~type,data=kidney,ties="breslow")
fit
## Call:
## coxph(formula = Surv(time, delta) ~ type, data = kidney, ties = "breslow")
## 
##         coef exp(coef) se(coef)      z    p
## type -0.6182    0.5389   0.3981 -1.553 0.12
## 
## Likelihood ratio test=2.45  on 1 df, p=0.1175
## n= 119, number of events= 26

We assess the fit of the Cox model by computing non-parametric estimates for the cumulative hazard for each treatment (each value of type). The Cox model does not appear realistic.

## fit survival functions using fleming-harrington
## this is same as Nelson-Aalen for cumulative hazard
fit <- survfit(Surv(time,delta)~type,data=kidney,
                 type="fleming-harrington")

## separate times and survival functions for two groups
ti1 <- fit$time[1:fit$strata[1]]
ti2 <- fit$time[(fit$strata[1]+1):length(fit$time)]
s1 <- fit$surv[1:fit$strata[1]]
s2 <- fit$surv[(fit$strata[1]+1):length(fit$time)]

## interpolate so survival functions
## evaluated on same grid
lg <- max(min(ti1),min(ti2))
rg <- min(max(ti1),max(ti2))
g <- seq(lg,rg,length.out=100)
s1g <- approx(ti1,s1,xout=g,method="constant",f=0)
s2g <- approx(ti2,s2,xout=g,method="constant",f=0)

## survival functions to cumulative hazards
g1H <- -log(s1g$y)
g2H <- -log(s2g$y)

## log hazard ratio
plot(g,log(g2H/g1H),type='l',xlab="Time",
     ylab="log(H(t|Z=1)/H(t|Z=0))",
     main="Figure 8.1 in Textbook")

Cox versus Kaplan Meier / Nelson-Aalen with Categorical Predictors

In the above example, the Cox model was a poor fit. Kaplan-Meier or Nelson-Aalen is more appropriate for this data set.

Cox Model Advantages

  • Fewer parameters, more power. Especially important with limited data.
  • Parameter \(\beta\) provides simple summary of covariate effect.

Nelson–Aalen / Kaplan–Meier

  • More flexibility.

Global and Local Tests

The book discusses Wald, Likelihood Ratio, and Score tests in Section 8.5. We focus here on Wald tests. The likelihood ratio is the default in coxph function in R, but the output allows you to run Wald tests.

For the hypothesis

\[ \begin{align*} &H_0: \beta = \beta_0\\ &H_a: \beta \neq \beta_0 \end{align*} \]

the Wald test statistic is

\[ T_W = (\widehat{\beta} - \beta_0)^T \widehat{I}(\widehat{\beta})(\widehat{\beta} - \beta_0) \]

where \(\widehat{I}(\widehat{\beta}) = -\frac{\partial^2}{\partial \beta^2} \log L(\beta)|_{\beta=\widehat{\beta}}\) is the observed Fisher information (the negative Hessian of the log likelihood at \(\widehat{\beta}\), alternatively the inverse of the variance of \(\widehat{\beta}\)). \(T_W\) has a \(\chi_q^2\) (q is dimension of \(\beta\)) distribution under \(H_0\). We can reject the test at level \(\alpha\) if the test statistic exceeds the \(1-\alpha\) quantile of a \(\chi^2_q\) distribution.

In the 1 dimensional case with \(\beta_0=0\), the test statistic simplifies to

\[ T_W = \frac{\widehat{\beta}^2}{\widehat{\sigma}^2} \] where \(\widehat{\sigma^2}\) is the variance estimate of \(\widehat{\beta}\).

We consider the 1-dimensional Kidney data (for experimental purposes only, recall that the Cox PH model does not fit this data well) and run a Wald test with \(H_0: \beta=0\).

fit <- coxph(Surv(time,delta)~type,data=kidney,ties="breslow")
fit
## Call:
## coxph(formula = Surv(time, delta) ~ type, data = kidney, ties = "breslow")
## 
##         coef exp(coef) se(coef)      z    p
## type -0.6182    0.5389   0.3981 -1.553 0.12
## 
## Likelihood ratio test=2.45  on 1 df, p=0.1175
## n= 119, number of events= 26
str(fit)
## List of 18
##  $ coefficients     : Named num -0.618
##   ..- attr(*, "names")= chr "type"
##  $ var              : num [1, 1] 0.159
##  $ loglik           : num [1:2] -104 -103
##  $ score            : num 2.49
##  $ iter             : int 3
##  $ linear.predictors: num [1:119] 0.395 0.395 0.395 0.395 0.395 ...
##  $ residuals        : Named num [1:119] 0.915 0.857 0.824 0.824 0.805 ...
##   ..- attr(*, "names")= chr [1:119] "1" "2" "3" "4" ...
##  $ means            : Named num 1.64
##   ..- attr(*, "names")= chr "type"
##  $ method           : chr "breslow"
##  $ n                : int 119
##  $ nevent           : num 26
##  $ terms            :Classes 'terms', 'formula'  language Surv(time, delta) ~ type
##   .. ..- attr(*, "variables")= language list(Surv(time, delta), type)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "Surv(time, delta)" "type"
##   .. .. .. ..$ : chr "type"
##   .. ..- attr(*, "term.labels")= chr "type"
##   .. ..- attr(*, "specials")=Dotted pair list of 3
##   .. .. ..$ strata : NULL
##   .. .. ..$ cluster: NULL
##   .. .. ..$ tt     : NULL
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= num 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(Surv(time, delta), type)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "nmatrix.2" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "Surv(time, delta)" "type"
##  $ assign           :List of 1
##   ..$ type: int 1
##  $ wald.test        : Named num 2.41
##   ..- attr(*, "names")= chr "type"
##  $ concordance      : Named num [1:7] 413 422 938 2 18 ...
##   ..- attr(*, "names")= chr [1:7] "discordant" "concordant" "tied.x" "tied.y" ...
##  $ y                : 'Surv' num [1:119, 1:2]  1.5  3.5  4.5  4.5  5.5  8.5  8.5  9.5 10.5 11.5 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:119] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:2] "time" "status"
##   ..- attr(*, "type")= chr "right"
##  $ formula          :Class 'formula'  language Surv(time, delta) ~ type
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  $ call             : language coxph(formula = Surv(time, delta) ~ type, data = kidney, ties = "breslow")
##  - attr(*, "class")= chr "coxph"

The Wald test statistic is given in fit$wald.test and can also be obtained from the coefficient estimates and variance. The resulting p-value is similar to the likelihood ratio p-value.

## wald test statistic
coef(fit)^2/fit$var
##          [,1]
## [1,] 2.410893
## wald test statistic
fit$wald.test
##     type 
## 2.410893
## wald p-value, similar to likelihood ratio p-value
## which can be see by print(fit)
pchisq(fit$wald.test,df=1,lower.tail=FALSE)
##      type 
## 0.1204936

Global and Local Tests

Suppose \(\beta \in \mathbb{R}^4\). We may want to test whether the first three parameters are equal to \(0\). This is

\[ \begin{align*} &H_0: \beta_1=\beta_2=\beta_3=0\\ &H_a: Not \, \, H_0 \end{align*} \] We simply subset the MLE \(\widehat{\beta}\) and inverse observed Fisher information \(\widehat{I}(\widehat{\beta})^{-1}\) to the relevant parameter subset (in this case the first 3 elements of \(\widehat{\beta}\) and the upper \(3 \times 3\) portion of \(\widehat{I}(\widehat{\beta})^{-1}\)).

We apply these ideas to the larynx data set, using age and cancer stage as predictors of survival.

data(larynx)
head(larynx)
##   stage time age diagyr delta
## 1     1  0.6  77     76     1
## 2     1  1.3  53     71     1
## 3     1  2.4  45     71     1
## 4     1  2.5  57     78     0
## 5     1  3.2  58     74     1
## 6     1  3.2  51     77     0
fit <- coxph(Surv(time,delta)~as.factor(stage)+age,
             data=larynx,ties="breslow")
fit
## Call:
## coxph(formula = Surv(time, delta) ~ as.factor(stage) + age, data = larynx, 
##     ties = "breslow")
## 
##                      coef exp(coef) se(coef)     z        p
## as.factor(stage)2 0.13856   1.14862  0.46231 0.300    0.764
## as.factor(stage)3 0.63835   1.89335  0.35608 1.793    0.073
## as.factor(stage)4 1.69306   5.43607  0.42221 4.010 6.07e-05
## age               0.01890   1.01908  0.01425 1.326    0.185
## 
## Likelihood ratio test=18.07  on 4 df, p=0.001197
## n= 90, number of events= 50

We conduct a global Wald test that all coefficients equal \(0\).

fit$wald.test
## [1] 20.81687
fit$var
##             [,1]         [,2]          [,3]          [,4]
## [1,] 0.213726421 0.0683008943  0.0689498211  0.0008015350
## [2,] 0.068300894 0.1267932600  0.0682084308  0.0003144479
## [3,] 0.068949821 0.0682084308  0.1782595628 -0.0003990833
## [4,] 0.000801535 0.0003144479 -0.0003990833  0.0002030920
coef(fit)
## as.factor(stage)2 as.factor(stage)3 as.factor(stage)4               age 
##        0.13856390        0.63834973        1.69305644        0.01890184
co_mat <- matrix(coef(fit),ncol=1)
t(co_mat)%*%solve(fit$var)%*%co_mat
##          [,1]
## [1,] 20.81687
pchisq(fit$wald.test,df=4,lower.tail=FALSE)
## [1] 0.0003442704

We now test whether the first three parameters are 0 (ignoring age). This is equivalent to testing for no effect due to cancer stage.

co_mat <- co_mat[1:3]
var_mat <- fit$var[1:3,1:3]
tw <- t(co_mat)%*%solve(var_mat)%*%co_mat
tw
##         [,1]
## [1,] 17.6288
co_mat
## [1] 0.1385639 0.6383497 1.6930564
var_mat
##            [,1]       [,2]       [,3]
## [1,] 0.21372642 0.06830089 0.06894982
## [2,] 0.06830089 0.12679326 0.06820843
## [3,] 0.06894982 0.06820843 0.17825956
pchisq(tw,df=3,lower.tail=FALSE)
##              [,1]
## [1,] 0.0005245945
dim(larynx)
## [1] 90  5

Optional Exercises