Cox Proportional Hazards: Ties and Tests

Cox Proportional Hazards with Ties

There are several versions of the Cox partial likelihood when ties are present in death times (multiple deaths at the same time). These are discussed in Section 8.4 of the textbook. We review the Breslow method (8.4.1) here.

Let $t_1,\ldots,t_D$ denote the distinct death times, $d_i$ the number of deaths at time $t_i$, and $D_i$ the set of individuals who died at time $t_i$. Let $s_i = \sum_{j \in D_i} Z_j$. Then the partial likelihood, derived by profiling out the non-parametric component of the hazard, is

\[ L_1(\beta) = \prod_{i=1}^D \frac{exp(\beta^T s_i)}{\left(\sum_{j \in R(i)} exp(\beta^T Z_j)\right)^{d_i}} \]

The maximum (partial) likelihood estimator is $\widehat{\beta} = argmax_{\beta} L_1(\beta)$. As with the case with no ties, this likelihood cannot be analytically maximized. However we can numerically maximize the likelihood through a grid search, Newton’s method, or other optimization techniques. These optimization techniques are typically applied to $\log L_1$ (since the log is a monotonic transform, maximizer of $L_1$ is the same as the maximizer of $\log L_1$).

The book discusses two other partial likelihood for cases with ties.

Assessment of Cox Model Fit with Categorical Predictors

Idea

A general approach to assess the quality of model $A$ is to do the following:

Fit a model $B$ which is more flexible than model $A$ (e.g. contains model $A$ as a submodel)
Assess whether the fit with model $B$ is similar to the fit with model $A$. If yes, then model $A$ seems reasonable.

For the Cox proportional hazards model, the baseline hazard is unspecified, but the relationship between hazards is controlled by the $\beta$ parameter. For the case with a categorical covariate, we can fit a model with separate non-parametric harards to each group and assess whether the relationship between these hazards is consistent with the Cox Proportional hazards assumption.

The mathematical implementation of this idea is as follows. Suppose we have a single covariate $Z$ with two possible values $0$ or $1$ and we fit a Cox PH model of the standard form

\[ \lambda(t|Z) = \lambda_0(t)e^{\beta Z} \] This implies a cumulative hazard function of \[ \Lambda(t|Z) = e^{\beta Z} \int_{s=0}^t \lambda_0(s) ds \] So the the log cumulative hazard ratio (under the Cox PH model) between the two groups $Z=1$ and $Z=0$ is a constant

\[ \log \frac{\Lambda(t|Z=1)}{\Lambda(t|Z=0)} = \beta \] To determine if this assumption is reasonable, we fit Nelson–Aalen models to estimate $\Lambda(t|Z=1)$ and $\Lambda(t|Z=0)$ and compute their log ratio. If the resulting function appears approximately constant, then the Cox PH model may be reasonable.

Example Data Set

This is example 8.4 in the textbook.

library(KMsurv)
library(survival)
data(kidney,package="KMsurv")
head(kidney)

##   time delta type
## 1  1.5     1    1
## 2  3.5     1    1
## 3  4.5     1    1
## 4  4.5     1    1
## 5  5.5     1    1
## 6  8.5     1    1

We fit the Cox proportional hazards model using the Breslow method for partial likelihood.

?coxph
fit <- coxph(Surv(time,delta)~type,data=kidney,ties="breslow")
fit

## Call:
## coxph(formula = Surv(time, delta) ~ type, data = kidney, ties = "breslow")
## 
##         coef exp(coef) se(coef)      z    p
## type -0.6182    0.5389   0.3981 -1.553 0.12
## 
## Likelihood ratio test=2.45  on 1 df, p=0.1175
## n= 119, number of events= 26

We assess the fit of the Cox model by computing non-parametric estimates for the cumulative hazard for each treatment (each value of type). The Cox model does not appear realistic.

## fit survival functions using fleming-harrington
## this is same as Nelson-Aalen for cumulative hazard
fit <- survfit(Surv(time,delta)~type,data=kidney,
                 type="fleming-harrington")

## separate times and survival functions for two groups
ti1 <- fit$time[1:fit$strata[1]]
ti2 <- fit$time[(fit$strata[1]+1):length(fit$time)]
s1 <- fit$surv[1:fit$strata[1]]
s2 <- fit$surv[(fit$strata[1]+1):length(fit$time)]

## interpolate so survival functions
## evaluated on same grid
lg <- max(min(ti1),min(ti2))
rg <- min(max(ti1),max(ti2))
g <- seq(lg,rg,length.out=100)
s1g <- approx(ti1,s1,xout=g,method="constant",f=0)
s2g <- approx(ti2,s2,xout=g,method="constant",f=0)

## survival functions to cumulative hazards
g1H <- -log(s1g$y)
g2H <- -log(s2g$y)

## log hazard ratio
plot(g,log(g2H/g1H),type='l',xlab="Time",
     ylab="log(H(t|Z=1)/H(t|Z=0))",
     main="Figure 8.1 in Textbook")

Cox versus Kaplan Meier / Nelson-Aalen with Categorical Predictors

In the above example, the Cox model was a poor fit. Kaplan-Meier or Nelson-Aalen is more appropriate for this data set.

Cox Model Advantages

Fewer parameters, more power. Especially important with limited data.
Parameter $\beta$ provides simple summary of covariate effect.

Nelson–Aalen / Kaplan–Meier

More flexibility.

Global and Local Tests

The book discusses Wald, Likelihood Ratio, and Score tests in Section 8.5. We focus here on Wald tests. The likelihood ratio is the default in coxph function in R, but the output allows you to run Wald tests.

For the hypothesis

\[ \begin{align*} &H_0: \beta = \beta_0\\ &H_a: \beta \neq \beta_0 \end{align*} \]

the Wald test statistic is

\[ T_W = (\widehat{\beta} - \beta_0)^T \widehat{I}(\widehat{\beta})(\widehat{\beta} - \beta_0) \]

where $\widehat{I}(\widehat{\beta}) = -\frac{\partial^2}{\partial \beta^2} \log L(\beta)|_{\beta=\widehat{\beta}}$ is the observed Fisher information (the negative Hessian of the log likelihood at $\widehat{\beta}$, alternatively the inverse of the variance of $\widehat{\beta}$). $T_W$ has a $\chi_q^2$ (q is dimension of $\beta$) distribution under $H_0$. We can reject the test at level $\alpha$ if the test statistic exceeds the $1-\alpha$ quantile of a $\chi^2_q$ distribution.

In the 1 dimensional case with $\beta_0=0$, the test statistic simplifies to

\[ T_W = \frac{\widehat{\beta}^2}{\widehat{\sigma}^2} \] where $\widehat{\sigma^2}$ is the variance estimate of $\widehat{\beta}$.

We consider the 1-dimensional Kidney data (for experimental purposes only, recall that the Cox PH model does not fit this data well) and run a Wald test with $H_0: \beta=0$.

fit <- coxph(Surv(time,delta)~type,data=kidney,ties="breslow")
fit

## Call:
## coxph(formula = Surv(time, delta) ~ type, data = kidney, ties = "breslow")
## 
##         coef exp(coef) se(coef)      z    p
## type -0.6182    0.5389   0.3981 -1.553 0.12
## 
## Likelihood ratio test=2.45  on 1 df, p=0.1175
## n= 119, number of events= 26

str(fit)

## List of 18
##  $ coefficients     : Named num -0.618
##   ..- attr(*, "names")= chr "type"
##  $ var              : num [1, 1] 0.159
##  $ loglik           : num [1:2] -104 -103
##  $ score            : num 2.49
##  $ iter             : int 3
##  $ linear.predictors: num [1:119] 0.395 0.395 0.395 0.395 0.395 ...
##  $ residuals        : Named num [1:119] 0.915 0.857 0.824 0.824 0.805 ...
##   ..- attr(*, "names")= chr [1:119] "1" "2" "3" "4" ...
##  $ means            : Named num 1.64
##   ..- attr(*, "names")= chr "type"
##  $ method           : chr "breslow"
##  $ n                : int 119
##  $ nevent           : num 26
##  $ terms            :Classes 'terms', 'formula'  language Surv(time, delta) ~ type
##   .. ..- attr(*, "variables")= language list(Surv(time, delta), type)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "Surv(time, delta)" "type"
##   .. .. .. ..$ : chr "type"
##   .. ..- attr(*, "term.labels")= chr "type"
##   .. ..- attr(*, "specials")=Dotted pair list of 3
##   .. .. ..$ strata : NULL
##   .. .. ..$ cluster: NULL
##   .. .. ..$ tt     : NULL
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= num 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(Surv(time, delta), type)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "nmatrix.2" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "Surv(time, delta)" "type"
##  $ assign           :List of 1
##   ..$ type: int 1
##  $ wald.test        : Named num 2.41
##   ..- attr(*, "names")= chr "type"
##  $ concordance      : Named num [1:7] 413 422 938 2 18 ...
##   ..- attr(*, "names")= chr [1:7] "discordant" "concordant" "tied.x" "tied.y" ...
##  $ y                : 'Surv' num [1:119, 1:2]  1.5  3.5  4.5  4.5  5.5  8.5  8.5  9.5 10.5 11.5 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:119] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:2] "time" "status"
##   ..- attr(*, "type")= chr "right"
##  $ formula          :Class 'formula'  language Surv(time, delta) ~ type
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  $ call             : language coxph(formula = Surv(time, delta) ~ type, data = kidney, ties = "breslow")
##  - attr(*, "class")= chr "coxph"

The Wald test statistic is given in fit$wald.test and can also be obtained from the coefficient estimates and variance. The resulting p-value is similar to the likelihood ratio p-value.

## wald test statistic
coef(fit)^2/fit$var

##          [,1]
## [1,] 2.410893

## wald test statistic
fit$wald.test

##     type 
## 2.410893

## wald p-value, similar to likelihood ratio p-value
## which can be see by print(fit)
pchisq(fit$wald.test,df=1,lower.tail=FALSE)

##      type 
## 0.1204936

Global and Local Tests

Suppose $\beta \in \mathbb{R}^4$. We may want to test whether the first three parameters are equal to $0$. This is

\[ \begin{align*} &H_0: \beta_1=\beta_2=\beta_3=0\\ &H_a: Not \, \, H_0 \end{align*} \] We simply subset the MLE $\widehat{\beta}$ and inverse observed Fisher information $\widehat{I}(\widehat{\beta})^{-1}$ to the relevant parameter subset (in this case the first 3 elements of $\widehat{\beta}$ and the upper $3 \times 3$ portion of $\widehat{I}(\widehat{\beta})^{-1}$).

We apply these ideas to the larynx data set, using age and cancer stage as predictors of survival.

data(larynx)
head(larynx)

##   stage time age diagyr delta
## 1     1  0.6  77     76     1
## 2     1  1.3  53     71     1
## 3     1  2.4  45     71     1
## 4     1  2.5  57     78     0
## 5     1  3.2  58     74     1
## 6     1  3.2  51     77     0

fit <- coxph(Surv(time,delta)~as.factor(stage)+age,
             data=larynx,ties="breslow")
fit

## Call:
## coxph(formula = Surv(time, delta) ~ as.factor(stage) + age, data = larynx, 
##     ties = "breslow")
## 
##                      coef exp(coef) se(coef)     z        p
## as.factor(stage)2 0.13856   1.14862  0.46231 0.300    0.764
## as.factor(stage)3 0.63835   1.89335  0.35608 1.793    0.073
## as.factor(stage)4 1.69306   5.43607  0.42221 4.010 6.07e-05
## age               0.01890   1.01908  0.01425 1.326    0.185
## 
## Likelihood ratio test=18.07  on 4 df, p=0.001197
## n= 90, number of events= 50

We conduct a global Wald test that all coefficients equal $0$.

fit$wald.test

## [1] 20.81687

fit$var

##             [,1]         [,2]          [,3]          [,4]
## [1,] 0.213726421 0.0683008943  0.0689498211  0.0008015350
## [2,] 0.068300894 0.1267932600  0.0682084308  0.0003144479
## [3,] 0.068949821 0.0682084308  0.1782595628 -0.0003990833
## [4,] 0.000801535 0.0003144479 -0.0003990833  0.0002030920

coef(fit)

## as.factor(stage)2 as.factor(stage)3 as.factor(stage)4               age 
##        0.13856390        0.63834973        1.69305644        0.01890184

co_mat <- matrix(coef(fit),ncol=1)
t(co_mat)%*%solve(fit$var)%*%co_mat

##          [,1]
## [1,] 20.81687

pchisq(fit$wald.test,df=4,lower.tail=FALSE)

## [1] 0.0003442704

We now test whether the first three parameters are 0 (ignoring age). This is equivalent to testing for no effect due to cancer stage.

co_mat <- co_mat[1:3]
var_mat <- fit$var[1:3,1:3]
tw <- t(co_mat)%*%solve(var_mat)%*%co_mat
tw

##         [,1]
## [1,] 17.6288

co_mat

## [1] 0.1385639 0.6383497 1.6930564

var_mat

##            [,1]       [,2]       [,3]
## [1,] 0.21372642 0.06830089 0.06894982
## [2,] 0.06830089 0.12679326 0.06820843
## [3,] 0.06894982 0.06820843 0.17825956

pchisq(tw,df=3,lower.tail=FALSE)

##              [,1]
## [1,] 0.0005245945

dim(larynx)

## [1] 90  5

Optional Exercises

Find a better way to reproduce Textbook Figure 8.1 by using some simple existing R functions which does not require manually putting survival / hazard functions on same time grid.
Write a function to compute the Cox partial likelihood with ties using Breslow’s formula in 1-dimension. Compute and plot this function on a grid in a neighorhood of $\widehat{\beta}$ for the 1-dimensional kidney data set. Verify that your function is correct (the MLE returned by coxph should be at the maximum of your function).

Cox Proportional Hazards: Ties and Tests

James Long

2/28/2019

Cox Proportional Hazards with Ties

Assessment of Cox Model Fit with Categorical Predictors

Idea

Example Data Set

Cox versus Kaplan Meier / Nelson-Aalen with Categorical Predictors

Global and Local Tests

Global and Local Tests

Optional Exercises