Introduction to Cox Proportional Hazards Model

library(survival)

Data Set

We will work with a small subset of The Cancer Genome Atlas Renal Cell Carcinoma data set.

load("kirc_small.RData")
ls()

## [1] "kirc_small"

head(kirc_small)

##                              time death_obs BAP1_mutation SLC7A11_ex
## TCGA-3Z-A93Z-01A-11D-A36X-10  385         0             0    33.2466
## TCGA-6D-AA2E-01A-11D-A36X-10  362         0             0    16.7296
## TCGA-A3-3308-01A-01D-0966-08   16         0             1   132.5840
## TCGA-A3-3311-01A-01D-0966-08 1191         1             0    52.6930
## TCGA-A3-3313-01A-01D-0966-08  735         1             0   565.8552
## TCGA-A3-3316-01A-01D-0966-08 1493         0             0    93.8776

This is right censored data. time is time of death or censoring. death_obs is 1 if death observed, 0 if censored. BAP1_mutation indicates presence of mutation in BAP1 gene. SLC7A11_ex is expression of gene SLC7A11.

nrow(kirc_small)

## [1] 378

table(kirc_small$death_obs)

## 
##   0   1 
## 279  99

table(kirc_small$BAP1_mutation)

## 
##   0   1 
## 335  43

par(mfcol=c(1,2))
hist(kirc_small$time)
hist(log2(kirc_small$SLC7A11_ex))

Kaplan-Meier Curve for Survival

Associations and possible causal relations have been discovered for the effects of BAP1 mutations and SLC7A11 on survival. We plot Kaplan Meier for BAP1 and SLC7A11 thresholded on median value.

kirc_small[,"SLC7A11_gr_median"] <- kirc_small$SLC7A11_ex > median(kirc_small$SLC7A11_ex)

par(mfcol=c(1,2),mar=c(5,5,1,1))
fit1 <- survfit(Surv(time,death_obs)~BAP1_mutation,data=kirc_small)
plot(fit1,xlab="Days Since Diagnosis",ylab="S(t)",col=1:2,lty=1:2,lwd=2)
legend("bottomleft",names(fit1$strata),col=1:2,lty=1:2,lwd=2)
fit2 <- survfit(Surv(time,death_obs)~SLC7A11_gr_median,data=kirc_small)
plot(fit2,xlab="Days Since Diagnosis",ylab="S(t)",col=1:2,lty=1:2,lwd=2)
legend("bottomleft",names(fit2$strata),col=1:2,lty=1:2,lwd=2)

survdiff(Surv(time,death_obs)~BAP1_mutation,data=kirc_small)

## Call:
## survdiff(formula = Surv(time, death_obs) ~ BAP1_mutation, data = kirc_small)
## 
##                   N Observed Expected (O-E)^2/E (O-E)^2/V
## BAP1_mutation=0 335       79     87.3     0.795      6.76
## BAP1_mutation=1  43       20     11.7     5.951      6.76
## 
##  Chisq= 6.8  on 1 degrees of freedom, p= 0.009

survdiff(Surv(time,death_obs)~SLC7A11_gr_median,data=kirc_small)

## Call:
## survdiff(formula = Surv(time, death_obs) ~ SLC7A11_gr_median, 
##     data = kirc_small)
## 
##                           N Observed Expected (O-E)^2/E (O-E)^2/V
## SLC7A11_gr_median=FALSE 189       40     49.4      1.77      3.55
## SLC7A11_gr_median=TRUE  189       59     49.6      1.76      3.55
## 
##  Chisq= 3.5  on 1 degrees of freedom, p= 0.06

Comments / Questions :

Why cut SLC7A11 on median? More generally how / why should we discretize a continuous covariate? (Section 8.6 in book)
Build flexible model with both SLC7A11 and BAP1?
Transformations (such as log) for SLC7A11, since heavily right skewed.

See papers TCGA Renal Cell Carcinoma Analysis and SLC7A11 and BAP1 as Possible Causes of Poorer Survival Outcomes for more discussion of this data set. We will return to this in a few weeks.

Cox Proportional Hazards Formulation and Partial Likelihood

Let the data be \[ (T_j,\delta_j,Z_j) \] where \(T_j\) is the observation time, \(\delta_i\) is 0 for censored, and \(Z_j\) is the covariate. The partial likelihood with no ties (book Equation 8.3.1) is

\[ L(\beta) = \prod_{i=1}^D \frac{exp(\sum_{k=1}^p \beta_k Z_{(i)k})}{\sum_{j \in R(t_i)} exp(\sum_{k=1}^p \beta_k Z_{jk})} \]

Focusing on the case with 1-predictor, i.e. \(p=1\), we have

\[ L(\beta) = \prod_{i=1}^D \frac{exp(\beta Z_{(i)})}{\sum_{j \in R(t_i)} exp(\beta Z_{j})} \] We seek to find the \(\beta\) which maximizes this function which is equivalent to maximizing the log

\[ \beta_{MLE} = argmax_\beta LL(\beta) = \sum_{i=1}^D (\beta Z_{(i)} - \log(\sum_{j \in R(t_i)} exp(\beta Z_j))) \]

Maximizing the Partial Likelihood

We consider fitting the Cox proportional hazards model using only SLC7A11_ex as a predictor. We will only use unique times. We cover the partial likelihood for tied death times later.

length(unique(kirc_small$time))

## [1] 347

nrow(kirc_small)

## [1] 378

## first consider case with no event time ties

kirc_small_unique <- kirc_small[!duplicated(kirc_small$time),]

head(kirc_small_unique)

##                              time death_obs BAP1_mutation SLC7A11_ex
## TCGA-3Z-A93Z-01A-11D-A36X-10  385         0             0    33.2466
## TCGA-6D-AA2E-01A-11D-A36X-10  362         0             0    16.7296
## TCGA-A3-3308-01A-01D-0966-08   16         0             1   132.5840
## TCGA-A3-3311-01A-01D-0966-08 1191         1             0    52.6930
## TCGA-A3-3313-01A-01D-0966-08  735         1             0   565.8552
## TCGA-A3-3316-01A-01D-0966-08 1493         0             0    93.8776
##                              SLC7A11_gr_median
## TCGA-3Z-A93Z-01A-11D-A36X-10             FALSE
## TCGA-6D-AA2E-01A-11D-A36X-10             FALSE
## TCGA-A3-3308-01A-01D-0966-08              TRUE
## TCGA-A3-3311-01A-01D-0966-08              TRUE
## TCGA-A3-3313-01A-01D-0966-08              TRUE
## TCGA-A3-3316-01A-01D-0966-08              TRUE

## simplify variable names
ti <- kirc_small_unique$time
del <- kirc_small_unique$death_obs
z <- kirc_small_unique$SLC7A11_ex

## order data increasing with time
ords <- order(ti)
ti <- ti[ords]
del <- del[ords]
z <- z[ords]

## evaluate log likelihood at this beta
beta <- .1 

## the log likelihood is
s <- log(rev(cumsum(rev(exp(beta*z)))))
LL_vec <- beta*z - s
sum(LL_vec[del==1])

## [1] -5087.516

We need to be able to evaluate the log likelihood at many \(\beta\) (so we can find the maximum). So we functionalize the above code a bit.

## the log likelihood is
coxph_log_like <- function(beta,ti,del,z){
  s <- log(rev(cumsum(rev(exp(beta*z)))))
  LL_vec <- beta*z - s
  return(sum(LL_vec[del==1]))
}

We now plot the log likelihood over a range of beta values

## finding an appropriate grid for betas can be tricky
## could standardized predictor first (set to mean 0, sd=1)
betas <- seq(.0001,0.03,by=0.00001)
log_like <- vapply(betas,function(beta) coxph_log_like(beta,ti,del,z),c(0))
mle <- betas[which.max(log_like)]
plot(betas,log_like,ylim=c(-500,-440),log='x')
abline(v=mle)

print(paste("The MLE is:",mle))

## [1] "The MLE is: 0.00266"

Compare result to built in coxph function in survival package:

fit <- coxph(Surv(time,death_obs)~SLC7A11_ex,data=kirc_small_unique)
print(fit)

## Call:
## coxph(formula = Surv(time, death_obs) ~ SLC7A11_ex, data = kirc_small_unique)
## 
##                 coef exp(coef)  se(coef)   z        p
## SLC7A11_ex 0.0026560 1.0026596 0.0006478 4.1 4.13e-05
## 
## Likelihood ratio test=11.52  on 1 df, p=0.0006881
## n= 347, number of events= 89

In practice, the grid search which we did here is very computationally inefficient:

If you need \(p\) grid points with 1 parameter (p grid points), then \(p^2\) with parameters, \(p^3\) with 3 parameters, . . .
Typically find roots of derivative of log likelihood (score) using Newton Raphson, which requires computing matrix of second derivatives (Hessian). See book appendix A for details.
There are lots of possible challenges with optimization that we will not discuss in detail: local maxima, algorithm starting points, iteration step sizes, convergence criteria, etc.

Intuitive Derivation of Partial Likelihood

This is covered in Theoretical Notes 1 (partial likelihood derivation) and 2 (profile likelihood derivation) on page 257 of textbook. We discussed Note 2 in class.

Optional Exercises

Generalize the function coxph_log_like to use a 2 dimensional \(\beta\) vector and evaluate the likelihood on a grid, using both BAP1 and SLC7A11_ex as predictors. Find the maximizer of the likelihood on the grid. Verify your solution by using the coxph function in the survival package.
Rather than use a grid search, use one of the builtin optimization functions in R such as optim or optimize.