```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(knitr.duplicate.label = 'allow')
```
## Cox Proportional Hazards with Ties
There are several versions of the Cox partial likelihood when ties are present in death times (multiple deaths at the same time). These are discussed in Section 8.4 of the textbook. We review the Breslow method (8.4.1) here.
Let $t_1,\ldots,t_D$ denote the distinct death times, $d_i$ the number of deaths at time $t_i$, and $D_i$ the set of individuals who died at time $t_i$. Let $s_i = \sum_{j \in D_i} Z_j$. Then the partial likelihood, derived by profiling out the non-parametric component of the hazard, is
$$
L_1(\beta) = \prod_{i=1}^D \frac{exp(\beta^T s_i)}{\left(\sum_{j \in R(i)} exp(\beta^T Z_j)\right)^{d_i}}
$$
The maximum (partial) likelihood estimator is $\widehat{\beta} = argmax_{\beta} L_1(\beta)$. As with the case with no ties, this likelihood cannot be analytically maximized. However we can numerically maximize the likelihood through a grid search, Newton's method, or other optimization techniques. These optimization techniques are typically applied to $\log L_1$ (since the log is a monotonic transform, maximizer of $L_1$ is the same as the maximizer of $\log L_1$).
The book discusses two other partial likelihood for cases with ties.
## Assessment of Cox Model Fit with Categorical Predictors
### Idea
A general approach to assess the quality of model $A$ is to do the following:
* Fit a model $B$ which is more flexible than model $A$ (e.g. contains model $A$ as a submodel)
* Assess whether the fit with model $B$ is similar to the fit with model $A$. If yes, then model $A$ seems reasonable.
For the Cox proportional hazards model, the baseline hazard is unspecified, but the relationship between hazards is controlled by the $\beta$ parameter. For the case with a categorical covariate, we can fit a model with separate non-parametric harards to each group and assess whether the relationship between these hazards is consistent with the Cox Proportional hazards assumption.
The mathematical implementation of this idea is as follows. Suppose we have a single covariate $Z$ with two possible values $0$ or $1$ and we fit a Cox PH model of the standard form
$$
\lambda(t|Z) = \lambda_0(t)e^{\beta Z}
$$
This implies a cumulative hazard function of
$$
\Lambda(t|Z) = e^{\beta Z} \int_{s=0}^t \lambda_0(s) ds
$$
So the the log cumulative hazard ratio (under the Cox PH model) between the two groups $Z=1$ and $Z=0$ is a constant
$$
\log \frac{\Lambda(t|Z=1)}{\Lambda(t|Z=0)} = \beta
$$
To determine if this assumption is reasonable, we fit Nelson--Aalen models to estimate $\Lambda(t|Z=1)$ and $\Lambda(t|Z=0)$ and compute their log ratio. If the resulting function appears approximately constant, then the Cox PH model may be reasonable.
### Example Data Set
This is example 8.4 in the textbook.
```{r kidney_load}
library(KMsurv)
library(survival)
data(kidney,package="KMsurv")
head(kidney)
```
We fit the Cox proportional hazards model using the Breslow method for partial likelihood.
```{r coxphfit}
?coxph
fit <- coxph(Surv(time,delta)~type,data=kidney,ties="breslow")
fit
```
We assess the fit of the Cox model by computing non-parametric estimates for the cumulative hazard for each treatment (each value of `type`). The Cox model does not appear realistic.
```{r coxphcheck, eval=TRUE}
## fit survival functions using fleming-harrington
## this is same as Nelson-Aalen for cumulative hazard
fit <- survfit(Surv(time,delta)~type,data=kidney,
type="fleming-harrington")
## separate times and survival functions for two groups
ti1 <- fit$time[1:fit$strata[1]]
ti2 <- fit$time[(fit$strata[1]+1):length(fit$time)]
s1 <- fit$surv[1:fit$strata[1]]
s2 <- fit$surv[(fit$strata[1]+1):length(fit$time)]
## interpolate so survival functions
## evaluated on same grid
lg <- max(min(ti1),min(ti2))
rg <- min(max(ti1),max(ti2))
g <- seq(lg,rg,length.out=100)
s1g <- approx(ti1,s1,xout=g,method="constant",f=0)
s2g <- approx(ti2,s2,xout=g,method="constant",f=0)
## survival functions to cumulative hazards
g1H <- -log(s1g$y)
g2H <- -log(s2g$y)
## log hazard ratio
plot(g,log(g2H/g1H),type='l',xlab="Time",
ylab="log(H(t|Z=1)/H(t|Z=0))",
main="Figure 8.1 in Textbook")
```
### Cox versus Kaplan Meier / Nelson-Aalen with Categorical Predictors
In the above example, the Cox model was a poor fit. Kaplan-Meier or Nelson-Aalen is more appropriate for this data set.
**Cox Model Advantages**
* Fewer parameters, more power. Especially important with limited data.
* Parameter $\beta$ provides simple summary of covariate effect.
**Nelson--Aalen / Kaplan--Meier**
* More flexibility.
## Global and Local Tests
The book discusses Wald, Likelihood Ratio, and Score tests in Section 8.5. We focus here on Wald tests. The likelihood ratio is the default in `coxph` function in R, but the output allows you to run Wald tests.
For the hypothesis
$$
\begin{align*}
&H_0: \beta = \beta_0\\
&H_a: \beta \neq \beta_0
\end{align*}
$$
the Wald test statistic is
$$
T_W = (\widehat{\beta} - \beta_0)^T \widehat{I}(\widehat{\beta})(\widehat{\beta} - \beta_0)
$$
where $\widehat{I}(\widehat{\beta}) = -\frac{\partial^2}{\partial \beta^2} \log L(\beta)|_{\beta=\widehat{\beta}}$ is the observed Fisher information (the negative Hessian of the log likelihood at $\widehat{\beta}$, alternatively the inverse of the variance of $\widehat{\beta}$). $T_W$ has a $\chi_q^2$ (q is dimension of $\beta$) distribution under $H_0$. We can reject the test at level $\alpha$ if the test statistic exceeds the $1-\alpha$ quantile of a $\chi^2_q$ distribution.
In the 1 dimensional case with $\beta_0=0$, the test statistic simplifies to
$$
T_W = \frac{\widehat{\beta}^2}{\widehat{\sigma}^2}
$$
where $\widehat{\sigma^2}$ is the variance estimate of $\widehat{\beta}$.
We consider the 1-dimensional Kidney data (for experimental purposes only, recall that the Cox PH model does not fit this data well) and run a Wald test with $H_0: \beta=0$.
```{r coxph_tests}
fit <- coxph(Surv(time,delta)~type,data=kidney,ties="breslow")
fit
str(fit)
```
The Wald test statistic is given in `fit$wald.test` and can also be obtained from the coefficient estimates and variance. The resulting p-value is similar to the likelihood ratio p-value.
```{r coxph_wald}
## wald test statistic
coef(fit)^2/fit$var
## wald test statistic
fit$wald.test
## wald p-value, similar to likelihood ratio p-value
## which can be see by print(fit)
pchisq(fit$wald.test,df=1,lower.tail=FALSE)
```
### Global and Local Tests
Suppose $\beta \in \mathbb{R}^4$. We may want to test whether the first three parameters are equal to $0$. This is
$$
\begin{align*}
&H_0: \beta_1=\beta_2=\beta_3=0\\
&H_a: Not \, \, H_0
\end{align*}
$$
We simply subset the MLE $\widehat{\beta}$ and inverse observed Fisher information $\widehat{I}(\widehat{\beta})^{-1}$ to the relevant parameter subset (in this case the first 3 elements of $\widehat{\beta}$ and the upper $3 \times 3$ portion of $\widehat{I}(\widehat{\beta})^{-1}$).
We apply these ideas to the `larynx` data set, using age and cancer stage as predictors of survival.
```{r larynx_stage}
data(larynx)
head(larynx)
fit <- coxph(Surv(time,delta)~as.factor(stage)+age,
data=larynx,ties="breslow")
fit
```
We conduct a global Wald test that all coefficients equal $0$.
```{r larynx_stage_global_wald}
fit$wald.test
fit$var
coef(fit)
co_mat <- matrix(coef(fit),ncol=1)
t(co_mat)%*%solve(fit$var)%*%co_mat
pchisq(fit$wald.test,df=4,lower.tail=FALSE)
```
We now test whether the first three parameters are 0 (ignoring age). This is equivalent to testing for no effect due to cancer stage.
```{r larynx_stage_local_wald}
co_mat <- co_mat[1:3]
var_mat <- fit$var[1:3,1:3]
tw <- t(co_mat)%*%solve(var_mat)%*%co_mat
tw
co_mat
var_mat
pchisq(tw,df=3,lower.tail=FALSE)
dim(larynx)
```
## Optional Exercises
* Find a better way to reproduce Textbook Figure 8.1 by using some simple existing R functions which does not require manually putting survival / hazard functions on same time grid.
* Write a function to compute the Cox partial likelihood with ties using Breslow's formula in 1-dimension. Compute and plot this function on a grid in a neighorhood of $\widehat{\beta}$ for the 1-dimensional `kidney` data set. Verify that your function is correct (the MLE returned by `coxph` should be at the maximum of your function).
```{r makedotRfile, echo=FALSE, include=FALSE}
## automatrically generates .R file when running knitr
knitr::purl("02coxph.Rmd")
```