Dirichlet Prior with Right Censoring

In the last class we derived the conjugacy of the Dirichlet prior for discrete probability mass functions without censoring. Today we discuss the case with right censoring. Here the posterior is no longer conjugate to the prior, so we use Metropolis-Hastings Markov chain Monte Carlo (MCMC) to draw samples from the posterior distribution.

Let \(X_1,\ldots,X_n \sim_{iid} f_X\) with \(f_X\) some discrete distribution. Without loss of generality we assume the support of \(f_X\) is \(\{1,\ldots,K\}\). Censoring times are \(C_1,\ldots,C_n \sim_{iid} f_C\) with \(X_i \perp C_i\). Our goal is to estimate \(f_X\). We observe the censored sample \((T_i,\delta_i)\) where \(T_i = \min(X_i,C_i)\) and \(\delta_i = \mathbb{1}_{X_i \leq C_i}\).

We place a \(\pi(p) = Dirichlet(\alpha)\) prior on \(p = (p_1,\ldots,p_K)\), where \(p_j = f_X(j)\). The posterior is

\[ \begin{align*} \pi(p | (T,\delta)) &\propto f((T,\delta)|p)\pi(p)\\ &= \prod_{i=1}^n f(T_i|p)^{\delta_i} S(T_i|p)^{1-\delta_i} \pi(p)\\ &\propto \left(\prod_{i=1}^n p_{T_i}^{\delta_i} \left(\sum_{j=T_i + 1}^K p_j\right)^{1-\delta_i}\right) \prod_{j=1}^K p_j^{\alpha_j - 1}\\ &= \prod_{j=1}^K \left(\sum_{l=j+1}^K p_l\right)^{\sum_{i=1}^n \mathbb{1}_{\delta_i=0,T_i=j}}p_j^{\alpha_j + \sum_{i=1}^n \mathbb{1}_{T_i=j,\delta_i=1}-1} \end{align*} \] Defining \(d_j = \alpha_j + \sum_{i=1}^n \mathbb{1}_{T_i=j,\delta_i=1}-1\) and \(c_j = \sum_{i=1}^n \mathbb{1}_{\delta_i=0,T_i=j}\) we have

\[ \pi(p|(T,\delta)) \propto \prod_{j=1}^K \left(\sum_{l=j+1}^K p_l\right)^{c_j}p_j^{d_j} \]
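As a sanity check on this formula: if no observation is censored then every \(c_j = 0\), and the posterior collapses to the conjugate form from last class,

\[ \pi(p|(T,\delta)) \propto \prod_{j=1}^K p_j^{\alpha_j + n_j - 1} \]

where \(n_j = \sum_{i=1}^n \mathbb{1}_{T_i = j}\), i.e. a \(Dirichlet(\alpha_1 + n_1,\ldots,\alpha_K + n_K)\) distribution. The censoring terms \(\left(\sum_{l=j+1}^K p_l\right)^{c_j}\) are what break conjugacy.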

Metropolis-Hastings: Small Example

The Metropolis-Hastings MCMC sampler allows us to sample from the posterior distribution even when we know the posterior only up to a normalizing constant. For illustration, we consider a simple discrete distribution:

set.seed(18042019)
K <- 4
fx <- (K:1)/10
plot(1:K,fx)

We generate a right censored sample from this distribution.

n <- 50
x <- sample(1:K,replace=TRUE,size=n,prob=fx)
ce <- sample(1:(2*K),replace=TRUE,size=n)
ti <- pmin(x,ce)
del <- 1*(x <= ce)
head(cbind(ti,del),10)
##       ti del
##  [1,]  1   1
##  [2,]  1   1
##  [3,]  2   1
##  [4,]  2   1
##  [5,]  2   1
##  [6,]  4   1
##  [7,]  2   1
##  [8,]  1   0
##  [9,]  1   1
## [10,]  2   1
## dirichlet prior
alpha <- rep(1,K)

## data into cj, dj form
cj <- rep(0,K)
names(cj) <- 1:K
temp <- table(ti[del==0])
cj[names(temp)] <- temp


dj <- alpha - 1
names(dj) <- 1:K
temp <- table(ti[del==1])
dj[names(temp)] <- dj[names(temp)] + temp
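As a quick check on this bookkeeping (a remark, not part of the original notes): every observation contributes exactly once, either to a death count or to a censoring count, so \(\sum_j (d_j - \alpha_j + 1) + \sum_j c_j = n\). A self-contained toy example, with hypothetical data vectors chosen for illustration:

```r
## toy censored sample: hypothetical values for illustration
K <- 4
alpha <- rep(1, K)
ti  <- c(1, 2, 2, 3, 4, 1, 3)   # observed times
del <- c(1, 1, 0, 1, 1, 0, 1)   # 1 = death, 0 = censored
n <- length(ti)

## same construction as in the notes
cj <- rep(0, K); names(cj) <- 1:K
temp <- table(ti[del == 0])
cj[names(temp)] <- temp

dj <- alpha - 1; names(dj) <- 1:K
temp <- table(ti[del == 1])
dj[names(temp)] <- dj[names(temp)] + temp

## each observation is counted exactly once
sum(dj - (alpha - 1)) + sum(cj) == n
## TRUE
```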


## proportional to posterior distribution
## dj[j] = alphaj + # deaths at j - 1
## cj[j] = # censoring at j
logpost <- function(p,dj,cj){
  ## p holds the first K-1 probabilities; return -Inf outside the simplex
  sp <- sum(p)
  if(sp > 1 || any(p < 0)){
    return(-Inf)
  }
  p <- c(p,1-sp)
  ## pcs[j] = S(j) = sum of p_l for l > j; S(K) = 0 never enters the sum
  ## since cj[K] = 0 always, so we store 1 there to avoid log(0)
  pcs <- c(rev(cumsum(rev(p)))[-1],1)
  return(sum(cj*log(pcs) + dj*log(p)))
}
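A minimal self-contained demonstration of this function (the `dj` and `cj` counts below are hypothetical, chosen for illustration): it returns a finite log density inside the simplex and \(-\infty\) outside it, which is what makes the Metropolis-Hastings acceptance step below reject invalid proposals automatically.

```r
## log posterior up to a constant, as defined in the notes
logpost <- function(p, dj, cj){
  sp <- sum(p)
  if(sp > 1 || any(p < 0)) return(-Inf)
  p <- c(p, 1 - sp)
  pcs <- c(rev(cumsum(rev(p)))[-1], 1)
  sum(cj * log(pcs) + dj * log(p))
}

## hypothetical counts for K = 4
dj <- c(10, 8, 5, 2)
cj <- c(3, 2, 1, 0)

logpost(c(0.4, 0.3, 0.2), dj, cj)   # finite: inside the simplex
logpost(c(0.7, 0.5, 0.2), dj, cj)   # -Inf: probabilities sum past 1
```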

We run the MCMC:

Niter <- 10000
pchain <- matrix(0,nrow=Niter,ncol=K-1)
pchain[1,] <- 1/K
for(ii in 1:(Niter-1)){
  ## symmetric random-walk proposal, so the acceptance probability is just
  ## the ratio of (unnormalized) posterior densities
  prop <- pchain[ii,] + rnorm(K-1,mean=0,sd=0.01)
  if(exp(logpost(prop,dj,cj) - logpost(pchain[ii,],dj,cj)) > runif(1)){
    pchain[ii+1,] <- prop
  } else {
    pchain[ii+1,] <- pchain[ii,]
  }
}

The initial elements of the chain, known as burn-in, do not accurately represent samples from the posterior; this can be seen in a trace plot of the chain. We discard those values.

## view entire chain trajectory in the (p1, p2) plane
plot(pchain[,1:2],col="#00000030",type='l')
## true values marked in red
points(fx[1],fx[2],col='red',pch=19,cex=2)
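Putting the pieces together, one can discard the burn-in and compare posterior means to the true probabilities. The sketch below is self-contained (it repeats the data generation and sampler from above); the 2000-iteration burn-in cutoff is an arbitrary illustrative choice, not a value from the notes.

```r
set.seed(18042019)
K <- 4; n <- 50
fx <- (K:1)/10
x  <- sample(1:K, replace=TRUE, size=n, prob=fx)
ce <- sample(1:(2*K), replace=TRUE, size=n)
ti <- pmin(x, ce); del <- 1*(x <= ce)

## prior and cj, dj counts, as in the notes
alpha <- rep(1, K)
cj <- rep(0, K); names(cj) <- 1:K
temp <- table(ti[del==0]); cj[names(temp)] <- temp
dj <- alpha - 1; names(dj) <- 1:K
temp <- table(ti[del==1]); dj[names(temp)] <- dj[names(temp)] + temp

logpost <- function(p, dj, cj){
  sp <- sum(p)
  if(sp > 1 || any(p < 0)) return(-Inf)
  p <- c(p, 1 - sp)
  pcs <- c(rev(cumsum(rev(p)))[-1], 1)
  sum(cj*log(pcs) + dj*log(p))
}

## random-walk Metropolis-Hastings, as in the notes
Niter <- 10000
pchain <- matrix(0, nrow=Niter, ncol=K-1)
pchain[1,] <- 1/K
for(ii in 1:(Niter-1)){
  prop <- pchain[ii,] + rnorm(K-1, mean=0, sd=0.01)
  if(exp(logpost(prop,dj,cj) - logpost(pchain[ii,],dj,cj)) > runif(1)){
    pchain[ii+1,] <- prop
  } else {
    pchain[ii+1,] <- pchain[ii,]
  }
}

## discard burn-in (first 2000 draws; arbitrary illustrative cutoff)
keep <- pchain[2001:Niter, ]
phat <- c(colMeans(keep), 1 - sum(colMeans(keep)))
rbind(truth = fx, posterior_mean = phat)
```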