3/26/2019
Comprehensive Cancer Genomic Data Set
Image Source: https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism
Image Source: https://www.ncbi.nlm.nih.gov/probe/docs/applexpression/
Recall the Cox PH partial log likelihood with no ties:
\[ LL(\beta) = \sum_{i=1}^D \left( \beta^T Z_{(i)} - \log \sum_{j \in R(t_i)} \exp(\beta^T Z_j) \right) \] where \(t_1,\ldots,t_D\) denote the distinct death times, \(Z_{(i)}\) is the covariate vector for the individual who died at time \(t_i\), and \(R(t_i)\) is the set of individuals at risk at time \(t_i\).
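As a concrete check of the formula, here is a minimal sketch that evaluates \(LL(\beta)\) directly in plain Python. The function name `cox_partial_loglik`, the argument layout, and the toy data are assumptions made for illustration, not part of the original slides. It assumes no tied death times and takes the risk set to be everyone whose observed time is at least \(t_i\).

```python
import math

def cox_partial_loglik(beta, Z, time, event):
    """Cox PH partial log likelihood, assuming no tied death times.

    beta  : list of p coefficients
    Z     : list of n covariate rows, each of length p
    time  : list of n observed times
    event : list of n indicators (1 = death observed, 0 = censored)
    """
    # Linear predictors beta^T Z_j for every individual.
    eta = [sum(b * z for b, z in zip(beta, row)) for row in Z]
    ll = 0.0
    for i, died in enumerate(event):
        if not died:
            continue  # censored individuals enter only through risk sets
        # Risk set R(t_i): everyone still under observation at time t_i.
        risk = [eta[j] for j in range(len(time)) if time[j] >= time[i]]
        m = max(risk)  # shift by the max so exp() cannot overflow
        ll += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return ll

# Toy data (made up for illustration): 4 individuals, 2 covariates.
Z = [[0.5, 1.0], [1.5, -0.3], [-0.2, 0.8], [0.1, 0.1]]
time = [2.0, 5.0, 7.0, 9.0]
event = [1, 0, 1, 1]

# At beta = 0 each term reduces to -log|R(t_i)|, so this equals -log(4 * 2 * 1).
ll_zero = cox_partial_loglik([0.0, 0.0], Z, time, event)
print(ll_zero)
```

Evaluating at \(\beta = 0\) is a handy sanity check because the answer depends only on the sizes of the risk sets.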
Question: Suppose we attempt to maximize the log likelihood for a data set where \(p > n\). What will happen?
\[ \beta_{MLE} = \arg\max_\beta LL(\beta) \]
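One way to explore the question numerically is the sketch below. It rests on stated assumptions: simulated Gaussian covariates (not the cancer data), \(n = 5\), \(p = 20\), no censoring, and hypothetical helper names `solve_square` and `beta0`. When \(p \ge n\) and the covariate rows are linearly independent, we can pick \(\beta\) so that the individual who dies first always has the largest risk score. Scaling that \(\beta\) up then pushes \(LL(\beta)\) toward 0, its upper bound, so in this setting no finite maximizer exists.

```python
import math
import random

def cox_partial_loglik(beta, Z, time, event):
    """Cox PH partial log likelihood, assuming no tied death times."""
    eta = [sum(b * z for b, z in zip(beta, row)) for row in Z]
    ll = 0.0
    for i, died in enumerate(event):
        if not died:
            continue
        risk = [eta[j] for j in range(len(time)) if time[j] >= time[i]]
        m = max(risk)  # shift by the max so exp() cannot overflow
        ll += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return ll

def solve_square(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

random.seed(0)
n, p = 5, 20  # more covariates than individuals
Z = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
time = [1.0, 2.0, 3.0, 4.0, 5.0]  # distinct death times, no censoring
event = [1] * n

# Choose beta so that beta^T Z_j = n - j: earlier deaths get larger risk scores.
# The first n coefficients suffice; the remaining p - n are left at zero.
target = [float(n - j) for j in range(n)]
beta0 = solve_square([row[:n] for row in Z], target) + [0.0] * (p - n)

lls = []
for c in (1.0, 5.0, 25.0):
    ll = cox_partial_loglik([c * b for b in beta0], Z, time, event)
    lls.append(ll)
    print(f"scale {c:5.1f}  LL = {ll:.3e}")
```

The printed values increase toward 0 as the scale grows, which is the behavior an unpenalized optimizer would chase without ever converging.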
Distribution of \(\sim 20{,}000\) p-values from univariate Cox PH fits:
Which gene expressions are "significantly" associated with survival?