We need objective measures of model performance for two reasons:

- Select tuning parameters, e.g. \(\alpha\) and \(\lambda\) in elastic net Cox PH
- Compare two models, e.g. elastic net Cox PH and Survival Tree

Since we might wish to compare very different models (Cox PH and Survival Trees), we need metrics which are general. Since we may need to choose between many values of the tuning parameters, we need methods which are fast.

- Discrete Outcome \(Y\)
  - Accuracy
  - AUC
  - Brier Score (for probabilistic predictions)
- Continuous Outcome \(Y\)
  - Mean Squared Error (MSE)
  - Mean Absolute Error (MAE)

More generally, one can choose a loss function and evaluate the model by computing its expected loss (risk).
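
As a concrete illustration, each of the metrics above is just the empirical average of a particular loss over observations. The sketch below uses hypothetical simulated data (not data from these notes) to show this in R:

```
## toy sketch: each metric is an average of a loss function
set.seed(1)

## continuous outcome: squared-error and absolute-error loss
y    <- rnorm(100)
yhat <- y + rnorm(100, sd = 0.5)     # hypothetical predictions
mse  <- mean((y - yhat)^2)           # Mean Squared Error
mae  <- mean(abs(y - yhat))          # Mean Absolute Error

## discrete outcome: 0-1 loss and Brier score
p     <- runif(100)                  # hypothetical predicted probabilities
ybin  <- rbinom(100, 1, p)           # simulated 0/1 outcomes
acc   <- mean((p > 0.5) == ybin)     # accuracy = 1 - average 0-1 loss
brier <- mean((p - ybin)^2)          # Brier score

c(MSE = mse, MAE = mae, Accuracy = acc, Brier = brier)
```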

Consider the following procedure:

- Fit the model on the data:
  - \(Y \in \mathbb{R}^n\) is the response
  - \(X \in \mathbb{R}^{n \times p}\) are the predictors
  - \(\widehat{y} = \widehat{f}(x)\) is the prediction of \(y\) at \(x\)
  - e.g. \(\widehat{f}(x) = x^T\widehat{\beta}\)
- Measure error using \(Y\) and \(\widehat{Y}\):
  - \(\mathrm{MSE}(\widehat{\beta}) = \frac{1}{n}\sum_{i=1}^n (y_i - \widehat{y}_i)^2\)
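
A minimal sketch of this procedure, using ordinary least squares as \(\widehat{f}\) on hypothetical simulated data:

```
## fit a linear model and compute the training MSE
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), ncol = p)
beta <- rep(1, p)
y <- as.vector(X %*% beta + rnorm(n))

fit  <- lm(y ~ X - 1)          # estimate beta-hat
yhat <- fitted(fit)            # predictions f-hat(x_i) = x_i' beta-hat
mean((y - yhat)^2)             # training MSE
```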

Problem:

- The error rate when evaluating the model on the same data used to fit the model parameters is **optimistic**.
- It favors **more complex** models; this is known as overfitting.

`library(glmnet)`

`## Loading required package: Matrix`

`## Loading required package: foreach`

`## Loaded glmnet 2.0-16`

```
## simulate data
n <- 200
p <- 200
x <- matrix(rnorm(n*p),ncol=p)
## 10 nonzero predictors
beta <- matrix(c(rep(1,10),rep(0,p-10)),ncol=1)
y <- as.vector(x%*%beta + rnorm(n))
fit <- glmnet(x,y)
length(fit$lambda)
```

`## [1] 90`

`head(fit$lambda)`

`## [1] 1.0915623 0.9945909 0.9062342 0.8257268 0.7523715 0.6855329`

```
## predict responses at every lambda
out <- predict(fit,newx=x)
dim(out)
```

`## [1] 200 90`

```
mse <- colMeans((out - y)^2)
## create new data from same model (same beta)
## and evaluate performance
xnew <- matrix(rnorm(n*p),ncol=p)
ynew <- as.vector(xnew%*%beta + rnorm(n))
out <- predict(fit,newx=xnew)
msenew <- colMeans((out - ynew)^2)
plot(log(fit$lambda),mse,xlim=log(c(0.01,.5)),ylim=c(0,4),
lwd=2,xlab="log(Lambda)",ylab="Mean Squared Error",type='l',xaxs='i')
points(log(fit$lambda),msenew,type='l',col='red',lwd=2)
legend("topleft",c("Training Error","Test Error"),col=1:2,lty=1,lwd=2)
abline(h=1)
```

In cross-validation, the model coefficients are estimated using \(K-1\) parts of the data and the model is then assessed on the left-out part. The left-out part is rotated so that a total of \(K\) models are fit for each value of the tuning parameter. The results are averaged across the \(K\) model fits and the tuning parameter with the best performance is chosen.
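
The `glmnet` package implements this procedure in `cv.glmnet`. A short sketch, continuing with the simulated `x`, `y`, and `xnew` from above (the number of folds here is an illustrative choice):

```
## 10-fold cross-validation over the lambda path
cvfit <- cv.glmnet(x, y, nfolds = 10)

## cross-validated MSE at each lambda, with error bars
plot(cvfit)

## lambda minimizing the CV error, and the more conservative one-standard-error choice
cvfit$lambda.min
cvfit$lambda.1se

## predictions at the selected lambda
pred <- predict(cvfit, newx = xnew, s = "lambda.min")
```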