# Evaluating hyperparameter with Cross Validation

In machine learning we develop algorithms to map from predictive variables to an output variable which we wish to predict using unseen data. When building models we can use hyperparameters of the algorithms to increase model performance.

Lasso regression, which puts a penalty on large model coefficients (see James et al. text book “Introduction to Staticial Learning” for more details), is an example of an algorithm using hyperparameters, to control and find the best amount of shrinkage. Briefly, shrinkage decreases model variance in exchange for some increased model bias, and thus limiting overfitting of a model.

To figure out the appropriate hyperparameter, in this case the regularization parameter $$\lambda$$ (lambda), cross-validation is applied. Cross-validation is a method to simulate the model performance by separating all available data into $$k$$ folds, train a model on all but one fold (test fold), where the whitholded fold (test set) is used consecutively to evaluate the models performance using the model which is trained by all other folds. This approach gives us an impression on the model performance on unseen data. In the iterative process of cross-validation, each of the $$k$$ folds serves as the test set, so that we will go through the process of model fitting $$k$$ times. For model evaluation performance metrics are used, e.g. the residual mean squared error (RMSE) in a regression problem. As a result we get $$k$$ test set metrics, which we subsequently average after $$k$$ iterations yielding the final model performance metric.

This cross-validation process is done for each hyperparameter, in lasso regression it is a set of proposed regularization values $$\lambda$$.

For each proposed hyperparameter, the metric is calculated using the test set, resp. hold-out fold, as described above. The hyperparameter suppling the best metric, e.g. lowest RMSE, is used to create the final model.

Before doing so, we will conduct model training using the lasso regression to get the best hyperparameter of $$\lambda$$. We will use the caretpackage to train a model predicting Income using different variable of the ILSL::Credit dataset, e.g. credit card limit, age and further more.

data <- ISLR::Credit %>%
select(-ID) %>%
mutate(random_noise = rnorm(n = nrow(.), sd = 10))

y <- data\$Income
# creating a matrix with predictors
X <- model.matrix(Income ~ ., data)

# Setup the training process
# We will use 10-fold crossvalidation
# Thus, we will get 10 test set metrics which we eventually average to get
# the model performance parameter using a specifi value of lambda

# 10 fold cross validation
# Define training option with trainControl
tr_ctrl <- caret::trainControl(method = "cv", number = 5)

# Create a grid of proposed lambda values
# Alpha is set to one, which yields lasso regression
# We use CV to get the appropriate lambda value
hyperparams <- expand.grid(lambda = c(seq(0.1, 4, .1)), alpha = 1)

set.seed(124193)
# Model Training (Metric: Rsquared)
cv_models <- caret::train(x = X,
y = y,
family = "gaussian",
metric = "RMSE",
method = "glmnet",
trControl = tr_ctrl,
tuneGrid = hyperparams)

# Extracting the best hyperparameters (yielded lowest  RMSE)
best_tune <- pluck(cv_models, "bestTune")

# Plot Regularization Parameter Lambda against RMSE (y-axis)
plot(cv_models) # Finalizing the model

We see in the plot that the cross validated RMSE is lowest when $$\lambda$$=0.1, this hyperparameter value should be used in our final model.

In the next section we will use the glmnet function from the glmnet packages which allows us to create a regression model with the specific alpha value.

# Setting alpha to 1 yielding lasso regression
# Setting the regularization parameter lambda to best_tune, the value yielding lowest RMSE in the cross-validation procedure
final_model <- glmnet::glmnet(x = X[, -1], y = y, family = "gaussian", alpha = 1, lambda = best_tune)

# Get final model coefficients
broom::tidy(final_model)  %>%
select(term, estimate) %>%
mutate_at("estimate", round, 2) %>%
knitr::kable()
term estimate
(Intercept) -49.33
Limit 0.02
Rating 0.13
Cards 1.16
Education -0.13
GenderFemale -1.24
StudentYes 39.81
MarriedYes -0.05
EthnicityAsian 0.96
EthnicityCaucasian 0.06
Balance -0.09
random_noise -0.08
coefs <- coef(final_model)

# comuting RSquared value
# predict
y_hat <- matrix(X, ncol = ncol(X)) %*% as.matrix(coefs, nrow = 6)

# R-Squared
r_sq <- 1 - (sum((y_hat[, 1] - y)^2) / sum((y - mean(y))^2))

Following the hyperparameter tuning of lambda, the regularization parameter of the Lasso-Regression we eventually fitted the final model using the hyperparameter lambda which yielded the highest cross-validated R-Squared statistic; In this example the value of lambda was $$\lambda$$ = 0.1 with a final RMSE value of 10.72, however we do not bother about this value, since we estimated the R-Squared statistic for unseen data using cross-validation, and we except the model to performe as indicated by the cross-validation procedure.

# Bottomline

The bottom line of this post is, that we use cross-validation to investigate possible hyperparameters and their performance on unseen data, simulating this scenario by partioning the dataset into folds, withholding a fold and use this fold as the test set.

We compare the cross-validated metric of each cross-validation process and choose the one with the best metric value, e.g. lowest RMSE.

We then use all the data in our dataset to fit the final model. This final model will be used to predict future data. In our case our model estimates the income of people with variable such as age, gender and others.