Tuning GAMs Based on Multiple Formulas Using mlr3 Package in R: A Step-by-Step Guide for Hyperparameter Optimization

In machine learning, Generalized Additive Models (GAMs) are a popular choice for modeling complex relationships between variables. A key aspect of tuning a GAM is choosing an appropriate basis dimension (k) for each smooth term in the model. In this article, we will explore how to tune multiple GAMs that differ in their formulas and basis dimensions.

Problem Statement

The problem statement asks us to:

  • Tune a GAM based on several formulas associated with different combinations of k.
  • Use a grid search to accomplish this.
  • Address an error message related to the TunerGridSearch class not supporting a specific parameter type (ParamUty).

Background and Context

GAMs extend generalized linear models by replacing some of the linear terms with smooth functions of the predictors. Each smooth term is built from a family of basis functions, such as thin-plate splines, cubic regression splines, or P-splines, and the basis dimension k controls how many basis functions the smooth may use, and therefore how flexible it can be.
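To make the role of k concrete, here is a small, self-contained mgcv example (mgcv is the GAM implementation that mlr3's GAM learner wraps). The data are simulated, so the exact fit is illustrative only:

```r
library(mgcv)

# Simulated data: a sine signal plus noise
set.seed(1)
d <- data.frame(x = runif(200))
d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)

# k sets the basis dimension; bs picks the basis family
# ("tp" thin-plate, "cr" cubic regression spline, "ps" P-spline)
fit_small <- gam(y ~ s(x, k = 3, bs = "cr"), data = d)   # stiff
fit_large <- gam(y ~ s(x, k = 20, bs = "cr"), data = d)  # flexible
AIC(fit_small, fit_large)  # compare the two basis dimensions
```

A k that is too small underfits; a generous k is usually safe because mgcv penalizes wiggliness, but it raises the computational cost.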

The mlr3 package provides a unified interface for fitting machine learning models and tuning their hyperparameters. In this article, we will use the classif.gam learner (provided by the mlr3extralearners package and backed by mgcv) to fit a GAM model and tune its hyperparameters using a grid search.
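Before tuning anything, a single GAM can be fitted through mlr3 directly. A minimal sketch, assuming classif.gam lives in mlr3extralearners and exposes a formula parameter (the same ParamUty parameter the error message in the problem statement refers to):

```r
library(mlr3)
library(mlr3extralearners)  # assumed home of the mgcv-backed classif.gam

# Reduce sonar to a single feature and fit one GAM with a fixed k
task <- tsk("sonar")$select("V15")
learner <- lrn("classif.gam",
               formula = Class ~ s(V15, k = 4),
               predict_type = "prob")
learner$train(task)
learner$model  # the underlying mgcv::gam fit
```

Because the formula is an arbitrary R object rather than a numeric or factor parameter, it cannot be enumerated by a grid; this is the root of the tuning problem discussed below.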

Solution

To solve the problem, we need to modify the original code to:

  • Define the formula for each task separately.
  • Create a search space that includes the desired range of basis dimensions (k).
  • Use a suitable performance measure (e.g., AUC-ROC) to evaluate the models.

Here’s an updated example code:

library(mlr3verse)
library(tidyverse)

# Load the dataset and reduce it to the variables of interest
df <- tsk("sonar")$data()
df_red <- df %>% select("Class", "V15")

# Define the task (Class ~ V15)
task_sonar <- as_task_classif(
  df_red,
  target = "Class",
  id = "sonar"
)

# Define one task per candidate model; both share the same data here,
# and the differences between the models live in the search spaces below
task_sonar_1 <- as_task_classif(df_red, target = "Class", id = "sonar_1")
task_sonar_2 <- as_task_classif(df_red, target = "Class", id = "sonar_2")
tasks <- list(task_sonar_1, task_sonar_2)

# Define the learner and performance measure. classif.kknn is used here
# as a stand-in: its k (number of neighbours) is a plain integer
# parameter, which grid search can enumerate directly. A GAM's k sits
# inside the formula, which is why classif.gam needs extra handling.
learner <- lrn("classif.kknn", predict_type = "prob")
measure <- msr("classif.auc")

# Create a search space over k for each task
search_spaces <- list(
  ps(k = p_int(1, 2)),
  ps(k = p_int(2, 4))
)

# Define the tuner and the resampling scheme used during tuning
tuner <- tnr("grid_search")
inner_resampling <- rsmp("cv", folds = 5)

# Set up automatic tuning for each task; grid search exhausts the
# search space, so no additional terminator is needed
at1 <- auto_tuner(
  tuner = tuner,
  learner = learner,
  resampling = inner_resampling,
  measure = measure,
  search_space = search_spaces[[1]],
  terminator = trm("none")
)
at2 <- auto_tuner(
  tuner = tuner,
  learner = learner,
  resampling = inner_resampling,
  measure = measure,
  search_space = search_spaces[[2]],
  terminator = trm("none")
)

# Train the models with the optimal hyperparameters
at1$train(task_sonar_1)
at2$train(task_sonar_2)
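The AutoTuners above tune with inner cross-validation only. To estimate how well the complete tune-then-fit procedure generalizes, wrap an AutoTuner in an outer resampling (nested resampling). A sketch, reusing at1 and task_sonar_1 from the example above:

```r
# Nested resampling: inner CV (inside at1) picks k, while the outer CV
# estimates the performance of the whole tuning procedure
outer_resampling <- rsmp("cv", folds = 3)
rr <- resample(task_sonar_1, at1, outer_resampling, store_models = TRUE)
rr$aggregate(msr("classif.auc"))      # performance estimate
extract_inner_tuning_results(rr)      # k chosen in each outer fold
```

If the inner results pick different k values across outer folds, the choice of k is unstable and the search range may need revisiting.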

Explanation and Advice

In this updated example, we define multiple tasks with different formulas and create separate search spaces for each task. We use a grid search to find the optimal basis dimension (k) that results in the highest AUC-ROC score.

To address the error message related to TunerGridSearch, note that grid search needs a finite set of values to enumerate, and an arbitrary-object (ParamUty) parameter such as a formula does not provide one; upgrading your R or package versions will not change this. Instead, you can either:

  • Encode the candidate formulas as the levels of a p_fct parameter whose trafo maps each level back to the corresponding formula object, so grid search only ever sees an ordinary factor.
  • Skip tuning over the formula entirely: fit one learner per formula and compare them with benchmark(), reserving the tuner for numeric hyperparameters.
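One workaround is to encode the formulas as a factor parameter. A minimal sketch, assuming classif.gam from mlr3extralearners exposes a formula parameter (as the ParamUty error suggests) and that paradox's p_fct accepts a named list of non-character levels, generating the matching trafo automatically:

```r
library(mlr3verse)
library(mlr3extralearners)  # assumed source of the classif.gam learner

df_red <- tsk("sonar")$data()[, c("Class", "V15")]
task <- as_task_classif(df_red, target = "Class", id = "sonar")

# Each level of the factor is one candidate formula; the trafo maps the
# chosen level back to the formula object, so grid search only ever
# sees a finite factor, never a ParamUty
search_space <- ps(
  formula = p_fct(list(
    k2 = Class ~ s(V15, k = 2),
    k3 = Class ~ s(V15, k = 3),
    k4 = Class ~ s(V15, k = 4)
  ))
)

at <- auto_tuner(
  tuner = tnr("grid_search"),
  learner = lrn("classif.gam", predict_type = "prob"),
  resampling = rsmp("cv", folds = 5),
  measure = msr("classif.auc"),
  search_space = search_space,
  terminator = trm("none")
)
at$train(task)
at$tuning_result  # winning formula and its cross-validated AUC
```

This keeps the whole comparison inside a single tuning run, so every candidate formula is evaluated on the same cross-validation folds.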

In general, when working with GAMs and hyperparameter tuning, it’s essential to carefully select the performance measure and resampling scheme to ensure that the models are properly evaluated and compared.


Last modified on 2023-08-04