Models Configuration
The Models Configuration module defines how each model in your experiment should be trained. WarpRec allows flexible configuration of training settings, including hyperparameter search, scheduling, and resource management.
Each model entry is divided into several nested sections that give detailed control over training:
- meta: Meta parameters affecting model initialization and checkpoint handling.
- optimization: Hyperparameter optimization settings using Ray Tune.
- early_stopping: Optional strategy to stop trials that reach a plateau.
- parameters: Model-specific parameters.
Meta Parameters
The meta section controls aspects of the model that do not directly affect training:
- save_model: Whether to save the model in the experiment directory. Defaults to False.
- save_recs: Whether to save generated recommendations. Defaults to False.
- load_from: Path to pre-trained model weights to load. Defaults to None.
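As an illustration, a meta block could look like the following minimal sketch. It assumes that meta is nested directly under the model name, next to the model parameters; the path given to load_from is a hypothetical placeholder.

```yaml
models:
  LightGCN:
    meta:
      save_model: True   # keep the trained model in the experiment directory
      save_recs: True    # also store the generated recommendations
      load_from: path/to/pretrained_weights  # hypothetical path, replace with your own
    # Model parameters
    embedding_size: 64
    n_layers: 2
    reg_weight: 0.0001
    batch_size: 512
    epochs: 50
    learning_rate: 0.001
```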
Optimization Configuration
The optimization section defines how hyperparameter optimization is performed:
- strategy: Optimization strategy. Defaults to grid. Supported strategies:
  - grid: Exhaustive search across the entire search space.
  - random: Random search within the search space.
  - hopt: HyperOpt algorithm for efficient exploration.
  - optuna: Optuna algorithm for efficient exploration.
  - bohb: BOHB algorithm for efficient exploration.
- scheduler: Scheduling algorithm for trials. Defaults to fifo. Supported schedulers:
  - fifo: First In First Out.
  - asha: ASHA scheduler for optimized early stopping and trial pruning.
- lr_scheduler: Scheduling algorithm to adjust the learning rate at run time. Defaults to None.
- optimizer: Optimizer to use during the training process. Defaults to None.
- properties: Nested section for strategy and scheduler parameters.
- device: Training device, e.g., cpu or cuda. Overrides global device.
- cpu_per_trial: Number of CPU cores allocated per trial. Defaults to 1.
- gpu_per_trial: Number of GPUs allocated per trial. Defaults to 0.
- custom_resources_per_trial: A dictionary containing custom resources to request per trial during optimization. Defaults to an empty dictionary.
- max_concurrent_trials: Maximum number of trials allowed to run concurrently. Defaults to None, in which case WarpRec estimates a safe cap from the current Ray cluster resources.
- label_selector: A dictionary containing a set of labels with respective rules.
- num_workers: Number of worker processes for data loading. Defaults to None (main process).
- block_size: Number of items to predict at once for efficiency. Defaults to 50.
- checkpoint_to_keep: Number of checkpoints to retain in Ray. Defaults to 5.
Advanced Resource Management & Node Affinity
WarpRec allows strict control over cluster scheduling through logical resources and label selectors.
1. Logical Resource Constraints (custom_resources_per_trial)
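Use this to prevent Out-Of-Memory (OOM) errors by treating RAM as a consumable logical resource. First, provision the Ray node so that it exposes its memory capacity as a custom resource (for example, with the --resources option of ray start). Each trial can then request a share of that capacity through custom_resources_per_trial.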
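On the WarpRec side, the request could look like the following minimal sketch. It assumes the node advertises a custom resource named ram_gb; that name and the amount are hypothetical, and any name works as long as it matches the one exposed by the node.

```yaml
models:
  LightGCN:
    optimization:
      cpu_per_trial: 4
      custom_resources_per_trial:
        ram_gb: 16   # hypothetical resource name: each trial reserves 16 units
```

Because Ray treats custom resources as logical counters, a node that exposes 64 units of ram_gb would host at most 4 such trials at once, even if more CPU cores are free.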
2. Node Affinity via Label Selectors (label_selector)
Use this to enforce execution on specific hardware or environments (e.g., isolating development from production, or targeting specific GPU architectures). Provision a node with custom labels, for example env=dev, and then point label_selector at that label so that trials run only on matching nodes.
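A minimal sketch of the matching configuration, assuming label_selector accepts a plain mapping from label name to required value and that a node labeled env=dev exists in the cluster:

```yaml
models:
  LightGCN:
    optimization:
      label_selector:
        env: dev   # schedule trials only on nodes labeled env=dev
```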
Tip: Ray automatically injects hardware labels. You can use label_selector: {"ray.io/accelerator-type": "A100"} to target specific GPU architectures without manual node labeling. For more details, refer to the Ray Scheduling Documentation.
LR Scheduler Section
Within WarpRec standard pipelines, you can use a learning rate scheduler to improve model performance. To do so, pass the following parameters under the lr_scheduler configuration block:
- name: Name of the scheduler (e.g., StepLR, ReduceLROnPlateau).
- params: A dictionary of parameters expected by the specific scheduler.
An example of this configuration could be something like this:
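A minimal sketch, assuming the lr_scheduler block is nested under optimization (where the option list places it) and using PyTorch's StepLR with its step_size and gamma arguments; the values are illustrative only:

```yaml
models:
  LightGCN:
    optimization:
      lr_scheduler:
        name: StepLR
        params:
          step_size: 10   # decay the learning rate every 10 epochs
          gamma: 0.5      # multiply the learning rate by 0.5 at each decay
```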
For further details about the scheduling algorithms and their parameters, you can check the original PyTorch Guide.
Optimizer Section
Within WarpRec standard pipelines, you can customize the optimizer used during training to fit your needs. To do so, pass the following parameters under the optimizer configuration block:
- name: Name of the optimizer (e.g., Adam, AdamW).
- params: A dictionary of parameters expected by the specific optimizer.
An example of this configuration could be something like this:
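A minimal sketch, assuming the optimizer block is nested under optimization (where the option list places it) and using PyTorch's AdamW with its weight_decay argument; the value is illustrative only:

```yaml
models:
  LightGCN:
    optimization:
      optimizer:
        name: AdamW
        params:
          weight_decay: 0.01   # decoupled weight decay applied by AdamW
```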
For further details about the optimizers and their parameters, you can check the original PyTorch Guide.
Tip
You can combine the use of a learning rate scheduler and an optimizer to boost model performance. Choosing the right optimizer or scheduler is a critical factor in achieving optimal convergence, directly impacting the effectiveness and stability of the learning process.
Properties Section
The properties subsection provides additional parameters to the optimization strategy or scheduler:
- mode: Whether to maximize or minimize the validation metric. Accepted values: min/max. Defaults to max.
- desired_training_it: Defines the number of iterations for final training after cross-validation. Strategies: median, mean, min, max. Defaults to median.
- seed: Random seed for reproducibility. Defaults to 42.
- time_attr: Attribute used to measure time in the scheduler.
- max_t: Maximum time units per trial.
- grace_period: Minimum time units per trial.
- reduction_factor: ASHA scheduler reduction rate.
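A minimal sketch of a properties block that does not involve a scheduler, assuming it is nested under optimization as in the option list. The values shown are simply the documented defaults written out explicitly:

```yaml
models:
  LightGCN:
    optimization:
      properties:
        mode: max                    # a higher validation metric is better
        desired_training_it: median  # aggregate used to pick the final training length
        seed: 42                     # fixed seed for reproducibility
```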
Early Stopping
The early_stopping section optionally adds stopping criteria for each trial:
- monitor: Metric to monitor, e.g., score (validation metric) or loss.
- patience: Consecutive evaluations without improvement before stopping. Required if early stopping is enabled.
- grace_period: Minimum number of evaluations before early stopping can trigger.
- min_delta: Minimum change to consider as an improvement.
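A minimal sketch using all four keys, assuming early_stopping sits directly under the model name as in the other examples on this page; the min_delta value is illustrative only:

```yaml
models:
  BPR:
    early_stopping:
      monitor: score     # track the validation metric
      patience: 20       # stop after 20 evaluations without improvement
      grace_period: 10   # never stop before 10 evaluations
      min_delta: 0.001   # smaller changes do not count as improvement
```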
ASHA Scheduler for Efficient Trial Pruning
When running a large hyperparameter search (e.g., with random, optuna, or hopt), many trials will perform poorly from the start. Instead of waiting for all trials to finish, you can use the ASHA (Asynchronous Successive Halving Algorithm) scheduler to aggressively terminate bad trials early and allocate resources only to the most promising ones.
To use ASHA, you must define the scheduler as asha and provide the required parameters inside the properties block: max_t, grace_period, and reduction_factor.
Configuration Example:
models:
  LightGCN:
    optimization:
      strategy: optuna
      num_samples: 100
      scheduler: asha
      properties:
        time_attr: training_iteration  # The metric used to track time/progress
        max_t: 200                     # Maximum iterations a trial can run
        grace_period: 20               # Minimum iterations before pruning begins
        reduction_factor: 3.0          # Halving rate (keeps top 1/3 of trials)
    # Model parameters
    embedding_size: [choice, 64, 128, 256]
    n_layers: [choice, 1, 2, 3]
    learning_rate: [loguniform, 1e-5, 1e-2]
    epochs: 200
How this works in practice:
- time_attr: training_iteration: Tells the scheduler to measure trial progress by the number of training iterations (epochs) completed.
- grace_period: 20: Every trial is guaranteed to run for at least 20 iterations. This prevents the scheduler from killing a model that merely has a slow start.
- reduction_factor: 3.0: When a trial reaches iteration 20, the scheduler compares it with the other trials that have reached the same point. Only about the top third (1/3) are allowed to continue; the remaining two thirds are permanently stopped. The same check repeats at iterations 60 (20 * 3) and 180 (60 * 3).
- max_t: 200: The absolute maximum number of iterations any trial is allowed to reach. This should generally match your model's epochs parameter.
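The pruning points follow directly from grace_period and reduction_factor. As a worked sketch for the values above, with N = 100 sampled trials: the survivor counts n_k assume an idealized run in which every trial reaches each checkpoint, so in practice they are only approximate, because ASHA decides asynchronously.

```latex
\begin{align*}
r_k &= \texttt{grace\_period} \cdot \eta^{k}, \qquad \eta = \texttt{reduction\_factor} = 3 \\
r_0 &= 20, \quad r_1 = 20 \cdot 3 = 60, \quad r_2 = 20 \cdot 3^{2} = 180, \quad r_3 = 540 > \texttt{max\_t} = 200 \\
n_k &\approx \frac{N}{\eta^{k+1}}: \quad n_0 \approx 33, \quad n_1 \approx 11, \quad n_2 \approx 4 \qquad (N = 100)
\end{align*}
```

Here r_k is the iteration of the k-th pruning point and n_k is the number of trials allowed to continue past it. Under these assumptions, only about 4 of the 100 trials would train for the full 200 iterations.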
Result: By pruning unpromising trials early, ASHA allows you to test 100 configurations in a fraction of the time and compute cost it would take using the standard fifo scheduler.
Example Model Configuration
In this section, we provide examples illustrating how to define the appropriate configuration for your experiment.
Basic Configuration
The simplest way to define the model configuration is by directly specifying the parameters. Grid search is the default optimization strategy, so for each parameter, you can provide a list of values to explore, and WarpRec will manage the process automatically.
The following example demonstrates a basic grid search using the EASE model:
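A minimal sketch of such a grid search. The parameter name l2 (the regularization strength of EASE) and the candidate values are assumptions made for illustration; check the Recommenders Documentation for the exact parameter names.

```yaml
models:
  EASE:
    l2: [10, 50, 100]   # assumed parameter name; one trial per listed value
```

Since grid is the default strategy, no optimization block is needed, and this configuration would produce 3 trials.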
A more in-depth configuration might include a model with more parameters and early stopping:
models:
  BPR:
    early_stopping:
      patience: 20
      grace_period: 10
    embedding_size: [64, 128, 256]
    reg_weight: [0., 0.001, 1e-6]
    batch_size: [512, 1024, 2048, 4096]
    epochs: 300
    learning_rate: [0.001, 1e-4, 1e-5]
Note
- Each model requires a separate configuration.
- Trials of the same model can run in parallel; multiple models are trained sequentially.
- Model parameters depend on the specific algorithm; consult the Recommenders Documentation.
Advanced Configuration
For advanced users, WarpRec supports more sophisticated search space exploration, enabling efficient hyperparameter optimization and distributed experimentation.
Let's start from a very simple model configuration:
models:
  LightGCN:
    embedding_size: 64
    n_layers: 2
    reg_weight: 0.0001
    batch_size: 512
    epochs: 50
    learning_rate: 0.001
This executes a grid search over a single parameter combination, effectively training just one model. Next, we will extend this example to explore a more comprehensive grid search:
models:
  LightGCN:
    early_stopping:
      patience: 20
      grace_period: 10
    embedding_size: [64, 128, 256]
    n_layers: [1, 2, 3]
    reg_weight: [0., 1e-6]
    batch_size: [512, 1024, 2048]
    epochs: 200
    learning_rate: [0.001, 1e-4, 1e-5]
This configuration produces a total of 3 x 3 x 2 x 3 x 3 = 162 trials. Depending on the dataset size and available resources, the exploration may require some time. To optimize performance, you can leverage WarpRec's parallelization capabilities by adding the following to the configuration:
models:
  LightGCN:
    optimization:
      cpu_per_trial: 4
      gpu_per_trial: 0.25
    early_stopping:
      patience: 20
      grace_period: 10
    embedding_size: [64, 128, 256]
    n_layers: [1, 2, 3]
    reg_weight: [0., 1e-6]
    batch_size: [512, 1024, 2048]
    epochs: 200
    learning_rate: [0.001, 1e-4, 1e-5]
With this setup, up to 4 trials can run at the same time on a single GPU, since each trial reserves a quarter of it. Each trial also reserves 4 CPU cores, so running all 4 concurrently requires at least 16 cores.
Search Space Configuration
Advanced search algorithms (HyperOpt, Optuna) allow fine-grained exploration of hyperparameters. WarpRec supports multiple search spaces:
- uniform/quniform: Uniform distribution and quantized uniform distribution.
- loguniform/qloguniform: Logarithmic uniform distribution and quantized logarithmic uniform distribution.
- randn/qrandn: Random normal and quantized random normal.
- randint/qrandint: Random integers and quantized random integers.
- lograndint/qlograndint: Logarithmic random integers and quantized logarithmic random integers.
- choice: Default for discrete options.
- grid: Default for exhaustive grid search.
Structure of parameter sampling in WarpRec
Each parameter is defined as a list whose elements are, in order:
- search_space (str) - Name of the search space (e.g. 'uniform', 'qrandint', 'loguniform').
- min (float/int) - Minimum value of the sampling range.
- max (float/int) - Maximum value of the sampling range.
- quantization (optional, float/int) - Step size for quantized spaces (e.g. 'qrandint', 'qloguniform'). Only used for quantized spaces.
- log_base (optional, int) - Base of the logarithm for log-scaled spaces (e.g. 'loguniform', 'qloguniform'). Only used for log spaces.
The following examples illustrate how to sample values from these search spaces:
param_1: ['uniform', 0.0, 1.0]
param_2: ['qrandint', 10, 500, 5]
param_3: ['qloguniform', 0.005, 1.0, 0.005, 2]  # log-scaled ranges need a strictly positive minimum
Let's now use the sampling spaces to create a more complex HPO and have more control over the parameter space:
models:
  LightGCN:
    optimization:
      cpu_per_trial: 4
      gpu_per_trial: 0.25
      validation_metric: Recall@5
      strategy: hopt
      num_samples: 100
    early_stopping:
      patience: 20
      grace_period: 10
    embedding_size: [qrandint, 64, 320, 64]
    n_layers: [1, 2, 3]
    weight_decay: [uniform, 0.0, 1e-6]
    batch_size: [qrandint, 512, 10240, 512]
    epochs: 200
    learning_rate: [uniform, 1e-6, 1e-3]
This configuration samples 100 parameter combinations for the LightGCN model with HyperOpt, runs up to 4 trials in parallel on a single GPU (gpu_per_trial: 0.25), and applies early stopping.