Configuration Structure
This section provides an overview of WarpRec's configuration system. WarpRec configurations let users customize data loading, preprocessing, model training, evaluation, and general experiment settings. Proper configuration supports reproducibility, good performance, and easy management of multiple experiments. The configuration is divided into several main sections, each with a dedicated description of its available keywords and their behavior.
Pipeline Configurations
WarpRec supports multiple pipelines that leverage the configuration system differently depending on the purpose of the experiment:
- Training Pipeline: Executes the full workflow, including hyperparameter optimization (HPO), model training, evaluation, and result saving.
- Swarm Pipeline: Executes the same workflow as the training pipeline, but processes every trial in parallel to saturate the resources available in the Ray cluster.
- Design Pipeline: Focuses on testing and evaluating models without HPO. This pipeline is ideal for rapid prototyping and design experiments.
- Evaluation Pipeline: Evaluates pre-trained models without training, using provided checkpoints.
- Estimate Pipeline: Estimates runtime and memory requirements with lightweight profiling before executing a full experiment.
A single configuration file can be used across multiple pipelines; WarpRec ensures that workflows remain interchangeable, with some sections being ignored or interpreted differently depending on the pipeline.
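As a rough sketch of this sharing, the file below combines sections taken from the examples on this page. The comments summarize which pipelines list each section as required. The values are illustrative, and it is an assumption that a pipeline silently skips a section it does not list rather than rejecting it.

```yaml
# One configuration file reused across pipelines (illustrative values).
reader:            # listed as required by every pipeline
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: implicit
writer:            # required by training, evaluation, and estimate; not listed for design
    dataset_name: MyDataset
    writing_method: local
    local_experiment_path: experiment/test/
splitter:          # listed as required by every pipeline
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:            # single values only, so the design pipeline can also use this file
    ItemKNN:
        k: 10
        similarity: cosine
evaluation:        # listed as required by every pipeline
    top_k: [10, 20, 50]
    metrics: [nDCG, Precision, Recall, HitRate]
estimate:          # listed only for the estimate pipeline
    warmup_batches: 10
    train_batches: 100
    eval_batches: 100
```

Keeping hyperparameters single-valued is what makes this file usable by the design pipeline as well; a file with multi-value hyperparameters would be limited to the pipelines that accept them.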
Training Pipeline
The training pipeline is the core of the framework. It executes a complete experiment with all the provided models, performs hyperparameter optimization (HPO), and generates reports on model performance. The workflow requires the following sections:
- reader
- writer
- splitter
- models
- evaluation
Optionally, you can also provide:
- filtering
- dashboard
- general
A minimal training configuration example:
reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: implicit
writer:
    dataset_name: MyDataset
    writing_method: local
    local_experiment_path: experiment/test/
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    ItemKNN:
        k: 10
        similarity: cosine
evaluation:
    top_k: [10, 20, 50]
    metrics: [nDCG, Precision, Recall, HitRate]
Run the training pipeline with:
Info
The swarm pipeline expects the same configuration file as the training pipeline. You can start the swarm pipeline with:
Design Pipeline
The design pipeline is used for rapid evaluation and testing of models. It does not execute HPO, and requires models to have single-value hyperparameters. The workflow requires the following sections:
- reader
- splitter
- models
- evaluation
Optionally, you can also provide:
- filtering
- general
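The single-value requirement can be illustrated with the ItemKNN model used elsewhere on this page. This is a minimal sketch: treating a list such as `[50, 100, 200]` as a set of candidate values is inferred from the estimate pipeline example, and it is assumed here that such lists are meant for pipelines that run HPO.

```yaml
models:
    ItemKNN:
        k: 10                  # single value: suitable for the design pipeline
        # k: [50, 100, 200]    # several candidates: assumed to need a pipeline that runs HPO
        similarity: cosine
```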
An example use of the design pipeline is testing a custom implementation. Here is a configuration example:
reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: explicit
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    # Models in the design pipeline must have single-value hyperparameters
    CustomBPR:
        embedding_size: 32
        weight_decay: 0.
        batch_size: 1024
        epochs: 10
        learning_rate: 0.0001
evaluation:
    top_k: [10, 20, 50]
    batch_size: 1024
    metrics: [nDCG, Precision, Recall, HitRate]
general:
    custom_models: [my_custom_model.py]
Run the design pipeline with:
Evaluation Pipeline
The evaluation pipeline is used for rapid evaluation of models. It does not train the models; it only evaluates them, optionally loading pre-trained checkpoints. The workflow requires the following sections:
- reader
- writer
- splitter
- models
- evaluation
Optionally, you can also provide:
- filtering
- general
An example use of the evaluation pipeline is to evaluate a pre-trained model. Here is a configuration example:
reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: explicit
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    BPR:
        meta:
            load_from: path/to/checkpoint.pth
        embedding_size: 512
        reg_weight: 1e-4
        batch_size: 4096
        epochs: 200
        learning_rate: 1e-3
evaluation:
    top_k: [10, 20, 50]
    batch_size: 1024
    metrics: [nDCG, Precision, Recall, HitRate]
Run the evaluation pipeline with:
Estimate Pipeline
The estimate pipeline is used to approximate execution cost before running a full experiment. It supports the same dataset loading and evaluation setup as the evaluation pipeline, but adds an estimate section to control how many batches are sampled for timing. The workflow requires the following sections:
- reader
- writer
- splitter
- models
- evaluation
- estimate
Optionally, you can also provide:
- filtering
- general
An example use of the estimate pipeline is to compare expected cost across multiple candidate models. Here is a configuration example:
reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: implicit
writer:
    dataset_name: MyEstimateRun
    writing_method: local
    local_experiment_path: experiment/test/
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    ItemKNN:
        k: [50, 100, 200]
        similarity: cosine
    BPR:
        embedding_size: [64, 128]
        reg_weight: [0.0001, 0.001]
        batch_size: 4096
        epochs: 100
        learning_rate: [grid, 0.0005, 0.001]
evaluation:
    top_k: [10, 20]
    metrics: [nDCG, Precision, Recall]
estimate:
    warmup_batches: 10
    train_batches: 100
    eval_batches: 100
Run the estimate pipeline with: