Configuration Structure

This section gives an overview of WarpRec's configuration system. WarpRec configurations let users customize data loading, preprocessing, model training, evaluation, and general experiment settings. A well-specified configuration supports reproducibility, good performance, and easy management of multiple experiments. The configuration is divided into several main sections; each has a dedicated description that lists the available keywords and their behavior.

Pipeline Configurations

WarpRec supports multiple pipelines that leverage the configuration system differently depending on the purpose of the experiment:

Training Pipeline: Executes the full workflow, including hyperparameter optimization (HPO), model training, evaluation, and result saving.
Swarm Pipeline: Executes the same workflow as the training pipeline, but processes all trials in parallel to saturate the available resources in the Ray cluster.
Design Pipeline: Focuses on testing and evaluating models without HPO. This pipeline is ideal for rapid prototyping and design experiments.
Evaluation Pipeline: Evaluates pre-trained models without training, using provided checkpoints.
Estimate Pipeline: Estimates runtime and memory requirements with lightweight profiling before executing a full experiment.

A single configuration file can be used across multiple pipelines; WarpRec ensures that workflows remain interchangeable, with some sections being ignored or interpreted differently depending on the pipeline.
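As an illustration of reusing one configuration file, the sketch below builds the command line for several pipelines from the same path. It is a minimal sketch rather than part of WarpRec: `build_command` and `PIPELINES` are hypothetical names, and the only things taken from this page are the `python -m warprec.run` entry point and the `--config` and `--pipeline` flags with their values. Whether a given file satisfies every pipeline still depends on the required sections described for each pipeline.

```python
import sys

# Pipeline names as they appear after the --pipeline flag on this page.
PIPELINES = ("train", "swarm", "design", "eval", "estimate")


def build_command(config_path: str, pipeline: str) -> list[str]:
    """Return the argv list that runs one pipeline on the given config file."""
    if pipeline not in PIPELINES:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    return [
        sys.executable, "-m", "warprec.run",
        "--config", config_path,
        "--pipeline", pipeline,
    ]


if __name__ == "__main__":
    # Only print the commands here. To launch one, pass the list to
    # subprocess.run(cmd, check=True).
    for name in ("estimate", "train"):
        print(" ".join(build_command("path/to/the/config.yml", name)))
```

Building the command as a list avoids shell quoting problems when the configuration path contains spaces.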

Training Pipeline

The training pipeline is the core of the framework. It executes a complete experiment with all the provided models, performs hyperparameter optimization (HPO), and generates reports on model performance. The workflow requires the following sections:

reader
writer
splitter
models
evaluation

Optionally, you can also provide:

filtering
dashboard
general

A minimal training configuration example:

reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: implicit
writer:
    dataset_name: MyDataset
    writing_method: local
    local_experiment_path: experiment/test/
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    ItemKNN:
        k: 10
        similarity: cosine
evaluation:
    top_k: [10, 20, 50]
    metrics: [nDCG, Precision, Recall, HitRate]

Run the training pipeline with:

python -m warprec.run --config path/to/the/config.yml --pipeline train

Info

The swarm pipeline expects the same configuration file as the training pipeline. You can start the swarm pipeline with:

python -m warprec.run --config path/to/the/config.yml --pipeline swarm

Design Pipeline

The design pipeline is used for rapid evaluation and testing of models. It does not run HPO, so every model hyperparameter must be given as a single value. The workflow requires the following sections:

reader
splitter
models
evaluation

Optionally, you can also provide:

filtering
general

An example use of the design pipeline is testing a custom implementation. Here is a configuration example:

reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: explicit
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    # Models in the design pipeline must have single-value hyperparameters
    CustomBPR:
        embedding_size: 32
        weight_decay: 0.
        batch_size: 1024
        epochs: 10
        learning_rate: 0.0001
evaluation:
    top_k: [10, 20, 50]
    batch_size: 1024
    metrics: [nDCG, Precision, Recall, HitRate]
general:
    custom_models: [my_custom_model.py]

Run the design pipeline with:

python -m warprec.run --config path/to/the/config.yml --pipeline design
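The single-value requirement can be checked before launching a run. The following is a minimal sketch and not part of WarpRec: `find_multi_value_params` is a hypothetical helper, and it assumes that any list-valued hyperparameter denotes a search space and that the `meta` block holds no hyperparameters. The input is the `models` section as an already-parsed dictionary.

```python
from typing import Any


def find_multi_value_params(models: dict[str, dict[str, Any]]) -> list[str]:
    """Return 'Model.param' names whose value is a list or tuple.

    Assumption: a list value means a search space, which the design
    pipeline does not accept. The `meta` block is skipped.
    """
    offending = []
    for model_name, params in models.items():
        for key, value in params.items():
            if key == "meta":
                continue
            if isinstance(value, (list, tuple)):
                offending.append(f"{model_name}.{key}")
    return offending


# The `models` section from the design example above, as a dictionary.
design_models = {
    "CustomBPR": {
        "embedding_size": 32,
        "weight_decay": 0.0,
        "batch_size": 1024,
        "epochs": 10,
        "learning_rate": 0.0001,
    }
}
# A section that lists several candidate values for one hyperparameter.
search_models = {
    "ItemKNN": {"k": [50, 100, 200], "similarity": "cosine"},
}

print(find_multi_value_params(design_models))  # []
print(find_multi_value_params(search_models))  # ['ItemKNN.k']
```

A model whose hyperparameter is legitimately a list would be flagged by this check, so treat the result as a hint, not as a validation rule.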

Evaluation Pipeline

The evaluation pipeline evaluates models without training them, optionally loading pre-trained checkpoints. The workflow requires the following sections:

reader
writer
splitter
models
evaluation

Optionally, you can also provide:

filtering
general

An example use of the evaluation pipeline is to evaluate a pre-trained model. Here is a configuration example:

reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: explicit
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    BPR:
        meta:
            load_from: path/to/checkpoint.pth
        embedding_size: 512
        reg_weight: 0.0001
        batch_size: 4096
        epochs: 200
        learning_rate: 0.001
evaluation:
    top_k: [10, 20, 50]
    batch_size: 1024
    metrics: [nDCG, Precision, Recall, HitRate]

Run the evaluation pipeline with:

python -m warprec.run --config path/to/the/config.yml --pipeline eval
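Because the checkpoint is referenced by path under `meta.load_from`, it can be useful to confirm that the file exists before starting an evaluation. The sketch below is a hypothetical helper and not part of WarpRec. It assumes the `models` section has already been parsed into a dictionary and that `load_from` is a local file path.

```python
from pathlib import Path
from typing import Any


def checkpoint_paths(models: dict[str, dict[str, Any]]) -> dict[str, Path]:
    """Map each model name to its `meta.load_from` path, if one is set."""
    paths = {}
    for model_name, params in models.items():
        meta = params.get("meta") or {}
        load_from = meta.get("load_from")
        if load_from:
            paths[model_name] = Path(load_from)
    return paths


def missing_checkpoints(models: dict[str, dict[str, Any]]) -> list[str]:
    """Return the models whose configured checkpoint file does not exist."""
    return [
        name for name, path in checkpoint_paths(models).items()
        if not path.is_file()
    ]


if __name__ == "__main__":
    models = {
        "BPR": {
            "meta": {"load_from": "path/to/checkpoint.pth"},
            "embedding_size": 512,
        },
        "ItemKNN": {"k": 10},  # no checkpoint configured
    }
    for name in missing_checkpoints(models):
        print(f"checkpoint not found for {name}")
```

Models without a `meta.load_from` entry are ignored, which matches the statement above that checkpoints are optional.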

Estimate Pipeline

The estimate pipeline is used to approximate execution cost before running a full experiment. It supports the same dataset loading and evaluation setup as the evaluation pipeline, but adds an estimate section to control how many batches are sampled for timing. The workflow requires the following sections:

reader
writer
splitter
models
evaluation
estimate

Optionally, you can also provide:

filtering
general

An example use of the estimate pipeline is to compare expected cost across multiple candidate models. Here is a configuration example:

reader:
    loading_strategy: dataset
    data_type: transaction
    reading_method: local
    local_path: path/to/your/dataset.tsv
    rating_type: implicit
writer:
    dataset_name: MyEstimateRun
    writing_method: local
    local_experiment_path: experiment/test/
splitter:
    test_splitting:
        strategy: temporal_holdout
        ratio: 0.1
models:
    ItemKNN:
        k: [50, 100, 200]
        similarity: cosine
    BPR:
        embedding_size: [64, 128]
        reg_weight: [0.0001, 0.001]
        batch_size: 4096
        epochs: 100
        learning_rate: [grid, 0.0005, 0.001]
evaluation:
    top_k: [10, 20]
    metrics: [nDCG, Precision, Recall]
estimate:
    warmup_batches: 10
    train_batches: 100
    eval_batches: 100

Run the estimate pipeline with:

python -m warprec.run --config path/to/the/config.yml --pipeline estimate
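The required and optional sections listed for each pipeline on this page can be collected into one lookup table. The sketch below is a hypothetical pre-flight check and not part of WarpRec: `REQUIRED`, `OPTIONAL`, and `check_sections` are illustrative names, and the swarm entries simply reuse the training lists because the swarm pipeline expects the same configuration file. The page does not say whether sections outside these lists are rejected or ignored, so the helper only reports them.

```python
TRAIN_REQUIRED = {"reader", "writer", "splitter", "models", "evaluation"}
TRAIN_OPTIONAL = {"filtering", "dashboard", "general"}

# Required top-level sections per pipeline, as listed on this page.
REQUIRED = {
    "train": TRAIN_REQUIRED,
    "swarm": TRAIN_REQUIRED,
    "design": {"reader", "splitter", "models", "evaluation"},
    "eval": {"reader", "writer", "splitter", "models", "evaluation"},
    "estimate": {"reader", "writer", "splitter", "models", "evaluation", "estimate"},
}

# Optional top-level sections per pipeline, as listed on this page.
OPTIONAL = {
    "train": TRAIN_OPTIONAL,
    "swarm": TRAIN_OPTIONAL,
    "design": {"filtering", "general"},
    "eval": {"filtering", "general"},
    "estimate": {"filtering", "general"},
}


def check_sections(config: dict, pipeline: str) -> tuple[list[str], list[str]]:
    """Return (missing required sections, sections not listed for the pipeline)."""
    present = set(config)
    missing = sorted(REQUIRED[pipeline] - present)
    unlisted = sorted(present - REQUIRED[pipeline] - OPTIONAL[pipeline])
    return missing, unlisted


# Top-level keys of the estimate example above; section bodies omitted.
estimate_config = {
    "reader": {}, "writer": {}, "splitter": {},
    "models": {}, "evaluation": {}, "estimate": {},
}

print(check_sections(estimate_config, "estimate"))  # ([], [])
print(check_sections(estimate_config, "train"))     # ([], ['estimate'])
```

Keeping the table in one place makes it easy to see that the same file can pass the check for more than one pipeline.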