# Training Pipeline
The Training Pipeline is the main pipeline for full-scale experiments. It uses Ray Tune for distributed hyperparameter optimization (HPO), supports cross-validation, runs statistical significance tests between models, serializes checkpoints of the trained models, and produces comprehensive result reports.
## When to Use
- Running full benchmark experiments with HPO
- Comparing multiple models with statistical significance testing
- Producing reproducible experimental results with persistent artifacts
- Scaling training to multi-GPU or multi-node clusters
## Prerequisites
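A Ray cluster must be running before invoking the Training Pipeline. On a single machine, for example, start a head node with `ray start --head`.
For multi-node clusters, connect each worker node to the head node, for example with `ray start --address=<head-node-ip>:6379`.
On shared machines, restrict GPU visibility by setting `CUDA_VISIBLE_DEVICES` when starting Ray, for example `CUDA_VISIBLE_DEVICES=0,1 ray start --head`.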
Note: These are minimal examples of Ray usage. For more complex setups, refer to the Ray documentation.
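How many trials run at the same time depends on what each node exposes to Ray and on the per-trial resources requested in a model's optimization block (`cpu_per_trial` and `gpu_per_trial` in the configuration example on this page). The sketch below is a back-of-the-envelope estimate that assumes each trial must fit on a single node and that no other Ray work holds resources. `Node` and `max_concurrent_trials` are hypothetical helpers, not part of the pipeline or of Ray.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """CPUs and GPUs Ray can see on one node (after CUDA_VISIBLE_DEVICES filtering)."""

    cpus: int
    gpus: int


def max_concurrent_trials(nodes: list[Node], cpu_per_trial: int, gpu_per_trial: int) -> int:
    """Upper bound on simultaneously running trials, assuming each trial fits on one node."""
    if cpu_per_trial <= 0:
        raise ValueError("cpu_per_trial must be positive")
    total = 0
    for node in nodes:
        fit_by_cpu = node.cpus // cpu_per_trial
        # A CPU-only trial (gpu_per_trial == 0) is limited by CPUs alone.
        fit_by_gpu = node.gpus // gpu_per_trial if gpu_per_trial > 0 else fit_by_cpu
        total += min(fit_by_cpu, fit_by_gpu)
    return total


# Single 16-core machine started with CUDA_VISIBLE_DEVICES=0,1: GPUs are the bottleneck.
print(max_concurrent_trials([Node(cpus=16, gpus=2)], cpu_per_trial=4, gpu_per_trial=1))  # 2
```

In this estimate, restricting a 16-core machine to two GPUs caps the run at two concurrent trials even though the CPUs could host four, which is worth keeping in mind when choosing `num_samples`.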
## Configuration
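The Training Pipeline requires all of the following configuration sections: reader, writer, splitter, models, and evaluation. The example below also includes a dashboard section that enables CodeCarbon energy tracking.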
```yaml
reader:
  loading_strategy: dataset
  data_type: transaction
  reading_method: local
  local_path: path/to/my/dataset.tsv
  rating_type: implicit
writer:
  dataset_name: MyBenchmark
  writing_method: local
  local_experiment_path: experiments/
splitter:
  test_splitting:
    strategy: temporal_holdout
    ratio: 0.1
  validation_splitting:
    strategy: temporal_holdout
    ratio: 0.1
dashboard:
  codecarbon:
    enabled: true
    save_to_file: true
    output_dir: ./carbon_reports/
models:
  LightGCN:
    optimization:
      strategy: hopt
      scheduler: asha
      device: cuda
      cpu_per_trial: 4
      gpu_per_trial: 1
      num_samples: 20
      early_stopping:
        monitor: score
        patience: 10
        grace_period: 5
    embedding_size: [64, 128, 256]
    n_layers: [2, 3, 4]
    reg_weight: [uniform, 0.0001, 0.01]
    batch_size: 4096
    epochs: 200
    learning_rate: [uniform, 0.0001, 0.01]
evaluation:
  top_k: [10, 20, 50]
  metrics: [nDCG, Precision, Recall, HitRate]
  validation_metric: nDCG@10
  strategy: full
  stat_significance:
    wilcoxon_test: true
    corrections:
      bonferroni: true
      fdr: true
      alpha: 0.05
```
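The `stat_significance` block enables paired Wilcoxon tests between models followed by multiple-comparison corrections: comparing several models yields many p-values at once, and a correction keeps the overall error rate near `alpha`. The sketch below shows the two corrections in plain Python. It assumes that `fdr` refers to the Benjamini-Hochberg procedure (the configuration does not say which FDR method is used), and the function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni: multiply each p-value by the number of tests (controls family-wise error)."""
    m = len(p_values)
    adjusted = [min(p * m, 1.0) for p in p_values]
    return adjusted, [p_adj <= alpha for p_adj in adjusted]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [p_adj <= alpha for p_adj in adjusted]


# Hypothetical p-values from four pairwise Wilcoxon comparisons.
p = [0.01, 0.02, 0.03, 0.20]
print(bonferroni(p)[1])          # [True, False, False, False]
print(benjamini_hochberg(p)[1])  # [True, True, True, False]
```

With these p-values and `alpha = 0.05`, Bonferroni keeps only the first comparison significant while Benjamini-Hochberg keeps three: controlling the family-wise error rate is stricter than controlling the false discovery rate.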
## Running
### Execution Flows
The Training Pipeline supports three data split scenarios:
1. Train/Test (no validation split)
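When only test_splitting is configured, HPO runs directly on the test split. Use this for simple experiments where a separate validation set is not needed, keeping in mind that the same split then drives both model selection and final evaluation, so the reported test scores tend to be optimistic.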
2. Train/Validation/Test
When both test_splitting and validation_splitting are configured, HPO runs on the validation split, and the best model is evaluated on the held-out test set. This is the recommended setup.
3. Cross-Validation
When the splitting strategy is k_fold_cross_validation, the pipeline trains and evaluates across all k folds. For each model, HPO runs on each validation fold, and the best hyperparameters are selected based on the average validation metric across folds.
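The cross-validation selection rule (pick the hyperparameters with the best validation metric averaged across folds) can be sketched as follows. It assumes each candidate configuration has been scored on every fold; the configuration labels and `select_best_config` are hypothetical and only illustrate the averaging step, not the pipeline's internals.

```python
from __future__ import annotations

from statistics import mean


def select_best_config(fold_scores: dict[str, list[float]], higher_is_better: bool = True):
    """Pick the configuration with the best validation metric averaged over folds.

    fold_scores maps a configuration label to its validation scores, one per fold
    (for example nDCG@10 on each of the k folds).
    """
    if not fold_scores:
        raise ValueError("no configurations to compare")
    if len({len(scores) for scores in fold_scores.values()}) != 1:
        raise ValueError("every configuration must be scored on the same folds")
    averages = {cfg: mean(scores) for cfg, scores in fold_scores.items()}
    pick = max if higher_is_better else min
    best = pick(averages, key=averages.get)
    return best, averages


scores = {
    "embedding_size=64, n_layers=2": [0.31, 0.29, 0.30],
    "embedding_size=128, n_layers=3": [0.33, 0.28, 0.32],
}
best, averages = select_best_config(scores)
print(best)  # embedding_size=128, n_layers=3
```

The first configuration wins on the second fold, but the second one has the higher average (0.31 vs 0.30), so averaging avoids picking hyperparameters that only look good on a single split.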
### HPO Strategies
| Strategy | Description |
|---|---|
| `grid` | Exhaustive grid search over all hyperparameter combinations. |
| `random` | Random sampling from the search space. |
| `hopt` | Bayesian optimization via HyperOpt (Tree-structured Parzen Estimators). |
| `optuna` | Bayesian optimization via Optuna with advanced pruning. |
| `bohb` | Bayesian Optimization and HyperBand, combining model-based exploration with early stopping. |
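The difference between the two non-Bayesian strategies can be sketched with the discrete lists from the LightGCN example. This is plain Python, not Ray Tune's implementation, and the function names are hypothetical; the Bayesian strategies (`hopt`, `optuna`, `bohb`) instead choose each new configuration based on the scores of earlier trials and are not reproduced here.

```python
import itertools
import random


def grid_configs(space):
    """Yield every combination of the listed values, as grid search enumerates them."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_configs(space, num_samples, seed=None):
    """Yield num_samples configurations, drawing each value independently at random."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        yield {name: rng.choice(values) for name, values in space.items()}


# Discrete lists taken from the LightGCN configuration example.
space = {"embedding_size": [64, 128, 256], "n_layers": [2, 3, 4]}
print(len(list(grid_configs(space))))  # 9
print(list(random_configs(space, num_samples=4, seed=0)))
```

In this sketch the 3 × 3 grid always yields 9 configurations, while random search draws exactly `num_samples` configurations, possibly with repeats, which makes its cost independent of how many values each list contains.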
Schedulers:

| Scheduler | Description |
|---|---|
| `fifo` | First-In-First-Out. Runs all trials to completion. |
| `asha` | Asynchronous Successive Halving. Aggressively prunes underperforming trials based on intermediate results. |
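The pruning idea behind `asha` can be illustrated with plain, synchronous successive halving: all surviving trials are evaluated at a budget, the top `1/reduction_factor` are kept, and their budget is multiplied by `reduction_factor`. ASHA makes the same promotions asynchronously, promoting a trial as soon as it ranks in the top fraction of the trials that reached the same rung instead of waiting for the whole rung to finish. The function and parameter names below are illustrative and are not Ray Tune's `ASHAScheduler` API.

```python
def successive_halving(trials, evaluate, min_budget=1, max_budget=27, reduction_factor=3):
    """Synchronous successive halving (a simplification of ASHA).

    trials: trial identifiers.
    evaluate(trial, budget): validation score after `budget` epochs (higher is better).
    Returns the best surviving trial and a (budget, scores) record for each rung.
    """
    survivors = list(trials)
    budget = min_budget
    rungs = []
    while True:
        scores = {trial: evaluate(trial, budget) for trial in survivors}
        rungs.append((budget, scores))
        if budget >= max_budget or len(survivors) <= 1:
            break
        # Keep the top 1/reduction_factor trials (at least one) and give them more budget.
        keep = max(1, len(survivors) // reduction_factor)
        survivors = sorted(survivors, key=scores.get, reverse=True)[:keep]
        budget = min(budget * reduction_factor, max_budget)
    best = max(survivors, key=scores.get)
    return best, rungs


# Toy learning curves: trial t converges towards t, so higher ids are better trials.
best, rungs = successive_halving(range(9), lambda t, epochs: t * epochs / (epochs + 1))
print(best)                                                 # 8
print([(budget, len(scores)) for budget, scores in rungs])  # [(1, 9), (3, 3), (9, 1)]
```

Here only one trial ever reaches a budget of 9 epochs, whereas `fifo` would train all nine trials to completion; the trade-off is that a trial which starts slowly but would finish strong can be pruned early.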
### Search Space Syntax
Hyperparameter search spaces are defined inline in the YAML configuration:
| Syntax | Meaning |
|---|---|
| `[grid, 64, 128]` | Iterates over all values in the list (no random sampling). |
| `[choice, 64, 128, 256]` | Samples a random element from the list. |
| `[uniform, 0.0, 1.0]` | Samples a float uniformly between min and max. |
| `[quniform, 0.0, 1.0, 0.1]` | Samples uniformly between min and max, rounded to the nearest multiple of q. |
| `[loguniform, 1e-4, 1e-1]` | Samples a float between min and max from a log-uniform distribution. |
| `[qloguniform, 1e-4, 1e-1, 5e-5]` | Log-uniform sampling, rounded to the nearest multiple of q. |
| `[randn, 0.0, 1.0]` | Samples from a normal distribution with the given mean and standard deviation. |
| `[qrandn, 0.0, 1.0, 0.1]` | Normal sampling, rounded to the nearest multiple of q. |
| `[randint, 1, 10]` | Samples an integer uniformly between min and max. |
| `[qrandint, 1, 10, 2]` | Uniform integer sampling, rounded to the nearest multiple of q. |
| `[lograndint, 1, 100]` | Samples an integer between min and max from a log-uniform distribution. |
| `[qlograndint, 1, 100, 2]` | Log-uniform integer sampling, rounded to the nearest multiple of q. |
Note: A list without a search-space keyword (for example `embedding_size: [64, 128, 256]` in the configuration above) defaults to `grid` when the strategy is `grid`, and to `choice` for all other optimization strategies.
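The keyword convention and its default can be sketched as a small parser plus sampler. This is a hypothetical illustration, not the pipeline's own parser: `parse_space` and `sample` are invented names, only a subset of the keywords gets a sampler, and treating the `randint` upper bound as exclusive mirrors Ray Tune's `tune.randint`, which is an assumption about how this pipeline forwards the bounds.

```python
import math
import random

# Keywords from the Search Space Syntax table.
KEYWORDS = {
    "grid", "choice", "uniform", "quniform", "loguniform", "qloguniform",
    "randn", "qrandn", "randint", "qrandint", "lograndint", "qlograndint",
}

# Samplers for a subset of the keywords; the other q* variants and lograndint are omitted.
SAMPLERS = {
    "choice": lambda rng, *values: rng.choice(values),
    "uniform": lambda rng, low, high: rng.uniform(low, high),
    "quniform": lambda rng, low, high, q: round(rng.uniform(low, high) / q) * q,
    "loguniform": lambda rng, low, high: math.exp(rng.uniform(math.log(low), math.log(high))),
    "randn": lambda rng, mean, std: rng.gauss(mean, std),
    # Upper bound excluded, mirroring Ray Tune's tune.randint (assumption for this pipeline).
    "randint": lambda rng, low, high: rng.randrange(low, high),
}


def parse_space(value, strategy):
    """Return (kind, args) for one hyperparameter entry of a model block."""
    if not isinstance(value, list):
        return "fixed", [value]  # scalars such as batch_size: 4096
    if value and isinstance(value[0], str) and value[0] in KEYWORDS:
        return value[0], value[1:]
    # No keyword: grid for grid search, choice for every other strategy.
    return ("grid" if strategy == "grid" else "choice"), value


def sample(kind, args, rng):
    """Draw one value for a parsed entry."""
    if kind == "fixed":
        return args[0]
    if kind == "grid":
        raise ValueError("grid entries are enumerated, not sampled")
    if kind not in SAMPLERS:
        raise NotImplementedError(f"sampling for {kind!r} is not covered by this sketch")
    return SAMPLERS[kind](rng, *args)


rng = random.Random(0)
model_block = {"embedding_size": [64, 128, 256], "reg_weight": ["uniform", 0.0001, 0.01], "batch_size": 4096}
for name, value in model_block.items():
    kind, args = parse_space(value, strategy="hopt")
    print(name, kind, sample(kind, args, rng))
```

With `strategy="hopt"`, the bare `embedding_size` list is treated as `choice`, `reg_weight` as `uniform`, and `batch_size` stays fixed; switching the strategy to `grid` would turn the bare list into a grid axis instead.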
## Output Artifacts
The Training Pipeline persists the following artifacts via the Writer module:
- Results: Metric scores for the best configuration of each model (CSV).
- Per-user metrics: Granular per-user scores (CSV), when `per_user: true` is configured.
- Recommendations: Top-K recommendation lists for each user (CSV).
- Model checkpoints: Serialized model weights (`.pth`) for the best configuration.
- Hyperparameters: Optimal hyperparameters per model (JSON).
- Statistical significance: Paired test results and correction tables (CSV).
- Time reports: Execution timing and CodeCarbon energy reports (when enabled).