Writers

The WarpRec data writing module provides a unified interface to persist the results and artifacts generated during the recommendation pipeline. It is designed to be flexible and extensible, allowing users to save data to different destinations, including:

  • Local files
  • Azure Blob Storage

Key features include:

  • Automatic saving of results and artifacts to the configured destination.
  • Support for different output formats (e.g., CSV, TSV).

This design allows WarpRec to maintain consistency, traceability, and interoperability, facilitating the sharing of experimental outcomes and the integration with external systems. The backend is selected through configuration; the directory structure and output format are identical regardless of the destination. When using Azure Blob Storage, all artifacts are automatically uploaded as blobs within the specified container.

API Reference

For class signatures and parameters, see the Data Management API Reference.


Experiment Directory Structure

When an experiment starts, WarpRec automatically initializes a dedicated, timestamped directory for the run. This directory houses all generated outputs and follows the structure below:

experiment_dir/
├── evaluation/
├── params/
├── ray_results/
├── recs/
├── serialized/
├── split/
└── config.json

Each file and subdirectory is timestamped to ensure reproducibility and allow multiple runs within the same environment without overwriting results.

Important

The ray_results/ directory is created by the Ray library and may contain large files. Repeated experiments can consume significant disk space, so clean up this directory periodically.
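The cleanup advice above could be scripted along the following lines. This is a minimal sketch, not part of WarpRec: the helper names (dir_size_bytes, clean_ray_results) are hypothetical, and it assumes only that ray_results/ sits directly inside the experiment directory as shown in the layout. It defaults to a dry run so nothing is deleted by accident.

```python
import shutil
import tempfile
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files under `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def clean_ray_results(experiment_dir: Path, dry_run: bool = True) -> int:
    """Report, and optionally delete, ray_results/ in one experiment directory.

    Hypothetical helper (not a WarpRec API). Returns the number of bytes
    that were freed, or would be freed in a dry run.
    """
    ray_dir = experiment_dir / "ray_results"
    if not ray_dir.is_dir():
        return 0
    size = dir_size_bytes(ray_dir)
    if not dry_run:
        shutil.rmtree(ray_dir)
    return size


if __name__ == "__main__":
    # Demo on a throwaway directory so no real experiment data is touched.
    with tempfile.TemporaryDirectory() as tmp:
        exp = Path(tmp) / "experiment_dir"
        trial = exp / "ray_results" / "trial_0"
        trial.mkdir(parents=True)
        (trial / "checkpoint.bin").write_bytes(b"\0" * 2048)

        print("would free:", clean_ray_results(exp, dry_run=True), "bytes")
        print("freed:", clean_ray_results(exp, dry_run=False), "bytes")
```

Running the dry run first gives a chance to check how much space is at stake before removing anything.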

Directory Contents

Directory/File   Description
evaluation/      Contains model evaluation results, timing summaries, and estimate reports generated by evaluation or estimate workflows.
params/          Stores hyperparameter configurations found during the Hyperparameter Optimization (HPO) phase.
recs/            Contains the final recommendations produced by each model (if enabled).
serialized/      Stores the serialized (pickled) versions of trained models (if serialization is enabled).
split/           Contains the dataset splits (train/validation/test) used for the experiment (if enabled).
config.json      A fully resolved configuration file for the experiment. This includes user-specified values and automatically set default parameters, ensuring precise tracking of the configuration used for each run.
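To check which outputs a finished run actually produced, the layout above can be inspected with a few lines of standard-library Python. The sketch below is illustrative only: summarize_experiment is a hypothetical helper, and the config.json content written in the demo is a placeholder, not WarpRec's real configuration schema.

```python
import json
import tempfile
from pathlib import Path

# Subdirectory names taken from the experiment layout described above.
EXPECTED_SUBDIRS = ("evaluation", "params", "ray_results", "recs", "serialized", "split")


def summarize_experiment(experiment_dir: Path) -> dict:
    """Load config.json and record which output subdirectories exist.

    Hypothetical helper (not a WarpRec API).
    """
    config = json.loads((experiment_dir / "config.json").read_text(encoding="utf-8"))
    present = {name: (experiment_dir / name).is_dir() for name in EXPECTED_SUBDIRS}
    return {"config": config, "present": present}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exp = Path(tmp) / "experiment_dir"
        (exp / "evaluation").mkdir(parents=True)
        (exp / "split").mkdir()
        # Placeholder content; the real file holds the resolved experiment config.
        (exp / "config.json").write_text(json.dumps({"example_key": "example_value"}))

        summary = summarize_experiment(exp)
        for name, exists in summary["present"].items():
            print(f"{name + '/':<14}{'present' if exists else 'missing'}")
```

Optional directories such as recs/, serialized/, and split/ are reported as missing when the corresponding feature was not enabled for the run.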

Evaluation Reporting Artifacts

Within the evaluation/ directory, WarpRec produces timestamped reporting artifacts for both standard evaluation runs and estimate runs:

Overall Results

A tabular file (Overall_Results) summarizing the evaluation metrics per model and cutoff \(\mathbf{k}\):

Model       Top@k   Metric_1    Metric_2    ...
My_Model    10      0.001       0.002       ...
...

Note

  • One row is produced per \(\mathbf{(model,\ k)}\) combination.
  • Metrics reflect the configured evaluation protocol (e.g., full vs. sampled).
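Because the file holds one row per (model, k) pair, it is convenient to index the metrics by model and cutoff when post-processing. The following sketch assumes a tab-separated file whose first two columns are named Model and Top@k, as in the example above; the actual delimiter depends on the configured output format. The function name load_overall_results is hypothetical, and the second sample row is invented purely for illustration.

```python
import csv
import io


def load_overall_results(text: str, delimiter: str = "\t") -> dict:
    """Parse Overall_Results content into {model: {k: {metric: value}}}.

    Hypothetical helper (not a WarpRec API). Assumes header columns
    named "Model" and "Top@k"; every other column is treated as a metric.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    results: dict = {}
    for row in reader:
        model = row.pop("Model")
        k = int(row.pop("Top@k"))
        results.setdefault(model, {})[k] = {name: float(v) for name, v in row.items()}
    return results


# Illustrative content only; the k=20 row is made up for this example.
SAMPLE = (
    "Model\tTop@k\tMetric_1\tMetric_2\n"
    "My_Model\t10\t0.001\t0.002\n"
    "My_Model\t20\t0.003\t0.004\n"
)

if __name__ == "__main__":
    results = load_overall_results(SAMPLE)
    for model, by_k in results.items():
        for k, metrics in sorted(by_k.items()):
            print(model, k, metrics)
```

For a file on disk, pass its contents (for example via Path.read_text) and set delimiter="," if the run was configured for CSV output.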

Time Report

A timing and environment summary file (Time_Report) with wall-clock measurements:

  • Data Preparation Time: End-to-end data preparation (splitting, filtering, evaluation sampling if enabled).
  • Hyperparameter Exploration Time: Total HPO duration, including Ray cluster bootstrap/initialization overhead.
  • Average Trial Time: Mean training time per trial, including end-of-epoch evaluation.
  • Evaluation Time: Time to evaluate the best model; depends on the evaluation strategy ("full" or "sampled").
  • Inference Time (ms): Time to predict for \(\mathbf{1{,}000\ users \times 1{,}000\ items}\); intended as a comparative guideline.
  • Total Time: Aggregate end-to-end duration for the model's experiment pipeline.

Unless otherwise specified, durations are wall-clock; inference time is reported in milliseconds.

Estimate Report

The Estimate Pipeline writes an Estimate_Report file with per-model resource and timing projections aggregated across all successful setups:

  • Setup Count: Number of discrete setups that were successfully estimated for the model.
  • Measured Train Batches / Measured Eval Batches: Number of batches actually sampled during lightweight profiling.
  • Estimated Train Epoch Time / Estimated Total Train Time: Extrapolated training-time projections for iterative models.
  • Estimated Evaluation Time: Extrapolated evaluation-time projection.
  • Train/Eval RAM and VRAM summaries: Minimum, average, maximum, and standard deviation of sampled memory usage.
  • Notes: Additional context, such as CPU-only execution or analytical space estimates for non-iterative models.

Dataset Split Structure

The split/ directory contains the dataset partitions generated by WarpRec, if dataset splitting is enabled. The structure typically follows this pattern:

experiment_dir/
├── split/
│   ├── train.tsv
│   ├── validation.tsv
│   ├── test.tsv
│   ├── 1/
│   │   ├── train.tsv
│   │   └── validation.tsv
│   ├── 2/
│   │   ├── train.tsv
│   │   └── validation.tsv
│   └── ...
└── config.json

Note

  • The output format (e.g., TSV) is configurable.
  • At the top level of the split/ directory, you will find the primary train/validation/test split.
  • If validation folds are generated (e.g., for cross-validation), subdirectories named 1/, 2/, etc., are created. Each contains its own train/validation split.
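The layout described in these notes can be traversed programmatically to collect the primary split and any numbered folds. This is a sketch under stated assumptions: discover_splits is a hypothetical helper, file names follow the train/validation/test pattern shown above, and the extension is passed in because the output format is configurable.

```python
import tempfile
from pathlib import Path


def discover_splits(split_dir: Path, ext: str = "tsv") -> dict:
    """Collect primary split files and per-fold train/validation files.

    Hypothetical helper (not a WarpRec API). Fold directories are
    recognized by purely numeric names such as 1/, 2/, and so on.
    """
    primary = {
        name: split_dir / f"{name}.{ext}"
        for name in ("train", "validation", "test")
        if (split_dir / f"{name}.{ext}").is_file()
    }
    folds = {}
    for child in split_dir.iterdir():
        if child.is_dir() and child.name.isdigit():
            folds[int(child.name)] = {
                name: child / f"{name}.{ext}"
                for name in ("train", "validation")
                if (child / f"{name}.{ext}").is_file()
            }
    return {"primary": primary, "folds": dict(sorted(folds.items()))}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        split = Path(tmp) / "experiment_dir" / "split"
        split.mkdir(parents=True)
        for name in ("train", "validation", "test"):
            (split / f"{name}.tsv").touch()
        for fold in ("1", "2"):
            (split / fold).mkdir()
            for name in ("train", "validation"):
                (split / fold / f"{name}.tsv").touch()

        found = discover_splits(split)
        print("primary:", sorted(found["primary"]))
        print("folds:", list(found["folds"]))
```

Folds are returned sorted by their numeric name, so iterating over the result visits 1/, 2/, and later folds in order.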