Splitter Configuration

The Splitter Configuration module defines how a dataset is partitioned into training, validation, and test sets before model training. Proper splitting is essential for building a reliable evaluation pipeline and for comparing models fairly.

WarpRec provides multiple splitting strategies that can be tailored to your experimental needs, including:

  • Temporal strategies
  • Random strategies
  • Timestamp slicing
  • K-Fold Cross Validation

Important

  • Temporal strategies require that timestamps are present in the dataset loaded by the reader.
  • A test set is required for the train and design pipelines; a validation set is optional.

Supported Splitting Strategies

| Strategy                | Type             | Description                                                              |
|-------------------------|------------------|--------------------------------------------------------------------------|
| temporal_holdout        | Temporal         | Split by timestamp; the most recent interactions become the test set.    |
| temporal_leave_k_out    | Temporal         | Leave the last K interactions per user for testing.                      |
| random_holdout          | Random           | Randomly sample a ratio of interactions for the test set.                |
| random_leave_k_out      | Random           | Randomly leave K interactions per user for testing.                      |
| timestamp_slicing       | Temporal         | Split at a specific timestamp boundary.                                  |
| k_fold_cross_validation | Cross-validation | K-fold partitioning (validation only).                                   |

1. Temporal Holdout

Orders interactions by timestamp and reserves a portion of the most recent interactions as the test set.

splitter:
  test_splitting:
    strategy: temporal_holdout
    ratio: 0.1
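
To make the behaviour concrete, here is a minimal Python sketch of a temporal holdout. It is not WarpRec's implementation: the function name temporal_holdout, the (user, item, timestamp) tuple layout, and the choice to order all interactions globally rather than per user are assumptions made for illustration.

```python
def temporal_holdout(interactions, ratio):
    """Return (train, test); the most recent `ratio` share of rows becomes test.

    Assumption: rows are (user, item, timestamp) tuples and the ordering is
    global across users. WarpRec may handle this differently.
    """
    ordered = sorted(interactions, key=lambda row: row[2])  # oldest first
    n_test = round(len(ordered) * ratio)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Hypothetical toy data: 10 interactions with timestamps 1..10 in mixed order.
timestamps = [5, 1, 9, 3, 7, 2, 10, 4, 8, 6]
interactions = [(f"u{i % 3}", f"i{i}", ts) for i, ts in enumerate(timestamps)]

train, test = temporal_holdout(interactions, ratio=0.2)
print([row[2] for row in test])  # timestamps of the held-out rows
```

Every training interaction is older than every test interaction, which is what prevents information from the future leaking into training.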

2. Temporal Leave-K-Out

Orders each user's interactions by timestamp and holds out that user's K most recent interactions for the test set. Users with fewer than K interactions remain entirely in the training set.

splitter:
  test_splitting:
    strategy: temporal_leave_k_out
    k: 1
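
The per-user logic can be sketched in Python as follows. This is an illustrative sketch only: the function name, the (user, item, timestamp) tuple layout, and the handling of a user with exactly K interactions (all of them go to test, following the literal wording above) are assumptions, not confirmed WarpRec behaviour.

```python
from collections import defaultdict


def temporal_leave_k_out(interactions, k):
    """Return (train, test) with each user's k most recent rows in test."""
    if k < 1:
        raise ValueError("k must be at least 1")

    by_user = defaultdict(list)
    for row in interactions:  # row = (user, item, timestamp), an assumed layout
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        rows.sort(key=lambda row: row[2])  # oldest first
        if len(rows) < k:
            train.extend(rows)  # too few interactions: keep all in training
        else:
            train.extend(rows[:-k])
            test.extend(rows[-k:])  # the k most recent interactions
    return train, test


# Hypothetical toy data.
interactions = [
    ("u1", "a", 1), ("u1", "b", 5), ("u1", "c", 3),
    ("u2", "d", 2), ("u2", "e", 4),
]
train, test = temporal_leave_k_out(interactions, k=1)
print(sorted(test))
```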

3. Random Holdout

Randomly selects a portion of interactions to include in the test set.

splitter:
  test_splitting:
    strategy: random_holdout
    ratio: 0.1
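
A random holdout can be sketched in a few lines of Python. The function name and the seed parameter are hypothetical; the section above does not say whether or how WarpRec exposes a random seed for this strategy.

```python
import random


def random_holdout(interactions, ratio, seed=42):
    """Return (train, test) with a random `ratio` share of rows in test.

    `seed` is an assumed parameter used here only to make the sketch
    reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(interactions)  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: (user, item) pairs; no timestamp is needed here.
interactions = [(f"u{i % 3}", f"i{i}") for i in range(10)]
train, test = random_holdout(interactions, ratio=0.2)
print(len(train), len(test))
```

Unlike the temporal strategies, this split ignores interaction order, so it does not need timestamps in the dataset.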

4. Random Leave-K-Out

Randomly selects K interactions per user to include in the test set. Users with fewer than K interactions remain entirely in the training set.

splitter:
  test_splitting:
    strategy: random_leave_k_out
    k: 1
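
A minimal Python sketch of the same idea, under stated assumptions: the function name, the (user, item) tuple layout, and the seed parameter are illustrative and not taken from WarpRec.

```python
import random
from collections import defaultdict


def random_leave_k_out(interactions, k, seed=42):
    """Return (train, test) with k randomly chosen rows per user in test."""
    if k < 1:
        raise ValueError("k must be at least 1")

    rng = random.Random(seed)  # assumed seed, for reproducibility only
    by_user = defaultdict(list)
    for row in interactions:  # row[0] is assumed to be the user id
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        if len(rows) < k:
            train.extend(rows)  # too few interactions: keep all in training
            continue
        held_out = set(rng.sample(range(len(rows)), k))  # k distinct positions
        for idx, row in enumerate(rows):
            (test if idx in held_out else train).append(row)
    return train, test


# Hypothetical toy data: u1 has 3 interactions, u2 has 2, u3 has 1.
interactions = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "d"), ("u2", "e"),
    ("u3", "f"),
]
train, test = random_leave_k_out(interactions, k=2)
print(sorted(test))
```

With k=2 in this toy data, u3 has too few interactions and stays entirely in training, while u1 and u2 each contribute two test interactions.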

5. Timestamp Slicing

Splits the dataset based on a specific timestamp:

  • Interactions before the timestamp -> training set
  • Interactions after the timestamp -> test set

WarpRec also supports the special keyword best to automatically select an optimal timestamp.

splitter:
  test_splitting:
    strategy: timestamp_slicing
    timestamp: 10009287  # or the keyword best to let WarpRec choose the timestamp
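
A fixed-timestamp slice can be sketched in Python as shown here. The function name and tuple layout are hypothetical, and so is the treatment of interactions that fall exactly on the boundary (they go to the test set in this sketch), since the text above does not specify it. The best keyword is not modelled because its selection criterion is not described here.

```python
def timestamp_slicing(interactions, timestamp):
    """Return (train, test) split at a fixed timestamp boundary.

    Assumption: rows are (user, item, timestamp) tuples, and rows whose
    timestamp equals the boundary are placed in the test set.
    """
    train = [row for row in interactions if row[2] < timestamp]
    test = [row for row in interactions if row[2] >= timestamp]
    return train, test


# Hypothetical toy data around the boundary used in the config above.
interactions = [
    ("u1", "a", 10009000),
    ("u1", "b", 10009287),
    ("u2", "c", 10009100),
    ("u2", "d", 10009500),
]
train, test = timestamp_slicing(interactions, timestamp=10009287)
print(len(train), len(test))
```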

6. K-Fold Cross Validation

Partitions the dataset into K folds, using K-1 folds for training and the remaining fold for validation. The process is repeated K times so that each fold serves as the validation set exactly once.

splitter:
  validation_splitting:
    strategy: k_fold_cross_validation
    folds: 10

Note

  • This strategy is applicable only for validation sets, not test sets.
  • Typically gives a more reliable performance estimate than a single validation split, because every interaction is used for validation exactly once, but it requires K training runs instead of one.
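
The fold rotation described in this section can be sketched in Python as follows. This is an illustrative sketch, not WarpRec's code: the generator name k_fold_splits, the shuffling step, and the seed parameter are assumptions.

```python
import random


def k_fold_splits(interactions, folds, seed=42):
    """Yield (train, validation) pairs, one per fold."""
    if folds < 2:
        raise ValueError("folds must be at least 2")

    rng = random.Random(seed)  # assumed seed, for reproducibility only
    shuffled = list(interactions)
    rng.shuffle(shuffled)

    # Deal rows into `folds` buckets of near-equal size.
    buckets = [shuffled[i::folds] for i in range(folds)]

    for i in range(folds):
        validation = buckets[i]
        train = [row for j, bucket in enumerate(buckets) if j != i for row in bucket]
        yield train, validation


# Hypothetical toy data: 10 interactions, 5 folds.
interactions = [(f"u{i % 3}", f"i{i}") for i in range(10)]
splits = list(k_fold_splits(interactions, folds=5))

for train, validation in splits:
    print(len(train), len(validation))
```

Each interaction appears in exactly one validation fold and in the training portion of every other round.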

Example Splitter Configuration

This example demonstrates a full configuration:

  • Test set split using temporal holdout (10% of the most recent interactions)
  • Validation set using 10-fold cross validation

splitter:
  test_splitting:
    strategy: temporal_holdout
    ratio: 0.1
  validation_splitting:
    strategy: k_fold_cross_validation
    folds: 10
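
As a rough worked example of what this configuration implies for set sizes, the Python sketch below computes the split sizes for a hypothetical dataset of 1,000 interactions. It assumes that the test set is held out first and that the folds are then built from the remaining interactions; that ordering, the helper name split_sizes, and the dataset size are assumptions for illustration.

```python
def split_sizes(n_interactions, test_ratio, folds):
    """Return (n_test, fold_sizes) for a holdout followed by K-fold.

    Assumption: the folds partition only the interactions left after the
    test holdout.
    """
    n_test = round(n_interactions * test_ratio)
    n_remaining = n_interactions - n_test
    base, extra = divmod(n_remaining, folds)
    # Spread any remainder over the first `extra` folds.
    fold_sizes = [base + 1 if i < extra else base for i in range(folds)]
    return n_test, fold_sizes


n_test, fold_sizes = split_sizes(1000, test_ratio=0.1, folds=10)
n_train_per_round = sum(fold_sizes) - fold_sizes[0]

print(n_test)             # size of the test set
print(fold_sizes[0])      # size of one validation fold
print(n_train_per_round)  # training rows in each cross-validation round
```

Under these assumptions, 100 interactions form the test set, each validation fold holds 90 interactions, and each of the 10 training rounds uses the other 810.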