Filtering Configuration
The Filtering Configuration module defines the preprocessing strategies applied to the dataset before the splitting phase. Filtering is a fundamental step when the dataset contains redundant or low-quality interactions, or when its size exceeds the available computational resources.
By applying filters, WarpRec ensures that the resulting dataset is both computationally manageable and more representative of the target recommendation task.
General Configuration Format
Filtering strategies must be declared under the filtering section of the configuration file.
Each strategy is specified by name, followed by its parameters (if required).
filtering:
  strategy_name_1:
    arg_name_1: value_1
  strategy_name_2:
    arg_name_1: value_1
    arg_name_2: value_2
  ...
Important
- Strategies are executed top to bottom in the exact order they are listed.
- Some strategies (e.g., MinRating and UserAverage) are incompatible with implicit feedback datasets.
- Incorrect strategy names or invalid parameter keys will cause WarpRec to raise an error.
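Because strategies run in the order they are listed, swapping two entries can change the final dataset. The following is a minimal pure-Python sketch of that effect, not WarpRec's implementation: the helper names `min_rating`, `user_min`, and `run_pipeline`, as well as the sample data, are hypothetical and only mimic the behaviour described on this page.

```python
from collections import Counter

# Hypothetical interaction records: (user_id, item_id, rating).
interactions = [
    ("u1", "i1", 5.0), ("u1", "i2", 2.0), ("u1", "i3", 1.0),
    ("u2", "i1", 4.0), ("u2", "i2", 4.5), ("u2", "i3", 3.5),
]

def min_rating(rows, threshold):
    """Keep interactions whose rating is not strictly below `threshold`."""
    return [r for r in rows if r[2] >= threshold]

def user_min(rows, n):
    """Keep interactions of users that have at least `n` interactions."""
    counts = Counter(user for user, _, _ in rows)
    return [r for r in rows if counts[r[0]] >= n]

def run_pipeline(rows, steps):
    """Apply each (function, argument) step in the listed order."""
    for step, arg in steps:
        rows = step(rows, arg)
    return rows

# Same two filters, different order.
rating_first = run_pipeline(interactions, [(min_rating, 3.0), (user_min, 3)])
count_first = run_pipeline(interactions, [(user_min, 3), (min_rating, 3.0)])

print(len(rating_first))  # 3: u1 drops to one interaction, then fails the count check
print(len(count_first))   # 4: both users pass the count check before ratings are filtered
```

In this sketch, filtering ratings first removes user `u1` entirely, whereas filtering by count first keeps one of that user's interactions.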
Supported Filtering Strategies
WarpRec currently supports the following filtering strategies:
| Strategy | Category | Description |
|---|---|---|
| MinRating | Rating-based | Remove interactions below a rating threshold. |
| UserAverage | Rating-based | Remove interactions below each user's average rating. |
| ItemAverage | Rating-based | Remove interactions below each item's average rating. |
| UserMin | Frequency-based | Remove users with fewer than N interactions. |
| UserMax | Frequency-based | Remove users with more than N interactions. |
| ItemMin | Frequency-based | Remove items with fewer than N interactions. |
| ItemMax | Frequency-based | Remove items with more than N interactions. |
| IterativeKCore | Core-based | Iterative k-core filtering until convergence. |
| NRoundsKCore | Core-based | k-core filtering for a fixed number of rounds. |
| UserHeadN | Temporal | Retain the first N interactions per user. |
| UserTailN | Temporal | Retain the last N interactions per user. |
| DropUser | Entity removal | Remove specific users by ID. |
| DropItem | Entity removal | Remove specific items by ID. |
1. MinRating
Removes all interactions where the rating value is strictly below the specified threshold. Not compatible with implicit feedback datasets.
2. UserAverage
Removes all interactions for which the rating is below the corresponding user's average rating. Not applicable to implicit feedback scenarios.
3. ItemAverage
Removes all interactions for which the rating is below the corresponding item's average rating. Not applicable to implicit feedback scenarios.
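To make the three rating-based rules concrete, here is a small pure-Python sketch. It is an illustration under stated assumptions rather than WarpRec's code: the function names and sample data are hypothetical, and the sketch assumes that a rating exactly equal to the average is kept, which this page does not specify.

```python
from collections import defaultdict

# Hypothetical interaction records: (user_id, item_id, rating).
rows = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u1", "i3", 1.0),
    ("u2", "i1", 2.0), ("u2", "i2", 2.0),
]

def filter_min_rating(rows, threshold):
    """MinRating-style rule: drop ratings strictly below `threshold`."""
    return [r for r in rows if r[2] >= threshold]

def filter_below_average(rows, key_index):
    """Drop ratings below the average of their group.

    key_index=0 groups by user (UserAverage-style),
    key_index=1 groups by item (ItemAverage-style).
    Assumption: ratings equal to the average are kept.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r[key_index]] += r[2]
        counts[r[key_index]] += 1
    return [r for r in rows if r[2] >= totals[r[key_index]] / counts[r[key_index]]]

above_threshold = filter_min_rating(rows, 3.0)   # keeps u1's 5.0 and 3.0
above_user_avg = filter_below_average(rows, 0)   # u1 average 3.0, u2 average 2.0
above_item_avg = filter_below_average(rows, 1)   # i1 average 3.5, i2 average 2.5
```

Note that the per-user and per-item variants can keep very different subsets of the same data, since a rating that is low for one user may still be above that item's average.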
4. UserMin
Removes all interactions involving users with fewer interactions than the given threshold.
5. UserMax
Removes all interactions involving users with more interactions than the given threshold. Because only low-activity users remain, this is particularly useful for cold-start user analysis.
6. ItemMin
Removes all interactions involving items with fewer interactions than the given threshold.
7. ItemMax
Removes all interactions involving items with more interactions than the given threshold. Because only rarely consumed items remain, this is useful for analyzing cold-start item scenarios.
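The four frequency-based strategies share one idea: count interactions per user or per item, then keep only rows whose count falls inside a bound. The sketch below expresses this with a single hypothetical helper, `filter_by_count`; the name, its parameters, and the sample data are assumptions made for illustration and do not reflect WarpRec's internals.

```python
from collections import Counter

# Hypothetical interaction records: (user_id, item_id).
rows = [
    ("u1", "i1"), ("u1", "i2"), ("u1", "i3"),
    ("u2", "i1"),
    ("u3", "i1"), ("u3", "i2"),
]

def filter_by_count(rows, key_index, min_count=None, max_count=None):
    """Keep rows whose user (key_index=0) or item (key_index=1)
    has an interaction count within the given bounds."""
    counts = Counter(r[key_index] for r in rows)

    def within_bounds(count):
        if min_count is not None and count < min_count:
            return False
        if max_count is not None and count > max_count:
            return False
        return True

    return [r for r in rows if within_bounds(counts[r[key_index]])]

user_min_2 = filter_by_count(rows, 0, min_count=2)  # drops u2 (1 interaction)
user_max_2 = filter_by_count(rows, 0, max_count=2)  # drops u1 (3 interactions)
item_min_2 = filter_by_count(rows, 1, min_count=2)  # drops i3 (1 interaction)
item_max_2 = filter_by_count(rows, 1, max_count=2)  # drops i1 (3 interactions)
```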
8. IterativeKCore
Applies UserMin and ItemMin iteratively until no further interactions can be removed (i.e., until a stable state is reached).
9. NRoundsKCore
Applies UserMin and ItemMin for a fixed number of iterations.
This is a simplified variant of IterativeKCore that does not require full convergence.
Tip
IterativeKCore guarantees that every remaining user and item satisfies the minimum-interaction thresholds, but may be computationally expensive on large datasets.
NRoundsKCore is recommended when a bounded, predictable runtime is preferred over full convergence.
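The difference between the two core-based strategies can be sketched in a few lines of pure Python. This is an illustrative sketch, not WarpRec's implementation: the function names and sample data are hypothetical, and it assumes a single threshold `k` for both users and items, with the user filter applied before the item filter in each round.

```python
from collections import Counter

# Hypothetical interaction records: (user_id, item_id).
rows = [
    ("u1", "i1"), ("u1", "i2"),
    ("u2", "i1"), ("u2", "i2"),
    ("u3", "i1"), ("u3", "i3"),
]

def one_round(rows, k):
    """One pass: drop users with fewer than k interactions, then items."""
    user_counts = Counter(user for user, _ in rows)
    rows = [r for r in rows if user_counts[r[0]] >= k]
    item_counts = Counter(item for _, item in rows)
    return [r for r in rows if item_counts[r[1]] >= k]

def n_rounds_k_core(rows, k, rounds):
    """Run a fixed number of passes, whether or not the data is stable."""
    for _ in range(rounds):
        rows = one_round(rows, k)
    return rows

def iterative_k_core(rows, k):
    """Repeat passes until a pass removes nothing."""
    while True:
        filtered = one_round(rows, k)
        if len(filtered) == len(rows):
            return filtered
        rows = filtered

one_pass = n_rounds_k_core(rows, k=2, rounds=1)
stable = iterative_k_core(rows, k=2)

print(len(one_pass))  # 5: item i3 is gone, but u3 is left with a single interaction
print(len(stable))    # 4: a second pass removes u3, after which nothing changes
```

In this example a single round leaves user `u3` below the threshold, because removing item `i3` reduced that user's count only after the user check had already run. Iterating to convergence removes such leftovers at the cost of extra passes.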
10. UserHeadN
Selects and retains the first N interactions for each user. If timestamps are available, interactions are sorted chronologically before selection. If no timestamps are provided, the original ordering of interactions is preserved.
11. UserTailN
Selects and retains the last N interactions for each user. If timestamps are available, interactions are sorted chronologically before selection. If no timestamps are provided, the original ordering of interactions is preserved.
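The head and tail selection described above can be sketched as follows. The helper `user_head_tail`, its parameters, and the sample data are hypothetical, chosen only to illustrate the sort-then-slice behaviour; the order of rows in the returned list is an artifact of this sketch, not a statement about WarpRec's output.

```python
from collections import defaultdict

# Hypothetical interaction records: (user_id, item_id, timestamp).
rows = [
    ("u1", "i1", 30), ("u1", "i2", 10), ("u1", "i3", 20),
    ("u2", "i1", 5),
]

def user_head_tail(rows, n, tail=False, has_timestamps=True):
    """Keep the first (or last, if tail=True) n interactions per user."""
    if has_timestamps:
        # Stable chronological sort; without timestamps the input order is used.
        rows = sorted(rows, key=lambda r: r[2])
    per_user = defaultdict(list)
    for r in rows:
        per_user[r[0]].append(r)
    kept = []
    for user_rows in per_user.values():
        if tail:
            kept.extend(user_rows[max(len(user_rows) - n, 0):])
        else:
            kept.extend(user_rows[:n])
    return kept

head_2 = user_head_tail(rows, 2)                              # u1 keeps i2 and i3
tail_2 = user_head_tail(rows, 2, tail=True)                   # u1 keeps i3 and i1
head_2_unsorted = user_head_tail(rows, 2, has_timestamps=False)  # u1 keeps i1 and i2
```

Users with fewer than N interactions, such as `u2` here, keep everything they have.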
12. DropUser
Removes all interactions involving specific users, identified by their user IDs.
13. DropItem
Removes all interactions involving specific items, identified by their item IDs.
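Entity removal amounts to a membership test against a list of IDs. A minimal sketch, assuming a hypothetical helper `drop_entities` and made-up sample data (neither comes from WarpRec):

```python
# Hypothetical interaction records: (user_id, item_id).
rows = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i2")]

def drop_entities(rows, ids, key_index):
    """Drop rows whose user (key_index=0) or item (key_index=1) is in `ids`."""
    blocked = set(ids)  # set lookup keeps the filter linear in the number of rows
    return [r for r in rows if r[key_index] not in blocked]

without_users = drop_entities(rows, ["u1", "u3"], key_index=0)
without_items = drop_entities(rows, ["i2"], key_index=1)
```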
Example Filtering Pipeline
The following example demonstrates a pipeline where:
- All ratings below 3.0 are removed.
- Users with fewer than 10 interactions are filtered out.