Sequential Recommenders
The Sequential Recommenders module of WarpRec focuses on models that leverage the temporal order of user interactions to predict future behaviors. Unlike general recommenders, which often treat interactions as independent events, sequential models explicitly capture the dynamics of user preferences within a session or across time. These models are particularly effective for tasks such as next-item prediction in e-commerce or personalized content recommendation in streaming services.
API Reference
For class signatures, parameters, and source code, see the Sequential API Reference.
Summary of Available Sequential Models
| Category | Model | Description |
|---|---|---|
| CNN-Based | Caser | Convolutional model capturing local and global sequential patterns. |
| Markov-Chains | FOSSIL | Combines Markov chains with factored item similarity for sequential prediction. |
| Neighborhood-Based | STAN | Session-based model using time- and position-weighted neighborhood matching. |
| RNN-Based | GRU4Rec | Session-based recommender using GRUs for short-term preference modeling. |
| | NARM | Hybrid encoder (GRU + attention) capturing sequential behavior and the user's main purpose. |
| Transformer-Based | BERT4Rec | Bidirectional Transformer model trained on a masked item prediction task. |
| | BSARec | Self-attention combined with Fourier-based low/high-frequency filtering. |
| | CL4SRec | SASRec backbone with sequence augmentations and contrastive learning. |
| | CORE | Unifies encoding/decoding spaces using a weighted sum of item embeddings and RDM. |
| | DuoRec | Contrastive sequential model with unsupervised and supervised sequence-level CL. |
| | eSASRec | Enhanced SASRec with LiGR blocks and optional sampled-softmax training. |
| | gSASRec | SASRec trained with generalized binary cross-entropy (gBCE) to reduce overconfidence from negative sampling. |
| | LightSANs | Efficient self-attention with low-rank decomposition and decoupled positioning. |
| | LinRec | Linear attention mechanism (O(N)) for efficient long-term recommendation. |
| | SASRec | Transformer-inspired model learning short- and long-term user preferences. |
CNN-Based
CNN-based sequential recommenders apply convolutional filters to the embedding matrix of a user's recent interactions, treating the sequence much like an image. Differently shaped filters capture different short-term sequential patterns, which these models can combine with longer-term user preferences.
Caser
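Caser (Convolutional Sequence Embedding Recommendation): Treats the embedding matrix of a user's most recent interactions as a 2D "image" and applies horizontal and vertical convolutional filters to it. Horizontal filters capture patterns spanning several consecutive items, vertical filters capture the contribution of individual past items, and a separate user embedding models long-term general preferences. Combining these local (short-term) and long-term signals makes Caser effective for next-item recommendation.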
For further details, please refer to the paper.
```yaml
models:
  Caser:
    embedding_size: 64
    n_h: 8
    n_v: 4
    dropout_prob: 0.5
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 20
```
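The two filter types can be illustrated with a minimal pure-Python sketch. It assumes a single hand-set horizontal filter and a single vertical filter with no bias, whereas Caser learns many filters of each kind (we assume `n_h` and `n_v` set how many). The function names are hypothetical and do not correspond to WarpRec's implementation.

```python
def horizontal_conv(E, filt):
    """Slide an h x d filter over h consecutive rows of E, apply ReLU, then max-pool."""
    h, d = len(filt), len(E[0])
    activations = []
    for start in range(len(E) - h + 1):
        s = sum(filt[i][j] * E[start + i][j] for i in range(h) for j in range(d))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)  # max-pooling over positions: one feature per filter


def vertical_conv(E, weights):
    """An L x 1 filter: a weighted sum of the L item embeddings, giving a d-dim vector."""
    d = len(E[0])
    return [sum(w * row[j] for w, row in zip(weights, E)) for j in range(d)]


# Embedding matrix of the L = 3 most recent items (oldest first), d = 2.
E = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
h_feature = horizontal_conv(E, [[1.0, 1.0], [1.0, 1.0]])  # height-2 filter
v_feature = vertical_conv(E, [0.2, 0.3, 0.5])
print(h_feature, v_feature)
```

In the full model, the max-pooled horizontal features and the vertical outputs are concatenated, passed through a fully connected layer, and combined with the user embedding before each candidate item is scored.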
Markov-Chains
Some sequential recommenders combine Markovian assumptions with item similarity to balance short-term context with long-term personalization. These models are especially suited to scenarios where user behavior exhibits both immediate intent and broader preferences.
FOSSIL
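FOSSIL (FactOrized Sequential Prediction with Item SImilarity ModeLs): Combines a factored item similarity model (in the style of FISM, which factorizes SLIM-like item-item similarities) that captures long-term preferences and mitigates data sparsity, with a Markov chain over the user's most recent items that captures short-term behavior. The Markov-chain component is not limited to first order: it can weight several preceding items.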
For further details, please refer to the paper.
```yaml
models:
  FOSSIL:
    embedding_size: 64
    order_len: 8
    alpha: 0.001
    reg_weight: 0.001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
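A minimal sketch of how a FOSSIL-style score could combine the two components, roughly following the paper's formulation. Simplifications and assumptions: item biases are omitted, the candidate is not excluded from the history sum, the embeddings are toy values, and we assume `order_len` corresponds to the Markov-chain order (`len(eta)` here) and `alpha` to the exponent that shrinks the similarity term for long histories. The function is hypothetical, not WarpRec's API.

```python
def fossil_score(history, target, P, M, Q, eta, eta_user, alpha):
    """Score `target` for a user whose interactions are `history` (oldest first).

    Long-term part: FISM-style sum of the user's item embeddings P, shrunk by
    |history| ** -alpha. Short-term part: Markov chain over the last len(eta)
    items, weighted by global (eta) plus user-specific (eta_user) weights.
    """
    d = len(Q[target])
    user_vec = [0.0] * d
    shrink = len(history) ** (-alpha)
    for item in history:
        for k in range(d):
            user_vec[k] += shrink * P[item][k]
    recent = history[::-1][:len(eta)]  # most recent item first
    for lag, item in enumerate(recent):
        w = eta[lag] + eta_user[lag]
        for k in range(d):
            user_vec[k] += w * M[item][k]
    return sum(u * q for u, q in zip(user_vec, Q[target]))


emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings for items 0, 1, 2
print(fossil_score([0, 1], 2, P=emb, M=emb, Q=emb, eta=[1.0], eta_user=[0.0], alpha=0.5))
```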
Neighborhood-Based
Neighborhood-based sequential recommenders identify similar historical sessions and use them to predict the next item. These models are non-neural and rely on weighted similarity between the current session and past sessions.
STAN
STAN (Sequence and Time Aware Neighborhood): A session-based recommendation model that scores candidate items by matching the current session against historical neighbor sessions. It incorporates three weighting factors: position decay (lambda_1) gives more importance to recently viewed items within the current session, temporal decay (lambda_2) favors neighbor sessions that occurred closer in time, and neighbor-item proximity (lambda_3) favors candidate items that appear close, within the neighbor session, to the most recent item it shares with the current session.
For further details, please refer to the paper.
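The three weights can be combined as in the following minimal sketch, which reflects our reading of the STAN paper rather than WarpRec's implementation. It treats every past session that shares an item as a neighbor (STAN keeps only the top-k most similar), skips items already in the current session (an assumption), and uses a hypothetical function signature.

```python
import math


def stan_scores(session, t_now, neighbors, lambda_1, lambda_2, lambda_3):
    """Score candidate items for `session` (item ids, most recent last).

    neighbors: list of (items, timestamp) pairs for past sessions.
    """
    l = len(session)
    # Factor 1: position decay inside the current session (recent items weigh more).
    w1 = {item: math.exp((pos - l) / lambda_1) for pos, item in enumerate(session, start=1)}
    scores = {}
    for items, t in neighbors:
        shared = set(session) & set(items)
        if not shared:
            continue
        sim = sum(w1[i] for i in shared) / math.sqrt(l * len(items))
        # Factor 2: older neighbor sessions count less.
        w2 = math.exp(-(t_now - t) / lambda_2)
        # Factor 3: distance from the most recent item shared with the current session.
        anchor = next(i for i in reversed(session) if i in shared)
        anchor_pos = items.index(anchor)
        for pos, cand in enumerate(items):
            if cand in session:  # assumption: do not re-recommend items already seen
                continue
            w3 = math.exp(-abs(pos - anchor_pos) / lambda_3)
            scores[cand] = scores.get(cand, 0.0) + sim * w2 * w3
    return scores


print(stan_scores([1, 2], 10, [([2, 3, 4], 8), ([5, 6], 9)], 1.0, 1.0, 1.0))
```

Here item 3 outranks item 4 because it sits right after the shared item 2 in the neighbor session, while the second neighbor contributes nothing because it shares no items.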
RNN-Based
Recurrent Neural Networks (RNNs) process sequences step by step, maintaining a hidden state to capture temporal dependencies. They are effective for modeling evolving user interests within sessions.
GRU4Rec
GRU4Rec (Gated Recurrent Unit for Recommendation): One of the earliest deep learning approaches for session-based recommendation. It leverages GRUs to model user interaction sequences, focusing on short-term behavior and next-item prediction.
For further details, please refer to the paper.
```yaml
models:
  GRU4Rec:
    embedding_size: 128
    hidden_size: 64
    num_layers: 2
    dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
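At its core, GRU4Rec applies the standard GRU update once per interaction and scores items from the final hidden state. The sketch below shows only that recurrence; biases, session-parallel mini-batching, and the ranking losses used in practice are omitted, and the helper names and the separate output embedding table are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(x, h, params):
    """One GRU update (biases omitted): the gates decide how much of h to keep."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_session(items, item_emb, params, hidden_size):
    h = [0.0] * hidden_size
    for item in items:  # one recurrent step per interaction, in order
        h = gru_step(item_emb[item], h, params)
    return h


def score_items(h, out_emb):
    """Dot product of the final hidden state with each item's output embedding."""
    return {i: sum(a * b for a, b in zip(h, e)) for i, e in out_emb.items()}


I2 = [[1.0, 0.0], [0.0, 1.0]]
item_emb = {0: [1.0, 0.0], 1: [0.0, 1.0]}
h = encode_session([0, 1, 0], item_emb, (I2,) * 6, hidden_size=2)
print(score_items(h, item_emb))
```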
NARM
NARM (Neural Attentive Session-based Recommendation): A hybrid encoder-decoder model that augments a standard RNN with an attention mechanism. A global encoder (a GRU) summarizes the user's sequential behavior in its final hidden state, while a local encoder applies attention over the GRU hidden states to capture the user's main purpose in the current session. The two are concatenated into a unified session representation, which is matched against candidate items with a bilinear decoder.
For further details, please refer to the paper.
```yaml
models:
  NARM:
    embedding_size: 128
    hidden_size: 64
    n_layers: 2
    hidden_dropout_prob: 0.1
    attn_dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
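A minimal sketch of the local encoder's attention and the bilinear decoder, assuming the GRU hidden states of the session have already been computed. The attention form (`v^T sigmoid(A1 h_t + A2 h_j)`, without softmax normalization) follows our reading of the NARM paper; dropout and batching are omitted, and all names are hypothetical.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def narm_representation(hidden, A1, A2, v):
    """hidden: GRU hidden states h_1..h_t of the session (oldest first)."""
    c_global = hidden[-1]  # global encoder: last hidden state
    q = matvec(A1, c_global)
    # alpha_j = v^T sigmoid(A1 h_t + A2 h_j): relevance of step j to the main purpose
    alphas = [sum(vi * sigmoid(qi + ki) for vi, qi, ki in zip(v, q, matvec(A2, h_j)))
              for h_j in hidden]
    c_local = [sum(a * h[k] for a, h in zip(alphas, hidden)) for k in range(len(c_global))]
    return c_global + c_local  # list concatenation = [c_global; c_local]


def bilinear_score(item_emb, B, c):
    """score = emb_i^T B c, with B of shape (embedding_size x 2 * hidden_size)."""
    return sum(e * x for e, x in zip(item_emb, matvec(B, c)))


hidden = [[1.0, 0.0], [0.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
c = narm_representation(hidden, I2, I2, [1.0, 1.0])
B = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
print(c, bilinear_score([0.0, 1.0], B, c))
```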
Transformer-Based
Transformer-inspired recommenders employ self-attention to capture dependencies across an entire sequence simultaneously. They excel at modeling both short-term and long-term user preferences without relying on recurrence or convolution.
BERT4Rec
BERT4Rec (Bidirectional Encoder Representations from Transformers for Recommendation): Applies a bidirectional Transformer architecture to sequential recommendation. Rather than predicting each next item from left to right, it is trained on a "cloze" task: randomly masked items in a sequence are predicted from both their past and future context. At inference time, a mask token is appended to the end of the sequence, and the model's prediction at that position is the next item.
For further details, please refer to the paper.
```yaml
models:
  BERT4Rec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    mask_prob: 0.2
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
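Training and inference inputs differ only in where the mask appears. A minimal sketch, assuming a string placeholder for the mask token (real implementations reserve an item id) and that `mask_prob` is the per-item masking probability; guaranteeing at least one masked item per sequence is our own choice, and the function names are hypothetical.

```python
import random

MASK = "[mask]"  # hypothetical mask token; real implementations reserve an item id


def cloze_mask(seq, mask_prob, rng):
    """Randomly hide items; labels hold the hidden item (None = no loss here)."""
    inputs, labels = [], []
    for item in seq:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(item)
        else:
            inputs.append(item)
            labels.append(None)
    if all(label is None for label in labels):  # assumption: mask at least one item
        pos = rng.randrange(len(seq))
        inputs[pos], labels[pos] = MASK, seq[pos]
    return inputs, labels


def inference_input(seq, max_seq_len):
    """Append a mask after the history; the prediction at that slot is the next item."""
    return (list(seq) + [MASK])[-max_seq_len:]


rng = random.Random(0)
print(cloze_mask([10, 11, 12, 13, 14], mask_prob=0.2, rng=rng))
print(inference_input([10, 11, 12, 13, 14], max_seq_len=4))  # [12, 13, 14, '[mask]']
```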
BSARec
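BSARec (Beyond Self-Attention for Sequential Recommendation): Combines frequency-domain filtering with self-attention in each block. Using the Fourier transform, it separates the sequence representation into low- and high-frequency components and rescales them, which injects an inductive bias toward fine-grained (for example, periodic) patterns and mitigates the oversmoothing of pure self-attention, while Transformer attention still models sequential dependencies.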
For further details, please refer to the paper.
```yaml
models:
  BSARec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    alpha: 0.5
    c: 10
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
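A minimal sketch of the frequency-domain branch for a single embedding dimension along the sequence axis, using a naive DFT for clarity (a real model would apply an FFT to the whole batch). We assume `c` bounds how many low-frequency components are kept and `alpha` weights the frequency branch against self-attention; the exact indexing in the paper's code may differ. In the paper the high-frequency rescaling factor (`beta` here) is learned per dimension, while here it is a plain argument, and the function names are hypothetical.

```python
import cmath


def low_pass(signal, c):
    """Keep the c lowest DFT frequencies (plus mirrored negative ones) of a 1-D signal."""
    n = len(signal)
    spectrum = [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
                for k in range(n)]
    kept = [s if min(k, n - k) < c else 0 for k, s in enumerate(spectrum)]
    return [(sum(s * cmath.exp(2j * cmath.pi * k * t / n) for k, s in enumerate(kept)) / n).real
            for t in range(n)]


def bsarec_mix(signal, attn_out, c, alpha, beta):
    """alpha * (low + beta * high) + (1 - alpha) * self-attention output."""
    low = low_pass(signal, c)
    high = [x - l for x, l in zip(signal, low)]
    dual = [l + beta * h for l, h in zip(low, high)]  # rescale the high frequencies
    return [alpha * d + (1 - alpha) * a for d, a in zip(dual, attn_out)]


x = [1.0, 3.0, 1.0, 3.0]  # an alternating (high-frequency) pattern
print(bsarec_mix(x, attn_out=[0.0] * 4, c=1, alpha=0.5, beta=0.0))
```

With `beta = 0` the alternating component is filtered out entirely and only the sequence mean survives in the frequency branch; larger `beta` values preserve more of the high-frequency detail that plain self-attention tends to smooth away.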
CL4SRec
CL4SRec (Contrastive Learning for Sequential Recommendation): Extends a SASRec-style encoder with stochastic sequence augmentations (crop/mask/reorder) and an InfoNCE objective, jointly optimized with next-item prediction.
For further details, please refer to the paper.
```yaml
models:
  CL4SRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    ssl_lambda: 0.1
    tau: 1.0
    sim_type: "dot"
    crop_eta: 0.6
    mask_gamma: 0.3
    reorder_beta: 0.6
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
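The three augmentations can be sketched in plain Python, following the paper's definitions of crop (ratio eta), mask (ratio gamma), and reorder (ratio beta), which we assume are what `crop_eta`, `mask_gamma`, and `reorder_beta` control. The mask id, the function names, and picking one random augmentation per view are illustrative assumptions; the InfoNCE loss over the two encoded views is not shown.

```python
import math
import random

MASK = 0  # hypothetical: item id 0 reserved as the mask token


def crop(seq, eta, rng):
    """Keep a random contiguous subsequence of length floor(eta * |seq|) (at least 1)."""
    n = max(1, math.floor(eta * len(seq)))
    start = rng.randint(0, len(seq) - n)
    return seq[start:start + n]


def mask(seq, gamma, rng):
    """Replace floor(gamma * |seq|) randomly chosen positions with MASK."""
    hidden = set(rng.sample(range(len(seq)), math.floor(gamma * len(seq))))
    return [MASK if i in hidden else x for i, x in enumerate(seq)]


def reorder(seq, beta, rng):
    """Shuffle a random contiguous block of length floor(beta * |seq|)."""
    n = math.floor(beta * len(seq))
    start = rng.randint(0, len(seq) - n)
    block = seq[start:start + n]
    rng.shuffle(block)
    return seq[:start] + block + seq[start + n:]


def two_views(seq, eta, gamma, beta, rng):
    """Two independently augmented views of one sequence (a positive pair)."""
    augs = [lambda s: crop(s, eta, rng), lambda s: mask(s, gamma, rng),
            lambda s: reorder(s, beta, rng)]
    return rng.choice(augs)(seq), rng.choice(augs)(seq)


rng = random.Random(42)
print(two_views([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], eta=0.6, gamma=0.3, beta=0.6, rng=rng))
```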
CORE
CORE (Consistent Representation Space): A session-based recommendation framework that unifies the representation space used for encoding and decoding. Unlike standard deep encoders, which project session embeddings into a different space than item embeddings, CORE encodes a session as a weighted sum of its item embeddings (using a Transformer to learn the weights). It also employs Robust Distance Measuring (RDM), which scores items by temperature-scaled cosine similarity with dropout, to prevent overfitting.
For further details, please refer to the paper.
```yaml
models:
  CORE:
    embedding_size: 64
    dnn_type: "trm"
    n_layers: 2
    n_heads: 8
    inner_size: 256
    hidden_dropout_prob: 0.1
    attn_dropout_prob: 0.1
    layer_norm_eps: 1e-12
    session_dropout: 0.1
    item_dropout: 0.1
    temperature: 0.07
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 50
```
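A minimal sketch of CORE's two ingredients: the session embedding as a softmax-weighted sum of its item embeddings, and RDM-style scoring by temperature-scaled cosine similarity. We assume the attention logits are given (in the Transformer variant, presumably `dnn_type: "trm"`, they come from a Transformer; equal logits reproduce a plain average), and the session and item dropout applied during training is omitted. Names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def core_session_embedding(item_embs, weight_logits):
    """Session = convex combination of its item embeddings (same space as items)."""
    alphas = softmax(weight_logits)
    d = len(item_embs[0])
    return [sum(a * e[k] for a, e in zip(alphas, item_embs)) for k in range(d)]


def rdm_scores(session_emb, candidate_embs, temperature):
    """Cosine similarity scaled by 1 / temperature (embedding dropout omitted)."""
    s = normalize(session_emb)
    return [sum(a * b for a, b in zip(s, normalize(v))) / temperature for v in candidate_embs]


session = core_session_embedding([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(rdm_scores(session, [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]], temperature=0.07))
```

Because the session embedding is a convex combination of item embeddings, it lives in the same space as the candidates it is compared with; a softmax over the scaled scores then gives the next-item distribution.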
DuoRec
DuoRec (Dual Contrastive Sequential Recommendation): Uses a SASRec-like backbone and introduces two contrastive signals: unsupervised consistency across stochastic views of the same sequence, and supervised consistency across sequences sharing the same target item.
For further details, please refer to the paper.
```yaml
models:
  DuoRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    ssl_type: "us_x"
    ssl_lambda: 0.1
    ssl_lambda_sem: 0.1
    tau: 1.0
    sim_type: "dot"
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
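The unsupervised signal comes from encoding the same sequence twice with different dropout masks, which needs a real encoder and is not shown. The minimal sketch below covers the supervised side: pairing sequences in a batch that share a target item, and an InfoNCE loss with dot-product similarity and a temperature, which we assume correspond to `sim_type: "dot"` and `tau`. Function names are hypothetical.

```python
import math
import random


def supervised_positives(targets, rng):
    """For each sequence, pick another sequence in the batch with the same target item."""
    by_target = {}
    for idx, t in enumerate(targets):
        by_target.setdefault(t, []).append(idx)
    positives = []
    for idx, t in enumerate(targets):
        others = [j for j in by_target[t] if j != idx]
        positives.append(rng.choice(others) if others else None)  # None: no partner
    return positives


def info_nce(anchor, positive, negatives, tau):
    """-log softmax probability of the positive among [positive] + negatives."""
    def dot(a, b):  # sim_type "dot"
        return sum(x * y for x, y in zip(a, b))
    logits = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[0]


print(supervised_positives([5, 7, 5, 9], random.Random(0)))  # [2, None, 0, None]
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], tau=1.0))
```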
eSASRec
eSASRec (Enhanced SASRec): A modular enhancement of SASRec that supports LiGR blocks, sampled-softmax training, and optional mixed-negative sampling. It improves training efficiency while preserving strong sequential modeling performance.
For further details, please refer to the paper.
```yaml
models:
  eSASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    use_relative_pos: False
    use_sampled_softmax: True
    use_ligr: True
    mn_ratio: 0.0
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
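A minimal sketch of sampled-softmax training with mixed negatives. Here `mixed_ratio` is a hypothetical parameter giving the probability that a negative is drawn from items seen in the batch rather than uniformly from the catalog; we do not claim it matches the semantics of `mn_ratio`. Production implementations usually add a logQ correction for the popularity bias of in-batch negatives, LiGR blocks are not shown, and the function names are hypothetical.

```python
import math
import random


def sample_negatives(positive, num_items, in_batch_items, n, mixed_ratio, rng):
    """Draw n negatives: in-batch items with probability mixed_ratio, else uniform ids."""
    negatives = []
    while len(negatives) < n:
        if in_batch_items and rng.random() < mixed_ratio:
            cand = rng.choice(in_batch_items)  # popularity-skewed: taken from real batches
        else:
            cand = rng.randrange(num_items)  # uniform over the catalog
        if cand != positive:
            negatives.append(cand)
    return negatives


def sampled_softmax_loss(user_vec, item_embs, positive, negatives):
    """Cross-entropy of the positive among [positive] + sampled negatives."""
    def score(i):
        return sum(a * b for a, b in zip(user_vec, item_embs[i]))
    logits = [score(positive)] + [score(j) for j in negatives]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[0]


rng = random.Random(1)
negs = sample_negatives(0, 3, [1, 2], n=2, mixed_ratio=0.5, rng=rng)
print(negs, sampled_softmax_loss([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], 0, negs))
```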
gSASRec
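gSASRec: A SASRec variant that addresses the overconfidence SASRec-style models develop when trained with negative sampling. It samples more negatives per positive and trains with generalized binary cross-entropy (gBCE), whose calibration parameter t (gbce_t) controls how strongly the positive term is adjusted to compensate for sampling.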
For further details, please refer to the paper.
```yaml
models:
  gSASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    gbce_t: 0.5
    reuse_item_embeddings: True
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
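The gBCE loss can be written down directly. Following the gSASRec paper, with k negatives sampled from a catalog of |I| items the sampling rate is alpha = k / (|I| - 1), and the positive term is raised to the power beta = alpha * (t * (1 - 1/alpha) + 1/alpha); t = 0 recovers plain BCE and t = 1 gives beta = alpha. We assume `gbce_t` is t; the function names are hypothetical.

```python
import math


def gbce_beta(num_negatives, num_items, t):
    """beta = alpha * (t * (1 - 1/alpha) + 1/alpha), alpha = negative sampling rate."""
    alpha = num_negatives / (num_items - 1)
    return alpha * (t * (1 - 1 / alpha) + 1 / alpha)


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def gbce_loss(pos_logit, neg_logits, beta):
    """-log(sigmoid(s+) ** beta) - sum(log(1 - sigmoid(s-)))."""
    loss = -beta * log_sigmoid(pos_logit)
    loss -= sum(log_sigmoid(-s) for s in neg_logits)  # log(1 - sigmoid(s)) == log_sigmoid(-s)
    return loss


beta = gbce_beta(num_negatives=1, num_items=10_000, t=0.5)
print(beta, gbce_loss(1.5, [0.3], beta))
```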
LightSANs
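LightSANs (Low-Rank Decomposed Self-Attention Networks): A sequential recommender that improves on standard self-attention (as in SASRec) in two ways: low-rank decomposed self-attention, which lets each item attend to a small number of latent interests instead of every other item and so reduces complexity, and decoupled position encoding, which models sequential relations between positions separately from item content.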
For further details, please refer to the paper.
```yaml
models:
  LightSANs:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    k_interests: 5
    inner_size: 512
    dropout_prob: 0.5
    attn_dropout_prob: 0.5
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
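A minimal sketch of low-rank decomposed attention: the n item vectors are first compressed into k latent interests (we assume `k_interests` is k), and each query then attends over those k vectors, so the attention map is n x k instead of n x n. Learned query/key/value projections, multiple heads, and the decoupled position attention are omitted; names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_interests(X, theta):
    """Compress n item vectors (n x d) into k latent-interest vectors (k x d).

    theta (d x k) scores each item for each interest; a softmax over the n items
    turns the scores into aggregation weights.
    """
    n, d, k = len(X), len(X[0]), len(theta[0])
    interests = []
    for j in range(k):
        w = softmax([sum(X[i][m] * theta[m][j] for m in range(d)) for i in range(n)])
        interests.append([sum(w[i] * X[i][m] for i in range(n)) for m in range(d)])
    return interests


def attend_to_interests(Q, K_int, V_int):
    """Each of the n queries attends over k interests: an n x k attention map."""
    d = len(Q[0])
    out = []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in K_int])
        out.append([sum(w[j] * V_int[j][m] for j in range(len(V_int))) for m in range(d)])
    return out


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
theta = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # d = 2, k = 3
interests = aggregate_interests(X, theta)
print(attend_to_interests(X, interests, interests))
```

The cost of the attention step grows as O(n * k * d) rather than O(n^2 * d), which is where the efficiency gain for long sequences comes from.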
LinRec
LinRec (Linear Attention Mechanism for Long-term Sequential Recommender Systems): Proposed to address the quadratic complexity of standard Transformers. LinRec replaces dot-product attention with a linear attention mechanism (O(N) in the sequence length N) built on L2 normalization and ELU activation. Because the key-value product is computed first, the attention cost drops from O(N^2 d) to O(N d^2), which makes LinRec considerably cheaper than standard SASRec-style attention on long sequences while retaining its ability to capture complex dependencies.
For further details, please refer to the paper.
```yaml
models:
  LinRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
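A minimal sketch of the linear attention, following our reading of the paper: queries are L2-normalized per row and keys per column, ELU is applied, and the product `K^T V` is formed first (the exact ordering of normalization and activation in LinRec may differ). Projections, heads, and dropout are omitted, and the names are hypothetical.

```python
import math


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def l2_rows(M):
    """Normalize each row (one position's vector) to unit L2 norm."""
    out = []
    for row in M:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


def l2_cols(M):
    """Normalize each column (one feature across all positions) to unit L2 norm."""
    norms = [math.sqrt(sum(row[j] ** 2 for row in M)) or 1.0 for j in range(len(M[0]))]
    return [[x / n for x, n in zip(row, norms)] for row in M]


def linear_attention(Q, K, V):
    """ELU(rownorm(Q)) @ (ELU(colnorm(K))^T @ V), evaluated right to left."""
    Qf = [[elu(x) for x in row] for row in l2_rows(Q)]
    Kf = [[elu(x) for x in row] for row in l2_cols(K)]
    d, dv = len(Kf[0]), len(V[0])
    # d x dv summary of the whole sequence: O(N * d * dv) work, no N x N matrix
    KV = [[sum(Kf[t][i] * V[t][j] for t in range(len(V))) for j in range(dv)] for i in range(d)]
    return [[sum(q[i] * KV[i][j] for i in range(d)) for j in range(dv)] for q in Qf]


Q = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
K = [[0.0, 1.0], [2.0, -1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
print(linear_attention(Q, K, V))
```

By associativity, the result equals `(Q K^T) V` computed from the same normalized inputs, but the N x N attention matrix is never materialized.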
SASRec
SASRec (Self-Attentive Sequential Recommendation): A Transformer-based model that uses stacked self-attention blocks to capture item dependencies in user sequences. SASRec effectively models dynamic user preferences in sparse datasets, learning both short- and long-term interests.
For further details, please refer to the paper.
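A minimal sketch of SASRec-style next-item scoring: causal self-attention over item plus position embeddings, then dot products between the last position's output and the (shared) item embeddings. It uses a single head with no learned projections, feed-forward layers, layer normalization, or dropout, so it illustrates the attention pattern rather than the full model; names are hypothetical.

```python
import math


def causal_self_attention(X):
    """Position i attends only to positions 0..i (no peeking at future items)."""
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        logits = [sum(a * b for a, b in zip(X[i], X[j])) / math.sqrt(d) for j in range(i + 1)]
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        z = sum(w)
        out.append([sum(w[j] / z * X[j][k] for j in range(i + 1)) for k in range(d)])
    return out


def next_item_scores(seq, item_emb, pos_emb):
    """Score every item against the representation of the last position."""
    X = [[a + b for a, b in zip(item_emb[item], pos_emb[t])] for t, item in enumerate(seq)]
    h = causal_self_attention(X)[-1]
    return {i: sum(a * b for a, b in zip(h, e)) for i, e in item_emb.items()}


item_emb = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
pos_emb = [[0.0, 0.0], [0.0, 0.0]]  # toy position embeddings for a length-2 sequence
print(next_item_scores([1, 2], item_emb, pos_emb))
```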