
Sequential Recommenders

The Sequential Recommenders module of WarpRec focuses on models that leverage the temporal order of user interactions to predict future behaviors. Unlike general recommenders, which often treat interactions as independent events, sequential models explicitly capture the dynamics of user preferences within a session or across time. These models are particularly effective for tasks such as next-item prediction in e-commerce or personalized content recommendation in streaming services.
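To make the next-item prediction task concrete, the sketch below shows one common way to turn a chronologically ordered interaction history into training pairs. The function name `next_item_samples` is hypothetical and the truncation rule is an assumption; WarpRec's own data pipeline may build sequences differently.

```python
from typing import List, Tuple


def next_item_samples(history: List[int], max_seq_len: int) -> List[Tuple[List[int], int]]:
    """Turn one chronologically ordered history into (prefix, target) pairs."""
    samples = []
    for t in range(1, len(history)):
        # Keep only the most recent max_seq_len items before position t.
        prefix = history[max(0, t - max_seq_len):t]
        samples.append((prefix, history[t]))
    return samples


pairs = next_item_samples([10, 11, 12, 13], max_seq_len=2)
# pairs == [([10], 11), ([10, 11], 12), ([11, 12], 13)]
```

Each pair asks the model to predict the target from the items that precede it, which is the ordering information that general recommenders discard.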

API Reference

For class signatures, parameters, and source code, see the Sequential API Reference.

Summary of Available Sequential Models

| Category | Model | Description |
|---|---|---|
| CNN-Based | Caser | Convolutional model capturing local and global sequential patterns. |
| Markov-Chains | FOSSIL | Combines Markov Chains with factored item similarity for sequential prediction. |
| Neighborhood-Based | STAN | Session-based model using time- and position-weighted neighborhood matching. |
| RNN-Based | GRU4Rec | Session-based recommender using GRUs for short-term preference modeling. |
| RNN-Based | NARM | Hybrid encoder (GRU + Attention) capturing sequential behavior and main purpose. |
| Transformer-Based | BERT4Rec | Bidirectional Transformer model trained on a masked item prediction task. |
| Transformer-Based | BSARec | Bandlimited self-attention combining frequency filtering and Transformer attention. |
| Transformer-Based | CL4SRec | SASRec backbone with sequence augmentations and contrastive learning. |
| Transformer-Based | CORE | Unifies encoding/decoding spaces using a linear combination of item embeddings and Robust Distance Measuring (RDM). |
| Transformer-Based | DuoRec | Contrastive sequential model with unsupervised and supervised sequence-level contrastive learning. |
| Transformer-Based | eSASRec | Enhanced SASRec with LiGR blocks and optional sampled-softmax training. |
| Transformer-Based | gSASRec | SASRec variant trained with the generalised binary cross-entropy (gBCE) loss to counter overconfidence from negative sampling. |
| Transformer-Based | LightSANs | Efficient self-attention with low-rank decomposition and decoupled positioning. |
| Transformer-Based | LinRec | Linear attention mechanism (O(N)) for efficient long-term recommendation. |
| Transformer-Based | SASRec | Transformer-inspired model learning short- and long-term user preferences. |

CNN-Based

CNN-based sequential recommenders apply convolutional operations to user interaction histories, treating sequences as structured data. These models can capture both short-term dependencies and long-term patterns through different convolutional filters.

Caser

Caser (Convolutional Sequence Embedding Recommendation): Treats a user's interaction history as a 2D "image" and applies horizontal and vertical convolutional filters. Caser models local patterns (short-term interests) as well as long-term user preferences, making it effective in session-based recommendation scenarios.
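The two filter types can be illustrated with a small pure-Python sketch. It is not WarpRec's implementation: the function names are hypothetical, each function applies a single filter, and it is assumed here that `n_h` and `n_v` in the configuration count how many horizontal and vertical filters are learned.

```python
from typing import List

Matrix = List[List[float]]


def horizontal_filter(seq_emb: Matrix, kernel: Matrix) -> float:
    """Slide an h x d kernel over time, apply ReLU, then max-pool over positions."""
    h, d = len(kernel), len(kernel[0])
    activations = []
    for start in range(len(seq_emb) - h + 1):
        total = sum(
            kernel[i][j] * seq_emb[start + i][j]
            for i in range(h)
            for j in range(d)
        )
        activations.append(max(0.0, total))
    return max(activations)


def vertical_filter(seq_emb: Matrix, weights: List[float]) -> List[float]:
    """Weighted sum of the L embedding rows: one output per latent dimension."""
    d = len(seq_emb[0])
    return [
        sum(weights[t] * seq_emb[t][j] for t in range(len(seq_emb)))
        for j in range(d)
    ]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # L = 3 items, d = 2 dimensions
h_out = horizontal_filter(seq, [[1.0, 1.0], [1.0, 1.0]])  # kernel spanning 2 steps
v_out = vertical_filter(seq, [0.2, 0.3, 0.5])
```

The horizontal filter responds to patterns across consecutive items, while the vertical filter aggregates the whole window per latent dimension.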

For further details, please refer to the paper.

models:
  Caser:
    embedding_size: 64
    n_h: 8
    n_v: 4
    dropout_prob: 0.5
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 20

Markov-Chains

Some sequential recommenders combine Markovian assumptions with item similarity to balance short-term context with long-term personalization. These models are especially suited to scenarios where user behavior exhibits both immediate intent and broader preferences.

FOSSIL

FOSSIL (FactOrized Sequential Prediction with Item SImilarity ModeLs): Integrates a high-order Markov Chain over the user's most recent interactions, which models short-term behavior, with a factored item similarity model (inspired by SLIM), which addresses data sparsity and captures long-term preferences.
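A minimal scoring sketch shows how the two parts combine. It makes several assumptions: the names `fossil_score`, `P`, `Q`, and `eta` are illustrative, `eta` holds one global weight per Markov step (the per-user weights from the paper are left out), `len(eta)` plays the role that `order_len` is assumed to play in the configuration, and `alpha` is assumed to be the normalisation exponent of the similarity term.

```python
from typing import Dict, List

Vector = List[float]


def fossil_score(
    history: List[int],
    candidate: int,
    P: Dict[int, Vector],
    Q: Dict[int, Vector],
    bias: Dict[int, float],
    eta: List[float],
    alpha: float,
) -> float:
    """Score a candidate from a long-term similarity term plus recent items."""
    d = len(Q[candidate])
    user_vec = [0.0] * d

    # Long-term part: normalised sum over all past items except the candidate.
    others = [j for j in history if j != candidate]
    if others:
        norm = len(others) ** (-alpha)
        for j in others:
            for f in range(d):
                user_vec[f] += norm * P[j][f]

    # Short-term part: the k-th most recent item is weighted by eta[k].
    for k, weight in enumerate(eta):
        if k < len(history):
            recent = history[-1 - k]
            for f in range(d):
                user_vec[f] += weight * P[recent][f]

    return bias[candidate] + sum(u * q for u, q in zip(user_vec, Q[candidate]))


P = {1: [1.0, 0.0], 2: [0.0, 1.0]}
Q = {3: [1.0, 2.0]}
score = fossil_score([1, 2], 3, P, Q, {3: 0.5}, eta=[0.5, 0.25], alpha=0.0)
```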

For further details, please refer to the paper.

models:
  FOSSIL:
    embedding_size: 64
    order_len: 8
    alpha: 0.001
    reg_weight: 0.001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

Neighborhood-Based

Neighborhood-based sequential recommenders identify similar historical sessions and use them to predict the next item. These models are non-neural and rely on weighted similarity between the current session and past sessions.

STAN

STAN (Sequence and Time Aware Neighborhood): A session-based recommendation model that scores candidate items by matching the current session against historical neighbor sessions. It incorporates three weighting factors: position decay (lambda_1) gives more importance to recently viewed items within a session, temporal decay (lambda_2) favors sessions that occurred closer in time, and co-occurrence temporal proximity (lambda_3) weights the relevance of shared items based on their temporal distance within sessions.
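The three weighting factors can be sketched as exponential decays, following the general form used in the STAN paper. The function names and the simplified scoring loop are assumptions: the loop omits the top-`k` neighbor selection, scores only items not already in the current session, and assumes timestamps and `lambda_2` share the same unit.

```python
import math
from typing import Dict, List, Tuple


def position_weight(pos: int, session_len: int, lambda_1: float) -> float:
    """Weight of the item at 1-based position pos; the last item gets 1.0."""
    return math.exp((pos - session_len) / lambda_1)


def recency_weight(t_current: float, t_neighbor: float, lambda_2: float) -> float:
    """Down-weight neighbor sessions that happened long before the current one."""
    return math.exp(-(t_current - t_neighbor) / lambda_2)


def proximity_weight(pos_item: int, pos_shared: int, lambda_3: float) -> float:
    """Down-weight neighbor items that are far from the shared item."""
    return math.exp(-abs(pos_item - pos_shared) / lambda_3)


def stan_scores(
    current: List[int],
    t_current: float,
    neighbors: List[Tuple[List[int], float]],
    lambda_1: float,
    lambda_2: float,
    lambda_3: float,
) -> Dict[int, float]:
    cur_w = {
        item: position_weight(p, len(current), lambda_1)
        for p, item in enumerate(current, start=1)
    }
    scores: Dict[int, float] = {}
    for items, t_nb in neighbors:
        shared = {i for i in items if i in cur_w}
        if not shared:
            continue
        # Cosine-style similarity between the weighted current session and the neighbor.
        sim = sum(cur_w[i] for i in shared) / math.sqrt(len(current) * len(items))
        sim *= recency_weight(t_current, t_nb, lambda_2)
        # Anchor on the shared item seen most recently in the current session.
        anchor_pos = items.index(max(shared, key=lambda i: cur_w[i]))
        for p, item in enumerate(items):
            if item not in cur_w:
                w3 = proximity_weight(p, anchor_pos, lambda_3)
                scores[item] = scores.get(item, 0.0) + sim * w3
    return scores


scores = stan_scores([1, 2], 100.0, [([2, 3], 90.0)], 10.0, 10.0, 10.0)
```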

For further details, please refer to the paper.

models:
  STAN:
    k: 500
    lambda_1: 10
    lambda_2: 10
    lambda_3: 10
    max_seq_len: 20

RNN-Based

Recurrent Neural Networks (RNNs) process sequences step by step, maintaining a hidden state to capture temporal dependencies. They are effective for modeling evolving user interests within sessions.

GRU4Rec

GRU4Rec (Gated Recurrent Unit for Recommendation): One of the earliest deep learning approaches for session-based recommendation. It leverages GRUs to model user interaction sequences, focusing on short-term behavior and next-item prediction.
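As a rough illustration of how a GRU carries session state from one interaction to the next, the sketch below implements a single GRU update in pure Python. The parameter names (`W_z`, `U_z`, and so on) and the helper `encode_session` are hypothetical, biases are omitted, and the gate convention chosen here is one of two equivalent ones in common use.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x: Vector, h_prev: Vector, p: Dict[str, Matrix]) -> Vector:
    """One GRU update: x is the item embedding, h_prev the previous state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [
        math.tanh(a + b)
        for a, b in zip(matvec(p["W_h"], x), matvec(p["U_h"], gated))
    ]
    # Convention used here: an update gate close to 1 keeps the old state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]


def encode_session(item_embs: List[Vector], p: Dict[str, Matrix], hidden_size: int) -> Vector:
    """Run the GRU over a session; the final state summarises the session."""
    h = [0.0] * hidden_size
    for x in item_embs:
        h = gru_step(x, h, p)
    return h


zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h")}
h_next = gru_step([1.0, 2.0], [0.4, -0.8], params)
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous state.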

For further details, please refer to the paper.

models:
  GRU4Rec:
    embedding_size: 128
    hidden_size: 64
    num_layers: 2
    dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

NARM

NARM (Neural Attentive Recommendation Machine): A hybrid encoder-decoder model that improves upon standard RNNs by incorporating an attention mechanism. It uses a Global Encoder (GRU) to model the user's sequential behavior and a Local Encoder (Attention) to capture the user's main purpose in the current session. These features are combined into a unified session representation for bilinear matching with candidate items.
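The combination of the two encoders can be sketched as follows, assuming the GRU hidden states are already computed. The additive attention form follows the NARM paper; the names `A1`, `A2`, `v`, and `narm_session_repr` are illustrative, and the final bilinear matching step is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def narm_session_repr(hidden_states: List[Vector], A1: Matrix, A2: Matrix, v: Vector) -> Vector:
    """Concatenate the global feature (last state) with an attention-weighted sum."""
    h_last = hidden_states[-1]              # global encoder output
    query = matvec(A1, h_last)
    local = [0.0] * len(h_last)
    for h_j in hidden_states:
        key = matvec(A2, h_j)
        # Additive attention score between the last state and state j.
        alpha = sum(vi * sigmoid(q + k) for vi, q, k in zip(v, query, key))
        local = [acc + alpha * x for acc, x in zip(local, h_j)]
    return h_last + local                   # list concatenation: [global; local]


zeros = [[0.0, 0.0], [0.0, 0.0]]
rep = narm_session_repr([[1.0, 0.0], [0.0, 1.0]], zeros, zeros, [1.0, 1.0])
# With zero matrices every score is 0.5 + 0.5 = 1.0, so rep == [0.0, 1.0, 1.0, 1.0]
```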

For further details, please refer to the paper.

models:
  NARM:
    embedding_size: 128
    hidden_size: 64
    n_layers: 2
    hidden_dropout_prob: 0.1
    attn_dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

Transformer-Based

Transformer-inspired recommenders employ self-attention to capture dependencies across an entire sequence simultaneously. They excel at modeling both short-term and long-term user preferences without relying on recurrence or convolution.

BERT4Rec

BERT4Rec (Bidirectional Encoder Representations from Transformers for Recommendation): Applies a bidirectional Transformer architecture to sequential recommendation. Instead of predicting the next item, it is trained on a "cloze" task, where it predicts randomly masked items in a sequence, allowing it to learn context from both past and future interactions.
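The cloze objective can be illustrated with a short masking routine. This is a sketch, not WarpRec's data loader: `MASK_TOKEN` and `mask_sequence` are hypothetical names, and `mask_prob` is assumed to be the per-item masking probability from the configuration.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = -1  # hypothetical id reserved for the mask symbol


def mask_sequence(
    items: List[int], mask_prob: float, rng: random.Random
) -> Tuple[List[int], List[Optional[int]]]:
    """Replace each item with MASK_TOKEN with probability mask_prob.

    Returns the corrupted sequence and, per position, the original item
    to predict (None where the position was left untouched).
    """
    masked: List[int] = []
    labels: List[Optional[int]] = []
    for item in items:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(item)
        else:
            masked.append(item)
            labels.append(None)
    return masked, labels


rng = random.Random(0)
masked, labels = mask_sequence([5, 8, 13, 21, 34], mask_prob=0.2, rng=rng)
```

Because a masked position can sit anywhere in the sequence, the model sees items on both sides of it during training, which is where the bidirectional context comes from.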

For further details, please refer to the paper.

models:
  BERT4Rec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    mask_prob: 0.2
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

BSARec

BSARec (Beyond Self-Attention for Sequential Recommendation): Combines frequency-domain filtering with self-attention in each block. The model captures periodic behavior through an FFT-based decomposition into low- and high-frequency components while preserving Transformer-based sequential dependency modeling.
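The frequency-domain filtering can be sketched on a single embedding dimension viewed as a signal over time. A naive DFT is used instead of an FFT to keep the example dependency-free. The names are illustrative, and it is assumed here that `c` is the number of lowest-frequency components kept and that `alpha` balances the filtered branch against the attention branch; `beta` stands in for the learned rescaling of the high-frequency part.

```python
import cmath
from typing import List


def low_pass(signal: List[float], c: int) -> List[float]:
    """Keep the c lowest frequencies (and their mirrored conjugates) of a real signal."""
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    kept = [x if min(k, n - k) < c else 0j for k, x in enumerate(spectrum)]
    return [
        (sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def frequency_layer(signal: List[float], c: int, beta: float) -> List[float]:
    """Low-frequency part plus a rescaled high-frequency residual."""
    low = low_pass(signal, c)
    high = [s - l for s, l in zip(signal, low)]
    return [l + beta * h for l, h in zip(low, high)]


def blend(filtered: List[float], attended: List[float], alpha: float) -> List[float]:
    """Mix the frequency branch with the self-attention branch."""
    return [alpha * f + (1.0 - alpha) * a for f, a in zip(filtered, attended)]


smooth = frequency_layer([1.0, 2.0, 3.0, 4.0], c=1, beta=0.0)
# With c=1 only the mean survives, so smooth is approximately [2.5, 2.5, 2.5, 2.5]
```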

For further details, please refer to the paper.

models:
  BSARec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    alpha: 0.5
    c: 10
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

CL4SRec

CL4SRec (Contrastive Learning for Sequential Recommendation): Extends a SASRec-style encoder with stochastic sequence augmentations (crop/mask/reorder) and an InfoNCE objective, jointly optimized with next-item prediction.
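The three augmentations can be sketched as below. The function names are illustrative, `MASK_TOKEN` is a hypothetical id, and the mapping of the ratios to `crop_eta`, `mask_gamma`, and `reorder_beta` is assumed from the parameter names. The InfoNCE loss itself is not shown.

```python
import random
from typing import List

MASK_TOKEN = -1  # hypothetical id reserved for masked positions


def crop(seq: List[int], eta: float, rng: random.Random) -> List[int]:
    """Keep a random contiguous window covering a fraction eta of the sequence."""
    length = max(1, int(eta * len(seq)))
    start = rng.randint(0, len(seq) - length)
    return seq[start:start + length]


def mask(seq: List[int], gamma: float, rng: random.Random) -> List[int]:
    """Replace a fraction gamma of randomly chosen positions with MASK_TOKEN."""
    chosen = set(rng.sample(range(len(seq)), int(gamma * len(seq))))
    return [MASK_TOKEN if i in chosen else item for i, item in enumerate(seq)]


def reorder(seq: List[int], beta: float, rng: random.Random) -> List[int]:
    """Shuffle a random contiguous window covering a fraction beta of the sequence."""
    length = int(beta * len(seq))
    start = rng.randint(0, len(seq) - length)
    window = seq[start:start + length]
    rng.shuffle(window)
    return seq[:start] + window + seq[start + length:]


rng = random.Random(42)
sequence = list(range(100, 110))
view_a = crop(sequence, 0.6, rng)
view_b = reorder(sequence, 0.6, rng)
```

Two views of the same sequence, such as `view_a` and `view_b`, form a positive pair for the contrastive objective, while views of other sequences in the batch act as negatives.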

For further details, please refer to the paper.

models:
  CL4SRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    ssl_lambda: 0.1
    tau: 1.0
    sim_type: "dot"
    crop_eta: 0.6
    mask_gamma: 0.3
    reorder_beta: 0.6
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

CORE

CORE (Consistent Representation Encoder): A session-based recommendation framework that unifies the representation space for both encoding and decoding. Unlike standard deep encoders that project session embeddings into a different space than item embeddings, CORE encodes sessions as a weighted sum of item embeddings (using a Transformer to learn the weights). It also employs Robust Distance Measuring (RDM) based on cosine similarity to prevent overfitting.
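The idea of keeping sessions and items in one space can be sketched as follows. The attention logits are taken as given (in CORE they come from the Transformer or another weighting network), the function names are illustrative, and `temperature` is assumed to divide the cosine similarity as in the configuration parameter of the same name.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def session_embedding(item_embs: List[Vector], logits: List[float]) -> Vector:
    """Session = convex combination of its item embeddings (no extra projection)."""
    weights = softmax(logits)
    d = len(item_embs[0])
    return [sum(w * e[f] for w, e in zip(weights, item_embs)) for f in range(d)]


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def score_candidates(session: Vector, candidates: List[Vector], temperature: float) -> List[float]:
    """Temperature-scaled cosine similarity between the session and each item."""
    return [cosine(session, c) / temperature for c in candidates]


sess = session_embedding([[1.0, 0.0], [0.0, 1.0]], logits=[0.0, 0.0])
scores = score_candidates(sess, [[1.0, 0.0], [1.0, 1.0]], temperature=0.07)
```

Because the session vector is a weighted average of item embeddings, it stays in the same space as the candidates it is compared with.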

For further details, please refer to the paper.

models:
  CORE:
    embedding_size: 64
    dnn_type: "trm"
    n_layers: 2
    n_heads: 8
    inner_size: 256
    hidden_dropout_prob: 0.1
    attn_dropout_prob: 0.1
    layer_norm_eps: 1e-12
    session_dropout: 0.1
    item_dropout: 0.1
    temperature: 0.07
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 50

DuoRec

DuoRec (Dual Contrastive Sequential Recommendation): Uses a SASRec-like backbone and introduces two contrastive signals: unsupervised consistency across stochastic views of the same sequence, and supervised consistency across sequences sharing the same target item.
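The supervised signal relies on finding another sequence with the same target item. A minimal sketch of that sampling step is shown below; the function names and the fallback rule (reuse the sequence itself when no partner exists) are assumptions, and the unsupervised views, commonly obtained from two forward passes with different dropout masks, are not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[List[int], int]  # (input sequence, target item)


def build_target_index(samples: List[Sample]) -> Dict[int, List[int]]:
    """Map each target item to the indices of the samples that predict it."""
    index: Dict[int, List[int]] = defaultdict(list)
    for i, (_, target) in enumerate(samples):
        index[target].append(i)
    return index


def semantic_positive(
    i: int, samples: List[Sample], index: Dict[int, List[int]], rng: random.Random
) -> List[int]:
    """Pick a different sequence that shares sample i's target item."""
    target = samples[i][1]
    partners = [j for j in index[target] if j != i]
    if not partners:
        return samples[i][0]  # assumed fallback: no partner available
    return samples[rng.choice(partners)][0]


data = [([1, 2], 9), ([3, 4], 9), ([5], 7)]
idx = build_target_index(data)
positive = semantic_positive(0, data, idx, random.Random(0))
# positive == [3, 4], the only other sequence whose target is item 9
```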

For further details, please refer to the paper.

models:
  DuoRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    ssl_type: "us_x"
    ssl_lambda: 0.1
    ssl_lambda_sem: 0.1
    tau: 1.0
    sim_type: "dot"
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

eSASRec

eSASRec (Enhanced SASRec): A modular enhancement of SASRec that supports LiGR blocks, sampled-softmax training, and optional mixed-negative sampling. It improves training efficiency while preserving strong sequential modeling performance.

For further details, please refer to the paper.

models:
  eSASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    use_relative_pos: False
    use_sampled_softmax: True
    use_ligr: True
    mn_ratio: 0.0
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

gSASRec

gSASRec (generalised SASRec): Keeps the SASRec architecture but trains it with the generalised binary cross-entropy (gBCE) loss, which counteracts the overconfidence that negative sampling introduces into the predicted scores. The calibration strength is controlled by gbce_t.
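The `gbce_t` setting used for this model refers to the generalised binary cross-entropy (gBCE) loss from the gSASRec paper. The sketch below follows that formulation as best understood here: the positive term is raised to a power `beta` derived from the sampling rate and `t`. The function names are illustrative, and any normalisation constant is omitted.

```python
import math
from typing import List


def gbce_beta(num_items: int, num_negatives: int, t: float) -> float:
    """Exponent applied to the positive probability; t=0 gives plain BCE (beta=1)."""
    sampling_rate = num_negatives / (num_items - 1)
    return sampling_rate * (t * (1.0 - 1.0 / sampling_rate) + 1.0 / sampling_rate)


def gbce_loss(pos_score: float, neg_scores: List[float], beta: float) -> float:
    """-beta * log(sigmoid(s+)) - sum(log(1 - sigmoid(s-)))."""
    # log(sigmoid(x)) = -log(1 + exp(-x)); log(1 - sigmoid(x)) = -log(1 + exp(x))
    loss = beta * math.log1p(math.exp(-pos_score))
    for s in neg_scores:
        loss += math.log1p(math.exp(s))
    return loss


beta = gbce_beta(num_items=101, num_negatives=10, t=0.5)
loss = gbce_loss(pos_score=2.0, neg_scores=[-1.0, 0.5], beta=beta)
```

Setting `t` to 0 recovers ordinary binary cross-entropy, while `t` equal to 1 shrinks the positive term down to the sampling rate.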

For further details, please refer to the paper.

models:
  gSASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    gbce_t: 0.5
    reuse_item_embeddings: True
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

LightSANs

LightSANs (Low-Rank Decomposed Self-Attention Networks): A sequential recommender that improves upon standard self-attention (like SASRec) by introducing low-rank decomposed self-attention to reduce complexity and decoupled position encoding to better model sequential relations.

For further details, please refer to the paper.

models:
  LightSANs:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    k_interests: 5
    inner_size: 512
    dropout_prob: 0.5
    attn_dropout_prob: 0.5
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

LinRec

LinRec (Linear Attention Mechanism for Long-term Sequential Recommender Systems): Proposed to address the quadratic complexity of standard Transformers. LinRec employs a linear attention mechanism (O(N)) using L2 normalization and ELU activation. This allows for efficient modeling of long-term user sequences while maintaining the ability to capture complex dependencies, making it significantly faster than standard SASRec on long sequences.
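The saving comes from changing the order of the matrix products: instead of forming the N x N matrix from Q and K, the d x d product of K and V is computed first. The sketch below assumes row-wise L2 normalisation for Q and column-wise for K after the ELU activation, as described in the LinRec paper; all function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def l2_normalize_rows(m: Matrix, eps: float = 1e-12) -> Matrix:
    out = []
    for row in m:
        norm = max(math.sqrt(sum(x * x for x in row)), eps)
        out.append([x / norm for x in row])
    return out


def feature_maps(q: Matrix, k: Matrix) -> Tuple[Matrix, Matrix]:
    """ELU activation, then row-wise L2 norm for Q and column-wise for K."""
    q_act = [[elu(x) for x in row] for row in q]
    k_act = [[elu(x) for x in row] for row in k]
    return l2_normalize_rows(q_act), transpose(l2_normalize_rows(transpose(k_act)))


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    q_n, k_n = feature_maps(q, k)
    context = matmul(transpose(k_n), v)  # d x d, size independent of N
    return matmul(q_n, context)


def quadratic_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    q_n, k_n = feature_maps(q, k)
    return matmul(matmul(q_n, transpose(k_n)), v)  # builds an N x N matrix


Q = [[1.0, -0.5], [0.3, 2.0], [-1.0, 0.7]]
K = [[0.2, 1.0], [-0.4, 0.6], [1.5, -2.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
out_linear = linear_attention(Q, K, V)
out_quadratic = quadratic_reference(Q, K, V)
```

Both functions return the same values up to floating-point error; only the cost differs, since the first never materialises an N x N matrix.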

For further details, please refer to the paper.

models:
  LinRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200

SASRec

SASRec (Self-Attentive Sequential Recommendation): A Transformer-based model that uses stacked self-attention blocks to capture item dependencies in user sequences. SASRec effectively models dynamic user preferences in sparse datasets, learning both short- and long-term interests.
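A single-head causal self-attention step can be sketched in pure Python as below. This is an illustration, not WarpRec's code: multiple heads, learned projections, and the feed-forward sublayer are left out, and the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention where position t only sees positions <= t."""
    d = len(q[0])
    outputs = []
    for t, query in enumerate(q):
        # Scores against the current and earlier positions only (causal mask).
        scores = [
            sum(a * b for a, b in zip(query, k[j])) / math.sqrt(d)
            for j in range(t + 1)
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[j][f] for j, w in enumerate(weights)) for f in range(len(v[0]))]
        )
    return outputs


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(X, X, X)
# out[0] equals X[0]: the first position can only attend to itself
```

The causal mask is what lets the output at each position be trained to predict the next item without seeing it.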

For further details, please refer to the paper.

models:
  SASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200