Sequential Recommenders

The Sequential Recommenders module of WarpRec focuses on models that leverage the temporal order of user interactions to predict future behaviors. Unlike general recommenders, which often treat interactions as independent events, sequential models explicitly capture the dynamics of user preferences within a session or across time. These models are particularly effective for tasks such as next-item prediction in e-commerce or personalized content recommendation in streaming services.

API Reference

For class signatures, parameters, and source code, see the Sequential API Reference.

Summary of Available Sequential Models

Category	Model	Description
CNN-Based	Caser	Convolutional model capturing local and global sequential patterns.
Markov-Chains	FOSSIL	Combines Markov Chains with factored item similarity for sequential prediction.
Neighborhood-Based	STAN	Session-based model using time- and position-weighted neighborhood matching.
RNN-Based	GRU4Rec	Session-based recommender using GRUs for short-term preference modeling.
	NARM	Hybrid encoder (GRU + Attention) capturing sequential behavior and the user's main purpose.
Transformer-Based	BERT4Rec	Bidirectional Transformer model trained on a masked item prediction task.
	BSARec	Beyond Self-Attention: Fourier-based frequency filtering combined with Transformer attention.
	CL4SRec	SASRec backbone with sequence augmentations and contrastive learning.
	CORE	Unifies encoding/decoding spaces using linear combination of embeddings and RDM.
	DuoRec	Contrastive sequential model with unsupervised and supervised sequence-level CL.
	eSASRec	Enhanced SASRec with LiGR blocks and optional sampled-softmax training.
	gSASRec	SASRec variant trained with the gBCE loss to reduce overconfidence caused by negative sampling.
	LightSANs	Efficient self-attention with low-rank decomposition and decoupled positioning.
	LinRec	Linear attention mechanism (O(N)) for efficient long-term recommendation.
	SASRec	Transformer-inspired model learning short- and long-term user preferences.

CNN-Based

CNN-based sequential recommenders apply convolutional filters to user interaction histories, treating the embeddings of recent items as a structured, image-like input. Filters of different shapes let these models capture both short-term dependencies and broader sequential patterns.

Caser

Caser (Convolutional Sequence Embedding Recommendation): Treats a user's interaction history as a 2D "image" and applies horizontal and vertical convolutional filters. Caser models local patterns (short-term interests) as well as long-term user preferences, making it effective in session-based recommendation scenarios.

For further details, please refer to the paper.

```yaml
models:
  Caser:
    embedding_size: 64
    n_h: 8
    n_v: 4
    dropout_prob: 0.5
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 20
```
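The two convolution types described above can be illustrated with a minimal NumPy sketch for a single sequence. It assumes a ReLU activation for the horizontal filters and uses one filter per height (the model itself uses several per height, presumably controlled by `n_h`, plus `n_v` vertical filters); the user embedding, dropout and the prediction layer are omitted. `caser_features` is a hypothetical helper, not WarpRec's API.

```python
import numpy as np


def caser_features(E, h_filters, v_filters):
    """Caser-style convolutional features for one sequence (illustrative only).

    E:         (L, d) embedding matrix of the L most recent items (the "image").
    h_filters: list of (h, d) horizontal filters with 1 <= h <= L.
    v_filters: (n_v, L) vertical filters, each a weighted sum over the L rows.
    """
    L, d = E.shape
    h_out = []
    for F in h_filters:
        h = F.shape[0]
        # Slide the filter over every window of h consecutive items (ReLU assumed).
        acts = [max(0.0, float(np.sum(E[i:i + h] * F))) for i in range(L - h + 1)]
        h_out.append(max(acts))  # max-pool over positions
    # Vertical filters aggregate item embeddings row-wise, without pooling.
    v_out = (v_filters @ E).reshape(-1)
    return np.concatenate([np.array(h_out), v_out])


rng = np.random.default_rng(0)
L, d = 5, 4
E = rng.normal(size=(L, d))
h_filters = [rng.normal(size=(h, d)) for h in range(1, L + 1)]
v_filters = rng.normal(size=(2, L))
z = caser_features(E, h_filters, v_filters)
print(z.shape)  # (13,): 5 horizontal + 2 * 4 vertical features
```

Horizontal filters spanning h consecutive rows pick up union-level patterns of short item runs, while vertical filters produce weighted sums of the recent item embeddings.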

Markov-Chains

Some sequential recommenders combine Markovian assumptions with item similarity to balance short-term context with long-term personalization. These models are especially suited to scenarios where user behavior exhibits both immediate intent and broader preferences.

FOSSIL

FOSSIL (FactOrized Sequential Prediction with Item SImilarity ModeLs): Integrates a factorized higher-order Markov chain for short-term user behavior with a factored item similarity model (inspired by SLIM) to address data sparsity and capture long-term user preferences.

For further details, please refer to the paper.

```yaml
models:
  FOSSIL:
    embedding_size: 64
    order_len: 8
    alpha: 0.001
    reg_weight: 0.001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
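The scoring rule outlined above can be sketched as the sum of two parts. The first is a long-term, FISM-style term (the normalized sum of similarity embeddings of the user's other items). The second is a short-term term that weights the transition embeddings of the last few items. This follows our reading of the FOSSIL paper. Treating `alpha` as the normalization exponent and the length of `recent` as the Markov order (`order_len`) are assumptions, and `fossil_score` is a hypothetical helper. Biases and regularization are omitted.

```python
import numpy as np


def fossil_score(P, Q, M, eta, eta_u, history, recent, j, alpha):
    """FOSSIL-style preference score of one user for candidate item j.

    P, Q, M:    (n_items, d) embeddings for the similarity, target and
                Markov-transition roles.
    eta, eta_u: (order,) global and user-specific weights of the last steps.
    history:    items the user interacted with (long-term part).
    recent:     the last `order` items, most recent first (short-term part).
    """
    others = [i for i in history if i != j]  # exclude the candidate itself
    long_term = np.zeros(Q.shape[1])
    if others:
        long_term = len(others) ** (-alpha) * P[others].sum(axis=0)
    # Weighted transition embeddings of the most recent items.
    short_term = sum((eta[k] + eta_u[k]) * M[s] for k, s in enumerate(recent))
    return float((long_term + short_term) @ Q[j])


rng = np.random.default_rng(0)
n_items, d, order = 10, 4, 3
P, Q, M = (rng.normal(size=(n_items, d)) for _ in range(3))
eta, eta_u = rng.normal(size=order), np.zeros(order)
history = [1, 2, 5, 7]
recent = [7, 5, 2]  # most recent first
scores = [fossil_score(P, Q, M, eta, eta_u, history, recent, j, alpha=0.5)
          for j in range(n_items)]
```

Setting all `eta` weights to zero reduces the score to the pure similarity model, which is one way to see how the Markov component adds short-term context on top of long-term preferences.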

Neighborhood-Based

Neighborhood-based sequential recommenders identify similar historical sessions and use them to predict the next item. These models are non-neural and rely on weighted similarity between the current session and past sessions.

STAN

STAN (Sequence and Time Aware Neighborhood): A session-based recommendation model that scores candidate items by matching the current session against historical neighbor sessions. It incorporates three weighting factors: position decay (lambda_1) gives more importance to recently viewed items within the current session, temporal decay (lambda_2) favors neighbor sessions that occurred closer in time to the current one, and co-occurrence proximity (lambda_3) favors candidate items that appear close to the most recent shared item within a neighbor session.

For further details, please refer to the paper.

```yaml
models:
  STAN:
    k: 500
    lambda_1: 10
    lambda_2: 10
    lambda_3: 10
    max_seq_len: 20
```
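The three weighting factors can be combined in a short pure-Python sketch. It uses exponential decays as we understand them from the STAN paper: `lambda_1` for positions in the current session, `lambda_2` for the time gap to a neighbor session, and `lambda_3` for the distance of a candidate from the most recent shared item inside the neighbor. Excluding items already in the current session is a choice made here, not necessarily WarpRec's behavior, and `stan_scores` is a hypothetical helper.

```python
import math
from collections import defaultdict


def stan_scores(current, neighbors, now, lambda_1, lambda_2, lambda_3, k=500):
    """STAN-style next-item scores for one ongoing session (illustrative).

    current:   item ids of the ongoing session, oldest first.
    neighbors: list of (items, timestamp) past sessions, items oldest first.
    now:       timestamp of the current session.
    """
    n = len(current)
    # Factor 1: position decay; the most recent item gets weight exp(0) = 1.
    w1 = {item: math.exp((pos - n) / lambda_1)
          for pos, item in enumerate(current, start=1)}
    candidates = []
    for items, ts in neighbors:
        shared = set(items) & set(current)
        if not shared:
            continue
        sim = sum(w1[i] for i in shared) / math.sqrt(n * len(items))
        sim *= math.exp(-(now - ts) / lambda_2)  # Factor 2: temporal decay
        candidates.append((sim, items, shared))
    candidates.sort(key=lambda c: c[0], reverse=True)

    scores = defaultdict(float)
    for sim, items, shared in candidates[:k]:  # keep the k best neighbors
        # Anchor: the shared item that is most recent in the current session.
        anchor = items.index(max(shared, key=w1.get))
        for p, j in enumerate(items):
            if j in w1:  # skip items already in the current session
                continue
            # Factor 3: proximity of candidate j to the anchor in the neighbor.
            scores[j] += sim * math.exp(-abs(p - anchor) / lambda_3)
    return dict(scores)


current = [1, 2, 3]
neighbors = [([2, 3, 4, 5], 90.0), ([9, 8], 95.0), ([1, 6], 50.0)]
scores = stan_scores(current, neighbors, now=100.0,
                     lambda_1=10, lambda_2=10, lambda_3=10)
print(sorted(scores, key=scores.get, reverse=True))  # [4, 5, 6]
```

Item 4 outranks item 5 because it sits right after the anchor (item 3) in the recent neighbor, and item 6 trails far behind because its neighbor session is old and shares only the oldest item of the current session.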

RNN-Based

Recurrent Neural Networks (RNNs) process sequences step by step, maintaining a hidden state to capture temporal dependencies. They are effective for modeling evolving user interests within sessions.

GRU4Rec

GRU4Rec (Gated Recurrent Unit for Recommendation): One of the earliest deep learning approaches for session-based recommendation. It leverages GRUs to model user interaction sequences, focusing on short-term behavior and next-item prediction.

For further details, please refer to the paper.

```yaml
models:
  GRU4Rec:
    embedding_size: 128
    hidden_size: 64
    num_layers: 2
    dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
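The recurrence GRU4Rec builds on can be sketched as a single GRU layer in NumPy. The update gate `z` decides how much of the hidden state to overwrite, and the reset gate `r` decides how much of it to use when forming the candidate state. The final hidden state is then scored against every item through an output matrix. The weight shapes, the single layer (the configuration above uses `num_layers: 2`) and the helper names are assumptions for illustration; training losses and session-parallel mini-batches are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_encode(X, W, U, b):
    """Run one GRU layer over X (T, d_in) and return the final hidden state.

    W: dict of (d_in, d_h) input weights for keys 'z', 'r', 'h'.
    U: dict of (d_h, d_h) recurrent weights; b: dict of (d_h,) biases.
    """
    h = np.zeros(U["z"].shape[0])
    for x in X:
        z = sigmoid(x @ W["z"] + h @ U["z"] + b["z"])  # update gate
        r = sigmoid(x @ W["r"] + h @ U["r"] + b["r"])  # reset gate
        h_cand = np.tanh(x @ W["h"] + (r * h) @ U["h"] + b["h"])
        h = (1.0 - z) * h + z * h_cand  # interpolate old and candidate state
    return h


rng = np.random.default_rng(0)
n_items, d_in, d_h = 20, 8, 6
item_emb = rng.normal(size=(n_items, d_in))
W = {g: rng.normal(scale=0.3, size=(d_in, d_h)) for g in "zrh"}
U = {g: rng.normal(scale=0.3, size=(d_h, d_h)) for g in "zrh"}
b = {g: np.zeros(d_h) for g in "zrh"}
session = [3, 11, 7, 2]
h = gru_encode(item_emb[session], W, U, b)
out = rng.normal(size=(n_items, d_h))  # output layer mapping h to item scores
scores = out @ h
next_item = int(np.argmax(scores))
```

Stacking layers would feed each step's hidden state of one layer as the input of the next.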

NARM

NARM (Neural Attentive Session-based Recommendation): A hybrid encoder-decoder model that improves upon standard RNNs by incorporating an attention mechanism. It uses a Global Encoder (GRU) to model the user's sequential behavior and a Local Encoder (Attention) to capture the user's main purpose in the current session. These features are combined into a unified session representation for bilinear matching with candidate items.

For further details, please refer to the paper.

```yaml
models:
  NARM:
    embedding_size: 128
    hidden_size: 64
    n_layers: 2
    hidden_dropout_prob: 0.1
    attn_dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
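The way NARM combines its two encoders can be sketched in NumPy, given the GRU hidden states of a session. The attention weight of each state follows the form alpha_j = v^T sigmoid(A1 h_t + A2 h_j) from the NARM paper, and items are scored bilinearly against the concatenated representation. For simplicity, both encoders read the same hidden states here; all names and shapes are assumptions of this sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def narm_scores(H, A1, A2, v, B, item_emb):
    """NARM-style session representation and bilinear item scores.

    H:        (T, d_h) GRU hidden states of the session, oldest first.
    A1, A2:   (d_h, d_h) attention projections; v: (d_h,) attention vector.
    B:        (d_e, 2 * d_h) bilinear matrix; item_emb: (n_items, d_e).
    """
    h_t = H[-1]  # global encoder: final hidden state (sequential behavior)
    # Local encoder: weight every state h_j by its relevance to h_t.
    alpha = sigmoid(h_t @ A1.T + H @ A2.T) @ v  # (T,)
    c_local = alpha @ H  # summarizes the session's main purpose
    c = np.concatenate([h_t, c_local])  # unified session representation
    return item_emb @ (B @ c)  # bilinear matching, one score per item


rng = np.random.default_rng(0)
T, d_h, d_e, n_items = 5, 4, 6, 12
H = np.tanh(rng.normal(size=(T, d_h)))
A1, A2 = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_h))
v = rng.normal(size=d_h)
B = rng.normal(size=(d_e, 2 * d_h))
item_emb = rng.normal(size=(n_items, d_e))
scores = narm_scores(H, A1, A2, v, B, item_emb)
```

With a zero attention vector the local part vanishes and the model falls back to scoring from the last hidden state alone, which shows what the attention branch contributes.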

Transformer-Based

Transformer-inspired recommenders employ self-attention to capture dependencies across an entire sequence simultaneously. They excel at modeling both short-term and long-term user preferences without relying on recurrence or convolution.

BERT4Rec

BERT4Rec (Bidirectional Encoder Representations from Transformers for Recommendation): Applies a bidirectional Transformer architecture to sequential recommendation. Instead of predicting the next item, it is trained on a "cloze" task, where it predicts randomly masked items in a sequence, allowing it to learn context from both past and future interactions.

For further details, please refer to the paper.

```yaml
models:
  BERT4Rec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    mask_prob: 0.2
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
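The cloze objective starts from a masking step like the one sketched below in plain Python. Each position is replaced by a reserved mask id with probability `mask_prob`, and the model is trained to recover the original items at those positions. At inference time, a mask token is appended to the history so that the model predicts the next item. The reserved id `0`, the rule that at least one position is always masked, and the helper names are assumptions, not WarpRec's implementation.

```python
import random

MASK = 0  # hypothetical id reserved for the [mask] token (real items assumed >= 1)


def cloze_mask(seq, mask_prob, rng):
    """Return (masked_seq, labels) for one non-empty training sequence.

    labels[i] is the original item where a mask was placed, otherwise None.
    """
    masked, labels = list(seq), [None] * len(seq)
    for i, item in enumerate(seq):
        if rng.random() < mask_prob:
            masked[i], labels[i] = MASK, item
    if all(label is None for label in labels):
        # Guarantee at least one prediction target per sequence.
        i = rng.randrange(len(seq))
        masked[i], labels[i] = MASK, seq[i]
    return masked, labels


def inference_input(seq):
    # Append [mask] so the model's prediction at the last slot is the next item.
    return list(seq) + [MASK]


rng = random.Random(42)
seq = [5, 12, 7, 3, 9]
masked, labels = cloze_mask(seq, mask_prob=0.2, rng=rng)
```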

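BSARec

BSARec (Beyond Self-Attention for Sequential Recommendation): Combines Fourier-based frequency filtering with self-attention in each block. The frequency branch splits the sequence representation into low- and high-frequency components via the FFT, injecting an inductive bias toward fine-grained sequential patterns and counteracting the oversmoothing (low-pass) tendency of self-attention, while the attention branch preserves Transformer-based modeling of sequential dependencies.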

For further details, please refer to the paper.

```yaml
models:
  BSARec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    alpha: 0.5
    c: 10
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
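The frequency branch can be sketched in NumPy. The sequence representation is transformed with a real FFT along the time axis, and the lowest frequency bins are kept as the low-pass part. The residual forms the high-pass part and is rescaled by a per-channel `beta`, and the result is mixed with self-attention through `alpha`. This follows our reading of the BSARec paper. Treating `c` as the number of retained low-frequency bins is an assumption (WarpRec's exact mapping of `c` may differ). The attention here also omits projections, multiple heads and causal masking.

```python
import numpy as np


def frequency_branch(X, c, beta):
    """Low-pass X (T, d) along time, then add back the high band scaled by beta.

    c:    number of lowest rFFT bins kept in the low-pass band (assumption).
    beta: (d,) per-channel scale of the high-frequency residual.
    """
    F = np.fft.rfft(X, axis=0)
    F[c:] = 0.0  # drop every bin above the cutoff
    low = np.fft.irfft(F, n=X.shape[0], axis=0)
    high = X - low
    return low + beta * high


def self_attention(X):
    # Plain single-head dot-product attention, no projections (illustration only).
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X


def bsarec_mix(X, alpha, c, beta):
    # alpha balances the frequency-based inductive bias against self-attention.
    return alpha * frequency_branch(X, c, beta) + (1.0 - alpha) * self_attention(X)


rng = np.random.default_rng(0)
T, d = 16, 4
X = rng.normal(size=(T, d))
Y = bsarec_mix(X, alpha=0.5, c=3, beta=rng.normal(size=d))
```

With `beta` set to ones the frequency branch returns its input unchanged, so `beta` is what lets the model amplify or damp high-frequency detail relative to the smooth trend.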

CL4SRec

CL4SRec (Contrastive Learning for Sequential Recommendation): Extends a SASRec-style encoder with stochastic sequence augmentations (crop/mask/reorder) and an InfoNCE objective, jointly optimized with next-item prediction.

For further details, please refer to the paper.

```yaml
models:
  CL4SRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    ssl_lambda: 0.1
    tau: 1.0
    sim_type: "dot"
    crop_eta: 0.6
    mask_gamma: 0.3
    reorder_beta: 0.6
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
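The three augmentations can be sketched in plain Python. The proportions mirror `crop_eta`, `mask_gamma` and `reorder_beta` from the configuration above, but the exact rounding, the reserved mask id `0` and the helper names are assumptions. Two augmented views of the same sequence would then be encoded and pulled together with the InfoNCE loss, which is not shown here.

```python
import random

MASK = 0  # hypothetical id reserved for masked positions


def crop(seq, eta, rng):
    # Keep a random contiguous sub-sequence of length max(1, floor(eta * n)).
    n = max(1, int(eta * len(seq)))
    start = rng.randrange(len(seq) - n + 1)
    return seq[start:start + n]


def mask(seq, gamma, rng):
    # Replace floor(gamma * n) randomly chosen positions with MASK.
    idx = set(rng.sample(range(len(seq)), int(gamma * len(seq))))
    return [MASK if i in idx else item for i, item in enumerate(seq)]


def reorder(seq, beta, rng):
    # Shuffle one random contiguous window of length floor(beta * n).
    n = int(beta * len(seq))
    start = rng.randrange(len(seq) - n + 1)
    window = seq[start:start + n]  # slicing copies, so seq is not modified
    rng.shuffle(window)
    return seq[:start] + window + seq[start + n:]


def two_views(seq, eta, gamma, beta, rng):
    # Each view applies one augmentation drawn uniformly at random.
    ops = [lambda s: crop(s, eta, rng),
           lambda s: mask(s, gamma, rng),
           lambda s: reorder(s, beta, rng)]
    return rng.choice(ops)(seq), rng.choice(ops)(seq)


rng = random.Random(0)
seq = list(range(1, 11))  # item ids 1..10
view_a, view_b = two_views(seq, eta=0.6, gamma=0.3, beta=0.6, rng=rng)
```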

CORE

CORE (Consistent Representation Encoder): A session-based recommendation framework that unifies the representation space for both encoding and decoding. Unlike standard deep encoders that project session embeddings into a different space than item embeddings, CORE encodes sessions as a weighted sum of item embeddings (using a Transformer to learn the weights). It also employs Robust Distance Measuring (RDM) based on cosine similarity to prevent overfitting.

For further details, please refer to the paper.

```yaml
models:
  CORE:
    embedding_size: 64
    dnn_type: "trm"
    n_layers: 2
    n_heads: 8
    inner_size: 256
    hidden_dropout_prob: 0.1
    attn_dropout_prob: 0.1
    layer_norm_eps: 1.0e-12
    session_dropout: 0.1
    item_dropout: 0.1
    temperature: 0.07
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 50
```
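The central idea can be sketched in NumPy: the session representation stays inside the item embedding space, and items are scored with temperature-scaled cosine similarity. With `dnn_type: "trm"` a Transformer produces the per-position weights; here they are simply passed in as logits. The helper name is hypothetical, and the embedding dropout that RDM applies during training (`item_dropout`) is omitted.

```python
import numpy as np


def core_scores(item_emb, seq, weight_logits, temperature):
    """Score all items for one session, CORE-style (illustrative).

    item_emb:      (n_items, d) item embeddings shared by encoder and decoder.
    seq:           item ids in the session.
    weight_logits: (len(seq),) per-position logits, given directly here.
    """
    w = np.exp(weight_logits - weight_logits.max())
    w /= w.sum()  # softmax -> convex combination weights
    session = w @ item_emb[seq]  # stays in the item embedding space
    # Robust distance: cosine similarity divided by a temperature.
    s = session / np.linalg.norm(session)
    V = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    return (V @ s) / temperature


rng = np.random.default_rng(0)
item_emb = rng.normal(size=(20, 8))
seq = [4, 9, 13]
scores = core_scores(item_emb, seq, np.array([0.2, 1.5, -0.3]), temperature=0.07)
```

Because the session vector is a convex combination of item embeddings, putting all the weight on one position makes the session coincide with that item, which then receives the maximal score of 1 / temperature.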

DuoRec

DuoRec (Dual Contrastive Sequential Recommendation): Uses a SASRec-like backbone and introduces two contrastive signals: unsupervised consistency across stochastic views of the same sequence, and supervised consistency across sequences sharing the same target item.

For further details, please refer to the paper.

```yaml
models:
  DuoRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    ssl_type: "us_x"
    ssl_lambda: 0.1
    ssl_lambda_sem: 0.1
    tau: 1.0
    sim_type: "dot"
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
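Two ingredients of the supervised contrastive signal can be sketched in NumPy. The first picks, for every sequence in a batch, another sequence with the same target item as its positive. The second is an InfoNCE loss with in-batch negatives and dot-product similarity (matching `sim_type: "dot"` and `tau`). Pairing a sequence with itself when no partner exists, and using only the second view's rows as negatives, are simplifications made here; the helper names are hypothetical.

```python
import numpy as np


def supervised_positives(targets, rng):
    """For each sequence, pick another batch sequence with the same next item.

    targets: (B,) int array of next items. A sequence without a partner is
    paired with itself (a simplification of this sketch).
    """
    idx = np.arange(len(targets))
    pos = idx.copy()
    for i, t in enumerate(targets):
        same = idx[(targets == t) & (idx != i)]
        if len(same):
            pos[i] = rng.choice(same)
    return pos


def info_nce(z1, z2, tau):
    """InfoNCE with in-batch negatives: row i of z1 should match row i of z2."""
    logits = z1 @ z2.T / tau  # dot-product similarity
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
targets = np.array([3, 7, 3, 9, 7])  # next item of each sequence in the batch
pos = supervised_positives(targets, rng)
z = rng.normal(size=(5, 8))  # stand-in sequence representations
loss = info_nce(z, z[pos], tau=1.0)
```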

eSASRec

eSASRec (Enhanced SASRec): A modular enhancement of SASRec that supports LiGR blocks, sampled-softmax training, and optional mixed-negative sampling. It improves training efficiency while preserving strong sequential modeling performance.

For further details, please refer to the paper.

```yaml
models:
  eSASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    use_relative_pos: False
    use_sampled_softmax: True
    use_ligr: True
    mn_ratio: 0.0
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
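Sampled-softmax training, one of the options enabled by `use_sampled_softmax`, can be sketched in NumPy. Instead of normalizing over the whole catalogue, the softmax is computed over the true next item plus a handful of sampled negatives. This sketch draws negatives uniformly and applies no logQ correction. It also leaves out the mixed negatives that `mn_ratio` presumably controls, and the helper name is hypothetical.

```python
import numpy as np


def sampled_softmax_loss(h, item_emb, pos, neg):
    """-log softmax of the positive among [positive] + sampled negatives.

    h:        (d,) sequence representation at one position.
    item_emb: (n_items, d) item embeddings.
    pos:      id of the true next item; neg: int array of negative ids.
    """
    cand = np.concatenate([[pos], neg]).astype(int)
    logits = item_emb[cand] @ h
    m = logits.max()
    log_z = m + np.log(np.exp(logits - m).sum())  # stable log-sum-exp
    return float(log_z - logits[0])


rng = np.random.default_rng(0)
n_items, d = 50, 8
item_emb = rng.normal(size=(n_items, d))
h = rng.normal(size=d)
pos = 7
neg = rng.choice(np.setdiff1d(np.arange(n_items), [pos]), size=16, replace=False)
loss = sampled_softmax_loss(h, item_emb, pos, neg)
```

Each extra negative adds a term to the normalizer, so the loss can only grow as more negatives are sampled; the full softmax is the limit where every other item is a negative.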

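gSASRec

gSASRec: A variant of SASRec trained with more negative samples per positive and the generalized Binary Cross-Entropy (gBCE) loss. gBCE counteracts the overconfidence that negative sampling induces in BCE-trained sequential models, improving the quality of the predicted item ranking.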

For further details, please refer to the paper.

```yaml
models:
  gSASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    gbce_t: 0.5
    reuse_item_embeddings: True
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
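The gBCE loss can be sketched in NumPy. With k sampled negatives out of |I| items, the sampling rate is alpha = k / (|I| - 1), and the positive term is scaled by beta = alpha * (t * (1 - 1/alpha) + 1/alpha). So t = 0 recovers plain BCE and t = 1 gives beta = alpha. Here `t` presumably corresponds to `gbce_t` above. The formula follows the gSASRec paper as we understand it, and the helper name is hypothetical.

```python
import numpy as np


def gbce_loss(pos_logit, neg_logits, n_items, t):
    """Generalized BCE for one positive and k sampled negatives (illustrative)."""
    k = len(neg_logits)
    alpha = k / (n_items - 1)  # negative sampling rate
    beta = alpha * (t * (1.0 - 1.0 / alpha) + 1.0 / alpha)
    # log(sigmoid(x)) = -log(1 + exp(-x)); log(1 - sigmoid(x)) = -log(1 + exp(x)).
    log_sig_pos = -np.logaddexp(0.0, -pos_logit)
    log_one_minus_neg = -np.logaddexp(0.0, neg_logits)
    return float(-(beta * log_sig_pos + log_one_minus_neg.sum()))


rng = np.random.default_rng(0)
neg_logits = rng.normal(size=256)
loss = gbce_loss(1.2, neg_logits, n_items=10_000, t=0.5)
```

Since beta shrinks toward alpha as t grows, the positive term is down-weighted, which offsets the too-high positive probabilities a model learns when it only sees a small sample of negatives.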

LightSANs

LightSANs (Low-Rank Decomposed Self-Attention Networks): A sequential recommender that improves upon standard self-attention (like SASRec) with two changes. Low-rank decomposed self-attention projects the items of a sequence onto a small number of latent interests to reduce complexity, and decoupled position encoding better models sequential relations.

For further details, please refer to the paper.

```yaml
models:
  LightSANs:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    k_interests: 5
    inner_size: 512
    dropout_prob: 0.5
    attn_dropout_prob: 0.5
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
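The low-rank idea can be sketched in NumPy. The L items in the sequence are softly aggregated into `k` latent interests (compare `k_interests` above), and each query attends to those k interests instead of to all L items, so the cost grows linearly in L. This is a simplified single-head reading of the paper. The projection `theta`, the helper names and the omission of the decoupled position encoding are assumptions of the sketch.

```python
import numpy as np


def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def low_rank_attention(Q, K, V, theta):
    """Attend from L queries to k latent interests instead of L keys.

    Q, K, V: (L, d); theta: (d, k) item-to-interest projection (k << L).
    Cost is O(L * k * d) rather than O(L^2 * d).
    """
    A = softmax(K @ theta, axis=0)  # (L, k): how each item feeds each interest
    K_int, V_int = A.T @ K, A.T @ V  # (k, d) interest keys and values
    weights = softmax(Q @ K_int.T / np.sqrt(Q.shape[1]), axis=1)  # (L, k)
    return weights @ V_int  # (L, d)


rng = np.random.default_rng(0)
L, d, k = 50, 8, 5
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
out = low_rank_attention(Q, K, V, rng.normal(size=(d, k)))
```

With a single interest (k = 1), every position receives the same pooled value, which shows how k bounds the expressiveness of the attention map.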

LinRec

LinRec (Linear Attention Mechanism for Long-term Sequential Recommender Systems): Proposed to address the quadratic complexity of standard Transformer attention. LinRec employs a linear attention mechanism (O(N) in the sequence length N) using L2 normalization and ELU activation. This allows efficient modeling of long user sequences while maintaining the ability to capture complex dependencies, which makes it considerably faster than softmax-attention models such as SASRec as sequences grow long.

For further details, please refer to the paper.

```yaml
models:
  LinRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
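The linear-attention trick can be sketched in NumPy. After L2 normalization and an ELU map, the product is evaluated as Q' (K'^T V) rather than (Q' K'^T) V, so the N x N attention matrix is never formed and the cost is linear in the sequence length N. Normalizing Q per row and K per column is our reading of the LinRec paper and should be treated as an assumption, as should the helper names. Multiple heads and the feed-forward layers are omitted.

```python
import numpy as np


def elu(x):
    # ELU with alpha = 1; np.minimum avoids overflow in the unused branch.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def linrec_attention(Q, K, V):
    """Linear attention in the spirit of LinRec, for Q, K, V of shape (N, d)."""
    Qn = elu(Q / np.linalg.norm(Q, axis=1, keepdims=True))  # row-wise L2
    Kn = elu(K / np.linalg.norm(K, axis=0, keepdims=True))  # column-wise L2
    return Qn @ (Kn.T @ V)  # (d, d) product first: O(N d^2), no N x N matrix


rng = np.random.default_rng(0)
N, d = 200, 16
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
out = linrec_attention(Q, K, V)
```

Because matrix multiplication is associative, both evaluation orders give the same result; only the cost differs, O(N d^2) versus O(N^2 d), which matters once N exceeds d.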

SASRec

SASRec (Self-Attentive Sequential Recommendation): A Transformer-based model that uses stacked self-attention blocks to capture item dependencies in user sequences. SASRec effectively models dynamic user preferences in sparse datasets, learning both short- and long-term interests.

For further details, please refer to the paper.

```yaml
models:
  SASRec:
    embedding_size: 128
    n_layers: 2
    n_heads: 8
    inner_size: 512
    dropout_prob: 0.1
    attn_dropout_prob: 0.1
    reg_weight: 0.001
    weight_decay: 0.0001
    batch_size: 2048
    epochs: 200
    learning_rate: 0.001
    neg_samples: 1
    max_seq_len: 200
```
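The causal masking that lets SASRec use each position's output to predict the following item can be sketched with a single attention head in NumPy. The projection matrices and the helper name are illustrative assumptions. Positional embeddings, multiple heads (`n_heads`), layer normalization, dropout and the feed-forward sublayer are omitted.

```python
import numpy as np


def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head causal self-attention over X of shape (T, d).

    Position t may only attend to positions <= t, so its output never sees
    later items and can be trained to predict item t + 1.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)  # j > t
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=1, keepdims=True)  # diagonal keeps rows finite
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V


rng = np.random.default_rng(0)
T, d = 6, 8
X = rng.normal(size=(T, d))  # stand-in for item + position embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(X, Wq, Wk, Wv)
X_changed = X.copy()
X_changed[-1] += 5.0  # perturb only the last item
out_changed = causal_self_attention(X_changed, Wq, Wk, Wv)
print(np.allclose(out[:-1], out_changed[:-1]))  # True: earlier outputs unaffected
```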