
LLM-enhanced multimodal fusion for cross-domain sequential recommendation

  • Wangyu Wu
  • Zhenhong Chen
  • Wenqiao Zhang
  • Siqi Song
  • Xianglin Qiu
  • Xiaowei Huang
  • Fei Ma*
  • Jimin Xiao*
  • *Corresponding author for this work
  • Xi'an Jiaotong-Liverpool University
  • University of Liverpool
  • Microsoft USA
  • Zhejiang University

Research output: Contribution to journal › Article › peer-review

Abstract

Cross-Domain Sequential Recommendation (CDSR) predicts user behavior by leveraging historical interactions across multiple domains, capturing both intra- and inter-sequence item relationships. To further enhance the value of visual and textual data, we propose LLM-EMF, an approach that uses Large Language Models (LLMs) to enrich textual data and boosts recommendation performance by merging visual and textual information. Additionally, a multi-attention mechanism is designed to jointly learn single-domain and cross-domain preferences, capturing complex user interests. Evaluations on four e-commerce datasets demonstrate that LLM-EMF outperforms existing methods in modeling cross-domain user preferences, highlighting the advantages of multimodal integration in sequential recommendation systems.

Original language: English
Article number: 132228
Journal: Expert Systems with Applications
Volume: 321
Publication status: Published - 25 Jul 2026

Keywords

  • CDSR
  • CLIP-based embeddings
  • Large language models
