Multi-Modal Contrastive Pre-training for Recommendation

Cited by: 0
Authors
Liu, Zhuang [1 ]
Ma, Yunpu [2 ]
Schubert, Matthias [2 ]
Ouyang, Yuanxin [1 ]
Xiong, Zhang [3 ]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Ludwig Maximilians Univ Munchen, Lehrstuhl Datenbanksyst & Data Min, Munich, Germany
[3] Beihang Univ, Minist Educ, Engn Res Ctr Adv Comp Applicat Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Recommender system; Multi-modal side information; Contrastive learning; Pre-training model;
DOI
10.1145/3512527.3531378
CLC Number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Personalized recommendation plays a central role in many online applications. To provide a high-quality recommendation service, it is crucial to consider the multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse these modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on co-interaction relationships. For users, we propose intra-modal and inter-modal aggregation to fuse review texts with the structural information of the user graph. For items, we consider three modalities: description text, images, and the item graph. Moreover, an item's description text and image complement each other, so each can serve as a promising supervision signal for the other. To capture this signal and better exploit the potential correlation between modalities, we propose a self-supervised contrastive inter-modal alignment task that pulls the textual and visual representations of the same item as close together as possible. We then apply inter-modal aggregation to obtain the multi-modal representation of each item, and employ a binary cross-entropy loss to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations with an existing recommendation model. Extensive experiments on three real-world datasets verify the soundness and effectiveness of the proposed method.
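Illustrative Code Sketch
The abstract describes two trainable objectives: a self-supervised contrastive inter-modal alignment task between an item's textual and visual embeddings, and a binary cross-entropy loss over user-item interaction scores. The following minimal PyTorch sketch illustrates how such objectives are commonly written; it is not the authors' implementation, and every name, shape, and the symmetric InfoNCE formulation with its temperature value are assumptions for illustration only.

import torch
import torch.nn.functional as F

def inter_modal_alignment_loss(text_emb, image_emb, temperature=0.1):
    # Contrastive alignment (assumed InfoNCE form): for each item, its own
    # image embedding is the positive; the other items in the batch serve
    # as negatives.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Symmetric loss: align text-to-image and image-to-text.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def user_item_bce_loss(user_emb, item_emb, labels):
    # Binary cross-entropy on inner-product scores; labels are 1 for observed
    # user-item interactions and 0 for sampled negatives.
    scores = (user_emb * item_emb).sum(dim=-1)
    return F.binary_cross_entropy_with_logits(scores, labels)

# Hypothetical usage with random embeddings (batch size 8, dimension 64).
if __name__ == "__main__":
    B, d = 8, 64
    text, image = torch.randn(B, d), torch.randn(B, d)
    users, items = torch.randn(B, d), torch.randn(B, d)
    labels = torch.randint(0, 2, (B,)).float()
    total = inter_modal_alignment_loss(text, image) + user_item_bce_loss(users, items, labels)
    print(f"combined pre-training loss: {total.item():.4f}")

In a pipeline like the one described in the abstract, these two losses would typically be combined during pre-training, with the resulting multi-modal representations then fine-tuned by an existing recommendation model.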
Pages: 99-108
Number of pages: 10
Related Papers
50 records in total
  • [41] Enhancing Biomedical Multi-modal Representation Learning with Multi-scale Pre-training and Perturbed Report Discrimination
    Zhong, Xinliu
    Batmanghelich, Kayhan
    Sun, Li
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 480 - 485
  • [42] WUKONG-READER: Multi-modal Pre-training for Fine-grained Visual Document Understanding
    Bai, Haoli
    Liu, Zhiguang
    Meng, Xiaojun
    Li, Wentao
    Liu, Shuang
    Luo, Yifeng
    Xie, Nian
    Zheng, Rongfu
    Wang, Liangwei
    Hou, Lu
    Wei, Jiansheng
    Jiang, Xin
    Liu, Qun
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 13386 - 13401
  • [43] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
    Moon, Jong Hak
    Lee, Hyungyung
    Shin, Woncheol
    Kim, Young-Hak
    Choi, Edward
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (12) : 6070 - 6080
  • [44] Multi-modal cross-domain self-supervised pre-training for fMRI and EEG fusion
    Wei, Xinxu
    Zhao, Kanhao
    Jiao, Yong
    Carlisle, Nancy B.
    Xie, Hua
    Fonzo, Gregory A.
    Zhang, Yu
    NEURAL NETWORKS, 2025, 184
  • [45] Multi-Modal API Recommendation
    Irsan, Ivana Clairine
    Zhang, Ting
    Thung, Ferdian
    Kim, Kisub
    Lo, David
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 272 - 283
  • [46] A Multi-view Molecular Pre-training with Generative Contrastive Learning
    Liu, Yunwu
    Zhang, Ruisheng
    Yuan, Yongna
    Ma, Jun
    Li, Tongfeng
    Yu, Zhixuan
    INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES, 2024, 16 (03) : 741 - 754
  • [47] Connecting Multi-modal Contrastive Representations
    Wang, Zehan
    Zhao, Yang
    Cheng, Xize
    Huang, Haifeng
    Liu, Jiageng
    Tang, Li
    Li, Linjun
    Wang, Yongqi
    Yin, Aoxiong
    Zhang, Ziang
    Zhao, Zhou
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [48] Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation
    Jiang, Chaoya
    Ye, Wei
    Xu, Haiyang
    Huang, Songfang
    Huang, Fei
    Zhang, Shikun
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14660 - 14679
  • [49] Research on image caption generation method based on multi-modal pre-training model and text mixup optimization
    Sun, Jing-Tao
    Min, Xuan
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 5743 - 5761
  • [50] Adversarial momentum-contrastive pre-training
    Xu, Cong
    Li, Dan
    Yang, Min
    PATTERN RECOGNITION LETTERS, 2022, 160 : 172 - 179