Multi-Modal Contrastive Pre-training for Recommendation

Cited by: 0
Authors
Liu, Zhuang [1 ]
Ma, Yunpu [2 ]
Schubert, Matthias [2 ]
Ouyang, Yuanxin [1 ]
Xiong, Zhang [3 ]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Ludwig Maximilians Univ Munchen, Lehrstuhl Datenbanksyst & Data Min, Munich, Germany
[3] Beihang Univ, Minist Educ, Engn Res Ctr Adv Comp Applicat Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Recommender system; Multi-modal side information; Contrastive learning; Pre-training model;
DOI
10.1145/3512527.3531378
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Personalized recommendation plays a central role in many online applications. To provide a high-quality recommendation service, it is crucial to consider the multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse multiple modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on co-interaction relationships. For users, we propose intra-modal and inter-modal aggregation to fuse review texts with the structural information of the user graph. For items, we consider three modalities: description text, images, and the item graph. Because the description text and the image of the same item complement each other, each can serve as supervision for the other. To capture this signal and better exploit the potential correlations across modalities, we propose a self-supervised contrastive inter-modal alignment task that pulls an item's textual and visual representations as close together as possible. We then apply inter-modal aggregation to obtain multi-modal item representations. Next, we employ a binary cross-entropy loss to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations with an existing recommendation model. Extensive experiments on three real-world datasets verify the rationality and effectiveness of the proposed method.
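The pre-training recipe sketched in the abstract combines two objectives that are standard enough to illustrate concretely: a contrastive inter-modal alignment loss that pulls the text and image embeddings of the same item together while pushing apart mismatched in-batch pairs, and a binary cross-entropy loss over user-item pairs. The following is a minimal PyTorch sketch under assumptions, not the authors' released implementation: the InfoNCE formulation with in-batch negatives, the temperature of 0.1, the dot-product pair scorer, and all function names and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def inter_modal_alignment_loss(text_emb: torch.Tensor,
                               image_emb: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style alignment: for each item, its own image embedding is the
    positive; every other image in the batch is a negative (and symmetrically
    for the image-to-text direction). Shapes and temperature are assumptions."""
    text_emb = F.normalize(text_emb, dim=-1)    # (B, d)
    image_emb = F.normalize(image_emb, dim=-1)  # (B, d)
    logits = text_emb @ image_emb.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy over the text->image and image->text directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def user_item_bce_loss(user_emb: torch.Tensor,
                       item_emb: torch.Tensor,
                       labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over user-item pairs: observed interactions are
    labeled 1, sampled negatives 0; each pair is scored by a dot product
    (the scorer is an illustrative assumption)."""
    scores = (user_emb * item_emb).sum(dim=-1)  # (B,)
    return F.binary_cross_entropy_with_logits(scores, labels.float())

# Toy usage: random tensors stand in for the encoder outputs.
B, d = 8, 64
text = torch.randn(B, d, requires_grad=True)
image = torch.randn(B, d, requires_grad=True)
users = torch.randn(B, d, requires_grad=True)
items = torch.randn(B, d, requires_grad=True)
labels = torch.randint(0, 2, (B,))

loss = inter_modal_alignment_loss(text, image) + user_item_bce_loss(users, items, labels)
loss.backward()  # gradients flow back to all four embedding tensors
print(float(loss))
```

A symmetric InfoNCE loss of this kind is a common way to make two modalities "as similar as possible" without collapsing them, since the in-batch negatives keep the embedding space spread out.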
Pages: 99-108
Page count: 10
Related Papers
50 records in total
  • [1] Unified route representation learning for multi-modal transportation recommendation with spatiotemporal pre-training
    Liu, Hao
    Han, Jindong
    Fu, Yanjie
    Li, Yanyan
    Chen, Kai
    Xiong, Hui
    VLDB JOURNAL, 2023, 32 (02): 325-342
  • [2] CLAP: Contrastive Language-Audio Pre-training Model for Multi-modal Sentiment Analysis
    Zhao, Tianqi
    Kong, Ming
    Liang, Tian
    Zhu, Qiang
    Kuang, Kun
    Wu, Fei
    PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023: 622-626
  • [3] MULTI-MODAL PRE-TRAINING FOR AUTOMATED SPEECH RECOGNITION
    Chan, David M.
    Ghosh, Shalini
    Chakrabarty, Debmalya
    Hoffmeister, Bjorn
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 246-250
  • [4] Real-time Emotion Pre-Recognition in Conversations with Contrastive Multi-modal Dialogue Pre-training
    Ju, Xincheng
    Zhang, Dong
    Zhu, Suyang
    Li, Junhui
    Li, Shoushan
    Zhou, Guodong
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023: 1045-1055
  • [5] MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation
    Wang, Jinpeng
    Zeng, Ziyun
    Wang, Yunxiao
    Wang, Yuting
    Lu, Xingyu
    Li, Tianxiang
    Yuan, Jun
    Zhang, Rui
    Zheng, Hai-Tao
    Xia, Shu-Tao
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 6548-6557
  • [6] Temporal Contrastive Pre-Training for Sequential Recommendation
    Tian, Changxin
    Lin, Zihan
    Bian, Shuqing
    Wang, Jinpeng
    Zhao, Wayne Xin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 1925-1934
  • [7] TableVLM: Multi-modal Pre-training for Table Structure Recognition
    Chen, Leiyuan
    Huang, Chengsong
    Zheng, Xiaoqing
    Lin, Jinshu
    Huang, Xuanjing
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 2437-2449
  • [8] MGeo: Multi-Modal Geographic Language Model Pre-Training
    Ding, Ruixue
    Chen, Boli
    Xie, Pengjun
    Huang, Fei
    Li, Xin
    Zhang, Qiang
    Xu, Yao
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023: 185-194
  • [9] Enriching AI-based Predictive Models from Retinal Imaging by Multi-Modal Contrastive Pre-training
    Sukei, Emese
    Riedl, Sophie
    Rumetshofer, Elisabeth
    Schmidinger, Niklas
    Mayr, Andreas
    Schmidt-Erfurth, Ursula
    Klambauer, Guenter
    Bogunovic, Hrvoje
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)