Multi-Modal Contrastive Pre-training for Recommendation

Cited by: 0

Authors:

Liu, Zhuang [1]

Ma, Yunpu [2]

Schubert, Matthias [2]

Ouyang, Yuanxin [1]

Xiong, Zhang [3]

Affiliations:

[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China

[2] Ludwig Maximilians Univ Munchen, Lehrstuhl Datenbanksyst & Data Min, Munich, Germany

[3] Beihang Univ, Minist Educ, Engn Res Ctr Adv Comp Applicat Technol, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

Recommender system; Multi-modal side information; Contrastive learning; Pre-training model;

DOI:

10.1145/3512527.3531378

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Personalized recommendation plays a central role in various online applications. To provide quality recommendation service, it is of crucial importance to consider multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse multiple modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on the relationship of co-interaction. For users, we propose intra-modal aggregation and inter-modal aggregation to fuse review texts and the structural information of the user graph. For items, we consider three modalities: description text, images, and item graph. Moreover, the description text and image complement each other for the same item. One of them can be used as promising supervision for the other. Therefore, to capture this signal and better exploit the potential correlation of intra-modalities, we propose a self-supervised contrastive inter-modal alignment task to make the textual and visual modalities as similar as possible. Then, we apply inter-modal aggregation to obtain the multi-modal representation of items. Next, we employ a binary cross-entropy loss function to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations using an existing recommendation model. We have performed extensive experiments on three real-world datasets. Experimental results verify the rationality and effectiveness of the proposed method.
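The inter-modal alignment task described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the exact loss, so a symmetric InfoNCE-style contrastive objective is assumed here, and the function names (`info_nce`, `alignment_loss`), the temperature value, and the toy embeddings are all hypothetical. The idea shown is that the text and image embeddings of the same item form a positive pair, while embeddings of other items in the batch act as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy where candidates[i] is the positive for anchors[i].

    Assumed InfoNCE form; the temperature default is an illustrative choice.
    """
    anchors = [l2_normalize(v) for v in anchors]
    candidates = [l2_normalize(v) for v in candidates]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def alignment_loss(text_emb, image_emb, temperature=0.1):
    """Symmetric loss: text-to-image plus image-to-text, averaged."""
    return 0.5 * (
        info_nce(text_emb, image_emb, temperature)
        + info_nce(image_emb, text_emb, temperature)
    )


# Toy batch of two items with 2-dimensional embeddings (made-up values).
text_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[0.9, 0.1], [0.2, 0.8]]   # image i matches text i
swapped_images = [[0.2, 0.8], [0.9, 0.1]]   # image i matches the other text

aligned = alignment_loss(text_emb, aligned_images)
swapped = alignment_loss(text_emb, swapped_images)
print(f"aligned pairs: {aligned:.4f}, swapped pairs: {swapped:.4f}")
```

Under these assumptions, the loss is small when each item's text and image embeddings point in similar directions and large when they are mismatched, which is the signal an alignment task of this kind would minimize.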

Pages: 99-108

Page count: 10

