MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search

Times cited: 2

Authors:

Zheng, Xiaoyang [1]

Wang, Zilong [1]

Li, Sen [1]

Xu, Ke [2]

Zhuang, Tao [1]

Liu, Qingwen [1]

Zeng, Xiaoyi [1]

Affiliations:

[1] Alibaba Group, Hangzhou, People's Republic of China

[2] City University of Hong Kong, Hong Kong, People's Republic of China

Source:

COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023

Keywords:

Multimodal Pre-training; Semantic Retrieval; Representation Learning

DOI:

10.1145/3543873.3584627

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Taobao Search consists of two phases: the retrieval phase and the ranking phase. Given a user query, the retrieval phase returns a subset of candidate products for the subsequent ranking phase. Recently, the pre-training and fine-tuning paradigm has shown potential for incorporating visual cues into retrieval tasks. In this paper, we focus on the problem of text-to-multimodal retrieval in Taobao Search. We note that how much attention users pay to titles versus images varies from product to product. Hence, we propose a novel Modal Adaptation module for cross-modal fusion, which helps assign appropriate weights to texts and images for each product. Furthermore, in e-commerce search, user queries tend to be short, which leads to a significant semantic imbalance between user queries and product titles. Therefore, we design a separate text encoder and a Keyword Enhancement mechanism to enrich the query representations and improve text-to-multimodal matching. To this end, we present a novel vision-language (V+L) pre-training method that exploits the multimodal information of (user query, product title, product image). Extensive experiments demonstrate that our retrieval-specific pre-training model (referred to as MAKE) outperforms existing V+L pre-training methods on the text-to-multimodal retrieval task. MAKE has been deployed online and brings major improvements to the retrieval system of Taobao Search.
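The abstract describes weighting a product's title and image differently per product, then matching a query against the fused product representation. The following is a minimal sketch of that general idea only, assuming a softmax gate over one scalar score per modality; it is not MAKE's actual Modal Adaptation module, and it omits the encoders and the Keyword Enhancement mechanism entirely. The gate vectors `w_text`/`w_image`, all function names, and the toy 2-d embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def modal_weights(text_emb: Sequence[float], image_emb: Sequence[float],
                  w_text: Sequence[float], w_image: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical gate: softmax over one scalar score per modality."""
    s_t = dot(w_text, text_emb)
    s_v = dot(w_image, image_emb)
    m = max(s_t, s_v)  # subtract the max for numerical stability
    e_t, e_v = math.exp(s_t - m), math.exp(s_v - m)
    return e_t / (e_t + e_v), e_v / (e_t + e_v)


def fuse_product(text_emb: Sequence[float], image_emb: Sequence[float],
                 w_text: Sequence[float], w_image: Sequence[float]) -> Vector:
    """Weighted sum of title and image embeddings, then L2-normalized."""
    a_t, a_v = modal_weights(text_emb, image_emb, w_text, w_image)
    fused = [a_t * t + a_v * v for t, v in zip(text_emb, image_emb)]
    return l2_normalize(fused)


def retrieve(query_emb: Sequence[float], product_embs: Sequence[Sequence[float]],
             k: int) -> List[Tuple[int, float]]:
    """Rank products by cosine similarity to the query; return the top k."""
    q = l2_normalize(query_emb)
    scores = [(i, dot(q, p)) for i, p in enumerate(product_embs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    w_text, w_image = [1.0, 0.0], [0.0, 1.0]  # hypothetical gate parameters
    products = [
        fuse_product([1.0, 0.0], [0.0, 0.2], w_text, w_image),  # title-dominated
        fuse_product([0.1, 0.0], [0.0, 1.0], w_text, w_image),  # image-dominated
    ]
    print([i for i, _ in retrieve([1.0, 0.0], products, k=2)])  # → [0, 1]
```

Because the gate is computed per product, a product whose title carries most of the signal ends up with a fused vector close to its title embedding, while an image-dominated product leans toward its image embedding.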


Pages: 356-360

Number of pages: 5

50 records in total

[21] MAFA: Managing False Negatives for Vision-Language Pre-training
Byun, Jaeseok
Kim, Dohoon
Moon, Taesup
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 27304-27314
[22] Unsupervised Domain Adaption Harnessing Vision-Language Pre-Training
Zhou, Wenlve
Zhou, Zhiheng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(09): 8201-8214
[23] Automated Bridge Inspection Image Interpretation Based on Vision-Language Pre-Training
Wang, Shengyi
El-Gohary, Nora
COMPUTING IN CIVIL ENGINEERING 2023: DATA, SENSING, AND ANALYTICS, 2024: 1-8
[24] Multimodal Pre-training Method for Vision-language Understanding and Generation
Liu T.-Y.
Wu Z.-X.
Chen J.-J.
Jiang Y.-G.
Ruan Jian Xue Bao/Journal of Software, 2023, 34(05): 2024-2034
[25] Unified Vision-Language Pre-Training for Image Captioning and VQA
Zhou, Luowei
Palangi, Hamid
Zhang, Lei
Hu, Houdong
Corso, Jason J.
Gao, Jianfeng
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 13041-13049
[26] Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis
Ling, Yan
Yu, Jianfei
Xia, Rui
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 2149-2159
[27] Retrieval-based Knowledge Augmented Vision Language Pre-training
Rao, Jiahua
Shan, Zifei
Liu, Longpo
Zhou, Yao
Yang, Yuedong
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 5399-5409
[28] Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Dou, Zi-Yi
Kamath, Aishwarya
Gan, Zhe
Zhang, Pengchuan
Wang, Jianfeng
Li, Linjie
Liu, Zicheng
Liu, Ce
LeCun, Yann
Peng, Nanyun
Gao, Jianfeng
Wang, Lijuan
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[29] Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends
Gan, Zhe
Li, Linjie
Li, Chunyuan
Wang, Lijuan
Liu, Zicheng
Gao, Jianfeng
FOUNDATIONS AND TRENDS IN COMPUTER GRAPHICS AND VISION, 2022, 14(3-4): 163-352
[30] Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-training
Chen, Xiaofei
He, Yuting
Xue, Cheng
Ge, Rongjun
Li, Shuo
Yang, Guanyu
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT I, 2023, 14220: 405-415
