MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search

被引:2
|
作者
Zheng, Xiaoyang [1 ]
Wang, Zilong [1 ]
Li, Sen [1 ]
Xu, Ke [2 ]
Zhuang, Tao [1 ]
Liu, Qingwen [1 ]
Zeng, Xiaoyi [1 ]
机构
[1] Alibaba Grp, Hangzhou, Peoples R China
[2] City Univ Hong Kong, Hong Kong, Peoples R China
关键词
Multimodal Pre-training; Semantic Retrieval; Representation Learning;
D O I
10.1145/3543873.3584627
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Taobao Search consists of two phases: the retrieval phase and the ranking phase. Given a user query, the retrieval phase returns a subset of candidate products for the following ranking phase. Recently, the paradigm of pre-training and fine-tuning has shown its potential in incorporating visual clues into retrieval tasks. In this paper, we focus on solving the problem of text-to-multimodal retrieval in Taobao Search. We consider that users' attention on titles or images varies on products. Hence, we propose a novel Modal Adaptation module for cross-modal fusion, which helps assigns appropriate weights on texts and images across products. Furthermore, in ecommerce search, user queries tend to be brief and thus lead to significant semantic imbalance between user queries and product titles. Therefore, we design a separate text encoder and a Keyword Enhancement mechanism to enrich the query representations and improve text-to-multimodal matching. To this end, we present a novel vision-language (V+L) pre-training methods to exploit the multimodal information of (user query, product title, product image). Extensive experiments demonstrate that our retrieval-specific pre-training model (referred to as MAKE) outperforms existing V+L pre-training methods on the text-to-multimodal retrieval task. MAKE has been deployed online and brings major improvements on the retrieval system of Taobao Search.
引用
收藏
页码:356 / 360
页数:5
相关论文
共 50 条
  • [1] Delving into E-Commerce Product Retrieval with Vision-Language Pre-training
    Zheng, Xiaoyang
    Lv, Fuyu
    Wang, Zilong
    Liu, Qingwen
    Zeng, Xiaoyi
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 3385 - 3389
  • [2] Survey on Vision-language Pre-training
    Yin J.
    Zhang Z.-D.
    Gao Y.-H.
    Yang Z.-W.
    Li L.
    Xiao M.
    Sun Y.-Q.
    Yan C.-G.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05): : 2000 - 2023
  • [3] VLP: A Survey on Vision-language Pre-training
    Chen, Fei-Long
    Zhang, Du-Zhen
    Han, Ming-Lun
    Chen, Xiu-Yi
    Shi, Jing
    Xu, Shuang
    Xu, Bo
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (01) : 38 - 56
  • [4] VLP: A Survey on Vision-language Pre-training
    Fei-Long Chen
    Du-Zhen Zhang
    Ming-Lun Han
    Xiu-Yi Chen
    Jing Shi
    Shuang Xu
    Bo Xu
    Machine Intelligence Research, 2023, 20 (01) : 38 - 56
  • [5] VLP: A Survey on Vision-language Pre-training
    Fei-Long Chen
    Du-Zhen Zhang
    Ming-Lun Han
    Xiu-Yi Chen
    Jing Shi
    Shuang Xu
    Bo Xu
    Machine Intelligence Research, 2023, 20 : 38 - 56
  • [6] GilBERT: Generative Vision-Language Pre-Training for Image-Text Retrieval
    Hong, Weixiang
    Ji, Kaixiang
    Liu, Jiajia
    Wang, Jian
    Chen, Jingdong
    Chu, Wei
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1379 - 1388
  • [7] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
    Jian, Yiren
    Gao, Chongyang
    Vosoughi, Soroush
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Pre-training A Prompt Pool for Vision-Language Model
    Liu, Jun
    Gu, Yang
    Yang, Zhaohua
    Guo, Shuai
    Liu, Huaqiu
    Chen, Yiqiang
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [9] Vision-language pre-training via modal interaction
    Cheng, Hang
    Ye, Hehui
    Zhou, Xiaofei
    Liu, Ximeng
    Chen, Fei
    Wang, Meiqing
    PATTERN RECOGNITION, 2024, 156
  • [10] Contrastive Vision-Language Pre-training with Limited Resources
    Cui, Quan
    Zhou, Boyan
    Guo, Yu
    Yin, Weidong
    Wu, Hao
    Yoshie, Osamu
    Chen, Yubo
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 236 - 253