Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval

Cited by: 0
Authors
Liu, Haoliang [1]
Yu, Tan
Li, Ping
Affiliations
[1] Baidu Res, Cognt Comp Lab, 10 Xibeiwang East Rd, Beijing 100193, Peoples R China
Keywords
LANGUAGE; VISION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
By exploiting cross-modal attention, cross-BERT methods have achieved state-of-the-art accuracy in cross-modal retrieval. Nevertheless, the heavy text-image interactions in the cross-BERT model make it prohibitively slow for large-scale retrieval. Late-interaction methods trade off retrieval accuracy against efficiency by exploiting cross-modal interaction only in the late stage, attaining a satisfactory retrieval speed. In this work, we propose an inflating and shrinking approach to further boost the efficiency and accuracy of late-interaction methods. The inflating operation plugs several codes into the input of the encoder to exploit the text-image interactions more thoroughly for higher retrieval accuracy. The shrinking operation then gradually reduces the text-image interactions through knowledge distillation for higher efficiency. Through an inflating operation followed by a shrinking operation, both the efficiency and the accuracy of a late-interaction model are boosted. Systematic experiments on public benchmarks demonstrate the effectiveness of our inflating and shrinking approach.
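
The late-interaction idea the abstract refers to can be illustrated with a small sketch. This is not the authors' model: it assumes a generic MaxSim-style score (each text vector is matched to its best image vector and the matches are summed), and the names `late_interaction_score`, `rank_images`, and the toy vectors are hypothetical. It only shows why the number of vectors per side matters: more vectors mean more text-image interactions to compute, fewer vectors mean cheaper scoring.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(text_vecs: List[Vector], image_vecs: List[Vector]) -> float:
    """Assumed MaxSim-style score: for each text vector, take the best-matching
    image vector, then sum over text vectors. Text and image vectors are
    produced independently, so image vectors can be precomputed offline."""
    return sum(max(dot(t, v) for v in image_vecs) for t in text_vecs)


def rank_images(text_vecs: List[Vector], gallery: Dict[str, List[Vector]]) -> List[str]:
    """Return gallery image names sorted by descending score."""
    scores = {name: late_interaction_score(text_vecs, vecs) for name, vecs in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): a two-vector query and two precomputed images.
query = [[1.0, 0.0], [0.0, 1.0]]
gallery = {
    "img_a": [[1.0, 0.0], [0.0, 1.0]],  # two vectors: 2 x 2 = 4 dot products
    "img_b": [[0.5, 0.5]],              # one vector: 2 x 1 = 2 dot products
}
ranking = rank_images(query, gallery)
```

With a single vector on each side, the score collapses to one dot product, which is the cheapest possible interaction; the cost grows with the product of the two vector counts.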
Pages: 9796-9809
Page count: 14
Related papers
50 in total
  • [1] U-BERT for Fast and Scalable Text-Image Retrieval
    Yu, Tan
    Fei, Hongliang
    Li, Ping
    PROCEEDINGS OF THE 2022 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2022, 2022, : 103 - 113
  • [2] Text-Image Retrieval With Salient Features
    Feng, Xia
    Hu, Zhiyi
    Liu, Caihua
    Ip, W. H.
    Chen, Huiying
    JOURNAL OF DATABASE MANAGEMENT, 2021, 32 (04) : 1 - 13
  • [3] Experiences in evaluating multilingual and text-image information retrieval
    Garcia-Serrano, Ana M.
    Martinez-Fernandez, Jose L.
    Martinez, Paloma
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2006, 21 (07) : 655 - 677
  • [4] A Learning to Rank framework applied to text-image retrieval
    David Buffoni
    Sabrina Tollari
    Patrick Gallinari
    Multimedia Tools and Applications, 2012, 60 : 161 - 180
  • [5] A Learning to Rank framework applied to text-image retrieval
    Buffoni, David
    Tollari, Sabrina
    Gallinari, Patrick
    MULTIMEDIA TOOLS AND APPLICATIONS, 2012, 60 (01) : 161 - 180
  • [6] Improving text-image cross-modal retrieval with contrastive loss
    Chumeng Zhang
    Yue Yang
    Junbo Guo
    Guoqing Jin
    Dan Song
    An An Liu
    Multimedia Systems, 2023, 29 : 569 - 575
  • [7] Enhancing Text-Image Person Retrieval Through Nuances Varied Sample
    Xia, Jiaer
    Yang, Haozhe
    Zhang, Yan
    Dai, Pingyang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 185 - 196
  • [8] Knowledge-Aware Text-Image Retrieval for Remote Sensing Images
    Mi, Li
    Dai, Xianjie
    Castillo-Navarro, Javiera
    Tuia, Devis
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [9] Improving text-image cross-modal retrieval with contrastive loss
    Zhang, Chumeng
    Yang, Yue
    Guo, Junbo
    Jin, Guoqing
    Song, Dan
    Liu, An An
    MULTIMEDIA SYSTEMS, 2023, 29 (02) : 569 - 575
  • [10] Federated training of GNNs with similarity graph reasoning for text-image retrieval
    Yan, Xueming
    Wang, Chuyue
    Jin, Yaochu
    NEUROCOMPUTING, 2025, 623