Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval

Cited by: 0
Authors
Liu, Haoliang [1]
Yu, Tan
Li, Ping
Affiliations
[1] Baidu Res, Cognt Comp Lab, 10 Xibeiwang East Rd, Beijing 100193, Peoples R China
Keywords
LANGUAGE; VISION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
By exploiting cross-modal attention, cross-BERT methods have achieved state-of-the-art accuracy in cross-modal retrieval. Nevertheless, the heavy text-image interactions in the cross-BERT model make it prohibitively slow for large-scale retrieval. Late-interaction methods trade off retrieval accuracy and efficiency by exploiting cross-modal interaction only in the late stage, attaining a satisfactory retrieval speed. In this work, we propose an inflating and shrinking approach to further boost the efficiency and accuracy of late-interaction methods. The inflating operation plugs several codes into the input of the encoder to exploit the text-image interactions more thoroughly for higher retrieval accuracy. The shrinking operation then gradually reduces the text-image interactions through knowledge distillation for higher efficiency. Through an inflating operation followed by a shrinking operation, both the efficiency and the accuracy of a late-interaction model are boosted. Systematic experiments on public benchmarks demonstrate the effectiveness of our inflating and shrinking approach.
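Illustrative note (not part of the record): the abstract's terms "late interaction" and "knowledge distillation" can be made concrete with a small sketch. Everything below is an assumption for illustration only. The MaxSim-style scoring rule, the mean-pooling used to reduce interactions, the KL-based loss, and all function names are hypothetical stand-ins and are not taken from the paper, and the inflating step (extra codes in the encoder input) is not modeled at all.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def late_interaction_score(text_vecs, image_vecs):
    """Assumed MaxSim-style late interaction: text and image are encoded
    separately, and they interact only here, at scoring time."""
    return sum(max(dot(t, v) for v in image_vecs) for t in text_vecs)

def mean_pool(vecs):
    """Hypothetical reduction: collapse many vectors into one by averaging,
    so scoring needs a single inner product per text-image pair."""
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) over score distributions across candidate
    images; one common way to distill a ranking, assumed here."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy vectors standing in for encoder outputs (made up for illustration).
text = [[1.0, 0.0], [0.0, 1.0]]
images = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.5, 0.5], [0.5, 0.5]],
}

# Teacher: many interactions per pair. Student: one interaction per pair.
teacher = [late_interaction_score(text, vecs) for vecs in images.values()]
student = [
    late_interaction_score([mean_pool(text)], [mean_pool(vecs)])
    for vecs in images.values()
]
loss = distillation_loss(teacher, student)
```

In this toy case the teacher ranks image "a" above "b" (scores 2.0 and 1.0), while the naively pooled student scores both at 0.5, so the loss is positive. That gap is the kind of signal a distillation objective would use to train a cheaper model to preserve the richer model's ranking.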
Pages: 9796-9809
Page count: 14
Related papers
50 in total
  • [31] MGAN: Attempting a Multimodal Graph Attention Network for Remote Sensing Cross-Modal Text-Image Retrieval
    Wang, Zhiming
    Dong, Zhihua
    Yang, Xiaoyu
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 261 - 273
  • [32] A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text-Image Retrieval in Remote Sensing
    Zhang, Xiong
    Li, Weipeng
    Wang, Xu
    Wang, Luyao
    Zheng, Fuzhong
    Wang, Long
    Zhang, Haisu
    REMOTE SENSING, 2023, 15 (18)
  • [33] Multi-scale Multi-modal Dictionary BERT For Effective Text-image Retrieval in Multimedia Advertising
    Yu, Tan
    Liu, Jie
    Jin, Zhipeng
    Yang, Yi
    Fei, Hongliang
    Li, Ping
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4655 - 4660
  • [34] Golden Retriever: A Real-Time Multi-Modal Text-Image Retrieval System with the Ability to Focus
    Schneider, Florian
    Biemann, Chris
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 3245 - 3250
  • [35] GA-SRN: graph attention based text-image semantic reasoning network for fine-grained image classification and retrieval
    Wenhao Li
    Hongqing Zhu
    Suyi Yang
    Pengyu Wang
    Han Zhang
    Neural Computing and Applications, 2022, 34 : 21387 - 21401
  • [36] GA-SRN: graph attention based text-image semantic reasoning network for fine-grained image classification and retrieval
    Li, Wenhao
    Zhu, Hongqing
    Yang, Suyi
    Wang, Pengyu
    Zhang, Han
    Neural Computing and Applications, 2022, 34 (23) : 21387 - 21401
  • [37] GA-SRN: graph attention based text-image semantic reasoning network for fine-grained image classification and retrieval
    Li, Wenhao
    Zhu, Hongqing
    Yang, Suyi
    Wang, Pengyu
    Zhang, Han
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (23) : 21387 - 21401
  • [38] A Jointly Guided Deep Network for Fine-Grained Cross-Modal Remote Sensing Text-Image Retrieval
    Yang, Lei
    Feng, Yong
    Zhou, Mingling
    Xiong, Xiancai
    Wang, Yongheng
    Qiang, Baohua
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2023, 32 (13)
  • [39] Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval
    Zhang, Shun
    Li, Yupeng
    Mei, Shaohui
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [40] Complementary or Substitutive? A Novel Deep Learning Method to Leverage Text-image Interactions for Multimodal Review Helpfulness Prediction
    Xiao, Shuaiyong
    Chen, Gang
    Zhang, Chenghong
    Li, Xiangge
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 208