Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval

Cited by: 0

Authors:

Liu, Haoliang [1]

Yu, Tan

Li, Ping

Affiliations:

[1] Baidu Res, Cognt Comp Lab, 10 Xibeiwang East Rd, Beijing 100193, Peoples R China

Source:

2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) | 2021

Keywords:

LANGUAGE; VISION;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

By exploiting the cross-modal attention, cross-BERT methods have achieved state-of-the-art accuracy in cross-modal retrieval. Nevertheless, the heavy text-image interactions in the cross-BERT model are prohibitively slow for large-scale retrieval. Late-interaction methods trade off retrieval accuracy and efficiency by exploiting cross-modal interaction only in the late stage, attaining a satisfactory retrieval speed. In this work, we propose an inflating and shrinking approach to further boost the efficiency and accuracy of late-interaction methods. The inflating operation plugs several codes in the input of the encoder to exploit the text-image interactions more thoroughly for higher retrieval accuracy. Then the shrinking operation gradually reduces the text-image interactions through knowledge distilling for higher efficiency. Through an inflating operation followed by a shrinking operation, both efficiency and accuracy of a late-interaction model are boosted. Systematic experiments on public benchmarks demonstrate the effectiveness of our inflating and shrinking approach.

Pages: 9796-9809

Number of pages: 14

50 records in total

[1] U-BERT for Fast and Scalable Text-Image Retrieval
Yu, Tan
Fei, Hongliang
Li, Ping
PROCEEDINGS OF THE 2022 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2022, 2022, : 103 - 113
[2] Text-Image Retrieval With Salient Features
Feng, Xia
Hu, Zhiyi
Liu, Caihua
Ip, W. H.
Chen, Huiying
JOURNAL OF DATABASE MANAGEMENT, 2021, 32 (04) : 1 - 13
[3] Experiences in evaluating multilingual and text-image information retrieval
Garcia-Serrano, Ana M.
Martinez-Fernandez, Jose L.
Martinez, Paloma
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2006, 21 (07) : 655 - 677
[4] A Learning to Rank framework applied to text-image retrieval
David Buffoni
Sabrina Tollari
Patrick Gallinari
Multimedia Tools and Applications, 2012, 60 : 161 - 180
[5] A Learning to Rank framework applied to text-image retrieval
Buffoni, David
Tollari, Sabrina
Gallinari, Patrick
MULTIMEDIA TOOLS AND APPLICATIONS, 2012, 60 (01) : 161 - 180
[6] Improving text-image cross-modal retrieval with contrastive loss
Chumeng Zhang
Yue Yang
Junbo Guo
Guoqing Jin
Dan Song
An An Liu
Multimedia Systems, 2023, 29 : 569 - 575
[7] Enhancing Text-Image Person Retrieval Through Nuances Varied Sample
Xia, Jiaer
Yang, Haozhe
Zhang, Yan
Dai, Pingyang
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 185 - 196
[8] Knowledge-Aware Text-Image Retrieval for Remote Sensing Images
Mi, Li
Dai, Xianjie
Castillo-Navarro, Javiera
Tuia, Devis
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[9] Improving text-image cross-modal retrieval with contrastive loss
Zhang, Chumeng
Yang, Yue
Guo, Junbo
Jin, Guoqing
Song, Dan
Liu, An An
MULTIMEDIA SYSTEMS, 2023, 29 (02) : 569 - 575
[10] Federated training of GNNs with similarity graph reasoning for text-image retrieval
Yan, Xueming
Wang, Chuyue
Jin, Yaochu
NEUROCOMPUTING, 2025, 623
