Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

Citations: 0

Authors:

Huang, Hailang [1]

Nie, Zhijie [1,2]

Wang, Ziqiao [3]

Shang, Ziyu [4]

Affiliations:

[1] Beihang Univ, Sch Comp Sci & Engn, SKLSDE, Beijing, Peoples R China

[2] Beihang Univ, Shen Yuan Honors Coll, Beijing, Peoples R China

[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON, Canada

[4] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Peoples R China

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16 | 2024

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Through extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling them to achieve universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.


Pages: 18298-18306

Page count: 9
