Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

被引:0
|
作者
Huang, Hailang [1 ]
Nie, Zhijie [1 ,2 ]
Wang, Ziqiao [3 ]
Shang, Ziyu [4 ]
机构
[1] Beihang Univ, Sch Comp Sci & Engn, SKLSDE, Beijing, Peoples R China
[2] Beihang Univ, Shen Yuan Honors Coll, Beijing, Peoples R China
[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON, Canada
[4] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Peoples R China
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plugand-play, meaning it can be easily applied to existing imagetext retrieval models without changing their original architectures. Extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling it to achieve universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24 itr cusa.
引用
收藏
页码:18298 / 18306
页数:9
相关论文
共 50 条
  • [41] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4828 - 4832
  • [42] Conceptual and Syntactical Cross-modal Alignment with Cross-level Consistency for Image-Text Matching
    Zeng, Pengpeng
    Gao, Lianli
    Lyu, Xinyu
    Jing, Shuaiqi
    Song, Jingkuan
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2205 - 2213
  • [43] Cross-modal image-text search via Efficient Discrete Class Alignment Hashing
    Wang, Song
    Zhao, Huan
    Wang, Yunbo
    Huang, Jing
    Li, Keqin
    INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (03)
  • [44] Multi-view visual semantic embedding for cross-modal image-text retrieval
    Li, Zheng
    Guo, Caili
    Wang, Xin
    Zhang, Hao
    Hu, Lin
    PATTERN RECOGNITION, 2025, 159
  • [45] Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning
    Wang, Jian
    He, Yonghao
    Kang, Cuicui
    Xiang, Shiming
    Pan, Chunhong
    ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2015, : 347 - 354
  • [46] Cross-modal fabric image-text retrieval based on convolutional neural network and TinyBERT
    Xiang, Jun
    Zhang, Ning
    Pan, Ruru
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (21) : 59725 - 59746
  • [47] Cross-modal information balance-aware reasoning network for image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Hao, Fei
    Pang, Guangyao
    Wang, Zehao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 120
  • [48] Unsupervised deep hashing with multiple similarity preservation for cross-modal image-text retrieval
    Xiong, Siyu
    Pan, Lili
    Ma, Xueqiang
    Hu, Qinghua
    Beckman, Eric
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (10) : 4423 - 4434
  • [49] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
    Chen, Hui
    Ding, Guiguang
    Liu, Xudong
    Lin, Zijia
    Liu, Ji
    Han, Jungong
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 12652 - 12660
  • [50] Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
    Wang, Sijin
    Wang, Ruiping
    Yao, Ziwei
    Shan, Shiguang
    Chen, Xilin
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1497 - 1506