Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

Authors
Huang, Hailang [1]
Nie, Zhijie [1, 2]
Wang, Ziqiao [3]
Shang, Ziyu [4]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, SKLSDE, Beijing, Peoples R China
[2] Beihang Univ, Shen Yuan Honors Coll, Beijing, Peoples R China
[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON, Canada
[4] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Through extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling them to achieve universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.
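The idea of soft-label supervision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the loss, so the KL-divergence objective, the temperature `tau`, and the function name `soft_label_alignment` are all assumptions made for illustration. The sketch compares in-batch similarity distributions produced by the retrieval model (the "student") with those produced by a frozen uni-modal pre-trained model (the "teacher").

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def softmax(x, tau):
    # Row-wise softmax with temperature tau (tau is an assumed hyperparameter).
    z = x / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_label_alignment(student_a, student_b, teacher_a, teacher_b, tau=0.1):
    """Mean KL(teacher || student) over rows of in-batch similarity distributions.

    Hypothetical helper: the KL objective is an assumption, not taken from the abstract.
    Cross-modal use: student_a / student_b are image / text embeddings from the
    retrieval model. Uni-modal use: pass the same modality for both arguments.
    """
    s = softmax(l2_normalize(student_a) @ l2_normalize(student_b).T, tau)
    t = softmax(l2_normalize(teacher_a) @ l2_normalize(teacher_b).T, tau)
    eps = 1e-12  # avoids log(0)
    return float(np.mean(np.sum(t * (np.log(t + eps) - np.log(s + eps)), axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(4, 8))          # stand-in image embeddings (student)
    txt = rng.normal(size=(4, 8))          # stand-in text embeddings (student)
    teacher_txt = rng.normal(size=(4, 8))  # stand-in features from a text-only teacher
    print(soft_label_alignment(img, txt, teacher_txt, teacher_txt))
```

Because the teacher distribution is soft rather than one-hot, in-batch pairs that the teacher considers similar are not pushed apart as hard negatives, which is the intuition behind addressing false negatives.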
Pages: 18298-18306
Number of pages: 9