Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval

被引:0
|
作者
Huang, Hailang [1 ]
Nie, Zhijie [1 ,2 ]
Wang, Ziqiao [3 ]
Shang, Ziyu [4 ]
机构
[1] Beihang Univ, Sch Comp Sci & Engn, SKLSDE, Beijing, Peoples R China
[2] Beihang Univ, Shen Yuan Honors Coll, Beijing, Peoples R China
[3] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON, Canada
[4] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Peoples R China
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plugand-play, meaning it can be easily applied to existing imagetext retrieval models without changing their original architectures. Extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling it to achieve universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24 itr cusa.
引用
收藏
页码:18298 / 18306
页数:9
相关论文
共 50 条
  • [1] Cross-modal alignment with graph reasoning for image-text retrieval
    Zheng Cui
    Yongli Hu
    Yanfeng Sun
    Junbin Gao
    Baocai Yin
    Multimedia Tools and Applications, 2022, 81 : 23615 - 23632
  • [2] Cross-modal alignment with graph reasoning for image-text retrieval
    Cui, Zheng
    Hu, Yongli
    Sun, Yanfeng
    Gao, Junbin
    Yin, Baocai
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (17) : 23615 - 23632
  • [3] Adaptive Cross-Modal Embeddings for Image-Text Alignment
    Wehrmann, Pinatas
    Kolling, Camila
    Barros, Rodrigo C.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12313 - 12320
  • [4] Cross-modal Image-Text Retrieval with Multitask Learning
    Luo, Junyu
    Shen, Ying
    Ao, Xiang
    Zhao, Zhou
    Yang, Min
    PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2309 - 2312
  • [5] Online deep hashing for both uni-modal and cross-modal retrieval
    Xie, Yicai
    Zeng, Xianhua
    Wang, Tinghua
    Yi, Yun
    Information Sciences, 2022, 608 : 1480 - 1502
  • [6] Cross-Modal Image-Text Retrieval with Semantic Consistency
    Chen, Hui
    Ding, Guiguang
    Lin, Zijin
    Zhao, Sicheng
    Han, Jungong
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1749 - 1757
  • [7] Rethinking Benchmarks for Cross-modal Image-text Retrieval
    Chen, Weijing
    Yao, Linli
    Jin, Qin
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1241 - 1251
  • [8] Online deep hashing for both uni-modal and cross-modal retrieval
    Xie, Yicai
    Zeng, Xianhua
    Wang, Tinghua
    Yi, Yun
    INFORMATION SCIENCES, 2022, 608 : 1480 - 1502
  • [9] Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval
    Zhang, Shun
    Li, Yupeng
    Mei, Shaohui
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [10] Image-Text Embedding with Hierarchical Knowledge for Cross-Modal Retrieval
    Seo, Sanghyun
    Kim, Juntae
    PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018, : 350 - 353