Review of unlabeled image-text cross-modal retrieval based on real-valued features

被引:0
|
作者
Zhang, Li [1 ]
Chen, Kang [1 ]
Sun, Guanghui [2 ]
机构
[1] Faculty of Computing, Harbin Institute of Technology, Harbin,150001, China
[2] School of Astronautics, Harbin Institute of Technology, Harbin,150001, China
关键词
Feature extraction;
D O I
10.11918/202404027
中图分类号
学科分类号
摘要
In order to investigate the current development status and key issues in the field of cross-modal retrieval based on real-valued features for unlabeled datasets (hereinafter referred to as cross-modal retrieval), this paper conducts an analysis and summary of the existing literatures. Cross-modal retrieval refers to the retrieval of samples from one modality that are relevant to a given query from another modality. Firstly, using a time complexity-based classification approach, existing cross-modal retrieval methods are categorized into feature-based methods and score-based methods. Secondly, the research status of these two categories of methods is described, and the main issues in the current stage for each category are analyzed and discussed. Furthermore, two mainstream datasets and commonly used evaluation metrics for cross-modal retrieval are introduced, and the performance of the two categories of methods on public datasets is compared and analyzed. Finally, key issues to be addressed in the field of cross-modal retrieval are summarized. The research indicates that although significant progress has been made in existing cross-modal retrieval methods, there are still key issues that urgently need to be addressed. These key issues represent important directions for future development in the field of cross-modal retrieval. © 2024 Harbin Institute of Technology. All rights reserved.
引用
收藏
页码:1 / 16
相关论文
共 50 条
  • [41] Unsupervised deep hashing with multiple similarity preservation for cross-modal image-text retrieval
    Xiong, Siyu
    Pan, Lili
    Ma, Xueqiang
    Hu, Qinghua
    Beckman, Eric
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (10) : 4423 - 4434
  • [42] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
    Chen, Hui
    Ding, Guiguang
    Liu, Xudong
    Lin, Zijia
    Liu, Ji
    Han, Jungong
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 12652 - 12660
  • [43] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing
    Cheng, Qimin
    Zhou, Yuzhuo
    Fu, Peng
    Xu, Yuan
    Zhang, Liang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 4284 - 4297
  • [44] Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
    Wang, Sijin
    Wang, Ruiping
    Yao, Ziwei
    Shan, Shiguang
    Chen, Xilin
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1497 - 1506
  • [45] Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval
    Sun, Tianci
    Zheng, Chengyu
    Li, Xiu
    Nie, Jie
    Gao, Yanli
    Huang, Lei
    Wei, Zhiqiang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 6968 - 6980
  • [46] A TRANSFORMER-BASED CROSS-MODAL IMAGE-TEXT RETRIEVAL METHOD USING FEATURE DECOUPLING AND RECONSTRUCTION
    Zhang, Huan
    Sun, Yingzhi
    Liao, Yu
    Xu, SiYuan
    Yang, Rui
    Wang, Shuang
    Hou, Biao
    Jiao, Licheng
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1796 - 1799
  • [47] TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval
    Li, Qiqi
    Ma, Longfei
    Jiang, Zheng
    Li, Mingyong
    Jin, Bo
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): : 3713 - 3728
  • [48] A TEXTURE AND SALIENCY ENHANCED IMAGE LEARNING METHOD FOR CROSS-MODAL REMOTE SENSING IMAGE-TEXT RETRIEVAL
    Yang, Rui
    Zhang, Di
    Guo, YanHe
    Wang, Shuang
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4895 - 4898
  • [49] Deep Cross-Modal Projection Learning for Image-Text Matching
    Zhang, Ying
    Lu, Huchuan
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 707 - 723
  • [50] MULTI-SCALE INTERACTIVE TRANSFORMER FOR REMOTE SENSING CROSS-MODAL IMAGE-TEXT RETRIEVAL
    Wang, Yijing
    Ma, Jingjing
    Li, Mingteng
    Tang, Xu
    Han, Xiao
    Jiao, Licheng
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 839 - 842