Review of unlabeled image-text cross-modal retrieval based on real-valued features

Cited by: 0
Authors
Zhang, Li [1]
Chen, Kang [1]
Sun, Guanghui [2]
Affiliations
[1] Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
[2] School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
Keywords
Feature extraction
DOI
10.11918/202404027
Abstract
To investigate the current state of development and the key open issues in cross-modal retrieval based on real-valued features for unlabeled datasets (hereinafter, cross-modal retrieval), this paper analyzes and summarizes the existing literature. Cross-modal retrieval is the task of retrieving samples in one modality that are relevant to a query given in another modality. First, existing cross-modal retrieval methods are divided, according to their time complexity, into feature-based methods and score-based methods. Second, the research status of each category is described, and its main current problems are analyzed and discussed. Third, two mainstream datasets and the commonly used evaluation metrics for cross-modal retrieval are introduced, and the performance of the two categories of methods on public datasets is compared and analyzed. Finally, the key issues that remain to be addressed in the field are summarized. The review finds that, although existing cross-modal retrieval methods have made significant progress, several key issues still urgently need to be addressed, and these represent important directions for future work in the field. © 2024 Harbin Institute of Technology. All rights reserved.
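
As an illustration of the definition in the abstract (a query in one modality retrieving relevant samples in another), the sketch below ranks image features against a text query. It is not taken from the paper. It assumes both modalities have already been encoded into a shared real-valued feature space and that relevance is measured by cosine similarity; the function names `cosine` and `retrieve` and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two real-valued feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_feat, gallery_feats, k=1):
    """Return the indices of the k gallery items most similar to the query."""
    ranked = sorted(
        range(len(gallery_feats)),
        key=lambda i: cosine(query_feat, gallery_feats[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy data: one text-query feature and three image features
# (assumed to live in the same 2-D feature space).
text_query = [1.0, 0.0]
image_feats = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
top = retrieve(text_query, image_feats, k=2)
```

Here `top` holds the indices of the two image features closest in direction to the text query. Any real system would replace the toy vectors with the outputs of trained image and text encoders.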
Pages: 1 / 16