Review of unlabeled image-text cross-modal retrieval based on real-valued features

Authors
Zhang, Li [1]
Chen, Kang [1]
Sun, Guanghui [2]
Affiliations
[1] Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
[2] School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
Keywords
Feature extraction
DOI
10.11918/202404027
Abstract
To investigate the current state of development and the key open issues in cross-modal retrieval based on real-valued features for unlabeled datasets (hereinafter, cross-modal retrieval), this paper analyzes and summarizes the existing literature. Cross-modal retrieval is the task of retrieving samples in one modality that are relevant to a query given in another modality. First, existing cross-modal retrieval methods are classified by time complexity into feature-based methods and score-based methods. Second, the research status of each category is described, and the main issues each currently faces are analyzed and discussed. Third, two mainstream datasets and the commonly used evaluation metrics for cross-modal retrieval are introduced, and the performance of the two categories on public datasets is compared and analyzed. Finally, the key issues that remain to be addressed in cross-modal retrieval are summarized. The review indicates that, although existing cross-modal retrieval methods have made significant progress, several key issues still urgently need to be addressed, and these represent important directions for future work in the field. © 2024 Harbin Institute of Technology. All rights reserved.
Pages: 1 / 16