Image-text bidirectional learning network based cross-modal retrieval

被引:11
|
作者
Li, Zhuoyi [1 ,2 ]
Lu, Huibin [1 ,2 ]
Fu, Hao [1 ,2 ]
Gu, Guanghua [1 ,2 ]
机构
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Hebei Key Lab Informat Transmiss & Signal Proc, Qinhuangdao, Hebei, Peoples R China
基金
中国国家自然科学基金;
关键词
Cross-modal retrieval; bidirectional learning network; common representation space; discriminant consistency loss; bidirectional crisscross loss; REPRESENTATION;
D O I
10.1016/j.neucom.2022.02.007
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The problem of cross-modal retrieval has attracted significant attention in the cross-media retrieval community. One key challenge of cross-modal retrieval is to eliminate the heterogeneous gap between different patterns. The existing numerous cross-modal retrieval approaches tend to jointly construct a common subspace, while these methods fail to consider mutual influence between modalities sufficiently during the whole training process. In this paper, we propose a novel image-text Bidirectional Learning Network (BLN) based cross-modal retrieval method. The method constructs a common representation space and directly measures the similarity of heterogeneous data. More specifically, a multi-layer supervision network is proposed to learn the cross-modal relevance of the generated representations. Moreover, a bidirectional crisscross loss function is proposed to preserve the modal invariance with the bidirectional learning strategy in the common representation space. The loss functions of discriminant consistency and the bidirectional crisscross loss are integrated into an objective function which aims to minimize the intra-class distance and maximize the inter-class distance. Comprehensive experimental results on four widely-used databases show that the proposed method is effective and superior to the existing cross-modal retrieval methods. (c) 2022 Elsevier B.V. All rights reserved.
引用
收藏
页码:148 / 159
页数:12
相关论文
共 50 条
  • [31] RICH: A rapid method for image-text cross-modal hash retrieval
    Li, Bo
    Yao, Dan
    Li, Zhixin
    DISPLAYS, 2023, 79
  • [32] Deep Cross-Modal Projection Learning for Image-Text Matching
    Zhang, Ying
    Lu, Huchuan
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 707 - 723
  • [33] A TEXTURE AND SALIENCY ENHANCED IMAGE LEARNING METHOD FOR CROSS-MODAL REMOTE SENSING IMAGE-TEXT RETRIEVAL
    Yang, Rui
    Zhang, Di
    Guo, YanHe
    Wang, Shuang
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 4895 - 4898
  • [34] Hierarchical modal interaction balance cross-modal hashing for unsupervised image-text retrieval
    Zhang J.
    Lin Z.
    Jiang X.
    Li M.
    Wang C.
    Multimedia Tools and Applications, 2024, 83 (42) : 90487 - 90509
  • [35] Multimodal Knowledge Graph-guided Cross-Modal Graph Network for Image-text Retrieval
    Zheng, Juncheng
    Liang, Meiyu
    Yu, Yang
    Du, Junping
    Xue, Zhe
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 97 - 100
  • [36] Knowledge Decomposition and Replay: A Novel Cross-modal Image-text Retrieval Continual Learning Method
    Yang, Rui
    Wang, Shuang
    Zhang, Huan
    Xu, Siyuan
    Guo, YanHe
    Ye, Xiutiao
    Hou, Biao
    Jiao, Licheng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6510 - 6519
  • [37] Masking-Based Cross-Modal Remote Sensing Image-Text Retrieval via Dynamic Contrastive Learning
    Zhao, Zuopeng
    Miao, Xiaoran
    He, Chen
    Hu, Jianfeng
    Min, Bingbing
    Gao, Yumeng
    Liu, Ying
    Pharksuwan, Kanyaphakphachsorn
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 15
  • [38] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2022, 2022-May : 4828 - 4832
  • [39] Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
    Sogi, Naoya
    Shibata, Takashi
    Terao, Makoto
    COMPUTER VISION - ECCV 2024, PT LXXIX, 2025, 15137 : 447 - 464
  • [40] Visual Contextual Semantic Reasoning for Cross-Modal Drone Image-Text Retrieval
    Huang, Jinghao
    Chen, Yaxiong
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62