Image-text bidirectional learning network based cross-modal retrieval

Cited by: 11
Authors
Li, Zhuoyi [1 ,2 ]
Lu, Huibin [1 ,2 ]
Fu, Hao [1 ,2 ]
Gu, Guanghua [1 ,2 ]
Affiliations
[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China
[2] Hebei Key Lab Informat Transmiss & Signal Proc, Qinhuangdao, Hebei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; bidirectional learning network; common representation space; discriminant consistency loss; bidirectional crisscross loss; REPRESENTATION;
DOI
10.1016/j.neucom.2022.02.007
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval has attracted significant attention in the cross-media retrieval community. One key challenge of cross-modal retrieval is to eliminate the heterogeneous gap between different patterns. Many existing cross-modal retrieval approaches jointly construct a common subspace, but they fail to sufficiently consider the mutual influence between modalities during the whole training process. In this paper, we propose a novel image-text Bidirectional Learning Network (BLN) based cross-modal retrieval method. The method constructs a common representation space and directly measures the similarity of heterogeneous data. More specifically, a multi-layer supervision network is proposed to learn the cross-modal relevance of the generated representations. Moreover, a bidirectional crisscross loss function is proposed to preserve the modal invariance with the bidirectional learning strategy in the common representation space. The discriminant consistency loss and the bidirectional crisscross loss are integrated into an objective function that aims to minimize the intra-class distance and maximize the inter-class distance. Comprehensive experimental results on four widely used databases show that the proposed method is effective and superior to the existing cross-modal retrieval methods. (c) 2022 Elsevier B.V. All rights reserved.
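The idea of measuring similarity directly in a common representation space can be illustrated with a minimal sketch. This is not the BLN method from the abstract: the embeddings are hypothetical hand-picked vectors, cosine similarity is an assumed choice of measure, and the function names `cosine_similarity` and `rank_by_similarity` are introduced here purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return (candidate index, score) pairs sorted from most to least similar."""
    scores = [(idx, cosine_similarity(query, c)) for idx, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional common-space embeddings (assumed, not from the paper):
# one image query and three text candidates already projected into the same space.
image_query = [0.9, 0.1, 0.0]
text_candidates = [
    [0.0, 1.0, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 0.0, 1.0],
]

ranking = rank_by_similarity(image_query, text_candidates)
best_index = ranking[0][0]  # index of the text closest to the image query
```

Once both modalities live in one space, image-to-text and text-to-image retrieval reduce to the same ranking step with the roles of query and candidates swapped.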
Pages: 148-159
Page count: 12
Related papers
50 in total
  • [41] Fine-grained Feature Assisted Cross-modal Image-text Retrieval
    Bu, Chaofei
    Liu, Xueliang
    Huang, Zhen
    Su, Yuling
    Tu, Junfeng
    Hong, Richang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XI, 2025, 15041 : 306 - 320
  • [42] An Efficient Cross-Modal Privacy-Preserving Image-Text Retrieval Scheme
    Zhang, Kejun
    Xu, Shaofei
    Song, Yutuo
    Xu, Yuwei
    Li, Pengcheng
    Yang, Xiang
    Zou, Bing
    Wang, Wenbin
    SYMMETRY-BASEL, 2024, 16 (08):
  • [43] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4828 - 4832
  • [44] Review of unlabeled image-text cross-modal retrieval based on real-valued features
    Zhang, Li
    Chen, Kang
    Sun, Guanghui
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2024, 56 (09) : 1 - 16
  • [45] Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
    Huang, Hailang
    Nie, Zhijie
    Wang, Ziqiao
    Shang, Ziyu
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18298 - 18306
  • [46] Perceive, Reason, and Align: Context-guided cross-modal correlation learning for image-text retrieval
    Liu, Zheng
    Pei, Xinlei
    Gao, Shanshan
    Li, Changhao
    Wang, Jingyao
    Xu, Junhao
    APPLIED SOFT COMPUTING, 2024, 154
  • [47] Adaptive Cross-Modal Embeddings for Image-Text Alignment
    Wehrmann, Jonatas
    Kolling, Camila
    Barros, Rodrigo C.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12313 - 12320
  • [48] Multi-view visual semantic embedding for cross-modal image-text retrieval
    Li, Zheng
    Guo, Caili
    Wang, Xin
    Zhang, Hao
    Hu, Lin
    PATTERN RECOGNITION, 2025, 159
  • [49] Unsupervised deep hashing with multiple similarity preservation for cross-modal image-text retrieval
    Xiong, Siyu
    Pan, Lili
    Ma, Xueqiang
    Hu, Qinghua
    Beckman, Eric
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (10) : 4423 - 4434
  • [50] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
    Chen, Hui
    Ding, Guiguang
    Liu, Xudong
    Lin, Zijia
    Liu, Ji
    Han, Jungong
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 12652 - 12660