Image-text bidirectional learning network based cross-modal retrieval

Cited by: 11

Authors:

Li, Zhuoyi [1,2]

Lu, Huibin [1,2]

Fu, Hao [1,2]

Gu, Guanghua [1,2]

Affiliations:

[1] Yanshan Univ, Sch Informat Sci & Engn, Qinhuangdao, Peoples R China

[2] Hebei Key Lab Informat Transmiss & Signal Proc, Qinhuangdao, Hebei, Peoples R China

Source:

NEUROCOMPUTING | 2022 / Vol. 483

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal retrieval; bidirectional learning network; common representation space; discriminant consistency loss; bidirectional crisscross loss; REPRESENTATION;

DOI:

10.1016/j.neucom.2022.02.007

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cross-modal retrieval has attracted significant attention in the cross-media retrieval community. One key challenge is eliminating the heterogeneity gap between different modalities. Many existing cross-modal retrieval approaches jointly construct a common subspace, but they do not sufficiently consider the mutual influence between modalities throughout the training process. In this paper, we propose a novel cross-modal retrieval method based on an image-text Bidirectional Learning Network (BLN). The method constructs a common representation space and directly measures the similarity of heterogeneous data. More specifically, a multi-layer supervision network is proposed to learn the cross-modal relevance of the generated representations. Moreover, a bidirectional crisscross loss function is proposed to preserve modal invariance under the bidirectional learning strategy in the common representation space. The discriminant consistency loss and the bidirectional crisscross loss are integrated into an objective function that aims to minimize the intra-class distance and maximize the inter-class distance. Comprehensive experimental results on four widely used databases show that the proposed method is effective and outperforms existing cross-modal retrieval methods. (c) 2022 Elsevier B.V. All rights reserved.
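As a rough illustration of what the abstract describes, the sketch below ranks text embeddings against image embeddings by cosine similarity in a shared space and computes mean intra-class and inter-class distances. It is not the paper's BLN architecture or its loss functions, whose exact forms are not given in this record; the function names (`rank_by_cosine`, `intra_inter_distance`), the toy embeddings, and the choice of cosine similarity and Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def rank_by_cosine(queries, gallery):
    """Return, for each query row, gallery indices sorted by descending cosine similarity."""
    sims = l2_normalize(queries) @ l2_normalize(gallery).T
    return np.argsort(-sims, axis=1)

def intra_inter_distance(features, labels):
    """Mean Euclidean distance between same-class pairs and between different-class pairs."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude each sample paired with itself
    return dists[same & off_diag].mean(), dists[~same].mean()

# Hypothetical embeddings already projected into a common 2-D space.
image_emb = np.array([[1.0, 0.1], [0.1, 1.0]])   # image classes 0, 1
text_emb = np.array([[0.2, 0.9], [0.9, 0.2]])    # text classes 1, 0

# Image-to-text retrieval: each row lists text indices from best to worst match.
ranking = rank_by_cosine(image_emb, text_emb)

# Class compactness across both modalities in the common space.
features = np.vstack([image_emb, text_emb])
labels = np.array([0, 1, 1, 0])
intra, inter = intra_inter_distance(features, labels)
```

In a well-formed common space, `intra` would be expected to be small relative to `inter`, which is the property the abstract's objective function is said to encourage.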


Pages: 148-159

Page count: 12
