Semantic-enhanced discriminative embedding learning for cross-modal retrieval

Times cited: 1
Authors
Pan, Hao [1, 2]
Huang, Jun [1, 2]
Affiliations
[1] University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
[2] Chinese Academy of Sciences, Shanghai Advanced Research Institute, Shanghai 201210, People's Republic of China
Keywords
Cross-modal retrieval; Semantic enhanced; Erasing; Metric learning
DOI
10.1007/s13735-022-00237-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval involves retrieving relevant text for an image query and vice versa. Most existing methods use attention mechanisms to build advanced encoding networks and use ranking losses to reduce the modality gap. Although these methods achieve remarkable performance, they still suffer from drawbacks that hinder the model from learning discriminative semantic embeddings. For example, the attention mechanism may assign larger weights to irrelevant parts than to relevant ones, which prevents the model from learning a discriminative attention distribution. In addition, traditional ranking losses may disregard relatively discriminative information because they lack appropriate hardest-negative mining and information-weighting schemes. To alleviate these issues, this paper proposes a novel semantic-enhanced discriminative embedding learning method that strengthens the discriminative ability of the model and consists of three main modules. The attention-guided erasing module erases non-attended parts, which makes the attention model focus on the relevant parts and reduces interference from irrelevant ones. The large-scale negative sampling module uses momentum-updated memory banks to expand the number of negative samples, which increases the probability that the hardest negative is sampled. Moreover, the weighted InfoNCE loss module introduces a weighting scheme that assigns a larger weight to a harder pair. We evaluate the proposed modules by integrating them into three existing cross-modal retrieval models. Extensive experiments show that integrating each proposed module into these models consistently improves the performance of all of them.
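The three modules summarized in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The erasing rule (keep the most-attended fraction of regions and zero out the rest), the MoCo-style reading of "momentum-updated memory banks" (a FIFO queue of keys produced by a momentum-averaged key encoder), and the hardness weights w_j = N · softmax(β · s_j) are all assumptions made for illustration. Every name here (`attention_guided_erase`, `momentum_update`, `MemoryBank`, `weighted_info_nce`, `keep_ratio`, `tau`, `beta`) is hypothetical.

```python
# Hypothetical sketch of the three modules; NOT the paper's code.
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def attention_guided_erase(regions, attn, keep_ratio=0.5):
    """Assumed erasing rule: keep the top `keep_ratio` fraction of regions by
    attention weight and zero out the rest (the "non-attention" parts)."""
    k = max(1, round(keep_ratio * len(regions)))
    keep = set(sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k])
    return [list(r) if i in keep else [0.0] * len(r) for i, r in enumerate(regions)]


def momentum_update(key_params, query_params, m=0.999):
    """Assumed MoCo-style key-encoder update: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1 - m) * q for k, q in zip(key_params, query_params)]


class MemoryBank:
    """Fixed-size FIFO bank of normalized key embeddings used as extra negatives;
    once `size` is reached, the oldest keys are dropped."""

    def __init__(self, size):
        self.bank = deque(maxlen=size)

    def enqueue(self, keys):
        for key in keys:
            self.bank.append(l2_normalize(key))

    def negatives(self):
        return list(self.bank)


def weighted_info_nce(anchor, positive, negatives, tau=0.07, beta=1.0):
    """InfoNCE with assumed hardness weights w_j = N * softmax(beta * s_j).
    More similar (harder) negatives get larger weights; beta = 0 gives plain InfoNCE."""
    if not negatives:
        raise ValueError("need at least one negative")
    s_pos = dot(anchor, positive) / tau
    s_neg = [dot(anchor, n) / tau for n in negatives]
    # log w_j, computed in log space so that no weight underflows to log(0)
    log_z = logsumexp([beta * s for s in s_neg])
    log_w = [math.log(len(s_neg)) + beta * s - log_z for s in s_neg]
    # loss = -log(e^{s_pos} / (e^{s_pos} + sum_j w_j * e^{s_j}))
    return logsumexp([s_pos] + [s + lw for s, lw in zip(s_neg, log_w)]) - s_pos
```

With `beta=0`, every weight equals 1 and the loss reduces to standard InfoNCE. With `beta > 0`, negatives that are more similar to the anchor dominate the denominator, which is one way to realize "a larger weight to a harder pair". Because the bank size is set independently of the batch size, enlarging it increases the number of negatives each anchor is compared against.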
Pages: 369–382
Number of pages: 14
Related papers (50 in total)
  • [31] Cross-modal hashing with semantic deep embedding
    Yan, Cheng
    Bai, Xiao
    Wang, Shuai
    Zhou, Jun
    Hancock, Edwin R.
    NEUROCOMPUTING, 2019, 337 : 58 - 66
  • [32] Binary Set Embedding for Cross-Modal Retrieval
    Yu, Mengyang
    Liu, Li
    Shao, Ling
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, 28 (12) : 2899 - 2910
  • [33] Cross-modal semantic autoencoder with embedding consensus
    Sun, Shengzi
    Guo, Binghui
    Mi, Zhilong
    Zheng, Zhiming
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [34] Deep Multigraph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval
    Zhu, Lei
    Zhang, Chengyuan
    Song, Jiayu
    Zhang, Shichao
    Tian, Chunwei
    Zhu, Xinghui
    IEEE MULTIMEDIA, 2022, 29 (03) : 17 - 26
  • [35] Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval
    Ge, Xuri
    Chen, Fuhai
    Xu, Songpei
    Tao, Fuxiang
    Jose, Joemon M.
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 1022 - 1031
  • [36] Discriminative Cross-Modal Hashing with Coupled Semantic Correlation
    Yan, S.-Y.
    Liu, C.-H.
    Jiang, A.-W.
    Ye, J.-H.
    Wang, M.-W.
    Jisuanji Xuebao/Chinese Journal of Computers, 2019, 42 (01) : 164 - 175
  • [37] Discriminative Correlation Quantization for Cross-Modal Similarity Retrieval
    Tang, Jun
    Li, XuanMeng
    Wang, Nian
    Zhu, Ming
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 700 - 710
  • [38] Discriminative correlation hashing for supervised cross-modal retrieval
    Lu, Xu
    Zhang, Huaxiang
    Sun, Jiande
    Wang, Zhenhua
    Guo, Peilian
    Wan, Wenbo
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2018, 65 : 221 - 230
  • [39] Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    Shen, Heng Tao
    Li, Xuelong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2017, 26 (05) : 2494 - 2507
  • [40] Learning latent hash codes with discriminative structure preserving for cross-modal retrieval
    Zhang, Donglin
    Wu, Xiao-Jun
    Yu, Jun
    PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (01) : 283 - 297