Semantic-enhanced discriminative embedding learning for cross-modal retrieval

Cited by: 1
Authors
Pan, Hao [1, 2]
Huang, Jun [1, 2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Shanghai Adv Res Inst, Shanghai 201210, Peoples R China
Keywords
Cross-modal retrieval; Semantic enhanced; Erasing; Metric learning
DOI
10.1007/s13735-022-00237-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval requires retrieving text given an image and vice versa. Most existing methods use attention mechanisms to build advanced encoding networks and use ranking losses to reduce the modality gap. Although these methods have achieved remarkable performance, they still suffer from drawbacks that hinder the model from learning discriminative semantic embeddings. For example, the attention mechanism may assign larger weights to irrelevant parts than to relevant parts, which prevents the model from learning a discriminative attention distribution. In addition, traditional ranking losses may disregard relatively discriminative information because they lack appropriate hardest-negative mining and information weighting schemes. To alleviate these issues, this paper proposes a novel semantic-enhanced discriminative embedding learning method, consisting mainly of three modules, to enhance the discriminative ability of the model. The attention-guided erasing module enables the attention model to pay more attention to the relevant parts and reduces interference from irrelevant parts by erasing non-attention parts. The large-scale negative sampling module leverages momentum-updated memory banks to expand the number of negative samples, which increases the probability that the hardest negative is sampled. Moreover, the weighted InfoNCE loss module uses a weighting scheme that assigns larger weights to harder pairs. We evaluate the proposed modules by integrating them into three existing cross-modal retrieval models. Extensive experiments demonstrate that integrating each proposed module into the existing models steadily improves the performance of all of them.
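As a rough illustration of the last two modules described in the abstract, the sketch below combines a memory bank of extra negatives with a weighted InfoNCE loss. It is not the authors' implementation: the function names, the FIFO bank update, the MoCo-style momentum rule, and the exponential weighting controlled by `beta` are all assumptions made for this example, since the abstract does not specify the exact formulas.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def momentum_update(key_params, query_params, m=0.999):
    # Assumed MoCo-style rule: the key encoder slowly tracks the query encoder.
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

def enqueue(bank, new_keys):
    # Assumed FIFO memory bank: drop the oldest rows, append the newest keys.
    # Assumes the batch is smaller than the bank, so the bank size is constant.
    return np.concatenate([bank[new_keys.shape[0]:], new_keys], axis=0)

def weighted_info_nce(img, txt, bank, temperature=0.07, beta=2.0):
    # img, txt: (B, D) matched image/text embeddings.
    # bank: (K, D) text embeddings from the memory bank, used as negatives.
    img, txt, bank = l2_normalize(img), l2_normalize(txt), l2_normalize(bank)
    pos = np.sum(img * txt, axis=1, keepdims=True)   # (B, 1) positive similarity
    neg = img @ bank.T                               # (B, K) negative similarities
    # Hypothetical weighting: harder (more similar) negatives get larger
    # weights; weights average to 1, so beta=0 recovers plain InfoNCE.
    w = np.exp(beta * (neg - neg.max(axis=1, keepdims=True)))
    w = w / w.mean(axis=1, keepdims=True)
    pos_term = np.exp(pos / temperature)
    neg_term = np.sum(w * np.exp(neg / temperature), axis=1, keepdims=True)
    return float(np.mean(-np.log(pos_term / (pos_term + neg_term))))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = img + 0.1 * rng.normal(size=(4, 8))   # noisy matches of the images
bank = rng.normal(size=(16, 8))             # stand-in for stored negatives
plain_loss = weighted_info_nce(img, txt, bank, beta=0.0)
weighted_loss = weighted_info_nce(img, txt, bank, beta=2.0)
bank = enqueue(bank, txt)
```

With this particular weighting, the weighted loss is never smaller than the unweighted one, because extra weight is placed on the negatives that already contribute most to the denominator.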
Pages: 369-382
Page count: 14
Related papers
50 in total
  • [21] Cross-Modal Retrieval Using Multiordered Discriminative Structured Subspace Learning
    Zhang, Liang
    Ma, Bingpeng
    Li, Guorong
    Huang, Qingming
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19 (06) : 1220 - 1233
  • [22] Correlation embedding semantic-enhanced hashing for multimedia retrieval
    Big Data Institute, School of Computer Science and Engineering, Central South University, Hunan, ChangSha
    410000, China
    Unknown
    TN
    37235, United States
    Image Vision Comput, 1600, (February 2025):
  • [23] Discriminative Dictionary Learning With Common Label Alignment for Cross-Modal Retrieval
    Deng, Cheng
    Tang, Xu
    Yan, Junchi
    Liu, Wei
    Gao, Xinbo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (02) : 208 - 218
  • [24] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
    Hua, Yan
    Du, Jianhe
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 252 - 255
  • [25] Discriminative latent semantics-preserving similarity embedding hashing for cross-modal retrieval
    Chen Y.
    Tan J.
    Yang Z.
    Cheng Y.
    Chen R.
    Neural Computing and Applications, 2024, 36 (18) : 10655 - 10680
  • [26] Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service
    Xie, Zhongwei
    Liu, Ling
    Wu, Yanzhao
    Li, Lin
    Zhong, Luo
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (06) : 3304 - 3316
  • [27] Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
    Xu, Xing
    Song, Jingkuan
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Huang, Zi
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 46 - 54
  • [28] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627
  • [29] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300
  • [30] Cross-modal semantic autoencoder with embedding consensus
    Shengzi Sun
    Binghui Guo
    Zhilong Mi
    Zhiming Zheng
    Scientific Reports, 11