Deep cross-modal hashing with multi-task latent space learning

Times Cited: 2
Authors
Wu, Song [1 ]
Yuan, Xiang [1 ]
Xiao, Guoqiang [1 ]
Lew, Michael S. [2 ]
Gao, Xinbo [3 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing, Peoples R China
[2] Leiden Univ, Liacs Media Lab, Leiden, Netherlands
[3] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing, Peoples R China
Keywords
Cross-modal retrieval; Deep hashing; Semantic dependency; Knowledge transfer; BINARY-CODES; REPRESENTATION;
DOI
10.1016/j.engappai.2024.108944
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
Cross-modal Hashing (CMH) retrieval aims to mutually search data across heterogeneous modalities by projecting the original modality data into a common Hamming space, with the significant advantages of low storage and computing costs. However, CMH remains challenging for multi-label cross-modal datasets. Firstly, content similarity is inevitably preserved only deficiently under the representation of short binary codes. Secondly, different semantics are treated independently while their co-occurrences are neglected, reducing retrieval quality. Thirdly, the commonly used metric learning objective is ineffective at capturing similarity information at a fine-grained level, leading to imprecise preservation of such information. Therefore, we propose a Deep Cross-Modal Hashing with Multi-Task Latent Space Learning (DMLSH) framework to tackle these bottlenecks. To extract distinctive features with diverse characteristics from heterogeneous data more thoroughly, DMLSH is designed to preserve three different types of knowledge: first, semantic relevance and co-occurrence, captured by integrating an attention module with a Long Short-Term Memory (LSTM) layer; second, highly precise pairwise correlation, which quantifies semantic similarity with self-paced optimization; and third, pairwise similarity information discovered by a self-supervised semantic network from the perspective of probabilistic knowledge transfer. The abundant knowledge from these latent spaces is seamlessly refined and fused into a common Hamming space by a hashing attention mechanism, which strengthens the discrimination of the hash codes and eliminates the heterogeneity between modalities. Exhaustive experiments demonstrate the state-of-the-art performance of the proposed DMLSH on four mainstream cross-modal retrieval benchmarks.
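The abstract describes three task-specific latent spaces whose features are fused into a common Hamming space by a hashing attention mechanism. As a rough illustration only (not the authors' implementation), the PyTorch-style sketch below shows how several latent representations could be attention-weighted, fused, and relaxed into binary-like codes; all names and dimensions (AttentiveHashFusion, latent_dim, num_tasks, hash_bits) are hypothetical.

import torch
import torch.nn as nn

class AttentiveHashFusion(nn.Module):
    """Illustrative fusion of task-specific latent vectors into relaxed hash codes."""
    def __init__(self, latent_dim: int = 512, num_tasks: int = 3, hash_bits: int = 64):
        super().__init__()
        self.score = nn.Linear(latent_dim, 1)               # one attention score per latent space
        self.hash_layer = nn.Linear(latent_dim, hash_bits)  # fused latent -> continuous hash logits

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, num_tasks, latent_dim), one row per latent space
        # (e.g. LSTM/attention features, pairwise-similarity features, distilled features).
        weights = torch.softmax(self.score(latents), dim=1)  # (batch, num_tasks, 1)
        fused = (weights * latents).sum(dim=1)               # (batch, latent_dim)
        return torch.tanh(self.hash_layer(fused))            # relaxed codes in (-1, 1)

# Usage: relaxed codes are binarized with sign() for retrieval.
model = AttentiveHashFusion()
codes = model(torch.randn(8, 3, 512))
binary_codes = torch.sign(codes)                             # (8, 64) codes in {-1, +1}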
Pages: 17
Related Papers
50 records
  • [1] CROSS-MODAL DEEP METRIC LEARNING WITH MULTI-TASK REGULARIZATION
    Huang, Xin
    Peng, Yuxin
    2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 943 - 948
  • [2] Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval
    Xie, De
    Deng, Cheng
    Li, Chao
    Liu, Xianglong
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 3626 - 3637
  • [3] Deep Cross-Modal Hashing
    Jiang, Qing-Yuan
    Li, Wu-Jun
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 3270 - 3278
  • [4] Multi-task Learning for Deep Semantic Hashing
    Ma, Lei
    Li, Hongliang
    Wu, Qingbo
    Shang, Chao
    Ngan, King Ngi
    2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP), 2018
  • [5] Learning a maximized shared latent factor for cross-modal hashing
    Wang, Song
    Zhao, Huan
    Nai, Ke
    KNOWLEDGE-BASED SYSTEMS, 2021, 228
  • [6] Task-adaptive Asymmetric Deep Cross-modal Hashing
    Li, Fengling
    Wang, Tong
    Zhu, Lei
    Zhang, Zheng
    Wang, Xinhua
    KNOWLEDGE-BASED SYSTEMS, 2021, 219
  • [7] Deep Hashing Similarity Learning for Cross-Modal Retrieval
    Ma, Ying
    Wang, Meng
    Lu, Guangyun
    Sun, Yajun
    IEEE ACCESS, 2024, 12 : 8609 - 8618
  • [8] Contrastive Multi-Bit Collaborative Learning for Deep Cross-Modal Hashing
    Wu, Qingpeng
    Zhang, Zheng
    Liu, Yishu
    Zhang, Jingyi
    Nie, Liqiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (11) : 5835 - 5848
  • [9] FROM INTRA-MODAL TO INTER-MODAL SPACE: MULTI-TASK LEARNING OF SHARED REPRESENTATIONS FOR CROSS-MODAL RETRIEVAL
    Choi, Jaeyoung
    Larson, Martha
    Friedland, Gerald
    Hanjalic, Alan
    2019 IEEE FIFTH INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM 2019), 2019, : 1 - 10