Deep cross-modal hashing with multi-task latent space learning

Cited by: 2
Authors
Wu, Song [1 ]
Yuan, Xiang [1 ]
Xiao, Guoqiang [1 ]
Lew, Michael S. [2 ]
Gao, Xinbo [3 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing, Peoples R China
[2] Leiden Univ, Liacs Media Lab, Leiden, Netherlands
[3] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing, Peoples R China
Keywords
Cross-modal retrieval; Deep hashing; Semantic dependency; Knowledge transfer; Binary codes; Representation
DOI
10.1016/j.engappai.2024.108944
Chinese Library Classification
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
Cross-Modal Hashing (CMH) retrieval aims to mutually search data across heterogeneous modalities by projecting the original modality data into a common Hamming space, offering the significant advantages of low storage and computation costs. However, CMH remains challenging on multi-label cross-modal datasets. First, content similarity is inevitably preserved only imperfectly under short binary codes. Second, different semantics are treated independently while their co-occurrences are neglected, which reduces retrieval quality. Third, the commonly used metric-learning objective fails to capture similarity information at a fine-grained level, leading to imprecise preservation of that information. We therefore propose a Deep Cross-Modal Hashing with Multi-Task Latent Space Learning (DMLSH) framework to tackle these bottlenecks. To mine distinctive features with diverse characteristics from heterogeneous data more thoroughly, DMLSH is designed to preserve three types of knowledge: first, semantic relevance and co-occurrence, captured by integrating an attention module with a Long Short-Term Memory (LSTM) layer; second, highly precise pairwise correlation, which quantifies semantic similarity with self-paced optimization; and third, pairwise similarity information discovered by a self-supervised semantic network from the perspective of probabilistic knowledge transfer. The abundant knowledge in these latent spaces is seamlessly refined and fused into a common Hamming space by a hashing attention mechanism, which improves the discriminability of the hash codes and eliminates the heterogeneity between modalities. Exhaustive experiments demonstrate state-of-the-art performance of the proposed DMLSH on four mainstream cross-modal retrieval benchmarks.
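The probabilistic knowledge-transfer idea mentioned in the abstract can be sketched as matching neighborhood distributions: a teacher network's pairwise similarities and the learned hash codes' pairwise similarities are each turned into per-sample probability distributions, and their divergence is minimized. The sketch below is an illustrative reconstruction under assumptions, not the paper's actual formulation; the function names, the cosine-similarity choice, and the softmax temperature are all assumptions.

```python
import numpy as np

def pairwise_probabilities(feats, temperature=1.0):
    """Turn a batch of feature vectors into row-wise neighbor distributions:
    softmax over pairwise cosine similarities, with self-similarity excluded."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = normed @ normed.T / temperature
    np.fill_diagonal(sims, -np.inf)          # a sample is not its own neighbor
    sims -= sims.max(axis=1, keepdims=True)  # numerical stability for exp
    exp = np.exp(sims)                       # exp(-inf) -> 0 on the diagonal
    return exp / exp.sum(axis=1, keepdims=True)

def transfer_loss(teacher_feats, student_codes, eps=1e-12):
    """KL(teacher || student) averaged over the batch: the hash codes should
    reproduce the teacher's neighborhood structure."""
    p = pairwise_probabilities(teacher_feats)
    q = pairwise_probabilities(student_codes)
    mask = p > 0                             # skip the zeroed diagonal entries
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))) / len(p))
```

When the student's codes induce the same neighbor distributions as the teacher's features, the loss is zero; any mismatch in neighborhood structure yields a positive penalty, which is why a KL objective of this shape is a common vehicle for transferring similarity knowledge into short binary codes.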
Pages: 17
Related Papers (50 in total)
  • [41] Discriminative Latent Feature Space Learning for Cross-Modal Retrieval
    Tang, Xu
    Deng, Cheng
    Gao, Xinbo
    ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2015, : 507 - 510
  • [42] MedPrompt: Cross-modal Prompting for Multi-task Medical Image Translation
    Chen, Xuhang
    Luo, Shenghong
    Pun, Chi-Man
    Wang, Shuqiang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT XIV, 2025, 15044 : 61 - 75
  • [43] Multi-Scale Correlation for Sequential Cross-modal Hashing Learning
    Ye, Zhaoda
    Peng, Yuxin
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 852 - 860
  • [44] Multimodal sentiment analysis model based on multi-task learning and stacked cross-modal Transformer
    Chen Q.-H.
    Sun J.-J.
    Lou Y.-B.
    Fang Z.-J.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2023, 57 (12): : 2421 - 2429
  • [45] Cross-modal photo-caricature face recognition based on dynamic multi-task learning
    Ming, Zuheng
    Burie, Jean-Christophe
    Luqman, Muhammad Muzzamil
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2021, 24 (1-2) : 33 - 48
  • [46] Cross-modal photo-caricature face recognition based on dynamic multi-task learning
    Zuheng Ming
    Jean-Christophe Burie
    Muhammad Muzzamil Luqman
    International Journal on Document Analysis and Recognition (IJDAR), 2021, 24 : 33 - 48
  • [47] Multi-Modal Medical Image Matching Based on Multi-Task Learning and Semantic-Enhanced Cross-Modal Retrieval
    Zhang, Yilin
    TRAITEMENT DU SIGNAL, 2023, 40 (05) : 2041 - 2049
  • [48] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171
  • [49] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Jun Yu
    Xiao-Jun Wu
    Donglin Zhang
    Cognitive Computation, 2022, 14 : 1159 - 1171
  • [50] Latent Semantic Sparse Hashing for Cross-Modal Similarity Search
    Zhou, Jile
    Ding, Guiguang
    Guo, Yuchen
    SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014, : 415 - 424