Deep cross-modal hashing with multi-task latent space learning

Cited by: 2
Authors
Wu, Song [1 ]
Yuan, Xiang [1 ]
Xiao, Guoqiang [1 ]
Lew, Michael S. [2 ]
Gao, Xinbo [3 ]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing, Peoples R China
[2] Leiden Univ, Liacs Media Lab, Leiden, Netherlands
[3] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing, Peoples R China
Keywords
Cross-modal retrieval; Deep hashing; Semantic dependency; Knowledge transfer; Binary codes; Representation
DOI
10.1016/j.engappai.2024.108944
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Cross-modal Hashing (CMH) retrieval aims to mutually retrieve data across heterogeneous modalities by projecting the original modality data into a common Hamming space, offering the significant advantages of low storage and computational cost. However, CMH remains challenging on multi-label cross-modal datasets. First, preserving content similarity is inevitably deficient under the representation of short-length binary codes. Second, different semantics are treated independently while their co-occurrences are neglected, reducing retrieval quality. Third, the commonly used metric-learning objective is ineffective at capturing similarity information at a fine-grained level, leading to imprecise preservation of such information. We therefore propose a Deep Cross-Modal Hashing with Multi-Task Latent Space Learning (DMLSH) framework to tackle these bottlenecks. To more thoroughly extract distinctive features with diverse characteristics from heterogeneous data, DMLSH is designed to preserve three types of knowledge: first, semantic relevance and co-occurrence, captured by integrating an attention module with a Long Short-Term Memory (LSTM) layer; second, highly precise pairwise correlation, obtained by quantifying semantic similarity with self-paced optimization; and third, pairwise similarity information discovered by a self-supervised semantic network from the perspective of probabilistic knowledge transfer. The abundant knowledge from these latent spaces is seamlessly refined and fused into a common Hamming space by a hashing attention mechanism, facilitating discriminative hash codes and eliminating the heterogeneity between modalities. Exhaustive experiments demonstrate the state-of-the-art performance of the proposed DMLSH on four mainstream cross-modal retrieval benchmarks.
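The core idea the abstract builds on can be illustrated with a minimal sketch: each modality is projected into a shared Hamming space, quantized to binary codes with the sign function, and retrieval is done by Hamming distance. This is a generic cross-modal hashing illustration, not the paper's DMLSH; the random projection matrices `W_img` and `W_txt` and the 16-bit code length are assumptions standing in for learned deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_codes(features, W):
    """Project real-valued features into a common Hamming space and
    quantize to +/-1 binary codes via the sign function."""
    return np.sign(features @ W)

def hamming_distance(a, b):
    """Pairwise Hamming distance between +/-1 code matrices.
    For codes of length L: d(x, y) = (L - x . y) / 2."""
    L = a.shape[-1]
    return (L - a @ b.T) / 2

# Toy setup: 512-d image features, 300-d text features, 16-bit codes.
# In DMLSH these projections would be learned deep networks; here they
# are random matrices used purely for illustration.
W_img = rng.standard_normal((512, 16))
W_txt = rng.standard_normal((300, 16))

imgs = rng.standard_normal((5, 512))   # 5 database images
txts = rng.standard_normal((3, 300))   # 3 text queries

B_img = hash_codes(imgs, W_img)        # (5, 16) binary codes
B_txt = hash_codes(txts, W_txt)        # (3, 16) binary codes

# Cross-modal retrieval: rank database images by Hamming distance
# to each text query's code.
d = hamming_distance(B_txt, B_img)     # (3, 5) distance matrix
ranking = np.argsort(d[0])             # nearest images for query 0
```

Because distances reduce to bit operations on short codes, both storage and query cost stay low, which is the advantage the abstract refers to; the learning problem the paper addresses is making such short codes preserve fine-grained, multi-label similarity.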
Pages: 17
Related papers
50 records
  • [21] Simulating cross-modal medical images using multi-task adversarial learning of a deep convolutional neural network
    Kumar, Vikas
    Sharma, Manoj
    Jehadeesan, R.
    Venkatraman, B.
    Sheet, Debdoot
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (04)
  • [22] Multi-task clustering ELM for VIS-NIR cross-modal feature learning
    Jin, Yi
    Li, Jie
    Lang, Congyan
    Ruan, Qiuqi
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2017, 28 (03) : 905 - 920
  • [24] Deep Cross-Modal Hashing With Hashing Functions and Unified Hash Codes Jointly Learning
    Tu, Rong-Cheng
    Mao, Xian-Ling
    Ma, Bing
    Hu, Yong
    Yan, Tan
    Wei, Wei
    Huang, Heyan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02) : 560 - 572
  • [25] Discrete Latent Factor Model for Cross-Modal Hashing
    Jiang, Qing-Yuan
    Li, Wu-Jun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (07) : 3490 - 3501
  • [26] Deep adversarial multi-label cross-modal hashing algorithm
    Yang, Xiaohan
    Wang, Zhen
    Liu, Wenhao
    Chang, Xinyi
    Wu, Nannan
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12
  • [27] Deep Multi-Level Semantic Hashing for Cross-Modal Retrieval
    Ji, Zhenyan
    Yao, Weina
    Wei, Wei
    Song, Houbing
    Pi, Huaiyu
    IEEE ACCESS, 2019, 7 : 23667 - 23674
  • [28] Deep medical cross-modal attention hashing
    Zhang, Yong
    Ou, Weihua
    Shi, Yufeng
    Deng, Jiaxin
    You, Xinge
    Wang, Anzhi
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2022, 25 (04): 1519 - 1536
  • [29] Unsupervised Deep Fusion Cross-modal Hashing
    Huang, Jiaming
    Min, Chen
    Jing, Liping
    ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019, : 358 - 366
  • [30] Deep Binary Reconstruction for Cross-Modal Hashing
    Hu, Di
    Nie, Feiping
    Li, Xuelong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (04) : 973 - 985