Deep Multi-Modal Hashing With Semantic Enhancement for Multi-Label Micro-Video Retrieval

Cited by: 1
Authors
Jing, Peiguang [1]
Sun, Haoyi [2]
Nie, Liqiang [3]
Li, Yun [4, 5]
Su, Yuting [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Future Technol, Tianjin 300072, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[4] Guangxi Univ Finance & Econ, Sch Big Data & Artificial Intelligence, Guangxi 530001, Peoples R China
[5] Guangxi Key Lab Big Data Finance & Econ, Nanning 530001, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Hash functions; Encoding; Representation learning; Convolutional neural networks; Quantization (signal); Kernel; Deep hashing; micro-video retrieval; multi-label; multi-modality; MAXIMUM-LIKELIHOOD; QUANTIZATION;
DOI
10.1109/TKDE.2023.3337077
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The pressing need for low storage and high efficiency has significantly propelled the advancement of deep hashing techniques for large-scale search and retrieval tasks. As one of the most prevalent forms of user-generated content, micro-videos usually exhibit complicated multi-modal behaviors, which become even more challenging in multi-label retrieval. Existing multi-modal hashing methods tend to prioritize complementarity and consistency in multi-modal fusion while neglecting the completeness problem. In this paper, we propose a deep multi-modal hashing with semantic enhancement (DMHSE) method that effectively integrates complete multi-modal representation learning with discriminative binary coding through collaboration between two distinct encoders, FoldCoder and HashCoder. FoldCoder translates latent multi-modal representation learning into a degradation process by mimicking data transmission. Further, it incorporates a prompt learning paradigm to maximize the utilization of multi-label semantics for guiding representation learning. HashCoder combines pairwise and central constraints to ensure more discriminative hashing results. The pairwise constraint preserves the original local relevance structure, while the central constraint tackles the problem of semantic ambiguity in multi-label data by leveraging the global label distribution. Experimental results demonstrate that DMHSE achieves superior performance in multi-label micro-video retrieval tasks.
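The abstract names pairwise and central constraints but does not give their formulas. The following is a minimal Python sketch of one common way such constraints are written in the deep hashing literature, not the DMHSE formulation: the pairwise term is a generic negative log-likelihood over inner products of relaxed codes, and the central term is a squared distance to a per-sample hash center. All function names, the rule "two items are similar if they share at least one label", and the way a multi-label center is formed (sign of the summed class centers) are assumptions made for illustration.

```python
import math


def shares_label(y_i, y_j):
    """Assumed similarity rule: two multi-hot label vectors are similar
    if at least one label is active in both."""
    return any(a and b for a, b in zip(y_i, y_j))


def pairwise_loss(u_i, u_j, similar):
    """Negative log-likelihood for one pair of relaxed codes in [-1, 1].

    theta is half the inner product; the loss is
    log(1 + exp(theta)) - s * theta, with s = 1 for similar pairs.
    """
    theta = 0.5 * sum(a * b for a, b in zip(u_i, u_j))
    # Numerically stable softplus: log(1 + exp(theta)).
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return softplus - (theta if similar else 0.0)


def target_center(labels, class_centers):
    """Hypothetical multi-label center: sign of the sum of the centers
    of all active labels, with ties broken toward +1."""
    bits = len(class_centers[0])
    sums = [0] * bits
    for active, center in zip(labels, class_centers):
        if active:
            for k in range(bits):
                sums[k] += center[k]
    return [1 if s >= 0 else -1 for s in sums]


def central_loss(u, center):
    """Mean squared distance between a relaxed code and its center."""
    return sum((a - c) ** 2 for a, c in zip(u, center)) / len(u)


# Toy example: two classes with 4-bit centers (values are made up).
class_centers = [[1, 1, -1, -1], [1, -1, 1, -1]]
labels_a, labels_b = [1, 0], [1, 1]
code_a = [0.9, 0.8, -0.7, -0.9]
code_b = [0.8, 0.6, 0.5, -0.9]

total = (
    pairwise_loss(code_a, code_b, shares_label(labels_a, labels_b))
    + central_loss(code_a, target_center(labels_a, class_centers))
    + central_loss(code_b, target_center(labels_b, class_centers))
)
```

In this sketch the pairwise term only looks at local pair relations, while the central term pulls each code toward a target derived from all of its labels, which mirrors the local versus global split described in the abstract. How the paper weights the two terms or uses the label distribution is not stated in this record.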
Pages: 5080-5091
Page count: 12