Cross-media retrieval via fusing multi-modality and multi-grained data

Cited by: 0
Authors
Liu, Z. [1 ,2 ]
Yuan, S. [1 ,2 ]
Pei, X. [1 ,2 ]
Gao, S. [1 ,2 ]
Han, H. [1 ,2 ]
Affiliations
[1] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Jinan 250014, Shandong, Peoples R China
[2] Shandong Univ Finance & Econ, Shandong Prov Key Lab Digital Media Technol, Jinan 250014, Shandong, Peoples R China
Keywords
Cross-media retrieval; Multi-modality data; Multi-grained data; Multi-margin triplet loss; Margin-set
DOI
10.24200/sci.2023.59834.6456
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Traditional cross-media retrieval methods mainly focus on coarse-grained data that reflect global characteristics while ignoring fine-grained descriptions of local details. They also cannot accurately describe the correlations between the anchor and irrelevant data. This paper addresses both problems by fusing coarse-grained and fine-grained features and by introducing a multi-margin triplet loss, built on a dual framework: (1) Framework I, a multi-grained data fusion framework based on a Deep Belief Network, and (2) Framework II, a multi-modality data fusion framework based on the multi-margin triplet loss function. In Framework I, the coarse-grained and fine-grained features are fused by a joint Restricted Boltzmann Machine and then input into Framework II. In Framework II, we propose the multi-margin triplet loss, under which data belonging to different modalities and semantic categories are stepped away from the anchor in a multi-margin way. Experimental results show that the proposed method achieves better cross-media retrieval performance than other methods on different datasets. Furthermore, ablation experiments verify that the proposed multi-grained fusion strategy and the multi-margin triplet loss function are both effective. (c) 2023 Sharif University of Technology. All rights reserved.
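The abstract does not give the formula for the multi-margin triplet loss, so the following is only an illustrative sketch of the general idea: a standard hinge-style triplet loss in which each negative sample is assigned a margin drawn from a margin-set according to how it differs from the anchor. The group names, the margin values, the Euclidean distance, and the function `multi_margin_triplet_loss` are all assumptions made for this example and are not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def multi_margin_triplet_loss(anchor, positive, negatives, margin_set):
    """Average hinge loss over negatives, each with its own margin.

    negatives is a list of (embedding, group) pairs, where group is a
    key into margin_set. A larger margin asks for that negative to sit
    farther from the anchor than the positive does.
    """
    d_pos = euclidean(anchor, positive)
    total = 0.0
    for embedding, group in negatives:
        d_neg = euclidean(anchor, embedding)
        total += max(0.0, d_pos - d_neg + margin_set[group])
    return total / len(negatives)


# Hypothetical margin-set: the grouping and the values are assumptions.
MARGIN_SET = {
    "same_modality_diff_category": 0.2,
    "diff_modality_diff_category": 0.5,
}

anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # distance 1.0 from the anchor
negatives = [
    ([0.0, 2.0], "same_modality_diff_category"),  # distance 2.0, hinge is 0
    ([1.0, 0.0], "diff_modality_diff_category"),  # distance 1.0, hinge is 0.5
]

loss = multi_margin_triplet_loss(anchor, positive, negatives, MARGIN_SET)
print(loss)
```

In this toy case the first negative already clears its smaller margin and contributes nothing, while the second violates its larger margin, which is the "stepped" behaviour the abstract describes in words.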
Pages: 1645-1669
Page count: 25
Related papers
50 in total
  • [11] Learning to disentangle and fuse for fine-grained multi-modality ship image retrieval
    Xiong, Wei
    Xiong, Zhenyu
    Xu, Pingliang
    Cui, Yaqi
    Li, Haoran
    Huang, Linzhou
    Yang, Ruining
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [12] Multi-grained unsupervised evidence retrieval for question answering
    You, Hao
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (28): : 21247 - 21257
  • [13] Fusing inherent and external knowledge with nonlinear learning for cross-media retrieval
    Zhang, Hong
    Liu, Yun
    Ma, Zhigang
    NEUROCOMPUTING, 2013, 119 : 10 - 16
  • [14] Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion
    Zhou, Yangming
    Yang, Yuzhou
    Ying, Qichao
    Qian, Zhenxing
    Zhang, Xinpeng
    PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 343 - 352
  • [15] Text-video retrieval re-ranking via multi-grained cross attention and frozen image encoders
    Dai, Zuozhuo
    Cheng, Kaihui
    Shao, Fangtao
    Dong, Zilong
    Zhu, Siyu
    PATTERN RECOGNITION, 2025, 159
  • [16] Semantic retrieval with enhanced matchmaking and multi-modality ontology
    Wang, Huan
    Chia, Liang-Tien
    Liu, Song
    2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, 2007, : 516 - 519
  • [17] Multi-modality deep forest for hand motion recognition via fusing sEMG and acceleration signals
    Fang, Yinfeng
    Lu, Huiqiao
    Liu, Han
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (04) : 1119 - 1131
  • [18] Multi-modality deep forest for hand motion recognition via fusing sEMG and acceleration signals
    Yinfeng Fang
    Huiqiao Lu
    Han Liu
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 1119 - 1131
  • [19] Adversarial Multi-Grained Embedding Network for Cross-Modal Text-Video Retrieval
    Han, Ning
    Chen, Jingjing
    Zhang, Hao
    Wang, Huanwen
    Chen, Hao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (02)
  • [20] Image retrieval ++—web image retrieval with an enhanced multi-modality ontology
    Huan Wang
    Liang-Tien Chia
    Song Liu
    Multimedia Tools and Applications, 2008, 39 : 189 - 215