Cross-media Multi-level Alignment with Relation Attention Network

Citations: 0
Authors
Qi, Jinwei [1]
Peng, Yuxin [1]
Yuan, Yuxin [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
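Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract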
With the rapid growth of multimedia data, such as images and text, effectively correlating and retrieving data of different media types is a highly challenging problem. Naturally, when correlating an image with a textual description, people focus not only on the alignment between discriminative image regions and key words, but also on the relations within the visual and textual context. Relation understanding is essential for cross-media correlation learning, yet prior cross-media retrieval works ignore it. To address this issue, we propose the Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose a visual-language relation attention model to explore both fine-grained patches and their relations across different media types. We aim not only to exploit cross-media fine-grained local information, but also to capture the intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types, which mutually reinforce one another to learn more precise cross-media correlation. We conduct experiments on two cross-media datasets and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.
Pages: 892 - 898
Number of pages: 7
Related papers
50 in total
  • [21] Multi-level attention fusion network assisted by relative entropy alignment for multimodal speech emotion recognition
    Lei, Jianjun
    Wang, Jing
    Wang, Ying
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8478 - 8490
  • [22] Multi-Level Cross-Modal Semantic Alignment Network for Video-Text Retrieval
    Nian, Fudong
    Ding, Ling
    Hu, Yuxia
    Gu, Yanhong
    MATHEMATICS, 2022, 10 (18)
  • [23] Multi-step Domain Adaption Image Classification Network via Attention Mechanism and Multi-level Feature Alignment
    Xiang, Yaoci
    Zhao, Chong
    Wei, Xing
    Lu, Yang
    Liu, Shaofan
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS, WASA 2021, PT III, 2021, 12939 : 11 - 19
  • [24] Discrete Semantic Alignment Hashing for Cross-Media Retrieval
    Yao, Tao
    Kong, Xiangwei
    Fu, Haiyan
    Tian, Qi
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (12) : 4896 - 4907
  • [25] Multi-level adversarial attention cross-modal hashing
    Wang, Benhui
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 117
  • [26] Self-Attention based fine-grained cross-media hybrid network
    Shan, Wei
    Huang, Dan
    Wang, Jiangtao
    Zou, Feng
    Li, Suwen
    PATTERN RECOGNITION, 2022, 130
  • [27] Fine-grained Cross-media Representation Learning with Deep Quantization Attention Network
    Liang, Meiyu
    Du, Junping
    Liu, Wu
    Xue, Zhe
    Geng, Yue
    Yang, Congxian
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1313 - 1321
  • [28] Multi-level spatial attention network for image data segmentation
    Guo, Jun
    Jiang, Zhixiong
    Jiang, Dingchao
    INTERNATIONAL JOURNAL OF EMBEDDED SYSTEMS, 2021, 14 (03) : 289 - 299
  • [29] Speech Emotion Recognition via Multi-Level Attention Network
    Liu, Ke
    Wang, Dekui
    Wu, Dongya
    Liu, Yutao
    Feng, Jun
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2278 - 2282
  • [30] Wavelet Multi-Level Attention Capsule Network for Texture Classification
    Tao, Zhiyong
    Wei, Tong
    Li, Jie
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1215 - 1219