Cross-media Multi-level Alignment with Relation Attention Network

Cited by: 0
Authors
Qi, Jinwei [1]
Peng, Yuxin [1]
Yuan, Yuxin [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
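Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract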
With the rapid growth of multimedia data, such as images and text, effectively correlating and retrieving data of different media types is a highly challenging problem. When correlating an image with a textual description, people naturally focus not only on the alignment between discriminative image regions and key words, but also on the relations within the visual and textual context. Relation understanding is essential for cross-media correlation learning, yet it is ignored by prior cross-media retrieval works. To address this issue, we propose the Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose a visual-language relation attention model to explore both the fine-grained patches of different media types and the relations among them. We aim not only to exploit cross-media fine-grained local information, but also to capture the intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types, which can boost one another to learn more precise cross-media correlation. We conduct experiments on 2 cross-media datasets and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.
Pages: 892-898
Number of pages: 7