Cross-media Multi-level Alignment with Relation Attention Network

Times cited: 0
Authors
Qi, Jinwei [1]
Peng, Yuxin [1]
Yuan, Yuxin [1]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
None listed
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid growth of multimedia data such as images and text, effectively correlating and retrieving data of different media types is a highly challenging problem. Naturally, when correlating an image with a textual description, people attend not only to the alignment between discriminative image regions and key words, but also to the relations present in the visual and textual context. Relation understanding is essential for cross-media correlation learning, yet it has been ignored by prior cross-media retrieval work. To address this issue, we propose the Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose a visual-language relation attention model to explore both the fine-grained patches of different media types and the relations among them. We aim not only to exploit cross-media fine-grained local information, but also to capture intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types; these alignments mutually boost one another to learn more precise cross-media correlation. We conduct experiments on two cross-media datasets and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.
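The abstract does not give CRAN's equations. Purely to illustrate what combining "global, local and relation alignments" into one image-text score could look like, here is a minimal pure-Python sketch built on assumptions that are ours, not the paper's: features are already-extracted vectors, local alignment is a softmax attention of each word over image patches (text-to-image direction only), relation features are hypothetical element-wise sums of every pair of patches (or words), and the three levels are combined by a plain weighted sum. The function names, the temperature of 0.1 and the equal weights are illustrative choices.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical hyperparameter chosen for illustration; not taken from the paper.
TEMPERATURE = 0.1


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: List[float], temperature: float = TEMPERATURE) -> List[float]:
    """Numerically stable softmax with a temperature (lower = sharper)."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attended_alignment(queries: List[Vector], keys: List[Vector]) -> float:
    """Each query (e.g. a word) attends over all keys (e.g. image patches);
    the query is scored against its attention-weighted key vector, and the
    per-query scores are averaged."""
    if not queries or not keys:
        raise ValueError("need at least one query and one key")
    dim = len(keys[0])
    scores = []
    for q in queries:
        weights = softmax([cosine(q, k) for k in keys])
        attended = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        scores.append(cosine(q, attended))
    return sum(scores) / len(scores)


def pairwise_relations(parts: List[Vector]) -> List[List[float]]:
    """Hypothetical relation features: element-wise sum of every unordered pair.
    Returns an empty list when fewer than two parts are given."""
    return [[x + y for x, y in zip(a, b)] for a, b in combinations(parts, 2)]


def multi_level_score(img_global: Vector, txt_global: Vector,
                      patches: List[Vector], words: List[Vector],
                      weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of global, local and relation alignment scores.
    Requires at least two patches and two words for the relation level."""
    w_global, w_local, w_relation = weights
    global_score = cosine(img_global, txt_global)
    local_score = attended_alignment(words, patches)
    relation_score = attended_alignment(pairwise_relations(words),
                                        pairwise_relations(patches))
    return w_global * global_score + w_local * local_score + w_relation * relation_score


if __name__ == "__main__":
    patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    words = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
    print(multi_level_score([0.5, 0.5, 0.5], [0.45, 0.15, 0.4], patches, words))
```

A trained model would replace the fixed pairwise sums and equal weights with learned relation and fusion components; the sketch only shows how three similarity levels can feed a single ranking score for retrieval.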
Pages: 892-898
Number of pages: 7
Related papers
50 in total
  • [1] MAVA: Multi-Level Adaptive Visual-Textual Alignment by Cross-Media Bi-Attention Mechanism
    Peng, Yuxin
    Qi, Jinwei
    Zhuo, Yunkan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 2728 - 2741
  • [2] Cross-media Hash Retrieval Using Multi-head Attention Network
    Li, Zhixin
    Ling, Feng
    Xu, Chuansheng
    Zhang, Canlong
    Ma, Huifang
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 1290 - 1297
  • [3] Multi-Level Alignment Network for Cross-Domain Ship Detection
    Xu, Chujie
    Zheng, Xiangtao
    Lu, Xiaoqiang
    REMOTE SENSING, 2022, 14 (10)
  • [4] MLAN: Multi-Level Attention Network
    Qin, Peinuan
    Wang, Qinxuan
    Zhang, Yue
    Wei, Xueyao
    Gao, Meiguo
    IEEE ACCESS, 2022, 10 : 105437 - 105446
  • [5] Recursive Pyramid Network with Joint Attention for Cross-Media Retrieval
    Yuan, Yuxin
    Peng, Yuxin
    MULTIMEDIA MODELING, MMM 2018, PT I, 2018, 10704 : 405 - 416
  • [6] Visual Relation Detection with Multi-Level Attention
    Zheng, Sipeng
    Chen, Shizhe
    Jin, Qin
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 121 - 129
  • [7] Cross-Media Alignment of Names and Faces
    Pham, Phi The
    Moens, Marie-Francine
    Tuytelaars, Tinne
    IEEE TRANSACTIONS ON MULTIMEDIA, 2010, 12 (01) : 13 - 27
  • [8] Multi-level Alignment Network for Domain Adaptive Cross-modal Retrieval
    Dong, Jianfeng
    Long, Zhongzi
    Mao, Xiaofeng
    Lin, Changting
    He, Yuan
    Ji, Shouling
    NEUROCOMPUTING, 2021, 440 : 207 - 219
  • [9] Semantic enhancement and multi-level alignment network for cross-modal retrieval
    Chen, Jia
    Zhang, Hong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (40) : 88221 - 88243
  • [10] Matching images and texts with multi-head attention network for cross-media hashing retrieval
    Li, Zhixin
    Xie, Xiumin
    Ling, Feng
    Ma, Huifang
    Shi, Zhiping
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 106