Cross-media Multi-level Alignment with Relation Attention Network

Cited by: 0

Authors:

Qi, Jinwei [1]

Peng, Yuxin [1]

Yuan, Yuxin [1]

Affiliations:

[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China

Source:

PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid growth of multimedia data, such as images and text, effectively correlating and retrieving data of different media types is a highly challenging problem. Naturally, when correlating an image with a textual description, people focus not only on the alignment between discriminative image regions and key words, but also on the relations within the visual and textual context. Relation understanding is essential for cross-media correlation learning, yet it is ignored by prior cross-media retrieval works. To address this issue, we propose the Cross-media Relation Attention Network (CRAN) with multi-level alignment. First, we propose a visual-language relation attention model to explore both the fine-grained patches and their relations in different media types. We aim not only to exploit cross-media fine-grained local information, but also to capture the intrinsic relation information, which can provide complementary hints for correlation learning. Second, we propose cross-media multi-level alignment to explore global, local and relation alignments across different media types, which can mutually reinforce one another to learn more precise cross-media correlation. We conduct experiments on 2 cross-media datasets and compare with 10 state-of-the-art methods to verify the effectiveness of the proposed approach.
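
As a rough illustration of what combining global, local and relation alignments could look like, the sketch below scores an image-text pair at three levels and sums the results. It is not the CRAN architecture: the abstract does not specify the attention form, the feature extractors, or how the levels are fused, so the cosine scoring, the softmax attention, the equal weights, and all names (`attended_similarity`, `multi_level_similarity`, the dictionary keys) are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attended_similarity(image_items, text_items):
    """For each text item, attend over image items, then average.

    Assumption: softmax attention over cosine scores. The abstract
    does not state which attention form the paper uses.
    """
    per_item = []
    for t in text_items:
        sims = [cosine(t, r) for r in image_items]
        weights = softmax(sims)
        per_item.append(sum(w * s for w, s in zip(weights, sims)))
    return sum(per_item) / len(per_item)

def multi_level_similarity(image, text, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of global, local and relation similarities.

    The weights are hypothetical; equal weighting is an assumption.
    """
    g = cosine(image["global"], text["global"])
    l = attended_similarity(image["regions"], text["words"])
    r = attended_similarity(image["relations"], text["relations"])
    wg, wl, wr = weights
    return wg * g + wl * l + wr * r

# Toy features (made up for illustration, not from any dataset).
image = {
    "global": [1.0, 0.0],
    "regions": [[1.0, 0.0], [0.0, 1.0]],
    "relations": [[1.0, 1.0]],
}
text = {
    "global": [1.0, 0.0],
    "words": [[1.0, 0.0]],
    "relations": [[1.0, 1.0]],
}
score = multi_level_similarity(image, text)
```

In this toy setup the three levels contribute independently, so a pair that matches globally but not at the region or relation level receives a lower combined score than one that matches at all three.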

Pages: 892-898

Page count: 7

