Hierarchical cross-modal contextual attention network for visual grounding

Cited by: 0
|
Authors
Xin Xu
Gang Lv
Yining Sun
Yuxia Hu
Fudong Nian
Affiliations
[1] Hefei University,School of Advanced Manufacturing Engineering
[2] Hefei Comprehensive National Science Center,Institute of Artificial Intelligence
[3] Anhui Jianzhu University,Anhui International Joint Research Center for Ancient Architecture Intellisencing and Multi-Dimensional Modeling
[4] University of Science and Technology of China
[5] Chinese Academy of Sciences,School of Information Science and Technology
Source
Multimedia Systems | 2023 / Vol. 29
Keywords
Visual grounding; Transformer; Multi-modal attention; Deep learning;
DOI
Not available
Abstract
This paper explores the task of visual grounding (VG), which aims to localize regions of an image from sentence queries. VG has advanced significantly with Transformer-based frameworks, which capture image and text contexts without proposals. However, previous research has rarely explored hierarchical semantics or cross-interactions between the two uni-modal encoders. This paper therefore proposes a Hierarchical Cross-modal Contextual Attention Network (HCCAN) for the VG task. The HCCAN model comprises a visual-guided text contextual attention module, a text-guided visual contextual attention module, and a Transformer-based multi-modal feature fusion module. This approach not only captures intra-modality and inter-modality relationships through self-attention mechanisms but also captures the hierarchical semantics of textual and visual content in a common space. Experiments on four standard benchmarks, Flickr30K Entities, RefCOCO, RefCOCO+, and RefCOCOg, demonstrate the effectiveness of the proposed method. The code is publicly available at https://www.github.com/cutexin66/HCCAN.
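The paper's modules are not reproduced here, but the core operation that "visual-guided text" and "text-guided visual" attention modules of this kind build on is scaled dot-product cross-attention, where one modality supplies the queries and the other supplies keys and values. A minimal NumPy sketch follows; all names, shapes, and the single-head formulation are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention across modalities.

    queries: (n_q, d) features of the guiding modality (e.g. text tokens)
    keys, values: (n_k, d) features of the attended modality (e.g. visual regions)
    Returns (n_q, d): each query row is a weighted summary of `values`.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_k) pairwise affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values                  # convex combination of values

# Toy example: 4 text tokens attend over 9 visual region features, dim 16.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 16))
visual = rng.standard_normal((9, 16))
attended = cross_attention(text, visual, visual)
print(attended.shape)  # (4, 16)
```

Swapping the roles of `text` and `visual` gives the symmetric direction; in practice each direction would use learned query/key/value projections and multiple heads, which are omitted here for brevity.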
Pages: 2073–2083
Page count: 10
Related papers
50 records in total
  • [21] Multi-task hierarchical convolutional network for visual-semantic cross-modal retrieval
    Ji, Zhong
    Lin, Zhigang
    Wang, Haoran
    Pang, Yanwei
    Li, Xuelong
    PATTERN RECOGNITION, 2024, 151
  • [22] Asymmetric cross-modal attention network with multimodal augmented mixup for medical visual question answering
    Li, Yong
    Yang, Qihao
    Wang, Fu Lee
    Lee, Lap-Kei
    Qu, Yingying
    Hao, Tianyong
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 144
  • [23] Adversarial Graph Attention Network for Multi-modal Cross-modal Retrieval
    Wu, Hongchang
    Guan, Ziyu
    Zhi, Tao
    Zhao, Wei
    Xu, Cai
    Han, Hong
    Yang, Yanning
    2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 265 - 272
  • [24] BCAN: Bidirectional Correct Attention Network for Cross-Modal Retrieval
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Meng, Fanyang
    Liu, Mengyuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14247 - 14258
  • [25] Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer
    Ilinykh, Nikolai
    Dobnik, Simon
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 4062 - 4073
  • [26] Iterative graph attention memory network for cross-modal retrieval
    Dong, Xinfeng
    Zhang, Huaxiang
    Dong, Xiao
    Lu, Xu
    KNOWLEDGE-BASED SYSTEMS, 2021, 226
  • [27] Heterogeneous Attention Network for Effective and Efficient Cross-modal Retrieval
    Yu, Tan
    Yang, Yi
    Li, Yi
    Liu, Lin
    Fei, Hongliang
    Li, Ping
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1146 - 1156
  • [28] Visual question answering with attention transfer and a cross-modal gating mechanism
    Li, Wei
    Sun, Jianhui
    Liu, Ge
    Zhao, Linglan
    Fang, Xiangzhong
    PATTERN RECOGNITION LETTERS, 2020, 133 : 334 - 340
  • [29] Cross-modal attention guided visual reasoning for referring image segmentation
    Zhang, Wenjing
    Hu, Mengnan
    Tan, Quange
    Zhou, Qianli
    Wang, Rong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (19) : 28853 - 28872