Enhancing Remote Sensing Visual Question Answering: A Mask-Based Dual-Stream Feature Mutual Attention Network

Cited by: 2

Authors:

Li, Yangyang [1]

Ma, Yunfei [1]

Liu, Guangyuan [2]

Wei, Qiang [1]

Chen, Yanqiao [3]

Shang, Ronghua [1]

Jiao, Licheng [1]

Affiliations:

[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China

[2] Chinese Acad Sci, Natl Space Sci Ctr, Beijing 100190, Peoples R China

[3] 54th Res Inst China Elect Technol Grp Corp, Shijiazhuang 050081, Peoples R China

Source:

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS | 2024 / Vol. 21

Keywords:

Feature extraction; Vectors; Task analysis; Question answering (information retrieval); Visualization; Remote sensing; Interference; Attention; dual-stream feature extraction; mask mechanism; visual question answering on remote sensing;

DOI:

10.1109/LGRS.2024.3389042

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification codes:

0708 ; 070902 ;

Abstract:

Visual question answering (VQA) applied to remote sensing images (RSIs) lets image information and text information interact, which avoids the professional barriers between different RSI processing fields. Current methods struggle both to make full use of the global and local information of the image when interacting with the question information and to address interclass interference. To address these challenges, this letter proposes a mask-based dual-stream feature mutual attention network (MADNet) for remote sensing visual question answering (RSVQA). First, a dual-stream feature extraction module obtains image features, and a deep and shallow layer feature encoding module obtains question features. Second, an attention mechanism is combined with pointwise multiplication to exploit the dual-stream features extracted in the first step. Finally, an answer relevance modulation module based on a binary mask vector filters out irrelevant answers. The proposed strategy is evaluated on two datasets collected by aerial and Sentinel-2 sensors. The model outperforms previous approaches, achieving a 6.89% increase in overall accuracy (OA) over the baseline. This improvement persists even when the training data are reduced by half, as shown by the experiments on the low-resolution (LR) dataset.
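The abstract does not say how the binary mask vector is built or applied. As a minimal sketch, assuming the mask is derived from a question type and applied to the answer scores before a softmax, filtering irrelevant answers might look like the following. The answer vocabulary, the question types, and all function names are hypothetical and do not come from the letter.

```python
import math

# Hypothetical answer vocabulary and question types (assumptions, not from the letter).
ANSWERS = ["yes", "no", "0", "1", "2", "urban", "rural"]
ALLOWED = {
    "presence": {"yes", "no"},
    "count": {"0", "1", "2"},
    "area_type": {"urban", "rural"},
}


def binary_mask(question_type):
    """Return a 0/1 vector marking the answers relevant to the question type."""
    allowed = ALLOWED[question_type]
    return [1 if answer in allowed else 0 for answer in ANSWERS]


def masked_softmax(logits, mask):
    """Softmax over positions where mask == 1; masked positions get probability 0."""
    kept = [x for x, m in zip(logits, mask) if m]
    if not kept:
        raise ValueError("mask removes every answer")
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, question_type):
    """Pick the most probable answer among those the mask allows."""
    probs = masked_softmax(logits, binary_mask(question_type))
    best = max(range(len(ANSWERS)), key=probs.__getitem__)
    return ANSWERS[best]


# "urban" has the highest raw score, but it is masked out for a count question.
logits = [0.2, 0.1, 1.5, 0.3, 0.4, 2.5, 0.0]
print(predict(logits, "count"))  # prints "0"
```

In this sketch the mask prevents an answer from another category ("urban") from winning a counting question, which is one plausible reading of how a mask could reduce interclass interference.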

Pages: 1 / 5

Page count: 5
