Enhancing Remote Sensing Visual Question Answering: A Mask-Based Dual-Stream Feature Mutual Attention Network

Cited by: 2
Authors
Li, Yangyang [1]
Ma, Yunfei [1]
Liu, Guangyuan [2]
Wei, Qiang [1]
Chen, Yanqiao [3]
Shang, Ronghua [1]
Jiao, Licheng [1]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
[2] Chinese Acad Sci, Natl Space Sci Ctr, Beijing 100190, Peoples R China
[3] 54th Res Inst China Elect Technol Grp Corp, Shijiazhuang 050081, Peoples R China
Keywords
Feature extraction; Vectors; Task analysis; Question answering (information retrieval); Visualization; Remote sensing; Interference; Attention; dual-stream feature extraction; mask mechanism; visual question answering on remote sensing
DOI
10.1109/LGRS.2024.3389042
Chinese Library Classification (CLC) number
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification code
0708; 070902
Abstract
Visual question answering (VQA) applied to remote sensing images (RSIs) enables interaction between image information and text information, which lowers the professional barriers across different RSI processing fields. Current methods struggle both to fully use the global and local information of the image when interacting with the question information and to address interclass interference. To address these challenges, this letter proposes a mask-based dual-stream feature mutual attention network (MADNet) for remote sensing visual question answering (RSVQA). First, a dual-stream image feature extraction module obtains image features, and a deep and shallow layer feature encoding module obtains question features. Second, an attention mechanism is combined with pointwise multiplication to use the dual-stream features extracted in the previous step. Finally, an answer relevance modulation module based on a binary mask vector filters out irrelevant answers. The proposed method is evaluated on two datasets collected by aerial and Sentinel-2 sensors. The model outperforms previous approaches, achieving a 6.89% increase in overall accuracy (OA) over the baseline. This improvement persists even when the training data are reduced by half, as shown by experiments on the low-resolution (LR) dataset.
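The abstract does not give the details of the answer relevance modulation module, so the following is only an illustrative sketch of the general idea of filtering answers with a binary mask vector. It assumes that each question type permits a fixed subset of candidate answers; the answer vocabulary, the question types, the logits, and the function name `masked_softmax` are all hypothetical and are not taken from the letter.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over answers whose mask entry is 1; masked answers get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    kept = [x for x, m in zip(logits, mask) if m]
    if not kept:
        raise ValueError("mask removes every candidate answer")
    peak = max(kept)  # subtract the largest kept logit for numerical stability
    exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical answer vocabulary and per-question-type binary masks
# (assumptions for illustration, not the paper's actual configuration).
ANSWERS = ["yes", "no", "0", "1", "2", "urban", "rural"]
TYPE_MASKS = {
    "presence": [1, 1, 0, 0, 0, 0, 0],
    "count": [0, 0, 1, 1, 1, 0, 0],
    "rural_urban": [0, 0, 0, 0, 0, 1, 1],
}

# Made-up classifier scores: the unmasked argmax would be the count answer "0".
logits = [0.2, 1.1, 3.0, 0.5, 0.1, 2.4, 0.3]

probs = masked_softmax(logits, TYPE_MASKS["presence"])
best = ANSWERS[probs.index(max(probs))]
print(best)
```

In this toy setting the mask for a presence question discards the count and land-use answers before normalization, so an answer of the wrong class cannot win even when its raw score is the highest. Whether MADNet applies its mask to logits, to probabilities, or to intermediate features is not stated in the abstract.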
Pages: 1-5
Number of pages: 5