Enhancing Remote Sensing Visual Question Answering: A Mask-Based Dual-Stream Feature Mutual Attention Network

Cited by: 2
Authors
Li, Yangyang [1]
Ma, Yunfei [1]
Liu, Guangyuan [2]
Wei, Qiang [1]
Chen, Yanqiao [3]
Shang, Ronghua [1]
Jiao, Licheng [1]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
[2] Chinese Acad Sci, Natl Space Sci Ctr, Beijing 100190, Peoples R China
[3] 54th Res Inst China Elect Technol Grp Corp, Shijiazhuang 050081, Peoples R China
Keywords
Feature extraction; Vectors; Task analysis; Question answering (information retrieval); Visualization; Remote sensing; Interference; Attention; dual-stream feature extraction; mask mechanism; visual question answering on remote sensing
DOI
10.1109/LGRS.2024.3389042
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Visual question answering (VQA) applied to remote sensing images (RSIs) combines image information with text information, which avoids the professional barriers between different RSI processing fields. Current methods struggle in two respects: fully using both the global and local information of the image when interacting with the question information, and handling interclass interference. To address these challenges, this letter proposes a mask-based dual-stream feature mutual attention network (MADNet) for remote sensing visual question answering (RSVQA). First, a dual-stream feature extraction module obtains image features, and a deep and shallow layer feature encoding module obtains question features. Second, an attention mechanism is combined with pointwise multiplication to use the dual-stream features extracted in the first step. Finally, an answer relevance modulation module based on a binary mask vector filters out irrelevant answers. The proposed method is evaluated on two datasets collected by aerial and Sentinel-2 sensors. The model outperforms previous approaches, achieving a 6.89% increase in overall accuracy (OA) over the baseline. This improvement persists even when the training data are reduced by half, as shown by experiments on the low-resolution (LR) dataset.
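The abstract does not say how the binary mask vector is built or applied, so the following is only a minimal sketch of one common way to filter irrelevant answers: zeroing out answer candidates before the softmax. The answer vocabulary, the question types, and the names `ANSWERS`, `TYPE_MASKS`, `masked_softmax`, and `predict` are all hypothetical and are not taken from the letter.

```python
import math

def masked_softmax(logits, mask):
    """Softmax restricted to entries where mask is 1; masked-out entries get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    kept = [l for l, m in zip(logits, mask) if m]
    if not kept:
        raise ValueError("mask must keep at least one answer")
    peak = max(kept)  # subtract the max of the kept logits for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical answer vocabulary and per-question-type binary masks (illustrative only).
ANSWERS = ["yes", "no", "0", "1", "rural", "urban"]
TYPE_MASKS = {
    "presence": [1, 1, 0, 0, 0, 0],
    "count": [0, 0, 1, 1, 0, 0],
    "rural_urban": [0, 0, 0, 0, 1, 1],
}

def predict(logits, question_type):
    """Return the most probable answer among those the mask allows, plus all probabilities."""
    probs = masked_softmax(logits, TYPE_MASKS[question_type])
    best = max(range(len(probs)), key=probs.__getitem__)
    return ANSWERS[best], probs

# The raw argmax would be the counting answer "0", but the mask for a
# presence question restricts the choice to "yes" or "no".
logits = [0.2, 1.0, 3.0, 0.1, 2.5, 0.0]
answer, probs = predict(logits, "presence")
```

Under these assumptions, the mask keeps a high-scoring answer from an unrelated class from winning, which is one plausible reading of how a binary mask could reduce interclass interference.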
Pages: 1-5
Page count: 5
Related papers
50 in total
  • [41] DSNet: dual-stream network for fine-grained ship classification in optical remote sensing images
    Yang, Shijun
    Zhang, Xiang
    Zhao, Wanqing
    Hu, Qiyao
    Luo, Hangzai
    Liu, Cheng
    Zhong, Sheng
    Peng, Jinye
    REMOTE SENSING LETTERS, 2024, 15 (08) : 792 - 804
  • [42] Enhanced salient object detection in remote sensing images via dual-stream semantic interactive network
    Ge, Yanliang
    Liang, Taichuan
    Ren, Junchao
    Chen, Jiaxue
    Bi, Hongbo
    VISUAL COMPUTER, 2024,
  • [43] Adaptive Dual-Stream Sparse Transformer Network for Salient Object Detection in Optical Remote Sensing Images
    Zhao, Jie
    Jia, Yun
    Ma, Lin
    Yu, Lidan
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 5173 - 5192
  • [44] Integrating category-related key regions with a dual-stream network for remote sensing scene classification
    Xiao, Fen
    Li, Xiang
    Li, Wei
    Shi, Junjie
    Zhang, Ningru
    Gao, Xieping
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 100
  • [46] Dual-Stream Network of Vision Mamba and CNN with Auto-Scaling for Remote Sensing Image Segmentation
    Song, Shitao
    Liu, Ye
    Su, Jintao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT IV, 2025, 15034 : 62 - 75
  • [47] DTHNet: Dual-Stream Network Based on Transformer and High-Resolution Representation for Shadow Extraction from Remote Sensing Imagery
    Zhang, Shuang
    Cao, Yungang
    Sui, Baikai
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [48] Remote Sensing Image Change Detection Transformer Network Based on Dual-Feature Mixed Attention
    Song, Xinyang
    Hua, Zhen
    Li, Jinjiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [49] Pedestrian Behavior Recognition Based on Improved Dual-stream Network with Differential Feature in Surveillance Video
    Tan, Yonghong
    Zhou, Xuebin
    Chen, Aiwu
    Zhou, Songqing
    SCIENTIFIC PROGRAMMING, 2021, 2021
  • [50] OECA-Net: A co-attention network for visual question answering based on OCR scene text feature enhancement
    Feng Yan
    Wushouer Silamu
    Yachuang Chai
    Yanbing Li
    Multimedia Tools and Applications, 2024, 83 : 7085 - 7096