Enhancing Remote Sensing Visual Question Answering: A Mask-Based Dual-Stream Feature Mutual Attention Network

Cited by: 2
Authors
Li, Yangyang [1]
Ma, Yunfei [1]
Liu, Guangyuan [2]
Wei, Qiang [1]
Chen, Yanqiao [3]
Shang, Ronghua [1]
Jiao, Licheng [1]
Affiliations
[1] Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
[2] Chinese Acad Sci, Natl Space Sci Ctr, Beijing 100190, Peoples R China
[3] 54th Res Inst China Elect Technol Grp Corp, Shijiazhuang 050081, Peoples R China
Keywords
Feature extraction; Vectors; Task analysis; Question answering (information retrieval); Visualization; Remote sensing; Interference; Attention; dual-stream feature extraction; mask mechanism; visual question answering on remote sensing
DOI
10.1109/LGRS.2024.3389042
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Visual question answering (VQA) applied to remote sensing images (RSIs) enables interaction between image information and text information, which lowers the professional barriers in different RSI processing fields. Current methods struggle both to fully use the global and local information of the image when interacting with the question information and to address interclass interference. To address these challenges, this letter proposes a mask-based dual-stream feature mutual attention network (MADNet) for remote sensing visual question answering (RSVQA). First, a dual-stream image feature extraction module obtains image features, and a deep and shallow layer feature encoding module obtains question features. Second, an attention mechanism is combined with pointwise multiplication to exploit the dual-stream features extracted in the first step. Finally, an answer relevance modulation module based on a binary mask vector filters out irrelevant answers. The proposed method is evaluated on two datasets collected by aerial and Sentinel-2 sensors. The model outperforms previous approaches, achieving a 6.89% increase in overall accuracy (OA) over the baseline. Experiments on the low-resolution (LR) dataset show that the improvement persists even when the training data are reduced by half.
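The abstract does not say how the binary mask vector is built or where it is applied. As a rough illustration of the general idea only, the sketch below assumes the mask is chosen by question type and applied to the answer logits before a softmax. The answer vocabulary, the question types, the `TYPE_MASKS` table, and the function `masked_answer_probs` are all hypothetical and are not taken from the letter.

```python
import math


def masked_answer_probs(logits, mask):
    """Softmax over answer logits, restricted to answers where mask == 1.

    Assumption: masked-out answers receive probability exactly 0.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    allowed = [score for score, keep in zip(logits, mask) if keep]
    if not allowed:
        raise ValueError("mask must allow at least one answer")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) if keep else 0.0
            for score, keep in zip(logits, mask)]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical answer vocabulary and per-question-type masks (illustrative only).
ANSWERS = ["yes", "no", "0", "1", "rural", "urban"]
TYPE_MASKS = {
    "presence": [1, 1, 0, 0, 0, 0],
    "count": [0, 0, 1, 1, 0, 0],
    "rural_urban": [0, 0, 0, 0, 1, 1],
}

# Unmasked, the highest logit belongs to the counting answer "0".
logits = [0.2, 1.1, 3.0, 0.5, -0.4, 0.9]
probs = masked_answer_probs(logits, TYPE_MASKS["presence"])
best = ANSWERS[max(range(len(probs)), key=probs.__getitem__)]
```

In this toy case the mask for a presence question removes the counting and rural/urban answers, so the prediction is drawn only from "yes" and "no" even though an unrelated answer had the largest raw score. This is one plausible reading of how a mask could reduce interclass interference.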
Pages: 1-5
Page count: 5