Cross-Modal Visual Question Answering for Remote Sensing Data

Cited by: 1
Authors
Felix, Rafael [1]
Repasky, Boris [1, 2]
Hodge, Samuel [1]
Zolfaghari, Reza [3]
Abbasnejad, Ehsan [2]
Sherrah, Jamie [2]
Affiliations
[1] Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Lockheed Martin Australia STELaRLab, Mawson Lakes, Australia
[3] Def Sci & Technol Grp, Canberra, ACT, Australia
Keywords
Visual Question Answering; Deep learning; Natural Language Processing; Convolution Neural Networks; Recurrent Neural Networks; OpenStreetMap; Classification
DOI
10.1109/DICTA52665.2021.9647287
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While querying structured geo-spatial data such as Google Maps has become commonplace, there remains a wealth of unstructured information in overhead imagery that is largely inaccessible to users. This information can be made accessible using machine learning for Visual Question Answering (VQA) about remote sensing imagery. We propose a novel method for Earth observation based on answering natural language questions about satellite images that uses cross-modal attention between image objects and text. The image is encoded with an object-centric feature space, with self-attention between objects, and the question is encoded with a language transformer network. The image and question representations are fed to a cross-modal transformer network that uses cross-attention between the image and text modalities to generate the answer. Our method is applied to the RSVQA remote sensing dataset and achieves a significant accuracy increase over the previous benchmark.
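The cross-attention step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, a single head, and toy hand-written vectors. The names `cross_attention`, `question_tokens`, and `image_objects` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention across two modalities.

    Queries come from one modality (here: question tokens), while keys
    and values come from the other (here: image object features).
    Returns the attended outputs and the attention weights.
    """
    d = len(keys[0])  # feature dimension used for scaling
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy inputs (assumed, not from the paper): 2 question-token vectors
# and 3 image-object vectors, all of dimension 2.
question_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_objects = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

attended, attn = cross_attention(question_tokens, image_objects, image_objects)
```

Each row of `attn` is a probability distribution over image objects for one question token, so a token aligned with an object's feature direction puts most of its weight on that object. A full model along the lines the abstract describes would add learned query, key, and value projections, multiple heads, and stacked transformer layers.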
Pages: 57-65
Number of pages: 9