Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning

Cited by: 0
Authors
Wang, Qi [1]
Yang, Zhigang [1]
Ni, Weiping [2]
Wu, Junzheng [2]
Li, Qiang [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
[2] Northwest Inst Nucl Technol, Dept Remote Sensing, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Rivers; Buildings; Sports; Roads; Green buildings; Diamonds; Semantics; Feeds; Boats; Logic gates; Attention mechanism; cross view; image captioning; remote sensing; ATTENTION; MODELS;
DOI
10.1109/TGRS.2024.3502805
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
Image captioning is a fundamental vision-language task with wide-ranging applications in daily life. Existing methods often struggle to interpret the semantic information in remote sensing images accurately because of their complex backgrounds. Target region masks reflect the shape characteristics of targets and their potential interrelationships, so incorporating and fully integrating these features can significantly improve the quality of generated captions. However, research is hindered by the lack of datasets that contain the corresponding object masks, which raises a natural question: how can object masks be introduced and used efficiently? In this article, we provide potential target masks for the publicly available remote sensing image captioning (RSIC) datasets, enabling models to use the regional features of targets for RSIC. We also propose a novel RSIC algorithm, abbreviated as S²CPNet, that combines regional positional features with fine-grained semantic information. To capture semantic information from the image and positional relationships from the mask, semantic and spatial feature enhancement submodules are introduced at the ends of the respective encoder branches. Furthermore, a cross-view feature fusion module is designed to integrate regional features and semantic information efficiently. A target recognition decoder is then developed to enhance the model's ability to identify and describe critical targets in images. Finally, we improve the caption generation decoder by adaptively merging textual information with visual features to generate more accurate descriptions. Our model achieves satisfactory results on three RSIC datasets compared with existing methods. The related datasets and code will be open-sourced at https://github.com/CVer-Yang/SSCPNet.
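The abstract says the caption generation decoder adaptively merges textual information with visual features, but this record does not give the formulation. As an illustration only, the sketch below shows one common way such a merge can be done: a scalar sigmoid gate that blends two feature vectors. It is not taken from the paper or its repository. The function names `gated_merge` and `stable_sigmoid`, the weight vectors, and the single-scalar gate are all assumptions made for this example.

```python
import math
from typing import List


def stable_sigmoid(x: float) -> float:
    """Logistic function written so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_merge(
    text: List[float],
    visual: List[float],
    w_text: List[float],
    w_visual: List[float],
    bias: float = 0.0,
) -> List[float]:
    """Blend a text vector and a visual vector with one scalar gate.

    Hypothetical illustration: the gate is sigmoid(w_text.text +
    w_visual.visual + bias). A gate near 1 keeps the text features,
    a gate near 0 keeps the visual features.
    """
    if not (len(text) == len(visual) == len(w_text) == len(w_visual)):
        raise ValueError("all vectors must have the same length")
    score = bias
    for t, v, wt, wv in zip(text, visual, w_text, w_visual):
        score += wt * t + wv * v
    gate = stable_sigmoid(score)
    return [gate * t + (1.0 - gate) * v for t, v in zip(text, visual)]
```

With all weights and the bias set to zero the gate is 0.5, so the output is the element-wise average of the two inputs. In a trained model the weights would be learned, and the gate would typically be a vector with one value per feature dimension.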
Pages: 12