Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning

Cited by: 0

Authors:

Wang, Qi [1]

Yang, Zhigang [1]

Ni, Weiping [2]

Wu, Junzheng [2]

Li, Qiang [1]

Affiliations:

[1] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China

[2] Northwest Inst Nucl Technol, Dept Remote Sensing, Xian 710072, Peoples R China

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2024 / Vol. 62

Funding:

National Natural Science Foundation of China;

Keywords:

Rivers; Buildings; Sports; Roads; Green buildings; Diamonds; Semantics; Feeds; Boats; Logic gates; Attention mechanism; cross view; image captioning; remote sensing; ATTENTION; MODELS;

DOI:

10.1109/TGRS.2024.3502805

Chinese Library Classification number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Image captioning is a fundamental vision-language task with wide-ranging applications in daily life. Existing methods often struggle to accurately interpret the semantic information in remote sensing images because of their complex backgrounds. Target region masks can effectively reflect the shape characteristics of targets and their potential interrelationships, so incorporating and fully integrating these features can significantly improve the quality of generated captions. However, researchers are hindered by the lack of datasets that contain corresponding object masks. This raises a natural question: how can object masks be introduced and utilized efficiently? In this article, we provide potential target masks for the publicly available remote sensing image captioning (RSIC) datasets, enabling models to utilize the regional features of targets for RSIC. We also propose a novel RSIC algorithm, abbreviated as S²CPNet, that combines regional positional features with fine-grained semantic information. To effectively capture semantic information from the image and positional relationships from the mask, semantic and spatial feature enhancement submodules are introduced at the ends of the respective encoder branches. Furthermore, a cross-view feature fusion module is designed to integrate regional features and semantic information efficiently. A target recognition decoder is then developed to enhance the model's ability to identify and describe critical targets in images. Finally, we improve the caption generation decoder by adaptively merging textual information with visual features to generate more accurate descriptions. Our model achieves satisfactory results on three RSIC datasets compared with existing methods. The related datasets and code will be open-sourced at https://github.com/CVer-Yang/SSCPNet .
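
The abstract does not specify how the cross-view feature fusion module works internally. As a rough illustration only, the sketch below shows one common way to fuse two feature views: scaled dot-product cross-attention in which image-branch tokens attend to mask-branch tokens, followed by a residual sum. This is an assumption, not the authors' implementation; the function name `cross_view_fusion`, the tensor shapes, and the absence of learned projections are all hypothetical simplifications.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_fusion(img_feats, mask_feats):
    """Hypothetical fusion of image (semantic) and mask (spatial) tokens.

    img_feats:  (N, d) array of image-branch features
    mask_feats: (M, d) array of mask-branch features
    Returns (fused, attn): fused is (N, d), attn is (N, M).
    """
    d = img_feats.shape[-1]
    # Image tokens act as queries; mask tokens act as keys and values.
    scores = img_feats @ mask_feats.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)   # each row sums to 1
    attended = attn @ mask_feats      # (N, d) mask context per image token
    # Residual sum keeps the original semantic features.
    return img_feats + attended, attn


# Toy inputs: 4 image tokens and 6 mask tokens, 8 channels each.
rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))
mask = rng.standard_normal((6, 8))
fused, attn = cross_view_fusion(img, mask)
print(fused.shape, attn.shape)
```

A trained model would normally apply learned linear projections to the queries, keys, and values and use multiple attention heads; those are omitted here to keep the sketch short.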

Pages: 12
