Image captioning is a fundamental vision-language task with wide-ranging applications in daily life. Existing methods often struggle to accurately interpret the semantic information in remote sensing images due to their complex backgrounds. Target region masks can effectively reflect the shape characteristics of targets and their potential interrelationships, so incorporating and fully integrating these features can significantly improve the quality of generated captions. However, progress is hindered by the lack of relevant datasets that contain corresponding object masks. This raises a natural question: how can object masks be efficiently introduced and utilized? In this article, we provide candidate target masks for publicly available remote sensing image captioning (RSIC) datasets, enabling models to exploit the regional features of targets for RSIC. Meanwhile, a novel RSIC algorithm is proposed that combines regional positional features with fine-grained semantic information, abbreviated as S²CPNet. To capture semantic information from the image and positional relationships from the mask, semantic and spatial feature enhancement submodules are introduced at the end of the respective encoder branches. Furthermore, a cross-view feature fusion module is designed to efficiently integrate regional features and semantic information. Then, a target recognition decoder is developed to enhance the model's ability to identify and describe critical targets in images. Finally, we improve the caption generation decoder by adaptively merging textual information with visual features to generate more accurate descriptions. Our model achieves satisfactory results on three RSIC datasets compared with existing methods. The related datasets and code will be open-sourced at https://github.com/CVer-Yang/SSCPNet.
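The abstract does not detail the internals of the cross-view feature fusion module, so the following is only a minimal sketch of how such a fusion between mask-derived regional tokens and image semantic tokens could look; the class name CrossViewFusion, the gating scheme, and the hyperparameters (dim=512, num_heads=8) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: image-branch (semantic) tokens attend to mask-branch
# (spatial) tokens via cross-attention, then the two views are merged with a
# learned gate. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class CrossViewFusion(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, semantic_tokens: torch.Tensor,
                spatial_tokens: torch.Tensor) -> torch.Tensor:
        # semantic_tokens: (B, N, D) from the image encoder branch
        # spatial_tokens:  (B, M, D) from the mask encoder branch
        attended, _ = self.cross_attn(query=semantic_tokens,
                                      key=spatial_tokens,
                                      value=spatial_tokens)
        # Gated residual merge of the attended spatial view and the semantic view.
        g = self.gate(torch.cat([semantic_tokens, attended], dim=-1))
        fused = g * attended + (1.0 - g) * semantic_tokens
        return self.norm(fused)


if __name__ == "__main__":
    fusion = CrossViewFusion(dim=512, num_heads=8)
    img_feats = torch.randn(2, 196, 512)   # e.g., 14x14 image patch tokens
    mask_feats = torch.randn(2, 196, 512)  # mask-derived regional tokens
    print(fusion(img_feats, mask_feats).shape)  # torch.Size([2, 196, 512])
```

The gated residual lets the model fall back on the purely semantic view when the mask branch contributes little, which is one simple way to keep the fusion robust to imperfect candidate masks.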