Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion

Times Cited: 0
Authors
Zhao, An [1]
Yang, Wenzhong [1,2]
Chen, Danny [1]
Wei, Fuyuan [1]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Key Lab Multilingual Informat Technol, Urumqi 830017, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote-sensing image captioning; semantic information and relationship; spatial and channel dependencies; semantic fusion;
DOI
10.3390/electronics13183605
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Remote-sensing image captioning (RSIC) aims to generate descriptive sentences for remote-sensing images by capturing both local and global semantic information. This task is challenging due to the diverse object types and varying scenes in remote-sensing images. To address these challenges, we propose a positional-channel semantic fusion transformer (PCSFTr). The PCSFTr model employs scene classification to initially extract visual features and learn semantic information. A novel positional-channel multi-headed self-attention (PCMSA) block captures spatial and channel dependencies simultaneously, enriching the semantic information. A feature fusion (FF) module further enhances the understanding of semantic relationships. Experimental results show that PCSFTr significantly outperforms existing methods: the BLEU-4 score reaches 78.42% on UCM-caption, 54.42% on RSICD, and 69.01% on NWPU-captions. This research provides new insights into RSIC by offering a more comprehensive understanding of semantic information and relationships within images and by improving the performance of image captioning models.
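To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of the general idea of attending over spatial positions and feature channels in parallel and fusing the two results by addition. It is a sketch under assumptions, not the authors' implementation: the class names (PCMSABlock, ChannelSelfAttention), the channel-affinity style of channel attention, the additive fusion, and the feed-forward sizes are all hypothetical stand-ins for the paper's actual PCMSA and FF designs.

# Hypothetical sketch of a positional-channel multi-headed self-attention block:
# one branch attends over grid positions (spatial dependencies) with standard
# multi-head self-attention, the other attends over feature channels (channel
# dependencies), and the two outputs are fused by addition. Illustrative only.
import torch
import torch.nn as nn


class ChannelSelfAttention(nn.Module):
    """Attention over feature channels: affinities are computed between
    channels (D x D) instead of between positions (N x N)."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable branch scale

    def forward(self, x):                          # x: (B, N, D)
        energy = torch.bmm(x.transpose(1, 2), x)   # (B, D, D) channel affinities
        attn = torch.softmax(energy, dim=-1)       # normalize over channels
        out = torch.bmm(x, attn)                   # (B, N, D) re-weighted channels
        return self.gamma * out


class PCMSABlock(nn.Module):
    """Spatial (positional) and channel attention run in parallel, then fused."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.channel_attn = ChannelSelfAttention()
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):                          # x: (B, N, D) visual tokens
        h = self.norm1(x)
        spatial, _ = self.spatial_attn(h, h, h)    # dependencies between positions
        channel = self.channel_attn(h)             # dependencies between channels
        x = x + spatial + channel                  # fuse both branches (residual)
        return x + self.ffn(self.norm2(x))         # standard transformer feed-forward


# Example: a 7x7 grid of 512-d visual features from a scene-classification backbone.
tokens = torch.randn(2, 49, 512)
out = PCMSABlock()(tokens)                         # -> shape (2, 49, 512)

In this sketch the channel branch mirrors the channel-attention idea used in dual-attention segmentation networks; how PCSFTr actually enriches and fuses the two kinds of dependencies is described in the full paper and may differ.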
Pages: 17