Enhanced Transformer for Remote-Sensing Image Captioning with Positional-Channel Semantic Fusion

Times Cited: 0
Authors
Zhao, An [1]
Yang, Wenzhong [1,2]
Chen, Danny [1]
Wei, Fuyuan [1]
Affiliations
[1] Xinjiang Univ, Sch Comp Sci & Technol, Urumqi 830017, Peoples R China
[2] Xinjiang Univ, Xinjiang Key Lab Multilingual Informat Technol, Urumqi 830017, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
remote-sensing image captioning; semantic information and relationship; spatial and channel dependencies; semantic fusion;
DOI
10.3390/electronics13183605
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Remote-sensing image captioning (RSIC) aims to generate descriptive sentences for remote-sensing images by capturing both local and global semantic information. This task is challenging due to the diverse object types and varying scenes in remote-sensing images. To address these challenges, we propose a positional-channel semantic fusion transformer (PCSFTr). The PCSFTr model employs scene classification to initially extract visual features and learn semantic information. A novel positional-channel multi-headed self-attention (PCMSA) block captures spatial and channel dependencies simultaneously, enriching the semantic information. A feature fusion (FF) module further enhances the understanding of semantic relationships. Experimental results show that PCSFTr significantly outperforms existing methods: the BLEU-4 score reaches 78.42% on UCM-caption, 54.42% on RSICD, and 69.01% on NWPU-captions. This research provides new insights into RSIC by offering a more comprehensive understanding of semantic information and relationships within images and by improving the performance of image captioning models.
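To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of the general idea of attending over spatial positions and feature channels in parallel and fusing the two results by addition. It is a sketch under assumptions, not the authors' implementation: the class names (PCMSABlock, ChannelSelfAttention), the channel-affinity style of channel attention, the additive fusion, and the feed-forward sizes are all hypothetical stand-ins for the paper's actual PCMSA and FF designs.

# Hypothetical sketch of a positional-channel multi-headed self-attention block:
# one branch attends over grid positions (spatial dependencies) with standard
# multi-head self-attention, the other attends over feature channels (channel
# dependencies), and the two outputs are fused by addition. Illustrative only.
import torch
import torch.nn as nn


class ChannelSelfAttention(nn.Module):
    """Attention over feature channels: affinities are computed between
    channels (D x D) instead of between positions (N x N)."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable branch scale

    def forward(self, x):                          # x: (B, N, D)
        energy = torch.bmm(x.transpose(1, 2), x)   # (B, D, D) channel affinities
        attn = torch.softmax(energy, dim=-1)       # normalize over channels
        out = torch.bmm(x, attn)                   # (B, N, D) re-weighted channels
        return self.gamma * out


class PCMSABlock(nn.Module):
    """Spatial (positional) and channel attention run in parallel, then fused."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.channel_attn = ChannelSelfAttention()
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):                          # x: (B, N, D) visual tokens
        h = self.norm1(x)
        spatial, _ = self.spatial_attn(h, h, h)    # dependencies between positions
        channel = self.channel_attn(h)             # dependencies between channels
        x = x + spatial + channel                  # fuse both branches (residual)
        return x + self.ffn(self.norm2(x))         # standard transformer feed-forward


# Example: a 7x7 grid of 512-d visual features from a scene-classification backbone.
tokens = torch.randn(2, 49, 512)
out = PCMSABlock()(tokens)                         # -> shape (2, 49, 512)

In this sketch the channel branch mirrors the channel-attention idea used in dual-attention segmentation networks; how PCSFTr actually enriches and fuses the two kinds of dependencies is described in the full paper and may differ.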
Pages: 17