Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning

Cited by: 8
Authors
Meng, Lingwu [1]
Wang, Jing [2]
Yang, Yang [1]
Xiao, Liang [1, 3]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[3] Nanjing Univ Sci & Technol, Minist Educ, Key Lab Intelligent Percept & Syst High Dimens In, Nanjing 210094, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Feature extraction; Transformers; Remote sensing; Task analysis; Iron; Decoding; Convolutional neural networks; Image captioning; prior knowledge; remote sensing; transformer; NETWORK;
DOI
10.1109/TGRS.2023.3328181
CLC classification number
P3 [Geophysics]; P59 [Geochemistry];
Subject classification code
0708 ; 070902 ;
Abstract
Remote sensing image (RSI) captioning aims to generate meaningful, grammatically correct sentences that describe RSIs. Compared with natural image captioning, however, RSI captioning faces additional challenges arising from the distinctive characteristics of RSIs. First, RSIs typically contain many objects, and as the number of objects grows, it becomes increasingly difficult to determine what the description should focus on. Second, objects in RSIs often look similar to one another, which further complicates the generation of accurate descriptions. To address these challenges, we propose a prior knowledge-guided transformer (PKG-Transformer) for RSI captioning. First, a multilevel feature extraction (MFE) module extracts scene-level and object-level features. Next, a feature enhancement (FE) module refines these multilevel features, combining graph neural networks and attention mechanisms to capture the correlations and differences between different objects or between different scene regions. Furthermore, we propose a prior knowledge augmented attention (PKA) mechanism that establishes relationships between objects and scene regions in order to select the objects most relevant to those regions. This mechanism is integrated into the transformer structure, supplying prior knowledge that guides caption generation. Extensive experiments on three RSI captioning datasets show that the proposed method outperforms the baseline methods. The code will be publicly available at https://github.com/One-paper-luck/PKG-Transformer
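The abstract does not give the equations of the PKA mechanism, so the following is only a minimal sketch of one common way to inject a relevance prior into attention: scene-region features act as queries over object features, and a prior score is added to the scaled dot-product logits before the softmax. Everything here is an assumption for illustration: the function name `prior_biased_attention`, the use of cosine similarity as the prior, the weighting parameter `alpha`, and the identity query/key/value projections (a real transformer layer would use learned projection matrices).

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prior_biased_attention(
    scene: List[Vector], objects: List[Vector], alpha: float = 1.0
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical sketch, not the paper's PKA formulation.

    Each scene-region vector is a query; object vectors serve as both keys
    and values (identity projections). Logits are the scaled dot product
    plus alpha times a cosine-similarity relevance prior.
    Returns (outputs, weights): one attended vector and one weight row
    per scene region.
    """
    d = len(objects[0])
    outputs, weights = [], []
    for q in scene:
        logits = [dot(q, o) / math.sqrt(d) + alpha * cosine(q, o) for o in objects]
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of object vectors, dimension by dimension.
        outputs.append([sum(wj * o[k] for wj, o in zip(w, objects)) for k in range(d)])
    return outputs, weights


if __name__ == "__main__":
    scene = [[1.0, 0.0]]                   # one scene-region feature
    objects = [[1.0, 0.0], [0.0, 1.0]]     # two object features
    _, w_plain = prior_biased_attention(scene, objects, alpha=0.0)
    _, w_prior = prior_biased_attention(scene, objects, alpha=2.0)
    print(w_plain[0], w_prior[0])
```

In this toy run, raising `alpha` shifts more attention weight toward the object that is most similar to the scene region, which mirrors, in a simplified form, the stated goal of selecting objects relevant to the scene. An additive bias is used because it leaves the ordinary attention path intact when the prior is switched off (`alpha=0`).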
Pages: 1-13
Number of pages: 13