Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 15
Authors
Li, Yunpeng [1]
Zhang, Xiangrong [1]
Cheng, Xina [1]
Tang, Xu [1]
Jiao, Licheng [1]
Affiliation
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network
DOI
10.1016/j.patcog.2023.109893
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Tremendous progress has been made on the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into an RSIC model. First, an end-to-end trainable semantic concept extractor (SCE) precisely captures the semantic concepts contained in the remote sensing images (RSIs). In particular, visual-semantic co-attention (VSCA) is designed to obtain coarse-grained concept-related regions and region-related concepts for multi-modal interaction. Furthermore, we incorporate the two types of attentive vectors, together with semantic-level relational features, into a consensus exploitation (CE) block to learn cross-modal consensus-aware knowledge. Experiments on three benchmark datasets show the superiority of our approach over the reference methods.
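The abstract does not say how VSCA computes its attention. Purely as an illustration of the general idea of attending in both directions between image regions and semantic concepts, here is a minimal, parameter-free co-attention sketch in plain Python. The function names, the dot-product affinity, and the max-pooled relevance scores are all assumptions for this sketch, not the authors' design; a trained model would add learned projections and operate on CNN region features and learned concept embeddings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def co_attention(
    regions: List[Vector], concepts: List[Vector]
) -> Tuple[Vector, Vector, List[float], List[float]]:
    """Hypothetical parameter-free co-attention (illustrative, not VSCA itself).

    Returns (concept-related region vector, region-related concept vector,
    region weights, concept weights).
    """
    # Affinity matrix: affinity[i][j] = <region_i, concept_j>.
    affinity = [[dot(r, c) for c in concepts] for r in regions]
    # A region's relevance is its strongest match to any concept.
    region_weights = softmax([max(row) for row in affinity])
    # A concept's relevance is its strongest match to any region.
    concept_weights = softmax(
        [max(affinity[i][j] for i in range(len(regions))) for j in range(len(concepts))]
    )
    attended_region = weighted_sum(region_weights, regions)
    attended_concept = weighted_sum(concept_weights, concepts)
    return attended_region, attended_concept, region_weights, concept_weights


# Toy usage: three 2-D "region" features and one "concept" embedding.
regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
concepts = [[2.0, 0.0]]
attended_region, attended_concept, rw, cw = co_attention(regions, concepts)
```

In the toy input, the first region aligns with the single concept, so it receives the largest attention weight and dominates the attended region vector. The two resulting attended vectors correspond loosely to the "two types of attentive vectors" the abstract says are passed on to the CE block.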
Pages: 12