Learning consensus-aware semantic knowledge for remote sensing image captioning

Cited by: 15

Authors:

Li, Yunpeng [1]

Zhang, Xiangrong [1]

Cheng, Xina [1]

Tang, Xu [1]

Jiao, Licheng [1]

Affiliations:

[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China

Source:

PATTERN RECOGNITION | 2024 / Vol. 145

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal understanding; Visual-semantic interaction; Remote sensing image captioning; Graph convolutional network;

DOI:

10.1016/j.patcog.2023.109893

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Tremendous progress has been made in the remote sensing image captioning (RSIC) task in recent years, yet some problems remain unresolved: (1) bridging the gap between visual features and semantic concepts, and (2) reasoning about the higher-level relationships between semantic concepts. In this work, we focus on injecting high-level visual-semantic interaction into the RSIC model. First, the semantic concept extractor (SCE), which is end-to-end trainable, precisely captures the semantic concepts contained in remote sensing images (RSIs). In particular, the visual-semantic co-attention (VSCA) is designed to gain coarse concept-related regions and region-related concepts for multimodal interaction. Furthermore, we incorporate the two types of attentive vectors, together with semantic-level relational features, into a consensus exploitation (CE) block for learning cross-modal consensus-aware knowledge. Experiments on three benchmark data sets show the superiority of our approach over the reference methods.

Citations

Pages: 12

50 entries in total

[1] Vision-Enhanced and Consensus-Aware Transformer for Image Captioning
Cao, Shan
An, Gaoyun
Zheng, Zhenxing
Wang, Zhiyong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 7005 - 7018
[2] Recurrent Attention and Semantic Gate for Remote Sensing Image Captioning
Li, Yunpeng
Zhang, Xiangrong
Gu, Jing
Li, Chen
Wang, Xin
Tang, Xu
Jiao, Licheng
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[3] Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance
Zhu, Yongshuo
Li, Lu
Chen, Keyan
Liu, Chenyang
Zhou, Fugen
Shi, Zhenwei
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[4] Meta captioning: A meta learning based remote sensing image captioning framework
Yang, Qiaoqiao
Ni, Zihao
Ren, Peng
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 186 : 190 - 200
[5] Semantic-Aware Dense Representation Learning for Remote Sensing Image Change Detection
Chen, Hao
Li, Wenyuan
Chen, Song
Shi, Zhenwei
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[6] Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning
Meng, Lingwu
Wang, Jing
Yang, Yang
Xiao, Liang
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61 : 1 - 13
[7] Multi-label semantic feature fusion for remote sensing image captioning
Wang, Shuang
Ye, Xiutiao
Gu, Yu
Wang, Jihui
Meng, Yun
Tian, Jingxian
Hou, Biao
Jiao, Licheng
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 184 : 1 - 18
[8] PROGRESSIVE SCALE-AWARE NETWORK FOR REMOTE SENSING IMAGE CHANGE CAPTIONING
Liu, Chenyang
Yang, Jiajun
Qi, Zipeng
Zou, Zhengxia
Shi, Zhenwei
IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 6668 - 6671
[9] Semantic-Spatial Collaborative Perception Network for Remote Sensing Image Captioning
Wang, Qi
Yang, Zhigang
Ni, Weiping
Wu, Junzheng
Li, Qiang
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
[10] Cross-Modal Retrieval and Semantic Refinement for Remote Sensing Image Captioning
Li, Zhengxin
Zhao, Wenzhe
Du, Xuanyi
Zhou, Guangyao
Zhang, Songlin
REMOTE SENSING, 2024, 16 (01)