Modeling graph-structured contexts for image captioning

Times Cited: 15
Authors
Li, Zhixin [1 ]
Wei, Jiahui [1 ]
Huang, Feicheng [1 ]
Ma, Huifang [2 ]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Transformer; Scene graph; Reinforcement learning; Attention mechanism
DOI
10.1016/j.imavis.2022.104591
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The performance of image captioning has improved significantly in recent years through deep neural network architectures combined with attention mechanisms and reinforcement learning optimization. However, the visual relationships and interactions between the objects appearing in an image remain largely unexplored. In this paper, we present a novel approach that combines scene graphs with a Transformer, which we call SGT, to explicitly encode the visual relationships between detected objects. Specifically, we pretrain a scene graph generation model to predict graph representations for images. Then, for each graph node, a Graph Convolutional Network (GCN) aggregates the information of its local neighbors to acquire relationship knowledge. When training the captioning model, we feed this relation-aware information into the Transformer to generate descriptive sentences. Experiments on the MSCOCO and Flickr30k datasets validate the superiority of our SGT model, which achieves state-of-the-art results on all standard evaluation metrics. (c) 2022 Elsevier B.V. All rights reserved.
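To make the pipeline described in the abstract concrete, the sketch below shows one plausible way GCN-aggregated scene-graph node features could serve as the memory of a Transformer decoder that emits caption tokens. This is a minimal illustration under generic PyTorch assumptions; the class names (GCNLayer, SGTSketch), dimensions, and aggregation scheme are hypothetical and are not taken from the paper's actual SGT implementation.

```python
# Minimal sketch (assumption, not the authors' code): scene-graph node
# features are refined by one GCN step, then used as Transformer memory.
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph-convolution step: each node averages its neighbors."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (nodes, dim); adj: (nodes, nodes), assumed to include self-loops.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        agg = adj @ x / deg                       # mean over local neighborhood
        return torch.relu(self.proj(agg))


class SGTSketch(nn.Module):
    """Relation-aware node features -> Transformer decoder over caption tokens."""

    def __init__(self, dim=512, vocab=10000, heads=8, layers=3):
        super().__init__()
        self.gcn = GCNLayer(dim)
        self.embed = nn.Embedding(vocab, dim)
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=layers)
        self.out = nn.Linear(dim, vocab)

    def forward(self, node_feats, adj, tokens):
        # node_feats: (batch, nodes, dim); adj: (batch, nodes, nodes);
        # tokens: (batch, seq) previously generated caption word ids.
        relation_aware = torch.stack(
            [self.gcn(x, a) for x, a in zip(node_feats, adj)])
        tgt = self.embed(tokens)
        seq = tokens.size(1)
        causal = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, relation_aware, tgt_mask=causal)
        return self.out(hidden)                   # per-token vocabulary logits


if __name__ == "__main__":
    model = SGTSketch()
    feats = torch.randn(2, 12, 512)               # 12 detected objects per image
    adj = torch.ones(2, 12, 12)                   # toy fully connected scene graph
    toks = torch.randint(0, 10000, (2, 7))        # partial caption
    print(model(feats, adj, toks).shape)          # torch.Size([2, 7, 10000])
```

In the actual model, the caption generator would be trained with cross-entropy and then fine-tuned with reinforcement learning (e.g., CIDEr-based reward optimization), as the abstract indicates; that training loop is omitted here.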
Pages: 10