Modeling graph-structured contexts for image captioning

Times Cited: 15
Authors
Li, Zhixin [1 ]
Wei, Jiahui [1 ]
Huang, Feicheng [1 ]
Ma, Huifang [2 ]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; Transformer; Scene graph; Reinforcement learning; Attention mechanism
DOI
10.1016/j.imavis.2022.104591
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The performance of image captioning has improved significantly in recent years through deep neural network architectures combined with attention mechanisms and reinforcement learning optimization. However, the visual relationships and interactions between the objects appearing in an image remain largely unexplored. In this paper, we present a novel approach that combines scene graphs with a Transformer, which we call SGT, to explicitly encode the visual relationships between detected objects. Specifically, we pretrain a scene graph generation model to predict graph representations for images. Then, for each graph node, a Graph Convolutional Network (GCN) aggregates the information of its local neighbors to acquire relationship knowledge. When training the captioning model, we feed this relation-aware information into the Transformer to generate descriptive sentences. Experiments on the MSCOCO and Flickr30k datasets validate the superiority of our SGT model, which achieves state-of-the-art results on all standard evaluation metrics. (c) 2022 Elsevier B.V. All rights reserved.
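To make the pipeline described in the abstract concrete, the sketch below shows one plausible way GCN-aggregated scene-graph node features could serve as the memory of a Transformer decoder that emits caption tokens. This is a minimal illustration under generic PyTorch assumptions; the class names (GCNLayer, SGTSketch), dimensions, and aggregation scheme are hypothetical and are not taken from the paper's actual SGT implementation.

```python
# Minimal sketch (assumption, not the authors' code): scene-graph node
# features are refined by one GCN step, then used as Transformer memory.
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph-convolution step: each node averages its neighbors."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (nodes, dim); adj: (nodes, nodes), assumed to include self-loops.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        agg = adj @ x / deg                       # mean over local neighborhood
        return torch.relu(self.proj(agg))


class SGTSketch(nn.Module):
    """Relation-aware node features -> Transformer decoder over caption tokens."""

    def __init__(self, dim=512, vocab=10000, heads=8, layers=3):
        super().__init__()
        self.gcn = GCNLayer(dim)
        self.embed = nn.Embedding(vocab, dim)
        dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=layers)
        self.out = nn.Linear(dim, vocab)

    def forward(self, node_feats, adj, tokens):
        # node_feats: (batch, nodes, dim); adj: (batch, nodes, nodes);
        # tokens: (batch, seq) previously generated caption word ids.
        relation_aware = torch.stack(
            [self.gcn(x, a) for x, a in zip(node_feats, adj)])
        tgt = self.embed(tokens)
        seq = tokens.size(1)
        causal = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, relation_aware, tgt_mask=causal)
        return self.out(hidden)                   # per-token vocabulary logits


if __name__ == "__main__":
    model = SGTSketch()
    feats = torch.randn(2, 12, 512)               # 12 detected objects per image
    adj = torch.ones(2, 12, 12)                   # toy fully connected scene graph
    toks = torch.randint(0, 10000, (2, 7))        # partial caption
    print(model(feats, adj, toks).shape)          # torch.Size([2, 7, 10000])
```

In the actual model, the caption generator would be trained with cross-entropy and then fine-tuned with reinforcement learning (e.g., CIDEr-based reward optimization), as the abstract indicates; that training loop is omitted here.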
Pages: 10