Transformer networks with adaptive inference for scene graph generation

被引:1
|
作者
Wang, Yini [1 ]
Gao, Yongbin [1 ]
Yu, Wenjun [1 ]
Guo, Ruyan [1 ]
Wan, Weibing [1 ]
Yang, Shuqun [1 ]
Huang, Bo [1 ]
机构
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai, Peoples R China
基金
中国国家自然科学基金;
关键词
Scene graph generation; Image-to-text translation; Visual relationship detection; Computer vision;
D O I
10.1007/s10489-022-04022-0
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Understanding a visual scene requires not only identifying single objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel scene graph generation framework based on Transformer to convert image data into linguistic descriptions characterized as nodes and edges of a graph describing the information of the given image. The proposed model consists of three components. First, we propose an enhanced object detection module with bidirectional long short-term memory (Bi-LSTM) for object-to-object information exchange to generate the classification probabilities for object bounding boxes and classes. Second, we introduce a novel context information capture module containing Transformer layers that outputs object categories containing object context as well as edge information for specific object pairs with context. Finally, since the relationship frequencies follow a long-tailed distribution, an adaptive inference module with a special feature fusion strategy is designed to soften the distribution and perform adaptive reasoning about relationship classification based on the visual appearance of object pairs. We have conducted detailed experiments on three popular open-source datasets, namely, Visual Genome, OpenImages, and Visual Relationship Detection, and have performed ablation experiments on each module, demonstrating significant improvements under different settings and in terms of various metrics.
引用
收藏
页码:9621 / 9633
页数:13
相关论文
共 50 条
  • [21] Attention redirection transformer with semantic oriented learning for unbiased scene graph generation
    Zhang, Ruonan
    An, Gaoyun
    Cen, Yigang
    Ruan, Qiuqi
    PATTERN RECOGNITION, 2025, 158
  • [22] Spatial–Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation
    Pu, Tao
    Chen, Tianshui
    Wu, Hefeng
    Lu, Yongyi
    Lin, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 556 - 568
  • [23] End-to-End Video Scene Graph Generation With Temporal Propagation Transformer
    Zhang, Yong
    Pan, Yingwei
    Yao, Ting
    Huang, Rui
    Mei, Tao
    Chen, Chang-Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1613 - 1625
  • [24] Transformer-based Scene Graph Generation Network With Relational Attention Module
    Yamamoto, Takuma
    Obinata, Yuya
    Nakayama, Osafumi
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2034 - 2041
  • [25] BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation
    Dhingra, Naina
    Ritter, Florian
    Kunz, Andreas
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 2150 - 2159
  • [26] Prior Knowledge-driven Dynamic Scene Graph Generation with Causal Inference
    Lu, Jiale
    Chen, Lianggangxu
    Song, Youqi
    Lin, Shaohui
    Wang, Changbo
    He, Gaoqi
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4877 - 4885
  • [27] CAPTIONING TRANSFORMER WITH SCENE GRAPH GUIDING
    Chen, Haishun
    Wang, Ying
    Yang, Xin
    Li, Jie
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2538 - 2542
  • [28] Type-adaptive graph Transformer for heterogeneous information networks
    Tang, Yuxin
    Huang, Yanzhe
    Hou, Jingyi
    Liu, Zhijie
    APPLIED INTELLIGENCE, 2024, 54 (22) : 11496 - 11509
  • [29] Transformer-Based Graph Neural Networks for Outfit Generation
    Becattini, Federico
    Teotini, Federico Maria
    Bimbo, Alberto Del
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2024, 12 (01) : 213 - 223
  • [30] Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
    Lyu, Xinyu
    Gao, Lianli
    Zeng, Pengpeng
    Shen, Heng Tao
    Song, Jingkuan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 13921 - 13940