Transformer networks with adaptive inference for scene graph generation

Cited by: 1
Authors
Wang, Yini [1]
Gao, Yongbin [1]
Yu, Wenjun [1]
Guo, Ruyan [1]
Wan, Weibing [1]
Yang, Shuqun [1]
Huang, Bo [1]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scene graph generation; Image-to-text translation; Visual relationship detection; Computer vision;
DOI
10.1007/s10489-022-04022-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Understanding a visual scene requires not only identifying single objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel Transformer-based scene graph generation framework that converts image data into linguistic descriptions, represented as the nodes and edges of a graph describing the given image. The proposed model consists of three components. First, we propose an enhanced object detection module that uses bidirectional long short-term memory (Bi-LSTM) for object-to-object information exchange and generates classification probabilities for object bounding boxes and classes. Second, we introduce a novel context information capture module containing Transformer layers, which outputs context-aware object categories as well as context-aware edge information for specific object pairs. Finally, since relationship frequencies follow a long-tailed distribution, an adaptive inference module with a special feature fusion strategy is designed to soften the distribution and perform adaptive reasoning about relationship classification based on the visual appearance of object pairs. We conducted detailed experiments on three popular open-source datasets, namely Visual Genome, OpenImages, and Visual Relationship Detection, and performed ablation experiments on each module, demonstrating significant improvements under different settings and in terms of various metrics.
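The abstract does not spell out how the adaptive inference module softens the long-tailed relationship distribution, so the following is only an illustrative sketch of one common approach, not the authors' method. It assumes a temperature-scaled log-frequency prior that is added to visual logits before a softmax. The function names, predicate labels, counts, and temperature values are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soften_prior(counts, temperature=2.0):
    """Turn raw predicate counts into a flattened log-prior.

    Assumption for illustration: dividing the log-probabilities by a
    temperature > 1 shrinks the gap between frequent and rare predicates.
    """
    total = sum(counts)
    # Add-one smoothing keeps unseen predicates at a finite log-probability.
    log_p = [math.log((c + 1) / (total + len(counts))) for c in counts]
    return [lp / temperature for lp in log_p]


def fuse(visual_logits, counts, temperature=2.0):
    """Add the softened frequency prior to the visual logits, then normalize."""
    prior = soften_prior(counts, temperature)
    return softmax([v + p for v, p in zip(visual_logits, prior)])


# Hypothetical long-tailed predicate statistics.
predicates = ["on", "holding", "riding"]
counts = [9000, 900, 100]

# With no visual evidence, a higher temperature moves mass toward the tail.
sharp = fuse([0.0, 0.0, 0.0], counts, temperature=1.0)
soft = fuse([0.0, 0.0, 0.0], counts, temperature=4.0)

# With visual evidence for the rare predicate, the softened prior lets it win.
evidence = [0.0, 0.0, 3.0]
biased = fuse(evidence, counts, temperature=1.0)
adaptive = fuse(evidence, counts, temperature=4.0)

best_biased = predicates[biased.index(max(biased))]
best_adaptive = predicates[adaptive.index(max(adaptive))]
```

In this toy setting the unsoftened prior still selects the most frequent predicate despite the visual evidence, while the softened prior lets the visually supported rare predicate win. This is the kind of behaviour that "adaptive reasoning based on visual appearance" suggests, under the assumptions stated above.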
Pages: 9621-9633
Number of pages: 13
Related papers
50 in total
  • [1] Transformer networks with adaptive inference for scene graph generation
    Yini Wang
    Yongbin Gao
    Wenjun Yu
    Ruyan Guo
    Weibing Wan
    Shuqun Yang
    Bo Huang
    Applied Intelligence, 2023, 53 : 9621 - 9633
  • [2] Multimodal graph inference network for scene graph generation
    Jingwen Duan
    Weidong Min
    Deyu Lin
    Jianfeng Xu
    Xin Xiong
    Applied Intelligence, 2021, 51 : 8768 - 8783
  • [3] Multimodal graph inference network for scene graph generation
    Duan, Jingwen
    Min, Weidong
    Lin, Deyu
    Xu, Jianfeng
    Xiong, Xin
    APPLIED INTELLIGENCE, 2021, 51 (12) : 8768 - 8783
  • [4] RelTR: Relation Transformer for Scene Graph Generation
    Cong, Yuren
    Yang, Michael Ying
    Rosenhahn, Bodo
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (09) : 11169 - 11183
  • [5] Vision Relation Transformer for Unbiased Scene Graph Generation
    Sudhakaran, Gopika
    Dhami, Devendra Singh
    Kersting, Kristian
    Roth, Stefan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21825 - 21836
  • [6] A Novel End-to-End Transformer for Scene Graph Generation
    Ren, Chengkai
    Liu, Xiuhua
    Cao, Mengyuan
    Zhang, Jian
    Wang, Hongwei
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [7] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
    Cong, Yuren
    Liao, Wentong
    Ackermann, Hanno
    Rosenhahn, Bodo
    Yang, Michael Ying
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 16352 - 16362
  • [8] SGTR: End-to-end Scene Graph Generation with Transformer
    Li, Rongjie
    Zhang, Songyang
    He, Xuming
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19464 - 19474
  • [9] Dynamic Scene Graph Generation via Temporal Prior Inference
    Wang, Shuang
    Gao, Lianli
    Lyu, Xinyu
    Guo, Yuyu
    Zeng, Pengpeng
    Song, Jingkuan
    MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia, 2022, : 5793 - 5801
  • [10] Dynamic Gated Graph Neural Networks for Scene Graph Generation
    Khademi, Mahmoud
    Schulte, Oliver
    COMPUTER VISION - ACCV 2018, PT VI, 2019, 11366 : 669 - 685