Transformer networks with adaptive inference for scene graph generation

Cited by: 1
Authors
Wang, Yini [1]
Gao, Yongbin [1]
Yu, Wenjun [1]
Guo, Ruyan [1]
Wan, Weibing [1]
Yang, Shuqun [1]
Huang, Bo [1]
Affiliations
[1] Shanghai University of Engineering Science, School of Electronic and Electrical Engineering, Shanghai, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Scene graph generation; Image-to-text translation; Visual relationship detection; Computer vision
DOI
10.1007/s10489-022-04022-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Understanding a visual scene requires not only identifying single objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel scene graph generation framework based on Transformer to convert image data into linguistic descriptions characterized as nodes and edges of a graph describing the information of the given image. The proposed model consists of three components. First, we propose an enhanced object detection module with bidirectional long short-term memory (Bi-LSTM) for object-to-object information exchange to generate the classification probabilities for object bounding boxes and classes. Second, we introduce a novel context information capture module containing Transformer layers that outputs object categories containing object context as well as edge information for specific object pairs with context. Finally, since the relationship frequencies follow a long-tailed distribution, an adaptive inference module with a special feature fusion strategy is designed to soften the distribution and perform adaptive reasoning about relationship classification based on the visual appearance of object pairs. We have conducted detailed experiments on three popular open-source datasets, namely, Visual Genome, OpenImages, and Visual Relationship Detection, and have performed ablation experiments on each module, demonstrating significant improvements under different settings and in terms of various metrics.
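As a rough illustration of the output format the abstract describes (objects as nodes, pairwise relationships as edges), here is a minimal Python sketch. It is not the authors' model or code: the class names `ObjectNode`, `RelationEdge`, and `SceneGraph`, the example labels, and the box coordinates are all hypothetical and chosen only to show the data structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectNode:
    """A detected object: class label plus bounding box (x1, y1, x2, y2)."""
    index: int
    label: str
    box: tuple


@dataclass(frozen=True)
class RelationEdge:
    """A directed relationship between two nodes, referenced by index."""
    subject: int
    predicate: str
    obj: int


@dataclass
class SceneGraph:
    """Hypothetical container: nodes are objects, edges are relationships."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_object(self, label, box):
        # Append a node and return its index so edges can refer to it.
        node = ObjectNode(index=len(self.nodes), label=label, box=box)
        self.nodes.append(node)
        return node.index

    def add_relation(self, subject, predicate, obj):
        # Reject edges that point at objects that were never added.
        if not (0 <= subject < len(self.nodes) and 0 <= obj < len(self.nodes)):
            raise IndexError("edge refers to an unknown object index")
        self.edges.append(RelationEdge(subject, predicate, obj))

    def triples(self):
        # Render each edge as a (subject label, predicate, object label) triple.
        return [
            (self.nodes[e.subject].label, e.predicate, self.nodes[e.obj].label)
            for e in self.edges
        ]


# Illustrative values only; these are not taken from any dataset.
graph = SceneGraph()
man = graph.add_object("man", (40, 20, 120, 200))
horse = graph.add_object("horse", (30, 90, 220, 260))
grass = graph.add_object("grass", (0, 180, 320, 320))
graph.add_relation(man, "riding", horse)
graph.add_relation(horse, "on", grass)

for triple in graph.triples():
    print(triple)
```

Each printed triple corresponds to one edge of the graph. A generation model of the kind the abstract outlines would have to predict the node labels, the boxes, and the predicate on each edge from the image.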
Pages: 9621-9633
Number of pages: 13
Related papers
50 in total
  • [31] Graph Transformer Networks
    Yun, Seongjun
    Jeong, Minbyul
    Kim, Raehyun
    Kang, Jaewoo
    Kim, Hyunwoo J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [32] SG-Shuffle: Multi-aspect Shuffle Transformer for Scene Graph Generation
    Anh Duc Bui
    Han, Soyeon Caren
    Poon, Josiah
    AI 2022: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13728 : 87 - 101
  • [33] SGFormer: Semantic Graph Transformer for Point Cloud-Based 3D Scene Graph Generation
    Lv, Changsheng
    Qi, Mengshi
    Li, Xia
    Yang, Zhengyuan
    Ma, Huadong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 5, 2024, : 4035 - 4043
  • [34] Deep relational self-Attention networks for scene graph generation
    Li, Ping
    Yu, Zhou
    Zhan, Yibing
    PATTERN RECOGNITION LETTERS, 2022, 153 : 200 - 206
  • [36] Unconditional Scene Graph Generation
    Garg, Sarthak
    Dhamo, Helisa
    Farshad, Azade
    Musatian, Sabrina
    Navab, Nassir
    Tombari, Federico
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 16342 - 16351
  • [37] Iterative Scene Graph Generation
    Khandelwal, Siddhesh
    Sigal, Leonid
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [38] Panoptic Scene Graph Generation
    Yang, Jingkang
    Ang, Yi Zhe
    Guo, Zujin
    Zhou, Kaiyang
    Zhang, Wayne
    Liu, Ziwei
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 178 - 196
  • [39] SFormer-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR
    Pei, Jialun
    Guo, Diandian
    Zhang, Jingyang
    Lin, Manxi
    Jin, Yueming
    Heng, Pheng-Ann
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2025, 44 (01) : 361 - 372
  • [40] SGT++: Improved Scene Graph-Guided Transformer for Surgical Report Generation
    Lin, Chen
    Zhu, Zhenfeng
    Zhao, Yawei
    Zhang, Ying
    He, Kunlun
    Zhao, Yao
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (04) : 1337 - 1346