Transformer networks with adaptive inference for scene graph generation

Cited by: 1

Authors:

Wang, Yini [1]

Gao, Yongbin [1]

Yu, Wenjun [1]

Guo, Ruyan [1]

Wan, Weibing [1]

Yang, Shuqun [1]

Huang, Bo [1]

Affiliations:

[1] Shanghai University of Engineering Science, School of Electronic and Electrical Engineering, Shanghai, People's Republic of China

Source:

APPLIED INTELLIGENCE | 2023 / Vol. 53 / Issue 08

Funding:

National Natural Science Foundation of China;

Keywords:

Scene graph generation; Image-to-text translation; Visual relationship detection; Computer vision;

DOI:

10.1007/s10489-022-04022-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Understanding a visual scene requires not only identifying single objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel scene graph generation framework based on Transformer to convert image data into linguistic descriptions characterized as nodes and edges of a graph describing the information of the given image. The proposed model consists of three components. First, we propose an enhanced object detection module with bidirectional long short-term memory (Bi-LSTM) for object-to-object information exchange to generate the classification probabilities for object bounding boxes and classes. Second, we introduce a novel context information capture module containing Transformer layers that outputs object categories containing object context as well as edge information for specific object pairs with context. Finally, since the relationship frequencies follow a long-tailed distribution, an adaptive inference module with a special feature fusion strategy is designed to soften the distribution and perform adaptive reasoning about relationship classification based on the visual appearance of object pairs. We have conducted detailed experiments on three popular open-source datasets, namely, Visual Genome, OpenImages, and Visual Relationship Detection, and have performed ablation experiments on each module, demonstrating significant improvements under different settings and in terms of various metrics.


Pages: 9621-9633

Page count: 13
