Transformer networks with adaptive inference for scene graph generation

Cited by: 1

Authors:

Wang, Yini [1]

Gao, Yongbin [1]

Yu, Wenjun [1]

Guo, Ruyan [1]

Wan, Weibing [1]

Yang, Shuqun [1]

Huang, Bo [1]

Affiliation:

[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Vol. 53 / Issue 08

Funding:

National Natural Science Foundation of China;

Keywords:

Scene graph generation; Image-to-text translation; Visual relationship detection; Computer vision;

DOI:

10.1007/s10489-022-04022-0

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Understanding a visual scene requires not only identifying single objects in isolation but also inferring the relationships and interactions between object pairs. In this study, we propose a novel scene graph generation framework based on Transformer to convert image data into linguistic descriptions characterized as nodes and edges of a graph describing the information of the given image. The proposed model consists of three components. First, we propose an enhanced object detection module with bidirectional long short-term memory (Bi-LSTM) for object-to-object information exchange to generate the classification probabilities for object bounding boxes and classes. Second, we introduce a novel context information capture module containing Transformer layers that outputs object categories containing object context as well as edge information for specific object pairs with context. Finally, since the relationship frequencies follow a long-tailed distribution, an adaptive inference module with a special feature fusion strategy is designed to soften the distribution and perform adaptive reasoning about relationship classification based on the visual appearance of object pairs. We have conducted detailed experiments on three popular open-source datasets, namely, Visual Genome, OpenImages, and Visual Relationship Detection, and have performed ablation experiments on each module, demonstrating significant improvements under different settings and in terms of various metrics.
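The abstract describes the output as a graph whose nodes are detected objects and whose edges are relationships between object pairs. The following is a minimal sketch of such a structure, not the authors' implementation: the class names `SceneObject` and `SceneGraph`, the box format, and the example labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """A detected object, i.e. one graph node (hypothetical structure)."""
    label: str
    box: tuple  # assumed format: (x_min, y_min, x_max, y_max) in pixels


@dataclass
class SceneGraph:
    """Nodes are detected objects; edges are directed, labeled relationships."""
    objects: list = field(default_factory=list)
    # Each edge is stored as (subject_index, predicate, object_index).
    edges: list = field(default_factory=list)

    def add_object(self, label, box):
        """Append a node and return its index."""
        self.objects.append(SceneObject(label, box))
        return len(self.objects) - 1

    def add_relation(self, subj, predicate, obj):
        """Add a directed edge between two existing nodes."""
        n = len(self.objects)
        if not (0 <= subj < n and 0 <= obj < n):
            raise IndexError("relation refers to an unknown object")
        self.edges.append((subj, predicate, obj))

    def triples(self):
        """Return edges as (subject label, predicate, object label) triples."""
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.edges
        ]


# Illustrative data only; not taken from any dataset named in the abstract.
graph = SceneGraph()
man = graph.add_object("man", (10, 20, 110, 300))
horse = graph.add_object("horse", (60, 120, 320, 340))
graph.add_relation(man, "riding", horse)
print(graph.triples())
```

Storing edges as index triples keeps two objects with the same label distinct, which matters when an image contains several instances of one class.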


Pages: 9621-9633

Page count: 13
