Structured Neural Motifs: Scene Graph Parsing via Enhanced Context

Cited by: 3

Authors:

Li, Yiming ^{[1,4]}

Yang, Xiaoshan ^{[2,3,4]}

Xu, Changsheng ^{[1,2,3,4]}

Affiliations:

[1] HeFei Univ Technol, Hefei, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China

[3] Univ Chinese Acad Sci, Beijing, Peoples R China

[4] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

MULTIMEDIA MODELING (MMM 2020), PT II | 2020 / Vol. 11962

Funding:

National Natural Science Foundation of China;

Keywords:

Scene graph; Deep learning; LSTMs;

DOI:

10.1007/978-3-030-37734-2_15

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A scene graph is a structured representation of the visual content of an image. It is helpful for complex visual understanding tasks such as image captioning, visual question answering, and semantic image retrieval. Because real-world images usually contain multiple object instances and complex relationships, context information is extremely important for scene graph generation. It has been noted that the context dependencies among different nodes in the scene graph are asymmetric, which means that relationship labels can very likely be predicted directly from object labels, but not vice versa. Based on this finding, the existing Motifs network has successfully exploited the context patterns among object nodes and the dependencies between object nodes and relation nodes. However, the spatial information and the context dependencies among relation nodes are neglected. In this work, we propose the Structured Motif Network (StrcMN), which predicts object labels and pairwise relationships by mining more complete global context features. Experiments show that our model significantly outperforms previous methods on the VRD and Visual Genome datasets.


Pages: 175-188

Number of pages: 14
