Video-based spatio-temporal scene graph generation with efficient self-supervision tasks

Authors:

Lianggangxu Chen

Yiqing Cai

Changhong Lu

Changbo Wang

Gaoqi He

Affiliations:

[1] Chongqing Institute of East China Normal University, Chongqing Key Laboratory of Precision Optics

[2] East China Normal University, School of Computer Science and Technology

[3] East China Normal University, School of Mathematical Sciences

Source:

Multimedia Tools and Applications | 2023 / Vol. 82

Keywords:

Spatio-temporal scene graphs generation; Self-supervision; Local relation-aware attention

Abstract:

Spatio-temporal Scene Graphs Generation (STSGG) aims to extract a sequence of graph-based semantic representations for high-level visual tasks. Existing works often fail to exploit the strong temporal correlation and the details of local features, and therefore cannot distinguish dynamic relations (e.g., drinking) from static relations (e.g., holding). Furthermore, because of severe long-tailed bias, the predictions suffer from inaccurate classification of tail predicates. To address these issues, a slowfast local-aware attention (SFLA) network is proposed for temporal modeling in STSGG. First, a two-branch network extracts static and dynamic relation features separately. Second, a local relation-aware attention (LRA) module assigns higher importance to the crucial elements in the local relationship. Third, three novel self-supervision prediction tasks are proposed: spatial location, human attention state, and distance variation. These self-supervision tasks are trained simultaneously with the main model to alleviate the long-tailed bias problem and enhance feature discrimination. Systematic experiments show that the method achieves state-of-the-art performance on the recently proposed Action Genome (AG) dataset and the popular ImageNet Video dataset.
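
The abstract states that the self-supervision tasks are trained simultaneously with the main model but gives no formula for how the objectives are combined. A common way to do this is a weighted sum of the main loss and the auxiliary losses; the sketch below illustrates that general idea only. The function names, the use of cross-entropy for every task, and the single shared weight `aux_weight` are assumptions for illustration, not details taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample, computed from raw logits (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def joint_loss(main, aux, aux_weight=0.5):
    """Weighted sum of a main loss and auxiliary self-supervision losses.

    main: (logits, target) for the predicate classifier.
    aux:  dict mapping a task name to (logits, target).
    aux_weight: hypothetical shared weight for all auxiliary tasks.
    """
    total = cross_entropy(*main)
    for logits, target in aux.values():
        total += aux_weight * cross_entropy(logits, target)
    return total


# Toy inputs: task names follow the abstract, the numbers are made up.
main_task = ([2.0, 0.5, -1.0], 0)
aux_tasks = {
    "spatial_location": ([0.3, 0.1], 0),
    "human_attention_state": ([0.0, 1.2], 1),
    "distance_variation": ([0.4, 0.4, 0.2], 2),
}
print(joint_loss(main_task, aux_tasks))
```

In such a setup the auxiliary heads share features with the main classifier, so their gradients shape the shared representation even for rarely seen predicates; whether the paper weights its tasks this way is not stated in the abstract.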

Pages: 38947-38966

Page count: 19
