Video-based spatio-temporal scene graph generation with efficient self-supervision tasks

Cited by: 0
Authors
Lianggangxu Chen
Yiqing Cai
Changhong Lu
Changbo Wang
Gaoqi He
Affiliations
[1] Chongqing Institute of East China Normal University, Chongqing Key Laboratory of Precision Optics
[2] East China Normal University, School of Computer Science and Technology
[3] East China Normal University, School of Mathematical Sciences
Keywords
Spatio-temporal scene graph generation; Self-supervision; Local relation-aware attention
DOI
Not available
Abstract
Spatio-temporal Scene Graph Generation (STSGG) aims to extract a sequence of graph-based semantic representations from video for high-level visual tasks. Existing works often fail to exploit strong temporal correlations and fine-grained local features, so they cannot distinguish dynamic relations (e.g., drinking) from static relations (e.g., holding). Furthermore, because of a severe long-tailed bias, predictions suffer from inaccurate classification of tail predicates. To address these issues, a slowfast local-aware attention (SFLA) network is proposed for temporal modeling in STSGG. First, a two-branch network extracts static and dynamic relation features, respectively. Second, a local relation-aware attention (LRA) module assigns higher importance to the crucial elements of each local relationship. Third, three novel self-supervised prediction tasks are proposed: spatial location, human attention state, and distance variation. These self-supervised tasks are trained jointly with the main model to alleviate the long-tailed bias and enhance feature discrimination. Systematic experiments show that our method achieves state-of-the-art performance on the recently proposed Action Genome (AG) dataset and the popular ImageNet Video dataset.
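The abstract says the self-supervision tasks are trained together with the main model but does not define their labels. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes boxes in (x1, y1, x2, y2) image coordinates and shows how labels for two of the three auxiliary tasks (spatial location and distance variation) could be derived automatically from human and object boxes in consecutive frames, and how auxiliary losses could be added to the main loss as L = L_main + sum_i lambda_i * L_aux_i. The label names, the 5% tolerance, the weights lambda_i, and all function names are hypothetical. The human attention state task is omitted because it needs annotations beyond boxes.

```python
import math
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in image coordinates, y grows downward.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def spatial_location_label(human: Box, obj: Box) -> str:
    """Hypothetical 4-way label: dominant direction of the object relative to the human."""
    hx, hy = center(human)
    ox, oy = center(obj)
    dx, dy = ox - hx, oy - hy
    if abs(dx) >= abs(dy):
        return "right" if dx >= 0 else "left"
    # Image coordinates: positive dy means the object is lower in the frame.
    return "below" if dy >= 0 else "above"


def distance_variation_label(human_prev: Box, obj_prev: Box,
                             human_cur: Box, obj_cur: Box,
                             tol: float = 0.05) -> str:
    """Hypothetical 3-way label from the relative change in human-object center distance."""
    d_prev = math.dist(center(human_prev), center(obj_prev))
    d_cur = math.dist(center(human_cur), center(obj_cur))
    # Relative change, guarded against a zero previous distance.
    rel = (d_cur - d_prev) / max(d_prev, 1e-6)
    if rel < -tol:
        return "approaching"
    if rel > tol:
        return "leaving"
    return "stable"


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under predicted probabilities."""
    return -math.log(max(probs[target], 1e-12))


def joint_loss(main_loss: float, aux_losses: Sequence[float],
               weights: Sequence[float]) -> float:
    """L = L_main + sum_i weight_i * L_aux_i (weights are hypothetical hyperparameters)."""
    if len(aux_losses) != len(weights):
        raise ValueError("expected one weight per auxiliary loss")
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))
```

For example, with a human at (0, 0, 10, 10) and a cup moving from (20, 0, 30, 10) to (10, 0, 20, 10), `spatial_location_label` gives "right" and `distance_variation_label` gives "approaching". Because these labels come from geometry rather than predicate annotations, they are equally available for head and tail predicates, which is one plausible reason such auxiliary tasks could help with long-tailed bias.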
Pages: 38947-38966
Number of pages: 19
Related papers (50 in total)
  • [31] Random Generation of a Locally Consistent Spatio-Temporal Graph
    Leborgne, Aurelie
    Kirandjiska, Marija
    Le Ber, Florence
    GRAPH-BASED REPRESENTATION AND REASONING (ICCS 2021), 2021, 12879 : 155 - 169
  • [32] Learning dual disentangled representation with self-supervision for temporal knowledge graph reasoning
    Xiao, Yao
    Zhou, Guangyou
    Xie, Zhiwen
    Liu, Jin
    Huang, Jimmy Xiangji
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (03)
  • [33] Video Synopsis Generation Using Spatio-Temporal Groups
    Ahmed, A.
    Kar, S.
    Dogra, D. P.
    Patnaik, R.
    Lee, S.
    Choi, H.
    Kim, I.
    2017 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING APPLICATIONS (ICSIPA), 2017: 512 - 517
  • [34] Video Generation for High Spatio-temporal Resolution Imaging
    Imagawa, T.
    Azuma, T.
    Nobori, K.
    Motomura, H.
    2009 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, 2009: 151 - 152
  • [35] ImaGINator: Conditional Spatio-Temporal GAN for Video Generation
    Wang, Yaohui
    Bilinski, Piotr
    Bremond, Francois
    Dantcheva, Antitza
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020: 1149 - 1158
  • [36] Scene Spatio-Temporal Graph Convolutional Network for Pedestrian Intention Estimation
    Naik, Abhilash Y.
    Bighashdel, Ariyan
    Jancura, Pavol
    Dubbelman, Gijs
    2022 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2022: 874 - 881
  • [37] Video Anomaly Detection via self-supervised and spatio-temporal proxy tasks learning
    Yang, Qingyang
    Wang, Chuanxu
    Liu, Peng
    Jiang, Zitai
    Li, Jiajiong
    PATTERN RECOGNITION, 2025, 158
  • [38] Efficient probabilistic spatio-temporal video object segmentation
    Ahmed, Rakib
    Karmakar, Gour C.
    Dooley, Laurence S.
    6TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, PROCEEDINGS, 2007: 807 - +
  • [39] An efficient approach for video retrieval by spatio-temporal features
    Kumar, G. S. Naveen
    Reddy, V. S. K.
    INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2019, 23 (04) : 311 - 316
  • [40] A self-supervised spatio-temporal attention network for video-based 3D infant pose estimation
    Yin, Wang
    Chen, Linxi
    Huang, Xinrui
    Huang, Chunling
    Wang, Zhaohong
    Bian, Yang
    Wan, You
    Zhou, Yuan
    Han, Tongyan
    Yi, Ming
    MEDICAL IMAGE ANALYSIS, 2024, 96