Video-based spatio-temporal scene graph generation with efficient self-supervision tasks

Cited by: 0
Authors
Lianggangxu Chen
Yiqing Cai
Changhong Lu
Changbo Wang
Gaoqi He
Affiliations
[1] Chongqing Institute of East China Normal University, Chongqing Key Laboratory of Precision Optics
[2] East China Normal University, School of Computer Science and Technology
[3] East China Normal University, School of Mathematical Sciences
Source
Multimedia Tools and Applications, 2023, 82(25): 38947-38966
Keywords
Spatio-temporal scene graphs generation; Self-supervision; Local relation-aware attention;
DOI
Not available
Abstract
Spatio-temporal Scene Graph Generation (STSGG) aims to extract a sequence of graph-based semantic representations for high-level visual tasks. Existing works often fail to exploit the strong temporal correlation between frames and the details of local features, which makes it difficult to distinguish dynamic relations (e.g., drinking) from static relations (e.g., holding). Furthermore, because of a severe long-tailed bias, predictions suffer from inaccurate classification of tail predicates. To address these issues, a SlowFast Local-aware Attention (SFLA) network is proposed for temporal modeling in STSGG. First, a two-branch network extracts static and dynamic relation features, respectively. Second, a local relation-aware attention (LRA) module assigns higher importance to the crucial elements of each local relationship. Third, three novel self-supervised prediction tasks are proposed, namely spatial location, human attention state, and distance variation. These self-supervision tasks are trained jointly with the main model to alleviate the long-tailed bias and enhance feature discrimination. Systematic experiments show that the method achieves state-of-the-art performance on the recently proposed Action Genome (AG) dataset and the widely used ImageNet Video dataset.
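The abstract names the self-supervision tasks but does not define how their labels are produced or how the losses are combined. The following is a minimal sketch under our own assumptions, not the authors' SFLA implementation: it derives pseudo-labels for two of the tasks (spatial location and distance variation) from human and object bounding boxes, and adds the auxiliary losses to the main relation loss with per-task weights. The `Box` class, the label sets, the `tol` dead band (in pixels), and the loss weights are all hypothetical. The human-attention-state task is omitted because the abstract does not say where its labels come from.

```python
import math
from dataclasses import dataclass

# Hypothetical label sets; the paper's actual definitions are not given in the abstract.
SPATIAL_LABELS = ("left", "right", "above", "below")
DISTANCE_LABELS = ("approaching", "steady", "receding")


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image coordinates: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def spatial_location_label(human: Box, obj: Box) -> int:
    """Coarse position of the object centre relative to the human centre (dominant axis wins)."""
    hx, hy = human.center()
    ox, oy = obj.center()
    dx, dy = ox - hx, oy - hy
    if abs(dx) >= abs(dy):
        return SPATIAL_LABELS.index("right" if dx > 0 else "left")
    # Image y grows downward, so a positive dy means the object is below the human.
    return SPATIAL_LABELS.index("below" if dy > 0 else "above")


def center_distance(a: Box, b: Box) -> float:
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


def distance_variation_label(prev_h: Box, prev_o: Box, cur_h: Box, cur_o: Box,
                             tol: float = 2.0) -> int:
    """Label the change in human-object centre distance between two frames.

    `tol` is a hypothetical dead band: changes smaller than it count as "steady".
    """
    delta = center_distance(cur_h, cur_o) - center_distance(prev_h, prev_o)
    if delta < -tol:
        return DISTANCE_LABELS.index("approaching")
    if delta > tol:
        return DISTANCE_LABELS.index("receding")
    return DISTANCE_LABELS.index("steady")


def joint_loss(main_loss: float, aux_losses: dict[str, float],
               weights: dict[str, float]) -> float:
    """Main relation-classification loss plus weighted auxiliary self-supervision losses.

    Tasks missing from `weights` default to a weight of 1.0 (an assumption).
    """
    return main_loss + sum(weights.get(name, 1.0) * loss for name, loss in aux_losses.items())


if __name__ == "__main__":
    human = Box(0, 0, 10, 20)
    cup_t0 = Box(30, 5, 40, 15)  # object to the right of the person
    cup_t1 = Box(15, 5, 25, 15)  # same object, closer in the next frame
    print(SPATIAL_LABELS[spatial_location_label(human, cup_t0)])  # right
    print(DISTANCE_LABELS[distance_variation_label(human, cup_t0, human, cup_t1)])  # approaching
```

Because these pseudo-labels are computed from boxes the detector already produces, they need no extra annotation and are available for every human-object pair, including pairs whose predicate is rare; that is one plausible reading of why auxiliary tasks of this kind could help with the long-tailed bias described above.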
Pages: 38947 / 38966
Page count: 19
Related papers (50 in total)
  • [1] Video-based spatio-temporal scene graph generation with efficient self-supervision tasks
    Chen, Lianggangxu
    Cai, Yiqing
    Lu, Changhong
    Wang, Changbo
    He, Gaoqi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (25) : 38947 - 38966
  • [2] Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
    Xu, Li
    Qu, Haoxuan
    Kuen, Jason
    Gu, Jiuxiang
    Liu, Jun
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 374 - 390
  • [3] Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
    Yuan, Liangzhe
    Qian, Rui
    Cui, Yin
    Gong, Boqing
    Schroff, Florian
    Yang, Ming-Hsuan
    Adam, Hartwig
    Liu, Ting
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13957 - 13966
  • [4] Equivariant Spatio-temporal Self-supervision for LiDAR Object Detection
    Hegde, Deepti
    Lohit, Suhas
    Peng, Kuan-Chuan
    Jones, Michael J.
    Patel, Vishal M.
    COMPUTER VISION - ECCV 2024, PT XXVI, 2025, 15084 : 475 - 491
  • [5] Spatio-Temporal Self-supervision for Few-Shot Action Recognition
    Yu, Wanchuan
    Guo, Hanyu
    Yan, Yan
    Li, Jie
    Wang, Hanzi
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 84 - 96
  • [6] Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation
    Chu, Mengyu
    Xie, You
    Mayer, Jonas
    Leal-Taixé, Laura
    Thuerey, Nils
    ACM TRANSACTIONS ON GRAPHICS, 2020, 39 (04):
  • [7] Spatio-temporal keypoints for video-based face recognition
    Franco, A.
    Maio, D.
    Turroni, F.
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 489 - 494
  • [8] A Spatio-Temporal Attentive Network for Video-Based Crowd Counting
    Avvenuti, Marco
    Bongiovanni, Marco
    Ciampi, Luca
    Falchi, Fabrizio
    Gennaro, Claudio
    Messina, Nicola
    2022 27TH IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (IEEE ISCC 2022), 2022,
  • [9] Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling
    Zhao, Yu
    Fei, Hao
    Cao, Yixin
    Li, Bobo
    Zhang, Meishan
    Wei, Jianguo
    Zhang, Min
    Chua, Tat-Seng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5281 - 5291
  • [10] Time Is MattEr: Temporal Self-supervision for Video Transformers
    Yun, Sukmin
    Kim, Jaehyung
    Han, Dongyoon
    Song, Hwanjun
    Ha, Jung-Woo
    Shin, Jinwoo
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,