Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Authors
Ye, Kenan [1 ,2 ]
Zhao, Brian Nlong [3 ]
Liang, Shuang [1 ,2 ]
Yao, Han [1 ,2 ]
Jia, Wenzhen [1 ,2 ]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
3-D action recognition has become a fast-paced field in recent years. However, traditional approaches have limitations: they either model overly detailed yet redundant information by reconstructing the coordinates of each body joint, or they treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is motivated by the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also maintains an awareness of the topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) to fuse the action embeddings. Additionally, we introduce the context-aware topological attention (CTA) mechanism. Positioned between the embedding encoding and context aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with the actual encoded embeddings. This approach explicitly learns changes in dynamics to obtain distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
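The self-supervision step in the abstract, contrasting predicted embeddings with actual encoded embeddings, can be illustrated with a small sketch. The abstract does not name the loss, so the InfoNCE form, the cosine similarity, the temperature value, and the function names below are all assumptions chosen for illustration, not the authors' implementation. Each predicted embedding is scored against every encoded embedding; the matching index is the positive and the rest serve as negatives.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(predicted, encoded, temperature=0.1):
    """Mean InfoNCE loss (assumed loss form, not taken from the paper).

    predicted[i] should match encoded[i]; every other encoded[j]
    acts as a negative sample for predicted[i].
    """
    preds = [l2_normalize(p) for p in predicted]
    targets = [l2_normalize(e) for e in encoded]
    total = 0.0
    for i, p in enumerate(preds):
        # Cosine similarities scaled by the temperature.
        logits = [dot(p, t) / temperature for t in targets]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
        # Negative log-probability of picking the true embedding.
        total += log_denominator - logits[i]
    return total / len(preds)


# Toy data: three "encoded" embeddings and predictions close to them.
encoded = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# The same predictions paired with the wrong targets.
shuffled = [aligned[1], aligned[2], aligned[0]]

loss_aligned = info_nce(aligned, encoded)
loss_shuffled = info_nce(shuffled, encoded)
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is lower when each prediction is paired with its own encoded embedding than when the pairing is shuffled, which is the signal a contrastive objective of this kind relies on.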
Pages: 15