Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited by: 0
Authors
Ye, Kenan [1 ,2 ]
Zhao, Brian Nlong [3 ]
Liang, Shuang [1 ,2 ]
Yao, Han [1 ,2 ]
Jia, Wenzhen [1 ,2 ]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China; National Key Research and Development Program;
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
Chinese Library Classification Number
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
3-D action recognition has become a fast-moving field in recent years. However, traditional approaches have limitations: they either model overly detailed yet redundant information by reconstructing the coordinates of each body joint, or treat actions as a whole and overlook the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is motivated by the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also remains aware of the topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) to fuse the action embeddings. Additionally, we introduce the context-aware topological attention (CTA) mechanism. Positioned between the embedding-encoding and context-aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with the actual encoded embeddings. This approach explicitly learns changes in dynamics to obtain distinctive embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
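The abstract does not state the exact form of the contrastive objective. As a rough illustration only, the idea of contrasting predicted embeddings with encoded ones can be sketched with an InfoNCE-style loss, which is an assumption here and not necessarily the loss the paper uses. The function name `info_nce_loss`, the `temperature` parameter, and the array shapes are hypothetical.

```python
import numpy as np


def info_nce_loss(predicted, encoded, temperature=0.1):
    """InfoNCE-style loss (assumed form, not taken from the paper).

    predicted, encoded: arrays of shape (N, D). Row i of `predicted` should
    match row i of `encoded` (positive pair); all other rows act as negatives.
    """
    # L2-normalise rows so the dot product is a cosine similarity.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    z = encoded / np.linalg.norm(encoded, axis=1, keepdims=True)

    # (N, N) similarity matrix, sharpened by the temperature.
    logits = p @ z.T / temperature

    # Subtract the row-wise maximum for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)

    # Row-wise log-softmax; the diagonal holds the positive pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Usage: predictions that match their targets should score a lower loss
# than predictions paired with the wrong targets.
rng = np.random.default_rng(0)
encoded = rng.normal(size=(8, 16))
aligned_loss = info_nce_loss(encoded, encoded)
misaligned_loss = info_nce_loss(np.roll(encoded, 1, axis=0), encoded)
```

In this sketch, every other embedding in the batch serves as a negative, so the loss decreases only when each prediction is closer to its own target than to the rest. That is one way to encourage the distinct embeddings the abstract describes.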
Pages: 15