Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited by: 0
Authors
Ye, Kenan [1, 2]
Zhao, Brian Nlong [3]
Liang, Shuang [1, 2]
Yao, Han [1, 2]
Jia, Wenzhen [1, 2]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
3-D action recognition has become a fast-moving field in recent years. However, traditional approaches have limitations. They either focus on modeling overly detailed yet redundant information by reconstructing the coordinates of each body joint, or they treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is motivated by the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also maintains an awareness of the topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) for the fusion of action embeddings. Additionally, we introduce the context-aware topological attention (CTA) mechanism. Positioned between the embedding encoding and context aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with actual encoded embeddings. This approach explicitly learns changes in dynamics to obtain distinct embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
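The idea of contrasting predicted embeddings with actual encoded embeddings can be illustrated with a generic InfoNCE-style loss. The following is a minimal sketch only, not the authors' implementation: the abstract does not specify the loss, so the function name `info_nce_loss`, the `temperature` value, the use of cosine similarity, and the random toy data are all assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(predicted, actual, temperature=0.1):
    """Generic InfoNCE loss (assumed form, not taken from the paper).

    Row i of `predicted` is treated as the positive match for row i of
    `actual`; every other row of `actual` serves as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    a = actual / np.linalg.norm(actual, axis=1, keepdims=True)
    logits = p @ a.T / temperature                       # shape (N, N)
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability of the matching (diagonal) pairs.
    return float(-np.mean(np.diag(log_prob)))

# Toy data standing in for encoded short-term action embeddings.
rng = np.random.default_rng(0)
actual = rng.normal(size=(8, 16))
good = actual + 0.01 * rng.normal(size=(8, 16))   # predictions close to targets
bad = rng.normal(size=(8, 16))                    # unrelated predictions

loss_good = info_nce_loss(good, actual)
loss_bad = info_nce_loss(bad, actual)
print(loss_good, loss_bad)
```

In this sketch, predictions that stay close to their target embeddings yield a lower loss than unrelated ones, which is the behaviour a contrastive objective of this kind is meant to reward.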
Pages: 15
Related papers
50 in total
  • [1] Enhanced Industrial Action Recognition Through Self-Supervised Visual Transformers
    Xiao, Yao
    Xiang, Hua
    Wang, Tongxi
    Wang, Yiju
    IEEE ACCESS, 2024, 12 : 134133 - 134143
  • [2] View Enhanced Jigsaw Puzzle for Self-Supervised Feature Learning in 3D Human Action Recognition
    You, Wei
    Wang, Xue
    IEEE ACCESS, 2022, 10 : 36385 - 36396
  • [4] Supervised and Self-Supervised Learning for Assembly Line Action Recognition
    Indris, Christopher
    Ibrahim, Fady
    Ibrahem, Hatem
    Bramesfeld, Gotz
    Huo, Jie
    Ahmad, Hafiz Mughees
    Hayat, Syed Khizer
    Wang, Guanghui
    JOURNAL OF IMAGING, 2025, 11 (01)
  • [5] Self-Supervised Learning for Action Recognition by Video Denoising
    Thi Thu Trang Phung
    Thi Hong Thu Ma
    Van Truong Nguyen
    Duc Quang Vu
    2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021, : 76 - 81
  • [6] Contrastive Self-Supervised Learning for Skeleton Action Recognition
    Gao, Xuehao
    Yang, Yang
    Du, Shaoyi
    NEURIPS 2020 WORKSHOP ON PRE-REGISTRATION IN MACHINE LEARNING, VOL 148, 2020, 148 : 51 - 61
  • [7] SPAct: Self-supervised Privacy Preservation for Action Recognition
    Dave, Ishan Rajendrakumar
    Chen, Chen
    Shah, Mubarak
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 20132 - 20141
  • [8] Motion Guided Attention Learning for Self-Supervised 3D Human Action Recognition
    Yang, Yang
    Liu, Guangjun
    Gao, Xuehao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8623 - 8634
  • [9] Attention-guided mask learning for self-supervised 3D action recognition
    Zhang, Haoyuan
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (06) : 7487 - 7496
  • [10] How and What to Learn: Taxonomizing Self-Supervised Learning for 3D Action Recognition
    Ben Tanfous, Amor
    Zerroug, Aimen
    Linsley, Drew
    Serre, Thomas
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2888 - 2897