Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

Cited by: 0
Authors
Ye, Kenan [1 ,2 ]
Zhao, Brian Nlong [3 ]
Liang, Shuang [1 ,2 ]
Yao, Han [1 ,2 ]
Jia, Wenzhen [1 ,2 ]
Affiliations
[1] Tongji Univ, Sch Comp Sci & Technol, Shanghai 200092, Peoples R China
[2] Engn Res Ctr, Minist Educ, Key Software Technol Smart City Percept & Planning, Shanghai 201804, Peoples R China
[3] Univ Southern Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Skeleton; Joints; Three-dimensional displays; Videos; Semantics; Contrastive learning; Symbols; Kernel; Encoding; Attention mechanisms; self-supervised learning; skeleton-based action recognition;
DOI
10.1109/TCSS.2024.3525083
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812 ;
Abstract
3-D action recognition has become a fast-paced field in recent years. However, traditional approaches have limitations. They either model overly detailed, redundant information by reconstructing the coordinates of every body joint, or they treat actions as a whole, overlooking the spatial and temporal variations in the semantic locality of actions. To address these limitations, we propose representing long-term actions as contexts of short-term actions organized by locality-aware graphs. Our framework is motivated by the observation that the continuity of motion and pose variations produces higher correlations among spatio-temporally adjacent joints. Building on this, we encode short-term actions as embeddings using spatio-temporal graph convolutions. This graph-based encoding not only captures richer high-level semantics but also preserves awareness of the topology. To capture long-term action dynamics effectively, we integrate a graph convolutional gated recurrent unit (GraphGRU) to fuse the action embeddings. Additionally, we introduce a context-aware topological attention (CTA) mechanism. Positioned between the embedding-encoding and context-aggregation phases, CTA amplifies the features of context-relevant nodes. Lastly, we create self-supervision by contrasting predicted embeddings with the actual encoded embeddings. This explicitly learns changes in dynamics and yields distinctive embeddings. Empirical evaluations demonstrate that our approach outperforms mainstream unsupervised 3-D action recognition methods.
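The abstract says self-supervision comes from contrasting predicted embeddings with the actual encoded embeddings, but it does not state the loss. A common choice for this kind of predictive contrast is an InfoNCE objective, as in contrastive predictive coding. The sketch below assumes that objective, cosine similarity, a temperature of 0.1, and in-batch negatives (every other encoded embedding in the batch). The function `info_nce` and its parameters are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def info_nce(predicted: np.ndarray, encoded: np.ndarray, temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE sketch (assumed loss, not confirmed by the paper).

    Row i of `predicted` (e.g. a context-predicted short-term action embedding)
    should match row i of `encoded` (the embedding actually produced by the
    encoder); all other rows of `encoded` act as negatives.
    """
    # L2-normalise so that dot products are cosine similarities
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    e = encoded / np.linalg.norm(encoded, axis=1, keepdims=True)
    # (N, N) similarity matrix scaled by the temperature
    logits = p @ e.T / temperature
    # numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal; minimise their negative log-likelihood
    return float(-np.mean(np.diag(log_probs)))

# Usage with random stand-in embeddings (8 samples, 16 dimensions)
rng = np.random.default_rng(0)
encoded = rng.normal(size=(8, 16))
close_prediction = encoded + 0.01 * rng.normal(size=encoded.shape)
random_prediction = rng.normal(size=(8, 16))
print(info_nce(close_prediction, encoded), info_nce(random_prediction, encoded))
```

A prediction that lands close to its own encoded embedding, and away from the others, gets a low loss, which is what pushes the learned embeddings to stay distinguishable across different dynamics.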
Pages: 15
Related papers
50 in total
  • [21] SELF-SUPERVISED CONTRASTIVE LEARNING FOR AUDIO-VISUAL ACTION RECOGNITION
    Liu, Yang
    Tan, Ying
    Lan, Haoyuan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1000 - 1004
  • [22] Self-Supervised 3D Action Representation Learning With Skeleton Cloud Colorization
    Yang, Siyuan
    Liu, Jun
    Lu, Shijian
    Hwa, Er Meng
    Hu, Yongjian
    Kot, Alex C.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 509 - 524
  • [23] Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning
    Su, Yukun
    Lin, Guosheng
    Sun, Ruizhou
    Hao, Yun
    Wu, Qingyao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 769 - 778
  • [24] Improving self-supervised action recognition from extremely augmented skeleton sequences
    Guo, Tianyu
    Liu, Mengyuan
    Liu, Hong
    Wang, Guoquan
    Li, Wenhao
    PATTERN RECOGNITION, 2024, 150
  • [25] Joint facial action unit recognition and self-supervised optical flow estimation
    Shao, Zhiwen
    Zhou, Yong
    Li, Feiran
    Zhu, Hancheng
    Liu, Bing
    PATTERN RECOGNITION LETTERS, 2024, 181 : 70 - 76
  • [26] Self-Supervised Learning via Multi-Transformation Classification for Action Recognition
    Duc-Quang Vu
    Ngan Le
    Wang, Jia-Ching
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024
  • [27] Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition
    Planamente, Mirco
    Bottino, Andrea
    Caputo, Barbara
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8751 - 8758
  • [28] Data-Efficient Masked Video Modeling for Self-supervised Action Recognition
    Li, Qiankun
    Huang, Xiaolong
    Wan, Zhifan
    Hu, Lanqing
    Wu, Shuzhe
    Zhang, Jie
    Shan, Shiguang
    Wang, Zengfu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2723 - 2733
  • [29] Self-supervised Learning for Unintentional Action Prediction
    Zatsarynna, Olga
    Abu Farha, Yazan
    Gall, Juergen
    PATTERN RECOGNITION, DAGM GCPR 2022, 2022, 13485 : 429 - 444
  • [30] Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition
    Yan, Jingwei
    Wang, Jingjing
    Li, Qiang
    Wang, Chunmao
    Pu, Shiliang
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1038 - 1046