Representation modeling learning with multi-domain decoupling for unsupervised skeleton-based action recognition

Times Cited: 0
Authors
He, Zhiquan [1 ,2 ]
Lv, Jiantu [2 ]
Fang, Shizhang [2 ]
Affiliations
[1] Guangdong Key Lab Intelligent Informat Proc, Shenzhen, Peoples R China
[2] Shenzhen Univ, Guangdong Multimedia Informat Serv Engn Technol Re, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Unsupervised learning; Contrastive learning; Action recognition;
DOI
10.1016/j.neucom.2024.127495
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skeleton-based action recognition is one of the fundamental research topics in computer vision. In recent years, the unsupervised contrastive learning paradigm has achieved great success in skeleton-based action recognition. However, previous work often treated input skeleton sequences as a whole when performing comparisons, lacking fine-grained contrastive learning of representations. Therefore, we propose a contrastive learning method for Representation Modeling with Multi-domain Decoupling (RMMD), which extracts the most significant representations from input skeleton sequences in the temporal, spatial, and frequency domains, respectively. Specifically, in the temporal and spatial domains, we propose a multi-level spatiotemporal mining reconstruction module (STMR) that iteratively reconstructs the original input skeleton sequences to highlight spatiotemporal representations under different actions. At the same time, we introduce position encoding and a global adaptive attention matrix, balancing global and local information and effectively modeling the spatiotemporal dependencies between joints. In the frequency domain, we use the discrete cosine transform (DCT) to achieve temporal-frequency conversion, discard part of the interference information, and use frequency self-attention (FSA) and a multi-level aggregation perceptron (MLAP) to deeply explore the frequency-domain representation. The fusion of the temporal-, spatial-, and frequency-domain representations makes our model more discriminative in representing different actions. We verify the effectiveness of the model on the NTU RGB+D and PKU-MMD datasets. Extensive experiments show that our method outperforms existing unsupervised methods and achieves significant performance improvements in downstream tasks such as action recognition and action retrieval.
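The frequency-domain branch described in the abstract rests on converting each skeleton sequence to the frequency domain with a DCT and discarding part of the coefficients as interference before applying FSA and MLAP. The sketch below illustrates only that temporal-frequency conversion step under stated assumptions; it is not the authors' implementation, and the tensor layout (frames, joints, coordinates), the function names, and the kept-coefficient ratio are illustrative choices.

```python
# Minimal sketch (assumed, not the authors' code): DCT-II over the time axis of a
# skeleton sequence, followed by truncation of high-frequency coefficients.
import numpy as np


def dct_ii(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Orthonormal DCT-II along `axis`, implemented directly from its definition."""
    x = np.moveaxis(x, axis, 0)
    n = x.shape[0]
    k = np.arange(n)[:, None]                           # frequency index
    t = np.arange(n)[None, :]                           # time index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))   # (n, n) DCT-II basis
    scale = np.full((n, 1), np.sqrt(2.0 / n))
    scale[0, 0] = np.sqrt(1.0 / n)                      # orthonormal scaling for the DC term
    coeffs = (scale * basis) @ x.reshape(n, -1)
    return np.moveaxis(coeffs.reshape(x.shape), 0, axis)


def to_frequency_domain(seq: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """seq: (T, J, C) skeleton sequence (frames, joints, coordinates).
    Returns the low-frequency DCT coefficients; the rest are discarded as interference.
    `keep_ratio` is an assumed hyperparameter for illustration."""
    coeffs = dct_ii(seq, axis=0)                        # (T, J, C) frequency-domain tensor
    keep = max(1, int(seq.shape[0] * keep_ratio))
    return coeffs[:keep]                                # keep only the first `keep` frequency bins


if __name__ == "__main__":
    skeleton = np.random.randn(64, 25, 3)               # e.g. 64 frames, 25 NTU RGB+D joints, xyz
    freq_repr = to_frequency_domain(skeleton, keep_ratio=0.5)
    print(freq_repr.shape)                              # (32, 25, 3)
```

In this reading, the truncated coefficient tensor would be the input to the frequency branch (FSA and MLAP), while the untouched sequence feeds the temporal/spatial branch (STMR); how the paper actually selects or weights coefficients is not specified in the abstract.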
Pages: 11
Related Papers
50 records in total
  • [21] Enhanced decoupling graph convolution network for skeleton-based action recognition
    Gu, Yue
    Yu, Qiang
    Xue, Wanli
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (29) : 73289 - 73304
  • [22] A High Invariance Motion Representation for Skeleton-Based Action Recognition
    Guo, Songrui
    Pan, Huawei
    Tan, Guanghua
    Chen, Lin
    Gao, Chunming
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2016, 30 (08)
  • [23] Skeleton-based action recognition with extreme learning machines
    Chen, Xi
    Koskela, Markus
    NEUROCOMPUTING, 2015, 149 : 387 - 396
  • [24] Exploring incomplete decoupling modeling with window and cross-window mechanism for skeleton-based action recognition
    Li, Shengze
    Xiang, Xin
    Fang, Jihong
    Zhang, Jun
    Cheng, Songsong
    Wang, Ke
    KNOWLEDGE-BASED SYSTEMS, 2023, 281
  • [25] Unsupervised Representation Learning with Long-Term Dynamics for Skeleton Based Action Recognition
    Zheng, Nenggan
    Wen, Jun
    Liu, Risheng
    Long, Liangqu
    Dai, Jianhua
    Gong, Zhefeng
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 2644 - 2651
  • [26] Multi-Grained Temporal Segmentation Attention Modeling for Skeleton-Based Action Recognition
    Lv, Jinrong
    Gong, Xun
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 927 - 931
  • [27] Multi-Granularity Anchor-Contrastive Representation Learning for Semi-Supervised Skeleton-Based Action Recognition
    Shu, Xiangbo
    Xu, Binqian
    Zhang, Liyan
    Tang, Jinhui
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7559 - 7576
  • [28] Contrast-Reconstruction Representation Learning for Self-Supervised Skeleton-Based Action Recognition
    Wang, Peng
    Wen, Jun
    Si, Chenyang
    Qian, Yuntao
    Wang, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6224 - 6238
  • [29] InfoGCN++: Learning Representation by Predicting the Future for Online Skeleton-Based Action Recognition
    Chi, Seunggeun
    Chi, Hyung-Gun
    Huang, Qixing
    Ramani, Karthik
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (01) : 514 - 528
  • [30] Deep Progressive Reinforcement Learning for Skeleton-based Action Recognition
    Tang, Yansong
    Tian, Yi
    Lu, Jiwen
    Li, Peiyang
    Zhou, Jie
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5323 - 5332