OTM-HC: Enhanced Skeleton-Based Action Representation via One-to-Many Hierarchical Contrastive Learning

Cited by: 0
Authors
Usman, Muhammad [1,2,3]
Cao, Wenming [1,2,3]
Huang, Zhao [4]
Zhong, Jianqi [1,2,3]
Ji, Ruiya [5]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China
[2] Guangdong Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
[3] Shenzhen Univ, Shenzhen 518060, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle NE1 8ST, England
[5] Queen Mary Univ London, Dept Comp Sci, London E1 4NS, England
Funding
National Natural Science Foundation of China;
Keywords
skeleton-based action representation learning; unsupervised learning; hierarchical contrastive learning; one-to-many; GRAPH CONVOLUTIONAL NETWORKS; LSTM;
DOI
10.3390/ai5040106
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition has become crucial in computer vision, with growing applications in surveillance, human-computer interaction, and healthcare. Traditional approaches often rely on broad feature representations, which can miss subtle variations in timing and movement within action sequences. Our proposed One-to-Many Hierarchical Contrastive Learning (OTM-HC) framework maps the input into multi-layered feature vectors, creating a hierarchical contrastive representation that captures multiple granularities within the temporal and spatial domains of a human skeleton sequence. Using sequence-to-sequence (Seq2Seq) transformer encoders and downsampling modules, OTM-HC learns to distinguish action representations at multiple levels: instance, domain, clip, and part. Each level contributes to a comprehensive understanding of the action. The OTM-HC design is modular, allowing smooth integration with advanced Seq2Seq encoders. We evaluated the OTM-HC framework on four datasets, demonstrating improved performance over state-of-the-art models. Specifically, OTM-HC achieved improvements of 0.9% and 0.6% on NTU60, 0.4% and 0.7% on NTU120, and 0.7% and 0.3% on PKU-MMD I and II, respectively, surpassing previous leading approaches on these datasets. These results demonstrate the robustness and adaptability of our model across various skeleton-based action recognition tasks.
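The abstract describes a one-to-many contrastive objective computed over several representation levels. As a rough, hedged illustration only (not the authors' released code), the sketch below contrasts one anchor embedding against several level-specific views (instance, domain, clip, part) with a standard InfoNCE loss; the helper names, the uniform weighting, and the random views are all assumptions introduced here.

import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    # Standard InfoNCE: matched rows of z_a/z_b are positives (hypothetical helper).
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                 # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)              # positives lie on the diagonal

def one_to_many_loss(anchor, level_views, weights=None):
    # One anchor contrasted against many level views (e.g. instance/domain/clip/part);
    # per-level losses are summed with optional weights (weighting scheme assumed).
    weights = weights or [1.0] * len(level_views)
    return sum(w * info_nce(anchor, v) for w, v in zip(weights, level_views))

# Toy usage: batch of 8 sequences, 128-d embeddings, four assumed levels.
anchor = torch.randn(8, 128)
views = [torch.randn(8, 128) for _ in range(4)]
loss = one_to_many_loss(anchor, views)

In the actual framework the level views would come from the Seq2Seq transformer encoders and downsampling modules at each granularity; here they are random tensors purely to keep the sketch self-contained and runnable.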
Pages: 2170–2186
Page count: 17