OTM-HC: Enhanced Skeleton-Based Action Representation via One-to-Many Hierarchical Contrastive Learning

Cited by: 0
Authors
Usman, Muhammad [1 ,2 ,3 ]
Cao, Wenming [1 ,2 ,3 ]
Huang, Zhao [4 ]
Zhong, Jianqi [1 ,2 ,3 ]
Ji, Ruiya [5 ]
Affiliations
[1] Shenzhen University, College of Electronics and Information Engineering, Shenzhen 518060, People's Republic of China
[2] Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen 518060, People's Republic of China
[3] Shenzhen University, Shenzhen 518060, People's Republic of China
[4] Northumbria University, Department of Computer and Information Sciences, Newcastle upon Tyne NE1 8ST, England
[5] Queen Mary University of London, Department of Computer Science, London E1 4NS, England
Funding
National Natural Science Foundation of China;
Keywords
skeleton-based action representation learning; unsupervised learning; hierarchical contrastive learning; one-to-many; graph convolutional networks; LSTM;
DOI
10.3390/ai5040106
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition has become crucial in computer vision, with growing applications in surveillance, human-computer interaction, and healthcare. Traditional approaches often rely on broad feature representations, which may miss subtle variations in timing and movement within action sequences. Our proposed One-to-Many Hierarchical Contrastive Learning (OTM-HC) framework maps the input into multi-layered feature vectors, creating a hierarchical contrastive representation that captures various granularities within the temporal and spatial domains of a human skeleton sequence. Using sequence-to-sequence (Seq2Seq) transformer encoders and downsampling modules, OTM-HC distinguishes between multiple levels of action representation, namely the instance, domain, clip, and part levels, each of which contributes to a comprehensive understanding of the action. The OTM-HC design is adaptable and integrates smoothly with advanced Seq2Seq encoders. We evaluated the framework on four datasets, demonstrating improved performance over state-of-the-art models: OTM-HC achieved improvements of 0.9% and 0.6% on NTU60, 0.4% and 0.7% on NTU120, and 0.7% and 0.3% on PKU-MMD I and II, respectively, surpassing previous leading approaches on these datasets. These results underscore the robustness and adaptability of our model across various skeleton-based action recognition tasks.
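To make the abstract's mechanism concrete, the sketch below illustrates one plausible reading of it in PyTorch. It is a minimal sketch, not the authors' released code: the projection heads, the stride-2 temporal downsampling used to derive progressively coarser (domain-, clip-, and part-like) granularities, the InfoNCE-style loss, and all dimensions (e.g., in_dim=150 for 25 joints x 3 coordinates x 2 bodies, as in NTU skeletons) are illustrative assumptions.

```python
# Minimal sketch of one-to-many hierarchical contrastive learning over
# skeleton sequences, under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalEncoder(nn.Module):
    """Seq2Seq transformer encoder plus one projection head per level
    (instance, domain, clip, part)."""
    def __init__(self, in_dim=150, d_model=256, levels=4):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.heads = nn.ModuleList([nn.Linear(d_model, 128) for _ in range(levels)])

    def forward(self, x):                       # x: (B, T, in_dim)
        h = self.encoder(self.embed(x))         # (B, T, d_model)
        feats = []
        for i, head in enumerate(self.heads):
            # Stride-2 temporal downsampling per level (assumed scheme)
            # yields progressively coarser granularities before pooling.
            pooled = h[:, :: 2 ** i].mean(dim=1)
            feats.append(F.normalize(head(pooled), dim=-1))
        return feats                            # list of (B, 128), one per level

def one_to_many_nce(query_feats, key_feats, tau=0.07):
    """One query embedding contrasted against many key levels:
    the finest (instance-level) query is pulled toward the positive's
    embedding at every hierarchy level via an InfoNCE-style loss."""
    q = query_feats[0]                          # instance-level query view
    loss = 0.0
    for k in key_feats:                         # instance, domain, clip, part
        logits = q @ k.t() / tau                # (B, B) similarity matrix
        labels = torch.arange(q.size(0), device=q.device)  # positives on diagonal
        loss = loss + F.cross_entropy(logits, labels)
    return loss / len(key_feats)

# Usage: two augmented views of the same skeleton batch as query/key.
enc = HierarchicalEncoder()
x1 = torch.randn(8, 64, 150)                    # 8 sequences, 64 frames each
x2 = torch.randn(8, 64, 150)
loss = one_to_many_nce(enc(x1), enc(x2))
loss.backward()
```

Here "one-to-many" is rendered as a single instance-level query embedding being contrasted against the positive's embeddings at every hierarchy level; the paper's exact pairing of levels and its downsampling modules may differ.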
Pages: 2170-2186
Page count: 17