Modeling the skeleton-language uncertainty for 3D action recognition

Cited by: 1
Authors
Wang, Mingdao [1]
Zhang, Xianlin [2]
Chen, Siqi [1]
Li, Xueming [2]
Zhang, Yue [2]
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Artificial Intelligence, Beijing 100000, People's Republic of China
[2] Beijing University of Posts and Telecommunications, School of Digital Media and Design Arts, Beijing, People's Republic of China
Keywords
Uncertainty; Multimodal model; 3D skeleton-based action recognition; Networks
DOI
10.1016/j.neucom.2024.128426
CLC number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human 3D skeleton-based action recognition has received increasing interest in recent years. Inspired by the strong capabilities of multi-modal models, some pioneering works have combined diverse modalities, i.e., skeleton and language, to construct skeleton-language models and have shown compelling results. However, these works represent data as deterministic point estimates, overlooking a key issue: descriptions of similar motions are uncertain and ambiguous. This limits the model's comprehension of complex concept hierarchies and weakens the reliability of cross-modal alignment. To tackle this challenge, this paper proposes a new Uncertain Skeleton-Language Learning Framework (USLLF) that, for the first time, captures the semantic ambiguity across modalities in a probabilistic manner. USLLF models both inter- and intra-modal uncertainties. Specifically, we first integrate text generated by ChatGPT with a generic skeleton-based network to build a deterministic multi-modal baseline, which can be implemented with any off-the-shelf skeleton and text encoders. Building on this baseline, we explicitly model the intra-modal (skeleton/language) uncertainties as Gaussian distributions, using new uncertainty networks that learn distributional embeddings of each modality. These embeddings are then aligned and formulated as inter-modal (skeleton-language) uncertainty using both contrastive and negative log-likelihood objectives to reduce cross-modal alignment error. Experimental results on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets show that our approach outperforms the proposed baseline and achieves performance comparable to state-of-the-art methods with high inference efficiency. We also provide analyses of how the learned uncertainty reduces the impact of uncertain and ambiguous data on model performance.
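The probabilistic alignment summarized in the abstract can be illustrated with a small, dependency-free Python sketch. It rests on assumptions that are ours, not the paper's specification: each encoder outputs a diagonal Gaussian given by a mean vector and a log-variance vector; embeddings are sampled with the reparameterization trick; and the inter-modal objective is an InfoNCE-style contrastive loss on sampled embeddings plus a Gaussian negative log-likelihood of each text mean under its paired skeleton distribution. All function names, `temperature`, and `nll_weight` are hypothetical, and USLLF's actual uncertainty networks and loss formulation may differ.

```python
import math
import random

# Hypothetical sketch (not the USLLF reference code): each encoder is assumed
# to output a diagonal Gaussian N(mu, diag(exp(log_var))) instead of a point.


def sample_embedding(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_nll(x, mu, log_var):
    """Negative log-likelihood of vector x under a diagonal Gaussian."""
    return 0.5 * sum(
        lv + (xi - m) ** 2 / math.exp(lv) + math.log(2.0 * math.pi)
        for xi, m, lv in zip(x, mu, log_var)
    )


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(skel, text, temperature=0.1):
    """InfoNCE from skeleton to text; item i in skel matches item i in text."""
    total = 0.0
    for i, s in enumerate(skel):
        logits = [cosine(s, t) / temperature for t in text]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum - logits[i]
    return total / len(skel)


def alignment_loss(skel_dists, text_dists, rng, nll_weight=0.1):
    """Hypothetical inter-modal objective: contrastive term + weighted NLL term.

    skel_dists / text_dists are lists of (mu, log_var) pairs, one per sample.
    """
    skel_z = [sample_embedding(mu, lv, rng) for mu, lv in skel_dists]
    text_z = [sample_embedding(mu, lv, rng) for mu, lv in text_dists]
    l_con = contrastive_loss(skel_z, text_z)
    # Assumed NLL term: likelihood of each text mean under its paired
    # skeleton Gaussian. A larger skeleton variance softens the penalty for a
    # mismatch but pays a log-variance cost, so ambiguity is not free.
    l_nll = sum(gaussian_nll(t_mu, s_mu, s_lv)
                for (s_mu, s_lv), (t_mu, _) in zip(skel_dists, text_dists))
    return l_con + nll_weight * l_nll / len(skel_dists)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy samples with 2-D embeddings (made-up numbers for illustration).
    skel = [([1.0, 0.0], [-2.0, -2.0]), ([0.0, 1.0], [-2.0, -2.0])]
    text = [([0.9, 0.1], [-1.0, -1.0]), ([0.1, 0.9], [-1.0, -1.0])]
    print(alignment_loss(skel, text, rng))
```

In a trained model, the means and log-variances would come from learned uncertainty heads on top of the skeleton and text encoders, and the loss would be minimized with an automatic-differentiation framework; this plain-Python version only shows how a sampled contrastive term and a likelihood term can be combined.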
Pages: 14
Related papers (50 in total)
  • [41] A new Bayesian modeling for 3D human-object action recognition
    Maurice, Camille
    Madrigal, Francisco
    Monin, Andre
    Lerasle, Frederic
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,
  • [42] Human Action Recognition Based on 3D Human Modeling and Cyclic HMMs
    Ke, Shian-Ru
    Hoang Le Uyen Thuc
    Hwang, Jenq-Neng
    Yoo, Jang-Hee
    Choi, Kyoung-Ho
    ETRI JOURNAL, 2014, 36 (04) : 661 - 671
  • [43] Key-Skeleton-Pattern Mining on 3D Skeletons Represented by Lie Group for Action Recognition
    Li, Guang
    Liu, Kai
    Ding, Wenwen
    Cheng, Fei
    Chen, Boyang
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018
  • [44] MCTD: Motion-Coordinate-Time Descriptor for 3D Skeleton-Based Action Recognition
    Liang, Qi
    Wang, Feng
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT I, 2018, 10735 : 577 - 587
  • [45] Contrastive Mask Learning for Self-Supervised 3D Skeleton-Based Action Recognition
    Zhang, Haoyuan
    SENSORS, 2025, 25 (05)
  • [46] Image representation of pose-transition feature for 3D skeleton-based action recognition
    Thien Huynh-The
    Hua, Cam-Hao
    Trung-Thanh Ngo
    Kim, Dong-Seong
    INFORMATION SCIENCES, 2020, 513 : 112 - 126
  • [47] Skeleton-Based Square Grid for Human Action Recognition With 3D Convolutional Neural Network
    Ding, Wenwen
    Ding, Chongyang
    Li, Guang
    Liu, Kai
    IEEE ACCESS, 2021, 9 : 54078 - 54089
  • [48] Action Recognition Based on 3D Skeleton and LSTM for the Monitoring of Construction Workers' Safety Harness Usage
    Guo, Hongling
    Zhang, Zhitian
    Yu, Run
    Sun, Yakang
    Li, Heng
    JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT, 2023, 149 (04)
  • [49] Action Recognition from 3D Skeleton Sequences using Deep Networks on Lie Group Features
    Rhif, Manel
    Wannous, Hazem
    Farah, Imed Riadh
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3427 - 3432
  • [50] Regularizing Long Short Term Memory with 3D Human-Skeleton Sequences for Action Recognition
    Mahasseni, Behrooz
    Todorovic, Sinisa
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3054 - 3062