Modeling the skeleton-language uncertainty for 3D action recognition

Cited by: 1
Authors
Wang, Mingdao [1]
Zhang, Xianlin [2]
Chen, Siqi [1]
Li, Xueming [2]
Zhang, Yue [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100000, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Digital Media & Design Arts, Beijing, Peoples R China
Keywords
Uncertainty; Multimodal model; 3D skeleton-based action recognition; NETWORKS
DOI
10.1016/j.neucom.2024.128426
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human 3D skeleton-based action recognition has received increasing interest in recent years. Inspired by the strong capabilities of multi-modal models, some pioneering attempts employ diverse modalities, i.e., skeleton and language, to construct skeleton-language models and have shown compelling results. Yet these attempts model the data representation as a deterministic point estimate, ignoring a key issue: descriptions of similar motions are uncertain and ambiguous, which restricts comprehension of complex concept hierarchies and weakens the reliability of cross-modal alignment. To tackle this challenge, this paper proposes a new Uncertain Skeleton-Language Learning Framework (USLLF) that, for the first time, captures the semantic ambiguity among diverse modalities in a probabilistic manner. USLLF models both inter- and intra-modal uncertainties. Specifically, we first integrate language (text) generated by ChatGPT with a generic skeleton-based network to develop a deterministic multi-modal baseline, which can be built from any off-the-shelf skeleton and text encoders. Then, based on this baseline, we explicitly model the intra-modal (skeleton/language) uncertainties as Gaussian distributions using new uncertainty networks capable of learning distributional embeddings of the modalities. Following this, these embeddings are aligned and formulated as inter-modal (skeleton-language) uncertainty using both contrastive and negative log-likelihood objectives to alleviate cross-modal alignment error. Experimental results on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets show that our approach outperforms the proposed baseline and achieves performance comparable to state-of-the-art methods with high inference efficiency. We also provide analyses of how the learned uncertainty reduces the impact of uncertain and ambiguous data on model performance.
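The abstract names three ingredients: Gaussian (distributional) embeddings per modality, a contrastive alignment objective, and a negative log-likelihood objective. The following is a minimal NumPy sketch of how such pieces could fit together. It is not the authors' implementation: every function name, the random linear "heads", the embedding sizes, the temperature, the 0.01 loss weight, and the choice to score the text mean under the skeleton Gaussian are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_head(features, w_mu, w_logvar):
    """Map deterministic features to a diagonal Gaussian (mean, log-variance).
    Hypothetical linear heads; a real uncertainty network would be learned."""
    return features @ w_mu, features @ w_logvar

def sample(mu, logvar, rng):
    """Reparameterised sample: mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def info_nce(a, b, temperature=0.1):
    """Symmetric contrastive loss; row i of `a` is the positive for row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def gaussian_nll(x, mu, logvar):
    """Negative log-likelihood of x under N(mu, diag(exp(logvar))), batch mean."""
    per_dim = 0.5 * (logvar + (x - mu) ** 2 / np.exp(logvar) + np.log(2 * np.pi))
    return per_dim.sum(axis=1).mean()

# Toy batch: random stand-ins for skeleton and text encoder outputs (assumed sizes).
rng = np.random.default_rng(0)
skel_feat = rng.standard_normal((4, 16))
text_feat = rng.standard_normal((4, 8))

s_mu, s_logvar = gaussian_head(skel_feat,
                               0.1 * rng.standard_normal((16, 32)),
                               0.1 * rng.standard_normal((16, 32)))
t_mu, t_logvar = gaussian_head(text_feat,
                               0.1 * rng.standard_normal((8, 32)),
                               0.1 * rng.standard_normal((8, 32)))

# Contrastive term on sampled embeddings plus an NLL term (weight is an assumption).
loss = (info_nce(sample(s_mu, s_logvar, rng), sample(t_mu, t_logvar, rng))
        + 0.01 * gaussian_nll(t_mu, s_mu, s_logvar))
print(float(loss))
```

In this sketch the predicted log-variance plays the role of the intra-modal uncertainty: a sample with large variance contributes a noisier embedding to the contrastive term and is penalised less for a mismatch in the likelihood term.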
Pages: 14