Modeling the skeleton-language uncertainty for 3D action recognition

Cited by: 1

Authors:

Wang, Mingdao [1]

Zhang, Xianlin [2]

Chen, Siqi [1]

Li, Xueming [2]

Zhang, Yue [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100000, Peoples R China

[2] Beijing Univ Posts & Telecommun, Sch Digital Media & Design Arts, Beijing, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Vol. 608

Keywords:

Uncertainty; Multimodal model; 3D skeleton-based action recognition; NETWORKS;

DOI:

10.1016/j.neucom.2024.128426

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human 3D skeleton-based action recognition has received increasing interest in recent years. Inspired by the strong capabilities of multi-modal models, some pioneering attempts employ diverse modalities, i.e., skeleton and language, to construct skeleton-language models and have shown compelling results. Yet these attempts model the data representation as a deterministic point estimate, ignoring a key issue: descriptions of similar motions are uncertain and ambiguous, which restricts comprehension of complex concept hierarchies and weakens the reliability of cross-modal alignment. To tackle this challenge, this paper proposes a new Uncertain Skeleton-Language Learning Framework (USLLF) that, for the first time, captures the semantic ambiguity among diverse modalities in a probabilistic manner. USLLF models both inter- and intra-modal uncertainties. Specifically, we first integrate language (text) generated by ChatGPT with a generic skeleton-based network to develop a deterministic multi-modal baseline, which can be built from any off-the-shelf skeleton and text encoders. Then, based on this baseline, we explicitly model the intra-modal (skeleton/language) uncertainties as Gaussian distributions using new uncertainty networks capable of learning distributional embeddings of the modalities. Following this, these embeddings are aligned and formulated as inter-modal (skeleton-language) uncertainty using both contrastive and negative log-likelihood objectives to alleviate cross-modal alignment error. Experimental results on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets show that our approach outperforms the proposed baseline and achieves performance comparable to state-of-the-art methods with high inference efficiency. We also provide analyses of how learned uncertainty reduces the impact of uncertain and ambiguous data on model performance.
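The abstract names three ingredients: Gaussian (distributional) embeddings per modality, a contrastive objective, and a negative log-likelihood objective. The following is a minimal, illustrative sketch of those generic building blocks in plain Python. It is not the authors' implementation: the function names, the diagonal-covariance parameterization via `log_var`, the cosine similarity, and the `temperature` value are all assumptions made for illustration.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Draw z = mu + sigma * eps from a diagonal Gaussian (reparameterization).

    Assumption: uncertainty is represented by a per-dimension log-variance.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_nll(x, mu, log_var):
    """Negative log-likelihood of point x under N(mu, diag(exp(log_var)))."""
    return 0.5 * sum(
        math.log(2.0 * math.pi) + lv + (xi - m) ** 2 / math.exp(lv)
        for xi, m, lv in zip(x, mu, log_var)
    )


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(skeleton_embs, text_embs, temperature=0.1):
    """InfoNCE-style contrastive loss: skeleton i should match text i."""
    total = 0.0
    for i, s in enumerate(skeleton_embs):
        logits = [cosine(s, t) / temperature for t in text_embs]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(skeleton_embs)
```

In a sketch like this, a larger `log_var` lowers the penalty for a mismatched pair at the cost of the `log_var` term itself, which is the usual way a Gaussian likelihood lets a model express that a sample is ambiguous. How USLLF combines or weights the two objectives is not stated in the abstract.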


Pages: 14

