Modeling the skeleton-language uncertainty for 3D action recognition

Cited by: 1

Authors:

Wang, Mingdao [1]

Zhang, Xianlin [2]

Chen, Siqi [1]

Li, Xueming [2]

Zhang, Yue [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100000, Peoples R China

[2] Beijing Univ Posts & Telecommun, Sch Digital Media & Design Arts, Beijing, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Vol. 608

Keywords:

Uncertainty; Multimodal model; 3D skeleton-based action recognition; NETWORKS;

DOI:

10.1016/j.neucom.2024.128426

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Human 3D skeleton-based action recognition has received increasing interest in recent years. Inspired by the excellent ability of multi-modal models, some pioneering attempts have employed diverse modalities, i.e., skeleton and language, to construct skeleton-language models and have shown compelling results. Yet these attempts model the data representation as a deterministic point estimate, ignoring a key issue: descriptions of similar motions are uncertain and ambiguous, which restricts comprehension of complex concept hierarchies and weakens the reliability of cross-modal alignment. To tackle this challenge, this paper proposes a new Uncertain Skeleton-Language Learning Framework (USLLF) to capture the semantic ambiguity among diverse modalities in a probabilistic manner for the first time. USLLF models both inter- and intra-modal uncertainties. Specifically, we first integrate the language (text) generated by ChatGPT with a generic skeleton-based network to develop a deterministic multi-modal baseline, which can be built from any off-the-shelf skeleton and text encoders. Then, based on this baseline, we explicitly model the intra-modal (skeleton/language) uncertainties as Gaussian distributions, using new uncertainty networks capable of learning distributional embeddings of the modalities. Following this, these embeddings are aligned and formulated as inter-modal (skeleton-language) uncertainty using both contrastive and negative log-likelihood objectives to alleviate the cross-modal alignment error. Experimental results on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets show that our approach outperforms the proposed baseline and achieves performance comparable to state-of-the-art methods with high inference efficiency. We also provide analyses of how the learned uncertainty reduces the impact of uncertain and ambiguous data on model performance.
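The abstract describes embedding each modality as a Gaussian distribution and aligning the two with contrastive and negative log-likelihood objectives. The sketch below illustrates that general idea only; it is not the authors' implementation. The diagonal-covariance form, the function names `gaussian_nll` and `info_nce`, the temperature, the equal weighting of the two terms, and the random stand-in embeddings are all assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(x, mu, log_var):
    """Mean negative log-likelihood of x under diagonal Gaussians N(mu, exp(log_var))."""
    var = np.exp(log_var)
    per_dim = 0.5 * (np.log(2.0 * np.pi) + log_var + (x - mu) ** 2 / var)
    return per_dim.sum(axis=-1).mean()

def info_nce(a, b, temperature=0.1):
    """Symmetric contrastive loss; row i of `a` is the positive match for row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Hypothetical stand-ins for the outputs of skeleton and text uncertainty networks.
rng = np.random.default_rng(0)
batch, dim = 4, 8
skel_mu = rng.normal(size=(batch, dim))
skel_log_var = rng.normal(scale=0.1, size=(batch, dim))
text_mu = skel_mu + 0.05 * rng.normal(size=(batch, dim))

# Reparameterised sample from the skeleton distribution: mu + sigma * eps.
eps = rng.normal(size=(batch, dim))
skel_sample = skel_mu + np.exp(0.5 * skel_log_var) * eps

# Assumed equal weighting of the two objectives.
loss = info_nce(skel_sample, text_mu) + gaussian_nll(text_mu, skel_mu, skel_log_var)
```

In this sketch a larger predicted variance lowers the penalty for a poorly matched pair, at the cost of the `log_var` term, which is one common way a model can down-weight ambiguous samples.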

Pages: 14

50 records in total

[31] Human skeleton representation for 3D action recognition based on complex network coding and LSTM
Shen, Xiangpei
Ding, Yanrui
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 82
[32] A three-stream fusion network for 3D skeleton-based action recognition
Fang, Ming
Liu, Qi
Ren, Jianping
Li, Jie
Du, Xinning
Liu, Shuhua
MULTIMEDIA SYSTEMS, 2025, 31 (02)
[33] Fuzzy Integral-Based CNN Classifier Fusion for 3D Skeleton Action Recognition
Banerjee, Avinandan
Singh, Pawan Kumar
Sarkar, Ram
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (06) : 2206 - 2216
[34] Deep Learning-Based Action Recognition Using 3D Skeleton Joints Information
Tasnim, Nusrat
Islam, Md. Mahbubul
Baek, Joong-Hwan
INVENTIONS, 2020, 5 (03) : 1 - 15
[35] Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints
Caetano, Carlos
Bremond, Francois
Schwartz, William Robson
2019 32ND SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 2019, : 16 - 23
[36] Rethinking the ST-GCNs for 3D skeleton-based human action recognition
Peng, Wei
Shi, Jingang
Varanka, Tuomas
Zhao, Guoying
NEUROCOMPUTING, 2021, 454 : 45 - 53
[37] 3D TRAJECTORIES FOR ACTION RECOGNITION
Koperski, Michal
Bilinski, Piotr
Bremond, Francois
2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 4176 - 4180
[38] Language-guided temporal primitive modeling for skeleton-based action recognition
Pan, Qingzhe
Xie, Xuemei
NEUROCOMPUTING, 2025, 613
[39] SKELETON-BASED MODELING OF 3D SURFACES
Jankauskas, Kestutis
Noreika, Algirdas
INFORMATION TECHNOLOGIES' 2009, 2009, : 235 - 242
[40] Hankelet-based dynamical systems modeling for 3D action recognition
Lo Presti, Liliana
La Cascia, Marco
Sclaroff, Stan
Camps, Octavia
IMAGE AND VISION COMPUTING, 2015, 44 : 29 - 43
