Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation

Cited by: 4

Authors:

Rehman, Sajid Ur [1]

Yasin, Aman Ullah [1]

Ul Haq, Ehtisham [1]

Ali, Moazzam [1]

Kim, Jungsuk [2,3]

Mehmood, Asif [2]

Affiliations:

[1] Air Univ, Dept Creat Technol, Islamabad 44000, Pakistan

[2] Gachon Univ, Coll IT Convergence, Dept Biomed Engn, 1342 Seongnamdaero, Seongnam Si 13120, South Korea

[3] Cellico Co, Res & Dev Lab, Seongnam Si 13449, South Korea

Source:

SENSORS | 2024 / Volume 24 / Issue 14

Funding:

National Research Foundation, Singapore;

Keywords:

Human Activity Recognition (HAR); two-stream network; skeletal extraction; 2+1 dimensional convolutional neural network (2+1D CNN); spatiotemporal feature extraction; multimodal fusion; UTD Multimodal Human Action Dataset (UTD MHAD); pose estimation; deep learning;

DOI:

10.3390/s24144646

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification codes:

070302 ; 081704 ;

Abstract:

Human activity recognition (HAR) is pivotal in advancing applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, primarily relying on single data sources, face limitations in capturing the full spectrum of human activities. This study introduces a comprehensive approach to HAR by integrating two critical modalities: RGB imaging and advanced pose estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose estimation techniques for refined feature extraction. The integration of these modalities is facilitated through advanced fusion algorithms, significantly improving recognition accuracy. Extensive experiments conducted on the UTD multimodal human action dataset (UTD MHAD) demonstrate that the proposed approach exceeds the performance of existing state-of-the-art algorithms, yielding improved outcomes. This study not only sets a new benchmark for HAR systems but also highlights the importance of feature engineering in capturing the complexity of human movements and the integration of optimal features. Our findings pave the way for more sophisticated, reliable, and applicable HAR systems in real-world scenarios.


Pages: 22

4 items in total

[1] Multimodal Cue Integration through Hypotheses Verification for RGB-D Object Recognition and 6DOF Pose Estimation
Aldoma, A.; Tombari, F.; Prankl, J.; Richtsfeld, A.; Di Stefano, L.; Vincze, M.
2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2013: 2104-2111
[2] Enhancing Data-Driven Algorithms for Human Pose Estimation and Action Recognition Through Simulation
Ludl, Dennis; Gulde, Thomas; Curio, Cristobal
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2020, 21 (9): 3990-3999
[3] Human activity recognition with analysis of angles between skeletal joints using a RGB-depth sensor
Ince, Omer Faruk; Ince, Ibrahim Furkan; Yildirim, Mustafa Eren; Park, Jang Sik; Song, Jong Kwan; Yoon, Byung Woo
ETRI JOURNAL, 2020, 42 (1): 78-89
[4] Enhancing Human Activity Recognition through Deep Learning: Comparative Analysis of Single Frame CNN and Convolutional LSTM Models
Kumar, Manoj R.; Murugan, Bala M. S.; Pooja, S.
2024 9TH INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTICS ENGINEERING, ICCRE 2024, 2024: 400-405
