Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation

Cited by: 4
Authors
Rehman, Sajid Ur [1]
Yasin, Aman Ullah [1]
Ul Haq, Ehtisham [1]
Ali, Moazzam [1]
Kim, Jungsuk [2, 3]
Mehmood, Asif [2]
Affiliations
[1] Air Univ, Dept Creat Technol, Islamabad 44000, Pakistan
[2] Gachon Univ, Coll IT Convergence, Dept Biomed Engn, 1342 Seongnamdaero, Seongnam Si 13120, South Korea
[3] Cellico Co, Res & Dev Lab, Seongnam Si 13449, South Korea
Funding
National Research Foundation, Singapore
Keywords
Human Activity Recognition (HAR); two-stream network; skeletal extraction; 2+1 dimensional convolutional neural network (2+1D CNN); spatiotemporal feature extraction; multimodal fusion; UTD Multimodal Human Action Dataset (UTD MHAD); pose estimation; deep learning
DOI
10.3390/s24144646
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Human activity recognition (HAR) is pivotal in advancing applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, primarily relying on single data sources, face limitations in capturing the full spectrum of human activities. This study introduces a comprehensive approach to HAR by integrating two critical modalities: RGB imaging and advanced pose estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose estimation techniques for refined feature extraction. The integration of these modalities is facilitated through advanced fusion algorithms, significantly improving recognition accuracy. Extensive experiments conducted on the UTD multimodal human action dataset (UTD MHAD) demonstrate that the proposed approach exceeds the performance of existing state-of-the-art algorithms, yielding improved outcomes. This study not only sets a new benchmark for HAR systems but also highlights the importance of feature engineering in capturing the complexity of human movements and the integration of optimal features. Our findings pave the way for more sophisticated, reliable, and applicable HAR systems in real-world scenarios.
Pages: 22
Related papers
4 items in total
  • [1] Multimodal Cue Integration through Hypotheses Verification for RGB-D Object Recognition and 6DOF Pose Estimation
    Aldoma, A.
    Tombari, F.
    Prankl, J.
    Richtsfeld, A.
    Di Stefano, L.
    Vincze, M.
    2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2013, : 2104 - 2111
  • [2] Enhancing Data-Driven Algorithms for Human Pose Estimation and Action Recognition Through Simulation
    Ludl, Dennis
    Gulde, Thomas
    Curio, Cristobal
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2020, 21 (09) : 3990 - 3999
  • [3] Human activity recognition with analysis of angles between skeletal joints using a RGB-depth sensor
    Ince, Omer Faruk
    Ince, Ibrahim Furkan
    Yildirim, Mustafa Eren
    Park, Jang Sik
    Song, Jong Kwan
    Yoon, Byung Woo
    ETRI JOURNAL, 2020, 42 (01) : 78 - 89
  • [4] Enhancing Human Activity Recognition through Deep Learning: Comparative Analysis of Single Frame CNN and Convolutional LSTM Models
    Kumar, Manoj R.
    Murugan, Bala M. S.
    Pooja, S.
    2024 9TH INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTICS ENGINEERING, ICCRE 2024, 2024, : 400 - 405