Multi-modal hybrid hierarchical classification approach with transformers to enhance complex human activity recognition

Cited: 3
Authors
Ezzeldin, Mustafa [1 ]
Ghoneim, Amr S. [2 ]
Abdelhamid, Laila [3 ]
Atia, Ayman [4 ]
Affiliations
[1] Helwan Univ, Fac Comp & Artificial Intelligence, Software Engn Dept, Cairo, Egypt
[2] Helwan Univ, Fac Comp & Artificial Intelligence, Comp Sci Dept, Cairo, Egypt
[3] Helwan Univ, Fac Comp & Artificial Intelligence, Informat Syst Dept, Cairo, Egypt
[4] Helwan Univ, October Univ Modern Sci & Arts MSA, HCI LAB FCAI, Fac Comp Sci, Cairo, Egypt
Keywords
Human Activity Recognition (HAR); Complex HAR; Multi-modal; Hierarchical classification; Transformers;
DOI
10.1007/s11760-024-03552-z
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification
0808; 0809;
Abstract
Human Activity Recognition (HAR) plays a crucial role in computer vision and signal processing, with extensive applications in domains such as security, surveillance, and healthcare. Traditional machine learning (ML) approaches for HAR have achieved commendable success but face limitations such as reliance on handcrafted features, sensitivity to noise, and challenges in handling complex temporal dependencies. These limitations have spurred interest in deep learning (DL) and hybrid models that address these issues more effectively. Thus, DL has emerged as a powerful approach for HAR, surpassing the performance of traditional methods. In this paper, a multi-modal hybrid hierarchical classification approach is proposed. It combines DL transformers with traditional ML techniques to improve both the accuracy and efficiency of HAR. The proposed classifier is evaluated on the activities of four widely used benchmark datasets: PAMAP2, CASAS, UCI HAR, and UCI HAPT. The experimental results demonstrate that the hybrid hierarchical classifier achieves remarkable accuracy rates of 99.69%, 97.4%, 98.7%, and 98.6%, respectively, outperforming traditional classification methods and significantly reducing training time compared to sequential LSTM models.
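The hierarchical classification idea described in the abstract can be sketched as a two-stage pipeline: a coarse classifier first routes a sample to an activity group, and a per-group fine classifier then picks the specific activity. The sketch below is a minimal, self-contained illustration of that routing structure only; the group names, toy features, and the nearest-centroid stand-in classifiers are hypothetical, not the paper's actual transformer/ML components.

```python
from statistics import mean

class NearestCentroid:
    """Tiny stand-in for any per-level classifier (hypothetical)."""
    def fit(self, X, y):
        buckets = {}
        for x, label in zip(X, y):
            buckets.setdefault(label, []).append(x)
        # One centroid per label: the per-feature mean of its samples.
        self.centroids = {label: [mean(col) for col in zip(*rows)]
                          for label, rows in buckets.items()}
        return self

    def predict(self, x):
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lbl: dist(self.centroids[lbl]))

class HierarchicalClassifier:
    """Stage 1 picks a coarse group; stage 2 picks the activity within it."""
    def fit(self, X, activities, group_of):
        self.coarse = NearestCentroid().fit(X, [group_of[a] for a in activities])
        self.fine = {}
        for g in set(group_of.values()):
            idx = [i for i, a in enumerate(activities) if group_of[a] == g]
            self.fine[g] = NearestCentroid().fit(
                [X[i] for i in idx], [activities[i] for i in idx])
        return self

    def predict(self, x):
        return self.fine[self.coarse.predict(x)].predict(x)

# Toy sensor features: [mean acceleration, dominant frequency] (illustrative).
X = [[0.1, 0.0], [0.2, 0.1], [1.0, 2.0], [1.1, 2.2], [1.0, 4.0], [1.2, 4.1]]
y = ["sitting", "standing", "walking", "walking", "running", "running"]
group_of = {"sitting": "static", "standing": "static",
            "walking": "dynamic", "running": "dynamic"}

clf = HierarchicalClassifier().fit(X, y, group_of)
print(clf.predict([1.05, 2.1]))  # routed to "dynamic", then classified "walking"
```

A practical motivation for this structure, consistent with the abstract's efficiency claim, is that each fine classifier only has to separate the few activities inside its group, which keeps the per-stage models small.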
Pages: 9375-9385
Page count: 11
Related Papers
50 in total
  • [31] Multi-modal Transformer for Indoor Human Action Recognition
    Do, Jeonghyeok
    Kim, Munchurl
    2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 1155 - 1160
  • [33] Deep learning-based multi-modal approach using RGB and skeleton sequences for human activity recognition
    Verma, Pratishtha
    Sah, Animesh
    Srivastava, Rajeev
    MULTIMEDIA SYSTEMS, 2020, 26 (06) : 671 - 685
  • [34] MMTSA: Multi-Modal Temporal Segment Attention Network for Efficient Human Activity Recognition
    Gao, Ziqi
    Wang, Yuntao
    Chen, Jianguo
    Xing, Junliang
    Patel, Shwetak
    Liu, Xin
    Shi, Yuanchun
    PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT, 2023, 7 (03):
  • [35] Multi-modal Multi-label Emotion Recognition with Heterogeneous Hierarchical Message Passing
    Zhang, Dong
    Ju, Xincheng
    Zhang, Wei
    Li, Junhui
    Li, Shoushan
    Zhu, Qiaoming
    Zhou, Guodong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14338 - 14346
  • [36] Cascaded Multi-Modal Mixing Transformers for Alzheimer's Disease Classification with Incomplete Data
    Liu, Linfeng
    Liu, Siyu
    Zhang, Lu
    To, Xuan Vinh
    Nasrallah, Fatima
    Chandra, Shekhar S.
    NEUROIMAGE, 2023, 277
  • [38] A multi-modal approach for high-dimensional feature recognition
    Ahmadian, Kushan
    Gavrilova, Marina
    VISUAL COMPUTER, 2013, 29 (02): : 123 - 130
  • [39] On the Impact of Wireless Multimedia Network for Multi-Modal Activity Recognition
    Yamashita, Akika
    Lua, Eng Keong
    Oguchi, Masato
    2014 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATION (ISCC), 2014,
  • [40] Empirical Mode Decomposition Based Multi-Modal Activity Recognition
    Hu, Lingyue
    Zhao, Kailong
    Zhou, Xueling
    Ling, Bingo Wing-Kuen
    Liao, Guozhao
    SENSORS, 2020, 20 (21) : 1 - 15