Multi-Level Feature Fusion in CNN-Based Human Action Recognition: A Case Study on EfficientNet-B7

Cited by: 0

Authors:

Lueangwitchajaroen, Pitiwat [1]

Watcharapinchai, Sitapa [1]

Tepsan, Worawit [2]

Sooksatra, Sorn [1]

Affiliations:

[1] Natl Sci & Technol Dev Agcy, Natl Elect & Comp Technol Ctr, Pathum Thani 12120, Thailand

[2] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand

Source:

JOURNAL OF IMAGING | 2024 / Vol. 10 / Issue 12

Keywords:

human action recognition; fusion method; multi-level fusion

DOI:

10.3390/jimaging10120320

Chinese Library Classification (CLC) number:

TB8 [Photographic technology]

Discipline classification code:

0804

Abstract:

Accurate human action recognition is becoming increasingly important across various fields, including healthcare and self-driving cars. A simple approach to enhancing model performance is to incorporate additional data modalities, such as depth frames, point clouds, and skeleton information. While previous studies have predominantly used late fusion techniques to combine these modalities, our research introduces a multi-level fusion approach that combines information at the early, intermediate, and late stages together. Furthermore, recognizing the challenges of collecting multiple data types in real-world applications, our approach seeks to exploit multimodal techniques while relying solely on RGB frames. In our work, RGB frames from the NTU RGB+D dataset served as the sole data source. From these frames, we extracted 2D skeleton coordinates and optical flow frames using pre-trained models. We evaluated our multi-level fusion approach with EfficientNet-B7 as a case study, and our methods demonstrated a significant improvement over single-modality and single-view models, achieving 91.5% accuracy on the NTU RGB+D 60 dataset. Despite their simplicity, our methods are also comparable to other state-of-the-art approaches.
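
The difference between early, intermediate, and late fusion named in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the layer sizes, the random untrained weights in the hypothetical `linear` helper, the use of only two modalities (RGB and optical flow features as plain vectors), and the final averaging rule are all assumptions made for illustration.

```python
import math
import random

random.seed(0)

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, out_dim):
    # Hypothetical stand-in for a trained layer: fresh random weights, no training.
    return [sum(xi * random.gauss(0.0, 0.1) for xi in x) for _ in range(out_dim)]

def relu(x):
    return [max(v, 0.0) for v in x]

def multi_level_scores(rgb, flow, num_classes=60, hidden=16):
    # Early fusion: concatenate the inputs before any modality-specific processing.
    early = softmax(linear(rgb + flow, num_classes))
    # Intermediate fusion: encode each modality separately, then concatenate features.
    h_rgb, h_flow = relu(linear(rgb, hidden)), relu(linear(flow, hidden))
    mid = softmax(linear(h_rgb + h_flow, num_classes))
    # Late fusion: classify each modality on its own, then average the class scores.
    late_rgb = softmax(linear(h_rgb, num_classes))
    late_flow = softmax(linear(h_flow, num_classes))
    late = [(a + b) / 2.0 for a, b in zip(late_rgb, late_flow)]
    # Combine the three levels by plain averaging (an assumption, not the paper's rule).
    return [(e + m + l) / 3.0 for e, m, l in zip(early, mid, late)]

# Toy feature vectors standing in for RGB and optical flow inputs.
rgb = [random.gauss(0.0, 1.0) for _ in range(32)]
flow = [random.gauss(0.0, 1.0) for _ in range(32)]
scores = multi_level_scores(rgb, flow)
```

Here `scores` is a probability vector over 60 classes, matching the class count of NTU RGB+D 60. In a real system each `linear` call would be replaced by trained network stages such as EfficientNet-B7 blocks.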


Pages: 16

