Multi-Level Feature Fusion in CNN-Based Human Action Recognition: A Case Study on EfficientNet-B7

Authors
Lueangwitchajaroen, Pitiwat [1]
Watcharapinchai, Sitapa [1]
Tepsan, Worawit [2]
Sooksatra, Sorn [1]
Affiliations
[1] Natl Sci & Technol Dev Agcy, Natl Elect & Comp Technol Ctr, Pathum Thani 12120, Thailand
[2] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand
Keywords
human action recognition; fusion method; multi-level fusion
DOI
10.3390/jimaging10120320
Chinese Library Classification
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
Accurate human action recognition is becoming increasingly important across various fields, including healthcare and self-driving cars. A simple approach to enhance model performance is to incorporate additional data modalities, such as depth frames, point clouds, and skeleton information. While previous studies have predominantly used late fusion techniques to combine these modalities, our research introduces a multi-level fusion approach that combines information at the early, intermediate, and late stages. Furthermore, recognizing the challenges of collecting multiple data types in real-world applications, our approach seeks to exploit multimodal techniques while relying solely on RGB frames as the single data source. In our work, we used RGB frames from the NTU RGB+D dataset and extracted 2D skeleton coordinates and optical flow frames from them using pre-trained models. We evaluated our multi-level fusion approach with EfficientNet-B7 as a case study, and our methods achieved 91.5% accuracy on the NTU RGB+D 60 dataset, a significant improvement over single-modality and single-view models. Despite their simplicity, our methods are also comparable to other state-of-the-art approaches.
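The three fusion stages named in the abstract can be illustrated with a minimal, generic sketch. It is not the authors' architecture: the helper names (early_fusion, intermediate_fusion, late_fusion), the toy vectors, and the equal late-fusion weights are all assumptions made for illustration, and plain Python lists stand in for image tensors and network features.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(rgb_channels, flow_channels):
    """Early fusion (hypothetical helper): join raw inputs before any network."""
    return rgb_channels + flow_channels


def intermediate_fusion(features_a, features_b):
    """Intermediate fusion (hypothetical helper): join per-stream feature vectors."""
    return features_a + features_b


def late_fusion(prob_lists, weights=None):
    """Late fusion (hypothetical helper): weighted average of class probabilities."""
    if weights is None:
        # Assumption: equal weight per stream when none are given.
        weights = [1.0 / len(prob_lists)] * len(prob_lists)
    num_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists))
        for c in range(num_classes)
    ]


# Toy per-stream logits for three action classes (made-up numbers).
rgb_logits = [2.0, 0.5, 0.1]
skeleton_logits = [0.2, 1.5, 0.3]
flow_logits = [1.8, 0.4, 0.2]

fused = late_fusion([softmax(rgb_logits), softmax(skeleton_logits), softmax(flow_logits)])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy run the RGB and optical flow streams both favor class 0, so the averaged scores select class 0 even though the skeleton stream favors class 1. A multi-level design would use more than one of these stages in the same model rather than late fusion alone.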
Pages: 16