Multi-stream 3D CNN structure for human action recognition trained by limited data

Cited by: 25
Authors
Chenarlogh, Vahid Ashkani [1]
Razzazi, Farbod [1]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Elect & Comp Engn, Tehran, Iran
Keywords
object recognition; image motion analysis; image classification; cameras; feature extraction; learning (artificial intelligence); video signal processing; image sequences; convolutional neural nets; multistream 3D CNN structure; human action recognition; training performance; training data case; optical flows; vertical directions; three-dimensional CNNs; four-stream 3D CNNs; single-stream model; two-stream architecture; four-stream architecture; information channels; separate streams; action recognition system; data set; four-stream structure; convolutional neural network architectures; optical flow; recognition rate; IXMAS; FEATURES;
DOI
10.1049/iet-cvi.2018.5088
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The authors proposed a way to improve training performance for human action recognition when training data are limited, using three convolutional neural network (CNN) architectures. First, four channels of information were generated from each frame: optical flow and gradient, each in the horizontal and vertical directions. These channels were fed to three-dimensional (3D) CNNs arranged as single-stream, two-stream, or four-stream architectures. In the single-stream model, all four channels from each frame were applied to one stream. In the two-stream architecture, optical flow-x and optical flow-y were applied to one stream and gradient-x and gradient-y to another. In the four-stream architecture, each information channel was applied to its own stream. The architectures were evaluated in an action recognition system on IXMAS, a data set recorded simultaneously by five cameras. The four-stream architecture outperformed the other two, achieving recognition rates of 87.5, 91.66, 91.11, 88.05, and 81.94% for cameras 0-4, respectively (88.05% on average).
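The channel-to-stream assignment described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the channel names, the `STREAM_LAYOUTS` table, and the `split_into_streams` helper are hypothetical and do not come from the paper, and no 3D CNN is built here. The last lines check the reported average against the per-camera rates.

```python
# Hypothetical channel names for the four inputs named in the abstract.
CHANNELS = ("flow_x", "flow_y", "grad_x", "grad_y")

# Assumed representation: each architecture is a list of streams,
# and each stream is the tuple of channels it receives.
STREAM_LAYOUTS = {
    "single_stream": [("flow_x", "flow_y", "grad_x", "grad_y")],
    "two_stream": [("flow_x", "flow_y"), ("grad_x", "grad_y")],
    "four_stream": [("flow_x",), ("flow_y",), ("grad_x",), ("grad_y",)],
}


def split_into_streams(frame_channels, layout_name):
    """Group per-frame channel data into one dict per stream."""
    layout = STREAM_LAYOUTS[layout_name]
    return [{name: frame_channels[name] for name in stream} for stream in layout]


# Placeholder data standing in for the per-frame channel images.
frame = {name: f"<{name} data>" for name in CHANNELS}
two_stream_inputs = split_into_streams(frame, "two_stream")

# Recognition rates (%) reported for the four-stream model, cameras 0-4.
FOUR_STREAM_RATES = [87.5, 91.66, 91.11, 88.05, 81.94]
mean_rate = sum(FOUR_STREAM_RATES) / len(FOUR_STREAM_RATES)
```

Each layout covers all four channels exactly once, so the three architectures differ only in how the same information is partitioned across streams.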
Pages: 338-344
Number of pages: 7