Multi-stream 3D CNN structure for human action recognition trained by limited data

Cited by: 25
Authors
Chenarlogh, Vahid Ashkani [1]
Razzazi, Farbod [1]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Elect & Comp Engn, Tehran, Iran
Keywords
object recognition; image motion analysis; image classification; cameras; feature extraction; learning (artificial intelligence); video signal processing; image sequences; convolutional neural nets; multistream 3D CNN structure; human action recognition; training performance; training data case; optical flows; vertical directions; three-dimensional CNNs; four-stream 3D CNNs; single-stream model; two-stream architecture; four-stream architecture; information channels; separate streams; action recognition system; data set; four-stream structure; convolutional neural network architectures; optical flow; recognition rate; IXMAS; FEATURES;
DOI
10.1049/iet-cvi.2018.5088
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The authors propose a way to improve training performance for human action recognition when training data are limited. They first generate four channels of information from each frame, namely optical flow and gradients in the horizontal and vertical directions, as input to three-dimensional (3D) convolutional neural networks (CNNs). They then propose three architectures: single-stream, two-stream, and four-stream 3D CNNs. In the single-stream model, all four channels from each frame are fed to a single stream. In the two-stream architecture, optical flow-x and optical flow-y are fed to one stream, and gradient-x and gradient-y to another. In the four-stream architecture, each of the four channels is fed to its own stream. The architectures were evaluated in an action recognition system on IXMAS, a data set recorded simultaneously by five cameras. The four-stream architecture outperformed the other two, achieving recognition rates of 87.5, 91.66, 91.11, 88.05, and 81.94% for cameras 0-4, respectively (88.05% on average).
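The channel-to-stream assignment described in the abstract, and the reported average, can be sketched as follows. This is only an illustrative sketch: the channel names (`flow_x`, `flow_y`, `grad_x`, `grad_y`), the dictionary layout, and the helper functions are hypothetical and are not taken from the paper; only the groupings and the per-camera rates come from the abstract.

```python
# Hypothetical channel names for the four per-frame inputs named in the abstract.
CHANNELS = ["flow_x", "flow_y", "grad_x", "grad_y"]

# Each architecture is a list of streams; each stream lists the channels it receives.
STREAM_LAYOUTS = {
    "single-stream": [["flow_x", "flow_y", "grad_x", "grad_y"]],
    "two-stream": [["flow_x", "flow_y"], ["grad_x", "grad_y"]],
    "four-stream": [["flow_x"], ["flow_y"], ["grad_x"], ["grad_y"]],
}


def check_layout(layout):
    """Return True if every channel is assigned to exactly one stream."""
    assigned = [channel for stream in layout for channel in stream]
    return sorted(assigned) == sorted(CHANNELS)


# Recognition rates (%) reported for the four-stream structure, cameras 0-4.
FOUR_STREAM_RATES = [87.5, 91.66, 91.11, 88.05, 81.94]


def mean_rate(rates):
    """Arithmetic mean of the per-camera recognition rates."""
    return sum(rates) / len(rates)


if __name__ == "__main__":
    for name, layout in STREAM_LAYOUTS.items():
        print(name, "->", len(layout), "stream(s), valid:", check_layout(layout))
    print("four-stream mean rate: %.2f%%" % mean_rate(FOUR_STREAM_RATES))
```

The mean of the five per-camera rates is 440.26 / 5 = 88.052, which matches the 88.05% average stated in the abstract.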
Pages: 338-344
Number of pages: 7