Multi-stream 3D CNN structure for human action recognition trained by limited data

Cited by: 25
Authors
Chenarlogh, Vahid Ashkani [1]
Razzazi, Farbod [1]
Affiliation
[1] Islamic Azad University, Science and Research Branch, Department of Electrical and Computer Engineering, Tehran, Iran
Keywords
object recognition; image motion analysis; image classification; cameras; feature extraction; learning (artificial intelligence); video signal processing; image sequences; convolutional neural nets; multistream 3D CNN structure; human action recognition; training performance; training data case; optical flows; vertical directions; three-dimensional CNNs; four-stream 3D CNNs; single-stream model; two-stream architecture; four-stream architecture; information channels; separate streams; action recognition system; data set; four-stream structure; convolutional neural network architectures; optical flow; recognition rate; IXMAS; features
DOI
10.1049/iet-cvi.2018.5088
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Here, the authors proposed a solution to improve training performance for human action recognition when only limited training data are available. To this end, they proposed three different convolutional neural network (CNN) architectures. First, the authors generated four channels of information from each frame, namely the optical flow and the gradient in the horizontal and vertical directions, to serve as input to three-dimensional (3D) CNNs. They then proposed three architectures: single-stream, two-stream, and four-stream 3D CNNs. In the single-stream model, the authors applied all four information channels of each frame to a single stream. In the two-stream architecture, they applied optical flow-x and optical flow-y to one stream and gradient-x and gradient-y to the other. In the four-stream architecture, they applied each information channel to its own separate stream. The architectures were evaluated within an action recognition system on IXMAS, a data set recorded simultaneously by five cameras. The authors showed that the four-stream architecture outperformed the other two, achieving recognition rates of 87.5, 91.66, 91.11, 88.05, and 81.94% for cameras 0-4, respectively (an average recognition rate of 88.05%).
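The channel preparation and stream routing described in the abstract can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation: the names `spatial_gradients`, `build_channels`, `route_to_streams`, and `STREAM_LAYOUTS` are illustrative; the gradients use a simple forward difference because the abstract does not name the operator; and the optical-flow fields are assumed to be precomputed by some external estimator, one field per frame. The 3D CNNs themselves are omitted.

```python
from typing import Dict, List, Tuple

# A frame is an H x W grid of grey levels; a volume is a clip of T such frames.
Frame = List[List[float]]
Volume = List[Frame]

# Hypothetical channel-to-stream layouts mirroring the abstract's description.
STREAM_LAYOUTS: Dict[str, List[List[str]]] = {
    "single": [["flow_x", "flow_y", "grad_x", "grad_y"]],      # one stream, 4 channels
    "two": [["flow_x", "flow_y"], ["grad_x", "grad_y"]],       # flow stream + gradient stream
    "four": [["flow_x"], ["flow_y"], ["grad_x"], ["grad_y"]],  # one channel per stream
}


def spatial_gradients(frame: Frame) -> Tuple[Frame, Frame]:
    """Forward-difference gradients along x (columns) and y (rows).

    The last column of grad_x and the last row of grad_y are set to 0.
    """
    h, w = len(frame), len(frame[0])
    grad_x = [[frame[r][c + 1] - frame[r][c] if c + 1 < w else 0.0
               for c in range(w)] for r in range(h)]
    grad_y = [[frame[r + 1][c] - frame[r][c] if r + 1 < h else 0.0
               for c in range(w)] for r in range(h)]
    return grad_x, grad_y


def build_channels(frames: Volume, flow_x: Volume, flow_y: Volume) -> Dict[str, Volume]:
    """Collect the four per-frame information channels for one clip.

    Assumption: flow_x / flow_y are precomputed elsewhere, one field per frame.
    """
    if not (len(frames) == len(flow_x) == len(flow_y)):
        raise ValueError("expected one optical-flow field per frame")
    grads = [spatial_gradients(f) for f in frames]
    return {
        "flow_x": flow_x,
        "flow_y": flow_y,
        "grad_x": [gx for gx, _ in grads],
        "grad_y": [gy for _, gy in grads],
    }


def route_to_streams(channels: Dict[str, Volume], layout: str) -> List[List[Volume]]:
    """Group channel volumes per stream; each group would be the input of one 3D CNN."""
    return [[channels[name] for name in group] for group in STREAM_LAYOUTS[layout]]


if __name__ == "__main__":
    # Per-camera recognition rates (%) reported for the four-stream model, cameras 0-4.
    rates = [87.5, 91.66, 91.11, 88.05, 81.94]
    print(round(sum(rates) / len(rates), 2))  # prints 88.05
```

In the single-stream layout one 3D CNN receives all four channels stacked together; in the two-stream layout each CNN receives two channels; in the four-stream layout each channel has its own CNN. How the stream outputs are combined is not described in the abstract. The averaging at the bottom simply checks that the five per-camera rates give the reported 88.05% mean.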
Pages: 338-344
Page count: 7