Human Action Recognition Combined With Object Detection

Cited by: 0
Authors
Zhou B. [1]
Li J.-F. [1]
Affiliation
[1] Institute of Automation, Faculty of Mechanical Engineering and Automation, Zhejiang Sci-Tech University, Hangzhou
Keywords
Action recognition; Computer vision; Convolutional neural network (CNN); Deep learning; Object detection
DOI
10.16383/j.aas.c180848
Abstract
Most methods for human action recognition extract features directly from the original video frames. Doing so carries redundant background information into the model, which adds noise to the neural network. To address background interference, the large amount of redundant information in video frames, class imbalance among samples, and the difficulty of classifying certain individual classes, this paper proposes a new human action recognition algorithm combined with object detection. First, an object detection mechanism is added to the recognition pipeline so that the neural network focuses on learning the motion information of the human body. Second, the video is divided into segments and randomly sampled within them, which establishes long-term temporal modeling across the entire video. Finally, action recognition is performed with an improved neural network loss function. Extensive experiments are carried out on the popular human action recognition datasets UCF101 and HMDB51. Using RGB images only, the method reaches accuracies of 96.0% and 75.3%, respectively, which is significantly higher than those of state-of-the-art human action recognition algorithms. Copyright © 2020 Acta Automatica Sinica. All rights reserved.
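The segment-wise random sampling mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_segment_indices`, the near-equal split of the video, and the choice of one frame per segment are assumptions, since the abstract does not give these details.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Return one randomly chosen frame index from each of `num_segments`
    consecutive, near-equal spans covering frames 0..num_frames-1.

    Hypothetical helper for illustration; the split rule and the
    one-frame-per-segment policy are assumptions, not taken from the paper.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    if rng is None:
        rng = random.Random()

    indices = []
    for k in range(num_segments):
        # Integer bounds of the k-th span; `end` is exclusive.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        if end <= start:
            # The video has fewer frames than segments, so this span is
            # empty: fall back to the nearest valid frame.
            indices.append(min(start, num_frames - 1))
        else:
            indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Three frames spread over the whole clip, one per third of the video.
    print(sample_segment_indices(300, 3, random.Random(0)))
```

Because every span contributes a frame, the sampled set covers the beginning, middle, and end of the video regardless of its length, which is what allows a model fed with these frames to capture long-range temporal structure at a fixed cost.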
Pages: 1961-1970
Number of pages: 9