Spatiotemporal squeeze-and-excitation residual multiplier network for video action recognition

Cited by: 0
Authors
Luo H. [1 ]
Tong K. [1 ]
Institution
[1] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Multi-model ensemble; Multiplication fusion; Spatiotemporal stream; Squeeze-and-excitation residual network;
DOI
10.11959/j.issn.1000-436x.2019194
Abstract
Aiming at the shortcoming that shallow networks and general deep models in the two-stream structure cannot effectively learn spatial and temporal information, a squeeze-and-excitation residual network with a spatial stream and a temporal stream was proposed for action recognition. Long-term temporal dependence was captured by injecting an identity-mapping kernel into the network as a temporal filter. Multiplicative fusion of spatiotemporal features was used to further strengthen the interaction between the spatial and temporal information of the squeeze-and-excitation residual networks, and the influence of the fusion method, the number of fusions, and the fusion location on recognition performance was studied. Given the limited performance achievable by a single model, three strategies were proposed to generate multiple models, whose outputs were combined by averaging and weighted averaging to obtain the final recognition result. Experimental results on the HMDB51 and UCF101 datasets show that the proposed spatiotemporal squeeze-and-excitation residual multiplier network effectively improves action recognition performance. © 2019, Editorial Board of Journal on Communications. All rights reserved.
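The two operations the abstract combines, squeeze-and-excitation channel reweighting and multiplicative fusion of the spatial and temporal streams, can be sketched as follows. This is an illustrative NumPy toy with random weights and hypothetical layer sizes, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def se_block(feat, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map:
    global-average-pool each channel, pass through two FC layers
    (channel reduction then expansion), and rescale the channels."""
    squeeze = feat.mean(axis=(1, 2))           # squeeze: (C,)
    excite = sigmoid(w2 @ relu(w1 @ squeeze))  # excitation: (C,) in (0, 1)
    return feat * excite[:, None, None]        # channel-wise reweighting

def multiplicative_fusion(spatial_feat, temporal_feat):
    """Elementwise product injecting motion cues into the appearance stream."""
    return spatial_feat * temporal_feat

# toy setup: C=4 channels, 7x7 feature maps, reduction ratio r=2 (hypothetical)
rng = np.random.default_rng(0)
c, r = 4, 2
spatial = rng.standard_normal((c, 7, 7))
temporal = rng.standard_normal((c, 7, 7))
w1 = rng.standard_normal((c // r, c))  # reduction FC weights
w2 = rng.standard_normal((c, c // r))  # expansion FC weights

fused = multiplicative_fusion(se_block(spatial, w1, w2),
                              se_block(temporal, w1, w2))
print(fused.shape)  # (4, 7, 7)
```

The fusion point and the number of such products are exactly the design choices the paper reports as studied; here a single fusion after one SE block per stream is shown for brevity.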
Pages: 189-198
Page count: 9