Hierarchical Temporal Pooling for Efficient Online Action Recognition

Authors
Zhang, Can [1]
Zou, Yuexian [1, 2]
Chen, Guang [1]
Affiliations
[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Peoples R China
Keywords
Action recognition; Hierarchical Temporal Pooling; Real-time
DOI
10.1007/978-3-030-05710-7_39
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Action recognition in videos is a challenging task. Recently developed deep learning-based methods have achieved state-of-the-art performance on several action recognition benchmarks. However, these methods are inefficient: their large model size and long runtime restrict their practical application. In this study, we focus on improving the accuracy and efficiency of action recognition within the two-stream ConvNet framework by investigating effective video-level representations. Our motivation stems from the observation that adjacent video frames contain a great deal of redundant information and that humans do not recognize actions from frame-level features. To extract effective video-level features, we propose a Hierarchical Temporal Pooling (HTP) module and develop a two-stream action recognition network, termed HTP-Net (Two-stream), which is designed to obtain effective video-level representations by hierarchically incorporating temporal motion and spatial appearance features. All two-stream action recognition methods that use optical flow as one of their inputs are computationally inefficient, because calculating optical flow is time-consuming. To improve efficiency, we also consider a variant that takes only raw RGB frames as input and uses no optical flow, termed HTP-Net (RGB). Extensive experiments have been conducted on two benchmarks, UCF101 and HMDB51. The results demonstrate that HTP-Net (Two-stream) achieves state-of-the-art performance, while HTP-Net (RGB) offers competitive accuracy and is approximately 1-2 orders of magnitude faster than other state-of-the-art single-stream action recognition methods. Specifically, HTP-Net (RGB) runs at 42 videos per second (vps) and 672 frames per second (fps) on an NVIDIA Titan X GPU, which enables real-time action recognition and is of great value in practical applications.
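The abstract does not spell out how the HTP module works, so the following is only a generic illustration of the idea of pooling frame-level features hierarchically into one video-level feature. It is not the authors' architecture. The function names, the pairwise element-wise max operator, and the toy two-dimensional features are all assumptions made for this sketch. The frame count of 16 is derived from the reported rates (672 fps / 42 vps), on the assumption that both figures describe the same run.

```python
from typing import List

Vector = List[float]


def pool_pair(a: Vector, b: Vector) -> Vector:
    """Element-wise max of two feature vectors (assumed pooling operator)."""
    return [max(x, y) for x, y in zip(a, b)]


def hierarchical_temporal_pool(frames: List[Vector]) -> Vector:
    """Merge temporally adjacent features level by level until one remains.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not frames:
        raise ValueError("need at least one frame feature")
    level = frames
    while len(level) > 1:
        # Pool non-overlapping pairs of neighbours: (0, 1), (2, 3), ...
        merged = [pool_pair(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # carry an unpaired last feature upward
        level = merged
    return level[0]


# 672 frames/s divided by 42 videos/s gives 16 frames per video,
# assuming both reported rates come from the same run.
frames_per_video = 672 // 42

# Toy per-frame features standing in for ConvNet outputs (made up).
features = [[float(t), float(-t)] for t in range(frames_per_video)]
video_feature = hierarchical_temporal_pool(features)
```

With a plain max operator, the hierarchy yields the same result as a single global max over all frames. A hierarchy only changes the outcome once some transformation sits between the levels, which this sketch deliberately leaves out.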
Pages: 471-482
Page count: 12