Hierarchical Temporal Pooling for Efficient Online Action Recognition

Cited by: 0

Authors:

Zhang, Can [1]

Zou, Yuexian [1,2]

Chen, Guang [1]
Affiliations:

[1] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China

[2] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

MULTIMEDIA MODELING (MMM 2019), PT I | 2019 / Vol. 11295

Keywords:

Action recognition; Hierarchical Temporal Pooling; Real-time;

DOI:

10.1007/978-3-030-05710-7_39

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Action recognition in videos is a challenging task. Recently developed deep learning-based action recognition methods have achieved state-of-the-art performance on several action recognition benchmarks. However, these methods are inefficient: their large model size and long runtime restrict their practical application. In this study, we focus on improving the accuracy and efficiency of action recognition within the two-stream ConvNets framework by investigating effective video-level representations. Our motivation stems from the observation that redundant information is widespread in adjacent video frames and that humans do not recognize actions based on frame-level features. Therefore, to extract effective video-level features, we propose a Hierarchical Temporal Pooling (HTP) module and develop a two-stream action recognition network, termed HTP-Net (Two-stream), which is designed to obtain effective video-level representations by hierarchically incorporating temporal motion and spatial appearance features. Notably, all two-stream action recognition methods that use optical flow as one of their inputs are computationally inefficient, since calculating optical flow is time-consuming. To improve efficiency, we drop optical flow and use only raw RGB as input to our HTP-Net; for clarity, we refer to this variant as HTP-Net (RGB). Extensive experiments were conducted on two benchmarks, UCF101 and HMDB51. Experimental results demonstrate that HTP-Net (Two-stream) achieves state-of-the-art performance, and that HTP-Net (RGB) offers competitive action recognition accuracy while being approximately 1-2 orders of magnitude faster than other state-of-the-art single-stream action recognition methods. Specifically, HTP-Net (RGB) runs at 42 videos per second (vps) and 672 frames per second (fps) on an NVIDIA Titan X GPU, which enables real-time action recognition and is of great value in practical applications.
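
The idea of pooling frame-level features hierarchically into one video-level representation can be illustrated with a minimal sketch. The abstract does not specify the pooling operator, the window size, or the number of levels, so everything below is an assumption made for illustration: element-wise max pooling, pairwise merging of adjacent frames, and the hypothetical names `pool_pair` and `hierarchical_temporal_pool`. It is not the authors' HTP module. The frame count of 16 is inferred from the reported throughput (672 fps / 42 vps), assuming both figures describe the same run.

```python
from typing import List

Vector = List[float]


def pool_pair(a: Vector, b: Vector) -> Vector:
    """Element-wise max of two feature vectors (assumed pooling operator)."""
    return [max(x, y) for x, y in zip(a, b)]


def hierarchical_temporal_pool(frames: List[Vector]) -> Vector:
    """Merge adjacent frame features level by level until one vector remains."""
    if not frames:
        raise ValueError("need at least one frame feature")
    level = frames
    while len(level) > 1:
        # Pool non-overlapping adjacent pairs: (0, 1), (2, 3), ...
        merged = [
            pool_pair(level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # carry an unpaired trailing frame upward
        level = merged
    return level[0]


# Assumption: 672 fps at 42 vps suggests about 16 frames per video.
frames_per_video = 672 // 42

# Toy 2-dimensional "features" standing in for per-frame ConvNet outputs.
features = [[float(t), float(-t)] for t in range(frames_per_video)]
video_feature = hierarchical_temporal_pool(features)
print(frames_per_video, video_feature)  # prints: 16 [15.0, 0.0]
```

With 16 frames, pairwise merging takes four levels (16, 8, 4, 2, 1) to reach a single video-level vector. A real network would more plausibly interleave such pooling steps with convolutional layers instead of applying them to final features.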

Citation

Pages: 471-482

Page count: 12
