TBRNet: Two-Stream BiLSTM Residual Network for Video Action Recognition

Cited by: 6
Authors
Wu, Xiao [1 ,2 ]
Ji, Qingge [1 ,2 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
[2] Guangdong Key Lab Big Data Anal & Proc, Guangzhou 510006, Peoples R China
Keywords
action recognition; bidirectional long short-term memory; residual connection; temporal attention mechanism; two-stream networks
DOI
10.3390/a13070169
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Modeling spatiotemporal representations is one of the most essential yet challenging issues in video action recognition. Existing methods cannot accurately model either the correlations between spatial and temporal features or the global temporal dependencies. Inspired by the two-stream network for video action recognition, we propose an encoder-decoder framework named the Two-Stream Bidirectional Long Short-Term Memory (LSTM) Residual Network (TBRNet), which exploits the interaction between spatiotemporal representations and global temporal dependencies. In the encoding phase, a two-stream architecture built on the proposed Residual Convolutional 3D (Res-C3D) network extracts features, with residual connections inserted between the two pathways, and the features are then fused into the encoder's short-term spatiotemporal features. In the decoding phase, these short-term spatiotemporal features are fed into a temporal attention-based bidirectional LSTM (BiLSTM) network to obtain long-term, bidirectional, attention-pooled dependencies, which are then integrated with the short-term spatiotemporal features to capture global spatiotemporal relationships. In a series of experiments on two benchmark datasets, UCF101 and HMDB51, TBRNet achieved results competitive with or superior to existing state-of-the-art approaches. A minimal sketch of the described pipeline follows the record below.
Pages: 1-21
Number of pages: 21
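
To make the described architecture concrete, here is a minimal PyTorch sketch of the pipeline the abstract outlines: a two-stream Res-C3D-style encoder with a residual connection injected between the pathways, feature fusion into short-term spatiotemporal features, and a temporal attention-based BiLSTM decoder whose pooled output is combined with the short-term features for classification. All layer widths, the placement of the single cross-stream residual connection, the fusion/pooling scheme, and the classifier head are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the TBRNet pipeline described in the abstract.
# All layer widths, the cross-stream residual placement, the fusion/pooling
# scheme, and the classifier head are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResC3DStream(nn.Module):
    """One encoder pathway: stacked 3D convolutions with residual shortcuts."""

    def __init__(self, in_channels, width=64):
        super().__init__()
        self.stem = nn.Conv3d(in_channels, width, kernel_size=3, padding=1)
        self.conv1 = nn.Conv3d(width, width, kernel_size=3, padding=1)
        self.conv2 = nn.Conv3d(width, width, kernel_size=3, padding=1)

    def forward(self, x, cross=None):
        x = F.relu(self.stem(x))
        if cross is not None:              # residual connection from the other pathway
            x = x + cross
        x = x + F.relu(self.conv1(x))      # intra-stream residual block
        x = x + F.relu(self.conv2(x))
        return x


class TemporalAttentionBiLSTM(nn.Module):
    """BiLSTM decoder with soft temporal attention pooling over its outputs."""

    def __init__(self, in_dim, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)

    def forward(self, seq):                      # seq: (B, T, in_dim)
        h, _ = self.bilstm(seq)                  # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # (B, T, 1) weights over time
        return (w * h).sum(dim=1)                # attention-pooled summary: (B, 2*hidden)


class TBRNetSketch(nn.Module):
    def __init__(self, num_classes=101, width=64, hidden=256):
        super().__init__()
        self.rgb = ResC3DStream(3, width)        # spatial stream: RGB frames
        self.flow = ResC3DStream(2, width)       # temporal stream: optical flow (x, y)
        self.decoder = TemporalAttentionBiLSTM(2 * width, hidden)
        # classifier over short-term fused features + long-term pooled dependencies
        self.fc = nn.Linear(2 * width + 2 * hidden, num_classes)

    def forward(self, rgb, flow):                # inputs: (B, C, T, H, W) clips
        f_flow = self.flow(flow)
        f_rgb = self.rgb(rgb, cross=f_flow)      # cross-stream residual injection
        fused = torch.cat([f_rgb, f_flow], dim=1)        # (B, 2*width, T, H, W)
        short = fused.mean(dim=[3, 4]).transpose(1, 2)   # (B, T, 2*width)
        long_term = self.decoder(short)                  # (B, 2*hidden)
        global_feat = torch.cat([short.mean(dim=1), long_term], dim=1)
        return self.fc(global_feat)


if __name__ == "__main__":
    rgb = torch.randn(2, 3, 8, 32, 32)      # tiny toy clip, just a shape check
    flow = torch.randn(2, 2, 8, 32, 32)
    print(TBRNetSketch()(rgb, flow).shape)  # torch.Size([2, 101])
```

In this toy forward pass, the flow features serve as the cross-stream residual for the RGB pathway; the paper's actual connection pattern, fusion operator, and attention formulation may differ in detail.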