Spatiotemporal distilled dense-connectivity network for video action recognition

Cited by: 41

Authors:

Hao, Wangli ^{[1,3]}

Zhang, Zhaoxiang ^{[1,2,3]}

Affiliations:

[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China

[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China

[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China

Source:

PATTERN RECOGNITION | 2019 / Vol. 92

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;

DOI:

10.1016/j.patcog.2019.03.005

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Two-stream convolutional neural networks show great promise for action recognition tasks. However, most two-stream approaches train the appearance and motion subnetworks independently, which may degrade performance because the two streams do not interact. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. This network implements both knowledge distillation and dense connectivity (adapted from DenseNet). With the STDDCN architecture, we aim to explore interaction strategies between the appearance and motion streams at different levels of the hierarchy. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction at the feature representation layers. Moreover, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) allows both streams to interact at the high-level layers. This architecture allows STDDCN to gradually obtain effective hierarchical spatiotemporal features, and it can be trained end-to-end. Finally, numerous ablation studies validate the effectiveness and generalization of our model on two benchmark datasets, UCF101 and HMDB51, on which the model also achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
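
The distillation idea in the abstract (two streams as students, their fusion as teacher) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion rule, the loss, or the temperature, so the logit averaging, the KL-divergence loss, the temperature value, and all function names below are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_losses(appearance_logits, motion_logits, temperature=2.0):
    """Return one distillation loss per student stream.

    Assumption: the teacher is a late fusion formed by averaging the
    two streams' logits; the paper's actual fusion may differ.
    """
    fused = [(a + m) / 2.0 for a, m in zip(appearance_logits, motion_logits)]
    teacher = softmax(fused, temperature)
    loss_appearance = kl_divergence(teacher, softmax(appearance_logits, temperature))
    loss_motion = kl_divergence(teacher, softmax(motion_logits, temperature))
    return loss_appearance, loss_motion


# Hypothetical logits for a three-class problem.
loss_a, loss_m = distillation_losses([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
```

In an end-to-end setup these two terms would typically be added to each stream's ordinary classification loss, so that each student is pulled toward the fused prediction while still fitting the labels; how the terms are weighted is again an assumption rather than something the abstract states.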

Citations

Pages: 13 - 24

Number of pages: 12

50 items in total

[21] Deep Spatiotemporal Relation Learning With 3D Multi-Level Dense Fusion for Video Action Recognition
Zhang, Junxuan
Hu, Haifeng
IEEE ACCESS, 2019, 7 : 15222 - 15229
[22] Dense Semantics-Assisted Networks for Video Action Recognition
Luo, Haonan
Lin, Guosheng
Yao, Yazhou
Tang, Zhenmin
Wu, Qingyao
Hua, Xiansheng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 3073 - 3084
[23] A spatiotemporal and motion information extraction network for action recognition
Wang, Wei
Wang, Xianmin
Zhou, Mingliang
Wei, Xuekai
Li, Jing
Ren, Xiaojun
Zong, Xuemei
WIRELESS NETWORKS, 2024, 30 (06) : 5389 - 5405
[24] Action Keypoint Network for Efficient Video Recognition
Chen, Xu
Han, Yahong
Wang, Xiaohan
Sun, Yifan
Yang, Yi
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4980 - 4993
[25] Binary Neural Network for Video Action Recognition
Han, Hongfeng
Lu, Zhiwu
Wen, Ji-Rong
MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 95 - 106
[26] Dense Dilated Network for Few Shot Action Recognition
Xu, Baohan
Ye, Hao
Zheng, Yingbin
Wang, Heng
Luwang, Tianyu
Jiang, Yu-Gang
ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 379 - 387
[27] Fusion Attention for Action Recognition: Integrating Sparse-Dense and Global Attention for Video Action Recognition
Kim, Hyun-Woo
Choi, Yong-Suk
SENSORS, 2024, 24 (21)
[28] AGPN: Action Granularity Pyramid Network for Video Action Recognition
Chen, Yatong
Ge, Hongwei
Liu, Yuxuan
Cai, Xinye
Sun, Liang
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 3912 - 3923
[29] Imperceptible Adversarial Attack With Multigranular Spatiotemporal Attention for Video Action Recognition
Wu, Guoming
Xu, Yangfan
Li, Jun
Shi, Zhiping
Liu, Xianglong
IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (20) : 17785 - 17796
[30] Multi-receptive field spatiotemporal network for action recognition
Nie, Mu
Yang, Sen
Wang, Zhenhua
Zhang, Baochang
Lu, Huimin
Yang, Wankou
International Journal of Machine Learning and Cybernetics, 2023, 14 : 2439 - 2453
