Spatiotemporal distilled dense-connectivity network for video action recognition

Cited by: 41
Authors
Hao, Wangli [1 ,3 ]
Zhang, Zhaoxiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China
[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;
DOI
10.1016/j.patcog.2019.03.005
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two-stream convolutional neural networks show great promise for action recognition tasks. However, most two-stream approaches train the appearance and motion subnetworks independently, which can degrade performance because the two streams never interact. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. This network combines knowledge distillation with dense connectivity (adapted from DenseNet). With this architecture, we explore interaction strategies between the appearance and motion streams at different hierarchies. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction at the feature representation layers. Moreover, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) allows both streams to interact at the high-level layers. This architecture allows STDDCN to gradually obtain effective hierarchical spatiotemporal features, and it can be trained end-to-end. Finally, extensive ablation studies validate the effectiveness and generalization of our model on two benchmark datasets, UCF101 and HMDB51, on which it achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
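The abstract's distillation scheme (two stream "students" imitating their fused "teacher") can be sketched in a few lines. The code below is an illustrative reconstruction, not the paper's exact loss: the function name `stddcn_distillation_loss`, the weighted-average fusion, and the temperature value are assumptions; only the student/teacher roles and the use of knowledge distillation come from the abstract.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax over the class axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch.
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def stddcn_distillation_loss(app_logits, mot_logits, T=4.0, fuse_w=0.5):
    # Teacher: late fusion of the two streams' logits (hypothetical
    # weighted average; the paper's fusion may differ).
    teacher_logits = fuse_w * app_logits + (1.0 - fuse_w) * mot_logits
    teacher = softmax(teacher_logits, T)
    # Students: each stream mimics the teacher's softened distribution.
    loss_app = kl_divergence(teacher, softmax(app_logits, T))
    loss_mot = kl_divergence(teacher, softmax(mot_logits, T))
    # T^2 scaling, as in standard knowledge distillation, keeps gradient
    # magnitudes comparable across temperatures.
    return (T ** 2) * (loss_app + loss_mot)

rng = np.random.default_rng(0)
app = rng.normal(size=(8, 101))  # appearance-stream logits (UCF101: 101 classes)
mot = rng.normal(size=(8, 101))  # motion-stream logits
loss = stddcn_distillation_loss(app, mot)
```

In training, this term would be added to the usual cross-entropy losses of the individual streams and of the fused prediction; when the two streams agree, the distillation term vanishes.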
Pages: 13-24 (12 pages)
Related Papers
50 records in total
  • [41] STRNet: Triple-stream Spatiotemporal Relation Network for Action Recognition
    Xu, Zhi-Wei
    Wu, Xiao-Jun
    Kittler, Josef
    INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2021, 18 (05) : 718 - 730
  • [43] TWO-PATHWAY TRANSFORMER NETWORK FOR VIDEO ACTION RECOGNITION
    Jiang, Bo
    Yu, Jiahong
    Zhou, Lei
    Wu, Kailin
    Yang, Yang
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1089 - 1093
  • [44] Manet: motion-aware network for video action recognition
    Li, Xiaoyang
    Yang, Wenzhu
    Wang, Kanglin
    Wang, Tiebiao
    Zhang, Chen
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (03)
  • [45] Multi-Kernel Excitation Network for Video Action Recognition
    Tian, Qingze
    Wang, Kun
    Liu, Baodi
    Wang, Yanjiang
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 155 - 159
  • [46] Multipath Attention and Adaptive Gating Network for Video Action Recognition
    Zhang, Haiping
    Hu, Zepeng
    Yu, Dongjin
    Guan, Liming
    Liu, Xu
    Ma, Conghao
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [47] SCN: Dilated silhouette convolutional network for video action recognition
    Hua, Michelle
    Gao, Mingqi
    Zhong, Zichun
    COMPUTER AIDED GEOMETRIC DESIGN, 2021, 85
  • [48] A Multi-Scale Video Longformer Network for Action Recognition
    Chen, Congping
    Zhang, Chunsheng
    Dong, Xin
    APPLIED SCIENCES-BASEL, 2024, 14 (03):
  • [50] FREQUENCY ENHANCEMENT NETWORK FOR EFFICIENT COMPRESSED VIDEO ACTION RECOGNITION
    Ming, Yue
    Xiong, Lu
    Jia, Xia
    Zheng, Qingfang
    Zhou, Jiangwan
    Feng, Fan
    Hu, Nannan
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 825 - 829