Spatiotemporal distilled dense-connectivity network for video action recognition

Times cited: 41
Authors
Hao, Wangli [1 ,3 ]
Zhang, Zhaoxiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China
[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;
DOI
10.1016/j.patcog.2019.03.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two-stream convolutional neural networks show great promise for action recognition. However, most two-stream approaches train the appearance and motion subnetworks independently, which may degrade performance because of the lack of interaction between the two streams. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. The network combines knowledge distillation with dense connectivity (adapted from DenseNet). With the STDDCN architecture, we explore interaction strategies between the appearance and motion streams at different levels of the hierarchy. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction in the feature representation layers. In addition, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) lets both streams interact at the high-level layers. This architecture allows STDDCN to gradually learn effective hierarchical spatiotemporal features, and the whole network can be trained end-to-end. Extensive ablation studies on two benchmark datasets, UCF101 and HMDB51, validate the effectiveness and generalization ability of our model, which also achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
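The two interaction mechanisms described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The flat-list feature representation, the averaging fusion in `fuse`, the Hinton-style distillation loss with `temperature` and `alpha`, and every function name below are hypothetical choices made only to show the idea: each stream's next block sees the outputs of all earlier blocks from both streams, and each stream (student) is pulled toward the softened prediction of the fused output (teacher).

```python
import math
from typing import List, Sequence


def cross_stream_dense_inputs(app_outputs: Sequence[Sequence[float]],
                              mot_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical block-level dense connectivity: the next block of either
    stream receives the concatenation of all earlier block outputs of BOTH
    streams (features modeled as flat lists for illustration)."""
    inputs: List[float] = []
    for a, m in zip(app_outputs, mot_outputs):
        inputs.extend(a)  # appearance block output
        inputs.extend(m)  # motion block output at the same depth
    return inputs


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of the (temperature 1) softmax against a hard label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fuse(appearance: Sequence[float], motion: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: element-wise average of the two streams' logits."""
    return [(a + b) / 2 for a, b in zip(appearance, motion)]


def two_stream_distillation_loss(appearance: Sequence[float],
                                 motion: Sequence[float],
                                 label: int,
                                 temperature: float = 2.0,
                                 alpha: float = 0.5) -> float:
    """Assumed objective: the fused teacher is supervised by the label, and
    each student mixes a hard-label loss with a KL term toward the teacher's
    softened distribution (scaled by temperature**2, as in Hinton et al.)."""
    teacher = fuse(appearance, motion)
    soft_teacher = softmax(teacher, temperature)
    loss = cross_entropy(teacher, label)
    for student in (appearance, motion):
        soft_student = softmax(student, temperature)
        hard = cross_entropy(student, label)
        soft = kl_divergence(soft_teacher, soft_student) * temperature ** 2
        loss += (1 - alpha) * hard + alpha * soft
    return loss


if __name__ == "__main__":
    print(cross_stream_dense_inputs([[1.0, 2.0], [3.0]], [[4.0], [5.0, 6.0]]))
    print(two_stream_distillation_loss([2.0, 0.5, -1.0], [0.5, 1.5, -0.5], label=0))
```

When the two streams already agree, the KL terms vanish and only the hard-label terms remain; the more the streams disagree with their fusion, the larger the distillation penalty. In an autodiff framework one would typically stop gradients through the teacher's soft targets, a detail this plain-Python sketch does not model.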
Pages: 13-24
Number of pages: 12