Spatiotemporal distilled dense-connectivity network for video action recognition

Times Cited: 41
Authors
Hao, Wangli [1 ,3 ]
Zhang, Zhaoxiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China
[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;
DOI
10.1016/j.patcog.2019.03.005
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two-stream convolutional neural networks show great promise for action recognition tasks. However, most two-stream-based approaches train the appearance and motion subnetworks independently, which can degrade performance due to the lack of interaction between the two streams. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. This network combines knowledge distillation with dense connectivity (adapted from DenseNet). With this architecture, we explore interaction strategies between the appearance and motion streams across different hierarchies. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction at the feature-representation layers. Moreover, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) allows both streams to interact at the high-level layers. This architecture allows STDDCN to gradually obtain effective hierarchical spatiotemporal features, and it can be trained end-to-end. Finally, numerous ablation studies validate the effectiveness and generalization of our model on two benchmark datasets, UCF101 and HMDB51, where it achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
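The abstract describes two mechanisms: block-level dense connections across the appearance and motion streams, and distilling each stream (student) from the fused prediction (teacher). Below is a minimal PyTorch sketch of both ideas under stated assumptions; the class and function names (CrossStreamBlock, distillation_loss), the averaged-logit fusion used as teacher, the temperature, and the loss weights are illustrative choices, not the authors' exact STDDCN configuration.

```python
# Minimal sketch, assuming a per-frame 2D-CNN backbone per stream.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossStreamBlock(nn.Module):
    """One block per stream; each stream also receives the other stream's
    features via a dense (concatenation) connection, in the spirit of DenseNet."""

    def __init__(self, in_ch, growth):
        super().__init__()
        # Each conv sees its own stream's features concatenated with the other stream's.
        self.app_conv = nn.Conv2d(2 * in_ch, growth, kernel_size=3, padding=1)
        self.mot_conv = nn.Conv2d(2 * in_ch, growth, kernel_size=3, padding=1)

    def forward(self, app, mot):
        app_out = F.relu(self.app_conv(torch.cat([app, mot], dim=1)))
        mot_out = F.relu(self.mot_conv(torch.cat([mot, app], dim=1)))
        return app_out, mot_out


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on ground truth plus KL divergence toward the (detached)
    teacher distribution, softened by temperature T."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kl


if __name__ == "__main__":
    # Toy example: appearance (RGB) and motion (stacked optical flow) feature maps.
    app_feat = torch.randn(2, 64, 28, 28)
    mot_feat = torch.randn(2, 64, 28, 28)
    block = CrossStreamBlock(in_ch=64, growth=64)
    app_feat, mot_feat = block(app_feat, mot_feat)

    # Per-stream classifiers on globally pooled features; their averaged
    # prediction plays the role of the fused teacher.
    num_classes = 101  # e.g. UCF101
    classifier_app = nn.Linear(64, num_classes)
    classifier_mot = nn.Linear(64, num_classes)
    app_logits = classifier_app(app_feat.mean(dim=(2, 3)))
    mot_logits = classifier_mot(mot_feat.mean(dim=(2, 3)))
    fused_logits = (app_logits + mot_logits) / 2  # teacher
    labels = torch.randint(0, num_classes, (2,))

    # Both streams are distilled from the fusion; the whole model stays end-to-end trainable.
    loss = (distillation_loss(app_logits, fused_logits, labels)
            + distillation_loss(mot_logits, fused_logits, labels))
    loss.backward()
    print(f"toy loss: {loss.item():.3f}")
```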
Pages: 13-24
Number of pages: 12
Related Papers
50 records in total
  • [31] Multi-receptive field spatiotemporal network for action recognition
    Nie, Mu
    Yang, Sen
    Wang, Zhenhua
    Zhang, Baochang
    Lu, Huimin
    Yang, Wankou
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (07) : 2439 - 2453
  • [33] A Spatiotemporal Fusion Network For Skeleton-Based Action Recognition
    Bao, Wenxia
    Wang, Junyi
    Yang, Xianjun
    Chen, Hemu
    2024 3RD INTERNATIONAL CONFERENCE ON IMAGE PROCESSING AND MEDIA COMPUTING, ICIPMC 2024, 2024, : 347 - 352
  • [34] Spatiotemporal attention enhanced features fusion network for action recognition
    Zhuang, Danfeng
    Jiang, Min
    Kong, Jun
    Liu, Tianshan
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (03) : 823 - 841
  • [35] A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition
    Chen, Enqing
    Bai, Xue
    Gao, Lei
    Tinega, Haron Chweya
    Ding, Yingqiang
    IEEE ACCESS, 2019, 7 : 57267 - 57275
  • [36] D3D: Distilled 3D Networks for Video Action Recognition
    Stroud, Jonathan C.
    Ross, David A.
    Sun, Chen
    Deng, Jia
    Sukthankar, Rahul
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 614 - 623
  • [37] Residual attention fusion network for video action recognition
    Li, Ao
    Yi, Yang
    Liang, Daan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [38] Badminton video action recognition based on time network
    Zhi, Juncai
    Sun, Zijie
    Zhang, Ruijie
    Zhao, Zhouxiang
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2023, 23 (05) : 2739 - 2752
  • [39] Spatiotemporal Multimodal Learning With 3D CNNs for Video Action Recognition
    Wu, Hanbo
    Ma, Xin
    Li, Yibin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (03) : 1250 - 1261
  • [40] Human-Body Action Recognition Based on Dense Trajectories and Video Saliency
    Gao, Deyong
    Kang, Zibing
    Wang, Song
    Wang, Yangping
    LASER & OPTOELECTRONICS PROGRESS, 2020, 57 (24)