Spatiotemporal distilled dense-connectivity network for video action recognition

Cited by: 41
Authors
Hao, Wangli [1 ,3 ]
Zhang, Zhaoxiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China
[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;
DOI
10.1016/j.patcog.2019.03.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two-stream convolutional neural networks show great promise for action recognition tasks. However, most two-stream approaches train the appearance and motion subnetworks independently, which may degrade performance because the two streams do not interact. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. This network combines knowledge distillation with dense connectivity (adapted from DenseNet). Using the STDDCN architecture, we explore interaction strategies between the appearance and motion streams at different levels of the hierarchy. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction in the feature representation layers. In addition, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) allows both streams to interact in the high-level layers. This architecture allows STDDCN to gradually learn effective hierarchical spatiotemporal features, and it can be trained end-to-end. Finally, extensive ablation studies on two benchmark datasets, UCF101 and HMDB51, validate the effectiveness and generalization of our model, which also achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
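The distillation scheme described in the abstract, with each stream as a student and their fusion as the teacher, can be illustrated by a minimal pure-Python sketch of the loss. It assumes late fusion by averaging the two streams' class scores and a standard temperature-scaled KL distillation term. The function names and the `temperature` and `alpha` values are hypothetical and are not taken from the paper. The sketch also leaves out the block-level dense connections.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of the hard label under softmax(logits)."""
    return -math.log(softmax(logits)[label])


def fused_teacher_logits(appearance: Sequence[float],
                         motion: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: average the two streams' class scores."""
    return [(a + m) / 2.0 for a, m in zip(appearance, motion)]


def student_loss(student: Sequence[float], teacher: Sequence[float],
                 label: int, temperature: float = 4.0,
                 alpha: float = 0.5) -> float:
    """Hard-label loss plus a soft-target term pulling the student toward the teacher."""
    hard = cross_entropy(student, label)
    soft = kl_divergence(softmax(teacher, temperature),
                         softmax(student, temperature))
    # Scaling by T^2 keeps the soft-target term on a scale comparable to the hard term.
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft


def total_loss(appearance: Sequence[float], motion: Sequence[float],
               label: int, temperature: float = 4.0,
               alpha: float = 0.5) -> float:
    """Teacher (fusion) loss plus one distillation loss per student stream."""
    teacher = fused_teacher_logits(appearance, motion)
    return (cross_entropy(teacher, label)
            + student_loss(appearance, teacher, label, temperature, alpha)
            + student_loss(motion, teacher, label, temperature, alpha))


if __name__ == "__main__":
    # Hypothetical 3-class scores from the appearance and motion streams.
    print(total_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], label=0))
```

This version contains no gradients. In an autograd framework, a common choice is to stop gradients through the teacher in the soft-target terms, so that each stream moves toward the fused prediction and not the other way around. Whether STDDCN does this, and how it actually fuses the streams, should be checked against the paper.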
Pages: 13-24
Number of pages: 12
Related papers
50 results in total
  • [21] Deep Spatiotemporal Relation Learning With 3D Multi-Level Dense Fusion for Video Action Recognition
    Zhang, Junxuan
    Hu, Haifeng
    IEEE ACCESS, 2019, 7 : 15222 - 15229
  • [22] Dense Semantics-Assisted Networks for Video Action Recognition
    Luo, Haonan
    Lin, Guosheng
    Yao, Yazhou
    Tang, Zhenmin
    Wu, Qingyao
    Hua, Xiansheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 3073 - 3084
  • [23] A spatiotemporal and motion information extraction network for action recognition
    Wang, Wei
    Wang, Xianmin
    Zhou, Mingliang
    Wei, Xuekai
    Li, Jing
    Ren, Xiaojun
    Zong, Xuemei
    WIRELESS NETWORKS, 2024, 30 (06) : 5389 - 5405
  • [24] Action Keypoint Network for Efficient Video Recognition
    Chen, Xu
    Han, Yahong
    Wang, Xiaohan
    Sun, Yifan
    Yang, Yi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4980 - 4993
  • [25] Binary Neural Network for Video Action Recognition
    Han, Hongfeng
    Lu, Zhiwu
    Wen, Ji-Rong
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 95 - 106
  • [26] Dense Dilated Network for Few Shot Action Recognition
    Xu, Baohan
    Ye, Hao
    Zheng, Yingbin
    Wang, Heng
    Luwang, Tianyu
    Jiang, Yu-Gang
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 379 - 387
  • [27] Fusion Attention for Action Recognition: Integrating Sparse-Dense and Global Attention for Video Action Recognition
    Kim, Hyun-Woo
    Choi, Yong-Suk
    SENSORS, 2024, 24 (21)
  • [28] AGPN: Action Granularity Pyramid Network for Video Action Recognition
    Chen, Yatong
    Ge, Hongwei
    Liu, Yuxuan
    Cai, Xinye
    Sun, Liang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 3912 - 3923
  • [29] Imperceptible Adversarial Attack With Multigranular Spatiotemporal Attention for Video Action Recognition
    Wu, Guoming
    Xu, Yangfan
    Li, Jun
    Shi, Zhiping
    Liu, Xianglong
    IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (20) : 17785 - 17796
  • [30] Multi-receptive field spatiotemporal network for action recognition
    Nie, Mu
    Yang, Sen
    Wang, Zhenhua
    Zhang, Baochang
    Lu, Huimin
    Yang, Wankou
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 : 2439 - 2453