LAE-Net: Light and Efficient Network for Compressed Video Action Recognition

Cited by: 2
Authors
Guo, Jinxin [1]
Zhang, Jiaqiang [1]
Zhang, Xiaojing [1]
Ma, Ming [1]
Affiliations
[1] Inner Mongolia Univ, Hohhot, Peoples R China
Source
Keywords
Action recognition; Compressed video; Transfer learning;
DOI
10.1007/978-3-031-27818-1_22
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Action recognition is a crucial task in computer vision and video analysis. The two-stream network and 3D ConvNets are representative approaches. Although both achieve outstanding performance, optical flow and 3D convolution require substantial computation, which makes them ill-suited to real-time applications. Recent work extracts motion vectors and residuals directly from the compressed video to replace optical flow. However, because motion vectors are a noisy and inaccurate representation of motion, model accuracy decreases significantly when they are used as input. Moreover, existing works focus only on improving accuracy or on reducing computational cost, without exploring the trade-off between the two. In this paper, we propose a light and efficient multi-stream framework that includes a motion temporal fusion module (MTFM) and a double compressed knowledge distillation module (DCKD). MTFM improves the network's ability to extract complete motion information and compensates, to some extent, for the inaccurate description of motion given by motion vectors in compressed video. DCKD allows the student network to gain more knowledge from the teacher with fewer parameters and fewer input frames. Experimental results on two public benchmarks (UCF-101 and HMDB-51) show that the proposed method outperforms the state of the art in the compressed domain.
Pages: 265-276
Number of pages: 12