Focal Channel Knowledge Distillation for Multi-Modality Action Recognition

Cited by: 1
Authors
Gan, Lipeng [1 ]
Cao, Runze [1 ]
Li, Ning [1 ]
Yang, Man [1 ]
Li, Xiaochao [1 ,2 ,3 ]
Affiliations
[1] Xiamen Univ, Dept Microelect & Integrated Circuit, Xiamen 361005, Peoples R China
[2] Xiamen Univ Malaysia, Dept Elect & Elect Engn, Sepang 43900, Selangor, Malaysia
[3] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
Source
IEEE ACCESS | 2023, Vol. 11
Keywords
Action recognition; knowledge distillation; multi-modality
DOI
10.1109/ACCESS.2023.3298647
Chinese Library Classification
TP [Automation technology; Computer technology]
Subject classification code
0812
Abstract
Multi-modality action recognition aims to learn complementary information from multiple modalities to improve recognition performance. However, the channel semantics of different modalities differ significantly, so transferring channel semantic features equally from multiple modalities to RGB causes competition and redundancy during knowledge distillation. To address this issue, we propose a focal channel knowledge distillation strategy that transfers the key semantic correlations and distributions of multi-modality teachers into an RGB student network. The focal channel correlations capture the intrinsic relationships and diversity of the key semantics, while the focal channel distributions capture the salient channel activations of the features. By ignoring less-discriminative and irrelevant channels, the student can use its channel capacity more efficiently to learn complementary semantic features from the other modalities. Our focal channel knowledge distillation achieves 91.2%, 95.6%, 98.3%, and 81.0% accuracy on the NTU 60 (CS), UTD-MHAD, N-UCLA, and HMDB51 datasets, improvements of 4.5%, 4.2%, 3.7%, and 7.1% over unimodal RGB models. The framework can also be integrated with unimodal models to achieve state-of-the-art performance: extensive experiments show that the proposed method reaches 92.5%, 96.0%, 98.9%, and 82.3% accuracy on NTU 60 (CS), UTD-MHAD, N-UCLA, and HMDB51, respectively.
Pages: 78285-78298
Page count: 14
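The abstract describes two transfer terms, focal channel correlations and focal channel distributions, computed only over salient teacher channels. The paper's exact formulation is not reproduced in this record, so the following PyTorch sketch is an illustrative assumption: the saliency criterion (mean absolute activation), the names focal_channel_mask and focal_channel_kd_loss, and the hyperparameters keep_ratio and tau are all hypothetical, and the sketch assumes the student and teacher feature maps have already been projected to the same shape.

```python
import torch
import torch.nn.functional as F

def focal_channel_mask(teacher_feat: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Mark the teacher's most salient channels (1) and ignore the rest (0).

    Saliency is approximated by the mean absolute activation per channel;
    the paper's actual selection criterion may differ.
    teacher_feat: (B, C, H, W) -> mask: (B, C)
    """
    saliency = teacher_feat.abs().mean(dim=(2, 3))            # (B, C)
    k = max(1, int(keep_ratio * saliency.size(1)))
    topk = saliency.topk(k, dim=1).indices                    # (B, k)
    mask = torch.zeros_like(saliency)
    mask.scatter_(1, topk, 1.0)
    return mask

def channel_correlation(feat: torch.Tensor) -> torch.Tensor:
    """Channel-channel cosine-similarity (Gram) matrix: (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = feat.shape
    f = F.normalize(feat.reshape(b, c, h * w), dim=2)
    return f @ f.transpose(1, 2)

def focal_channel_kd_loss(student_feat: torch.Tensor,
                          teacher_feat: torch.Tensor,
                          keep_ratio: float = 0.5,
                          tau: float = 4.0) -> torch.Tensor:
    """Correlation + distribution distillation restricted to focal channels."""
    mask = focal_channel_mask(teacher_feat, keep_ratio)       # (B, C)

    # Correlation term: match channel relationships on focal channel pairs only.
    pair_mask = mask.unsqueeze(2) * mask.unsqueeze(1)         # (B, C, C)
    corr_diff = channel_correlation(student_feat) - channel_correlation(teacher_feat)
    corr_loss = (pair_mask * corr_diff.pow(2)).sum() / pair_mask.sum().clamp(min=1.0)

    # Distribution term: per-channel spatial distributions via temperature
    # softmax, compared with KL divergence on the focal channels only.
    b, c, _, _ = teacher_feat.shape
    t = F.log_softmax(teacher_feat.reshape(b, c, -1) / tau, dim=2)
    s = F.log_softmax(student_feat.reshape(b, c, -1) / tau, dim=2)
    kl = F.kl_div(s, t, reduction='none', log_target=True).sum(dim=2)  # (B, C)
    dist_loss = (mask * kl).sum() / mask.sum().clamp(min=1.0)

    return corr_loss + dist_loss
```

In training, a caller would extract intermediate feature maps from the RGB student and from each modality teacher (e.g., a depth or skeleton stream), add this loss to the task loss for each teacher, and back-propagate only through the student.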