Cross-modality online distillation for multi-view action recognition

Cited by: 14
Authors
Xu, Chao [1 ,2 ]
Wu, Xia [1 ,2 ]
Li, Yachun [1 ,2 ]
Jin, Yining [3 ]
Wang, Mengmeng [1 ,2 ]
Liu, Yong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China;
Keywords
Multi-view; Cross-modality; Action recognition; Online distillation; MODEL; NETWORK;
DOI
10.1016/j.neucom.2021.05.077
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Recently, multi-modality features have been introduced into multi-view action recognition methods to obtain more robust performance. However, not all modalities are available in real applications; for example, everyday scenes often lack depth data and capture RGB sequences only. This raises the challenge of learning critical features from multi-modality data at training time while still achieving robust performance from RGB sequences alone at test time. To address this challenge, our paper presents a novel two-stage teacher-student framework. The teacher network takes advantage of multi-view geometry and texture features during training, while the student network is given only RGB sequences at test time. Specifically, in the first stage, a Cross-modality Aggregated Transfer (CAT) network is proposed to transfer multi-view cross-modality aggregated features from the teacher network to the student network. Moreover, we design a Viewpoint-Aware Attention (VAA) module which captures discriminative information across different views to combine multi-view features effectively. In the second stage, a Multi-view Features Strengthen (MFS) network with the VAA module further strengthens the global view-invariant features of the student network. In addition, both CAT and MFS learn in an online distillation manner, so the teacher and student networks can be trained jointly. Extensive experiments on IXMAS and Northwestern-UCLA demonstrate the effectiveness of our proposed method. (c) 2021 Elsevier B.V. All rights reserved.
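Because the abstract only sketches the architecture, the PyTorch snippet below is a minimal, hypothetical illustration of the two ideas it names: a viewpoint-aware attention that fuses per-view teacher features, and an online-distillation loss in which teacher and student are optimized jointly while the student receives only single-view RGB input. The module names (ViewpointAwareAttention, OnlineDistiller, joint_loss), feature shapes, temperature, and loss weights are assumptions for illustration, not the paper's actual CAT/VAA/MFS implementation.

# Hypothetical sketch of viewpoint-aware attention + online cross-modality distillation.
# Not the authors' code; shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewpointAwareAttention(nn.Module):
    """Weights per-view features before aggregation (stand-in for the VAA module)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_views, dim)
        attn = torch.softmax(self.score(feats), dim=1)   # attention over views
        return (attn * feats).sum(dim=1)                 # (batch, dim)


class OnlineDistiller(nn.Module):
    """Teacher sees multi-view RGB+depth features; student sees single-view RGB only."""

    def __init__(self, teacher: nn.Module, student: nn.Module, dim: int, num_classes: int):
        super().__init__()
        self.teacher, self.student = teacher, student
        self.vaa = ViewpointAwareAttention(dim)
        self.t_head = nn.Linear(dim, num_classes)
        self.s_head = nn.Linear(dim, num_classes)

    def forward(self, rgb_views, depth_views, rgb_single):
        # Teacher: encode every (RGB, depth) view, then fuse with viewpoint attention.
        t_feats = torch.stack(
            [self.teacher(r, d) for r, d in zip(rgb_views, depth_views)], dim=1
        )                                                # (batch, num_views, dim)
        t_agg = self.vaa(t_feats)
        # Student: RGB input from a single view only (the test-time condition).
        s_feat = self.student(rgb_single)                # (batch, dim)
        return self.t_head(t_agg), self.s_head(s_feat), t_agg, s_feat


def joint_loss(t_logits, s_logits, t_agg, s_feat, labels, temp=4.0, alpha=0.5):
    """Online distillation: teacher and student are optimized with one joint loss."""
    ce = F.cross_entropy(t_logits, labels) + F.cross_entropy(s_logits, labels)
    kd = F.kl_div(
        F.log_softmax(s_logits / temp, dim=1),
        F.softmax(t_logits.detach() / temp, dim=1),
        reduction="batchmean",
    ) * temp ** 2
    feat = F.mse_loss(s_feat, t_agg.detach())            # cross-modality feature transfer
    return ce + alpha * kd + (1 - alpha) * feat


if __name__ == "__main__":
    dim, num_classes, views, batch = 64, 10, 3, 2

    class ToyEncoder(nn.Module):
        """Toy stand-in for a video backbone; concatenates inputs and projects to dim."""

        def __init__(self, in_dim):
            super().__init__()
            self.fc = nn.Linear(in_dim, dim)

        def forward(self, *xs):
            return self.fc(torch.cat(xs, dim=-1))

    model = OnlineDistiller(ToyEncoder(2 * 32), ToyEncoder(32), dim, num_classes)
    rgb_views = [torch.randn(batch, 32) for _ in range(views)]
    depth_views = [torch.randn(batch, 32) for _ in range(views)]
    rgb_single = torch.randn(batch, 32)
    labels = torch.randint(0, num_classes, (batch,))
    loss = joint_loss(*model(rgb_views, depth_views, rgb_single), labels)
    loss.backward()  # one joint backward pass updates teacher and student together

In this sketch the student mimics both the teacher's softened logits and its aggregated features (detached so the student loss does not push gradients back into the teacher), which is one common way to realize joint teacher-student training without a pre-trained teacher.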
Pages: 384-393
Number of pages: 10
Related papers
50 records in total
  • [21] Automatic Multi-view Action Recognition with Robust Features
    Chou, Kuang-Pen
    Prasad, Mukesh
    Li, Dong-Lin
    Bharill, Neha
    Lin, Yu-Feng
    Hussain, Farookh
    Lin, Chin-Teng
    Lin, Wen-Chieh
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 554 - 563
  • [22] Multi-View Action Recognition using Contrastive Learning
    Shah, Ketul
    Shah, Anshul
    Lau, Chun Pong
    de Melo, Celso M.
    Chellappa, Rama
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 3370 - 3380
  • [24] Action Recognition with a Multi-View Temporal Attention Network
    Sun, Dengdi
    Su, Zhixiang
    Ding, Zhuanlian
    Luo, Bin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1082 - 1095
  • [25] Multi-View Action Recognition One Camera At a Time
    Spurlock, Scott
    Souvenir, Richard
    2014 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2014, : 604 - 609
  • [26] Compositional action recognition with multi-view feature fusion
    Zhao, Zhicheng
    Liu, Yingan
    Ma, Lei
    PLOS ONE, 2022, 17 (04):
  • [27] Focal Channel Knowledge Distillation for Multi-Modality Action Recognition
    Gan, Lipeng
    Cao, Runze
    Li, Ning
    Yang, Man
    Li, Xiaochao
    IEEE ACCESS, 2023, 11 : 78285 - 78298
  • [28] CROSS-MODALITY DISTILLATION: A CASE FOR CONDITIONAL GENERATIVE ADVERSARIAL NETWORKS
    Roheda, Siddharth
    Riggan, Benjamin S.
    Krim, Hamid
    Dai, Liyi
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2926 - 2930
  • [29] Multi-View and Multi-Modal Action Recognition with Learned Fusion
    Ardianto, Sandy
    Hang, Hsueh-Ming
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1601 - 1604
  • [30] Regularized Multi-View Multi-Metric Learning for Action Recognition
    Wu, Xuqing
    Shah, Shishir K.
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 471 - 476