MCDGait: multimodal co-learning distillation network with spatial-temporal graph reasoning for gait recognition in the wild

Cited by: 0
Authors
Xiong, Jianbo [1 ]
Zou, Shinan [1 ]
Tang, Jin [1 ]
Tjahjadi, Tardi [2 ]
Affiliations
[1] Central South University, School of Automation, Changsha, People's Republic of China
[2] University of Warwick, School of Engineering, Coventry, England
Source
VISUAL COMPUTER | 2024, Vol. 40, Issue 10
Keywords
Biometrics; Human identification; Gait recognition; Multimodal co-learning distillation; Spatial-temporal graph reasoning
DOI
10.1007/s00371-024-03426-y
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Gait recognition in the wild has attracted the attention of the academic community. However, existing unimodal algorithms cannot achieve the same performance on in-the-wild datasets as on in-the-lab datasets, because unimodal data have many limitations in in-the-wild environments. Therefore, we propose a multimodal approach combining silhouettes and skeletons and formulate the multimodal gait recognition problem as a multimodal co-learning problem. In particular, we propose a multimodal co-learning distillation network (MCDGait) that integrates two sub-networks processing unimodal data into a single fusion network. Based on the semantic consistency of different modalities and the paradigm of deep mutual learning, the performance of the entire network is continuously improved via bidirectional knowledge distillation between the sub-networks and the fusion network. Inspired by the observation that specific body parts or joints exhibit unique motion characteristics and are linked with other parts or joints during walking, we propose a spatial-temporal graph reasoning module (ST-GRM). This module represents the parts or joints as graph nodes and the motion linkages between them as edges. By utilizing a dynamic graph generator, the module implicitly captures the dynamic changes of the human body. Based on the generated graphs, the independent spatial-temporal linkage features of each part and the interactive spatial-temporal linkage features between parts are aggregated simultaneously. Extensive experiments conducted on two in-the-wild datasets demonstrate the state-of-the-art performance of the proposed method. The average rank-1 accuracy on the Gait3D and GREW datasets is 50.90% and 58.06%, respectively. The source code can be obtained from https://github.com/BoyeXiong/MCDGait.
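The abstract describes two mechanisms concretely enough to sketch: bidirectional knowledge distillation between the unimodal sub-networks and the fusion network, and a dynamic graph generator that infers motion-linkage edges between body parts. Below is a minimal PyTorch sketch of both ideas. All names, the KL-based loss form, the temperature, and the attention-style graph construction are illustrative assumptions, not the authors' implementation; the actual code is available at the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicGraphGenerator(nn.Module):
    """Hypothetical sketch: infer a soft adjacency matrix among body
    parts/joints from their current features, so the graph can change
    as the body moves."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_parts, dim) part/joint features.
        q, k = self.query(x), self.key(x)
        # Pairwise affinities, row-normalized into edge weights that
        # stand in for motion linkages between parts/joints.
        return torch.softmax(q @ k.transpose(1, 2) / x.size(-1) ** 0.5, dim=-1)


def mutual_distillation_loss(logits_sil: torch.Tensor,
                             logits_skel: torch.Tensor,
                             logits_fused: torch.Tensor,
                             t: float = 1.0) -> torch.Tensor:
    """Deep-mutual-learning style bidirectional KD: the fusion network
    and both unimodal sub-networks soften each other's predictions.
    The KL form and temperature t are assumptions, not the paper's
    exact loss."""

    def kd(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
        return F.kl_div(F.log_softmax(student / t, dim=-1),
                        F.softmax(teacher.detach() / t, dim=-1),
                        reduction="batchmean") * (t * t)

    # Fusion teaches both sub-networks; sub-networks teach the fusion back.
    return (kd(logits_sil, logits_fused) + kd(logits_skel, logits_fused) +
            kd(logits_fused, logits_sil) + kd(logits_fused, logits_skel))
```

In training, a term like this would presumably be added to each branch's recognition loss (e.g., triplet plus cross-entropy, which is common in gait recognition), so the two sub-networks and the fusion network improve jointly rather than in a fixed teacher-student direction.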
Pages: 7221-7234
Page count: 14