Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Cited by: 0
Authors
Zhou, Hao [1 ]
Zhou, Wengang [1 ]
Zhou, Yun [1 ]
Li, Houqiang [1 ]
Affiliations
[1] Univ Sci & Technol China, CAS Key Lab GIPAS, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aim to preserve the uniqueness and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve the end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
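The two parallel temporal paths described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `temporal_mean_filter` and `tmc_block`, the window size, the use of a moving average in place of a learned temporal convolution, and the element-wise mean used to fuse cues are all assumptions made only to show the intra-cue versus inter-cue idea.

```python
from typing import Dict, List, Tuple

Sequence = List[List[float]]  # T frames, each frame a feature vector


def temporal_mean_filter(seq: Sequence, kernel: int = 3) -> Sequence:
    """Stand-in for a learned temporal convolution (assumption):
    average each feature over a centered window, truncated at the edges."""
    half = kernel // 2
    out = []
    for t in range(len(seq)):
        window = seq[max(0, t - half): t + half + 1]
        dim = len(seq[t])
        out.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return out


def tmc_block(cues: Dict[str, Sequence], kernel: int = 3) -> Tuple[Dict[str, Sequence], Sequence]:
    """Hypothetical two-path block.

    Intra-cue path: filter every cue on its own, so cue-specific
    features stay separate.
    Inter-cue path: fuse the cues frame by frame (here an element-wise
    mean, assuming equal feature sizes), then filter the fused sequence.
    """
    intra = {name: temporal_mean_filter(seq, kernel) for name, seq in cues.items()}

    names = sorted(cues)
    n_frames = len(cues[names[0]])
    dim = len(cues[names[0]][0])
    fused = [
        [sum(cues[n][t][d] for n in names) / len(names) for d in range(dim)]
        for t in range(n_frames)
    ]
    inter = temporal_mean_filter(fused, kernel)
    return intra, inter


# Toy input: two cues, three frames, one feature per frame.
toy_cues = {
    "hand": [[1.0], [2.0], [3.0]],
    "face": [[3.0], [2.0], [1.0]],
}
intra_out, inter_out = tmc_block(toy_cues)
```

In this toy setup the intra-cue outputs keep the opposite trends of the two cues, while the inter-cue output reflects only their combined signal, which is the separation of roles the abstract attributes to the two paths.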
Pages: 13009-13016
Number of pages: 8