Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Times cited: 0
Authors
Zhou, Hao [1]
Zhou, Wengang [1]
Zhou, Yun [1]
Li, Houqiang [1]
Affiliation
[1] Univ Sci & Technol China, CAS Key Lab GIPAS, Hefei, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aim to preserve the uniqueness of each cue and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
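The two parallel paths of the TMC module described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes that each cue (for example, hands or face) has already been turned into a sequence of per-frame feature vectors, that both paths use a 1-D temporal convolution with zero "same" padding followed by ReLU, and that the inter-cue path simply convolves the channel-wise concatenation of all cues. These choices, and the names `temporal_conv`, `concat_channels`, `tmc_block` and `ones`, are hypothetical illustrations: the abstract does not give kernel sizes, channel widths, or exactly how the two paths exchange information, so this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # (T, C): T frames, C channels per frame
Params = Tuple[List[Matrix], List[float]]  # (weights shaped (K, C_in, C_out), bias of length C_out)


def temporal_conv(x: Matrix, w: List[Matrix], bias: Sequence[float]) -> Matrix:
    """1-D convolution along time with zero 'same' padding (odd kernel size K assumed)."""
    T, K, c_out = len(x), len(w), len(bias)
    pad = K // 2
    out: Matrix = []
    for t in range(T):
        row = list(bias)
        for k in range(K):
            s = t + k - pad  # input frame covered by kernel tap k
            if 0 <= s < T:  # frames outside the sequence count as zeros
                for i, xi in enumerate(x[s]):
                    for j in range(c_out):
                        row[j] += xi * w[k][i][j]
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def concat_channels(seqs: List[Matrix]) -> Matrix:
    """Concatenate the per-frame features of all cues along the channel axis."""
    return [sum((s[t] for s in seqs), []) for t in range(len(seqs[0]))]


def tmc_block(cues: List[Matrix], intra_params: List[Params],
              inter_params: Params) -> Tuple[List[Matrix], Matrix]:
    # Intra-cue path: a separate convolution per cue, so each cue keeps its own temporal model.
    intra = [relu(temporal_conv(c, w, b)) for c, (w, b) in zip(cues, intra_params)]
    # Inter-cue path (simplified assumption): one convolution over all cues at once,
    # so every output channel can mix information across cues.
    w, b = inter_params
    inter = relu(temporal_conv(concat_channels(cues), w, b))
    return intra, inter


def ones(k: int, c_in: int, c_out: int) -> List[Matrix]:
    """Hypothetical fixed weights (all ones) standing in for learned parameters."""
    return [[[1.0] * c_out for _ in range(c_in)] for _ in range(k)]


if __name__ == "__main__":
    T = 4
    hand = [[float(t), 1.0] for t in range(T)]   # toy 2-channel "hand" cue
    face = [[1.0, -float(t)] for t in range(T)]  # toy 2-channel "face" cue
    intra, inter = tmc_block(
        [hand, face],
        [(ones(3, 2, 2), [0.0, 0.0]), (ones(3, 2, 2), [0.0, 0.0])],
        (ones(3, 4, 3), [0.0, 0.0, 0.0]),
    )
    print(len(intra), len(inter), len(inter[0]))  # 2 4 3
```

Keeping one convolution per cue means each cue's temporal dynamics are modelled with its own weights, while the inter-cue convolution sees all channels together and can learn cross-cue combinations. In a trained network these weights would be learned jointly rather than fixed by hand as in this toy example.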
Pages: 13009-13016
Page count: 8