Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Citations: 0

Authors:

Zhou, Hao [1]

Zhou, Wengang [1]

Zhou, Yun [1]

Li, Houqiang [1]

Affiliations:

[1] Univ Sci & Technol China, CAS Key Lab GIPAS, Beijing, Peoples R China

Source:

THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020 / Vol. 34

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aim to preserve the uniqueness and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve the end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
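The two parallel temporal paths mentioned in the abstract can be pictured with a small structural sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `temporal_average` and `tmc_block` are hypothetical, and a fixed uniform-kernel average stands in for whatever learned temporal layers the paper actually uses.

```python
from typing import Dict, List, Tuple

Sequence = List[List[float]]  # T time steps, each a feature vector


def temporal_average(seq: Sequence, kernel: int = 3) -> Sequence:
    """Uniform temporal smoothing; windows are clipped at the sequence edges.

    Assumption: this fixed filter is a stand-in for a learned temporal layer.
    """
    half = kernel // 2
    out = []
    for t in range(len(seq)):
        window = seq[max(0, t - half): t + half + 1]
        dim = len(seq[t])
        out.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return out


def tmc_block(cues: Dict[str, Sequence], kernel: int = 3) -> Tuple[Dict[str, Sequence], Sequence]:
    """Hypothetical two-path block: per-cue (intra) and fused (inter) processing."""
    # Intra-cue path: each cue is processed on its own, so it stays distinct.
    intra = {name: temporal_average(seq, kernel) for name, seq in cues.items()}
    # Inter-cue path: cues are concatenated per time step (sorted by name),
    # then processed jointly as one feature sequence.
    names = sorted(cues)
    steps = len(cues[names[0]])
    fused = [[x for name in names for x in cues[name][t]] for t in range(steps)]
    inter = temporal_average(fused, kernel)
    return intra, inter


cues = {"hand": [[0.0], [3.0], [6.0]], "face": [[1.0], [1.0], [1.0]]}
intra, inter = tmc_block(cues)
```

Because the stand-in filter is linear and shared, the inter-cue output here is simply the concatenation of the intra-cue outputs; a learned layer on the fused sequence would mix channels across cues, which is where any collaboration between cues would come from.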

Pages: 13009-13016

Page count: 8

50 records in total

[31] SLOWFAST NETWORK FOR CONTINUOUS SIGN LANGUAGE RECOGNITION
Ahn, Junseok
Jang, Youngjoon
Chung, Joon Son
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024 : 3920 - 3924
[32] Continuous Sign Language Recognition with Correlation Network
Hu, Lianyu
Gao, Liqing
Liu, Zekang
Feng, Wei
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023 : 2529 - 2539
[33] Continuous Sign Language Recognition with Correlation Network
Hu, Lianyu
Gao, Liqing
Liu, Zekang
Feng, Wei
arXiv, 2023
[34] Novel Spatio-Temporal Continuous Sign Language Recognition Using an Attentive Multi-Feature Network
Aditya, Wisnu
Shih, Timothy K.
Thaipisutikul, Tipajin
Fitriajie, Arda Satata
Gochoo, Munkhjargal
Utaminingrum, Fitri
Lin, Chih-Yang
SENSORS, 2022, 22 (17)
[35] Multi-cue based 3D residual network for action recognition
Ming Zong
Ruili Wang
Zhe Chen
Maoli Wang
Xun Wang
Johan Potgieter
Neural Computing and Applications, 2021, 33 : 5167 - 5181
[36] Spatial-Temporal Consistency Constraints for Chinese Sign Language Synthesis
Gao, Liqing
Liu, Peidong
Wan, Liang
Feng, Wei
COMPUTER-AIDED DESIGN AND COMPUTER GRAPHICS, CAD/GRAPHICS 2023, 2024, 14250 : 154 - 169
[37] Spatial-temporal feature-based End-to-end Fourier network for 3D sign language recognition
Abdullahi, Sunusi Bala
Chamnongthai, Kosin
Bolon-Canedo, Veronica
Cancela, Brais
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
[38] Temporal Lift Pooling for Continuous Sign Language Recognition
Hu, Lianyu
Gao, Liqing
Liu, Zekang
Feng, Wei
COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 511 - 527
[39] A multi-cue guidance network for depth completion
Zhang, Yongchi
Wei, Ping
Zheng, Nanning
NEUROCOMPUTING, 2021, 441 : 291 - 299
[40] Spatial-Temporal Convolutional Attention Network for Action Recognition
Luo, Huilan
Chen, Han
Computer Engineering and Applications, 2023, 59 (09) : 150 - 158