Multi-cue temporal modeling for skeleton-based sign language recognition

Cited by: 10
Authors
Ozdemir, Ogulcan [1 ]
Baytas, Inci M. [1 ]
Akarun, Lale [1 ]
Affiliations
[1] Bogazici Univ, Comp Engn Dept, Perceptual Intelligence Lab, Istanbul, Turkiye
Keywords
sign language recognition; spatio-temporal representation learning; graph convolutional networks; long short-term memory networks; deep learning-based human action recognition; HAND;
DOI
10.3389/fnins.2023.1148191
Chinese Library Classification (CLC)
Q189 [Neuroscience];
Discipline classification code
071006;
Abstract
Sign languages are visual languages used as the primary communication medium for the Deaf community. Signs comprise manual and non-manual articulators such as hand shapes, upper body movement, and facial expressions. Sign Language Recognition (SLR) aims to learn spatial and temporal representations from sign videos. Most SLR studies focus on manual features, often extracted from the shape of the dominant hand or from the entire frame. However, facial expressions combined with hand and body gestures may also play a significant role in discriminating the context represented in the sign videos. In this study, we propose an isolated SLR framework based on Spatial-Temporal Graph Convolutional Networks (ST-GCNs) and Multi-Cue Long Short-Term Memory networks (MC-LSTMs) to exploit multi-articulatory (e.g., body, hands, and face) information for recognizing sign glosses. We train an ST-GCN model to learn representations from the upper body and hands. Meanwhile, spatial embeddings of hand shape and facial expression cues are extracted from Convolutional Neural Networks (CNNs) pre-trained on large-scale hand and facial expression datasets. Thus, the proposed framework, which couples ST-GCNs with MC-LSTMs for multi-articulatory temporal modeling, can provide insights into the contribution of each visual Sign Language (SL) cue to recognition performance. To evaluate the proposed framework, we conducted extensive analyses on two Turkish SL benchmark datasets with different linguistic properties, BosphorusSign22k and AUTSL. While we obtained recognition performance comparable to the skeleton-based state of the art, we observed that incorporating multiple visual SL cues improves recognition performance, especially for sign classes where multi-cue information is vital.
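The abstract describes a late, per-cue temporal modeling stage on top of pre-computed per-frame embeddings. Below is a minimal PyTorch sketch in the spirit of that MC-LSTM stage; the cue names, feature dimensions, class count, and fusion-by-concatenation choice are illustrative assumptions rather than the authors' exact configuration, and the ST-GCN and CNN feature extractors are assumed to run upstream, producing one embedding per frame per cue.

```python
# Hypothetical sketch of multi-cue temporal fusion (not the paper's exact model).
import torch
import torch.nn as nn

class MultiCueLSTM(nn.Module):
    def __init__(self, cue_dims, hidden_dim=256, num_classes=100):
        super().__init__()
        # One LSTM per visual cue (e.g., body/hand skeleton, hand crops, face).
        self.lstms = nn.ModuleDict({
            cue: nn.LSTM(dim, hidden_dim, batch_first=True)
            for cue, dim in cue_dims.items()
        })
        # Simple late fusion: concatenate the final hidden state of every cue.
        self.classifier = nn.Linear(hidden_dim * len(cue_dims), num_classes)

    def forward(self, cues):
        # cues: dict mapping cue name -> tensor of shape (batch, time, feat_dim)
        finals = []
        for name, lstm in self.lstms.items():
            _, (h_n, _) = lstm(cues[name])   # h_n: (num_layers, batch, hidden_dim)
            finals.append(h_n[-1])           # last-layer hidden state per cue
        fused = torch.cat(finals, dim=-1)    # concatenation-based fusion
        return self.classifier(fused)        # gloss logits

# Example usage with random per-frame embeddings for three assumed cues.
model = MultiCueLSTM({"skeleton": 256, "hand": 512, "face": 512})
batch = {
    "skeleton": torch.randn(2, 60, 256),
    "hand": torch.randn(2, 60, 512),
    "face": torch.randn(2, 60, 512),
}
logits = model(batch)  # shape: (2, num_classes)
```

Because each cue keeps its own recurrent branch, the contribution of an individual articulator can be probed by ablating its branch, which is the kind of per-cue analysis the abstract alludes to.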
Pages: 15
Related papers
50 records in total
  • [1] Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition
    Zhou, Hao
    Zhou, Wengang
    Zhou, Yun
    Li, Houqiang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13009 - 13016
  • [2] Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation
    Zhou, Hao
    Zhou, Wengang
    Zhou, Yun
    Li, Houqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 768 - 779
  • [3] Asymmetric multi-branch GCN for skeleton-based sign language recognition
    Liu, Yuhong
    Lu, Fei
    Cheng, Xianpeng
    Yuan, Ying
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (30) : 75293 - 75319
  • [4] Language-guided temporal primitive modeling for skeleton-based action recognition
    Pan, Qingzhe
    Xie, Xuemei
    NEUROCOMPUTING, 2025, 613
  • [5] An effective skeleton-based approach for multilingual sign language recognition
    Renjith, S.
    Suresh, M. S. Sumi
    Rashmi, Manazhy
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 143
  • [6] SML: A Skeleton-based multi-feature learning method for sign language recognition
    Deng, Zhiwen
    Leng, Yuquan
    Hu, Jing
    Lin, Zengrong
    Li, Xuerui
    Gao, Qing
    KNOWLEDGE-BASED SYSTEMS, 2024, 301
  • [7] Skeleton-based Online Sign Language Recognition using Monotonic Attention
    Takayama, Natsuki
    Benitez-Garcia, Gibran
    Takahashi, Hiroki
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022, : 601 - 608
  • [8] Hand Graph Topology Selection for Skeleton-based Sign Language Recognition
    Ozdemir, Ogulcan
    Baytas, Inci M.
    Akarun, Lale
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, FG 2024, 2024,
  • [9] SKIM: Skeleton-Based Isolated Sign Language Recognition With Part Mixing
    Lin, Kezhou
    Wang, Xiaohan
    Zhu, Linchao
    Zhang, Bang
    Yang, Yi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4271 - 4280
  • [10] Multi-Grained Temporal Segmentation Attention Modeling for Skeleton-Based Action Recognition
    Lv, Jinrong
    Gong, Xun
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 927 - 931