MLMSign: Multi-lingual multi-modal illumination-invariant sign language recognition

Cited: 0
Authors
Sadeghzadeh, Arezoo [1 ]
Shah, A. F. M. Shahen [2 ]
Islam, Md Baharul [1 ,3 ]
Affiliations
[1] Bahcesehir Univ, Dept Comp Engn, TR-34349 Yildiz, Istanbul, Turkiye
[2] Yildiz Tech Univ, Dept Elect & Commun Engn, Istanbul, Turkiye
[3] Florida Gulf Coast Univ, Dept Comp & Software Engn, Ft Myers, FL 33965 USA
Source
关键词
Sign language recognition; Multi-lingual system; Hand-crafted features; Ensemble learning; Illumination-invariant system; CONVOLUTIONAL NEURAL-NETWORK; HAND POSTURE; GESTURE RECOGNITION;
DOI
10.1016/j.iswa.2024.200384
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sign language (SL) is a visual communication tool of great significance for deaf people, enabling them to interact with others and facilitating their daily lives. The wide variety of SLs and the scarcity of interpretation expertise necessitate automated sign language recognition (SLR) systems to narrow the communication gap between the deaf and hearing communities. Despite numerous advanced static SLR systems, few are practical enough for real-life scenarios when assessed simultaneously against several critical criteria: accuracy under high intra-class and slight inter-class variation, robustness, computational complexity, and generalization ability. To this end, we propose a novel multi-lingual multi-modal SLR system, MLMSign, that combines the strengths of hand-crafted features and deep learning models to improve performance and robustness against illumination changes while minimizing computational cost. The RGB sign images and 2D visualizations of their hand-crafted features, i.e., Histogram of Oriented Gradients (HOG) features and the a* channel of the L*a*b* color space, are employed as three input modalities to train a novel Convolutional Neural Network (CNN). The number of layers, filters, kernel size, learning rate, and optimization technique are carefully selected through an extensive parametric study to minimize computational cost without compromising accuracy. The system's performance and robustness are further enhanced by combining the models of the three modalities through ensemble learning, with each modality's contribution weighted by an impact coefficient determined via grid search. In addition to a comprehensive quantitative assessment, the capabilities of the proposed model and the effectiveness of ensembling over the three modalities are evaluated qualitatively using Grad-CAM visualization. Experimental results on test data with additional illumination changes verify the high robustness of our system under overexposed and underexposed lighting conditions. Achieving high accuracy (>99.33%) on six benchmark datasets (i.e., Massey, Static ASL, NUS II, TSL Fingerspelling, BdSL36v1, and PSL) demonstrates that our system notably outperforms recent state-of-the-art approaches with a minimal number of parameters and high generalization ability on complex datasets. Its promising performance on four different sign languages makes it feasible for multi-lingual applications.
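The abstract describes two concrete, reproducible steps: building the three input modalities (RGB image, a 2D HOG visualization, and the a* channel of the L*a*b* color space) and fusing the per-modality CNN predictions with impact coefficients found by grid search. The following is a minimal sketch of those two ideas, not the authors' implementation; the use of scikit-image, the function names, the HOG parameters, and the coefficient grid are illustrative assumptions.

```python
# Sketch of the modality construction and weighted ensemble described in the
# abstract. Assumes scikit-image and NumPy; all names and parameter values
# here are hypothetical, not taken from the paper.
import itertools
import numpy as np
from skimage.color import rgb2gray, rgb2lab
from skimage.feature import hog

def build_modalities(rgb_image):
    """Return the three per-sample inputs: RGB, HOG visualization, a* channel."""
    gray = rgb2gray(rgb_image)
    # visualize=True makes hog() also return a 2D rendering of the gradients
    _, hog_vis = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2), visualize=True)
    a_channel = rgb2lab(rgb_image)[..., 1]  # a* channel of L*a*b*
    return rgb_image, hog_vis, a_channel

def ensemble_predict(probs_rgb, probs_hog, probs_a, coeffs):
    """Weighted sum of the three models' class-probability matrices (N x C)."""
    w1, w2, w3 = coeffs
    fused = w1 * probs_rgb + w2 * probs_hog + w3 * probs_a
    return fused.argmax(axis=1)

def grid_search_coefficients(probs_rgb, probs_hog, probs_a, labels, step=0.1):
    """Pick the impact coefficients that maximize accuracy on held-out labels."""
    best_acc, best_coeffs = -1.0, (1.0, 1.0, 1.0)
    grid = np.arange(0.0, 1.0 + step, step)
    for w1, w2, w3 in itertools.product(grid, repeat=3):
        if w1 + w2 + w3 == 0:
            continue
        preds = ensemble_predict(probs_rgb, probs_hog, probs_a, (w1, w2, w3))
        acc = (preds == np.asarray(labels)).mean()
        if acc > best_acc:
            best_acc, best_coeffs = acc, (w1, w2, w3)
    return best_coeffs, best_acc
```

In this sketch each modality would feed its own copy of the CNN, and the grid search is run once on validation-set probabilities; the paper's actual architecture and search range may differ.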
Pages: 19
Related papers
50 records
  • [1] Multi-lingual and multi-modal speech processing and applications
    Ivanecky, J
    Fischer, J
    Mast, M
    Kunzmann, S
    Ross, T
    Fischer, V
    PATTERN RECOGNITION, PROCEEDINGS, 2005, 3663 : 149 - 159
  • [2] Weighted Multi-modal Sign Language Recognition
    Liu, Edmond
    Lim, Jong Yoon
    MacDonald, Bruce
    Ahn, Ho Seok
    2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024, 2024, : 880 - 885
  • [3] Large Scale Multi-Lingual Multi-Modal Summarization Dataset
    Verma, Yash
    Jangra, Anubhav
    Kumar, Raghvendra
    Saha, Sriparna
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 3620 - 3632
  • [4] Skeleton aware multi-modal sign language recognition
    Jiang, Songyao
    Sun, Bin
    Wang, Lichen
    Bai, Yue
    Li, Kunpeng
    Fu, Yun
IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, 2021, : 3408 - 3418
  • [5] Skeleton aware multi-modal sign language recognition
    Jiang, Songyao
    Sun, Bin
    Wang, Lichen
    Bai, Yue
    Li, Kunpeng
    Fu, Yun
    arXiv, 2021,
  • [6] Skeleton Aware Multi-modal Sign Language Recognition
    Jiang, Songyao
    Sun, Bin
    Wang, Lichen
    Bai, Yue
    Li, Kunpeng
    Fu, Yun
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3408 - 3418
  • [7] Multi-modal Sign Language Recognition with Enhanced Spatiotemporal Representation
    Xiao, Shiwei
    Fang, Yuchun
    Ni, Lan
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [8] The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
    Poignant, Johann
    Budnik, Mateusz
    Bredin, Herve
    Barras, Claude
    Stefas, Mickael
    Bruneau, Pierrick
    Adda, Gilles
    Besacier, Laurent
    Ekenel, Hazim
    Francopoulo, Gil
    Hernando, Javier
    Mariani, Joseph
    Morros, Ramon
    Quenot, Georges
    Rosset, Sophie
    Tamisier, Thomas
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 1421 - 1425
  • [9] MULTI-LINGUAL DEEP NEURAL NETWORKS FOR LANGUAGE RECOGNITION
    Marcos, Luis Murphy
    Richardson, Frederick
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 330 - 334
  • [10] Placing multi-modal, and multi-lingual Data in the Humanities Domain on the Map: The Mythotopia Geotagged Corpus
    Giouli, Voula
    Vacalopoulou, Anna
    Sidiropoulos, Nikolaos
    Flouda, Christina
    Doupas, Athanasios
    Giannopoulos, Giorgos
    Bikakis, Nikos
    Kaffes, Vassilis
    Stainhaouer, Gregory
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2856 - 2864