Unsupervised extraction of phonetic units in sign language videos for natural language processing

Cited by: 1
Authors
Martinez-Guevara, Niels [1 ]
Rojano-Caceres, Jose-Rafael [1 ]
Curiel, Arturo [2 ]
Affiliations
[1] Univ Veracruzana, Fac Estadist & Informat, Xalapa, Veracruz, Mexico
[2] Univ Veracruzana CONACyT, Xalapa, Veracruz, Mexico
Keywords
Sign language; Machine learning; Natural language processing; Image thresholding; Framework
DOI: 10.1007/s10209-022-00888-6
CLC classification: TP3 [Computing technology; computer technology]
Discipline code: 0812
Abstract
Sign languages (SL) are the natural languages used by Deaf communities to communicate with each other. Signers use visible parts of their bodies, like their hands, to convey messages without sound. Because of this modality change, SLs have to be represented differently in natural language processing (NLP) tasks: inputs are regularly presented as video data rather than text or sound, which makes even simple tasks computationally intensive. Moreover, the applicability of NLP techniques to SL processing is limited by their linguistic characteristics. For instance, current research in SL recognition has centered around lexical sign identification. However, SLs tend to exhibit lower vocabulary sizes than vocal languages, as signers codify part of their message through highly iconic signs that are not lexicalized. Thus, a lot of potentially relevant information is lost to most NLP algorithms. Furthermore, most documented SL corpora contain fewer than a hundred hours of video, far from enough to train most non-symbolic NLP approaches. This article proposes a method to achieve unsupervised identification of phonetic units in SL videos, based on image thresholding and the Liddell and Johnson Movement-Hold model [13]. The procedure strives to identify the smallest possible linguistic units that may carry relevant information, in an effort to preserve sub-lexical data that would otherwise go unnoticed by most NLP algorithms. Furthermore, the process enables the elimination of noisy or redundant video frames from the input, decreasing the overall computation costs. The algorithm was tested on a collection of Mexican Sign Language videos. The relevance of the extracted segments was assessed by human judges. Further comparisons were carried out against French Sign Language (LSF) resources, so as to explore how well the algorithm performs across different SLs.
The results show that the frames selected by the algorithm contained enough information to remain comprehensible to human signers. In some cases, as much as 80% of the available frames could be discarded without loss of comprehensibility, which may have direct repercussions on how SLs are represented, transmitted and processed electronically in the future.
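The frame-elimination idea described in the abstract — thresholding inter-frame differences so that runs of near-identical "hold" frames collapse to a single representative — can be sketched as follows. This is a minimal illustration of the general technique, not the authors' exact procedure; the function name, the greedy selection strategy, and the threshold value are all assumptions for demonstration, and frames are assumed to be grayscale NumPy arrays.

```python
import numpy as np

def select_informative_frames(frames, threshold=10.0):
    """Greedy keyframe selection by image thresholding.

    Keeps a frame only when its mean absolute pixel difference from the
    last kept frame exceeds `threshold`; runs of near-identical frames
    (the "holds" of the Movement-Hold model) collapse to one frame.
    """
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float)
                      - frames[kept[-1]].astype(float)).mean()
        if diff > threshold:
            kept.append(i)
    return kept

# Synthetic clip: five identical "hold" frames, then five of a new pose.
clip = [np.zeros((8, 8))] * 5 + [np.full((8, 8), 100.0)] * 5
print(select_informative_frames(clip))  # → [0, 5]
```

On this toy clip, 8 of 10 frames are discarded (80%), mirroring the compression rate reported for some of the tested videos; on real footage the threshold would need tuning per recording setup.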
Pages: 1143-1151
Page count: 9