Sign languages (SLs) are the natural languages used by Deaf communities. Signers use visible parts of their bodies, such as their hands, to convey messages without sound. Because of this difference in modality, SLs have to be represented differently in natural language processing (NLP) tasks: inputs are typically presented as video rather than text or sound, which makes even simple tasks computationally intensive. Moreover, the applicability of NLP techniques to SL processing is limited by the linguistic characteristics of SLs. For instance, current research in SL recognition has centered on lexical sign identification. However, SLs tend to exhibit smaller lexicalized vocabularies than vocal languages, as signers encode part of their message through highly iconic signs that are not lexicalized. Thus, a great deal of potentially relevant information is lost to most NLP algorithms. Furthermore, most documented SL corpora contain fewer than one hundred hours of video; far from enough to train most non-symbolic NLP approaches. This article proposes a method for the unsupervised identification of phonetic units in SL videos, based on image thresholding guided by the Liddell and Johnson Movement-Hold Model [13]. The procedure strives to identify the smallest linguistic units that may carry relevant information, in an effort to preserve sub-lexical data that would otherwise be missed by most NLP algorithms. Furthermore, the process enables the elimination of noisy or redundant video frames from the input, decreasing overall computation costs. The algorithm was tested on a collection of Mexican Sign Language videos, and the relevance of the extracted segments was assessed by human judges. Further comparisons were carried out against French Sign Language (LSF) resources, so as to explore how well the algorithm performs across different SLs.
The results show that the frames selected by the algorithm contained enough information to remain comprehensible to human signers. In some cases, as much as 80% of the available frames could be discarded without loss of comprehensibility, which may have direct repercussions on how SLs are represented, transmitted and processed electronically in the future.
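To make the idea concrete, the kind of segmentation the abstract describes can be sketched as follows. This is a hypothetical illustration, not the paper's actual algorithm: the function names, the grayscale inter-frame difference criterion, and the threshold value are all assumptions. Under the Movement-Hold intuition, low-motion spans ("holds") carry the stable postural information, so one representative frame per hold can stand in for the segment while movement frames are discarded.

```python
import numpy as np

def segment_holds(frames, threshold=5.0):
    """Label each frame as 'hold' (low motion) or 'movement' (high motion)
    by thresholding the mean absolute inter-frame pixel difference.

    frames: ndarray of shape (T, H, W), grayscale video.
    Returns a list of (start, end, label) segments, end exclusive.
    """
    # Mean absolute difference between consecutive frames (length T-1).
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0)).mean(axis=(1, 2))
    # The first frame has no predecessor; treat it as a hold by convention.
    labels = ["hold"] + ["hold" if d < threshold else "movement" for d in diffs]
    # Collapse consecutive identical labels into segments.
    segments, start = [], 0
    for t in range(1, len(labels)):
        if labels[t] != labels[start]:
            segments.append((start, t, labels[start]))
            start = t
    segments.append((start, len(labels), labels[start]))
    return segments

def keep_representatives(frames, segments):
    """Keep a single middle frame per hold segment; drop movement frames."""
    return [frames[(s + e) // 2] for s, e, lab in segments if lab == "hold"]
```

On a video dominated by long holds, this keeps one frame per hold, which is consistent with the abstract's observation that a large fraction of frames can be discarded without losing comprehensibility.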