Towards Better Text Processing Tools for the Ainu Language

Cited by: 0
Authors
Nowakowski, Karol [1]
Ptaszynski, Michal [1]
Masui, Fumito [1]
Affiliations
[1] Kitami Inst Technol, Dept Comp Sci, 165 Koen Cho, Kitami, Hokkaido 0908507, Japan
Keywords
Ainu language; Endangered languages; Under-resourced languages; Transcription normalization; Word segmentation; Tokenization; Part-of-speech tagging
DOI
10.1007/978-3-030-66527-2_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present our research devoted to the development of Natural Language Processing technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of northern parts of the Japanese archipelago. In particular, we focused on improving the existing tools for transcription normalization, word segmentation (tokenization) and part-of-speech tagging. In the experiments we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed the positive effect of these modifications on the overall performance of the tools, especially with objective samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to the problem of word segmentation and using a state-of-the-art tool for training part-of-speech taggers.
Pages: 131-145
Page count: 15