Towards Better Text Processing Tools for the Ainu Language

Authors
Nowakowski, Karol [1]
Ptaszynski, Michal [1]
Masui, Fumito [1]
Affiliation
[1] Kitami Inst Technol, Dept Comp Sci, 165 Koen Cho, Kitami, Hokkaido 0908507, Japan
Keywords
Ainu language; Endangered languages; Under-resourced languages; Transcription normalization; Word segmentation; Tokenization; Part-of-speech tagging
DOI
10.1007/978-3-030-66527-2_10
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present our research devoted to the development of Natural Language Processing technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of northern parts of the Japanese archipelago. In particular, we focused on improving the existing tools for transcription normalization, word segmentation (tokenization) and part-of-speech tagging. In the experiments we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed the positive effect of these modifications on the overall performance of the tools, especially with objective samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to the problem of word segmentation and using a state-of-the-art tool for training part-of-speech taggers.
Pages: 131-145
Page count: 15