Towards Better Text Processing Tools for the Ainu Language

Times cited: 0
Authors
Nowakowski, Karol [1]
Ptaszynski, Michal [1]
Masui, Fumito [1]
Affiliation
[1] Kitami Institute of Technology, Department of Computer Science, 165 Koen-cho, Kitami, Hokkaido 090-8507, Japan
Keywords
Ainu language; Endangered languages; Under-resourced languages; Transcription normalization; Word segmentation; Tokenization; Part-of-speech tagging
DOI
10.1007/978-3-030-66527-2_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present our research on developing Natural Language Processing (NLP) technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the indigenous inhabitants of the northern parts of the Japanese archipelago. In particular, we focus on improving existing tools for transcription normalization, word segmentation (tokenization), and part-of-speech tagging. In our experiments, we used two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed that these modifications improve the overall performance of the tools, especially on objective samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to word segmentation and by using a state-of-the-art tool for training part-of-speech taggers.
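The abstract mentions applying corpus-driven language models to word segmentation. As a generic illustration only (not the authors' implementation), the sketch below shows one common form of that idea: dictionary-backed segmentation under a unigram language model, solved by dynamic programming. The word-frequency table, the unknown-word penalty, and the `max_len` parameter are hypothetical assumptions chosen for the example.

```python
import math
from typing import Dict, List


def segment(text: str, counts: Dict[str, int], max_len: int = 12) -> List[str]:
    """Split `text` into its most probable word sequence under a unigram model.

    `counts` is a hypothetical word-frequency table (e.g. built from a
    dictionary or corpus). `max_len` is an assumed cap on word length.
    """
    total = sum(counts.values())

    def log_prob(word: str) -> float:
        if word in counts:
            return math.log(counts[word] / total)
        # Assumed smoothing: unseen strings get a small probability that
        # shrinks with length, so every input has at least one segmentation.
        return math.log(1.0 / (total * 10 ** len(word)))

    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + log_prob(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

For example, with the hypothetical counts {"kamuy": 50, "yukar": 20, "ka": 5, "muy": 1}, `segment("kamuyyukar", counts)` prefers ["kamuy", "yukar"] over ["ka", "muy", "yukar"], because the two-word path has the higher total log-probability. A real system would estimate the counts from a corpus and use a better-founded model for unknown words.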
Pages: 131-145
Number of pages: 15
Related papers (50 in total)
  • [21] Materials for the Study of the Ainu Language and Folklore
    Hestermann, P. F.
    ANTHROPOS, 1914, 9 (3-4) : 696 - 697
  • [22] Promises of text processing: natural language processing meets AI
    Chang, JT
    Altman, RB
    DRUG DISCOVERY TODAY, 2002, 7 (19) : 992 - 993
  • [23] Towards the Development of Language Analysis Tools for the Written Latgalian Language
    Deksne, Daiga
    Vulane, Anna
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE (HLT 2020), 2020, 328 : 142 - 149
  • [24] TextFlow: Towards Better Understanding of Evolving Topics in Text
    Cui, Weiwei
    Liu, Shixia
    Tan, Li
    Shi, Conglei
    Song, Yangqiu
    Gao, Zekai J.
    Tong, Xin
    Qu, Huamin
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2011, 17 (12) : 2412 - 2421
  • [25] Towards Better Hierarchical Text Classification with Data Generation
    Wang, Yue
    Qiao, Dan
    Li, Juntao
    Chang, Jinxiong
    Zhang, Qishen
    Liu, Zhongyi
    Zhang, Guannan
    Zhang, Min
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7722 - 7739
  • [26] Natural language processing for Nepali text: a review
    Shahi, Tej Bahadur
    Sitaula, Chiranjibi
    ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 : 3401 - 3429
  • [27] SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models
    Yim, Moonbin
    Kim, Yoonsik
    Cho, Han-Cheol
    Park, Sungrae
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 109 - 124
  • [28] Natural-Language Text-Processing
    Briner, LL
    Sager, N
    Kittredge, R
    Petrick, SR
    Borko, H
    PROCEEDINGS OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1981, 18 : 198 - 198
  • [29] Automatic Text Summarization in Natural Language Processing
    Desai, M. R.
    Gachhinakatti, Bhagyashree
    Balaganur, Pooja
    Rajeshwari, Y.
    Rathod, Laxmi
    2021 IEEE INTERNATIONAL CONFERENCE ON MOBILE NETWORKS AND WIRELESS COMMUNICATIONS (ICMNWC), 2021,
  • [30] Urdu text translation with Natural Language Processing
    Shaikh, MK
    Khowaja, HHA
    Khan, MA
    SCONEST 2004: STUDENT CONFERENCE ON ENGINEERING SCIENCES AND TECHNOLOGY, 2002, : 81 - 85