Towards Better Text Processing Tools for the Ainu Language

Cited by: 0
Authors
Nowakowski, Karol [1]
Ptaszynski, Michal [1]
Masui, Fumito [1]
Affiliations
[1] Kitami Inst Technol, Dept Comp Sci, 165 Koen Cho, Kitami, Hokkaido 0908507, Japan
Keywords
Ainu language; Endangered languages; Under-resourced languages; Transcription normalization; Word segmentation; Tokenization; Part-of-speech tagging
DOI
10.1007/978-3-030-66527-2_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present our research devoted to the development of Natural Language Processing technologies for the Ainu language, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of northern parts of the Japanese archipelago. In particular, we focused on improving the existing tools for transcription normalization, word segmentation (tokenization) and part-of-speech tagging. In the experiments we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed the positive effect of these modifications on the overall performance of the tools, especially with objective samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to the problem of word segmentation and using a state-of-the-art tool for training part-of-speech taggers.
Pages: 131-145
Number of pages: 15
Related papers
50 items in total
  • [31] Natural language processing for Nepali text: a review
    Shahi, Tej Bahadur
    Sitaula, Chiranjibi
    ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (04) : 3401 - 3429
  • [32] Text mining and natural language processing in construction
    Shamshiri, Alireza
    Ryu, Kyeong Rok
    Park, June Young
    AUTOMATION IN CONSTRUCTION, 2024, 158
  • [33] A NATURAL LANGUAGE PROGRAMMING SYSTEM FOR TEXT PROCESSING
    BARNETT, MP
    RUHSAM, WM
    IEEE TRANSACTIONS ON ENGINEERING WRITING AND SPEECH, 1968, EW11 (02): : 45 - &
  • [34] A Signal Processing Method for Text Language Identification
    Hassanpour, H.
    AlyanNezhadi, M. M.
    Mohammadi, M.
    INTERNATIONAL JOURNAL OF ENGINEERING, 2021, 34 (06): : 1413 - 1418
  • [35] Sign language processing and interactive tools for sign language education
    Aran, Oya
    Akarun, Lale
    2007 IEEE 15TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1-3, 2007, : 83 - 86
  • [36] PermA and Balloon: Tools for string alignment and text processing
    Reichel, Uwe D.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1872 - 1875
  • [37] Assessing Emoji Use in Modern Text Processing Tools
    Shoeb, Abu Awal Md
    de Melo, Gerard
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1379 - 1388
  • [38] Towards Better Language Modeling for Thai LVCSR
    Jongtaveesataporn, Markpong
    Thienlikit, Issara
    Wutiwiwatchai, Chai
    Furui, Sadaoki
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 317 - +
  • [39] Comparison Among Four Prominent Text Processing Tools
    Luo, Jin
    Wang, Ruoyu
    Sun, Daniel
    Wang, Yingying
    Li, Guoqiang
    2018 15TH INTERNATIONAL SYMPOSIUM ON PERVASIVE SYSTEMS, ALGORITHMS AND NETWORKS (I-SPAN 2018), 2018, : 325 - 330
  • [40] Analysis of text intelligibility using Natural Language Processing tools: adapting Coh-Metrix metrics to Portuguese
    Scarton, Carolina Evaristo
    Aluisio, Sandra Maria
    LINGUAMATICA, 2010, 2 (01): : 45 - 61