A Comparative Study of Extremely Low-Resource Transliteration of the World's Languages

Cited by: 0
Authors
Wu, Winston [1]
Yarowsky, David [1]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Dept Comp Sci, Baltimore, MD 21218 USA
Keywords
Bible; alignment; named entities; translation; transliteration
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Transliteration from low-resource languages is difficult, in large part due to the small amounts of data available for training transliteration systems. In this paper, we evaluate the effectiveness of several translation methods in the task of transliterating around 1000 Bible names from 591 languages into English. In this extremely low-resource task, we found that a phrase-based MT system performs much better than other methods, including a g2p system and a neural MT system. However, by combining the data and training a single neural system, we discovered significant gains over single-language systems. We release the output from each system for comparative analysis.
Pages: 938-943
Page count: 6