A Comparative Study of Extremely Low-Resource Transliteration of the World's Languages

Cited by: 0
Authors
Wu, Winston [1]
Yarowsky, David [1]
Affiliation
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Dept Comp Sci, Baltimore, MD 21218 USA
Keywords
Bible; alignment; named entities; translation; transliteration
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Transliteration from low-resource languages is difficult, in large part due to the small amounts of data available for training transliteration systems. In this paper, we evaluate the effectiveness of several translation methods in the task of transliterating around 1000 Bible names from 591 languages into English. In this extremely low-resource task, we found that a phrase-based MT system performs much better than other methods, including a g2p system and a neural MT system. However, by combining the data and training a single neural system, we discovered significant gains over single-language systems. We release the output from each system for comparative analysis.
Pages: 938-943
Page count: 6
Related papers
50 in total
  • [31] Special Issue: NLP in Low-Resource Languages Preface
    Soboroff, Ian
    Tong, Audrey
    MACHINE TRANSLATION, 2018, 32 (1-2) : 1 - 2
  • [32] Speech recognition datasets for low-resource Congolese languages
    Kimanuka, Ussen
    Maina, Ciira wa
    Buyuk, Osman
    DATA IN BRIEF, 2024, 52
  • [33] Multilingual Offensive Language Identification for Low-resource Languages
    Ranasinghe, Tharindu
    Zampieri, Marcos
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (01)
  • [34] Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages
    Eskander, Ramy
    Klavans, Judith L.
    Muresan, Smaranda
    16TH SIGMORPHON WORKSHOP ON COMPUTATIONAL RESEARCH IN PHONETICS PHONOLOGY, AND MORPHOLOGY (SIGMORPHON 2019), 2019, : 189 - 195
  • [35] Knowledge Transfer for Utterance Classification in Low-Resource Languages
    Smirnov, Andrei
    Mendelev, Valentin
    SPEECH AND COMPUTER, 2016, 9811 : 435 - 442
  • [36] Neural Machine Translation for Low-resource Languages: A Survey
    Ranathunga, Surangika
    Lee, En-Shiun Annie
    Skenduli, Marjana Prifti
    Shekhar, Ravi
    Alam, Mehreen
    Kaur, Rishemjit
    ACM COMPUTING SURVEYS, 2023, 55 (11)
  • [37] Loanword Identification in Low-Resource Languages with Minimal Supervision
    Mi, Chenggang
    Xie, Lei
    Zhang, Yanning
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (03)
  • [38] Efficient Entity Candidate Generation for Low-Resource Languages
    Garcia-Duran, Alberto
    Arora, Akhil
    West, Robert
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6429 - 6438
  • [39] Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT
    Triet Huynh Minh Le
    Babar, M. Ali
    Tung Hoang Thai
    PROCEEDINGS OF 2024 28TH INTERNATIONAL CONFERENCE ON EVALUATION AND ASSESSMENT IN SOFTWARE ENGINEERING, EASE 2024, 2024, : 679 - 685
  • [40] Low-Resource NMT: A Case Study on the Written and Spoken Languages in Hong Kong
    Mak, Hei Yi
    Lee, Tan
    2021 5TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2021, 2021, : 81 - 87