A Comparative Study of Extremely Low-Resource Transliteration of the World's Languages

Authors
Wu, Winston [1]
Yarowsky, David [1]
Affiliation
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Dept Comp Sci, Baltimore, MD 21218 USA
Keywords
Bible; alignment; named entities; translation; transliteration
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Transliteration from low-resource languages is difficult, largely because so little data is available for training transliteration systems. In this paper, we evaluate the effectiveness of several translation methods on the task of transliterating about 1,000 Bible names from 591 languages into English. In this extremely low-resource setting, we found that a phrase-based MT system performs much better than the other methods, including a g2p system and a neural MT system. However, by combining the data from all languages and training a single neural system, we found significant gains over single-language systems. We release the output of each system for comparative analysis.
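The abstract does not say how the per-language data were combined for the single neural system. Purely as an illustration, one common way to pool multilingual transliteration data is to prefix each character-tokenized source name with a language tag so that one model can train on every language at once. The sketch below assumes that scheme; the function name `pool_training_data`, the `<lang>` tag format, the `_` space marker and the example names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def pool_training_data(
    per_language_pairs: Dict[str, List[Tuple[str, str]]]
) -> List[Tuple[str, str]]:
    """Merge per-language (source_name, english_name) pairs into one corpus.

    Hypothetical sketch: each source side is prefixed with a language tag
    token such as "<fra>" so a single shared model can be trained on all
    languages. This is an assumed scheme, not the paper's documented method.
    """
    pooled: List[Tuple[str, str]] = []
    # Iterate languages in sorted order so the output is deterministic.
    for lang, pairs in sorted(per_language_pairs.items()):
        tag = f"<{lang}>"
        for src, tgt in pairs:
            # Replace spaces with "_" so multi-word names stay unambiguous
            # after splitting into space-separated character tokens.
            src_tokens = [tag] + list(src.replace(" ", "_"))
            tgt_tokens = list(tgt.replace(" ", "_"))
            pooled.append((" ".join(src_tokens), " ".join(tgt_tokens)))
    return pooled


if __name__ == "__main__":
    # Illustrative toy input, not data from the study.
    corpus = pool_training_data({
        "deu": [("Jesus", "Jesus")],
        "fra": [("Jésus", "Jesus")],
    })
    for src, tgt in corpus:
        print(src, "->", tgt)
```

The tag lets a shared model condition on the source language, so orthographically similar names from different languages can still be mapped differently; under this assumed setup, a single-language system would simply be trained on one language's pairs without the tag.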
Pages: 938-943
Number of pages: 6