Creating Large-Scale Multilingual Cognate Tables

Cited by: 0
Authors
Wu, Winston [1]
Yarowsky, David [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
cognates; clustering; transliteration
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Low-resource languages often lack high-coverage lexical resources. In this paper, we propose a method to generate cognate tables by clustering words from existing lexical resources. We then apply character-based machine translation methods to the task of cognate chain completion, inducing missing word translations from lower-coverage dictionaries to fill gaps in the cognate chain. On the Romance and Turkic language families, a simple but novel multi-language system combination improves over single-language-pair baselines. For the Romance family, we show that system combination using the results of clustering outperforms weights derived from the historical-linguistic scholarship on language phylogenies. Our approach is applicable to any language family and has not been previously performed at such scale. The cognate tables are released to the research community.
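The abstract does not say which similarity measure or clustering algorithm the authors use, so the following is only a minimal illustrative sketch of the general idea of grouping words from several languages into candidate cognate sets. It assumes normalized Levenshtein distance as the similarity measure, single-linkage merging, and a rule that two words from the same language are never linked directly. The function names `edit_distance` and `cluster_cognates`, the `threshold` value, and the sample word list are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def cluster_cognates(entries, threshold=0.5):
    """Group (language, word) pairs into candidate cognate sets.

    Assumption (not from the paper): two words are linked when they come
    from different languages and their length-normalized edit distance is
    below `threshold`; clusters are the connected components of those links.
    """
    parent = list(range(len(entries)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        (lang_i, word_i), (lang_j, word_j) = entries[i], entries[j]
        if lang_i == lang_j:
            continue  # never link two words of the same language directly
        longest = max(len(word_i), len(word_j), 1)
        if edit_distance(word_i, word_j) / longest < threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, entry in enumerate(entries):
        clusters.setdefault(find(i), []).append(entry)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical toy input: Spanish, Italian, and Portuguese words.
    sample = [
        ("es", "noche"), ("it", "notte"), ("pt", "noite"),
        ("es", "leche"), ("it", "latte"), ("pt", "leite"),
    ]
    for cluster in cluster_cognates(sample):
        print(cluster)
```

Each resulting cluster corresponds to one row of a cognate table, with at most a few entries per language. A cell that stays empty for some language is the kind of gap that the cognate chain completion step described in the abstract would try to fill.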
Pages: 3411-3418
Number of pages: 8