Creating Large-Scale Multilingual Cognate Tables

Authors
Wu, Winston [1]
Yarowsky, David [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
cognates; clustering; transliteration
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Low-resource languages often suffer from a lack of high-coverage lexical resources. In this paper, we propose a method to generate cognate tables by clustering words from existing lexical resources. We then apply character-based machine translation methods to the task of cognate chain completion, inducing missing word translations from lower-coverage dictionaries to fill gaps in the cognate chain. On the Romance and Turkic language families, a simple but novel multi-language system combination improves over single-language-pair baselines. For the Romance family, we show that system combination using the results of clustering outperforms combination using weights derived from historical-linguistic scholarship on language phylogenies. Our approach is applicable to any language family and has not previously been performed at such scale. The cognate tables are released to the research community.
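The abstract does not say which similarity measure or clustering algorithm is used to build the cognate tables. As a rough illustration of the general idea only, the sketch below groups translations of one concept into candidate cognate sets using length-normalized Levenshtein distance and single-linkage clustering. The distance measure, the `threshold` value, the function names, and the toy word list for "night" are all assumptions made for this example; none of them is taken from the paper.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_cognates(entries, threshold=0.5):
    """Single-linkage clustering of (language, word) pairs.

    Two entries end up in the same cluster if they are connected by a chain
    of pairs whose normalized distance is at most `threshold`.
    The threshold is a hypothetical value chosen for this toy example.
    """
    parent = list(range(len(entries)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        if normalized_distance(entries[i][1], entries[j][1]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, entry in enumerate(entries):
        clusters.setdefault(find(i), []).append(entry)
    return list(clusters.values())


# Toy data: words for "night" in four Romance languages and Turkish.
night = [("es", "noche"), ("it", "notte"), ("pt", "noite"),
         ("fr", "nuit"), ("tr", "gece")]

for cluster in cluster_cognates(night):
    print(cluster)
```

On this toy input the four Romance forms fall into one cluster and the Turkish form stays separate. A realistic system would likely need a more informed similarity measure than raw edit distance, since regular sound changes can make true cognates look dissimilar on the surface.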
Pages: 3411-3418
Page count: 8