Creating Large-Scale Multilingual Cognate Tables

Authors
Wu, Winston [1]
Yarowsky, David [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
cognates; clustering; transliteration
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Low-resource languages often lack high-coverage lexical resources. In this paper, we propose a method to generate cognate tables by clustering words from existing lexical resources. We then apply character-based machine translation methods to the task of cognate chain completion, inducing missing word translations from lower-coverage dictionaries to fill gaps in the cognate chain. On the Romance and Turkic language families, a simple but novel multi-language system combination improves over single-language-pair baselines. For the Romance family, we show that system combination using the results of clustering outperforms combination using weights derived from historical-linguistic scholarship on language phylogenies. Our approach is applicable to any language family and has not previously been performed at such scale. The cognate tables are released to the research community.
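The abstract does not say which clustering algorithm or similarity measure the authors use. As a rough illustration of the general idea of grouping dictionary entries into cognate clusters, here is a minimal sketch that assumes single-linkage clustering over length-normalized edit distance. The function names, the 0.5 threshold, and the toy word list are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (0.0 for two empty strings)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_cognates(entries, threshold=0.5):
    """Group (language, word) pairs by single-linkage clustering.

    Assumption: two words from different languages are linked when their
    normalized edit distance is at most `threshold`. Links are merged with
    a small union-find structure.
    """
    parent = list(range(len(entries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        (lang_i, word_i), (lang_j, word_j) = entries[i], entries[j]
        if lang_i != lang_j and normalized_distance(word_i, word_j) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, entry in enumerate(entries):
        clusters.setdefault(find(i), []).append(entry)
    return list(clusters.values())


# Toy Romance-language entries (illustrative only, not from the released tables).
entries = [
    ("es", "noche"), ("it", "notte"), ("pt", "noite"),
    ("es", "leche"), ("it", "latte"), ("pt", "leite"),
]
for cluster in cluster_cognates(entries):
    print(cluster)
```

On this toy input the "night" words and the "milk" words fall into two separate clusters. A cluster missing a language would correspond to a gap in the cognate chain, which the abstract says is filled by character-based translation; that step is not sketched here.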
Pages: 3411-3418
Page count: 8