Towards Producing Bilingual Lexica from Monolingual Corpora

Authors
Han, Jingyi [1 ]
Bel, Nuria [1 ]
Affiliations
[1] Univ Pompeu Fabra, Roc Boronat 138, Barcelona 08018, Spain
Keywords
automatic bilingual lexicon production; lexical resources; bilingual dictionaries
DOI
Not available
Chinese Library Classification
H [Language, Linguistics]
Subject classification code
05
Abstract
Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learning bilingual lexica by training a supervised classifier on word-embedding-based vectors of only a few hundred translation-equivalent word pairs. The word embedding representations of the translation pairs were obtained from source and target monolingual corpora, which are not necessarily related. Our classifier is able to predict whether or not a new word pair is in a translation relation. We tested it on two quite distinct language pairs, Chinese-Spanish and English-Spanish. The classifiers achieved more than 0.90 precision and recall for both language pairs in different evaluation scenarios. These results show a high potential for this method to be used in bilingual lexicon production for language pairs with limited parallel or comparable corpora, in particular for phrase table expansion in Statistical Machine Translation systems.
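The idea in the abstract, a supervised classifier over embedding-based vectors of word pairs, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 2-dimensional toy vectors, the outer-product feature map in `pair_features`, the choice of logistic regression, and the way negative examples are built from mismatched seed pairs. The abstract does not say how the two word vectors are combined or which classifier is used.

```python
import math

# Toy 2-d "embeddings" (hypothetical values). The Spanish space is the English
# space rotated by 90 degrees, mimicking two independently trained spaces.
SRC = {"dog": (1.0, 0.0), "house": (0.0, 1.0), "water": (-1.0, 0.0), "book": (0.0, -1.0)}
TGT = {"perro": (0.0, 1.0), "casa": (-1.0, 0.0), "agua": (0.0, -1.0), "libro": (1.0, 0.0)}

def pair_features(src_word, tgt_word):
    """Flattened outer product of the two word vectors (an assumed feature map)."""
    s, t = SRC[src_word], TGT[tgt_word]
    return [si * tj for si in s for tj in t]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, lr=1.0):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    dim = len(examples[0][0])
    n = len(examples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in examples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def is_translation(model, src_word, tgt_word):
    """Predict whether the pair is in a translation relation (probability > 0.5)."""
    w, b = model
    x = pair_features(src_word, tgt_word)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5

# Seed lexicon: known translations are positives, mismatched pairs are negatives.
SEED = [("dog", "perro"), ("house", "casa"), ("water", "agua")]
train_set = [(pair_features(s, t), 1.0 if (s, t) in SEED else 0.0)
             for s, _ in SEED for _, t in SEED]
MODEL = train(train_set)
```

Calling `is_translation(MODEL, "book", "libro")` queries a pair whose words never appear in the seed lexicon, which is the scenario the abstract describes for new word pairs. In a realistic setting the vectors would come from embeddings trained separately on each monolingual corpus, and the seed lexicon would hold a few hundred pairs rather than three.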
Pages: 2222-2227
Page count: 6