Towards Producing Bilingual Lexica from Monolingual Corpora

Times cited: 0
Authors
Han, Jingyi [1]
Bel, Nuria [1]
Affiliation
[1] Univ Pompeu Fabra, Roc Boronat 138, Barcelona 08018, Spain
Keywords
automatic bilingual lexicon production; lexical resources; bilingual dictionaries
DOI
Not available
Chinese Library Classification number
H [Languages and Scripts]
Subject classification code
05
Abstract
Bilingual lexica are the basis for many cross-lingual natural language processing tasks. Recent work has shown success in learning bilingual dictionaries by taking advantage of comparable corpora and a diverse set of signals derived from monolingual corpora. In the present work, we describe an approach to automatically learning bilingual lexica by training a supervised classifier on word embedding-based vectors of only a few hundred translation-equivalent word pairs. The word embedding representations of the translation pairs were obtained from source and target monolingual corpora that are not necessarily related. Our classifier predicts whether a new word pair is in a translation relation or not. We tested it on two quite distinct language pairs, Chinese-Spanish and English-Spanish. The classifiers achieved more than 0.90 precision and recall for both language pairs in different evaluation scenarios. These results show high potential for using this method to produce bilingual lexica for language pairs with limited parallel or comparable corpora, in particular for phrase table expansion in Statistical Machine Translation systems.
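The abstract does not say how a word pair is turned into a single vector or which classifier is used, so the following is only a hypothetical sketch of the general idea. It assumes each candidate pair is represented by the flattened outer product of its source and target embedding vectors, so that a linear model scores the pair as a bilinear form s^T W t across two independently trained embedding spaces. It substitutes a plain perceptron for whatever classifier the authors used. The toy vocabularies, the 3-dimensional vectors and all function names (`pair_features`, `train_perceptron`, `is_translation`) are invented for illustration.

```python
from itertools import product

# Hypothetical toy embeddings. In practice each side would come from word
# vectors trained separately on unrelated source and target monolingual corpora,
# so the two spaces do not share coordinates.
SRC = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "house": [0.0, 0.0, 1.0]}
TGT = {"gato": [0.0, 1.0, 0.0], "perro": [0.0, 0.0, 1.0], "casa": [1.0, 0.0, 0.0]}
# Seed dictionary of known translation equivalents (the paper uses a few hundred).
SEED = {("cat", "gato"), ("dog", "perro"), ("house", "casa")}


def pair_features(s, t):
    """Assumed pair representation: the flattened outer product of s and t,
    so a linear model over it computes a bilinear score s^T W t."""
    return [si * tj for si in s for tj in t]


def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def build_training_set():
    """Seed pairs are positives (+1); every other source/target pairing is a negative (-1)."""
    return [
        (pair_features(SRC[s], TGT[t]), 1 if (s, t) in SEED else -1)
        for s, t in product(SRC, TGT)
    ]


def train_perceptron(examples, dim, epochs=20):
    """Classic perceptron: update on every example the current model gets wrong."""
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if y * score(weights, bias, x) <= 0:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # training data separated; stop early
            break
    return weights, bias


def is_translation(weights, bias, s_vec, t_vec):
    """Predict whether (s_vec, t_vec) is a translation pair."""
    return score(weights, bias, pair_features(s_vec, t_vec)) > 0


weights, bias = train_perceptron(build_training_set(), dim=9)
bird = [0.8, 0.2, 0.0]  # hypothetical unseen source word, not in the seed dictionary
pajaro = [0.0, 0.8, 0.2]  # hypothetical unseen target word
print(is_translation(weights, bias, bird, pajaro))  # True
print(is_translation(weights, bias, bird, TGT["casa"]))  # False
```

The outer product is used here instead of simply concatenating the two vectors. A linear model on a concatenation [s; t] scores each word independently and so cannot model the interaction between the two words that makes a pairing correct or incorrect.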
Pages: 2222-2227
Number of pages: 6