Bilingual Lexicon Extraction using Locally Weighted Linear Regression from Comparable Corpora

Cited by: 0
Authors
Zhang, Chunyue [1]
Zhao, Tiejun [1]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
Keywords
bilingual lexicon extraction; word embedding; transformation matrix; locally weighted linear regression
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, a simple linear transformation between word embeddings has been found to be highly effective for extracting a bilingual lexicon from comparable corpora. However, transforming all words with a single transformation matrix tends to underfit. This paper proposes a simple non-parametric solution using locally weighted linear regression (LWR), in which words in the training lexicon that are closer to the target word carry more weight in the regression objective. The experimental results confirm that the proposed solution achieves a 36.7% relative improvement at Top-1 over the baseline approach on the English-to-Chinese bilingual lexicon extraction task.
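The abstract does not specify the kernel, the distance measure, or the solver, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a Gaussian kernel over Euclidean distance in the source embedding space and a closed-form weighted least-squares solve with a small ridge term for stability. The names `lwr_map`, `top_k`, `tau`, and `ridge` are hypothetical.

```python
import numpy as np


def lwr_map(x, X, Z, tau=1.0, ridge=1e-6):
    """Map one source embedding x into the target space with a locally fitted matrix.

    X: (n, d_src) source embeddings of the training lexicon.
    Z: (n, d_tgt) target embeddings of their translations.
    tau: kernel bandwidth (assumed Gaussian kernel; not given in the abstract).
    ridge: small regularizer added for numerical stability (an assumption).
    """
    # Weight each training pair by how close its source word is to x.
    sq_dist = np.sum((X - x) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * tau ** 2))

    # Weighted least squares: W = (X^T D X + ridge * I)^-1 X^T D Z, with D = diag(w).
    XtD = X.T * w  # broadcasting scales column i of X^T by w[i]
    A = XtD @ X + ridge * np.eye(X.shape[1])
    B = XtD @ Z
    W = np.linalg.solve(A, B)
    return x @ W


def top_k(z_hat, targets, k=1):
    """Return indices of the k target embeddings most cosine-similar to z_hat."""
    norms = np.linalg.norm(targets, axis=1) * np.linalg.norm(z_hat)
    sims = (targets @ z_hat) / np.maximum(norms, 1e-12)
    return np.argsort(-sims)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))   # toy source embeddings
    Z = np.tanh(X @ rng.normal(size=(8, 6)))  # toy non-linear "translations"
    query = X[0]
    print(top_k(lwr_map(query, X, Z, tau=1.5), Z, k=5))
```

Unlike the single-matrix baseline, this sketch fits a separate matrix for every query word, so each lookup costs one weighted solve. A small `tau` concentrates the fit on the nearest lexicon entries, while a very large `tau` makes all weights nearly equal and approaches the global linear transformation.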
Pages: 13-16
Page count: 4