Bilingual Lexicon Extraction using Locally Weighted Linear Regression from Comparable Corpora

Cited by: 0
Authors
Zhang, Chunyue [1]
Zhao, Tiejun [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Keywords
bilingual lexicon extraction; word embedding; transformation matrix; locally weighted linear regression
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, a simple linear transformation of word embeddings has been found to be highly effective for extracting a bilingual lexicon from comparable corpora. However, using a single transformation matrix to map all words is prone to underfitting. This paper proposes a simple non-parametric solution using locally weighted linear regression (LWR), which gives the training-lexicon words that are closer to the target word more weight in the regression objective. Experimental results show that the proposed solution achieves a 36.7% relative improvement at Top-1 over the baseline approach on the English-to-Chinese bilingual lexicon extraction task.
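The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `tau`, the small `ridge` term, the cosine-similarity ranking, and the function names `lwr_map` and `top_k_translations` are all assumptions made for illustration. For each query word, a separate transformation matrix is fitted by weighted least squares, where seed-lexicon pairs whose source embedding lies closer to the query receive larger weights.

```python
import numpy as np

def lwr_map(x_query, X_src, Y_tgt, tau=1.0, ridge=1e-6):
    """Map one source-language embedding into the target space with LWR.

    X_src: (n, d_src) source embeddings of the seed-lexicon entries.
    Y_tgt: (n, d_tgt) embeddings of their translations.
    tau and ridge are illustrative hyperparameters, not values from the paper.
    """
    # Gaussian kernel weights (an assumed choice): closer seed words count more.
    sq_dist = np.sum((X_src - x_query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * tau ** 2))
    # Weighted least squares: minimise sum_i w_i * ||x_i W - y_i||^2.
    XtW = X_src.T * w                                  # shape (d_src, n)
    A = XtW @ X_src + ridge * np.eye(X_src.shape[1])   # ridge keeps A invertible
    W = np.linalg.solve(A, XtW @ Y_tgt)                # shape (d_src, d_tgt)
    return x_query @ W

def top_k_translations(x_query, X_src, Y_tgt, tgt_vocab, tgt_emb, k=1, tau=1.0):
    """Rank target words by cosine similarity to the mapped query vector."""
    y_hat = lwr_map(x_query, X_src, Y_tgt, tau=tau)
    norms = np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(y_hat) + 1e-12
    scores = (tgt_emb @ y_hat) / norms
    best = np.argsort(-scores)[:k]
    return [tgt_vocab[i] for i in best]
```

Because the weights depend on the query, the matrix is re-estimated for every word to be translated. This trades extra computation at query time for a mapping that can vary across regions of the embedding space, in contrast to the single global matrix of the baseline.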
Pages: 13-16
Number of pages: 4
Related papers
50 in total
  • [1] Addressing polysemy in bilingual lexicon extraction from comparable corpora
    Fiser, Darja
    Ljubesic, Nikola
    Kubelka, Ozren
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3031 - 3035
  • [2] Bilingual Lexicon Extraction with Forced Correlation from Comparable Corpora
    Zhang, Chunyue
    Zhao, Tiejun
    NEURAL INFORMATION PROCESSING, PT II, 2015, 9490 : 528 - 535
  • [3] Adaptive Dictionary for Bilingual Lexicon Extraction from Comparable Corpora
    Hazem, Amir
    Morin, Emmanuel
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 288 - 292
  • [4] Looking at Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction
    Morin, Emmanuel
    Hazem, Amir
    PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2014, : 1284 - 1293
  • [5] Exploiting unbalanced specialized comparable corpora for bilingual lexicon extraction
    Morin, Emmanuel
    Hazem, Amir
    NATURAL LANGUAGE ENGINEERING, 2016, 22 (04) : 575 - 601
  • [6] Iterative Bilingual Lexicon Extraction from Comparable Corpora with Topical and Contextual Knowledge
    Chu, Chenhui
    Nakazawa, Toshiaki
    Kurohashi, Sadao
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PART II, 2014, 8404 : 296 - 309
  • [7] Bilingual Lexicon Extraction from Comparable Corpora Based on Closed Concepts Mining
    Chebel, Mohamed
    Latiri, Chiraz
    Gaussier, Eric
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT I, 2017, 10234 : 586 - 598
  • [8] Bilingual Lexicon Extraction with Temporal Distributed Word Representation from Comparable Corpora
    Zhang, Chunyue
    Zhao, Tiejun
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2015, 2015, 9362 : 380 - 387
  • [9] Re-ranking for Bilingual Lexicon Extraction with Bi-directional Linear Transformation from Comparable Corpora
    Zhang, Chunyue
    Zhao, Tiejun
    MACHINE TRANSLATION, 2016, 668 : 25 - 34
  • [10] Efficient bilingual lexicon extraction from comparable corpora based on formal concepts analysis
    Chebel, Mohamed
    Latiri, Chiraz
    Gaussier, Eric
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (01) : 138 - 161