Iterative Bilingual Lexicon Extraction from Comparable Corpora with Topical and Contextual Knowledge

Times cited: 0
Authors
Chu, Chenhui [1]
Nakazawa, Toshiaki [1]
Kurohashi, Sadao [1]
Affiliation
[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In the literature, two main categories of methods have been proposed for bilingual lexicon extraction from comparable corpora: topic model based methods and context based methods. In this paper, we present a bilingual lexicon extraction system that is based on a novel combination of these two methods in an iterative process. Our system does not rely on any prior knowledge, and its performance can be improved iteratively. To the best of our knowledge, this is the first study that iteratively exploits both topical and contextual knowledge for bilingual lexicon extraction. Experiments conducted on Chinese-English and Japanese-English Wikipedia data show that our proposed method performs significantly better than a state-of-the-art method that uses only topical knowledge.
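The abstract does not spell out the algorithm, so the following is only a rough, generic illustration of what the context-based half of such an iterative loop can look like; it is not the authors' method. It assumes a small seed lexicon is already available (standing in, hypothetically, for pairs that a topic-model stage might produce without a dictionary), uses sentence-level co-occurrence counts as context vectors, projects source vectors into the target space through the current lexicon, scores candidates by cosine similarity, and accepts pairs above a hypothetical `threshold`. All function and parameter names are illustrative.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Sentence-level co-occurrence counts: word -> Counter of co-occurring words."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        words = list(dict.fromkeys(sent))  # dedupe while keeping a deterministic order
        for w in words:
            for c in words:
                if c != w:
                    vecs[w][c] += 1
    return vecs


def translate_vector(vec, lexicon):
    """Project a source context vector into target space via the current lexicon.

    Context words with no known translation are dropped.
    """
    out = Counter()
    for word, count in vec.items():
        if word in lexicon:
            out[lexicon[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def iterative_extraction(src_sents, tgt_sents, seed, threshold=0.8, max_iter=5):
    """Hypothetical iterative loop: grow the lexicon with confident pairs each pass.

    `seed` is assumed to come from an earlier (e.g. topic-model) stage.
    """
    src_vecs = context_vectors(src_sents)
    tgt_vecs = context_vectors(tgt_sents)
    lexicon = dict(seed)
    for _ in range(max_iter):
        used = set(lexicon.values())
        new_pairs = {}
        for s, svec in src_vecs.items():
            if s in lexicon:
                continue
            projected = translate_vector(svec, lexicon)
            best, best_sim = None, 0.0
            for t, tvec in tgt_vecs.items():
                if t in used:
                    continue
                sim = cosine(projected, tvec)
                if sim > best_sim:
                    best, best_sim = t, sim
            if best is not None and best_sim >= threshold:
                new_pairs[s] = best
        if not new_pairs:
            break  # no confident pairs left: the loop has converged
        lexicon.update(new_pairs)  # newly accepted pairs enrich the next pass's projections
    return lexicon


if __name__ == "__main__":
    # Toy romanized Japanese / English "comparable" sentences (invented for illustration).
    src = [["neko", "sakana", "taberu"], ["inu", "niku", "taberu"],
           ["neko", "sakana"], ["inu", "niku"]]
    tgt = [["cat", "fish", "eat"], ["dog", "meat", "eat"],
           ["cat", "fish"], ["dog", "meat"]]
    print(iterative_extraction(src, tgt, {"neko": "cat", "inu": "dog"}))
```

On this toy data the first pass accepts `sakana` and `niku`, while `taberu` only clears the threshold in the second pass, once its context words have become translatable. That is the intuition behind iterating: each accepted pair makes more of the context vectors comparable across languages. A real system would need far larger corpora, frequency weighting, and a one-to-one constraint within a pass.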
Pages: 296-309
Number of pages: 14