Iterative Bilingual Lexicon Extraction from Comparable Corpora with Topical and Contextual Knowledge

Cited by: 0

Authors:

Chu, Chenhui [1]

Nakazawa, Toshiaki [1]

Kurohashi, Sadao [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan

Source:

COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PART II | 2014 / Vol. 8404

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the literature, two main categories of methods have been proposed for bilingual lexicon extraction from comparable corpora, namely topic-model-based and context-based methods. In this paper, we present a bilingual lexicon extraction system that is based on a novel combination of these two methods in an iterative process. Our system does not rely on any prior knowledge, and its performance can be iteratively improved. To the best of our knowledge, this is the first study that iteratively exploits both topical and contextual knowledge for bilingual lexicon extraction. Experiments conducted on Chinese-English and Japanese-English Wikipedia data show that our proposed method performs significantly better than a state-of-the-art method that only uses topical knowledge.

Pages: 296-309

Page count: 14