A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora

Cited by: 0

Authors:

Khademian, Mahdi [1]

Taghipour, Kaveh [1]

Mansour, Saab [2]

Khadivi, Shahram [1]

Affiliations:

[1] Amirkabir Univ Technol, Dept Comp Engn & IT, Human Language Technol Lab, 424 Hafez Ave, Tehran, Iran

[2] Rhein Westfal TH Aachen, Human Language Technol & Pattern Recognit Grp, Dept Comp Sci, Aachen, Germany

Source:

LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012

Keywords:

Parallel Fragment Extraction; Hough Transform; Statistical Machine Translation;

DOI:

Not available

Chinese Library Classification:

H0 [Linguistics];

Discipline classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

Accurate translation with statistical machine translation systems, especially for multi-domain documents, requires ever larger amounts of bilingual text, and this need becomes more critical when training such systems for language pairs with scarce training data. In recent years, researchers have explored a new source of parallel text: documents that are not necessarily parallel but are comparable. Because existing methods search for possible translation equivalences greedily, they cannot consider all possible parallel texts in comparable documents. This paper investigates a different approach that considers the relationships between all words of two comparable documents and works fairly well even in the worst case of comparability. We represent each document pair as a matrix and then transform it into a new space to find parallel fragments. Evaluations show that the system succeeds in extracting useful fragment pairs.
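
The matrix-and-transform idea in the abstract can be illustrated with a small sketch. This record does not describe the paper's actual procedure; the keywords only suggest that the "new space" is a Hough parameter space. The code below is therefore an assumption-laden toy: the lexicon, the token names, and the helper functions `match_points` and `strongest_line` are all hypothetical. It marks every cell where a source word has a lexicon translation in the target document, lets each marked cell vote for the straight lines passing through it, and reads the most-voted line as a candidate parallel fragment.

```python
import math
from collections import defaultdict

def match_points(src, tgt, lexicon):
    """Cells (i, j) of the document-pair matrix where tgt[j] is a
    lexicon translation of src[i]. The lexicon is a toy assumption."""
    return [(i, j)
            for i, s in enumerate(src)
            for j, t in enumerate(tgt)
            if t in lexicon.get(s, ())]

def strongest_line(points, n_theta=180):
    """Standard Hough voting: each point votes for every line
    rho = i*cos(theta) + j*sin(theta) through it, with theta sampled
    in 1-degree steps and rho rounded to the nearest integer.
    Returns the points that voted for the most popular (theta, rho) bin."""
    bins = defaultdict(list)
    for i, j in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = round(i * math.cos(theta) + j * math.sin(theta))
            bins[(k, rho)].append((i, j))
    return max(bins.values(), key=len)

# Hypothetical tokens: s0..s3 align monotonically with t0..t3,
# while s4 -> t4 is an isolated match far from that diagonal.
src = ["s0", "s1", "s2", "s3", "s4"]
tgt = ["t4", "x", "t0", "t1", "t2", "t3"]
lexicon = {f"s{n}": {f"t{n}"} for n in range(5)}

points = match_points(src, tgt, lexicon)
fragment = sorted(strongest_line(points))
pairs = [(src[i], tgt[j]) for i, j in fragment]
```

Because all cells vote before any line is chosen, the isolated match does not pull the result away from the diagonal run, which is the contrast with greedy search that the abstract draws. A real system would need weighted matrix entries and thresholds on the accumulator; none of those details are given in this record.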


Pages: 4073-4079

Page count: 7
