A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora

Cited by: 0

Authors:

Khademian, Mahdi [1]

Taghipour, Kaveh [1]

Mansour, Saab [2]

Khadivi, Shahram [1]

Affiliations:

[1] Amirkabir Univ Technol, Dept Comp Engn & IT, Human Language Technol Lab, 424 Hafez Ave, Tehran, Iran

[2] Rhein Westfal TH Aachen, Human Language Technol & Pattern Recognit Grp, Dept Comp Sci, Aachen, Germany

Source:

LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012

Keywords:

Parallel Fragment Extraction; Hough Transform; Statistical Machine Translation

DOI:

Not available

Chinese Library Classification number:

H0 [Linguistics]

Discipline classification codes:

030303; 0501; 050102

Abstract:

Accurate translation with statistical machine translation systems, especially for documents spanning multiple domains, requires ever more bilingual text, and this need is most acute when training systems for language pairs with scarce training data. In recent years, some research has explored a new source of parallel text: documents that are not necessarily parallel but are comparable. Because these methods search for possible translation equivalences greedily, they cannot consider all possible parallel texts in comparable documents. This paper investigates a different approach that considers the relationships between all words of two comparable documents and works fairly well even in the worst case of comparability. We represent each document pair as a matrix and then transform it into a new space to find parallel fragments. Evaluations show that the system succeeds in extracting useful fragment pairs.
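
The abstract does not say how the matrix is built or which transform is applied; only the keywords name the Hough transform. The following is therefore a minimal sketch under stated assumptions, not the authors' method: it assumes a binary word-match matrix built from a toy bilingual lexicon, and a standard line Hough transform in which a run of matching cells along a straight line (a candidate parallel fragment) collects many votes in one (theta, rho) bin. The names `match_cells` and `hough_votes`, the lexicon, and the bin sizes are all hypothetical.

```python
import math
from collections import Counter


def match_cells(src_tokens, tgt_tokens, lexicon):
    """Return the (i, j) cells of the word-match matrix that are set.

    Cell (i, j) is set when tgt_tokens[j] is listed in the (hypothetical)
    lexicon as a translation of src_tokens[i].
    """
    return [
        (i, j)
        for i, src in enumerate(src_tokens)
        for j, tgt in enumerate(tgt_tokens)
        if tgt in lexicon.get(src, ())
    ]


def hough_votes(cells, n_theta=180, rho_step=1.0):
    """Accumulate line votes in (theta, rho) space.

    Each cell (i, j) votes for every line passing through it, using the
    normal form rho = i * cos(theta) + j * sin(theta). Bins are keyed by
    (theta index, quantized rho); the bin sizes are assumptions.
    """
    votes = Counter()
    for i, j in cells:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = i * math.cos(theta) + j * math.sin(theta)
            votes[(k, round(rho / rho_step))] += 1
    return votes


if __name__ == "__main__":
    # Toy data: the last four target tokens translate the source tokens in order.
    lexicon = {"a": {"A"}, "b": {"B"}, "c": {"C"}, "d": {"D"}}
    src = ["a", "b", "c", "d"]
    tgt = ["X", "Y", "A", "B", "C", "D"]

    cells = match_cells(src, tgt, lexicon)
    votes = hough_votes(cells)
    best_bin, best_count = votes.most_common(1)[0]
    print("matching cells:", cells)
    print("strongest bin (theta index, rho index):", best_bin, "votes:", best_count)
```

In this toy case the four matching cells lie on one diagonal, so at least one bin receives a vote from every cell. Neighbouring bins can tie with it, so a practical system would need some form of peak selection, which this sketch leaves out.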

Pages: 4073-4079

Number of pages: 7
