Extraction of transliteration pairs from parallel corpora using a statistical transliteration model

Times cited: 12

Authors:

Lee, CJ [1]

Chang, JS

Jang, JSR

Affiliations:

[1] Chunghwa Telecom Co Ltd, Telecommun Labs, Chungli 326, Taiwan

[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 300, Taiwan

Source:

INFORMATION SCIENCES | 2006 / Vol. 176 / No. 1

Keywords:

transliteration pair; transliteration model; parallel corpora; statistical learning; machine transliteration;

DOI:

10.1016/j.ins.2004.10.006

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper describes a framework for modeling the machine transliteration problem. The parameters of the proposed model are acquired automatically through statistical learning from a bilingual list of proper names. Unlike previous approaches, the model requires neither a pronunciation dictionary for converting source words into phonetic symbols nor manually assigned phonetic similarity scores between source and target words. We also report how the model is applied to extract proper names and their corresponding transliterations from parallel corpora. Experimental results show average word and character precision rates of 93.8% and 97.8%, respectively. © 2004 Elsevier Inc. All rights reserved.
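The abstract reports results as word precision and character precision without defining them here. The sketch below shows one common way such metrics can be computed. It assumes word precision is the share of extracted transliterations that exactly match the reference. It also assumes character precision is the number of characters matched via a longest common subsequence (LCS), divided by the total characters output. The paper's exact definitions may differ, and the function names and example pairs are hypothetical.

```python
# Hypothetical sketch: one possible definition of word and character precision
# for extracted transliterations. Not taken from the paper.

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            # Extend the match on equal characters, else keep the best neighbor.
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def word_precision(pairs: list[tuple[str, str]]) -> float:
    """Assumed definition: fraction of predicted transliterations equal to the reference."""
    if not pairs:
        return 0.0
    return sum(pred == ref for pred, ref in pairs) / len(pairs)


def character_precision(pairs: list[tuple[str, str]]) -> float:
    """Assumed definition: LCS-matched characters divided by all predicted characters."""
    total_pred = sum(len(pred) for pred, _ in pairs)
    if total_pred == 0:
        return 0.0
    matched = sum(lcs_length(pred, ref) for pred, ref in pairs)
    return matched / total_pred


# Hypothetical (predicted, reference) pairs for illustration only.
example_pairs = [("克林顿", "克林顿"), ("布什", "布希"), ("华盛顿", "华盛顿")]

if __name__ == "__main__":
    print(word_precision(example_pairs))       # 0.6666666666666666
    print(character_precision(example_pairs))  # 0.875
```

Under these assumptions, a near-miss such as 布什 versus 布希 counts as a word error but still earns partial character credit. This helps explain why character precision can sit above word precision.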


Pages: 67–90

Number of pages: 24


[1] Extraction of name and transliteration in monolingual and parallel corpora
Lin, T
Wu, JC
Chang, JS
MACHINE TRANSLATION: FROM REAL USERS TO RESEARCH, PROCEEDINGS, 2004, 3265 : 177 - 186
[2] A hybrid model for extracting transliteration equivalents from parallel corpora
Oh, Jong-Hoon
Choi, Key-Sun
Isahara, Hitoshi
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2006, 4188 : 119 - 126
[3] Curate a transliteration corpus from transliteration/translation pairs
Wu, Shih-Hung
Li, Yu-Te
PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 208 - 213
[4] Extracting English-Korean transliteration pairs from web corpora
Oh, Jong-Hoon
Isahara, Hitoshi
COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 222 - +
[5] Transliteration and Alignment of Parallel Texts from Cyrillic to Latin
Petic, Mircea
Gifu, Daniela
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1819 - 1823
[6] Exploiting Parallel Corpus for Automatic Extraction of Multilingual Names: Transliteration Perspective
Kundu, Bibekananda
Choudhury, Sanjay Kumar
2012 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2012, : 608 - 612
[7] Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora
Klementiev, Alexandre
Roth, Dan
COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 817 - 824
[8] Malay Manuscripts Transliteration Using Statistical Machine Translation (SMT)
Razak, Sitti Munirah Abdul
Abu Seman, Muhamad Sadry
Ali, Wan
Mamat, Wan Yusoff Wan
Nizan, Noor Hasrul
Noor, Mohammad
2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA SCIENCES (AIDAS2019), 2019, : 137 - 141
[9] Automating Transliteration of Cuneiform from Parallel Lines with Sparse Data
Bogacz, Bartosz
Klingmann, Maximilian
Mara, Hubert
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 615 - 620
[10] Comparison of Ensemble Classifiers in Extracting Synonymous Chinese Transliteration Pairs from Web
Chen, Chien-Hsing
Hsu, Chung-Chian
ADVANCES IN SWARM INTELLIGENCE, PT II, 2011, 6729 : 236 - +
