Extraction of transliteration pairs from parallel corpora using a statistical transliteration model

Cited by: 12
Authors
Lee, CJ [1]
Chang, JS
Jang, JSR
Affiliations
[1] Chunghwa Telecom Co Ltd, Telecommun Labs, Chungli 326, Taiwan
[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 300, Taiwan
Keywords
transliteration pair; transliteration model; parallel corpora; statistical learning; machine transliteration
DOI
10.1016/j.ins.2004.10.006
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper describes a framework for modeling the machine transliteration problem. The parameters of the proposed model are automatically acquired through statistical learning from a bilingual proper name list. Unlike previous approaches, the model does not involve the use of either a pronunciation dictionary for converting source words into phonetic symbols or manually assigned phonetic similarity scores between source and target words. We also report how the model is applied to extract proper names and corresponding transliterations from parallel corpora. Experimental results show that the average rates of word and character precision are 93.8% and 97.8%, respectively. © 2004 Elsevier Inc. All rights reserved.
Pages: 67-90
Page count: 24
Related papers
50 in total
  • [1] Extraction of name and transliteration in monolingual and parallel corpora
    Lin, T
    Wu, JC
    Chang, JS
    MACHINE TRANSLATION: FROM REAL USERS TO RESEARCH, PROCEEDINGS, 2004, 3265 : 177 - 186
  • [2] A hybrid model for extracting transliteration equivalents from parallel corpora
    Oh, Jong-Hoon
    Choi, Key-Sun
    Isahara, Hitoshi
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2006, 4188 : 119 - 126
  • [3] Curate a transliteration corpus from transliteration/translation pairs
    Wu, Shih-Hung
    Li, Yu-Te
    PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 208 - 213
  • [4] Extracting English-Korean transliteration pairs from web corpora
    Oh, Jong-Hoon
    Isahara, Hitoshi
    COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 222 - +
  • [5] Transliteration and Alignment of Parallel Texts from Cyrillic to Latin
    Petic, Mircea
    Gifu, Daniela
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1819 - 1823
  • [6] Exploiting Parallel Corpus for Automatic Extraction of Multilingual Names: Transliteration Perspective
    Kundu, Bibekananda
    Choudhury, Sanjay Kumar
    2012 ANNUAL IEEE INDIA CONFERENCE (INDICON), 2012, : 608 - 612
  • [7] Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora
    Klementiev, Alexandre
    Roth, Dan
    COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 817 - 824
  • [8] Malay Manuscripts Transliteration Using Statistical Machine Translation (SMT)
    Razak, Sitti Munirah Abdul
    Abu Seman, Muhamad Sadry
    Ali, Wan
    Mamat, Wan Yusoff Wan
    Nizan, Noor Hasrul
    Noor, Mohammad
    2019 1ST INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA SCIENCES (AIDAS2019), 2019, : 137 - 141
  • [9] Automating Transliteration of Cuneiform from Parallel Lines with Sparse Data
    Bogacz, Bartosz
    Klingmann, Maximilian
    Mara, Hubert
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 615 - 620
  • [10] Comparison of Ensemble Classifiers in Extracting Synonymous Chinese Transliteration Pairs from Web
    Chen, Chien-Hsing
    Hsu, Chung-Chian
    ADVANCES IN SWARM INTELLIGENCE, PT II, 2011, 6729 : 236 - +