Research of English-Chinese alignment at word granularity on parallel corpora

Cited by: 0
Authors
Xu Yang [1]
Wang Hou-feng [1]
Lue Xue-qiang [2]
Affiliations
[1] Peking Univ, Beijing 100871, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Beijing 100101, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICIS.2008.28
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bilingual alignment is a central problem in natural language processing, and word alignment is a particular difficulty among the alignment granularities. This paper describes an English-Chinese word alignment model for bilingual corpora, based on a bilingual lexicon and some linguistic knowledge. The model is built on the theory of formal optimal partition of bilingual sentence pairs and is applicable to sentence pairs in any natural language. In particular, by introducing several definitions and proving a theorem, we obtain alignment strategies that are independent of alignment direction. The model handles partial-matching cases, resolves the problem of words that appear more than once in a sentence, and compensates for gaps in the bilingual lexicon. Experimental results show that the model aligns bilingual corpora at the word level effectively and with high accuracy, while preserving the grammatical structure of the original sentences.
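To make the idea of lexicon-driven word alignment concrete, here is a minimal Python sketch. It is not the authors' model: the abstract does not give the optimal-partition formulation, so this uses a plain greedy one-to-one matcher instead. The function name `align_words`, the scoring values (1.0 for an exact lexicon hit, 0.5 for a substring hit), and the relative-position tie-breaker for repeated words are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def align_words(
    en_tokens: List[str],
    zh_tokens: List[str],
    lexicon: Dict[str, Set[str]],
) -> List[Tuple[int, int]]:
    """Greedy lexicon-based alignment; returns (en_index, zh_index) pairs.

    Hypothetical illustration only, not the method from the paper.
    """
    candidates = []
    for i, en in enumerate(en_tokens):
        translations = lexicon.get(en.lower(), set())
        for j, zh in enumerate(zh_tokens):
            if zh in translations:
                score = 1.0  # exact lexicon match
            elif any(t in zh or zh in t for t in translations):
                score = 0.5  # partial (substring) match
            else:
                continue
            # Relative-position distance separates repeated words.
            distance = abs(i / len(en_tokens) - j / len(zh_tokens))
            candidates.append((-score, distance, i, j))

    used_en: Set[int] = set()
    used_zh: Set[int] = set()
    links: List[Tuple[int, int]] = []
    # Best score first, then smallest positional distance.
    for _, _, i, j in sorted(candidates):
        if i not in used_en and j not in used_zh:
            used_en.add(i)
            used_zh.add(j)
            links.append((i, j))
    return sorted(links)


if __name__ == "__main__":
    en = ["the", "red", "book", "and", "the", "red", "pen"]
    zh = ["红", "书", "和", "红", "笔"]
    toy_lexicon = {"red": {"红"}, "book": {"书本"}, "and": {"和"}, "pen": {"笔"}}
    print(align_words(en, zh, toy_lexicon))
```

In the toy sentence pair, each occurrence of "red" is linked to the nearer occurrence of "红", and "book" is linked to "书" through a partial match against the lexicon entry "书本". Words with no lexicon entry, such as "the", are left unaligned.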
Pages: 223 / +
Page count: 2
Related papers
50 in total
  • [21] ParaMed: a parallel corpus for English-Chinese translation in the biomedical domain
    Liu, Boxiang
    Huang, Liang
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (01)
  • [22] Teaching Design for Translation Based on English-Chinese Parallel Corpus
    Sun, Lihua
    Li, Zhiyuan
    2017 2ND EBMEI INTERNATIONAL CONFERENCE ON EDUCATION, INFORMATION AND MANAGEMENT (EBMEI-EIM 2017), 2017, 85 : 57 - 60
  • [23] Research of Chinese-English word alignment algorithm based on bilingual dictionary
    Deng, Dan
    Liu, Qun
    Yu, Hongkui
    Jisuanji Gongcheng/Computer Engineering, 2005, 31 (16) : 45 - 47
  • [24] Translating medical terminologies through word alignment in parallel text corpora
    Deleger, Louise
    Merkel, Magnus
    Zweigenbaum, Pierre
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (04) : 692 - 701
  • [25] Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel corpora
    Ma, Q
    Kanzaki, K
    Zhang, YJ
    Murata, M
    Isahara, H
    NEURAL NETWORKS, 2004, 17 (8-9) : 1241 - 1253
  • [26] A Research on Length Based Sentence Alignment for Chinese-English Parallel Corpus
    Zan, Hongying
    Zhang, Xia
    Fan, Ming
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 4, PROCEEDINGS, 2008, : 145 - 149
  • [28] Building a Case-based Semantic English-Chinese Parallel Treebank
    Shi, Huaxing
    Zhao, Tiejun
    Su, Keh-Yih
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 2918 - 2924
  • [29] Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
    Dalianis, Hercules
    Xing, Hao-chun
    Zhang, Xin
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 1700 - 1705
  • [30] English-Chinese bilingual phrase alignment based on effective sentential-form
    Qu, Gang
    Chen, Xiao-Rong
    Lu, Ru-Zhan
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2003, 40 (02):