Research of English-Chinese alignment at word granularity on parallel corpora

Cited by: 0
Authors
Xu Yang [1]
Wang Hou-feng [1]
Lue Xue-qiang [2]
Affiliations
[1] Peking Univ, Beijing 100871, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Beijing 100101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICIS.2008.28
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bilingual alignment is a central problem in natural language processing, and word alignment is a particularly difficult one among all granularities of alignment. This paper describes an English-Chinese word alignment model for bilingual corpora, based on a bilingual lexicon and some linguistic knowledge. The model is built on the theory of formal optimal partition of bilingual sentence pairs and is applicable to sentence pairs in any natural language. In particular, by introducing several definitions and proving a theorem, we obtain alignment strategies that are independent of the alignment direction. The model handles partial matches, resolves the problem of words that appear more than once, and compensates for gaps in the bilingual lexicon. The experimental results show that the model aligns bilingual corpora at the word level effectively and with high accuracy, while preserving the grammatical structure of the original sentences.
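The abstract does not give the model's details, so the following is only an illustrative sketch of generic lexicon-based word alignment, not the authors' optimal-partition model. It assumes pre-segmented Chinese tokens, a character-overlap score for partial matches, and relative sentence position as the tie-breaker for repeated words. The names `match_score` and `align_words` and the toy lexicon are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def match_score(translation: str, token: str) -> float:
    """Return 1.0 for an exact match, else a character-overlap ratio.

    The overlap ratio is an assumed stand-in for partial matching.
    """
    if translation == token:
        return 1.0
    shared = len(set(translation) & set(token))
    return shared / max(len(translation), len(token))


def align_words(
    en_tokens: List[str],
    zh_tokens: List[str],
    lexicon: Dict[str, Set[str]],
) -> List[Tuple[int, int]]:
    """Greedy one-to-one alignment; returns (en_index, zh_index) links."""
    candidates = []
    for i, en in enumerate(en_tokens):
        translations = lexicon.get(en.lower(), set())
        for j, zh in enumerate(zh_tokens):
            score = max((match_score(t, zh) for t in translations), default=0.0)
            if score > 0:
                # Relative-position distance separates repeated words.
                distance = abs(i / len(en_tokens) - j / len(zh_tokens))
                candidates.append((score, -distance, i, j))

    # Best score first; among equal scores, the closest positions first.
    candidates.sort(reverse=True)

    used_en: Set[int] = set()
    used_zh: Set[int] = set()
    links: List[Tuple[int, int]] = []
    for _score, _neg_distance, i, j in candidates:
        if i not in used_en and j not in used_zh:
            links.append((i, j))
            used_en.add(i)
            used_zh.add(j)
    return sorted(links)


# Toy data (hypothetical): "likes" occurs twice, "books" only matches
# partially, and the lexicon has no usable entry for "and".
en = ["he", "likes", "books", "and", "she", "likes", "films"]
zh = ["他", "喜欢", "书", "而", "她", "喜欢", "电影"]
lexicon = {
    "he": {"他"},
    "she": {"她"},
    "likes": {"喜欢"},
    "books": {"书籍"},
    "films": {"电影"},
    "and": {"和"},
}
links = align_words(en, zh, lexicon)
print(links)
```

In this toy run each occurrence of "likes" links to the nearer "喜欢", "books" links to "书" through a partial match, and "and" stays unaligned because the lexicon offers no matching translation. A greedy pass of this kind is only a baseline and carries no optimality guarantee.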
Pages: 223 / +
Number of pages: 2