ANNOTATION OF COMPLEX NOUN PHRASES FROM MULTILINGUAL PARALLEL CORPUS

Citations: 0
Authors
Cao, Jingxiang [1, 2]
Huang, Degen [1]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Liaoning, Peoples R China
[2] Dalian Univ Technol, Sch Foreign Languages, Dalian 116024, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Complex NPs; Structural ambiguity; Annotation; Multilingual parallel corpus
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Noun Phrase (NP) is the dominant construct in natural language text. While base NPs (BNPs) and maximal-length NPs (MNPs) are relatively easy to identify and extract, the internal structure of NPs remains a challenge in natural language processing. The Penn Treebank leaves BNPs flat, treating them as implicitly right-branching. Vadas and Curran added BNP internal structure to the Penn Treebank. However, the resulting BNP structure is very often incorrect when it is considered within a longer complex NP (CNP). Structural ambiguity prevails in most CNPs, and multilingual comparison may help improve disambiguation. We introduce a new NP annotation scheme that is applicable to multilingual parallel corpora and discriminates between genuine flat branching and right branching. Flat branching is preferred over binary branching wherever appropriate so as to achieve inter-lingual consistency. As a pilot task toward building a gold-standard corpus for structural and semantic analysis of CNPs, 381 document titles are extracted from UN resolutions as typical examples of CNPs. The document titles in Chinese, English and Russian are manually annotated in XML format, in the hope of helping to acquire rules for parsers or machine translators targeted at CNPs. The problems encountered are reported.
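The distinction the abstract draws between flat branching and right branching can be illustrated with a minimal Python sketch. This is not the authors' annotation scheme: the tag names `np` and `w`, the `branching` attribute, and the example phrase are all assumptions made for illustration, since the record does not give the actual XML format. The sketch only shows how the two structures differ when encoded as XML, and makes no claim about which reading is linguistically correct for the phrase.

```python
import xml.etree.ElementTree as ET


def flat_branching(words):
    """Encode all words as sisters under one NP node (hypothetical tags)."""
    node = ET.Element("np", {"branching": "flat"})
    for w in words:
        ET.SubElement(node, "w").text = w
    return node


def right_branching(words):
    """Encode words as [w1 [w2 [w3 ...]]], i.e. binary right branching."""
    root = ET.Element("np")
    current = root
    for i, w in enumerate(words):
        ET.SubElement(current, "w").text = w
        # Open a nested NP for the remainder, except before the final pair.
        if i < len(words) - 2:
            current = ET.SubElement(current, "np")
    return root


def np_depth(node):
    """Count how many NP levels are nested inside one another."""
    nested = [child for child in node if child.tag == "np"]
    return 1 + max((np_depth(child) for child in nested), default=0)


# Invented example phrase, not taken from the UN resolution titles.
phrase = ["world", "food", "security"]

flat = flat_branching(phrase)
right = right_branching(phrase)

print(ET.tostring(flat, encoding="unicode"))
print(ET.tostring(right, encoding="unicode"))
print(np_depth(flat), np_depth(right))
```

In the flat encoding every word is a direct child of a single NP, whereas the right-branching encoding nests one NP per additional word; an explicit marker such as the assumed `branching` attribute is one way an annotation could record that a flat structure is intended rather than left unanalyzed.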
Pages: 1440-1444
Page count: 5