ANNOTATION OF COMPLEX NOUN PHRASES FROM MULTILINGUAL PARALLEL CORPUS

Authors
Cao, Jingxiang [1, 2]
Huang, Degen [1]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Liaoning, Peoples R China
[2] Dalian Univ Technol, Sch Foreign Languages, Dalian 116024, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Complex NPs; Structural ambiguity; Annotation; Multilingual parallel corpus
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The noun phrase (NP) is the dominant construct in natural language text. While base NPs (BNPs) and maximal-length NPs (MNPs) are relatively easy to identify and extract, the internal structure of NPs remains a challenge in natural language processing. The Penn Treebank leaves BNPs flat, with implicit right branching. Vadas and Curran added BNP internal structure to the Penn Treebank, but the resulting BNP structure is very often incorrect when it is considered within a longer complex NP (CNP). Structural ambiguity prevails in most CNPs, and multilingual comparison may help improve disambiguation. We introduce a new NP annotation scheme that is applicable to multilingual parallel corpora and discriminates between genuine flat branching and right branching. Flat branching is preferred over binary branching wherever appropriate so as to achieve inter-lingual consistency. As a pilot task toward building a gold standard corpus for structural and semantic analysis of CNPs, 381 document titles are extracted from the UN resolutions as typical examples of CNPs. Document titles in Chinese, English and Russian are manually annotated in XML format, in the hope of helping to acquire rules for parsers or machine translators targeted at CNPs. The problems encountered are reported.
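
The distinction between flat and right branching mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's XML schema, so the element name `NP`, the element name `w`, and the `branching` attribute below are hypothetical, and the example phrase is illustrative rather than taken from the annotated corpus. The same tokens are bracketed both ways purely to show how the two encodings differ.

```python
import xml.etree.ElementTree as ET

# NOTE: tag names "NP" and "w" and the "branching" attribute are assumptions
# made for this illustration; they are not the paper's annotation scheme.

def flat(tokens):
    """Encode all tokens as sister nodes under a single NP."""
    np = ET.Element("NP", {"branching": "flat"})
    for tok in tokens:
        ET.SubElement(np, "w").text = tok
    return np

def right_branching(tokens):
    """Encode tokens as nested binary NPs: [t1 [t2 [t3 ...]]]."""
    np = ET.Element("NP", {"branching": "right"})
    ET.SubElement(np, "w").text = tokens[0]
    if len(tokens) > 2:
        np.append(right_branching(tokens[1:]))
    elif len(tokens) == 2:
        ET.SubElement(np, "w").text = tokens[1]
    return np

def depth(elem):
    """Number of levels in the tree rooted at elem (a leaf has depth 1)."""
    return 1 + max((depth(child) for child in elem), default=0)

tokens = ["world", "food", "programme"]  # illustrative phrase only
flat_np = flat(tokens)
right_np = right_branching(tokens)

print(ET.tostring(flat_np, encoding="unicode"))
print(ET.tostring(right_np, encoding="unicode"))
print(depth(flat_np), depth(right_np))
```

In this sketch a flat NP keeps a constant depth regardless of length, while a right-branching NP grows one level deeper per additional token, which is one simple way to tell the two encodings apart programmatically.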
Pages: 1440-1444
Page count: 5