ANNOTATION OF COMPLEX NOUN PHRASES FROM MULTILINGUAL PARALLEL CORPUS

Authors
Cao, Jingxiang [1, 2]
Huang, Degen [1]
Affiliations
[1] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116024, Liaoning, Peoples R China
[2] Dalian Univ Technol, Sch Foreign Languages, Dalian 116024, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Complex NPs; Structural ambiguity; Annotation; Multilingual parallel corpus
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The noun phrase (NP) is the dominant construct in natural language text. While base NPs (BNPs) and maximal-length NPs (MNPs) are relatively easy to identify and extract, the internal structure of NPs remains a challenge in natural language processing. The Penn Treebank leaves BNPs flat, with right branching implicit. Vadas and Curran added BNP internal structure to the Penn Treebank, but the resulting BNP structure is very often incorrect when the BNP is considered within a longer complex NP (CNP). Structural ambiguity prevails in most CNPs, and multilingual comparison may help improve disambiguation. We introduce a new NP annotation scheme that is applicable to multilingual parallel corpora and discriminates between genuine flat branching and right branching. Flat branching is preferred over binary branching wherever appropriate, so as to achieve inter-lingual consistency. As a pilot task toward building a gold-standard corpus for structural and semantic analysis of CNPs, 381 document titles are extracted from UN resolutions as typical examples of CNPs. The titles, in Chinese, English and Russian, are manually annotated in XML format, with the aim of helping to acquire rules for parsers or machine translators targeted at CNPs. The problems encountered are reported.
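The distinction the abstract draws between flat and binary (left or right) branching can be illustrated with a small sketch. The record does not give the authors' XML schema, so the `<np>` and `<w>` tag names, the sample phrases, and the `branching` and `classify` helpers below are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def branching(np):
    """Classify the top-level shape of one hypothetical <np> element."""
    kids = [child.tag for child in np]
    if len(kids) > 2 and all(tag == "w" for tag in kids):
        return "flat"    # three or more sibling words, no inner bracket
    if kids == ["w", "np"]:
        return "right"   # [w [w w ...]]
    if kids == ["np", "w"]:
        return "left"    # [[w w ...] w]
    if kids == ["w", "w"]:
        return "base"    # two-word NP, nothing to disambiguate
    return "other"


def classify(xml_text):
    """Parse an annotation string and classify its root <np>."""
    return branching(ET.fromstring(xml_text))


# Hypothetical annotations; tag names are assumed, not the authors' scheme.
SAMPLES = {
    "flat": "<np><w>Bosnia</w><w>and</w><w>Herzegovina</w></np>",
    "right": "<np><w>world</w><np><w>food</w><w>programme</w></np></np>",
    "left": "<np><np><w>human</w><w>rights</w></np><w>council</w></np>",
}

for expected, xml_text in SAMPLES.items():
    print(expected, classify(xml_text))
```

Under these assumptions, a flat NP is recorded as a run of sibling words, whereas right branching requires an explicit inner bracket, so the two are never conflated by default.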
Pages: 1440-1444
Page count: 5