TimeBankPT: A TimeML Annotated Corpus of Portuguese

Authors
Costa, Francisco [1 ]
Branco, Antonio [1 ]
Affiliation
[1] Univ Lisbon, P-1699 Lisbon, Portugal
Source
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012
Keywords
Corpora; Temporal Information Extraction; Portuguese
DOI
Not available
Chinese Library Classification
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
In this paper, we introduce TimeBankPT, a TimeML annotated corpus of Portuguese. It has been produced by adapting an existing resource for English, namely the data used in the first TempEval challenge. TimeBankPT is the first corpus of Portuguese with rich temporal annotations (i.e., it includes annotations not only of temporal expressions but also of events and temporal relations). In addition, it was subjected to an automated error mining procedure that checks the consistency of the annotated temporal relations based on their logical properties. This procedure detected some errors in the annotations that also affect the original English corpus. The Portuguese language is currently undergoing a spelling reform, and several countries where Portuguese is an official language are in a transitional period in which both the old and new orthographies are valid. TimeBankPT adopts the recent spelling reform, a decision intended to preserve its usefulness in the future. TimeBankPT is freely available for download.
Pages: 3727-3734
Page count: 8