TimeBankPT: A TimeML Annotated Corpus of Portuguese

Cited by: 0

Authors:

Costa, Francisco [1]

Branco, Antonio [1]

Affiliation:

[1] Univ Lisbon, P-1699 Lisbon, Portugal

Source:

LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012

Keywords:

Corpora; Temporal Information Extraction; Portuguese;

DOI:

Not available

Chinese Library Classification:

H0 [Linguistics];

Discipline classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

In this paper, we introduce TimeBankPT, a TimeML annotated corpus of Portuguese. It was produced by adapting an existing resource for English, namely the data used in the first TempEval challenge. TimeBankPT is the first corpus of Portuguese with rich temporal annotations (i.e., it includes annotations not only of temporal expressions but also of events and temporal relations). In addition, it was subjected to an automated error mining procedure that checks the consistency of the annotated temporal relations based on their logical properties. This procedure allowed for the detection of some errors in the annotations, which also affect the original English corpus. The Portuguese language is currently undergoing a spelling reform, and several countries where Portuguese is an official language are in a transitional period in which both the old and new orthographies are valid. TimeBankPT adopts the new orthography in order to preserve its usefulness in the future. TimeBankPT is freely available for download.

Pages: 3727 - 3734

Number of pages: 8
