TimeBankPT: A TimeML Annotated Corpus of Portuguese

Authors
Costa, Francisco [1]
Branco, Antonio [1]
Affiliations
[1] Univ Lisbon, P-1699 Lisbon, Portugal
Source
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012
Keywords
Corpora; Temporal Information Extraction; Portuguese;
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
In this paper, we introduce TimeBankPT, a TimeML annotated corpus of Portuguese. It has been produced by adapting an existing resource for English, namely the data used in the first TempEval challenge. TimeBankPT is the first corpus of Portuguese with rich temporal annotations (i.e. it includes annotations not only of temporal expressions but also of events and temporal relations). In addition, it was subjected to an automated error mining procedure that checks the consistency of the annotated temporal relations based on their logical properties. This procedure allowed for the detection of some errors in the annotations that also affect the original English corpus. The Portuguese language is currently undergoing a spelling reform, and several countries where Portuguese is official are in a transitional period in which both the old and the new orthographies are valid. TimeBankPT adopts the recent spelling reform, a decision intended to preserve its usefulness in the future. TimeBankPT is freely available for download.
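The abstract does not spell out how the error mining procedure works. As a rough illustration of what checking temporal relations "based on their logical properties" can mean, the sketch below treats BEFORE as a transitive relation, treats AFTER as its inverse, and flags any pair that is inferred to be strictly ordered but is also annotated as OVERLAP, as well as any ordering cycle. The function name `find_inconsistencies`, the triple format, and the restriction to three relation labels are assumptions made for this example, not the authors' implementation.

```python
from itertools import product


def find_inconsistencies(relations):
    """Flag logically inconsistent temporal relations (illustrative sketch).

    `relations` is an iterable of (source, label, target) triples.
    Only BEFORE, AFTER and OVERLAP are interpreted; other labels are ignored.
    """
    before = set()   # ordered pairs (a, b) meaning "a is before b"
    overlap = set()  # unordered pairs annotated as overlapping

    for src, label, tgt in relations:
        if label == "BEFORE":
            before.add((src, tgt))
        elif label == "AFTER":
            # AFTER is the inverse of BEFORE.
            before.add((tgt, src))
        elif label == "OVERLAP":
            overlap.add(frozenset((src, tgt)))

    # Naive transitive closure: a < b and b < c implies a < c.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(before), repeat=2):
            if b == c and (a, d) not in before:
                before.add((a, d))
                changed = True

    problems = []
    for a, b in sorted(before):
        if a == b:
            # Something inferred to precede itself: an ordering cycle.
            problems.append(("cycle", a, b))
        elif frozenset((a, b)) in overlap:
            # Inferred strict ordering contradicts an OVERLAP annotation.
            problems.append(("before-vs-overlap", a, b))
    return problems


if __name__ == "__main__":
    sample = [
        ("e1", "BEFORE", "e2"),
        ("e2", "BEFORE", "t1"),
        ("t1", "OVERLAP", "e1"),
    ]
    print(find_inconsistencies(sample))
```

In the sample, `e1` is inferred to precede `t1` through `e2`, which conflicts with the OVERLAP annotation between them, so that pair is reported. A fuller check would also need to handle the disjunctive and vague relation labels, which this sketch skips.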
Pages: 3727-3734
Number of pages: 8
Related papers
50 in total
  • [1] An annotated corpus with support verb constructions in Portuguese
    Rassi, Amanda Pontes
    Baptista, Jorge
    Vale, Oto Araujo
    GRAGOATA-UFF, 2015, 20 (38): : 207 - 230
  • [2] Translation errors from English to Portuguese: an annotated corpus
    Costa, Angela
    Luis, Tiago
    Coheur, Luisa
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1231 - 1234
  • [3] A Portuguese-Spanish Corpus Annotated for Subject Realization and Referentiality
    Rello, Luz
    Gayo, Iria
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 154 - 157
  • [4] A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
    Pontes, Elvys Linhares
    Torres-Moreno, Juan-Manuel
    Huet, Stephane
    Linhares, Andrea Carneiro
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3192 - 3196
  • [5] Corref-PT: A Semi-Automatic Annotated Portuguese Coreference Corpus
    Vieira, Renata
    Mendes, Amalia
    Quaresma, Paulo
    Fonseca, Evandro
    Collovini, Sandra
    Antunes, Sandra
    COMPUTACION Y SISTEMAS, 2018, 22 (04): : 1259 - 1267
  • [6] Tell the Spaniards: Portuguese Poetry of the Liberal Triennium. Analysis and Bilingual Annotated Corpus
    Ruiz Mas, Jose
    HISTORIA CONSTITUCIONAL, 2021, (22): : 1077 - 1084
  • [7] An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing
    Carnaz, Goncalo
    Antunes, Mario
    Nogueira, Vitor Beires
    DATA, 2021, 6 (07)
  • [8] PTPARL-D: an annotated corpus of forty-four years of Portuguese parliamentary debates
    Almeida, Paulo
    Marques-Pita, Manuel
    Goncalves-Sa, Joana
    CORPORA, 2021, 16 (03) : 337 - 348
  • [9] TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese
    Casanova, Edresson
    Junior, Arnaldo Candido
    Shulby, Christopher
    de Oliveira, Frederico Santos
    Teixeira, Joao Paulo
    Ponti, Moacir Antonelli
    Aluisio, Sandra
    LANGUAGE RESOURCES AND EVALUATION, 2022, 56 (03) : 1043 - 1055