TimeBankPT: A TimeML Annotated Corpus of Portuguese

Authors
Costa, Francisco [1]
Branco, Antonio [1]
Affiliation
[1] Univ Lisbon, P-1699 Lisbon, Portugal
Source
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012
Keywords
Corpora; Temporal Information Extraction; Portuguese
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
In this paper, we introduce TimeBankPT, a TimeML-annotated corpus of Portuguese. It was produced by adapting an existing English resource, namely the data used in the first TempEval challenge. TimeBankPT is the first Portuguese corpus with rich temporal annotations, i.e. it annotates not only temporal expressions but also events and temporal relations. In addition, it was subjected to an automated error-mining procedure that checks the consistency of the annotated temporal relations based on their logical properties. This procedure detected some annotation errors that also affect the original English corpus. The Portuguese language is currently undergoing a spelling reform, and several countries where Portuguese is an official language are in a transitional period in which both the old and the new orthographies are valid. TimeBankPT adopts the new orthography in order to preserve its usefulness in the future. TimeBankPT is freely available for download.
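The abstract does not describe how the consistency check is implemented. As a minimal sketch only, assuming a simplified setting with just the BEFORE and AFTER relation types and a hypothetical triple format `(source, relation, target)`, one logical property that can be checked is irreflexivity: after computing the transitive closure of BEFORE, no event or time expression may end up preceding itself. The function name `find_before_cycles` is hypothetical and is not taken from the paper.

```python
from itertools import product


def find_before_cycles(relations):
    """Hypothetical helper, not the authors' procedure.

    relations: iterable of (a, rel, b) triples with rel in {"BEFORE", "AFTER"}.
    Returns the sorted ids x for which "x BEFORE x" is derivable, i.e. the
    annotations violate the irreflexivity of BEFORE.
    """
    # Normalise: store every AFTER link as the inverse BEFORE link.
    before = set()
    for a, rel, b in relations:
        if rel == "BEFORE":
            before.add((a, b))
        elif rel == "AFTER":
            before.add((b, a))
        else:
            raise ValueError(f"unsupported relation type: {rel}")

    nodes = {x for pair in before for x in pair}

    # Warshall-style transitive closure: a BEFORE k and k BEFORE b => a BEFORE b.
    closure = set(before)
    for k in nodes:
        for a, b in product(nodes, repeat=2):
            if (a, k) in closure and (k, b) in closure:
                closure.add((a, b))

    # Any self-loop in the closure signals a contradiction among the annotations.
    return sorted(x for x in nodes if (x, x) in closure)
```

For example, annotating `e1 BEFORE e2`, `e2 BEFORE t1` and `e1 AFTER t1` yields a cycle, so `find_before_cycles` returns `['e1', 'e2', 't1']`, while the first two links alone return `[]`. A real checker would also have to handle the remaining TempEval relation types (such as OVERLAP and VAGUE), which this sketch deliberately omits.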
Pages: 3727-3734
Number of pages: 8