TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus

Authors
Elena Álvarez-Mellado
María Luisa Díez-Platas
Pablo Ruiz-Fabo
Helena Bermúdez
Salvador Ros
Elena González-Blanco
Affiliations
[1] UNED University, Digital Humanities Innovation Lab (LINHD), School of Computer Science
[2] CoverWallet
Source
Language Resources and Evaluation, 2021, 55 (02)
Keywords
Named-entity annotation; Annotation scheme; Historical NER; Medieval named entities; Medieval Spanish corpus
DOI
Not available
Abstract
Medieval documents are a rich source of historical data, and performing named-entity recognition (NER) on them can yield valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (e.g. journalistic text), and general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when traditional NER categories are applied to such a corpus, and we propose a novel humanist-friendly, TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.
Pages: 525-549
Page count: 24
Related papers
27 in total
  • [1] TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus
    Alvarez-Mellado, Elena
    Diez-Platas, Maria Luisa
    Ruiz-Fabo, Pablo
    Bermudez, Helena
    Ros, Salvador
    Gonzalez-Blanco, Elena
    LANGUAGE RESOURCES AND EVALUATION, 2021, 55 (02) : 525 - 549
  • [2] People and Places of Historical Europe: Bootstrapping Annotation Pipeline and a New Corpus of Named Entities in Late Medieval Texts
    Novotny, Vit
    Luger, Kristyna
    Stefanik, Michal
    Vrabcova, Tereza
    Horak, Ales
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 14104 - 14113
  • [3] Towards the Annotation of Named Entities in the National Corpus of Polish
    Savary, Agata
    Waszczuk, Jakub
    Przepiorkowski, Adam
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010
  • [4] Annotation tools for syntax and named entities in the National Corpus of Polish
    Waszczuk, Jakub
    Glowinska, Katarzyna
    Savary, Agata
    Przepiorkowski, Adam
    Lenart, Michal
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2013, 5 (02) : 103 - 122
  • [5] Named Entities in Medical Case Reports: Corpus and Experiments
    Schulz, Sarah
    Seva, Jurica
    Rodriguez, Samuel
    Ostendorff, Malte
    Rehm, Georg
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4495 - 4500
  • [6] Annotation Scheme and Specification for Named Entities and Relations on Chinese Medical Knowledge Graph
    Yue, Donghui
    Zhang, Kunli
    Zhuang, Lei
    Zhao, Xu
    Byambasuren, Odmaa
    Zan, Hongying
    CHINESE LEXICAL SEMANTICS (CLSW 2019), 2020, 11831 : 563 - 574
  • [7] Medieval Spanish (12th-15th centuries) named entity recognition and attribute annotation system based on contextual information
    Diez Platas, Ma Luisa
    Ros Munoz, Salvador
    Gonzalez-Blanco, Elena
    Ruiz Fabo, Pablo
    Alvarez Mellado, Elena
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2021, 72 (02) : 224 - 238
  • [8] A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters
    Boros, Emanuela
    Romero, Veronica
    Maarand, Martin
    Zenklova, Katerina
    Kreckova, Jitka
    Vidal, Enrique
    Stutzmann, Dominique
    Kermorvant, Christopher
    2020 17TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2020), 2020, : 79 - 84
  • [9] Thai Named Entity Corpus Annotation Scheme and Self Verification by BiLSTM-CNN-CRF
    Sornlertlamvanich, Virach
    Suriyachay, Kitiya
    Charoenporn, Thatsanee
    HUMAN LANGUAGE TECHNOLOGY: CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2019, 2022, 13212 : 143 - 160
  • [10] Seville (8th - 15th centuries). A corpus of medieval Spanish inscriptions
    Rodriguez Suarez, Natalia
    DOCUMENTA ET INSTRUMENTA, 2023, 21 : 287 - 289