A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters

Cited by: 15
Authors
Boros, Emanuela [1]
Romero, Veronica [2]
Maarand, Martin [1]
Zenklova, Katerina [3]
Kreckova, Jitka [3]
Vidal, Enrique [2]
Stutzmann, Dominique [4]
Kermorvant, Christopher [1]
Affiliations
[1] TEKLIA, Paris, France
[2] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[3] Narodni Arch, Prague, Czech Republic
[4] IRHT CNRS, Paris, France
Keywords
Named entity recognition; handwritten text recognition; historical document processing; multilingualism
DOI
10.1109/ICFHR2020.2020.00025
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a new corpus of multilingual medieval handwritten charter images, annotated with full transcriptions and named entities. The corpus is used to compare two approaches to named entity recognition in historical document images in several languages: a sequential approach, the more common of the two, which applies handwritten text recognition (HTR) and then named entity recognition (NER); and a combined approach, which transcribes the text line image and extracts the entities simultaneously. Experiments on the charter corpus in Latin, Early New High German and Old Czech for name, date and location recognition show that the combined approach performs better.
Pages: 79-84
Page count: 6
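
The two approaches contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the callable interfaces standing in for the HTR and NER models, and the inline `<label>text</label>` tag format are all assumptions made for illustration, since this record does not describe how the combined model encodes entities.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (label, surface text)

# Hypothetical inline tag format for the combined output: <label>text</label>.
# The paper's actual entity encoding is not given in this record.
TAG_PATTERN = re.compile(r"<(?P<label>\w+)>(?P<text>.*?)</(?P=label)>")


def sequential_pipeline(
    line_image: object,
    htr: Callable[[object], str],
    ner: Callable[[str], List[Entity]],
) -> Tuple[str, List[Entity]]:
    """Sequential approach: recognize the text first, then tag entities in it."""
    text = htr(line_image)
    return text, ner(text)


def combined_pipeline(
    line_image: object,
    tagged_htr: Callable[[object], str],
) -> Tuple[str, List[Entity]]:
    """Combined approach: one model emits the transcription with entity tags
    inline; this helper only separates the plain text from the entities."""
    tagged = tagged_htr(line_image)
    entities = [(m.group("label"), m.group("text")) for m in TAG_PATTERN.finditer(tagged)]
    plain = TAG_PATTERN.sub(lambda m: m.group("text"), tagged)
    return plain, entities
```

With a stub model that returns the invented line `<name>Johannes</name> de <location>Praga</location> anno <date>1380</date>`, `combined_pipeline` yields the plain text `Johannes de Praga anno 1380` and three entities. In the sequential variant, any recognition error made by `htr` is passed on to `ner` unchanged, which is one way the two designs can differ in practice.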