A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters

Cited by: 15
Authors
Boros, Emanuela [1]
Romero, Veronica [2]
Maarand, Martin [1]
Zenklova, Katerina [3]
Kreckova, Jitka [3]
Vidal, Enrique [2]
Stutzmann, Dominique [4]
Kermorvant, Christopher [1]
Affiliations
[1] TEKLIA, Paris, France
[2] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[3] Narodni Arch, Prague, Czech Republic
[4] IRHT CNRS, Paris, France
Keywords
Named entity recognition; handwritten text recognition; historical document processing; multilingualism
DOI
10.1109/ICFHR2020.2020.00025
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a new corpus of multilingual medieval handwritten charter images, annotated with full transcriptions and named entities. The corpus is used to compare two approaches to named entity recognition in historical document images in several languages: a sequential approach, the more commonly used of the two, which applies handwritten text recognition (HTR) and then named entity recognition (NER) to its output; and a combined approach, which transcribes the text line image and extracts the entities simultaneously. Experiments on the charter corpus in Latin, Early New High German and Old Czech, covering name, date and location recognition, show that the combined approach performs better.
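The contrast between the two approaches in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the inline tag format (`<name>`, `<date>`, `<loc>`), the function names, the toy lexicon and the sample line are all assumptions made for the example, and the recognizers are stand-in stubs rather than trained models.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (label, surface text)

# Assumed tag format for the combined model's output (hypothetical).
TAG_PATTERN = re.compile(r"<(name|date|loc)>(.*?)</\1>")
ANY_TAG = re.compile(r"</?(?:name|date|loc)>")


def sequential_pipeline(
    line_image: bytes,
    htr: Callable[[bytes], str],
    ner: Callable[[str], List[Entity]],
) -> Tuple[str, List[Entity]]:
    """Run HTR first, then run NER on the resulting transcription."""
    text = htr(line_image)
    return text, ner(text)


def combined_pipeline(
    line_image: bytes,
    tagged_htr: Callable[[bytes], str],
) -> Tuple[str, List[Entity]]:
    """Decode one tagged transcription, then split it into text and entities."""
    tagged = tagged_htr(line_image)
    entities = [(m.group(1), m.group(2).strip()) for m in TAG_PATTERN.finditer(tagged)]
    # Drop the tags and normalise whitespace to recover the plain transcription.
    text = re.sub(r"\s+", " ", ANY_TAG.sub(" ", tagged)).strip()
    return text, entities


# Stand-in recognizers (hypothetical stubs; the sample line is made up).
LEXICON = {"Johannes": "name", "Praga": "loc", "1350": "date"}


def toy_htr(_image: bytes) -> str:
    return "Johannes de Praga anno 1350"


def toy_ner(text: str) -> List[Entity]:
    # Dictionary lookup standing in for a real NER model.
    return [(LEXICON[tok], tok) for tok in text.split() if tok in LEXICON]


def toy_tagged_htr(_image: bytes) -> str:
    return "<name>Johannes</name> de <loc>Praga</loc> anno <date>1350</date>"


if __name__ == "__main__":
    print(sequential_pipeline(b"", toy_htr, toy_ner))
    print(combined_pipeline(b"", toy_tagged_htr))
```

In the sequential sketch, any transcription error made by the HTR stage is passed on to the NER stage, whereas the combined sketch reads text and entities from a single decoding pass. The stubs here are error-free, so both calls return the same transcription and entity list.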
Pages: 79-84
Page count: 6
Related papers
50 in total
  • [1] A Named Entity Recognition Model for Medieval Latin Charters
    Chastang, Pierre
    Aguilar, Sergio Torres
    Tannier, Xavier
    DIGITAL HUMANITIES QUARTERLY, 2021, 15 (04)
  • [2] Evaluation of Named Entity Recognition in Handwritten Documents
    Villanova-Aparisi, David
    Martinez-Hinarejos, Carlos-D
    Romero, Veronica
    Pastor-Gadea, Moises
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 568 - 582
  • [3] Named Entity Recognition Approaches
    Mansouri, Alireza
    Affendey, Lilly Suriani
    Mamat, Ali
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2008, 8 (02) : 339 - 344
  • [4] Named Entity Recognition Approaches and Their Comparison for Custom NER Model
    Shelar H.
    Kaur G.
    Heda N.
    Agrawal P.
    Science and Technology Libraries, 2020, 39 (03) : 324 - 337
  • [5] Uzbek news corpus for named entity recognition
    Yusufu, Aizihaierjiang
    Aziz, Kamran
    Yusufu, Aizierguli
    Ainiwaer, Abidan
    Li, Fei
    Ji, Donghong
    LANGUAGE RESOURCES AND EVALUATION, 2024
  • [6] A Twitter Corpus for Named Entity Recognition in Turkish
    Carik, Buse
    Yeniterzi, Reyyan
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022 : 4546 - 4551
  • [7] Thai Nested Named Entity Recognition Corpus
    Buaphet, Weerayut
    Udomcharoenchaikit, Can
    Limkonchotiwat, Peerat
    Rutherford, Attapol T.
    Nutanong, Sarana
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022 : 1473 - 1486
  • [8] A Finnish news corpus for named entity recognition
    Teemu Ruokolainen
    Pekka Kauppinen
    Miikka Silfverberg
    Krister Lindén
    Language Resources and Evaluation, 2020, 54 : 247 - 272
  • [9] A Finnish news corpus for named entity recognition
    Ruokolainen, Teemu
    Kauppinen, Pekka
    Silfverberg, Miikka
    Linden, Krister
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (01) : 247 - 272
  • [10] Named Entity Recognition from Unstructured Handwritten Document Images
    Adak, Chandranath
    Chaudhuri, Bidyut B.
    Blumenstein, Michael
    PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, (DAS 2016), 2016 : 375 - 380