A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters

Cited by: 15
Authors
Boros, Emanuela [1]
Romero, Veronica [2]
Maarand, Martin [1]
Zenklova, Katerina [3]
Kreckova, Jitka [3]
Vidal, Enrique [2]
Stutzmann, Dominique [4]
Kermorvant, Christopher [1]
Affiliations
[1] TEKLIA, Paris, France
[2] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[3] Narodni Arch, Prague, Czech Republic
[4] IRHT CNRS, Paris, France
Keywords
Named entity recognition; handwritten text recognition; historical document processing; multilingualism
DOI
10.1109/ICFHR2020.2020.00025
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a new corpus of multilingual medieval handwritten charter images, annotated with full transcriptions and named entities. The corpus is used to compare two approaches to named entity recognition in historical document images in several languages: a sequential approach, the more common of the two, which applies handwritten text recognition (HTR) followed by named entity recognition (NER); and a combined approach, which transcribes the text-line image and extracts the entities simultaneously. Experiments on the charter corpus in Latin, Early New High German, and Old Czech for name, date, and location recognition show that the combined approach performs better.
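The difference between the two approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how entities are encoded, so the inline `<name>`, `<date>`, and `<location>` tags are a hypothetical format, and `htr`, `ner`, and `joint_model` are placeholder callables rather than the authors' models.

```python
import re

# Hypothetical inline tag format (an assumption; the abstract does not
# specify how the combined model marks entities in its output).
TAG_PATTERN = re.compile(r"<(name|date|location)>(.*?)</\1>")


def split_combined_output(tagged_line):
    """Split a tagged transcription into plain text and (label, text) pairs."""
    entities = [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(tagged_line)]
    plain_text = TAG_PATTERN.sub(lambda m: m.group(2), tagged_line)
    return plain_text, entities


def sequential_pipeline(line_image, htr, ner):
    """Sequential approach: transcribe first, then tag the transcription."""
    text = htr(line_image)  # any HTR errors are passed on to the NER step
    return text, ner(text)


def combined_pipeline(line_image, joint_model):
    """Combined approach: a single model emits text with entity tags."""
    return split_combined_output(joint_model(line_image))
```

With stub callables standing in for trained models, both pipelines return the same shape of result, a transcription plus a list of labelled entities, so they can be scored with the same metric.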
Pages: 79-84
Number of pages: 6
Related papers
50 items in total
  • [31] Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT
    Jarrar, Mustafa
    Khalilia, Mohammed
    Ghanem, Sana
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3626 - 3636
  • [32] A web-based Bengali news corpus for named entity recognition
    Asif Ekbal
    Sivaji Bandyopadhyay
    Language Resources and Evaluation, 2008, 42 : 173 - 182
  • [33] Urdu Named Entity Recognition: Corpus Generation and Deep Learning Applications
    Kanwal, Safia
    Malik, Kamran
    Shahzad, Khurram
    Aslam, Faisal
    Nawaz, Zubair
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (01)
  • [34] A Method for Building a Labeled Named Entity Recognition Corpus Using Ontologies
    Ngoc-Trinh Vu
    Van-Hien Tran
    Thi-Huyen-Trang Doan
    Hoang-Quynh Le
    Mai-Vu Tran
    ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING, 2015, 358 : 141 - 149
  • [35] Using corpus-derived name lists for named entity recognition
    Stevenson, M
    Gaizauskas, R
    6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000, : 290 - 295
  • [36] A web-based Bengali news corpus for named entity recognition
    Ekbal, Asif
    Bandyopadhyay, Sivaji
    LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (02) : 173 - 182
  • [37] Big Data and Named Entity Recognition Approaches for Urdu Language
    Jamil, Qudsia
    Zafar, Muhammad Rehman
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2018, 4 (16)
  • [38] Named Entity Recognition in Turkish with Bayesian Learning and Hybrid Approaches
    RehaYavuz, Sermet
    Kucuk, Dilek
    Yazici, Adnan
    INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 129 - 138
  • [39] Comparison of named entity recognition methodologies in biomedical documents
    Song, Hye-Jeong
    Jo, Byeong-Cheol
    Park, Chan-Young
    Kim, Jong-Dae
    Kim, Yu-Seop
    BIOMEDICAL ENGINEERING ONLINE, 2018, 17
  • [40] Clinical named-entity recognition: A short comparison
    Lossio-Ventura, Juan Antonio
    Boussard, Sebastien
    Morzan, Juandiego
    Hernandez-Boussard, Tina
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 1548 - 1550