Reading Order Independent Metrics for Information Extraction in Handwritten Documents

Authors
Villanova-Aparisi, David [1]
Tarride, Solene [2]
Martinez-Hinarejos, Carlos-D [1]
Romero, Veronica [3]
Kermorvant, Christopher [2]
Pastor-Gadea, Moises [1]
Affiliations
[1] Univ Politecn Valencia, PRHLT Res Ctr, Cami Vera S-N, Valencia 46021, Spain
[2] TEKLIA, Paris, France
[3] Univ Valencia, Dept Informat, Valencia 46010, Spain
Keywords
Information Extraction; Evaluation Metrics; Reading Order; Full Page Recognition; End-to-End Model; RECOGNITION; DISTANCE; CORPUS;
DOI
10.1007/978-3-031-70536-6_12
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance of the systems is usually evaluated with metrics particular to each dataset. Moreover, most of the metrics employed are sensitive to reading order errors. Therefore, they do not reflect the expected final application of the system and introduce biases in more complex documents. In this paper, we propose and publicly release a set of reading order independent metrics tailored to Information Extraction evaluation in handwritten documents. In our experimentation, we perform an in-depth analysis of the behavior of the metrics to recommend what we consider to be the minimal set of metrics to evaluate a task correctly.
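To illustrate what "reading order independent" can mean in practice, the following is a minimal sketch, not the metric set released by the authors. It assumes each extracted entity is a hypothetical `(category, text)` pair and scores a prediction by multiset overlap with the reference, so permuting the entities leaves the score unchanged. The function name `bag_of_entities_scores` and the sample data are invented for this illustration.

```python
from collections import Counter


def bag_of_entities_scores(predicted, reference):
    """Return (precision, recall, F1) over multisets of (category, text) pairs.

    Illustrative assumption: entities are compared by exact match only,
    and their position in the page or transcription is ignored.
    """
    pred_bag = Counter(predicted)
    ref_bag = Counter(reference)
    # Multiset intersection counts each entity at most as often as it
    # appears in both the prediction and the reference.
    matched = sum((pred_bag & ref_bag).values())
    n_pred = sum(pred_bag.values())
    n_ref = sum(ref_bag.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_ref if n_ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sample data: the same entities in two different orders.
reference = [("name", "Maria"), ("surname", "Garcia"), ("date", "1850")]
shuffled = [("date", "1850"), ("name", "Maria"), ("surname", "Garcia")]
partial = [("name", "Maria"), ("date", "1851")]

print(bag_of_entities_scores(shuffled, reference))
print(bag_of_entities_scores(partial, reference))
```

A sequence-based score computed over the serialized transcription would penalize the shuffled prediction even though every entity is correct, which is the kind of reading order sensitivity the abstract describes.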
Pages: 191-215
Number of pages: 25