Reading Order Independent Metrics for Information Extraction in Handwritten Documents

Cited by: 0
Authors
Villanova-Aparisi, David [1]
Tarride, Solene [2]
Martinez-Hinarejos, Carlos-D [1]
Romero, Veronica [3]
Kermorvant, Christopher [2]
Pastor-Gadea, Moises [1]
Affiliations
[1] Univ Politecn Valencia, PRHLT Res Ctr, Cami Vera S-N, Valencia 46021, Spain
[2] TEKLIA, Paris, France
[3] Univ Valencia, Dept Informat, Valencia 46010, Spain
Keywords
Information Extraction; Evaluation Metrics; Reading Order; Full Page Recognition; End-to-End Model; RECOGNITION; DISTANCE; CORPUS
DOI
10.1007/978-3-031-70536-6_12
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance of the systems is usually evaluated with metrics particular to each dataset. Moreover, most of the metrics employed are sensitive to reading order errors. Therefore, they do not reflect the expected final application of the system and introduce biases in more complex documents. In this paper, we propose and publicly release a set of reading order independent metrics tailored to Information Extraction evaluation in handwritten documents. In our experimentation, we perform an in-depth analysis of the behavior of the metrics to recommend what we consider to be the minimal set of metrics to evaluate a task correctly.
Pages: 191-215
Number of pages: 25