Reading Order Independent Metrics for Information Extraction in Handwritten Documents

Cited by: 0

Authors:

Villanova-Aparisi, David [1]

Tarride, Solene [2]

Martinez-Hinarejos, Carlos-D [1]

Romero, Veronica [3]

Kermorvant, Christopher [2]

Pastor-Gadea, Moises [1]

Affiliations:

[1] Univ Politecn Valencia, PRHLT Res Ctr, Cami Vera S-N, Valencia 46021, Spain

[2] TEKLIA, Paris, France

[3] Univ Valencia, Dept Informat, Valencia 46010, Spain

Source:

DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II | 2024 / Vol. 14805

Keywords:

Information Extraction; Evaluation Metrics; Reading Order; Full Page Recognition; End-to-End Model; RECOGNITION; DISTANCE; CORPUS;

DOI:

10.1007/978-3-031-70536-6_12

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance of the systems is usually evaluated with metrics particular to each dataset. Moreover, most of the metrics employed are sensitive to reading order errors. Therefore, they do not reflect the expected final application of the system and introduce biases in more complex documents. In this paper, we propose and publicly release a set of reading order independent metrics tailored to Information Extraction evaluation in handwritten documents. In our experimentation, we perform an in-depth analysis of the behavior of the metrics to recommend what we consider to be the minimal set of metrics to evaluate a task correctly.
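To illustrate what reading order independence means for an evaluation metric, the following is a minimal sketch and not the metric set released by the authors. It assumes, purely for illustration, that a page is represented as a list of (category, text) entity pairs; the function name `bag_f1` and the sample entities are hypothetical. It contrasts a position-wise comparison, which penalizes a correct but reordered hypothesis, with a multiset comparison, which does not.

```python
from collections import Counter


def bag_f1(reference, hypothesis):
    """Order-independent precision, recall and F1 over (category, text) pairs.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    ref, hyp = Counter(reference), Counter(hypothesis)
    matched = sum((ref & hyp).values())  # size of the multiset intersection
    precision = matched / sum(hyp.values()) if hyp else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Made-up sample data: the same three entities in two reading orders.
reference = [("name", "Joan"), ("surname", "Roca"), ("occupation", "pages")]
shuffled = [("occupation", "pages"), ("name", "Joan"), ("surname", "Roca")]
partial = [("name", "Joan"), ("surname", "Rosa")]

# A position-wise comparison counts entities that agree at the same index.
positional_matches = sum(r == h for r, h in zip(reference, shuffled))

print(positional_matches)           # position-wise agreement on the shuffled page
print(bag_f1(reference, shuffled))  # multiset agreement on the shuffled page
print(bag_f1(reference, partial))   # multiset agreement with one wrong entity
```

In this sketch the shuffled hypothesis contains every reference entity, so the multiset score is perfect while the position-wise comparison finds no agreement at all; this is the kind of bias toward reading order that the abstract describes.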


Pages: 191-215

Page count: 25

50 items in total

[31] Text Line Extraction from Multi-skewed Handwritten Documents
Jiang Yong
Chen Xiaojing
PROCEEDINGS OF THE 27TH CHINESE CONTROL CONFERENCE, VOL 4, 2008, : 412 - +
[32] Extraction of chemical information from documents
Villar, Hugo O.
Betancort, Juan
Hansen, Mark R.
ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2010, 240
[33] Editorial: Information extraction for health documents
Mensa, Enrico
Fernandez, Paloma Martinez
Roller, Roland
Radicioni, Daniele P.
FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2023, 6
[34] Information Extraction from Legal Documents
Cheng, Tin Tin
Cua, Jeffrey Leonard
Tan, Mark Davies
Yao, Kenneth Gerard
Roxas, Rachel Edita
2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 157 - +
[35] Multi-Oriented Text Line Extraction from Handwritten Arabic Documents
Ouwayed, Nazih
Belaid, Abdel
PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, 2008, : 339 - 346
[36] A Thresholding Approach for Text Extraction in Handwritten Historical Documents using Adaptive Morphology
Roy, Bishakha
Chatterjee, Rohit Kamal
2014 FOURTH INTERNATIONAL CONFERENCE OF EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2014, : 198 - 203
[37] A general approach for multi-oriented text line extraction of handwritten documents
Nazih Ouwayed
Abdel Belaïd
International Journal on Document Analysis and Recognition (IJDAR), 2012, 15 : 297 - 314
[38] A general approach for multi-oriented text line extraction of handwritten documents
Ouwayed, Nazih
Belaid, Abdel
INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2012, 15 (04) : 297 - 314
[39] Clustering Web Documents with Tables for Information Extraction
Shchekotykhin, Kostyantyn
Jannach, Dietmar
Friedrich, Gerhard
K-CAP'07: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON KNOWLEDGE CAPTURE, 2007, : 169 - 170
[40] Semantic information extraction from Tamil documents
Pandian, S. Lakshmana
Devakumar, J.
Geetha, T.V.
International Journal of Metadata, Semantics and Ontologies, 2008, 3 (03) : 226 - 232
