Reading Order Independent Metrics for Information Extraction in Handwritten Documents

Cited by: 0
Authors
Villanova-Aparisi, David [1]
Tarride, Solene [2]
Martinez-Hinarejos, Carlos-D [1]
Romero, Veronica [3]
Kermorvant, Christopher [2]
Pastor-Gadea, Moises [1]
Affiliations
[1] Univ Politecn Valencia, PRHLT Res Ctr, Cami Vera S-N, Valencia 46021, Spain
[2] TEKLIA, Paris, France
[3] Univ Valencia, Dept Informat, Valencia 46010, Spain
Keywords
Information Extraction; Evaluation Metrics; Reading Order; Full Page Recognition; End-to-End Model; RECOGNITION; DISTANCE; CORPUS;
DOI
10.1007/978-3-031-70536-6_12
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance of the systems is usually evaluated with metrics particular to each dataset. Moreover, most of the metrics employed are sensitive to reading order errors. Therefore, they do not reflect the expected final application of the system and introduce biases in more complex documents. In this paper, we propose and publicly release a set of reading order independent metrics tailored to Information Extraction evaluation in handwritten documents. In our experimentation, we perform an in-depth analysis of the behavior of the metrics to recommend what we consider to be the minimal set of metrics to evaluate a task correctly.
Pages: 191-215
Page count: 25
Related papers
50 in total
  • [21] End-to-End Information Extraction in Handwritten Documents: Understanding Paris Marriage Records from 1880 to 1940
    Constum, Thomas
    Preel, Lucas
    Larcher, Theo
    Paquet, Thierry
    Tranouez, Pierrick
    Bree, Sandra
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT III, 2024, 14806 : 195 - 214
  • [22] Script-independent text line segmentation in freestyle handwritten documents
    Li, Yi
    Zheng, Yefeng
    Doermann, David
    Jaeger, Stefan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2008, 30 (08) : 1313 - 1329
  • [23] Information Extraction in Handwritten Marriage Licenses Books
    Romero, Veronica
    Fornes, Alicia
    Granell, Emilio
    Vidal, Enrique
    Sanchez, Joan Andreu
    PROCEEDINGS OF THE 2019 WORKSHOP ON HISTORICAL DOCUMENT IMAGING AND PROCESSING (HIP' 19), 2019, : 66 - 71
  • [24] Arabic Documents Information Retrieval for Printed, Handwritten, and Calligraphy Image
    Al-Barhamtoshy, Hassanin M.
    Jambi, Kamal M.
    Abdou, Sherif M.
    Rashwan, Mohsen A.
    IEEE ACCESS, 2021, 9 : 51242 - 51257
  • [25] Handwritten Documents Text Line Segmentation based on Information Energy
    Boiangiu, C. A.
    Tanase, M. C.
    Ioanitescu, R.
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2014, 9 (01) : 8 - 15
  • [26] Multi-oriented handwritten annotations extraction from scanned documents
    Benjlaiel, Mohamed
    Mullot, Remy
    Alimi, Adel M.
    2014 11TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS (DAS 2014), 2014, : 126 - 130
  • [27] Text line extraction from multi-skewed handwritten documents
    Basu, S.
    Chaudhuri, C.
    Kundu, M.
    Nasipuri, M.
    Basu, D. K.
    PATTERN RECOGNITION, 2007, 40 (06) : 1825 - 1839
  • [28] Lanna Handwritten Character Recognition on Historical Documents Using Feature Extraction
    Khankasikam, Krisda
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY, PTS 1-4, 2013, 263-266 : 2553 - 2560
  • [29] An extraction method of handwritten characters on printed documents by maxout filter networks
    Itoi K.
    Nakashizuka M.
    Journal of the Institute of Image Electronics Engineers of Japan, 2019, 48 (01) : 153 - 160
  • [30] Separator and Content based Approach for Table Extraction in Handwritten Chemistry Documents
    Ghanmi, Nabil
    Belaid, Abdel
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 296 - 300