Reading Order Independent Metrics for Information Extraction in Handwritten Documents

Cited by: 0
Authors
Villanova-Aparisi, David [1]
Tarride, Solene [2]
Martinez-Hinarejos, Carlos-D [1]
Romero, Veronica [3]
Kermorvant, Christopher [2]
Pastor-Gadea, Moises [1]
Affiliations
[1] Univ Politecn Valencia, PRHLT Res Ctr, Cami Vera S-N, Valencia 46021, Spain
[2] TEKLIA, Paris, France
[3] Univ Valencia, Dept Informat, Valencia 46010, Spain
Keywords
Information Extraction; Evaluation Metrics; Reading Order; Full Page Recognition; End-to-End Model; RECOGNITION; DISTANCE; CORPUS;
DOI
10.1007/978-3-031-70536-6_12
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Information Extraction processes in handwritten documents tend to rely on obtaining an automatic transcription and performing Named Entity Recognition (NER) over such transcription. For this reason, in publicly available datasets, the performance of the systems is usually evaluated with metrics particular to each dataset. Moreover, most of the metrics employed are sensitive to reading order errors. Therefore, they do not reflect the expected final application of the system and introduce biases in more complex documents. In this paper, we propose and publicly release a set of reading order independent metrics tailored to Information Extraction evaluation in handwritten documents. In our experimentation, we perform an in-depth analysis of the behavior of the metrics to recommend what we consider to be the minimal set of metrics to evaluate a task correctly.
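The abstract does not define the proposed metrics, so the sketch below is only a minimal illustration of the general idea, not the authors' released metrics. It assumes entities are represented as hypothetical (tag, text) pairs and contrasts an order-sensitive measure (Levenshtein distance over the sequence of predicted entities) with an order-independent one (F1 over entities treated as a multiset). The function names, tags and sample values are all hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Order-sensitive: Levenshtein distance between two sequences of entities."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            # deletion, insertion, substitution (cost 0 if the entities match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def bag_f1(reference, hypothesis):
    """Order-independent: F1 over entities compared as multisets of (tag, text) pairs."""
    ref, hyp = Counter(reference), Counter(hypothesis)
    matched = sum((ref & hyp).values())  # multiset intersection ignores order
    precision = matched / sum(hyp.values()) if hyp else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: same entities, but the page regions were read in a different order
reference = [("name", "Maria"), ("place", "Valencia"), ("date", "1890")]
hypothesis = [("date", "1890"), ("name", "Maria"), ("place", "Valencia")]

print(edit_distance(reference, hypothesis))  # 2
print(bag_f1(reference, hypothesis))  # 1.0
```

Here the sequence-based distance reports two errors even though every entity was extracted correctly, while the multiset-based F1 is unaffected by the reading order. A real metric for this task would also need to handle partial textual matches, which this sketch ignores.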
Pages: 191-215
Page count: 25