Classification of incunable glyphs and out-of-distribution detection with joint energy-based models

Cited by: 6
Authors
Kordon, Florian [1]
Weichselbaumer, Nikolaus [2]
Herz, Randall [2]
Mossman, Stephen [3]
Potten, Edward [4]
Seuret, Mathias [1]
Mayr, Martin [1]
Christlein, Vincent [1]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Martensstr 3, D-91058 Erlangen, Germany
[2] Johannes Gutenberg Univ Mainz, Gutenberg Inst Weltliteratur & schriftorientierte, Jakob Welder Weg 18, D-55128 Mainz, Germany
[3] Univ Manchester, Sch Arts Languages & Cultures, Oxford Rd, Manchester M13 9PL, England
[4] Univ York, Ctr Medieval Studies, York YO1 7EP, England
Funding
Arts and Humanities Research Council (UK);
Keywords
Letterpress printing; Glyph extraction; Optical character recognition; Joint energy-based models; OOD detection; NETWORKS; PRODUCTS;
DOI
10.1007/s10032-023-00442-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Optical character recognition (OCR) has proved a powerful tool for the digital analysis of printed historical documents. However, its ability to localize and identify individual glyphs is challenged by the tremendous variety in historical type design, the physicality of the printing process, and the state of conservation. We propose to mitigate these problems by a downstream fine-tuning step that corrects for pathological and undesirable extraction results. We implement this idea by using a joint energy-based model which classifies individual glyphs and simultaneously prunes potential out-of-distribution (OOD) samples like rubrications, initials, or ligatures. During model training, we introduce specific margins in the energy spectrum that aid this separation and explore the glyph distribution's typical set to stabilize the optimization procedure. We observe strong classification at 0.972 AUPRC across 42 lower- and uppercase glyph types on a challenging digital reproduction of Johannes Balbus' Catholicon, matching the performance of purely discriminative methods. At the same time, we achieve OOD detection rates of 0.989 AUPRC and 0.946 AUPRC for OOD 'clutter' and 'ligatures', which substantially improves upon recently proposed OOD detection techniques. The proposed approach can be easily integrated into the postprocessing phase of current OCR to aid reproduction and shape analysis research.
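The idea of classifying glyphs and pruning OOD samples with one model can be illustrated with a minimal sketch. It assumes the common joint energy-based formulation in which a classifier's logits f_k(x) define an energy E(x) = -log sum_k exp(f_k(x)), with low energy indicating an in-distribution input. This is not the authors' implementation: the function names, the example logits, and the threshold value are hypothetical, and the margin-based training described in the abstract is not reproduced here.

```python
import math


def energy(logits):
    """Energy E(x) = -log(sum_k exp(f_k(x))), computed in a numerically stable way."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def classify_or_reject(logits, threshold):
    """Return the arg-max class index, or None if the energy exceeds the threshold.

    The threshold is a hypothetical value that would be tuned on validation data.
    """
    if energy(logits) > threshold:
        return None  # treated as out-of-distribution (e.g. clutter or a ligature)
    return max(range(len(logits)), key=lambda k: logits[k])


# Hypothetical logits for a 3-class toy problem (the paper uses 42 glyph types).
confident = [9.0, 0.5, -1.0]   # one dominant logit: low energy
flat = [0.1, 0.0, -0.1]        # no dominant logit: higher energy

print(classify_or_reject(confident, threshold=-3.0))
print(classify_or_reject(flat, threshold=-3.0))
```

With these toy values the first input is assigned a class and the second is rejected. In practice the threshold would be chosen from held-out in-distribution and OOD samples rather than fixed by hand.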
Pages: 223-240
Number of pages: 18