Classification of incunable glyphs and out-of-distribution detection with joint energy-based models

Cited by: 6
Authors
Kordon, Florian [1]
Weichselbaumer, Nikolaus [2]
Herz, Randall [2]
Mossman, Stephen [3]
Potten, Edward [4]
Seuret, Mathias [1]
Mayr, Martin [1]
Christlein, Vincent [1]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Martensstr 3, D-91058 Erlangen, Germany
[2] Johannes Gutenberg Univ Mainz, Gutenberg Inst Weltliteratur & schriftorientierte, Jakob Welder Weg 18, D-55128 Mainz, Germany
[3] Univ Manchester, Sch Arts Languages & Cultures, Oxford Rd, Manchester M13 9PL, England
[4] Univ York, Ctr Medieval Studies, York YO1 7EP, England
Funding
Arts and Humanities Research Council (UK)
Keywords
Letterpress printing; Glyph extraction; Optical character recognition; Joint energy-based models; OOD detection; NETWORKS; PRODUCTS;
DOI
10.1007/s10032-023-00442-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Optical character recognition (OCR) has proved a powerful tool for the digital analysis of printed historical documents. However, its ability to localize and identify individual glyphs is challenged by the tremendous variety in historical type design, the physicality of the printing process, and the state of conservation. We propose to mitigate these problems by a downstream fine-tuning step that corrects for pathological and undesirable extraction results. We implement this idea by using a joint energy-based model which classifies individual glyphs and simultaneously prunes potential out-of-distribution (OOD) samples like rubrications, initials, or ligatures. During model training, we introduce specific margins in the energy spectrum that aid this separation and explore the glyph distribution's typical set to stabilize the optimization procedure. We observe strong classification at 0.972 AUPRC across 42 lower- and uppercase glyph types on a challenging digital reproduction of Johannes Balbus' Catholicon, matching the performance of purely discriminative methods. At the same time, we achieve OOD detection rates of 0.989 AUPRC and 0.946 AUPRC for OOD 'clutter' and 'ligatures' which substantially improves upon recently proposed OOD detection techniques. The proposed approach can be easily integrated into the postprocessing phase of current OCR to aid reproduction and shape analysis research.
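The abstract does not spell out the model's equations, so the following is only a minimal sketch of the general idea behind energy-based OOD pruning, assuming the common formulation in which a classifier's logits f(x) define an energy E(x) = -log Σ_y exp(f(x)[y]). Low energy is read as in-distribution and high energy as a candidate for rejection. The function names, the example logits, and the threshold value are hypothetical; the energy margins and typical-set exploration used during training in the paper are not reproduced here.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(z))) over a list of logits."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def energy(logits):
    """Energy score: negative log-sum-exp of the class logits."""
    return -log_sum_exp(logits)


def classify_or_reject(logits, energy_threshold):
    """Return (label, energy); label is None when the sample is rejected as OOD.

    energy_threshold is a hypothetical cut-off that would normally be
    tuned on held-out data.
    """
    e = energy(logits)
    if e > energy_threshold:
        return None, e
    # Softmax over the logits gives p(y | x); pick the most probable class.
    log_z = -e
    probs = [math.exp(z - log_z) for z in logits]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, e


# Hypothetical logits: one confident glyph prediction, one flat "clutter" case.
confident_logits = [9.0, 0.5, -1.0]
flat_logits = [0.1, 0.0, -0.1]
threshold = -3.0  # assumed value, for illustration only

glyph_label, glyph_energy = classify_or_reject(confident_logits, threshold)
clutter_label, clutter_energy = classify_or_reject(flat_logits, threshold)
```

In this sketch the confident sample falls below the threshold and is assigned a class, while the flat-logit sample has higher energy and is rejected. In practice the threshold trades off how many true glyphs are kept against how much clutter or how many ligatures are pruned.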
Pages: 223-240
Page count: 18