Cluster-based Sample Selection for Document Image Binarization

Cited by: 2
Authors
Krantz, Amandus [1]
Westphal, Florian [1]
Affiliations
[1] Blekinge Inst Technol, Dept Comp Sci, Karlskrona, Sweden
Source
2019 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION WORKSHOPS (ICDARW), VOL 5 | 2019
Keywords
document image binarization; sample selection; neural networks; computer vision; RELATIVE NEIGHBORHOOD GRAPH; COMPETITION;
DOI
10.1109/ICDARW.2019.40080
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The current state of the art in document image binarization, in terms of performance, is to train artificial neural networks on pre-labelled ground truth data. As such, it faces the same issue as other, more conventional classification problems: it requires a large amount of training data. However, unlike those conventional classification problems, document image binarization requires the binarized ground truth to be either manually crafted or estimated, which can be error-prone and time-consuming. This is where sample selection, the act of selecting training samples based on some method or metric, might help. Reducing the size of the training dataset in such a way that binarization performance is not impacted also reduces the time required to create the ground truth. This paper proposes a cluster-based sample selection method that uses image similarity metrics and the relative neighbourhood graph to reduce the underlying redundancy of the dataset. The method, implemented with affinity propagation and the structural similarity index, reduces the training dataset by 49.57% on average while reducing binarization performance by only 0.55%.
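The abstract names the structural similarity index and the relative neighbourhood graph as building blocks but does not spell out the procedure. The following is a minimal sketch of those two pieces only, not the authors' implementation. It assumes flat grayscale patches of equal size, uses a simplified single-window SSIM (the standard index averages over local windows), and takes 1 - SSIM as the distance. The function names and the toy patches are hypothetical, and the affinity propagation clustering step is omitted.

```python
from itertools import combinations


def _mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM of two equally sized flat grayscale patches.

    Assumption: one global window instead of the usual sliding local windows.
    """
    c1 = (0.01 * data_range) ** 2  # stabiliser for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabiliser for the contrast/structure term
    mx, my = _mean(x), _mean(y)
    vx = _mean([(a - mx) ** 2 for a in x])
    vy = _mean([(b - my) ** 2 for b in y])
    cov = _mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def relative_neighbourhood_graph(dist):
    """Edges (p, q) of the relative neighbourhood graph of a distance matrix.

    p and q are joined unless some third point r is strictly closer to both
    of them than they are to each other.
    """
    n = len(dist)
    edges = []
    for p, q in combinations(range(n), 2):
        blocked = any(
            max(dist[p][r], dist[q][r]) < dist[p][q]
            for r in range(n)
            if r != p and r != q
        )
        if not blocked:
            edges.append((p, q))
    return edges


# Hypothetical toy data: three 4-pixel patches.
patches = [
    [0, 0, 255, 255],
    [0, 10, 245, 255],
    [255, 255, 0, 0],
]
dist = [[1.0 - global_ssim(a, b) for b in patches] for a in patches]
edges = relative_neighbourhood_graph(dist)
```

In a sample selection setting, a graph of this kind could be used to identify which samples are mutual near neighbours and therefore potentially redundant. How the paper combines it with the clusters is not stated in the abstract.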
Pages: 47-52
Number of pages: 6