Cluster-based Sample Selection for Document Image Binarization

Cited by: 2
Authors
Krantz, Amandus [1 ]
Westphal, Florian [1 ]
Affiliations
[1] Blekinge Inst Technol, Dept Comp Sci, Karlskrona, Sweden
Source
2019 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION WORKSHOPS (ICDARW), VOL 5 | 2019
Keywords
document image binarization; sample selection; neural networks; computer vision; RELATIVE NEIGHBORHOOD GRAPH; COMPETITION;
DOI
10.1109/ICDARW.2019.40080
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The best-performing approach to document image binarization is currently to train artificial neural networks on pre-labelled ground truth data. As such, it faces the same issue as other, more conventional classification problems: it requires a large amount of training data. However, unlike those problems, document image binarization requires the binarized ground truth to be either crafted manually or estimated, which can be error-prone and time-consuming. This is where sample selection, the act of selecting training samples based on some method or metric, might help. By reducing the size of the training dataset in such a way that binarization performance is not impacted, the time required to create the ground truth is also reduced. This paper proposes a cluster-based sample selection method that uses image similarity metrics and the relative neighbourhood graph to reduce the underlying redundancy of the dataset. The method, implemented with affinity propagation and the structural similarity index, reduces the training dataset by 49.57% on average while reducing binarization performance by only 0.55%.
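The abstract names three building blocks: an image similarity metric, a clustering step, and the relative neighbourhood graph (RNG). The sketch below shows how such pieces could fit together; it is not the authors' implementation. It makes several assumptions: the similarity is a simplified single-window SSIM rather than the usual sliding-window variant, the cluster labels are taken as given (in the paper they would come from affinity propagation), and the selection rule "keep samples with an RNG neighbour in another cluster" is one plausible reading, not something the abstract states. All function names are hypothetical.

```python
from itertools import combinations


def global_ssim(a, b, data_range=255.0):
    """Simplified SSIM over one window covering two equally sized
    flat lists of grey values (assumption: no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def ssim_distance_matrix(patches):
    """Pairwise dissimilarity, taken here as 1 - SSIM (an assumption)."""
    return [[1.0 - global_ssim(p, q) for q in patches] for p in patches]


def relative_neighbourhood_graph(dist):
    """Edge (i, j) exists unless some k is closer to both i and j
    than i and j are to each other."""
    n = len(dist)
    edges = set()
    for i, j in combinations(range(n), 2):
        blocked = any(
            max(dist[i][k], dist[j][k]) < dist[i][j]
            for k in range(n)
            if k != i and k != j
        )
        if not blocked:
            edges.add((i, j))
    return edges


def select_border_samples(dist, labels):
    """Hypothetical selection rule: keep samples that share an RNG
    edge with a sample from a different cluster."""
    keep = set()
    for i, j in relative_neighbourhood_graph(dist):
        if labels[i] != labels[j]:
            keep.update((i, j))
    return sorted(keep)


if __name__ == "__main__":
    # Toy data: six points on a line, forming two obvious clusters.
    pos = [0, 1, 2, 10, 11, 12]
    dist = [[abs(p - q) for q in pos] for p in pos]
    labels = [0, 0, 0, 1, 1, 1]  # assumed output of a clustering step
    print(select_border_samples(dist, labels))
```

In the toy example only the two points facing each other across the gap are kept, which illustrates how redundant interior samples could be dropped. For real image patches, `ssim_distance_matrix` would supply `dist`, and library implementations of SSIM and affinity propagation would likely replace the simplified pieces here.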
Pages: 47-52
Number of pages: 6