Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition

Cited by: 5
Authors
Inkeaw, Papangkorn [1, 2]
Udomwong, Piyachat [3]
Chaijaruwanich, Jeerayut [4]
Affiliations
[1] Chiang Mai Univ, Adv Res Ctr Computat Simulat, Chiang Mai 50200, Thailand
[2] Chiang Mai Univ, Fac Sci, Dept Comp Sci, Chiang Mai 50200, Thailand
[3] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand
[4] Chiang Mai Univ, Dept Comp Sci, Data Sci Res Ctr, Fac Sci, Chiang Mai 50200, Thailand
Keywords
Semi-supervised learning; Active learning; Ground truth generation; Handwritten character recognition
DOI
10.1016/j.knosys.2021.106953
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A large number of labeled samples is required as training data to build an efficient recognition mechanism for an optical character recognition (OCR) system. Although character samples can easily be collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset requires a great number of characters to be manually annotated by experts, which is a costly and time-consuming process. To considerably reduce the human effort required to construct training datasets, this work proposes a novel semi-automatic labeling method under the assumption that no initially labeled samples are available. The proposed method performs an iterative procedure on a nearest neighbor graph that views the samples in multiple feature spaces. In each iteration, an expert is first asked to label a relevant unlabeled sample that is automatically selected from the highest-density area of the unlabeled samples. The manually annotated label is then propagated to neighboring samples under safe conditions based on sample density and multiple views. The procedure is repeated until all samples are labeled. The labeling procedure of the proposed method is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and cope with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable with that of a classifier trained on the actual ground truth. (c) 2021 Elsevier B.V. All rights reserved.
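The iterative loop described in the abstract (query the densest unlabeled sample, then propagate its label to "safe" neighbors on a multi-view nearest neighbor graph) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the density score (inverse mean distance to the k nearest neighbors, averaged over views), the safety rule (a neighbor must be among the k nearest neighbors in every feature view and have at least `min_density_ratio` times the queried sample's density), the breadth-first propagation, and the names `knn`, `density`, `semi_auto_label`, and `oracle` are all hypothetical choices.

```python
import math
from collections import deque
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def knn(points: List[Vector], k: int) -> List[set]:
    """Brute-force k-nearest-neighbor index sets (Euclidean) for one view."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in dists[:k]})
    return result


def density(points: List[Vector], neighbors: List[set]) -> List[float]:
    """Hypothetical density score: inverse of mean distance to the k neighbors."""
    scores = []
    for i, p in enumerate(points):
        mean_d = sum(math.dist(p, points[j]) for j in neighbors[i]) / len(neighbors[i])
        scores.append(1.0 / (mean_d + 1e-12))
    return scores


def semi_auto_label(
    views: List[List[Vector]],
    oracle: Callable[[int], str],
    k: int = 3,
    min_density_ratio: float = 0.5,
) -> Tuple[list, int]:
    """Label every sample, asking the oracle (the expert) as rarely as possible.

    views[v][i] is the feature vector of sample i in feature space v.
    Returns the label list and the number of oracle queries.
    """
    n = len(views[0])
    nbrs = [knn(v, k) for v in views]
    per_view = [density(v, nb) for v, nb in zip(views, nbrs)]
    # Assumption: the overall density is the average over all views.
    dens = [sum(d[i] for d in per_view) / len(views) for i in range(n)]

    labels = [None] * n
    queries = 0
    while None in labels:
        # Query the densest sample that is still unlabeled.
        seed = max((i for i in range(n) if labels[i] is None), key=lambda i: dens[i])
        labels[seed] = oracle(seed)
        queries += 1
        floor = min_density_ratio * dens[seed]
        frontier = deque([seed])
        while frontier:
            u = frontier.popleft()
            # Hypothetical safety rule, part 1: neighbors in EVERY view.
            agreed = set.intersection(*(nb[u] for nb in nbrs))
            for j in agreed:
                # Hypothetical safety rule, part 2: skip sparse (outlier-like) samples.
                if labels[j] is None and dens[j] >= floor:
                    labels[j] = labels[seed]
                    frontier.append(j)
    return labels, queries
```

With two tight clusters described in two feature views, two expert queries label all eight samples. An isolated point added to both views is never reached by propagation in this sketch, so it is queried on its own instead of inheriting a possibly wrong label.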
Pages: 13