Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition

Cited by: 5
Authors
Inkeaw, Papangkorn [1, 2]
Udomwong, Piyachat [3 ]
Chaijaruwanich, Jeerayut [4 ]
Affiliations
[1] Chiang Mai Univ, Adv Res Ctr Computat Simulat, Chiang Mai 50200, Thailand
[2] Chiang Mai Univ, Fac Sci, Dept Comp Sci, Chiang Mai 50200, Thailand
[3] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand
[4] Chiang Mai Univ, Dept Comp Sci, Data Sci Res Ctr, Fac Sci, Chiang Mai 50200, Thailand
Keywords
Semi-supervised learning; Active learning; Ground truth generation; Handwritten character recognition
DOI
10.1016/j.knosys.2021.106953
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A large number of labeled samples is required as training data to construct an efficient recognition mechanism for an optical character recognition system. Although samples of characters can be easily collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset requires a great number of characters to be manually annotated by experts, which is a costly and time-consuming process. To considerably reduce the human effort required in the construction of training datasets, a novel semi-automatic labeling method is proposed in this work under the assumption that there are no initial labeled samples. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first called upon to label a relevant unlabeled sample that is automatically selected from the highest density area of unlabeled samples. Then, the manually annotated label is propagated to the neighbor samples under safe conditions based on sample density and multiple views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure of the proposed method is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable with that of a classifier trained on the actual ground truth. © 2021 Elsevier B.V. All rights reserved.
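The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not give the exact density estimate or the safe propagation conditions. The sketch assumes that density is the inverse mean distance to the k nearest neighbors averaged over views, and that a label is propagated only to samples that are k-nearest neighbors of the queried sample in every view. The names `knn`, `semi_automatic_labeling`, and `oracle` are hypothetical.

```python
import numpy as np

def knn(X, k):
    """Return (indices, distances) of the k nearest neighbors of each row of X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)

def semi_automatic_labeling(views, oracle, k=5):
    """Label all samples by querying `oracle` and propagating its answers.

    views  : list of (n, d_v) arrays, one feature representation per view
    oracle : callable index -> non-negative int label (stands in for the expert)
    Returns (labels, number_of_expert_queries).
    """
    n = views[0].shape[0]
    graphs = [knn(X, k) for X in views]
    # Assumed density estimate: inverse mean k-NN distance, averaged over views.
    density = np.mean(
        [1.0 / (dist.mean(axis=1) + 1e-12) for _, dist in graphs], axis=0
    )
    labels = np.full(n, -1)  # -1 marks an unlabeled sample
    queries = 0
    while (labels == -1).any():
        unlabeled = np.flatnonzero(labels == -1)
        # Ask the expert about the unlabeled sample in the densest region.
        q = unlabeled[np.argmax(density[unlabeled])]
        labels[q] = oracle(q)
        queries += 1
        # Assumed safe condition: propagate only to samples that are
        # neighbors of q in every view.
        shared = set(graphs[0][0][q].tolist())
        for idx, _ in graphs[1:]:
            shared &= set(idx[q].tolist())
        for j in shared:
            if labels[j] == -1:
                labels[j] = labels[q]
    return labels, queries
```

Because every iteration labels at least the queried sample, the loop terminates after at most n expert queries; the stricter the agreement required across views, the fewer labels are propagated per query and the more queries are needed.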
Pages: 13