Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition

Cited by: 5

Authors:

Inkeaw, Papangkorn [1,2]

Udomwong, Piyachat [3]

Chaijaruwanich, Jeerayut [4]

Affiliations:

[1] Chiang Mai Univ, Adv Res Ctr Computat Simulat, Chiang Mai 50200, Thailand

[2] Chiang Mai Univ, Fac Sci, Dept Comp Sci, Chiang Mai 50200, Thailand

[3] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand

[4] Chiang Mai Univ, Dept Comp Sci, Data Sci Res Ctr, Fac Sci, Chiang Mai 50200, Thailand

Source:

KNOWLEDGE-BASED SYSTEMS | 2021 / Vol. 220

Keywords:

Semi-supervised learning; Active learning; Ground truth generation; Handwritten character recognition;

DOI:

10.1016/j.knosys.2021.106953

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A large number of labeled samples are required as training data to construct an efficient recognition mechanism for an optical character recognition system. Although samples of characters can be easily collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset requires a great number of characters to be manually annotated by experts, which is a costly and time-consuming process. To considerably reduce the human effort required in the construction of training datasets, a novel semi-automatic labeling method is proposed in this work under the assumption that there are no initial labeled samples. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first called upon to label a relevant unlabeled sample that is automatically selected from the highest density area of unlabeled samples. Then, the manually annotated label is propagated to the neighbor samples under safe conditions based on sample density and multi-views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure of the proposed method is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable with that of a classifier trained on the actual ground truth. (c) 2021 Elsevier B.V. All rights reserved.
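
The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the density estimate or the "safe conditions", so the choices below are assumptions made for illustration only. Density is taken as the inverse mean k-NN distance averaged over views, and a label is propagated only along edges that appear in the k-NN graph of every view and that lead to a sample of equal or lower density. The names `knn_graph`, `semi_automatic_label`, `views`, and `oracle` are hypothetical.

```python
import numpy as np
from collections import deque


def knn_graph(X, k):
    """Return (indices, distances) of the k nearest neighbors of each row of X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)


def semi_automatic_label(views, oracle, k=5):
    """Label all samples by querying `oracle` and propagating its answers.

    views  : list of (n, d_v) arrays, one per feature representation
    oracle : callable(index) -> non-negative int label (stands in for the expert)
    Returns (labels, number_of_oracle_queries).
    """
    n = views[0].shape[0]
    graphs = [knn_graph(V, k) for V in views]

    # ASSUMPTION: density = inverse mean k-NN distance, averaged over views.
    density = np.mean(
        [1.0 / (dist.mean(axis=1) + 1e-12) for _, dist in graphs], axis=0
    )
    # ASSUMPTION: an edge is "safe" only if it exists in every view's graph.
    shared = [
        set.intersection(*(set(idx[i]) for idx, _ in graphs)) for i in range(n)
    ]

    labels = np.full(n, -1)  # -1 marks an unlabeled sample
    queries = 0
    while (labels == -1).any():
        # Select the unlabeled sample in the densest region and ask the expert.
        unlabeled = np.flatnonzero(labels == -1)
        seed = unlabeled[np.argmax(density[unlabeled])]
        labels[seed] = oracle(seed)
        queries += 1

        # Propagate the label outward along safe edges, never uphill in density.
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in shared[i]:
                if labels[j] == -1 and density[j] <= density[i]:
                    labels[j] = labels[seed]
                    queue.append(j)
    return labels, queries
```

Each pass through the outer loop labels at least the queried sample, so the loop ends once every sample has a label. Samples that no safe edge reaches, such as outliers, are left for a later expert query rather than being given a propagated label.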

Pages: 13
