Development of an effective character segmentation and efficient feature extraction technique for Malayalam character recognition from palm leaf manuscripts

Cited by: 3
Authors
Sudarsan, Dhanya [1 ]
Sankar, Deepa [1 ]
Affiliations
[1] Cochin Univ Sci & Technol, Sch Engn Elect & Commun, Kochi, India
Keywords
Character segmentation; character recognition; base classifiers; KNN; Bayesian; decision tree; feature extraction; Malayalam palm leaf manuscripts; handwritten Bangla character; neural network; classification
DOI
10.1007/s12046-023-02181-5
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
This paper develops a novel character segmentation and feature extraction technique for old Malayalam palm-leaf manuscripts. The generic segmentation algorithm is fine-tuned to address the language-specific properties of Malayalam characters written in old palm-leaf manuscripts. Since no major work has been reported on character recognition from old Malayalam palm-leaf manuscripts, the paper gives a clear insight into how various feature extractors perform in recognizing Malayalam characters, which is a prerequisite for analyzing the performance of deep learning neural networks on the same task. To this end, it presents an in-depth analysis of existing feature extraction techniques on the base classifiers for Malayalam character recognition from palm-leaf manuscripts. The paper also aims to identify the best feature extractor-classifier pair for character recognition from old Malayalam palm-leaf manuscript images. Initially, the color palm-leaf manuscript image is preprocessed using a linear block-by-block transformation, Niblack's technique, and morphological operations for noise removal and binarization. The proposed feature extraction technique combines Log-Gabor with uniform rotation-invariant local binary patterns (LBP). Log-Gabor encodes a natural image in the best possible way and efficiently addresses the properties of handwritten characters (similarity, overlapping characters, uneven background color, and foreground-background contrast), while uniform rotation-invariant LBP compensates for Log-Gabor's deficiency in invariant text analysis. This combination proved to be the best feature extractor for the purpose, with an accuracy of 95.57%.
A stacked ResNet (convolutional neural network) architecture combined with a Long Short-Term Memory (LSTM) architecture is used to classify the different characters present in the manuscript.
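
To make the LBP half of the feature extractor described in the abstract more concrete, the following is a minimal sketch of a uniform rotation-invariant LBP histogram. It is not the authors' implementation: the 8-neighbour square ring (instead of an interpolated circle), the function names `riu2_code` and `riu2_histogram`, and the nested-list image format are all assumptions made for illustration, and the Log-Gabor part is omitted.

```python
from typing import List, Sequence

# The 8 neighbours of a pixel in circular order, clockwise from top-left.
# Assumption: the square 3x3 ring stands in for a circle of radius 1.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]
NUM_NEIGHBOURS = len(NEIGHBOUR_OFFSETS)


def riu2_code(image: Sequence[Sequence[int]], row: int, col: int) -> int:
    """Return the uniform rotation-invariant LBP code of one interior pixel."""
    center = image[row][col]
    # Threshold each neighbour against the center pixel.
    bits = [1 if image[row + dr][col + dc] >= center else 0
            for dr, dc in NEIGHBOUR_OFFSETS]
    # Count 0/1 transitions around the circular pattern.
    transitions = sum(bits[i] != bits[(i + 1) % NUM_NEIGHBOURS]
                      for i in range(NUM_NEIGHBOURS))
    # Uniform patterns (at most 2 transitions) are labeled by their number
    # of set bits; all non-uniform patterns share one extra label.
    return sum(bits) if transitions <= 2 else NUM_NEIGHBOURS + 1


def riu2_histogram(image: Sequence[Sequence[int]]) -> List[float]:
    """Return the normalized 10-bin histogram of codes over interior pixels."""
    hist = [0] * (NUM_NEIGHBOURS + 2)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[riu2_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(hist)


if __name__ == "__main__":
    # A bright top edge and the same edge rotated by 90 degrees
    # receive the same code, which is the rotation invariance at work.
    top_edge = [[9, 9, 9], [1, 5, 1], [1, 1, 1]]
    right_edge = [[1, 1, 9], [1, 5, 9], [1, 1, 9]]
    print(riu2_code(top_edge, 1, 1), riu2_code(right_edge, 1, 1))
```

Because the code depends only on how many neighbours exceed the center and on whether the pattern is uniform, rotating a stroke does not change its label. In a pipeline like the one the abstract outlines, a histogram of this kind would presumably be concatenated with Log-Gabor responses before being passed to a base classifier such as KNN.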
Pages: 21