Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features

Times cited: 1
Authors
Ghosh, Manosij [1]
Ghosh, Kushal Kanti [1]
Bhowmik, Showmik [2]
Sarkar, Ram [1]
Affiliations
[1] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[2] Ghani Khan Choudhury Inst Engn & Technol GKCIET, Dept Comp Sci & Engn, Malda, India
Keywords
Coalition game; Feature selection; Text non-text classification; LBP; Texture feature; Handwritten document; CLASSIFICATION; IDENTIFICATION
DOI
10.1007/s11042-020-09844-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text non-text classification is an important research problem in document image processing, yet it has received little attention, particularly for unconstrained offline handwritten document images. Researchers often employ high-dimensional feature vectors for this task, which increase computation time and storage requirements and can also reduce classification accuracy because of redundant or irrelevant features. Feature selection (FS) algorithms address this by finding the relevant subset of features in the original feature vector. The aim of this paper is two-fold. First, a coalition game based FS technique is applied to find an optimal feature subset for classifying the components of a handwritten document image as either text or non-text. Second, five variants of a popular texture-based feature descriptor, Local Binary Pattern (LBP), along with its basic version, are fed to the FS module to identify only the useful patterns, which pinpoint the image regions that are most informative for this classification task. To the best of our knowledge, the approach is novel in applying a coalition game based FS technique to locate feature-rich regions for text non-text classification. For experimentation, we prepared an in-house dataset, with ground truth information, consisting of 104 handwritten engineering class notes and laboratory copies that include handwritten and printed text, graphical components, and tables. Experimental outcomes confirm that the proposed approach not only reduces the feature dimension significantly but also increases the recognition ability of all six feature vectors.
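The abstract names two ingredients, LBP texture features and coalition game based feature selection, without giving their exact formulation. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes the basic 3x3 LBP operator (none of the five variants), and it stands in for the coalition game with a Monte Carlo estimate of Shapley values, using the training accuracy of a nearest-centroid rule as the coalition value. The function names `lbp_histogram`, `centroid_accuracy`, and `shapley_scores` are hypothetical.

```python
import numpy as np


def lbp_histogram(img):
    """Basic 3x3 LBP: normalized 256-bin histogram of a 2-D grayscale array."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbour offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def centroid_accuracy(X, y, subset):
    """Coalition value (assumed): training accuracy of a nearest-centroid rule."""
    if len(subset) == 0:
        return 0.0  # the empty coalition is worth nothing
    Xs = X[:, subset]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())


def shapley_scores(X, y, n_perm=50, seed=0):
    """Monte Carlo Shapley estimate: mean marginal gain of each feature
    over random orderings in which features join the coalition."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    scores = np.zeros(n_feat)
    for _ in range(n_perm):
        coalition, prev = [], 0.0
        for f in rng.permutation(n_feat):
            coalition.append(int(f))
            value = centroid_accuracy(X, y, coalition)
            scores[f] += value - prev
            prev = value
    return scores / n_perm
```

In such a sketch, each row of `X` would be the `lbp_histogram` of one document component, `y` its text or non-text label, and the histogram bins with the highest scores would be the ones kept. The coalition value and the number of permutations are illustrative choices; any classifier and validation scheme could take their place.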
Pages: 3229-3249
Page count: 21