Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features

Cited by: 1
Authors
Ghosh, Manosij [1]
Ghosh, Kushal Kanti [1]
Bhowmik, Showmik [2]
Sarkar, Ram [1]
Affiliations
[1] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[2] Ghani Khan Choudhury Inst Engn & Technol GKCIET, Dept Comp Sci & Engn, Malda, India
Keywords
Coalition game; Feature selection; Text non-text classification; LBP; Texture feature; Handwritten document; CLASSIFICATION; IDENTIFICATION
DOI
10.1007/s11042-020-09844-z
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text non-text classification is an important research problem in document image processing. Unfortunately, it has received little attention, particularly for unconstrained offline handwritten document images. For text non-text classification, researchers often employ high-dimensional feature vectors, which not only increase computation time and storage requirements but also reduce classification accuracy because of redundant or irrelevant features. This motivates the use of feature selection (FS) algorithms to find the relevant subset of features in the original feature vector. The aim of this paper is two-fold. First, a coalition game based FS technique is applied to find an optimal feature subset for classifying the components of a handwritten document image as either text or non-text. Second, five variants of a popular texture based feature descriptor, Local Binary Pattern (LBP), along with its basic version, are fed to the FS module to identify only the useful patterns, which pinpoint the regions of an image that are most informative for this classification task. To the best of our knowledge, applying a coalition game based FS technique to locate feature-rich regions for text non-text classification is completely novel. For experimentation, we have prepared an in-house dataset, with ground truth information, consisting of 104 handwritten engineering class notes and laboratory copies that include handwritten and printed text, graphical components, tables, etc. Experimental results confirm that the proposed approach not only reduces the feature dimension significantly but also increases the recognition ability of all six feature vectors.
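The abstract names the basic LBP descriptor but does not spell out how it is computed. As a rough illustration only, the following sketch computes a normalised 256-bin histogram of basic 3x3 LBP codes with NumPy. The function name `lbp_histogram`, the clockwise neighbour ordering, and the `>=` threshold convention are assumptions made for this example; it does not reproduce the five LBP variants or the coalition game based selection step described in the paper.

```python
import numpy as np


def lbp_histogram(gray):
    """Return the normalised 256-bin histogram of basic 3x3 LBP codes.

    `gray` is a 2-D grayscale image of at least 3x3 pixels (assumption).
    Border pixels are skipped because they lack a full neighbourhood.
    """
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]

    # Eight neighbours, clockwise from the top-left pixel.
    # The ordering is a convention chosen for this sketch.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]

    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the centre pixels.
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit

    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Each of the 256 bins could then serve as one candidate feature for a selection method to keep or discard; how the paper actually maps LBP patterns to features is not stated in this record.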
Pages: 3229-3249
Page count: 21
Related papers
50 in total
  • [1] Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features
    Manosij Ghosh
    Kushal Kanti Ghosh
    Showmik Bhowmik
    Ram Sarkar
    Multimedia Tools and Applications, 2021, 80 : 3229 - 3249
  • [2] Text/Non-Text Separation from Handwritten Document Images Using LBP Based Features: An Empirical Study
    Ghosh, Sourav
    Lahiri, Dibyadwati
    Bhowmik, Showmik
    Kavallieratou, Ergina
    Sarkar, Ram
    JOURNAL OF IMAGING, 2018, 4 (04)
  • [3] Context Modeling for Text/Non-Text Separation in Freeform Online Handwritten Documents
    Delaye, Adrien
    Liu, Cheng-Lin
    DOCUMENT RECOGNITION AND RETRIEVAL XX, 2013, 8658
  • [4] Text and Non-text Separation in Scanned Color-Official Documents
    Nandedkar, Amit Vijay
    Mukherjee, Jayanta
    Sural, Shamik
    COMPUTER VISION, GRAPHICS, AND IMAGE PROCESSING, ICVGIP 2016, 2017, 10481 : 231 - 242
  • [5] Text/Non-Text Classification in Online Handwritten Documents with Recurrent Neural Networks
    Truyen Van Phan
    Nakagawa, Masaki
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 23 - 28
  • [6] Text/Non-text Classification in Online Handwritten Documents with Conditional Random Fields
    Delaye, Adrien
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2012, 321 : 514 - 521
  • [7] Text and Non-text Segmentation based on Connected Component Features
    Viet Phuong Le
    Nayef, Nibal
    Visani, Muriel
    Ogier, Jean-Marc
    Cao De Tran
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1096 - 1100
  • [8] Text and Non-text Separation in Handwritten Document Images Using Local Binary Pattern Operator
    Bhowmik, Showmik
    Sarkar, Ram
    Nasipuri, Mita
    PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND COMMUNICATION, 2017, 458 : 507 - 515
  • [9] Keyword Spotting in Online Handwritten Documents Containing Text and Non-Text using BLSTM Neural Networks
    Indermuehle, Emanuel
    Frinken, Volkmar
    Fischer, Andreas
    Bunke, Horst
    11TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2011), 2011, : 73 - 77
  • [10] Application of texture-based features for text non-text classification in printed document images with novel feature selection algorithm
    Soulib Ghosh
    S. K. Khalid Hassan
    Ali Hussain Khan
    Ankur Manna
    Showmik Bhowmik
    Ram Sarkar
    Soft Computing, 2022, 26 : 891 - 909