Coalition game based feature selection for text non-text separation in handwritten documents using LBP based features

Cited by: 1
Authors
Ghosh, Manosij [1]
Ghosh, Kushal Kanti [1]
Bhowmik, Showmik [2]
Sarkar, Ram [1]
Affiliations
[1] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[2] Ghani Khan Choudhury Inst Engn & Technol GKCIET, Dept Comp Sci & Engn, Malda, India
Keywords
Coalition game; Feature selection; Text non-text classification; LBP; Texture feature; Handwritten document; CLASSIFICATION; IDENTIFICATION;
DOI
10.1007/s11042-020-09844-z
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text non-text classification is an important research problem in document image processing. Yet the topic has been largely ignored, particularly for unconstrained offline handwritten document images. For text non-text classification, researchers often employ high-dimensional feature vectors, which not only increase computation time and storage requirements but can also reduce classification accuracy because of redundant or irrelevant features. This motivates the use of feature selection (FS) algorithms to find the relevant subset of features in the original feature vector. In this paper, our aim is two-fold. First, we apply a coalition game based FS technique to find an optimal feature subset for classifying the components present in a handwritten document image as either text or non-text. Second, five variants of a popular texture based feature descriptor, Local Binary Pattern (LBP), along with its basic version, are fed to the FS module to identify only the useful patterns, which pinpoint the regions of an image that are most informative for this classification task. To the best of our knowledge, applying a coalition game based FS technique to locate the feature-rich regions used for text non-text classification is completely novel. For experimentation, we have prepared an in-house dataset, with ground truth information, consisting of 104 handwritten engineering class notes and laboratory copies that include handwritten and printed text, graphical components, tables, etc. Experimental outcomes confirm that the proposed approach not only reduces the feature dimension significantly but also increases the recognition ability of all six feature vectors.
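For readers unfamiliar with the descriptor named in the abstract, the following is a minimal sketch of the basic 3x3 Local Binary Pattern and its 256-bin histogram. It is not the authors' implementation: the abstract does not specify the five LBP variants, the neighbour ordering, or the coalition game payoff, so the function names `lbp_codes` and `lbp_histogram`, the clockwise bit order, and the `>=` comparison are all assumptions made for illustration.

```python
def lbp_codes(image):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D grayscale image.

    `image` is a list of equal-length rows of integer intensities.
    The bit order below (clockwise from top-left) is an illustrative assumption.
    """
    height, width = len(image), len(image[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the center sets its bit.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0] * 256
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist
```

Each of the 256 histogram bins is one candidate feature, so a feature selection step such as the one the abstract describes would operate on subsets of these bins; how the paper scores those subsets is not stated here.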
Pages: 3229-3249
Page count: 21