Multi-script text versus non-text classification of regions in scene images

Cited by: 11
Authors
Sriman, Bowornrat [1]
Schomaker, Lambert [1]
Affiliations
[1] Univ Groningen, Artificial Intelligence, Nijenborgh 9, NL-9747 AG Groningen, Netherlands
Keywords
Text detection in scene images; Text/non-text classification; Color features; Color histogram autocorrelation; SCALE; RECOGNITION
DOI
10.1016/j.jvcir.2019.04.007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text versus non-text region classification is an essential but difficult step in scene-image analysis due to the considerable shape complexity of text and background patterns. There is a high probability of confusion between background elements and letter parts. This paper proposes a feature-based classification of image blocks using the color autocorrelation histogram (CAH) and the scale-invariant feature transform (SIFT) algorithm, yielding a combined scale- and color-invariant feature suitable for scene-text classification. For the evaluation, features were extracted from different color spaces by applying color-histogram autocorrelation. The color features are concatenated with a SIFT descriptor. Parameter tuning is performed and evaluated. For the classification, a standard nearest-neighbor classifier (1NN) and a support vector machine (SVM) were compared. The proposed method appears to perform robustly and is especially suitable for Asian scripts such as Kannada and Thai, where urban scene-text fonts are characterized by high curvature and salient color variations. © 2019 Published by Elsevier Inc.
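As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes one possible reading of "color-histogram autocorrelation" (a circular autocorrelation of each channel's normalized histogram) and classifies blocks with a plain 1NN rule. The SIFT descriptor and the SVM are omitted, and all function names, the bin count, and the synthetic "text" and "non-text" blocks are hypothetical choices made only for this example.

```python
import numpy as np

def channel_histogram(block, bins=16):
    """Normalized histogram of one channel with values in [0, 1]."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def histogram_autocorrelation(hist):
    """Circular autocorrelation of a histogram over all bin lags (assumed definition)."""
    n = len(hist)
    return np.array([np.dot(hist, np.roll(hist, lag)) for lag in range(n)])

def block_features(block, bins=16):
    """Concatenate per-channel histogram autocorrelations of an H x W x C block."""
    return np.concatenate([
        histogram_autocorrelation(channel_histogram(block[..., c], bins))
        for c in range(block.shape[-1])
    ])

def predict_1nn(train_x, train_y, query):
    """Return the label of the nearest training vector (Euclidean distance)."""
    distances = np.linalg.norm(train_x - query, axis=1)
    return train_y[int(np.argmin(distances))]

# Synthetic stand-ins (assumption): two-tone blocks play the role of "text",
# uniform-noise blocks the role of "non-text". Real scene-image blocks would differ.
rng = np.random.default_rng(0)

def two_tone_block(size=16):
    mask = rng.random((size, size, 1)) < 0.5
    return np.where(mask, 0.1, 0.9) * np.ones((1, 1, 3))

def noise_block(size=16):
    return rng.random((size, size, 3))

train_x = np.stack(
    [block_features(two_tone_block()) for _ in range(5)]
    + [block_features(noise_block()) for _ in range(5)]
)
train_y = np.array(["text"] * 5 + ["non-text"] * 5)

print(predict_1nn(train_x, train_y, block_features(two_tone_block())))
print(predict_1nn(train_x, train_y, block_features(noise_block())))
```

In this toy setting the two-tone blocks produce a few strong autocorrelation peaks while the noise blocks produce a nearly flat profile, so the two classes are easy to separate. Real scene images would call for the color-space choices and parameter tuning that the abstract mentions.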
Pages: 23-42
Page count: 20