Automatic script identification from document images using cluster-based templates

Cited by: 112
Authors
Hochberg, J
Kelly, P
Thomas, T
Kerns, L
Affiliations
Keywords
script identification; document analysis; optical character recognition
DOI
10.1109/34.574802
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
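The pipeline in the abstract (cluster training symbols into per-script templates, then match new symbols against them) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the clustering procedure, the distance measure, or the symbol normalization, so the greedy threshold clustering, the mean absolute pixel difference, and the names `distance`, `build_templates`, and `identify_script` are all assumptions made for the example. Symbols are assumed to arrive already segmented and rescaled to a fixed-size bitmap, flattened into a list of 0/1 values.

```python
from typing import Dict, List, Sequence

# Assumption: a symbol is a size-normalized bitmap flattened to 0/1 values.
Symbol = Sequence[float]


def distance(a: Symbol, b: Symbol) -> float:
    """Mean absolute pixel difference between two equal-length bitmaps."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_templates(symbols: List[Symbol], threshold: float) -> List[List[float]]:
    """Greedy clustering (an assumed stand-in for the paper's method).

    Each symbol joins the nearest cluster if it lies within `threshold`,
    otherwise it starts a new cluster. A cluster's running mean is its template.
    """
    templates: List[List[float]] = []
    counts: List[int] = []
    for sym in symbols:
        if templates:
            best = min(range(len(templates)),
                       key=lambda i: distance(sym, templates[i]))
            if distance(sym, templates[best]) <= threshold:
                n = counts[best]
                # Update the running mean of the matched cluster.
                templates[best] = [(t * n + s) / (n + 1)
                                   for t, s in zip(templates[best], sym)]
                counts[best] = n + 1
                continue
        templates.append([float(v) for v in sym])
        counts.append(1)
    return templates


def identify_script(symbols: List[Symbol],
                    templates_by_script: Dict[str, List[List[float]]]) -> str:
    """Return the script whose templates give the lowest mean best-match distance."""
    def score(templates: List[List[float]]) -> float:
        return sum(min(distance(s, t) for t in templates)
                   for s in symbols) / len(symbols)

    return min(templates_by_script,
               key=lambda name: score(templates_by_script[name]))
```

In use, `build_templates` would be called once per script on that script's training symbols, and the resulting dictionary passed to `identify_script` together with the symbols extracted from a new page. A real system would also need symbol segmentation and size normalization, which this sketch leaves out.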
Pages: 176 - 181
Page count: 6
Related papers
50 in total
  • [1] An Approach for Automatic Indic Script Identification from Handwritten Document Images
    Obaidullah, Sk. Md.
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    ADVANCED COMPUTING AND SYSTEMS FOR SECURITY, VOL 2, 2016, 396 : 37 - 51
  • [2] Script and language identification from document images
    Peake, GS
    Tan, TN
    WORKSHOP ON DOCUMENT IMAGE ANALYSIS (DIA'97), PROCEEDINGS: IN COOPERATION WITH CVPR '97, 1997, : 10 - 17
  • [3] Script identification based on morphological reconstruction in document images
    Dhandra, B. V.
    Nagabhushan, P.
    Hangarge, Mallikarjun
    Hegadi, Ravindra
    Malemath, V. S.
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 2006, : 950 - +
  • [4] Indic script identification from handwritten document images
    Singh P.K.
    Sarkar R.
    Nasipuri M.
    International Journal of Intelligent Systems Technologies and Applications, 2019, 18 (03) : 303 - 321
  • [5] Numeral Script Identification from Handwritten Document Images
    Obaidullah, Sk Md
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    ELEVENTH INTERNATIONAL CONFERENCE ON COMMUNICATION NETWORKS, ICCN 2015/INDIA ELEVENTH INTERNATIONAL CONFERENCE ON DATA MINING AND WAREHOUSING, ICDMW 2015/INDIA ELEVENTH INTERNATIONAL CONFERENCE ON IMAGE AND SIGNAL PROCESSING, ICISP 2015, 2015, 54 : 585 - 594
  • [6] Script Identification of Camera Based Bilingual Document Images Using SFTA Features
    Dhandra, B. V.
    Mallappa, Satishkumar
    Mukarambi, Gururaj
    INTERNATIONAL JOURNAL OF TECHNOLOGY AND HUMAN INTERACTION, 2019, 15 (04) : 1 - 12
  • [7] Transform Based Approach for Indic Script Identification from Handwritten Document Images
    Obaidullah, Sk Md
    Karim, Rownaqul
    Shaikh, Sujal
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    2015 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATION AND NETWORKING (ICSCN), 2015,
  • [8] Automatic Depth Extraction from 2D Images Using a Cluster-Based Learning Framework
    Herrera, Jose L.
    del-Blanco, Carlos R.
    Garcia, Narciso
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (07) : 3288 - 3299
  • [9] Feature Selection Using Harmony Search for Script Identification from Handwritten Document Images
    Singh, Pawan Kumar
    Das, Supratim
    Sarkar, Ram
    Nasipuri, Mita
    JOURNAL OF INTELLIGENT SYSTEMS, 2018, 27 (03) : 465 - 488
  • [10] Script and language identification for handwritten document images
    Judith Hochberg
    Kevin Bowers
    Michael Cannon
    Patrick Kelly
    International Journal on Document Analysis and Recognition, 1999, 2 (2-3) : 45 - 52