Automatic script identification from document images using cluster-based templates

Cited by: 112
Authors
Hochberg, J
Kelly, P
Thomas, T
Kerns, L
Keywords
script identification; document analysis; optical character recognition
DOI
10.1109/34.574802
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
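The template approach summarized in the abstract can be illustrated with a small sketch. It assumes symbols have already been segmented from the page and scaled to equal-size binary bitmaps. It groups each script's training symbols with a simple greedy threshold clustering and uses the cluster centroids as templates. A new image is then assigned to the script whose templates give the lowest average best-match distance over its symbols. The function names (`build_templates`, `identify_script`), the mean-absolute-difference distance, the clustering rule, the `threshold` value and the toy script labels are all hypothetical choices for illustration, not the procedure reported by Hochberg et al.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a symbol is a flattened binary bitmap of a
# fixed size (segmentation and size normalization are assumed done upstream).
Bitmap = Tuple[int, ...]
Template = List[float]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pixel difference (normalized Hamming distance for 0/1 inputs)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def build_templates(symbols: List[Bitmap], threshold: float) -> List[Template]:
    """Greedy threshold clustering (an assumed choice): each training symbol
    joins the nearest existing cluster whose centroid is within `threshold`,
    otherwise it starts a new cluster. Cluster centroids become templates."""
    sums: List[List[float]] = []
    counts: List[int] = []
    for s in symbols:
        best, best_d = -1, 0.0
        for i, (sm, n) in enumerate(zip(sums, counts)):
            d = distance(s, [v / n for v in sm])
            if d <= threshold and (best == -1 or d < best_d):
                best, best_d = i, d
        if best == -1:
            sums.append([float(v) for v in s])
            counts.append(1)
        else:
            sums[best] = [acc + v for acc, v in zip(sums[best], s)]
            counts[best] += 1
    return [[v / n for v in sm] for sm, n in zip(sums, counts)]


def identify_script(symbols: List[Bitmap], templates: Dict[str, List[Template]]) -> str:
    """Return the script whose templates give the lowest mean best-match
    distance over all symbols from the new image."""
    def score(script_templates: List[Template]) -> float:
        return sum(min(distance(s, t) for t in script_templates) for s in symbols) / len(symbols)

    return min(templates, key=lambda name: score(templates[name]))


if __name__ == "__main__":
    # Toy 4-pixel "symbols" for two hypothetical scripts.
    templates = {
        "ScriptA": build_templates([(1, 0, 1, 0), (1, 0, 1, 0), (1, 0, 1, 1)], threshold=0.3),
        "ScriptB": build_templates([(1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1)], threshold=0.3),
    }
    print(identify_script([(1, 0, 1, 0), (1, 0, 1, 1)], templates))  # prints ScriptA
```

In this sketch, lowering `threshold` produces more and tighter templates per script, which can sharpen matching at the cost of more comparisons when identifying a new image.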
Pages: 176-181
Page count: 6