Automatic script identification from document images using cluster-based templates

Cited by: 112
Authors
Hochberg, J
Kelly, P
Thomas, T
Kerns, L
Keywords
script identification; document analysis; optical character recognition
DOI
10.1109/34.574802
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe an automated script identification system for typeset document images. Templates for each script are created by clustering textual symbols from a training set. Symbols from new images are compared to the templates to find the best script. Our current system processes thirteen scripts with minimal preprocessing and high accuracy.
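The two steps named in the abstract (clustering training symbols into templates, then matching new symbols against each script's templates) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the greedy leader clustering, the Hamming distance, the majority-vote centroid, the tiny 3x3 bitmaps, and every function name here are assumptions chosen to keep the example short.

```python
from typing import Dict, List, Sequence, Tuple

Symbol = Tuple[int, ...]  # flattened binary bitmap; all symbols share one length


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of pixels on which two equally sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def centroid(members: List[Symbol]) -> Symbol:
    """Pixel-wise majority vote over one cluster (ties resolve to 1)."""
    n = len(members)
    return tuple(1 if 2 * sum(col) >= n else 0 for col in zip(*members))


def build_templates(symbols: List[Symbol], max_dist: int) -> List[Symbol]:
    """Greedy leader clustering (an assumed stand-in for the paper's method):
    a symbol joins the first cluster whose template is within max_dist,
    otherwise it starts a new cluster."""
    clusters: List[List[Symbol]] = []
    templates: List[Symbol] = []
    for s in symbols:
        for i, t in enumerate(templates):
            if hamming(s, t) <= max_dist:
                clusters[i].append(s)
                templates[i] = centroid(clusters[i])
                break
        else:
            clusters.append([s])
            templates.append(s)
    return templates


def identify_script(symbols: List[Symbol],
                    templates_by_script: Dict[str, List[Symbol]]) -> str:
    """Return the script whose templates match the symbols best on average."""
    def mean_best_distance(templates: List[Symbol]) -> float:
        best = [min(hamming(s, t) for t in templates) for s in symbols]
        return sum(best) / len(best)
    return min(templates_by_script,
               key=lambda name: mean_best_distance(templates_by_script[name]))


# Hypothetical toy data: script "A" uses vertical strokes, "B" horizontal ones.
training = {
    "A": [(0, 1, 0, 0, 1, 0, 0, 1, 0), (0, 1, 0, 0, 1, 0, 0, 1, 1)],
    "B": [(0, 0, 0, 1, 1, 1, 0, 0, 0), (1, 0, 0, 1, 1, 1, 0, 0, 0)],
}
templates_by_script = {name: build_templates(syms, max_dist=2)
                       for name, syms in training.items()}

# Symbols extracted from a new (hypothetical) page image.
new_page = [(0, 1, 0, 0, 1, 0, 0, 1, 0), (0, 1, 0, 0, 1, 0, 1, 1, 0)]
best_script = identify_script(new_page, templates_by_script)
```

In this toy run the two noisy training symbols of each script collapse into a single template, and the new page is assigned to script "A" because its symbols lie closer on average to the vertical-stroke template. A real system would first segment and size-normalize symbols from the page image, which is omitted here.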
Pages: 176-181
Number of pages: 6
Related papers
50 in total
  • [31] Dimensionality Reduction and Feature Selection Methods for Script Identification on Document Images
    Poon, Bruce
    Rahman, Saami
    Amin, M. Ashraful
    Yan, Hong
    INFORMATION TECHNOLOGY IN INDUSTRY, 2014, 2 (01): 1-5
  • [32] Word level Script and Language identification for Unconstrained handwritten document images
    Prasanthkumar, P., V
    Dileesh, E. D.
    2014 3RD INTERNATIONAL CONFERENCE ON ECO-FRIENDLY COMPUTING AND COMMUNICATION SYSTEMS (ICECCS 2014), 2014: 14-18
  • [33] TSP and cluster-based solutions to the reassignment of document identifiers
    Roi Blanco
    Álvaro Barreiro
    Information Retrieval, 2006, 9: 499-517
  • [34] Speckle Reduction of OCT images using an Adaptive Cluster-based Filtering
    Adabi, Saba
    Rashedi, Elaheh
    Conforto, Silvia
    Mehregan, Darius
    Xu, Qiuyun
    Nasiriavanaki, Mohammadreza
    OPTICAL COHERENCE TOMOGRAPHY AND COHERENCE DOMAIN OPTICAL METHODS IN BIOMEDICINE XXI, 2017, 10053
  • [35] Cluster-based Sample Selection for Document Image Binarization
    Krantz, Amandus
    Westphal, Florian
    2019 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION WORKSHOPS (ICDARW), VOL 5, 2019: 47-52
  • [36] Indic Script Identification from Handwritten Document Images - An Unconstrained Block-level Approach
    Obaidullah, Sk Md
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    2015 IEEE 2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN INFORMATION SYSTEMS (RETIS), 2015: 213-218
  • [37] Entropy Based Script Identification of a Multilingual Document Image
    Bashir, Rumaan
    Quadri, S. M. K.
    2014 INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT (INDIACOM), 2014: 19-23
  • [38] Script Identification of Camera-based Images
    Li, Linlin
    Tan, Chew Lim
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008: 70-73
  • [39] Structural Feature Based Approach for Script Identification from Printed Indian Document
    Obaidullah, Sk Md
    Mondal, Anamika
    Roy, Kaushik
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014: 120-124
  • [40] A System for Handwritten Script Identification from Indian Document
    Obaidullah, Sk Md
    Das, Supratik Kundu
    Roy, Kaushik
    JOURNAL OF PATTERN RECOGNITION RESEARCH, 2013, 8 (01): 1-12