Machine-printed Japanese document recognition

Cited by: 13
Authors
Srihari, SN [1]
Hong, T [1]
Srikantan, G [1]
Affiliations
[1] HUGHES INFORMAT TECHNOL CORP, MARLBORO, MD 20774 USA
Keywords
machine-printed document recognition; Japanese OCR; Japanese character image database
DOI
10.1016/S0031-3203(96)00168-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cherry Blossom is a general-purpose Japanese document recognition system developed at CEDAR. The input to the system can be facsimile pages or images scanned at low resolution. Given a Japanese document image, the system deskews the image, extracts text regions, segments text regions into text lines and further into characters, and recognizes character images as characters in JIS code. Two feature sets, the Local Stroke Direction feature and the Gradient, Structural, and Concavity feature, are used for character classification. Two classification methods, the nearest neighbor classifier and the minimum error subspace method, have been designed and they have been integrated to achieve better performance. We also describe the new Japanese character image database developed at CEDAR. This database consists of approximately 180,000 labeled character images from more than 3300 categories, extracted from diverse document images. Results of our system on this dataset are also presented. (C) 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
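The abstract names a nearest neighbor classifier as one of the two classification methods. As a rough illustration only, the sketch below shows the general technique on toy data: a query feature vector is assigned the label of the closest stored prototype. The function name `nearest_neighbor`, the three-dimensional feature vectors, the Euclidean distance, and the prototype list are all assumptions made for this example; the record does not describe the system's actual features, distance measure, or how the two classifiers are integrated.

```python
import math


def nearest_neighbor(query, prototypes):
    """Return (label, distance) of the prototype closest to `query`.

    `prototypes` is a list of (label, feature_vector) pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    best_label, best_dist = None, math.inf
    for label, features in prototypes:
        dist = math.dist(query, features)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical toy prototypes: labels are JIS X 0208 codes for three hiragana,
# and the feature values are invented for illustration.
PROTOTYPES = [
    ("0x2422", [0.9, 0.1, 0.3]),  # あ
    ("0x2424", [0.2, 0.8, 0.5]),  # い
    ("0x2426", [0.4, 0.4, 0.9]),  # う
]

label, dist = nearest_neighbor([0.25, 0.75, 0.5], PROTOTYPES)
print(label, round(dist, 4))
```

With more than 3300 categories, as the abstract reports, a linear scan like this one would typically be replaced by some form of candidate pruning, but the decision rule stays the same.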
Pages: 1301-1313
Page count: 13