Machine-printed Japanese document recognition

Cited by: 13
Authors
Srihari, SN [1]
Hong, T [1]
Srikantan, G [1]
Affiliation
[1] HUGHES INFORMAT TECHNOL CORP, MARLBORO, MD 20774 USA
Keywords
machine-printed document recognition; Japanese OCR; Japanese character image database
DOI
10.1016/S0031-3203(96)00168-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cherry Blossom is a general-purpose Japanese document recognition system developed at CEDAR. The input to the system can be facsimile pages or images scanned at low resolution. Given a Japanese document image, the system deskews the image, extracts text regions, segments text regions into text lines and further into characters, and recognizes character images as characters in JIS code. Two feature sets, the Local Stroke Direction feature and the Gradient, Structural, and Concavity feature, are used for character classification. Two classification methods, the nearest neighbor classifier and the minimum error subspace method, have been designed and they have been integrated to achieve better performance. We also describe the new Japanese character image database developed at CEDAR. This database consists of approximately 180,000 labeled character images from more than 3300 categories, extracted from diverse document images. Results of our system on this dataset are also presented. (C) 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
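The abstract names a nearest neighbor classifier as one of the two classification methods. As a rough illustration only, a nearest neighbor decision over feature vectors might be sketched as follows. The Euclidean distance, the three-dimensional vectors, the label strings, and the function name `nearest_neighbor` are all assumptions made for this example; the record does not give the paper's actual distance measure, feature dimensions, or implementation.

```python
import math

def nearest_neighbor(query, prototypes):
    """Return the label of the prototype closest to `query`.

    Euclidean distance is an assumption for this sketch; the source
    does not state which distance measure the system uses.
    """
    best_label, best_dist = None, math.inf
    for label, vector in prototypes:
        dist = math.dist(query, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labeled prototypes: each label stands in for a character
# code, each vector for a (much longer) extracted feature vector.
prototypes = [
    ("0x3021", [0.9, 0.1, 0.0]),
    ("0x2422", [0.1, 0.8, 0.3]),
    ("0x2522", [0.0, 0.2, 0.9]),
]

predicted = nearest_neighbor([0.2, 0.7, 0.4], prototypes)
print(predicted)
```

In a real system the prototypes would come from labeled training images, and with several thousand categories a brute-force scan like this one would likely need some form of pruning or indexing.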
Pages: 1301-1313
Page count: 13