Machine-printed Japanese document recognition

Cited by: 13
Authors
Srihari, SN [1]
Hong, T [1]
Srikantan, G [1]
Affiliation
[1] HUGHES INFORMAT TECHNOL CORP, MARLBORO, MD 20774 USA
Keywords
machine-printed document recognition; Japanese OCR; Japanese character image database
DOI
10.1016/S0031-3203(96)00168-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cherry Blossom is a general-purpose Japanese document recognition system developed at CEDAR. The input to the system can be facsimile pages or images scanned at low resolution. Given a Japanese document image, the system deskews the image, extracts text regions, segments text regions into text lines and further into characters, and recognizes character images as characters in JIS code. Two feature sets, the Local Stroke Direction feature and the Gradient, Structural, and Concavity feature, are used for character classification. Two classification methods, the nearest neighbor classifier and the minimum error subspace method, have been designed and they have been integrated to achieve better performance. We also describe the new Japanese character image database developed at CEDAR. This database consists of approximately 180,000 labeled character images from more than 3300 categories, extracted from diverse document images. Results of our system on this dataset are also presented. (C) 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
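The segmentation and classification steps named in the abstract can be illustrated with a minimal sketch. It is not the Cherry Blossom implementation: it assumes a clean, already deskewed binary image with horizontal text, uses plain projection profiles for line and character segmentation, and uses a toy nearest neighbor classifier over made-up feature vectors in place of the Local Stroke Direction and Gradient, Structural, and Concavity features. All function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed format)


def segment_runs(profile: Sequence[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at the first blank
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the per-row ink count."""
    return segment_runs([sum(row) for row in image])


def segment_characters(image: Image, line: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Split one text line into characters using the per-column ink count."""
    top, bottom = line
    width = len(image[0]) if image else 0
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return segment_runs(profile)


def nearest_neighbor(features: Sequence[float],
                     prototypes: Sequence[Tuple[str, Sequence[float]]]) -> str:
    """Return the label of the prototype closest in squared Euclidean distance."""
    best_label, best_dist = "", float("inf")
    for label, proto in prototypes:
        dist = sum((a - b) ** 2 for a, b in zip(features, proto))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Toy page with two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)                       # [(1, 3), (4, 5)]
chars = [segment_characters(page, ln) for ln in lines]

# Illustrative two-dimensional prototypes, not real character features.
prototypes = [("あ", [1.0, 0.0]), ("い", [0.0, 1.0])]
label = nearest_neighbor([0.9, 0.2], prototypes)  # "あ"
```

Blank-gap cutting of this kind is only a starting point for Japanese text, since many characters (for example い) consist of separated strokes that a pure column profile would split, and low-resolution or facsimile input often merges neighboring characters.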
Pages: 1301-1313
Page count: 13