Machine-printed Japanese document recognition

Times cited: 13
Authors
Srihari, SN [1]
Hong, T [1]
Srikantan, G [1]
Affiliation
[1] Hughes Information Technology Corp., Marlboro, MD 20774, USA
Keywords
machine-printed document recognition; Japanese OCR; Japanese character image database
DOI
10.1016/S0031-3203(96)00168-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cherry Blossom is a general-purpose Japanese document recognition system developed at CEDAR. The input to the system can be facsimile pages or images scanned at low resolution. Given a Japanese document image, the system deskews the image, extracts text regions, segments text regions into text lines and further into characters, and recognizes character images as characters in JIS code. Two feature sets, the Local Stroke Direction feature and the Gradient, Structural, and Concavity feature, are used for character classification. Two classification methods, the nearest neighbor classifier and the minimum error subspace method, have been designed and integrated to achieve better performance. We also describe the new Japanese character image database developed at CEDAR. This database consists of approximately 180,000 labeled character images from more than 3300 categories, extracted from diverse document images. Results of our system on this dataset are also presented. (C) 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
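The abstract names the pipeline stages but not how each one works. As a rough illustration of the step that segments text regions into lines and then characters, here is a minimal Python sketch based on projection profiles. Projection profiles are an assumption for illustration only: the abstract does not say which segmentation method Cherry Blossom uses. The function names (`runs`, `segment_lines`, `segment_chars`, `segment`) are hypothetical. The sketch also assumes a deskewed, binarized image of horizontally written text (1 = ink). Vertical Japanese text would swap the roles of rows and columns, and characters whose components are separated by blank columns (for example 川 or 明) would be over-segmented by a bare column projection, so a practical system needs an additional merging step.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # hypothetical binary image: 1 = ink, 0 = background


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the projection profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # entering an inked run
        elif not value and start is not None:
            spans.append((start, i))       # leaving an inked run
            start = None
    if start is not None:                  # run touches the last index
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Split a horizontal text block into text lines using the row (horizontal) projection."""
    return runs([sum(row) for row in img])


def segment_chars(img: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    """Split one text line (rows top..bottom-1) into column ranges using the vertical projection."""
    width = len(img[0])
    return runs([sum(img[r][c] for r in range(top, bottom)) for c in range(width)])


def segment(img: Image) -> List[List[Tuple[int, int, int, int]]]:
    """For each text line, return character-candidate boxes as (top, bottom, left, right)."""
    return [
        [(top, bottom, left, right) for left, right in segment_chars(img, top, bottom)]
        for top, bottom in segment_lines(img)
    ]


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
    ]
    print(segment(page))  # [[(0, 2, 0, 2), (0, 2, 3, 4)], [(3, 4, 1, 3)]]
```

Each resulting box would then be cropped and passed to feature extraction and classification; the sketch stops at producing the boxes.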
Pages: 1301-1313
Number of pages: 13