History of the Tesseract OCR Engine: What Worked and What Didn't
How to Build a World-Class OCR Engine in Less Than 20 Years

Cited by: 19

Authors:

Smith, Ray [1]

Affiliations:

[1] Google Inc, Mountain View, CA 94043 USA

Source:

DOCUMENT RECOGNITION AND RETRIEVAL XX | 2013 / Vol. 8658

Keywords:

OCR; Machine learning; Structural pattern recognition; Multi-language OCR; RECOGNITION; MODELS;

DOI:

10.1117/12.2010051

Chinese Library Classification (CLC) number:

O43 [Optics];

Discipline classification code:

070207 ; 0803 ;

Abstract:

This paper describes the development history of the Tesseract OCR engine, and compares the methods to general changes in the field over a similar time period. Emphasis is placed on the lessons learned with the goal of providing a primer for those interested in OCR research.

Pages: 12